# Quick Start Guide

The [NeXus Data Format](https://www.nexusformat.org/) is typically used to structure HDF5 files.
An HDF5 file is a container for *datasets* and *groups*.
Groups are folder-like and work like Python dictionaries.
Datasets work like NumPy arrays.
In addition, groups and datasets have a dictionary of *attributes*.

NeXus extends this with the following:

- Definitions of attributes for datasets, in particular a `units` attribute.
  In NeXus, datasets are referred to as *fields*.
- Definitions of attributes and structure for groups.
  This includes:
  - An `NX_class` attribute, identifying a group as an instance of a particular NeXus class such as [NXdata](https://manual.nexusformat.org/classes/base_classes/NXdata.html) or [NXlog](https://manual.nexusformat.org/classes/base_classes/NXlog.html).
  - Attributes that identify which fields contained in the group hold signal values, and which hold axis labels.
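
To make this structure concrete, here is a standard-library-only sketch of a NeXus-like tree. The `Field` and `Group` classes and the `signal_and_axes` helper are hypothetical and not part of scippnexus. The helper assumes the NXdata convention in which the group attribute `signal` names the signal field and the group attribute `axes` lists the names of the axis fields, one per dimension.

```python
# Hypothetical in-memory model of a NeXus/HDF5 tree (not the scippnexus API).
# Groups hold named children plus attributes; fields hold values plus attributes.
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class Field:
    values: List[Any]
    attrs: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Group:
    children: Dict[str, Any] = field(default_factory=dict)
    attrs: Dict[str, Any] = field(default_factory=dict)


def signal_and_axes(group: Group) -> Tuple[str, List[str]]:
    """Return the signal field name and axis field names of an NXdata-like group.

    Assumes the NXdata convention: group attributes 'signal' (one field name)
    and 'axes' (a list of field names, one per dimension).
    """
    if group.attrs.get("NX_class") != "NXdata":
        raise ValueError("not an NXdata group")
    signal = group.attrs["signal"]
    axes = list(group.attrs.get("axes", []))
    # Every name referenced by the attributes must be a field of this group.
    for name in [signal, *axes]:
        if not isinstance(group.children.get(name), Field):
            raise KeyError(f"missing field {name!r}")
    return signal, axes


# A toy NXentry containing one NXdata group with a 1-D signal and its axis.
data = Group(
    children={
        "counts": Field([3, 5, 2], {"units": "counts"}),
        "x": Field([0.1, 0.2, 0.3], {"units": "m"}),
    },
    attrs={"NX_class": "NXdata", "signal": "counts", "axes": ["x"]},
)
entry = Group(children={"data": data}, attrs={"NX_class": "NXentry"})

print(signal_and_axes(entry.children["data"]))  # ('counts', ['x'])
```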
  
In the following, we use a file from the [POWGEN](https://neutrons.ornl.gov/powgen) instrument at the Spallation Neutron Source (SNS).
The file is provided via scippnexus and is downloaded automatically using [pooch](https://pypi.org/project/pooch/) if it is not already cached:

In [None]:
from scippnexus import data
filename = data.get_path('PG3_4844_event.nxs')

Given such a NeXus file, we first need to open it.
Wherever possible this should be done using a context manager as follows:

In [None]:
import scippnexus as snx
with snx.File(filename) as f:
    print(list(f.keys()))

Unfortunately, working with a context manager in a Jupyter notebook is cumbersome, so in the following we open the file directly instead:

In [None]:
f = snx.File(filename)
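
Opening the file directly means it stays open until it is closed explicitly. The sketch below uses a hypothetical `ToyFile` class (standard library only, not the scippnexus `File` class) to contrast the two styles: a `with` block closes the file even when an error occurs, whereas a directly opened handle has to be closed by hand.

```python
# Hypothetical stand-in for a file handle, used only to illustrate why a
# context manager is preferred; it is not the scippnexus File class.
class ToyFile:
    def __init__(self, name):
        self.name = name
        self.closed = False

    def close(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Runs on normal exit *and* when an exception propagates out of the block.
        self.close()
        return False  # do not swallow the exception


# With a context manager, the file is closed even if the body raises.
try:
    with ToyFile("example.nxs") as handle:
        raise RuntimeError("simulated error while reading")
except RuntimeError:
    pass
print(handle.closed)  # True

# Opened directly (as in a notebook), closing is our responsibility.
g = ToyFile("example.nxs")
try:
    pass  # ... work with g across several cells ...
finally:
    g.close()
print(g.closed)  # True
```

For the real `snx.File`, we assume an analogous `f.close()` call once you are done; check the scippnexus API reference to confirm.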

Above we saw that the file contains a single key, `'entry'`.
When we access it, we can see that it belongs to the class [NXentry](https://manual.nexusformat.org/classes/base_classes/NXentry.html), which is found at the top level of any NeXus file:

In [None]:
entry = f['entry']
entry

We could continue inspecting keys until we find a group we are interested in.
For this example we use the `'proton_charge'` log found within `'DASlogs'`:

In [None]:
proton_charge = entry['DASlogs']['proton_charge']
proton_charge

This group is an [NXlog](https://manual.nexusformat.org/classes/base_classes/NXlog.html), which typically contains 1-D data with a time axis.
Since scippnexus understands the NXlog class, it can determine the group's shape:

In [None]:
proton_charge.shape
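
As a rough intuition, for a group whose `value` field holds the signal and whose `time` field provides the axis, the shape follows from the length of `value`. The stdlib-only sketch below illustrates that idea; the function `log_dims_and_shape` and the fallback label `dim_0` are hypothetical, and scippnexus's own logic handles more general cases.

```python
# Illustrative only: a simplified way a 1-D log's dims and shape could be
# derived. This is not scippnexus's implementation.
from typing import Dict, List, Tuple


def log_dims_and_shape(
    fields: Dict[str, List[float]]
) -> Tuple[Tuple[str, ...], Tuple[int, ...]]:
    """Return (dims, shape) of an NXlog-like group given its fields.

    Assumes the signal lives in 'value' and, if present, 'time' is its axis.
    """
    value = fields["value"]
    if "time" in fields:
        if len(fields["time"]) != len(value):
            raise ValueError("'time' and 'value' lengths differ")
        return ("time",), (len(value),)
    # Hypothetical fallback label when there is no time axis.
    return ("dim_0",), (len(value),)


log = {"time": [0.0, 1.0, 2.0, 3.0], "value": [10.0, 11.0, 9.5, 10.2]}
print(log_dims_and_shape(log))  # (('time',), (4,))
```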

<div class="alert alert-info">
    <b>Note:</b>

This is in contrast to plain HDF5 where groups do *not* have a shape.
Note that not all NeXus classes have a defined shape.

</div>

We read the NXlog from the file using slicing notation.
To read the entire group, use an ellipsis (`...`) or an empty tuple (`()`):

In [None]:
proton_charge[...]
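
Both spellings work because of how Python forwards subscripts to `__getitem__`: `obj[...]` passes the `Ellipsis` singleton and `obj[()]` passes an empty tuple, either of which a loader can treat as "select everything". A tiny class that simply echoes its key (plain Python, independent of scippnexus) shows what arrives:

```python
# Shows what Python passes to __getitem__ for different subscript spellings.
class ShowKey:
    def __getitem__(self, key):
        return key


probe = ShowKey()
print(probe[...] is Ellipsis)  # True
print(probe[()] == ())         # True
print(probe["time", 0:10])     # ('time', slice(0, 10, None))
```

The last line also shows that a label plus a slice arrives as a single tuple, which is the form used for label-based slicing.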

Above, scippnexus automatically dealt with:

- Loading the data field (signal value dataset and its `'units'` attribute).
- Identifying the dimension labels (here: `'time'`).
- Loading the other fields in the group as coordinates, including:
  - Units of the fields.
  - Uncertainties of the fields (here for `'average_value'`).
  
This structure is compatible with a `scipp.DataArray` and is returned as such.
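
NeXus conventionally stores uncertainties as standard deviations in a companion field with an `_errors` suffix, whereas scipp represents uncertainties as variances. The hypothetical helper below sketches that conversion under this assumption; the exact field name in a given file may differ, and this is not scippnexus's implementation.

```python
# Hypothetical helper: convert NeXus-style standard deviations into variances,
# the form in which scipp stores uncertainties.
from typing import Dict, List, Optional, Tuple


def values_and_variances(
    fields: Dict[str, List[float]], name: str
) -> Tuple[List[float], Optional[List[float]]]:
    values = fields[name]
    # Assumes a companion '<name>_errors' field holding standard deviations.
    errors = fields.get(f"{name}_errors")
    if errors is None:
        return values, None  # no uncertainties available
    if len(errors) != len(values):
        raise ValueError("values and errors differ in length")
    # variance = (standard deviation) ** 2
    return values, [e * e for e in errors]


fields = {"average_value": [2.0, 3.0], "average_value_errors": [0.5, 0.1]}
vals, var = values_and_variances(fields, "average_value")
print(vals, var)
```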

We may also load an individual field instead of an entire group.
A field corresponds to a `scipp.Variable`: much as h5py represents datasets as NumPy arrays, scippnexus returns a variable that additionally carries a unit and dimension labels (where applicable).
For example, we may load only the `'value'` dataset:

In [None]:
proton_charge['value'][...]

Attributes of datasets or groups are accessed just like in h5py:

In [None]:
proton_charge['value'].attrs['units']
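
In simplified form, loading a field combines its values, its dimension labels, and the unit taken from its `units` attribute. The `ToyVariable` class and `load_field` function below are hypothetical illustrations in plain Python, not the scipp or scippnexus API, and the unit `"K"` is just an example value.

```python
# Hypothetical, simplified picture of what loading a field produces:
# values plus dimension labels plus a unit read from the 'units' attribute.
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class ToyVariable:
    dims: Tuple[str, ...]
    values: List[Any]
    unit: Optional[str]


def load_field(
    values: List[Any], attrs: Dict[str, Any], dims: Tuple[str, ...]
) -> ToyVariable:
    # A field without a 'units' attribute gets no unit here (hypothetical choice).
    return ToyVariable(dims=dims, values=list(values), unit=attrs.get("units"))


var = load_field([295.1, 295.3, 295.0], {"units": "K"}, ("time",))
print(var)
```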

A subset of the group (and its fields) can be loaded by selecting a slice along a named dimension, here `'time'`.
We can also plot the result directly using the `plot` method of `scipp.DataArray`:

In [None]:
proton_charge['time', 193000:197000].plot()
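
The `['time', start:stop]` form selects by dimension label rather than by positional axis. The stdlib-only sketch below shows the idea with a hypothetical `ToyLog` class whose columns all share a single dimension; it is not how scippnexus is implemented.

```python
# Hypothetical sketch of label-based slicing: obj[dim, slice] selects along
# the named dimension instead of a positional axis.
from typing import Dict, List, Tuple


class ToyLog:
    def __init__(self, dims: Tuple[str, ...], columns: Dict[str, List[float]]):
        self.dims = dims
        self.columns = columns  # every column is 1-D along dims[0]

    def __getitem__(self, key):
        if key is Ellipsis or key == ():
            return dict(self.columns)  # select everything
        if not (isinstance(key, tuple) and len(key) == 2):
            raise TypeError("expected obj[dim, index]")
        dim, index = key
        if dim not in self.dims:
            raise KeyError(f"unknown dimension {dim!r}")
        # Apply the same slice to every column that shares this dimension.
        return {name: col[index] for name, col in self.columns.items()}


log = ToyLog(("time",), {"time": [0, 1, 2, 3, 4], "value": [5, 6, 7, 8, 9]})
print(log["time", 1:3])  # {'time': [1, 2], 'value': [6, 7]}
```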

As another example, consider the following [NXdata](https://manual.nexusformat.org/classes/base_classes/NXdata.html) group:

In [None]:
bank = f['entry/bank103']
print(bank.shape, bank.dims)

This can be loaded and plotted as above.
In this case the resulting data array is 2-D:

In [None]:
da = bank[...]
da

In [None]:
da.plot()