# Introduction 

`peaks` is a collection of analysis tools for the loading, processing and display of spectroscopic and diffraction data, with a core focus on tools for angle-resolved photoemission, LEED, RHEED, and some other related techniques. It also includes various functions for efficient log keeping. 

:::{tip}
`peaks` builds heavily on :class:`xarray`. The user is *strongly* recommended to consult the extensive documentation and tutorials available on the [xarray website](https://docs.xarray.dev/en/stable/). 
::: 

These user guides give a brief introduction to the use of `peaks`. More extensive documentation and examples can be accessed via the docstrings of the relevant functions. 

:::{tip}
The docstrings of the relevant `peaks` functions can be viewed in the [package documentation](https://research.st-andrews.ac.uk/kinggroup/peaks/_apidoc/peaks.html), or accessed in the notebook by calling help on the relevant function, e.g.:
```python
help(pks.load)
```
or using the `?` shortcut:
```
pks.load?
```
In JupyterLab, pressing `TAB` brings up auto-complete options, while `SHIFT`-`TAB` shows the expected function arguments.

A quick look at the source code can be achieved using `??`, e.g.:
```
pks.load??
```
:::
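Outside of IPython, the same information can be obtained with the standard-library `inspect` module. The sketch below uses a stand-in function; `load_example` is hypothetical, so substitute e.g. `pks.load` in practice. `inspect.getsource()` is the equivalent of `??`, but it needs the function's source file to be available.

```python
import inspect


def load_example(fpath, loc=None, lazy=None):
    """Hypothetical stand-in for a loader such as ``pks.load``."""
    return fpath


# Equivalent of SHIFT-TAB: the call signature
print(inspect.signature(load_example))

# Equivalent of `?`: the cleaned-up docstring
print(inspect.getdoc(load_example))
```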

## Importing `peaks`
The recommended way to import peaks is: 

In [None]:
import peaks as pks

This loads a set of core functions (mostly from `peaks.core` and `peaks.utils`) into the `pks` namespace, and adds several functions as accessors to the relevant :class:`xarray` objects.

It is often useful to also import a number of other related packages. In addition, it can be useful to set some [global options](https://docs.xarray.dev/en/stable/generated/xarray.set_options.html#xarray.set_options) for [`xarray`](xarray) and e.g. [`matplotlib`](matplotlib). A complete import may therefore look like, e.g.:

In [None]:
# Import packages
import matplotlib.pyplot as plt
import xarray as xr
import numpy as np
import peaks as pks
import os

# Set default options
xr.set_options(cmap_sequential='Purples', keep_attrs=True)
%matplotlib inline
%config InlineBackend.figure_format='retina'

:::{tip}
You may find it useful to include the above in an IPython startup script, so that this is pre-loaded each time you start a new notebook (or any IPython session) from this environment. 

Locate the IPython Profile Directory:
```python
import IPython
IPython.paths.locate_profile()
```

This gives the path to the default profile directory, typically something like `/home/username/.ipython/profile_default`. In the `startup` subdirectory of this profile directory, place one or more Python files containing the code you want to run. These are executed in alphabetical order, so if you have multiple scripts, you can control the order of execution by naming them accordingly, e.g. `00-script.py`, `01-script.py`, ...

For example, to reproduce the above, create a file such as `00-startup.py` containing:
```python
# Import packages
import matplotlib.pyplot as plt
import xarray as xr
import numpy as np
import peaks as pks
import os

# Set default xarray options
xr.set_options(cmap_sequential='Purples', keep_attrs=True)

# Get the IPython interactive shell
from IPython import get_ipython
ipython = get_ipython()

# Run the notebook magic commands
ipython.run_line_magic('matplotlib', 'inline')
ipython.run_line_magic('config', "InlineBackend.figure_format='retina'")
```
Note that the Jupyter magics must be run differently here, via `run_line_magic`. The script will now run automatically the next time you launch an IPython session.
:::
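Because the startup files are run in alphabetical order, consistent zero-padding of the numeric prefixes matters. The sketch below assumes a plain lexicographic (character-by-character) sort of the file names, which is what that ordering amounts to; the file names are made up for illustration.

```python
# Hypothetical file names placed in the profile's `startup` directory
startup_files = ["2-plots.py", "10-options.py", "00-imports.py", "01-magics.py"]

# Lexicographic sort: the order in which the scripts would be run
run_order = sorted(startup_files)
print(run_order)
```

Note that `10-options.py` sorts before `2-plots.py`, because `'1' < '2'` character by character; padding every prefix to the same width (`02-plots.py`) avoids such surprises.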

## Loading data

:::{attention}
For this tutorial, a few sample data sets will be downloaded to a temporary folder, and the returned `sample_data` object is used to access the file path of that folder. Make sure to clean up the temporary folder when finished, following the instructions at the end of this guide.

In the following, `sample_data.fpath` provides the file path of the temporary folder where the files have been placed. When loading your own data, replace `sample_data.fpath` in the following with a string providing the folder path where your data is located.
:::

In [None]:
# Download the example data
from peaks.core.utils.sample_data import get_tutorial1_data
sample_data = get_tutorial1_data()

Load data using the `load` function. `peaks` attempts to identify the relevant data source (beamline, lab system, etc.) automatically, although it can be manually specified with the `loc=` option. 

:::{tip}
To see available `loc` options, call:

```python
pks.locs()
```
:::

A single file can be loaded by specifying the full path:

In [None]:
disp1 = pks.load(os.path.join(sample_data.fpath,'i05-59819.nxs'))

### Setting default fileIO options

To ease loading of data and provide a cleaner syntax, the file path (optionally including the first part of the file name), and optionally also the file extension and location identifier, can be specified using the `peaks` options, `pks.opts`. To show the currently set FileIO options:

In [None]:
pks.opts.FileIO

:::{tip}
`FileIO` is one of the option sets for `peaks`. To see the current settings of the other options, call `pks.opts`.
:::

These can be set as persistent values for the session:

In [None]:
pks.opts.FileIO.path = os.path.join(sample_data.fpath,'i05-59')
pks.opts.FileIO.ext = 'nxs'

The file can then be loaded using any identifier (a string, or an object that can be parsed as a string) that uniquely identifies the file:

In [None]:
disp1=pks.load(819)
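
Conceptually, this amounts to combining the path prefix, the identifier and the extension into a file-name pattern and checking that exactly one file matches. The sketch below illustrates that idea with the standard-library `glob` module; `resolve_file` is a hypothetical helper, and the trailing `*` it adds after the identifier is an assumption, not the actual `peaks` implementation. Empty files are created in a temporary folder purely for the demonstration.

```python
import glob
import os
import tempfile


def resolve_file(path_prefix, identifier, ext):
    """Hypothetical helper: return the single file matching prefix + identifier + '*.' + ext."""
    # Escape the prefix so only wildcards in the identifier itself are interpreted
    pattern = f"{glob.escape(path_prefix)}{identifier}*.{ext}"
    matches = sorted(glob.glob(pattern))
    if len(matches) != 1:
        raise ValueError(f"{pattern!r} matched {len(matches)} files, expected exactly one")
    return matches[0]


with tempfile.TemporaryDirectory() as tmp:
    for name in ["i05-59818.nxs", "i05-59819.nxs", "i05-59853.nxs"]:
        open(os.path.join(tmp, name), "w").close()

    prefix = os.path.join(tmp, "i05-59")
    found = resolve_file(prefix, 819, "nxs")
    print(os.path.basename(found))

    try:
        resolve_file(prefix, 8, "nxs")  # i05-598*.nxs is ambiguous
    except ValueError as err:
        ambiguous_message = str(err)
        print(ambiguous_message)
```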

Standard wildcards are supported:

| Wildcard | Description |
|----------|-------------|
| `*`      | Matches any number of characters (except the path separator `/`) |
| `?`      | Matches a single character (except `/`) |
| `[...]`  | Matches any one character inside the brackets |
| `[!...]` | Matches any one character **not** inside the brackets |





In [None]:
disp1=pks.load('8?9')
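
The wildcards in the table follow the usual shell-style (glob) rules. As a quick self-contained check, the sketch below applies Python's standard-library `fnmatch.fnmatchcase` to bare file names. These contain no directory part, so the `/` rule does not come into play; the sketch illustrates the patterns only, not how `peaks` matches files internally.

```python
from fnmatch import fnmatchcase

files = ["i05-59818.nxs", "i05-59819.nxs", "i05-59853.nxs", "i05-59819.ibw"]


def matching(pattern):
    """Return the file names that match a shell-style pattern."""
    return [f for f in files if fnmatchcase(f, pattern)]


print(matching("i05-598?9.nxs"))     # ?      : exactly one character
print(matching("i05-598*.nxs"))      # *      : any number of characters
print(matching("i05-5981[89].nxs"))  # [...]  : one of the listed characters
print(matching("i05-5981[!8].*"))    # [!...] : any character except those listed
```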

You can pass a list of values to allow, e.g., automatic loading of multiple file types, and set or reset individual entries, e.g.:

In [None]:
# Set ext and loc
pks.opts.FileIO.ext = ['nxs','ibw']
pks.opts.FileIO.loc = 'MAXIV_Bloch_A'

# Reset the path
del pks.opts.FileIO.path

# Reset all options within FileIO
pks.opts.FileIO.reset()

# Set multiple options at once
pks.opts.FileIO.set(ext='nxs', loc='Diamond_I05_ARPES')
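
The set/reset behaviour can be pictured as an option group that remembers its defaults: deleting an attribute restores that entry's default, and `reset()` restores all of them. The `OptionGroup` class below is a hypothetical, simplified stand-in written for illustration; the default values used are placeholders, not the real `peaks` defaults.

```python
import copy


class OptionGroup:
    """Hypothetical option group supporting per-entry and full resets."""

    def __init__(self, **defaults):
        # Bypass our own __setattr__ for the internal storage
        object.__setattr__(self, "_defaults", copy.deepcopy(defaults))
        object.__setattr__(self, "_values", copy.deepcopy(defaults))

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, i.e. for option names
        try:
            return self._values[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        if name not in self._defaults:
            raise AttributeError(f"unknown option {name!r}")
        self._values[name] = value

    def __delattr__(self, name):
        # Deleting an option restores its default value instead of removing it
        if name not in self._defaults:
            raise AttributeError(f"unknown option {name!r}")
        self._values[name] = copy.deepcopy(self._defaults[name])

    def set(self, **kwargs):
        """Set several options at once."""
        for name, value in kwargs.items():
            setattr(self, name, value)

    def reset(self):
        """Restore every option to its default."""
        self._values.clear()
        self._values.update(copy.deepcopy(self._defaults))

    def __repr__(self):
        return f"OptionGroup({self._values})"


file_io = OptionGroup(path=None, ext=None, loc=None)
file_io.ext = ["nxs", "ibw"]
file_io.loc = "MAXIV_Bloch_A"
del file_io.loc                                  # loc restored to its default
file_io.set(ext="nxs", loc="Diamond_I05_ARPES")
print(file_io)
file_io.reset()                                  # all entries restored
print(file_io)
```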

In [None]:
pks.opts.FileIO

`FileIO` is part of a more general options class, accessed via `pks.opts`, which can also be used as a context manager to set options temporarily:

In [None]:
# Reset all existing file options
pks.opts.FileIO.reset()
print(f"Before:\n{pks.opts.FileIO}")

with pks.opts as opts:
    # Temporarily set the relevant options
    opts.FileIO.path = os.path.join(sample_data.fpath,'i05-59')
    opts.FileIO.ext = 'nxs'
    disp1 = pks.load(819)

# Now display the options again; no FileIO options should be set
print(f"After:\n{pks.opts.FileIO}")
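
One common way to build such a context manager is to snapshot the current options on entry and restore the snapshot on exit, even if an exception is raised inside the `with` block. A minimal sketch of that pattern with a plain dictionary; the `TemporaryOptions` class is hypothetical and is not the `peaks` implementation.

```python
import copy


class TemporaryOptions:
    """Hypothetical options store that restores its state after a ``with`` block."""

    def __init__(self, **options):
        self.options = dict(options)
        self._snapshots = []  # a stack, so nested ``with`` blocks also work

    def __enter__(self):
        # Deep copy so nested values (e.g. lists of extensions) are restored too
        self._snapshots.append(copy.deepcopy(self.options))
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.options = self._snapshots.pop()
        return False  # never suppress exceptions


opts = TemporaryOptions(path=None, ext=None)
with opts as o:
    o.options["path"] = "/data/i05-59"  # hypothetical path
    o.options["ext"] = "nxs"
    print(o.options)  # temporary values visible here
print(opts.options)   # original values restored
```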

Multiple scans can be loaded at once by passing a list of full file names or file name fragments. A list of names for the scans can be passed via `names=`; if not given, these are automatically parsed from the file names.

In [None]:
# Set default file options
pks.opts.FileIO.path = os.path.join(sample_data.fpath,'i05-59')
pks.opts.FileIO.ext = 'nxs'

multiple_disp = pks.load([819,853], names=['disp1','gold'])

## Data format

The base data format of most loaded data is an :class:`xarray.DataArray`. This can be inspected by typing the variable name in the interactive shell:

In [None]:
disp1

The co-ordinates give the dimension scales, while the raw data itself can be accessed via the `.data` attribute. Where possible, `peaks` keeps track of the units of the data (and of the associated metadata). The data will therefore typically be a :class:`xarray.DataArray` wrapping a :class:`pint` array, which itself wraps a :class:`numpy` array:

In [None]:
type(disp1.data)

In [None]:
disp1.data.magnitude

In [None]:
disp1.data.units

:::{tip}
The methods of :class:`pint-xarray` can be used to perform unit-aware selections on the resulting :class:`xarray.DataArray` via the `.pint` accessor. The data can also be converted back into a classic :class:`numpy`-backed :class:`xarray.DataArray` using the `.dequantify()` method. See the [documentation](https://pint-xarray.readthedocs.io/en/stable/) and this [blog](https://xarray.dev/blog/introducing-pint-xarray).
:::

### Lazy loading of large data
If the data being loaded is large, it may be loaded lazily into a [`dask`](https://www.dask.org) array. This allows loading files that are too large to fit in memory, and can also bring advantages for parallelisation of processing, even for smaller datasets. If a file is loaded as a dask-backed array, subsequent operations are also lazy until `.compute()` is called, allowing sophisticated processing pipelines to be built. See the [`xarray` user guide](https://docs.xarray.dev/en/latest/user-guide/dask.html).

The underlying array can also be loaded into memory using `.persist()`. If the array fits in memory, this will substantially speed up subsequent calculations and plotting (otherwise the data effectively needs to be read from disk each time). Alternatively, you can pass `lazy=False` when loading the data to load it directly as a standard `numpy`-based array. Be careful, however, if the data is very large, as this may lead to a crash.

:::{note}
Not all file loaders support lazy loading. If the underlying data is in a suitable format, however (e.g. HDF5, Zarr store, etc.), it will automatically be loaded lazily if the array size is larger than the threshold set in `pks.opts.FileIO.lazy_size`. This is 1 GB by default, but can be changed by the user. Lazy loading can also be explicitly enabled or disabled by passing `lazy=True` or `lazy=False` to `pks.load()`.
:::
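
The automatic decision described in the note boils down to comparing the in-memory size of the array (number of elements × bytes per element) with `lazy_size`. A sketch of that comparison with NumPy; `should_load_lazily` is a hypothetical helper, and treating the 1 GB default as 10⁹ bytes is an assumption.

```python
import numpy as np


def should_load_lazily(shape, dtype, lazy_size=int(1e9)):
    """Hypothetical helper: True if an array of this shape/dtype exceeds lazy_size bytes."""
    n_bytes = int(np.prod(shape, dtype=np.int64)) * np.dtype(dtype).itemsize
    return n_bytes > lazy_size


# A 1000 x 1000 x 300 float32 map occupies 1.2e9 bytes
print(should_load_lazily((1000, 1000, 300), "float32"))
# A 1000 x 1000 x 30 float32 map occupies 1.2e8 bytes
print(should_load_lazily((1000, 1000, 30), "float32"))
# ...which does exceed a lowered threshold, as used in the cell below
print(should_load_lazily((1000, 1000, 30), "float32", lazy_size=int(1e8)))
```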

In [None]:
%%time
with pks.opts as opts:
    opts.FileIO.lazy_size = int(1e8)
    FS1 = pks.load(818)

A view of the loaded `xr.DataArray` now shows the chunked structure associated with a :class:`dask` array:

:::{tip}
Even if a loader does not support lazy loading, the loaded data can be converted to a :class:`dask`-backed array using the `.chunk()` method [see `xarray` documentation](https://docs.xarray.dev/en/stable/user-guide/dask.html).
:::

In [None]:
FS1

The usual `xarray` commands (see later) still work and will typically be evaluated lazily, making them extremely quick, but note that no computation has yet been performed. The actual computation will only be performed when required, and can be triggered by calling `.compute()`:

In [None]:
%time FS1.sum('eV')

In [None]:
%time FS1.sum('eV').compute()

Some function calls will automatically trigger the computation where required, e.g. for plotting the data:

In [None]:
FS1.sum('eV').plot()

### Grouping scans

If multiple scans are loaded simultaneously, these are loaded into a :class:`xarray.DataTree` structure:

In [None]:
multiple_disp

This is a tree-like structure, where the stored data are the leaves of the tree. A quick view of the tree structure (and the included scans) is given by the `.view()` method, which prints the tree with the items containing data coloured in green:

In [None]:
multiple_disp.view()

Or a more detailed view via a simple `print()`:

In [None]:
print(multiple_disp)

The scan data can then be accessed using dictionary-style indexing (e.g. `multiple_disp['disp1']`) or attribute access:

In [None]:
multiple_disp.disp1.data.plot()

:::{attention}
Note that the :class:`xarray.DataTree` structure requires the data in the leaves of the tree to be in :class:`xarray.Dataset` format, rather than :class:`xarray.DataArray`. Unless the data is already a :class:`xarray.Dataset`, the `peaks` convention is to convert the :class:`xarray.DataArray` to a :class:`xarray.Dataset` with a single data variable `data`. Hence the additional `.data` used above to access the underlying :class:`xarray.DataArray`. This :class:`xarray.DataArray` is now named `data` rather than the default scan name. The original scan (file) name is still accessible via the [metadata](#metadata):
```python
multiple_disp.disp1.data.metadata.scan.name
```
:::

#### Extending tree structure
A new scan group (a branch of the :class:`xarray.DataTree`) can be added to the tree via the helper method `.add_scan_group()`, optionally specifying a name for the group. If no name is specified, a default name `scan_group_#` is used, where `#` is a number chosen to make the name unique.
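
One simple way to generate such a unique default name is to pick the smallest index not yet in use, starting from 0. The helper below is a hypothetical sketch of that scheme; the numbering actually used by `peaks` may differ.

```python
def default_group_name(existing, prefix="scan_group_"):
    """Hypothetical helper: return the first prefix + i (i = 0, 1, ...) not in `existing`."""
    i = 0
    while f"{prefix}{i}" in existing:
        i += 1
    return f"{prefix}{i}"


names = {"disp1", "gold", "FS"}
print(default_group_name(names))  # scan_group_0
names.add("scan_group_0")
print(default_group_name(names))  # scan_group_1
```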

In [None]:

multiple_disp = pks.load([819,853], names=['disp1','gold'])

In [None]:
multiple_disp.add_scan_group('FS')

In [None]:
multiple_disp.view()

Add a scan to the :class:`xarray.DataTree` by passing an already loaded :class:`xarray.DataArray`, optionally providing a name for the entry. Note that here it is inserted into the new scan group just created. The `peaks` convention is that the :class:`xarray.DataTree` should be hollow, i.e. data can only be added as *leaves* and not on *branches*.
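
A tree is *hollow* when every node that holds data is a leaf. As a hypothetical sketch, modelling the tree as nested dictionaries whose values are either child nodes (`dict`) or data (anything else), the check can be written recursively. For a real :class:`xarray.DataTree`, xarray itself also provides an `is_hollow` property.

```python
def is_hollow(node):
    """Return True if data (non-dict values) only appear in nodes without children."""
    children = {k: v for k, v in node.items() if isinstance(v, dict)}
    has_data = len(children) < len(node)
    if children and has_data:
        return False  # a branch that also carries data
    return all(is_hollow(child) for child in children.values())


tree = {
    "disp1": {"data": "dispersion array"},
    "gold": {"data": "gold reference array"},
    "FS": {"FS1": {"data": "Fermi-surface array"}},
}
print(is_hollow(tree))  # True

tree["FS"]["data"] = "array stored on a branch"
print(is_hollow(tree))  # False
```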

In [None]:
multiple_disp['FS'].add(FS1, name='FS1')

In [None]:
multiple_disp.view()

The contents of another :class:`xarray.DataTree` can also be added to the original tree. Here, each scan of the new tree is added at the root level of the `multiple_disp` tree, specified by passing `add_at_root=True`. Note that this inserts at the root level of the tree that `.add()` is called on, which may itself be a sub-tree that still has a parent node.

In [None]:
# Load data with specific names for the scans
dt2 = pks.load([819,853], names=['disp_copy','gold_copy'])
# Insert these scans all into the original DataTree
multiple_disp.add(dt2, add_at_root=True)

In [None]:
multiple_disp.view()

Alternatively, they can be added as a new scan group, either with a specified name or with an automatically generated name:

In [None]:
dt3 = pks.load([819,853])
multiple_disp.add(dt3)

In [None]:
multiple_disp.view()

Data can also be loaded directly into the :class:`xarray.DataTree` by passing the same arguments that would be passed to :class:`peaks.load`:

In [None]:
multiple_disp.add([819,853], name='loaded_directly', names=['disp_copy2','gold_copy2'], lazy=True)

In [None]:
multiple_disp.view()

Individual entries or sections of the tree can be removed by a simple `del`:

In [None]:
del multiple_disp['scan_group_0']

In [None]:
multiple_disp.view()

:::{attention}
Apart from working with hollow trees, `peaks` does not enforce a particular data structure for the :class:`xarray.DataTree`, and it is at the user's discretion how best to organise their data this way. Recommended practice is to use this as a broad structure to facilitate grouping data together and for batch processing. It also provides a convenient method for saving groups of data following data processing. However, the user should take care to ensure a transparent record of what processing has occurred, and the underlying :class:`xarray.DataArray` (or sometimes :class:`xarray.Dataset`) remains the fundamental data unit.
:::

## Metadata

The metadata for the data are stored in the :class:`xarray.DataArray` attributes.

:::{tip}
While this can be modified using standard :class:`xarray` methods, in general the user should not modify the metadata directly, but rather using the provided methods described below. This has several advantages:
- it will ensure that the metadata complies with the expected `peaks` metadata structure
- it adds a transparent record to the analysis history when metadata is manually changed (see the [user guide on analysis history](./4_analysis_history.ipynb)).
- it provides simple methods for updating metadata across multiple scans when stored within :class:`xarray.DataTree`'s.
:::

To show the complete set of current metadata, simply access the `.metadata` attribute:

In [None]:
disp1.metadata

The relevant keys for the metadata entries can be returned with `.keys()`:

In [None]:
disp1.metadata.keys()

Metadata can be shown for individual subgroups using dot notation:

In [None]:
disp1.metadata.photon

and set using a simple assignment:

In [None]:
disp1.metadata.photon.hv = 110

In [None]:
disp1.metadata.photon.hv

In [None]:
disp1.metadata.photon

Existing units are maintained, or units can be set explicitly by passing a :class:`pint.Quantity`; the unit registry is available in the `peaks` namespace as `pks.ureg`:

In [None]:
disp1.metadata.photon.exit_slit = 10 * pks.ureg('um')

In [None]:
disp1.metadata.photon

Metadata can also be updated by passing a dictionary that follows the structure of the metadata groups, starting from the level at which it is called:

In [None]:
disp1.metadata.analyser.deflector({'parallel': {'local_name': 'deflector_1'}, 'perp': {'local_name': 'deflector_2'}})
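
Updating from a nested dictionary is essentially a recursive merge: nested dictionaries descend into the matching metadata group, while other values overwrite the existing entry, leaving untouched entries in place. A minimal sketch on plain dictionaries; the `deep_update` helper and the placeholder `value` entries are illustrative assumptions, not `peaks` internals.

```python
def deep_update(target, updates):
    """Recursively merge `updates` into `target` in place and return it."""
    for key, value in updates.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            deep_update(target[key], value)  # descend into an existing sub-group
        else:
            target[key] = value  # overwrite (or create) a leaf entry
    return target


# Placeholder metadata group, for illustration only
deflector = {
    "parallel": {"local_name": None, "value": 0.0},
    "perp": {"local_name": None, "value": 0.0},
}
deep_update(deflector, {"parallel": {"local_name": "deflector_1"},
                        "perp": {"local_name": "deflector_2"}})
print(deflector["parallel"])  # {'local_name': 'deflector_1', 'value': 0.0}
```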

Special methods for setting normal emission metadata are discussed in the [data processing guide](./3_data_processing.ipynb).

### Batch updating metadata for a DataTree
For data stored within a :class:`xarray.DataTree` structure, the `.metadata` method can be applied to all scans in the subtree below the given tree node. The metadata to be updated must be passed as a dictionary with the same nested structure as the relevant metadata, and this structure must be the same for all data within the tree. Either a single dictionary can be passed, or keyword arguments can be used to specify the metadata group to be updated. For example, to update a single attribute of the `scan` metadata:

In [None]:
multiple_disp.disp1.data.metadata.scan

Updating with keyword arguments:

In [None]:
multiple_disp.metadata(scan={'scan_command': 'Example of updating the metadata'})

In [None]:
multiple_disp.disp1.data.metadata.scan

Or updating with a single `dict`:

In [None]:
multiple_disp.metadata({'scan': {'scan_command': 'Example of updating the metadata again'}})

In [None]:
multiple_disp.disp1.data.metadata.scan
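
Applying one update to every scan in a (sub)tree amounts to walking the tree and merging the same nested dictionary into each scan's metadata. Since the metadata structure must be the same for all scans, the hypothetical sketch below refuses to create keys that do not already exist rather than silently adding them. The tree is again modelled with nested dictionaries, with each scan's metadata stored under a `"metadata"` key; this layout is an assumption for illustration.

```python
def strict_update(target, updates):
    """Merge `updates` into `target`, raising KeyError for keys not already present."""
    for key, value in updates.items():
        if key not in target:
            raise KeyError(f"unknown metadata entry {key!r}")
        if isinstance(value, dict) and isinstance(target[key], dict):
            strict_update(target[key], value)
        else:
            target[key] = value


def update_tree_metadata(node, updates):
    """Apply `updates` to the metadata of every scan below `node`."""
    if "metadata" in node:  # a leaf holding a scan
        strict_update(node["metadata"], updates)
        return
    for child in node.values():
        update_tree_metadata(child, updates)


tree = {
    "disp1": {"metadata": {"scan": {"name": "i05-59819", "scan_command": ""}}},
    "gold": {"metadata": {"scan": {"name": "i05-59853", "scan_command": ""}}},
}
update_tree_metadata(tree, {"scan": {"scan_command": "Example of updating the metadata"}})
print(tree["gold"]["metadata"]["scan"])
```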

### Setting manipulator normal emissions

Additional helper methods exist for setting normal emission values.

In [None]:
FS = FS1.MDC(105.05,0.05).compute()

In [None]:
# Rough estimate of normal emission
FS.plot()
plt.axvline(0.05)
plt.axhline(1.5)

To set the relevant axis reference values, we can use the `.metadata.set_normal_emission()` method, passing our determined values as either a dictionary or with keyword arguments:

In [None]:
FS1.metadata.set_normal_emission(polar=1.5, theta_par=0.05)

Reference values are now set for the relevant polar and tilt axes:

In [None]:
FS1.metadata.manipulator

To set normal emission values of a scan (or an entire :class:`xarray.DataTree`) to be like another scan, use the `set_normal_emission_like` method:

In [None]:
disp1.metadata.set_normal_emission_like(FS1)

In [None]:
disp1.metadata.manipulator

### Analysis history
To enhance the reproducibility and data provenance of data analysed using `peaks`, we attempt to keep track of data operations performed using in-built `peaks` functions (including the name of the calling function), and store these in a JSON-type record within the attributes of the :class:`xarray.DataArray`, under the entry `analysis_history`.

A well-organised Jupyter notebook is a good start, but the user has to be careful to only execute cells in order, and a typical analysis workflow may involve loading and processing some data and saving an intermediate step, before further post-processing in another notebook. This necessitates a richer record of the data analysis history. Saving and loading using the in-built `peaks` methods will attempt to keep a persistent analysis record intact. The analysis history can also reveal where additional methods (e.g. automatic Fermi level estimation) have been called as part of the pipeline.


In [None]:
disp2 = pks.load(819)

An initial history record is made upon loading, which can be accessed via the `.history` accessor. This attempts to display the history in a formatted table:

In [None]:
disp2.history

`peaks` functions typically add brief history records, including when metadata is updated using the methods discussed [above](#metadata).

In [None]:
disp2.metadata.scan.name = 'test example'

In [None]:
disp2.history

Some records can get a little complicated, so a single record can be printed in a clearer format by calling the history accessor with the index of the record to display (this defaults to the last record if no index is passed):

In [None]:
disp2.history(0)

The corresponding dictionary can be returned, rather than just printed, using the `.get(index)` method; a list of all entries is returned if no index is specified:

In [None]:
disp2.history.get()

#### Saving analysis history
A JSON string of the entire analysis history record can be returned using the `history.json` method, and saved using `history.save()`. The saved file can then be opened in the standard way, or e.g. viewed in a web browser or other software. The metadata, including the analysis history, is also saved as part of the data when saving and loading with the `peaks` `.save()` and :class:`peaks.load` methods.

In [None]:
# Save the history metadata only
disp2.history.save('saving_analysis_metadata.json')
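
Since the record is JSON-type, it can be round-tripped with the standard-library `json` module. The sketch below writes a small list of history entries to a file and reads it back; the field names (`time`, `fn_name`, `record`) and values are illustrative assumptions, not the exact schema `peaks` uses.

```python
import json
import os
import tempfile

# Illustrative history entries (not the real peaks schema)
history = [
    {"time": "2024-01-01T00:00:00+00:00", "fn_name": "load",
     "record": "Data loaded from i05-59819.nxs"},
    {"time": "2024-01-01T00:05:00+00:00", "fn_name": "Manual record",
     "record": "Data intensity multiplied by 10"},
]

with tempfile.TemporaryDirectory() as tmp:
    fpath = os.path.join(tmp, "analysis_history.json")
    with open(fpath, "w") as f:
        json.dump(history, f, indent=2)  # human-readable JSON file
    with open(fpath) as f:
        restored = json.load(f)

print(restored == history)
print(restored[-1]["record"])
```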

#### Manually adding analysis history

Most `peaks`-specific functions aim to add a related history record, which is one reason to use these functions even where they are otherwise only thin wrappers around existing :class:`xarray` functions. Operating on a :class:`xarray.DataArray` with a built-in :class:`xarray` method or another non-`peaks`-specific function will not update the analysis record. In such cases, you should manually update the analysis record using `.history.add()`:

In [None]:
# Manually change one of the dispersions
disp2 *= 10

# Add the history record
disp2.history.add('Data intensity multiplied by 10', fn_name='Manual record')

In [None]:
disp2.history

This approach can also be used within a custom function, where `peaks` will attempt to also capture the name of the calling function. The data can either be updated in place (the behaviour when calling the `.add` accessor), or a copy of the data with modified history metadata can be returned using the `.assign()` accessor, to aid in chaining methods together (similar to the :class:`xarray.DataArray.assign_attrs` and :class:`xarray.DataArray.assign_coords` methods).

:::{warning}
It is easy to accidentally update the analysis history of the underlying :class:`xarray.DataArray` that was passed to the function. Check this carefully to ensure the desired behaviour.
:::
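
A common cause of this kind of accident in Python is shared references: a shallow copy of an attributes dictionary still points at the same nested history list, so appending to the "copy" also changes the original. The plain-Python sketch below shows the problem and the fix with `copy.deepcopy`; the `analysis_history` list here only mimics the idea and is not how `peaks` stores its records.

```python
import copy

# Shallow copy: a new outer dict, but the history list inside is shared
attrs = {"analysis_history": ["Data loaded"]}
shallow = dict(attrs)
shallow["analysis_history"].append("Added one")
print(attrs["analysis_history"])  # the original has changed too

# Deep copy: the nested history list is copied as well
attrs_safe = {"analysis_history": ["Data loaded"]}
deep = copy.deepcopy(attrs_safe)
deep["analysis_history"].append("Added one")
print(attrs_safe["analysis_history"])  # the original is untouched
```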

In [None]:
def add_one(data):
    data = data.history.assign('Added one to the original data')
    return data + 1*pks.ureg('count/s')

In [None]:
disp3 = add_one(disp2)

In [None]:
disp3.history

For simple cases like the above, where no parameters for the history string need to be determined during function execution, a decorator can be used instead:

In [None]:
from peaks.core.metadata.history import update_history_decorator

@update_history_decorator('Added two to the data')
def add_two(data):
    return data + 2*pks.ureg('count/s')

In [None]:
disp3 = add_two(disp2)

In [None]:
disp3.history
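
A decorator like this wraps the function, calls it, and then appends a fixed message (plus the wrapped function's name) to the history of the returned object. The sketch below shows that general pattern with a minimal stand-in data class; `record_history` and `Data` are hypothetical names for illustration, not the implementation of `update_history_decorator`.

```python
import functools
from dataclasses import dataclass, field


@dataclass
class Data:
    """Minimal stand-in for a data object carrying an analysis history."""
    value: float
    history: list = field(default_factory=list)


def record_history(message):
    """Hypothetical decorator factory: log `message` and the function name on the result."""
    def decorator(func):
        @functools.wraps(func)  # keep the wrapped function's name and docstring
        def wrapper(data, *args, **kwargs):
            result = func(data, *args, **kwargs)
            # Build a new list so the input object's history is not mutated
            result.history = [*data.history, {"fn_name": func.__name__, "record": message}]
            return result
        return wrapper
    return decorator


@record_history("Added two to the data")
def add_two(data):
    return Data(data.value + 2, data.history)


d0 = Data(1.0, [{"fn_name": "load", "record": "Data loaded"}])
d1 = add_two(d0)
print(d1.value, [h["record"] for h in d1.history])
print(len(d0.history))  # the input history still has a single entry
```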

## Saving data
Data can be saved using the `.save()` accessor. A single :class:`xarray.DataArray` or :class:`xarray.Dataset` is saved as a netCDF file (extension `.nc`), and a :class:`xarray.DataTree` is stored as a Zarr store (extension `.zarr`). Any metadata attributes stored using `peaks` methods should be serialised and re-opened correctly, at least when using the same version of `peaks`. If other metadata has been stored in the attributes, it may not be parsed correctly, depending on the data type.

In [None]:
disp1.save('disp1.nc')

In [None]:
multiple_disp.save('dt_example.zarr')

These can then be loaded again using the regular :class:`peaks.load` method:

In [None]:
new_disp1 = pks.load('disp1.nc')

In [None]:
new_dt = pks.load('dt_example.zarr')

## Cleaning up
Run the following cell to clean up the temporary files downloaded for use during this tutorial. This would not be required in normal usage with your own data.

In [None]:
sample_data.cleanup()

Run the following cell to clean up the files saved during this tutorial.

In [None]:
from pathlib import Path
import shutil

# Files and directories (a Zarr store is a directory) written during this tutorial
for file_path in [Path("disp1.nc"), Path("dt_example.zarr"), Path("saving_analysis_metadata.json")]:
    if file_path.is_dir():
        shutil.rmtree(file_path)
    elif file_path.exists():
        file_path.unlink()