# Working with lazy signals in HyperSpy

Requires **HyperSpy 1.7.0 or above**

This tutorial introduces the processing of datasets that are too large to fit into memory using HyperSpy. It introduces the concept of out-of-core computation algorithms (also referred to as lazy processing), the main differences between lazy and non-lazy processing, and the technicalities you need to be aware of to optimise performance.
The corresponding section of the HyperSpy documentation is [the big data section](https://hyperspy.readthedocs.io/en/stable/user_guide/big_data.html).

### Credits and changes

* 10/05/2022 Magnus Nord. Update to use the new functionality in HyperSpy 1.7.
* 12/04/2021 Magnus Nord. Change to using 4D-STEM dataset, instead of the EELS map.
* 29/07/2019 Eric Prestat. Add more details and introduction for the M&M Sunday short course.
* 15/03/2019 Francisco de la Peña. Create tutorial for the HyperSpy workshop at ePSIC.

## Introduction to lazy processing

Lazy processing refers to the use of [out-of-core computation algorithms](https://en.wikipedia.org/wiki/External_memory_algorithm) to process very large data, which are usually too large to fit into the computer's memory at once. The main idea is to split the data into chunks that are small enough to be processed in memory.

HyperSpy internally uses the [dask library](https://docs.dask.org/en/latest/index.html), which extends the NumPy interface to larger-than-memory or distributed environments. The typical workflow for lazily processing data stored on disk is:
1. "load" data from disk with a defined chunking
2. schedule operations
3. do the computation
 
Let's try this with a simple example:

"Load" the data by generating a big image with random data

Schedule operations: first take the square root, then sum

**Steps 1 and 2 are very fast**, because nothing is actually done, other than initialising and scheduling the tasks to be performed.

Do the actual calculation, using `compute`

**Step 3 is slow**, because all the computation is performed at this stage. Most of the time, this is significantly slower than in-memory processing, because the chunks of data need to be read from and written to disk as the scheduler requests them.
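The three steps above can be sketched with dask directly, without HyperSpy. This is a minimal illustration rather than the tutorial's own code: the array shape and chunk size are arbitrary choices, kept small so that the example runs in a few seconds.

```python
import dask.array as da

# Step 1: "load" the data. Here we define a lazy 4000 x 4000 array of uniform
# random numbers, split into 1000 x 1000 chunks (16 chunks in total).
# No random numbers are drawn yet; dask only records how to create each chunk.
data = da.random.random((4000, 4000), chunks=(1000, 1000))

# Step 2: schedule operations: first the square root, then the sum.
# These calls return immediately and only extend the task graph.
total = da.sqrt(data).sum()
print(type(total))  # still a lazy dask array, not a number

# Step 3: do the computation. The scheduler now works through the array
# chunk by chunk, so only a few chunks are in memory at any one time.
result = total.compute()
print(result)
```

Since the mean of the square root of a uniform random number on [0, 1) is 2/3, `result` should be close to `4000 * 4000 * 2 / 3`.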

This type of processing is very powerful when working with large datasets, but requires some knowledge to use properly.

For more information about dask and its principles, see http://matthewrocklin.com/slides/plotcon-2016.html. Here, however, we move on to the next step: how to use this type of functionality in HyperSpy.

## Loading data lazily

As usual, we start by setting up the matplotlib backend and importing hyperspy.

For this tutorial we are going to start by loading a 4D-STEM dataset, `lazy_dataset.zspy`. Note that its size is reduced quite a bit, to make it easier to download. The dataset is `(440 x 128)` probe positions with `(256 x 256)` detector pixels, acquired at ePSIC a couple of years ago. The full dataset can be found at the Zenodo deposit, https://zenodo.org/record/4727847. The file itself: https://zenodo.org/record/4727847/files/011_big_film_512x512_updated.hspy?download=1.

Let's check what sort of object we have stored in the ``s`` variable

This is a scanning diffraction dataset with `(440 x 128)` probe positions, and `(256 x 256)` detector pixels.

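Use the `nbytes` attribute of the dask array `s.data` to calculate the size of the (uncompressed) data. This does not read any data, since it only depends on the array's shape and data type.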

That is about 7.4 GB of data, which you could load into memory and process "non-lazily" if you have about 16 GB of RAM. However, we'll use it to show how lazy processing can be done in HyperSpy.
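To see why `nbytes` is available without reading anything, the sketch below builds a lazy dask array with the same dimensions as this dataset. Two assumptions are made: the data type is `uint16` (2 bytes per value is what the ~7.4 GB figure implies), and `da.zeros` stands in for the real file-backed data. The axis order is illustrative; it does not affect the total size.

```python
import dask.array as da

# A lazy array with the dataset's dimensions: (440 x 128) probe positions and
# (256 x 256) detector pixels, chunked as (64, 64, 64, 64). The uint16 dtype
# is an assumption; nothing is allocated at this point.
data = da.zeros((440, 128, 256, 256), dtype="uint16", chunks=(64, 64, 64, 64))

# nbytes is computed from shape and dtype only, so it is instant even for
# datasets far larger than memory.
size_gb = data.nbytes / 1e9
print(f"{size_gb:.2f} GB")
```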

If you want to try this on a much bigger dataset after the workshop, you can check out this [Zenodo deposit](https://zenodo.org/record/4312960), specifically the [largest file](https://zenodo.org/record/4312960/files/fe60al40_stripe_pattern.hspy?download=1), which is a magnetic [STEM-DPC](https://en.wikipedia.org/wiki/Scanning_transmission_electron_microscopy#Differential_phase_contrast) dataset.

#### The `.zspy` file format

A fairly recent addition to HyperSpy is support for the `zarr` file format. I will not go into the details of this format, but it is open source, and allows us to work with big datasets _much_ faster than the `.hspy` (HDF5) format.

Comparing on my own computer, summing the full dataset:

- `.hspy`: 25 seconds
- `.zspy`: 1.5 seconds

Thus, if you want to work with large datasets, I really recommend using `.zspy`.

## Plotting lazily 

To have a look at the data, we use `s.plot()`, just as for a non-lazy signal.

Moving the navigator is done in a couple of ways:

- Use Ctrl + arrow keys
- Hold shift + left mouse button
- Click and drag the red navigator box (increase its size by pressing the + key)

To create the navigation image, only the central part of the diffraction pattern is used. This reduces the time it takes to generate the navigation image.

This navigator is stored in `s.navigator`:

If we would rather have an annular dark-field (ADF) like contrast, we can use the region of interest (ROI) functionality. Here, we use a `CircleROI` with an inner radius, so that it selects a ring on the detector.

We can then make a new signal, `s_adf_sum` utilizing the `adf_roi`, the `nansum` function, and `.T`

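```python
# Apply `adf_roi` (the CircleROI with an inner radius) to the two signal
# (detector) axes; detector pixels outside the ring are masked out
s_adf = adf_roi(s, axes=(2, 3))
# Sum the remaining detector pixels at each probe position, ignoring NaNs
s_adf_sum = s_adf.nansum(axis=(2, 3), rechunk=False)
# Transpose so that the probe positions become the signal axes (a 2D image)
s_adf_sum = s_adf_sum.T
```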

Notice that all of these operations are instantaneous; to actually do the calculations, use `.compute()`.
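Conceptually, the ROI step masks every detector pixel outside the ring, and `nansum` then adds up what is left at each probe position. The sketch below reproduces that idea with plain dask on a small synthetic dataset. It is not a reimplementation of `CircleROI`; the shapes, radii and pattern centre are made-up illustration values.

```python
import numpy as np
import dask.array as da

# Small synthetic 4D-STEM dataset: (8 x 6) probe positions, (32 x 32) detector
# pixels, all ones so that the expected result is easy to check.
data = da.ones((8, 6, 32, 32), chunks=(4, 3, 32, 32))

# Annular mask on the detector: keep pixels with r_inner <= r <= r_outer
# around the (illustrative) pattern centre.
cy, cx, r_inner, r_outer = 16, 16, 6, 12
yy, xx = np.mgrid[0:32, 0:32]
r = np.hypot(yy - cy, xx - cx)
annulus = (r >= r_inner) & (r <= r_outer)

# Pixels outside the ring become NaN; the 2D mask broadcasts over the two
# leading (probe position) axes. This is still lazy.
masked = da.where(annulus, data, np.nan)

# Sum over the detector axes, ignoring NaN, giving one ADF value per probe
# position. Still lazy until compute() is called.
adf = da.nansum(masked, axis=(2, 3))

adf_image = adf.compute()
print(adf_image.shape)  # (8, 6)
```

Because the synthetic data is all ones, every value in `adf_image` equals the number of pixels in the ring, `annulus.sum()`.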

Thanks to lazy processing, we never have to load the full dataset into memory, so you can do this with datasets that are much larger than your available memory.

Now we can set it as the `navigator` for `s`

## Chunking

An important aspect of lazy processing is **chunking**. This is how the data is organized inside files like `lazy_dataset.zspy`.

For our 4-dimensional dataset, the data is split into many smaller 4-dimensional chunks. To see this structure, we look at `s.data`:

The important part is the chunk shape `(64, 64, 64, 64)`, which means each chunk consists of `64 x 64` probe positions and `64 x 64` detector pixels. Each time we want to access something inside a chunk, we need to load the whole chunk into memory.
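A quick back-of-the-envelope calculation shows how much data "the whole chunk" is. This sketch assumes 2 bytes per value (`uint16`, which matches the ~7.4 GB total) and uses a hypothetical helper, `chunk_index`, to find which chunk a given element falls in; the index used is an arbitrary example.

```python
import math

# Chunk shape of the dataset: (probe, probe, detector, detector).
chunk_shape = (64, 64, 64, 64)
bytes_per_value = 2  # assumption: uint16 data

# Every access to a chunk reads the whole chunk.
values_per_chunk = math.prod(chunk_shape)
chunk_mib = values_per_chunk * bytes_per_value / 2**20
print(f"{values_per_chunk} values, {chunk_mib:.0f} MiB per chunk")


def chunk_index(index, chunk_shape):
    """Hypothetical helper: which chunk (along each axis) holds an element."""
    return tuple(i // c for i, c in zip(index, chunk_shape))


# Reading this single value means reading all of chunk (1, 0, 2, 1).
print(chunk_index((100, 10, 130, 70), chunk_shape))
```

So, under this assumption, every single-value lookup costs a 32 MiB read.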

So, for example, if we want to see the value of a single detector pixel at one specific probe position, we still need to get the full chunk containing it: reading that single value requires just as much reading from the hard drive as reading the whole chunk.

Chunking is quite tricky: there is no "ideal" chunking strategy, and there are always trade-offs. For now, we'll have a look at why this file is chunked this way.

This chunking makes it easy to use transpose (`T`) to swap the navigation and signal dimensions, using the same file. This means we can navigate the dataset as a function of detector pixels instead of probe positions.
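With equal chunk sizes along all four axes, swapping the navigation and signal axes leaves the chunk layout unchanged, so navigating by detector pixel costs the same per chunk as navigating by probe position. The small dask sketch below shows this with an axis reordering similar to what `.T` does to the underlying data; the shape is made up for illustration, and the exact axis order HyperSpy uses is not important here.

```python
import dask.array as da

# Illustrative 4D array: (probe y, probe x, detector y, detector x),
# with equal chunk sizes along every axis.
data = da.zeros((128, 192, 256, 256), dtype="uint16", chunks=(64, 64, 64, 64))

# Move the detector axes to the front, so they act as the navigation axes.
flipped = data.transpose(2, 3, 0, 1)

# Every chunk is still 64 x 64 x 64 x 64, so both access patterns read the
# same amount of data per chunk.
print(data.chunksize, flipped.chunksize)
print(flipped.shape, flipped.numblocks)
```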

Then plot this transposed signal

## Data reduction through rebinning

One common way of exploring these large datasets is to reduce their size so that they fit in memory. One easy way of doing this is `rebin`. By using `scale=(2, 2, 2, 2)`, we reduce the number of probe positions by a factor of 4 and the number of detector pixels by a factor of 4.

Then have a look at this new signal

The dataset is now about 2 GB: the number of data points has been reduced 16 times, but the bit depth has been increased to avoid losing information.

However, the bit depth has been increased more than necessary! We can reduce it to unsigned 32-bit integers (`uint32`) with `change_dtype`.

Now, it is about 1 GB!
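These sizes follow from simple arithmetic, and `uint32` is safe because each rebinned value is a sum of 2^4 = 16 original values. The sketch below checks this on a small random `uint16` array with NumPy, using a reshape-and-sum rebin as a stand-in for `rebin` (this is not HyperSpy's implementation). It then repeats the size calculation for the full dataset, again assuming `uint16` input.

```python
import numpy as np

rng = np.random.default_rng(0)
# Small stand-in dataset with uint16 values (assumed dtype of the real data).
data = rng.integers(0, 2**16, size=(8, 8, 16, 16), dtype=np.uint16)

# Rebin by 2 along every axis: split each axis into (n // 2, 2) and sum over
# the size-2 axes. Summing in uint64 avoids overflow, mirroring the
# bit-depth increase described above.
a, b, c, d = data.shape
rebinned = data.reshape(a // 2, 2, b // 2, 2, c // 2, 2, d // 2, 2).sum(
    axis=(1, 3, 5, 7), dtype=np.uint64
)

# Each output value is a sum of 16 uint16 values, at most 16 * 65535,
# which fits comfortably in uint32 (max 4_294_967_295).
assert rebinned.max() <= 16 * 65535 < np.iinfo(np.uint32).max
rebinned32 = rebinned.astype(np.uint32)

# Size arithmetic for the full dataset: 16x fewer values, 8 bytes (uint64)
# or 4 bytes (uint32) per value instead of 2.
n_values = 440 * 128 * 256 * 256
print(n_values * 2 / 1e9)       # original uint16: about 7.38 GB
print(n_values / 16 * 8 / 1e9)  # rebinned uint64: about 1.85 GB
print(n_values / 16 * 4 / 1e9)  # rebinned uint32: about 0.92 GB
```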

We can finally compute it, to load the reduced dataset into memory

`s_rebin` is now a non-lazy signal, with its data loaded into memory.

Or look at the transpose

## Processing the data using `s.map`

To process the data, we can use the `s.map` function, which applies an arbitrary function to each probe position.

Let's try to extract some more information from the diffraction patterns by calculating their center of mass. Here, we can use scipy, for example `scipy.ndimage.center_of_mass`.

We can pass this function directly to `map`. With `inplace=True` (the default), `map` would replace the data in `s` with its output, so we use `inplace=False` to make a new signal.

Note that the output from `map` is lazy here: if `s` is a lazy signal, so is the output, and if `s` is not lazy, neither is the output. This can be overridden with the `lazy_output` parameter.

To actually do the center of mass calculation, we need to call `compute()` on our new signal.

Note that `map` automatically figured out that the output from `center_of_mass` has one dimension, with a size of 2. For a 2D image, `center_of_mass` returns the coordinates in array order, (row, column), i.e. (y, x).
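The output shape and the (row, column) order can be checked on synthetic data. The sketch below uses a hypothetical NumPy-only helper, `com_rowcol`, that follows the same convention as `scipy.ndimage.center_of_mass` for 2D images, and applies it at every probe position in a loop, loosely mimicking what `map` does.

```python
import numpy as np


def com_rowcol(image):
    """Hypothetical helper: intensity-weighted centre of a 2D image, returned
    as (row, column), i.e. (y, x), like scipy.ndimage.center_of_mass."""
    rows, cols = np.indices(image.shape)
    total = image.sum()
    return np.array([(rows * image).sum() / total, (cols * image).sum() / total])


# Synthetic data: (3 x 4) probe positions, (16 x 16) detector pixels, with a
# single bright pixel at row 5, column 11 in every pattern.
data = np.zeros((3, 4, 16, 16))
data[:, :, 5, 11] = 1.0

# Apply the function at every probe position, storing a length-2 result.
com = np.empty(data.shape[:2] + (2,))
for idx in np.ndindex(*data.shape[:2]):
    com[idx] = com_rowcol(data[idx])

print(com.shape)  # (3, 4, 2)
print(com[0, 0])  # row 5, column 11
```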

If the output has variable size, for example when using peak finding, you must use the `ragged=True` parameter in `map`.

For now, let's plot the center of mass results.

This isn't very informative; a better way of visualizing it is to transpose the dataset.

#### Making custom functions

In many situations, already existing functions work well enough. However, sometimes we want to custom-make the functions we apply to our data.

For example, let's add a threshold to our center of mass calculation, so that only values above a certain threshold are used.

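```python
from scipy.ndimage import center_of_mass

def center_of_mass_function(image, threshold):
    # Zero out every pixel at or below the threshold before computing the COM
    bool_image = image > threshold
    com = center_of_mass(image * bool_image)
    return com
```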

Remember to pass your custom function, `center_of_mass_function`, to `map`, and to give `threshold` as a keyword argument. Here, we also use `lazy_output=False` to calculate the result directly.
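To see what the threshold does, the sketch below applies the same idea to a single synthetic pattern: a weak uniform background plus one bright off-centre spot. It uses a NumPy-only stand-in for `center_of_mass` (the hypothetical helper `com_rowcol`, same (row, column) convention) so it runs without SciPy; the image, spot position and threshold value are made up for illustration.

```python
import numpy as np


def com_rowcol(image):
    """Hypothetical NumPy stand-in for scipy.ndimage.center_of_mass (2D)."""
    rows, cols = np.indices(image.shape)
    total = image.sum()
    return (rows * image).sum() / total, (cols * image).sum() / total


def center_of_mass_function(image, threshold):
    # Same logic as the custom function above: zero out pixels at or below
    # the threshold before computing the centre of mass.
    bool_image = image > threshold
    return com_rowcol(image * bool_image)


# 21 x 21 pattern: uniform background (value 1) plus a bright spot
# (value 100) at row 4, column 15.
image = np.ones((21, 21))
image[4, 15] = 100.0

print(com_rowcol(image))  # pulled towards the pattern centre (10, 10)
print(center_of_mass_function(image, threshold=5))  # at the spot, (4, 15)
```

Without the threshold, the background drags the result to roughly (8.9, 10.9); with it, only the spot contributes.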

#### Cropping data

Sometimes, we do not need the whole signal dimension for our processing. In those cases, the processing can be sped up by cropping out the parts of the signal dimensions you don't need.

This is due to how `map` works: it sends the full signal dimensions to the function you specify, which can lead to a lot of unnecessary data being read and processed.

So one option is to use `isig` to crop the signal dimensions. Note: to get the largest performance improvement, exclude as many **chunks** as possible. Remember that if you include even one value from a chunk, the whole chunk needs to be read into memory.

For example, if our dataset has a chunk shape of `(64, 64, 64, 64)` and a data shape of `(256, 256, 256, 256)`, and we use `isig[64:192, 64:192]`, we only need to load `25%` of the dataset into memory, since we only touch 4 of the 16 chunks along the signal dimensions.

If we instead use `isig[63:193, 63:193]`, we include all 16 chunks, that is, `100%` of the data!

So keep the chunking size in mind when doing cropping!
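The chunk counts above can be checked with a few lines of Python. `chunks_touched` is a hypothetical helper, not part of HyperSpy or dask; it lists the chunks along one axis that a slice `start:stop` overlaps, assuming the same slice is used on both signal axes.

```python
def chunks_touched(start, stop, chunk):
    """Hypothetical helper: indices of the chunks along one axis that the
    slice start:stop (stop exclusive) overlaps."""
    return list(range(start // chunk, (stop - 1) // chunk + 1))


chunk = 64
n_chunks_per_axis = 256 // chunk  # 4 chunks along each detector axis

for start, stop in [(64, 192), (63, 193)]:
    per_axis = chunks_touched(start, stop, chunk)
    n = len(per_axis) ** 2  # same slice on both signal axes
    fraction = n / n_chunks_per_axis**2
    print(f"isig[{start}:{stop}, {start}:{stop}] -> "
          f"{n} of {n_chunks_per_axis**2} chunks ({fraction:.0%})")
```

Shifting each slice boundary outward by a single pixel is enough to go from 4 to all 16 chunks.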

Let's compare these two cases: one where we crop using `s.isig[64:192, 64:192]`, which gives us the 4 signal (diffraction) chunks in the middle, and one where we crop using `s.isig[63:193, 63:193]`, which includes all the chunks.

## Model fitting

You can also do model fitting with lazy signals; it works just as with non-lazy signals.

As an example, let's get a line profile through the center of the detector, so we can fit a Gaussian to a `Signal1D`.

Get the line profile by using `isig` through the center position of the diffraction pattern. Make a signal from this called `s_line`.

Then have a look at this new signal, to see its dimensions and size

Plot it

Now, we make a model from this new line signal using the `create_model` function in `s_line`

Then let's make a Gaussian component, setting some initial values. This component is found in `hs.model.components1D.Gaussian`; use its docstring to see what parameters it has.
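One simple way of choosing initial values is to use the moments of the line profile. The sketch below is a NumPy-only illustration on a synthetic, noise-free profile; it assumes the area-normalised parametrisation used by HyperSpy's `Gaussian` (`A` is the area, plus `centre` and `sigma`), and the helper `moment_estimates` and all numbers are hypothetical.

```python
import numpy as np


def gaussian(x, A, centre, sigma):
    # Area-normalised Gaussian: A is the integrated area under the curve.
    return A / (sigma * np.sqrt(2 * np.pi)) * np.exp(-((x - centre) ** 2) / (2 * sigma**2))


def moment_estimates(x, y):
    """Hypothetical helper: estimate (A, centre, sigma) from the zeroth, first
    and second moments. Assumes a positive peak, negligible background and
    uniformly spaced x."""
    dx = x[1] - x[0]
    area = y.sum() * dx
    centre = (x * y).sum() / y.sum()
    sigma = np.sqrt(((x - centre) ** 2 * y).sum() / y.sum())
    return area, centre, sigma


# Synthetic line profile across 256 detector pixels (illustrative values).
x = np.arange(256, dtype=float)
y = gaussian(x, A=5000.0, centre=130.0, sigma=8.0)

A0, centre0, sigma0 = moment_estimates(x, y)
print(A0, centre0, sigma0)  # close to 5000, 130 and 8
```

Estimates like these could then be passed as `A`, `centre` and `sigma` when creating the component; they only need to be close enough for `multifit` to converge. On real data with background or noise, the moments are less reliable.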

This Gaussian is appended to the model, by using `append`

To see what these initial values look like, we plot the model.

Then we fit, using `multifit`. **Note** that this might take a while.

Plot the model to see how well the fitting worked.

We can now visualize how the parameters of the Gaussian component change as a function of probe position.

## Summary

Most operations can be performed *lazily* in HyperSpy:
1. Visualisation
2. Slicing and indexing
3. Generic mathematical operations
4. Machine learning
5. Curve fitting

See [the big data section](https://hyperspy.readthedocs.io/en/stable/user_guide/big_data.html#limitations) of the HyperSpy documentation for more information and to learn about the main differences between lazy and non-lazy signals.