# Working with Datasets
This notebook demonstrates how to create artificial datasets, load datasets and manipulate them.

## Index
1. [Import the corresponding libraries](#Import-the-corresponding-libraries)
2. [Create an artificial dataset of spherical images](#Create-an-artificial-dataset-of-spherical-images)
3. [Visualizing](#Visualizing)
4. [Make video](#Make-video)
5. [Register the images](#Register-the-images)
6. [Check the correction of the registered images](#Check-the-correction-of-the-registered-images)
7. [Make videos](#Make-videos)


## Formats of Input Datasets

Usually datasets are found in the following formats:

 - **Separate files distributed over folders**. Datasets acquired on a microscope typically have the following structure:

        - dataset
            - ch1
                - file_t1.tif
                - file_t2.tif
                - ...
            - ch2
                - file_t1.tif
                - file_t2.tif
                - ...

 - **NumPy arrays**: Used for small datasets.
 - **[zarr](https://zarr.readthedocs.io/en/stable/)**: Format for efficient batch loading and distribution of big datasets.
 - **h5/hdf5**: Similar to zarr, a format for efficient manipulation of large datasets.
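
As a quick illustration of the folder-based layout above, the following sketch walks a dataset folder and lists the files found in each channel. It uses only the Python standard library; the helper name `list_channel_files` and the placeholder files it creates are hypothetical and not part of `registration_tools`.

```python
import tempfile
from pathlib import Path


def list_channel_files(root):
    """Map each channel folder name to its sorted list of file names."""
    root = Path(root)
    return {
        channel.name: sorted(f.name for f in channel.iterdir() if f.is_file())
        for channel in sorted(root.iterdir())
        if channel.is_dir()
    }


# Build a tiny placeholder layout (empty files, hypothetical names) to try it.
with tempfile.TemporaryDirectory() as tmp:
    for channel in ("ch1", "ch2"):
        folder = Path(tmp) / "dataset" / channel
        folder.mkdir(parents=True)
        for t in (1, 2):
            (folder / f"file_t{t}.tif").touch()
    layout = list_channel_files(Path(tmp) / "dataset")

print(layout)
```

Listing the layout this way is a cheap sanity check that every channel contains the same number of time points before loading anything.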

In the following sections, we create an artificial dataset and see how to open it.

## Creating an artificial dataset

We first generate an artificial dataset and then load it. Two modules are involved:
 - `registration_tools.data` contains functions to generate artificial datasets for testing.
 - `registration_tools.dataset` contains functions to load datasets.

In [1]:
import registration_tools.data as rt_data        # For generating artificial datasets
import registration_tools.dataset as rt_dataset  # For loading datasets

If we provide an output folder, the dataset is written to disk as separate files arranged in a folder structure.

In [2]:
# dataset = rt_data.sphere(
#     out='dataset_sphere',
#     num_channels=3,
#     image_size=100,  #This indicates to make an image of size image_size x image_size x image_size
#     stride=(1,1,2),  #This to downsample the image by a factor of stride per dimension
# )

We can visualize the structure of our dataset:

In [3]:
rt_dataset.show_dataset_structure('dataset_sphere')

|-- channel_0
    |-- sphere_00.tiff
    |-- sphere_01.tiff
    |-- sphere_02.tiff
    |-- ...
    |-- sphere_07.tiff
    |-- sphere_08.tiff
    |-- sphere_09.tiff
|-- channel_1
    |-- sphere_00.tiff
    |-- sphere_01.tiff
    |-- sphere_02.tiff
    |-- ...
    |-- sphere_07.tiff
    |-- sphere_08.tiff
    |-- sphere_09.tiff
|-- channel_2
    |-- sphere_00.tiff
    |-- sphere_01.tiff
    |-- sphere_02.tiff
    |-- ...
    |-- sphere_07.tiff
    |-- sphere_08.tiff
    |-- sphere_09.tiff


We can also inspect the metadata of the dataset.

In [4]:
# dataset

## Loading a dataset of separated files

Now we can load this folder structure as a `Dataset` object.

In [5]:
dataset = rt_dataset.Dataset(
    [
        "dataset_sphere/channel_0/sphere_{:02d}.tiff",
        "dataset_sphere/channel_1/sphere_{:02d}.tiff",
        "dataset_sphere/channel_2/sphere_{:02d}.tiff",
    ],
    axis_data="CT",
    axis_files="XYZ",
    scale=(1,1,2)      # Scale of the dataset; same as the stride used during generation
)

dataset

Dataset(shape=(3, 10, 100, 100, 50), axis=CTXYZ, scale=(1, 1, 2))
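
To see where the reported shape comes from, the sketch below expands the `{:02d}` patterns with plain `str.format` and derives the per-file shape from the image size and stride. This is only an illustration under the assumption that the placeholder is filled with the time index (0 to 9, matching the file names listed earlier); it does not reproduce how `Dataset` works internally.

```python
import math

patterns = [
    "dataset_sphere/channel_0/sphere_{:02d}.tiff",
    "dataset_sphere/channel_1/sphere_{:02d}.tiff",
    "dataset_sphere/channel_2/sphere_{:02d}.tiff",
]
num_timepoints = 10

# One path per (channel, time point): str.format fills the {:02d} placeholder.
paths = [[p.format(t) for t in range(num_timepoints)] for p in patterns]

# Sampling a 100-voxel axis with a given stride keeps ceil(100 / stride) voxels.
image_size, stride = 100, (1, 1, 2)
file_shape = tuple(math.ceil(image_size / s) for s in stride)

# The data axes ("CT") come first, followed by the axes stored in each file ("XYZ").
shape = (len(paths), num_timepoints) + file_shape

print(paths[0][0], paths[2][9])
print(shape)
```

The computed shape agrees with the `Dataset` representation printed above: 3 channels, 10 time points, and 100 x 100 x 50 voxels per file.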

## Converting to zarr



## Create an artificial dataset of spherical images
We will create an artificial dataset of 10 spherical images with 3 channels.

By default, if no output path is specified, the function returns a [zarr array](https://zarr.readthedocs.io/en/stable/user-guide/). This data type stores big datasets efficiently and allows loading data into memory in batches.
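
Batch-wise loading can be sketched with a small generator that yields consecutive slices along the first axis. The helper `batch_slices` is hypothetical (it is not part of `registration_tools` or zarr), and a plain list stands in for the array here; a zarr array can be indexed with the same slices, reading only the requested part into memory.

```python
def batch_slices(length, batch_size):
    """Yield slices that cover range(length) in consecutive batches."""
    for start in range(0, length, batch_size):
        yield slice(start, min(start + batch_size, length))


# Stand-in for an array with 10 time points along its first axis.
frames = list(range(10))
batches = [frames[s] for s in batch_slices(len(frames), batch_size=4)]

print(batches)
```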

In [2]:
# Create a dataset of spherical images
dataset = rt_data.sphere(
    out = None, #If not specified, a new dataset is created and stored in RAM
    num_images=10,
    image_size=50,
    num_channels=3,
    min_radius=5,
    max_radius=5,
    jump=2,
    stride=(1, 1, 1)
)
print("Type: ", type(dataset))
print("Axis: ", dataset.attrs["axis"])
print("Scale: ", dataset.attrs["scale"])
print("Shape: ", dataset.shape)

Type:  <class 'zarr.core.array.Array'>
Axis:  TCZYX
Scale:  (1, 1, 1)
Shape:  (10, 3, 50, 50, 50)
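
Note that this array uses the axis order `TCZYX`, whereas the file-based dataset loaded earlier reported `CTXYZ`. A minimal sketch of how to compute the permutation between two axis strings follows; `axis_permutation` is a hypothetical helper, not a function of the package.

```python
def axis_permutation(src_axes, dst_axes):
    """Return the index order that rearranges src_axes into dst_axes."""
    if sorted(src_axes) != sorted(dst_axes):
        raise ValueError("Both axis strings must contain the same labels.")
    return tuple(src_axes.index(label) for label in dst_axes)


src_axes, src_shape = "TCZYX", (10, 3, 50, 50, 50)
perm = axis_permutation(src_axes, "CTXYZ")

# Shape the array would have after reordering its axes with this permutation.
dst_shape = tuple(src_shape[i] for i in perm)

print(perm)
print(dst_shape)
```

The resulting tuple can be passed to `numpy.transpose(array, perm)` once the data is in memory.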


### Visualizing

Initialize the napari viewer to visualize the dataset.

In [3]:
# Initialize the napari viewer
import napari

viewer = napari.Viewer()

Then load the dataset into napari. The package provides helper functions to plot the images generated during the pipeline.

In [4]:
# Plot the images in the dataset
viewer.layers.clear() #Clear the viewer of other layers that may be present
rt_vis.add_image_difference(viewer, dataset)
# viewer.add_image(dataset, scale=dataset.attrs["scale"]) #This would have been equivalent to the above line.
viewer.dims.ndisplay = 3

### Make video

There is a convenient function to create videos from the current display in the napari viewer.

In [5]:
# Make video
rt_vis.make_video(
    viewer=viewer,
    save_file='sphere_dataset.gif',
    fps=1,
)

## Register the images
Register the images in the dataset to correct for any misalignments.

First, we create the registration model. There are a few parameters to take into account:

 1. Pyramid levels. These set the coarse graining (block sizes) used to register the images. The lower the level, the smaller the blocks (0 corresponds to pixel size). Lower levels capture finer details but are slower.
 2. Type of transformation (translation, registration, vectorfield...).
 3. Direction of registration (backward means finding a transformation from future to past, t+1 -> t).
 4. Whether to perform a global transformation, that is, to generate a set of transformations t -> 0. In our case, since we just want to correct for the movement, we set it to True.
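
For pure translations, the relation between the pairwise (t+1 -> t) and global (t -> 0) transformations is simple: composing translations amounts to adding their shift vectors, so the global shifts are a running sum of the pairwise ones. The sketch below illustrates this with made-up shift values; `compose_to_first_frame` is a hypothetical helper and says nothing about how the package stores or composes its transformations.

```python
from itertools import accumulate


def add_shifts(a, b):
    """Compose two translations by adding their components."""
    return tuple(x + y for x, y in zip(a, b))


def compose_to_first_frame(pairwise_shifts, ndim=3):
    """Turn shifts mapping frame t+1 onto frame t into shifts mapping frame t onto frame 0."""
    identity = (0,) * ndim  # frame 0 needs no correction
    return list(accumulate(pairwise_shifts, add_shifts, initial=identity))


# Made-up example: each frame is displaced by 2 voxels along the first axis.
pairwise = [(-2, 0, 0), (-2, 0, 0), (-2, 0, 0)]
global_shifts = compose_to_first_frame(pairwise)

print(global_shifts)
```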

In [6]:
# Build the registration model
registration = rt_reg.Registration(
    pyramid_highest_level=3,           # Highest (coarsest) pyramid level
    pyramid_lowest_level=0,            # Lowest (finest) pyramid level
    registration_type='translation',   # Type of registration
    registration_direction='backward', # Direction of registration
    perfom_global_trnsf=True           # Whether to perform global transformation
)

We then apply it to the dataset. `fit_apply` does two things:

 1. Get the transformations.
 2. Apply the transformations to the dataset. 

Registration is computed on a single channel. If the data has several channels, specify which channel to use for registering the data.

In [7]:
dataset_registered = registration.fit_apply(    
    dataset=dataset,
    use_channel=0,     #Which channel to use for the registration.
    out=None,          #If not specified, a new dataset is created and stored in RAM
)

Registering images using channel 0: 100%|██████████| 9/9 [00:00<00:00, 141.06/s]
Applying registration to images: 100%|██████████| 9/9 [00:00<00:00, 192.57/s]
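
The idea of fitting on one channel and applying the result to all channels can be sketched with NumPy alone. The example below estimates an integer translation with an FFT-based circular cross-correlation, which is a stand-in technique chosen for brevity and not necessarily what `fit_apply` does internally. All function names are hypothetical, and the toy frames use the axis order `CZYX`.

```python
import numpy as np


def estimate_shift(reference, moving):
    """Integer shift that, applied with np.roll, best aligns `moving` to `reference`."""
    cross = np.fft.fftn(reference) * np.conj(np.fft.fftn(moving))
    correlation = np.fft.ifftn(cross).real
    peak = np.unravel_index(np.argmax(correlation), correlation.shape)
    # Peaks past the midpoint correspond to negative shifts (circular wrap-around).
    return tuple(
        int(p) if p <= n // 2 else int(p) - n
        for p, n in zip(peak, correlation.shape)
    )


def fit_apply_translation(frame_ref, frame_mov, use_channel=0):
    """Estimate the shift on one channel, then apply it to every channel (axes: CZYX)."""
    shift = estimate_shift(frame_ref[use_channel], frame_mov[use_channel])
    spatial_axes = tuple(range(1, frame_mov.ndim))
    return shift, np.roll(frame_mov, shift, axis=spatial_axes)


# Toy data: two channels holding a small cube, displaced by a known offset.
reference = np.zeros((2, 16, 16, 16))
reference[0, 6:10, 6:10, 6:10] = 1.0
reference[1, 5:11, 5:11, 5:11] = 0.5
moving = np.roll(reference, (2, -3, 1), axis=(1, 2, 3))

shift, registered = fit_apply_translation(reference, moving)
print(shift)
```

Because the shift is estimated once and reused, all channels stay aligned with each other after the correction.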


### Check the correction of the registered images

To check the correction, we overlay the original dataset and the registered one.

In [8]:
# Overlay the first frame, the original dataset, and the registered dataset
viewer.layers.clear() # Clear the viewer from past images
rt_vis.add_image(viewer, dataset[0], name="t0")
rt_vis.add_image(viewer, dataset, name="dataset", colormap="red")
rt_vis.add_image(viewer, dataset_registered, opacity=0.5, colormap='green', name="dataset_registered")
viewer.dims.ndisplay = 3
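
Besides the visual overlay, a simple numeric check can help. The sketch below computes the mean absolute intensity difference between every frame and the first one. It assumes time is the first axis, uses toy 2D frames, and the helper `mismatch_to_first_frame` is hypothetical rather than part of the package.

```python
import numpy as np


def mismatch_to_first_frame(stack):
    """Mean absolute intensity difference between each frame and frame 0 (axis 0 is time)."""
    stack = np.asarray(stack, dtype=float)
    spatial_axes = tuple(range(1, stack.ndim))
    return np.abs(stack - stack[0]).mean(axis=spatial_axes)


# Toy data: a bright square drifting one pixel per frame, and a motionless copy.
frame = np.zeros((8, 8))
frame[2:4, 2:4] = 1.0
drifting = np.stack([np.roll(frame, t, axis=1) for t in range(3)])
still = np.stack([frame] * 3)

print(mismatch_to_first_frame(drifting))
print(mismatch_to_first_frame(still))
```

A well-registered dataset should give values close to those of the motionless stack, while an uncorrected one grows with the drift.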

### Make videos

Now that we checked that the correction is satisfactory, we can save the video.

In [9]:
# Make a video of the registered dataset
rt_vis.make_video(
    viewer=viewer,
    save_file='sphere_dataset_registered.gif',
    fps=1,
)