# Using Ilastik object classifiers with SpatialData

## Prerequisites

Environment: harpy

### Import packages

In [6]:
# Automatically reload imported packages
# (This makes it easy to quickly test changes to our own code
# external to this notebook without restarting the kernel.)
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [7]:
import spatialdata as sd

In [8]:
import sys
sys.path.append(r"C:\Users\julienm\Documents\repos_github\ilastik_spatialdata")
import sdata_to_ilastik

### Read sdata

In [9]:
sdata = sd.read_zarr(r"d:\Data\Ilastik test\sdata.zarr")
sdata

# NOTE: The SpatialData object must contain the image channels and a corresponding segmentation mask.
# To add the ilastik results back to the SpatialData object, it must also contain a table with the centroid coordinates of each cell.

SpatialData object, with associated Zarr store: D:\Data\Ilastik test\sdata.zarr
├── Images
│     └── 'raw_image': SpatialImage[cyx] (2, 6935, 6706)
├── Labels
│     └── 'segmentation_mask': SpatialImage[yx] (6935, 6706)
└── Tables
      ├── 'table': AnnData (29, 2)
      └── 'table_intensities': AnnData (29, 2)
with coordinate systems:
    ▸ 'global', with elements:
        raw_image (Images), segmentation_mask (Labels)
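
The requirements in the note above can be checked before exporting anything. The following is a minimal sketch and not part of `sdata_to_ilastik`: the helper name `check_sdata_for_ilastik` is hypothetical, and it assumes that the centroid columns (`centroid-0` and `centroid-1`, the names used later in this notebook) are stored in the `.obs` of the table.

```python
def check_sdata_for_ilastik(
    sdata,
    img_layer="raw_image",
    labels_layer="segmentation_mask",
    table_layer="table",
    centroid_columns=("centroid-0", "centroid-1"),
):
    """Return a list of problems; an empty list means everything was found.

    Hypothetical helper. Assumes the centroid columns live in `table.obs`.
    """
    problems = []
    if img_layer not in sdata.images:
        problems.append(f"image layer '{img_layer}' not found")
    if labels_layer not in sdata.labels:
        problems.append(f"labels layer '{labels_layer}' not found")
    if table_layer not in sdata.tables:
        problems.append(f"table '{table_layer}' not found")
    else:
        # Only inspect the columns when the table itself exists.
        obs_columns = set(sdata.tables[table_layer].obs.columns)
        for column in centroid_columns:
            if column not in obs_columns:
                problems.append(
                    f"column '{column}' missing from the .obs of '{table_layer}'"
                )
    return problems
```

Calling `check_sdata_for_ilastik(sdata)` on a suitable object should return an empty list. If the centroids are stored elsewhere (for example in `.obsm`), the column check needs to be adapted.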

### Object classifiers

A variety of different object classifiers can be created in ilastik.

Some options:
- You can train on a single image or on a combination of images
- You can train on whichever set of features is most appropriate
- You can have as many classes as needed
- You can classify anything, as long as you can recognize it visually yourself in an image (or a set of images)

For example:
- You can use the DAPI channel to classify cells into good and bad segmentations.
- You can train a classifier for each channel that classifies cells as positive or negative for that marker.
- You can train on multiple channels (e.g. CD45 and Sox10) simultaneously to, for example, classify cells as tumor, immune or other.

##### Save raw images and corresponding segmentation mask for training ilastik object classifiers.

In [10]:
sdata_to_ilastik.export_h5(
    sdata = sdata,
    img_layer = "raw_image",
    channels = ["CD11b", "Ly6G"],
    output = "D:/Data/Ilastik test/raw_images.h5"
)

In [11]:
sdata_to_ilastik.export_h5(
    sdata = sdata,
    labels_layer = "segmentation_mask",
    output = "D:/Data/Ilastik test/segmentation_mask.h5"
)

##### Train Object classifiers in Ilastik

For each object classifier, do the following:

Open ilastik (v1.4.0) and create a new project: Object Classification [Inputs: Raw Data, Segmentation]

**1. Input Data** </br>
To load a single channel:

- Under the `Raw Data` tab of `1. Input Data`, click `Add...` and select `Add separate Image(s)...`.
- Select the .h5 file that contains the raw images, choose the channel you want to work on from the drop-down menu (this sets the internal path in the h5 file) and click `OK`.

To load in multiple channels, you need to specify a correct pattern that also includes the internal path (i.e. in the h5 file) to the correct images.

For example:
- Under the `Raw Data` tab of `1. Input Data`, click `Add...` and select `Add a single 3D/4D Volume from Sequence...`.
- Under `Specify Pattern`, enter a pattern that specifies the images of interest, for example `D:/Data/2023-09-ChristosGkemisis-ChMa/processed/CG23-003_4/CG23-003_4_Scan2/tissue4/annotation/ilastik/tissue4.h5/CD45; D:/Data/2023-09-ChristosGkemisis-ChMa/processed/CG23-003_4/CG23-003_4_Scan2/tissue4/annotation/ilastik/tissue4.h5/Sox10`, and click `Apply`. The paths must be specified exactly: separate multiple paths with a semicolon, and include the internal path in the h5 file in each path.
- Select `Stack Across: C` before clicking `OK`.
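
Because the pattern is long and easy to mistype, it can help to assemble it programmatically and paste the result into ilastik. The following is a minimal sketch, assuming each channel is stored as a dataset named after the channel at the root of the .h5 file (as in the example pattern above). `build_ilastik_pattern` is a hypothetical helper, not part of ilastik or `sdata_to_ilastik`.

```python
def build_ilastik_pattern(h5_path, channels):
    """Join '<h5 file>/<internal dataset>' entries with semicolons."""
    # Normalize Windows backslashes to forward slashes,
    # matching the style of the example pattern above.
    h5_path = str(h5_path).replace("\\", "/").rstrip("/")
    return "; ".join(f"{h5_path}/{channel}" for channel in channels)


pattern = build_ilastik_pattern(
    "D:/Data/Ilastik test/raw_images.h5", ["CD11b", "Ly6G"]
)
print(pattern)
```

The printed string can be pasted under `Specify Pattern`. If the internal dataset names differ from the channel names, pass the actual dataset names instead.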

After adding the Raw Data and the Segmentation Image to the ilastik project, it is useful to right-click on each of them, go to `Edit properties...` and make sure `Storage:` is set to `Copy into project file`. This ensures that you can move the ilastik project file (on your computer or even to other computers) without losing the link to the files the project was trained on, and with it your training. It is also useful to set `Nickname:` to something informative, such as the name of the tissue, sample or replicate, to keep track of which image is which.

**2. Object Feature Selection** </br>
When selecting features for training, it is recommended not to select all of them, but to be mindful of which ones you want to train on. Although ilastik itself describes computing many features at once as computationally cheap, calculating all features can still add up, since each image contains many cells. Additionally, restricting the features, for example to intensity-related ones only, makes it easier to explain what the model was trained on and how its output can be interpreted.

For most cases, I would recommend the following steps:
- Under the tab `2. Object Feature Selection`, click `Select Features` and tick all boxes under `Intensity Distribution`. If needed, you can add other features that you know are relevant (such as `Size in pixels` or `Diameter`). If you want to create a classifier that distinguishes good segmentations from bad ones, it makes sense to select all features.
- After clicking `OK`, wait until all features have been computed before moving on to the next step.

**3. Object Classification** </br>
Here, you can create multiple classes for your classifier and train until you are satisfied with the results. In general, it is useful to start by adding a good number of labels for each class in different regions of the image (or even across multiple images) to capture the variation in the data, and then click `Live Update` to see the prediction results. Subsequently, you can focus on the mistakes that are being made and add labels to correct those. When labeling, I prefer to turn `Live Update` off to avoid waiting time. The `Uncertainty` layer can be useful to see which objects the classifier is still unsure about. During training, ilastik does not make it easy to change the visualization, but if you right-click on `Raw Input` and select `Adjust Thresholds`, you have some options to change the display.

**4. Object Information Export** </br>
To export the results, click on `Configure Feature Table Export` and export as either `HDF (.h5)` or `CSV (.csv)`.

Under `Choose File`, you can specify where and under what name the results are exported. By default, this is set to `{dataset_dir}/{nickname}.csv`, with `{dataset_dir}` referring to the directory that contains the input images .h5 file and `{nickname}` referring to the name specified in the `1. Input Data` tab. Preferably, replace `{nickname}` with a unique name for each ilastik classifier. This is important because it allows you to keep track of which object classifier the results come from when combining everything in SpatialData.

Note that, by default, the object predictions are saved as images as well, although these files are not used to add the ilastik results to the SpatialData object. Unfortunately, there does not seem to be a way to avoid saving these files.


##### Load in ilastik output and add to sdata

In [31]:
sdata = sdata_to_ilastik.add_ilastik_to_sdata(
    sdata = sdata,
    input_path = r"d:\Data\Ilastik test\raw_images-CD11b.h5",
    table_layer = "table",
    labels_layer = "segmentation_mask",
    centroid_column_x = "centroid-1",
    centroid_column_y = "centroid-0", 
    suffix = "test",
)
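
The implementation of `add_ilastik_to_sdata` is not shown in this notebook. The centroid arguments suggest that ilastik objects are linked to cells by position, so the sketch below illustrates one way such a match could work, using plain Python and rounded centroid coordinates. It is an illustration of the idea only: `match_by_centroid`, the dictionary keys and the rounding strategy are all assumptions, and the real function may work differently.

```python
def match_by_centroid(cells, ilastik_objects, decimals=0):
    """Map each cell id to a predicted class via rounded (y, x) centroids.

    cells: {cell_id: (y, x)}
    ilastik_objects: list of dicts with 'y', 'x' and 'predicted_class' keys
    Cells without a matching ilastik object get None.
    """
    # Build a lookup from rounded centroid to predicted class.
    lookup = {
        (round(obj["y"], decimals), round(obj["x"], decimals)): obj["predicted_class"]
        for obj in ilastik_objects
    }
    return {
        cell_id: lookup.get((round(y, decimals), round(x, decimals)))
        for cell_id, (y, x) in cells.items()
    }


cells = {1: (10.2, 20.4), 2: (30.0, 40.0)}
ilastik_objects = [{"y": 10.4, "x": 20.1, "predicted_class": "positive"}]
matches = match_by_centroid(cells, ilastik_objects)
print(matches)
```

Rounding is a crude matching rule: two centroids that fall on either side of a rounding boundary will not match, so a nearest-neighbour search would be more robust for real data.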



##### Creating new ilastik columns based on specified conditions

In [32]:
sdata = sdata_to_ilastik.assign_ilastik_cell_types(
    sdata, 
    table_layer = "table",
    labels_layer = "segmentation_mask",
    annotation_table_path = r"d:\Data\Ilastik test\annotation_matrix.csv",
    output_column = "ilastik_cell_types", 
    default_value = "other")
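
The format of `annotation_matrix.csv` is not shown in this notebook. As an illustration of the general idea only, the sketch below assigns a cell type from per-classifier predictions using a small rule table and falls back to a default value when no rule matches. The rule layout, the classifier names `CD11b` and `Ly6G` as keys, the `positive`/`negative` values and the cell type labels are all hypothetical.

```python
def assign_cell_type(predictions, rules, default_value="other"):
    """Return the cell type of the first rule whose conditions all match."""
    for conditions, cell_type in rules:
        if all(predictions.get(name) == value for name, value in conditions.items()):
            return cell_type
    # No rule matched: fall back to the default label.
    return default_value


# Hypothetical rules: (conditions per classifier, resulting cell type).
rules = [
    ({"CD11b": "positive", "Ly6G": "positive"}, "double_positive"),
    ({"CD11b": "positive", "Ly6G": "negative"}, "CD11b_only"),
]

print(assign_cell_type({"CD11b": "positive", "Ly6G": "negative"}, rules))
print(assign_cell_type({"CD11b": "negative", "Ly6G": "negative"}, rules))
```

Rules are evaluated in order, so more specific rules should be listed before more general ones.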

