# Interactive napari cell annotation on spatial proteomics

Harpy works with [SpatialData](https://spatialdata.scverse.org/en/stable/) and [AnnData](https://anndata.readthedocs.io/en/latest/) objects. This allows for interoperability with other libraries in the [scverse ecosystem](https://scverse.org/) that also work with these objects. 

AnnData objects can be converted to pandas DataFrames, which can be stored as features in a [napari Labels layer](https://napari.org/stable/gallery/add_labels_with_features.html). Many napari plugins can read these features, which makes interactive cell type labeling possible.

In this notebook, we load an artificial example dataset and perform annotation using [napari-clusters-plotter](https://github.com/BiAPoL/napari-clusters-plotter). The clustering result can be visualized within napari and saved back to the SpatialData object within the notebook.

In [8]:
# Install napari-clusters-plotter as shown here https://github.com/BiAPoL/napari-clusters-plotter/tree/main?tab=readme-ov-file#installation
# e.g. conda install -c conda-forge napari-clusters-plotter

In [9]:
# load some example SpatialData
from sparrow.datasets import multisample_blobs

sdata = multisample_blobs(n_samples=1)
sdata

INFO     no axes information specified in the object, setting `dims` to: ('c', 'y', 'x')
INFO     no axes information specified in the object, setting `dims` to: ('y', 'x')




SpatialData object
├── Images
│     └── 'sample_0_image': DataArray[cyx] (11, 512, 512)
├── Labels
│     └── 'sample_0_labels': DataArray[yx] (512, 512)
├── Points
│     └── 'sample_0_points': DataFrame with shape: (<Delayed>, 2) (2D points)
└── Tables
      ├── 'sample_0_table': AnnData (20, 11)
      └── 'table': AnnData (20, 11)
with coordinate systems:
    ▸ 'sample_0', with elements:
        sample_0_image (Images), sample_0_labels (Labels), sample_0_points (Points)

In [2]:
table = sdata["sample_0_table"]
table

AnnData object with n_obs × n_vars = 20 × 11
    obs: 'instance_id', 'region', 'fov_labels', 'cell_ID', 'phenotype', 'area', 'eccentricity', 'major_axis_length', 'minor_axis_length', 'perimeter', 'centroid-0', 'centroid-1', 'convex_area', 'equivalent_diameter', '_major_minor_axis_ratio', '_perim_square_over_area', '_major_axis_equiv_diam_ratio', '_convex_hull_resid', '_centroid_dif'
    var: 'cycle'
    uns: 'spatialdata_attrs'

Here we create the DataFrame of features to be used in napari. The DataFrame needs a `label` and an `index` column that refer back to the cell instances in the label mask. These IDs are the index of the AnnData table, so we copy that index into both columns. The DataFrame should also contain all the features you want to visualize in napari and cluster on.

In [3]:
df = table.to_df()
df["label"] = df.index.astype(int)
df["index"] = df.index.astype(int)
df

cells,nucleus,lineage_0,lineage_1,lineage_2,lineage_3,lineage_4,lineage_5,lineage_6,lineage_7,lineage_8,lineage_9,label,index
1,130579.074583,73143.384718,90176.329291,78887.342227,83547.398858,0.0,85104.954042,89203.618952,71052.452962,166453.784109,79783.723973,1,1
2,97747.25707,44127.352051,54403.315415,47592.677537,50404.086393,0.0,51343.758329,103611.315408,42865.894408,48557.293269,48133.464008,2,2
3,129783.693347,72871.92034,89159.713324,77997.994298,82605.515103,0.0,132335.765591,88197.968976,70251.432797,79581.243727,78884.270554,3,3
4,126967.536598,69764.494608,86010.594981,159287.142678,79687.89084,0.0,81180.370216,85082.819414,67770.154343,76768.146443,76776.118424,4,4
5,107214.353348,52460.254002,64676.705321,56579.963133,124573.992918,0.0,61039.388909,63979.053282,50960.585762,57726.734557,57222.870407,5,5
6,118023.115808,137276.241101,76316.189314,66762.324339,70706.128294,0.0,72387.349397,75492.984966,60131.660871,71197.15481,67520.931832,6,6
7,122155.172414,140294.370188,79746.467902,69763.173487,73884.244499,0.0,75261.651691,78886.261965,62834.473349,71177.144253,75451.055432,7,7
8,132338.563601,152777.165022,92088.042864,80559.732355,85318.580915,0.0,86909.155844,91094.776083,72558.745575,82192.529441,81475.117149,8,8
9,127852.325618,70794.839797,183547.310962,76354.365832,80864.793725,0.0,82372.337711,86339.399487,68771.045311,77901.927901,77221.965853,9,9
10,121525.452585,67926.259433,80034.961874,70015.551499,74151.53106,0.0,75534.839518,79171.644022,63061.785821,154850.113126,70811.124788,10,10
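
Since napari matches feature rows to cells through the `label` column, it can be worth checking that the IDs in the DataFrame agree with the IDs present in the label mask before opening the viewer. The following is a minimal sketch on toy data; the helper `check_labels_match` and the toy arrays are hypothetical and not part of Harpy or napari, and the sketch assumes that `0` denotes background in the mask.

```python
import numpy as np
import pandas as pd


def check_labels_match(mask: np.ndarray, features: pd.DataFrame) -> dict:
    """Compare label IDs found in a mask with the 'label' column of a features table."""
    # Assumption: 0 is the background value and not a cell
    mask_ids = set(np.unique(mask).tolist()) - {0}
    table_ids = set(features["label"].astype(int).tolist())
    return {
        "missing_in_table": sorted(mask_ids - table_ids),
        "missing_in_mask": sorted(table_ids - mask_ids),
    }


# Toy data (made up): a 4x4 mask containing cells 1, 2 and 3
toy_mask = np.array(
    [
        [0, 1, 1, 0],
        [2, 2, 0, 0],
        [0, 0, 3, 3],
        [0, 0, 3, 3],
    ]
)
# The toy table describes cells 1, 2 and 4, so cell 3 and cell 4 are mismatched
toy_features = pd.DataFrame({"label": [1, 2, 4], "nucleus": [0.5, 1.2, 0.9]})

report = check_labels_match(toy_mask, toy_features)
print(report)
```

In this notebook the same helper could presumably be called as `check_labels_match(np.asarray(sdata["sample_0_labels"]), df)`; two empty lists would indicate that every cell in the mask has a feature row and vice versa.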


We add the image and the label mask as napari layers and attach the features to the Labels layer. Then we run napari, which opens in a new window. The workflow is described [here](https://github.com/BiAPoL/napari-clusters-plotter?tab=readme-ov-file#plotting) and is as follows:

- In the napari window, open the Plotter widget via `Plugins > napari-clusters-plotter > Plotter Widget`.
- In the new widget window on the right, select the labels layer that carries the features DataFrame. Choose a feature for the x-axis and one for the y-axis, then click `Plot` to visualize the cells in a scatter plot.
- By drawing around groups of cells in the scatter plot, you can assign cell types to the cells.
    - By holding SHIFT while drawing, you can create new clusters.

In [4]:
import napari

viewer = napari.view_image(sdata["sample_0_image"], name="image")
labels_layer = viewer.add_labels(sdata["sample_0_labels"], name="labels", features=df)



In [5]:
napari.run()
# Do annotation in napari window

After annotation, the cluster assignments are stored in the `MANUAL_CLUSTER_ID` column of the Labels layer's features DataFrame and can be saved back to the SpatialData object, as shown in the code below.

In [6]:
# Here we add a dummy labeling to the feature table to simulate annotation
# COMMENT OUT THE NEXT LINE IF YOU ARE DOING REAL ANNOTATION
labels_layer.features["MANUAL_CLUSTER_ID"] = labels_layer.features["label"] + 1
labels_layer.features

,nucleus,lineage_0,lineage_1,lineage_2,lineage_3,lineage_4,lineage_5,lineage_6,lineage_7,lineage_8,lineage_9,label,index,MANUAL_CLUSTER_ID
0,130579.074583,73143.384718,90176.329291,78887.342227,83547.398858,0.0,85104.954042,89203.618952,71052.452962,166453.784109,79783.723973,1,1,2
1,97747.25707,44127.352051,54403.315415,47592.677537,50404.086393,0.0,51343.758329,103611.315408,42865.894408,48557.293269,48133.464008,2,2,3
2,129783.693347,72871.92034,89159.713324,77997.994298,82605.515103,0.0,132335.765591,88197.968976,70251.432797,79581.243727,78884.270554,3,3,4
3,126967.536598,69764.494608,86010.594981,159287.142678,79687.89084,0.0,81180.370216,85082.819414,67770.154343,76768.146443,76776.118424,4,4,5
4,107214.353348,52460.254002,64676.705321,56579.963133,124573.992918,0.0,61039.388909,63979.053282,50960.585762,57726.734557,57222.870407,5,5,6
5,118023.115808,137276.241101,76316.189314,66762.324339,70706.128294,0.0,72387.349397,75492.984966,60131.660871,71197.15481,67520.931832,6,6,7
6,122155.172414,140294.370188,79746.467902,69763.173487,73884.244499,0.0,75261.651691,78886.261965,62834.473349,71177.144253,75451.055432,7,7,8
7,132338.563601,152777.165022,92088.042864,80559.732355,85318.580915,0.0,86909.155844,91094.776083,72558.745575,82192.529441,81475.117149,8,8,9
8,127852.325618,70794.839797,183547.310962,76354.365832,80864.793725,0.0,82372.337711,86339.399487,68771.045311,77901.927901,77221.965853,9,9,10
9,121525.452585,67926.259433,80034.961874,70015.551499,74151.53106,0.0,75534.839518,79171.644022,63061.785821,154850.113126,70811.124788,10,10,11
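
The cluster IDs are plain integers, so for cell type labeling it can help to translate them into readable names before storing them. The snippet below is a minimal sketch on a toy table; the names in `cluster_names` are hypothetical placeholders, and the choice to mark unmapped IDs as `"unassigned"` is an assumption made for illustration only.

```python
import pandas as pd

# Toy stand-in for labels_layer.features after annotation (values are made up)
features = pd.DataFrame(
    {
        "label": [1, 2, 3, 4],
        "MANUAL_CLUSTER_ID": [1, 2, 1, 0],
    }
)

# Hypothetical mapping from cluster ID to a cell type name
cluster_names = {1: "cell_type_A", 2: "cell_type_B"}

# IDs without an entry in the mapping become NaN and are then marked "unassigned"
features["cell_type"] = (
    features["MANUAL_CLUSTER_ID"]
    .map(cluster_names)
    .fillna("unassigned")
    .astype("category")
)
print(features)
```

Storing the result as a categorical column keeps it compact and matches how categorical annotations are commonly kept in `AnnData.obs`.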


In [7]:
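# napari resets the features index to 0..n-1, while table.obs is indexed by cell ID.
# Assign by position so pandas does not align on the mismatched index (which would give NaN).
table.obs["manual_clustering"] = labels_layer.features["MANUAL_CLUSTER_ID"].to_numpy()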
table

AnnData object with n_obs × n_vars = 20 × 11
    obs: 'instance_id', 'region', 'fov_labels', 'cell_ID', 'phenotype', 'area', 'eccentricity', 'major_axis_length', 'minor_axis_length', 'perimeter', 'centroid-0', 'centroid-1', 'convex_area', 'equivalent_diameter', '_major_minor_axis_ratio', '_perim_square_over_area', '_major_axis_equiv_diam_ratio', '_convex_hull_resid', '_centroid_dif', 'manual_clustering'
    var: 'cycle'
    uns: 'spatialdata_attrs'
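
When writing annotations back, keep in mind that pandas aligns a Series on its index during column assignment. The features table shown above is indexed from 0, whereas `table.obs` is indexed by cell ID, so it is safer to join on the `label` column explicitly. The following is a minimal sketch with toy stand-ins for `table.obs` and `labels_layer.features`; all values are made up, and the string index mirrors how AnnData stores observation names.

```python
import pandas as pd

# Toy stand-in for table.obs: string index holding the cell IDs
obs = pd.DataFrame(
    {"area": [10.0, 12.0, 9.0]},
    index=pd.Index(["1", "2", "3"], name="cells"),
)

# Toy stand-in for labels_layer.features: integer row index 0..n-1,
# deliberately in a different row order than obs
features = pd.DataFrame({"label": [3, 1, 2], "MANUAL_CLUSTER_ID": [7, 5, 6]})

# Direct assignment aligns on the index; "1", "2", "3" never match 0, 1, 2
naive = obs.copy()
naive["manual_clustering"] = features["MANUAL_CLUSTER_ID"]
print(naive["manual_clustering"].isna().all())

# Label-based assignment: look up each cell ID in the 'label' column
cluster_by_label = features.set_index("label")["MANUAL_CLUSTER_ID"]
aligned = obs.copy()
aligned["manual_clustering"] = (
    aligned.index.astype(int).map(cluster_by_label).to_numpy()
)
print(aligned)
```

The label-based lookup does not depend on the row order of the features table, which makes it robust if rows were sorted or filtered during annotation.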