# DSA Annotation

Digital Slide Archive (DSA) is an open-source web application where users can create regional and point annotations in a high-magnification slide viewer. Luna Pathology CLIs pull the different annotation types from DSA and save them in GeoJSON format, along with their metadata. In this notebook, we will review:

- Project setup on DSA
- Create annotations on DSA
- Run regional annotation ETL
- Run point annotation ETL

DSA provides an excellent [video tutorial](https://www.youtube.com/watch?v=HTvLMyKYyGs&ab_channel=DigitalSlideArchive%2FHistomicsTK) that covers the platform's features. For the first two topics, the information below is an abridged version of that tutorial for your reference.

## Project setup on DSA

Digital Slide Archive (DSA) is a platform for storing, managing, visualizing, and annotating large imaging data sets. DSA consists of an interface for visualizing slides and managing annotations (HistomicsUI), and a web server that provides a rich API and data management tools (using Girder). This system can:

- Organize images from a variety of assetstores, such as local file systems and S3.
- Provide user access controls.
- Support image annotation and review.

HistomicsUI is a web-based application for examining, annotating, and processing histology images to extract both low- and high-level features (e.g. cellular structure, feature types).

### Concepts

- **Collections** correspond to a project. Collections are the top-level objects in the data organization hierarchy.
- **Folders** help organize slides under a project (e.g. `hne_slides`).
- **Items** correspond to a slide. An item can have metadata, annotations, and files associated with it.
- **Annotation** is a single rectangle, point, or polygon.
- **Annotation Document** is a set of annotations, created by the pathologist.
- **Annotation Style** is a predefined set of labels (morphology like tumor, stroma, necrosis etc) and colors.
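The hierarchy above can be pictured as nested containers. The sketch below is a hypothetical in-memory model (the class names, the slide name, and the `regional_necrosis` and `lymphocyte` labels are made up for illustration; this is not Girder's data model or API). It shows how annotation documents hang off items, and why a consistent document name lets you gather every document of one type across a collection.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical model of the DSA hierarchy: Collection > Folder > Item > AnnotationDocument > Annotation.
# Illustrative only; not part of Girder, DSA, or Luna.

@dataclass
class Annotation:
    shape: str   # "rectangle", "point", or "polygon"
    label: str   # e.g. "regional_tumor"

@dataclass
class AnnotationDocument:
    name: str    # e.g. "regional" or "point"
    annotations: List[Annotation] = field(default_factory=list)

@dataclass
class Item:      # one slide
    name: str
    documents: List[AnnotationDocument] = field(default_factory=list)

@dataclass
class Folder:
    name: str
    items: List[Item] = field(default_factory=list)

@dataclass
class Collection:  # one project
    name: str
    folders: List[Folder] = field(default_factory=list)

def count_annotations(collection: Collection, document_name: str) -> int:
    """Count annotations in every document with the given name across the collection."""
    return sum(
        len(doc.annotations)
        for folder in collection.folders
        for item in folder.items
        for doc in item.documents
        if doc.name == document_name
    )

project = Collection("pathology-tutorial", [
    Folder("slides", [
        Item("slide-1.svs", [
            AnnotationDocument("regional", [Annotation("polygon", "regional_tumor"),
                                            Annotation("polygon", "regional_necrosis")]),
            AnnotationDocument("point", [Annotation("point", "lymphocyte")]),
        ]),
    ]),
])
print(count_annotations(project, "regional"))  # 2
```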

Create a collection for your project. Your images can be organized in folders.
In this example, we have a `pathology-tutorial` collection with a `slides` folder in which we organized the images.

<img src="../img/dsa-collection-screenshot.png" alt="DSA Collection" width="600px" />


## Create annotations on DSA

Please see this [video tutorial](https://youtu.be/HTvLMyKYyGs?t=369) for creating and viewing annotations. The information below is an abridged version of the tutorial for your reference. 

**1. To navigate to HistomicsUI, select Actions → Open in HistomicsUI in the upper right corner. HistomicsUI will open in a new browser tab.**

<img src="../img/dsa-histomicsui-screenshot.png" alt="DSA HistomicsUI" width="400px" />

**2. Create an annotation document**
- Click the **+ New** button in the Annotation panel. This will bring up a Create annotation modal.
- Name your annotation document **regional** or **point**. These are the two types of annotations we support. The ETL uses the annotation document name, so it is important to standardize your document names so that the ETL can download all documents of a given annotation type.
- Optionally add a description, then click save.
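Because the ETL selects documents by name, checking for nonstandard names before running it can save a rerun. The helper below is a hypothetical sketch, not part of Luna or DSA; it assumes you already have each slide's document names in hand and defaults to the two names mentioned above. Pass `allowed=` to match your own convention (the ETL run later in this notebook uses `ov_regional`).

```python
# Hypothetical pre-flight check; not part of Luna or DSA.
# The default names "regional" and "point" come from the step above.
SUPPORTED_DOCUMENT_NAMES = {"regional", "point"}

def check_document_names(names_by_slide, allowed=SUPPORTED_DOCUMENT_NAMES):
    """Return {slide: [nonstandard names]} for slides with names outside `allowed`."""
    problems = {}
    for slide, names in names_by_slide.items():
        bad = [name for name in names if name not in allowed]
        if bad:
            problems[slide] = bad
    return problems

print(check_document_names({
    "slide-1.svs": ["regional", "point"],
    "slide-2.svs": ["Regional ", "pts"],  # trailing space and a typo: both flagged
}))
# {'slide-2.svs': ['Regional ', 'pts']}
```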
    
<img src="../img/dsa-document-screenshot.png" alt="DSA Document" width="400px" />

**3. Create annotations**

- Select a label (e.g. regional_tumor)
- Click **Point** or **Polygon**. When an annotation shape is selected, your cursor over the slide area will change to a +.
- For a Point annotation, zoom to an appropriate magnification and click on the cell. The annotation will appear as a circle.
- For a Polygon annotation, click and drag your mouse. As you drag, the area will be highlighted. Return to the starting point, or double-click, to close the polygon.

<img src="../img/dsa-annotation-screenshot.png" alt="DSA Annotation" width="200px" />


**Note**: Using standardized annotation styles is recommended. A uniform annotation style JSON file can be created and shared among the pathologists who make annotations.
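As a rough sketch of what a shared style file might contain, the snippet below builds one entry per label with a solid outline and a translucent fill, then serializes it to JSON. The field names mirror the element style properties in the large_image annotation schema (`lineColor`, `fillColor`, `lineWidth`), but the exact format HistomicsUI imports is an assumption here; compare it against a style export from your own instance. Labels other than `regional_tumor`, and all colors, are hypothetical.

```python
import json

# Hypothetical shared style list. Labels follow the "regional_<morphology>" pattern
# of the example label above; colors are arbitrary. Field names mirror large_image
# element style properties; verify against a HistomicsUI style export before sharing.
LABELS = {
    "regional_tumor": (255, 0, 0),
    "regional_stroma": (0, 128, 0),
    "regional_necrosis": (0, 0, 255),
}

def build_styles(labels, line_width=2, fill_alpha=0.25):
    """Return one style dict per label: solid outline color, translucent fill."""
    styles = []
    for label, (r, g, b) in labels.items():
        styles.append({
            "id": label,
            "lineColor": f"rgb({r},{g},{b})",
            "fillColor": f"rgba({r},{g},{b},{fill_alpha})",
            "lineWidth": line_width,
        })
    return styles

styles = build_styles(LABELS)
print(json.dumps(styles, indent=2))  # save this text to a file to share it
```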

## Run regional annotation ETL


```python
import os
HOME = os.environ['HOME']  # the user's home directory
```

Once you have created annotations on DSA, we can run the annotation ETL CLI! This ETL will download the annotations, convert them to GeoJSON format, and create a parquet table to make the annotations and metadata queryable.
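To make the GeoJSON conversion step concrete, here is a minimal sketch that turns DSA annotation elements into GeoJSON Features. It assumes the elements follow the large_image annotation schema (`point` elements with a `center`, closed `polyline` elements with `points`, each point given as `[x, y, z]`). It is an illustration, not the code Luna's ETL runs, and the sample elements are made up.

```python
# Sketch: DSA/large_image annotation elements -> GeoJSON Features.
# Element keys ("type", "center", "points", "closed", "label", "group") follow the
# large_image annotation schema; this is not Luna's implementation.

def element_to_feature(element):
    """Convert one point or closed polyline element to a GeoJSON Feature dict."""
    properties = {
        "label": element.get("label", {}).get("value"),
        "group": element.get("group"),
    }
    if element["type"] == "point":
        x, y = element["center"][:2]                # drop the z coordinate
        geometry = {"type": "Point", "coordinates": [x, y]}
    elif element["type"] == "polyline" and element.get("closed", False):
        ring = [list(p[:2]) for p in element["points"]]
        if ring[0] != ring[-1]:                     # GeoJSON rings must end where they start
            ring.append(list(ring[0]))
        geometry = {"type": "Polygon", "coordinates": [ring]}
    else:
        raise ValueError(f"unsupported element: type={element['type']!r}")
    return {"type": "Feature", "geometry": geometry, "properties": properties}

def elements_to_feature_collection(elements):
    """Wrap converted elements in a GeoJSON FeatureCollection."""
    return {"type": "FeatureCollection",
            "features": [element_to_feature(e) for e in elements]}

# Hypothetical elements with made-up coordinates.
elements = [
    {"type": "polyline", "closed": True, "group": "regional_tumor",
     "label": {"value": "regional_tumor"},
     "points": [[0, 0, 0], [100, 0, 0], [100, 50, 0]]},
    {"type": "point", "center": [40, 20, 0], "label": {"value": "cell"}},
]
fc = elements_to_feature_collection(elements)
print(fc["features"][0]["geometry"]["coordinates"])
# [[[0, 0], [100, 0], [100, 50], [0, 0]]]
```

Closing the ring explicitly matters because GeoJSON (RFC 7946) requires the first and last positions of a polygon ring to be identical, while a closed DSA polyline need not repeat its starting point.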

For details of the data and app configuration, please refer to the example configurations.

First, let's look at the CLI arguments by running `--help`:

```python
!dsa_annotation --help
```

```
2023-04-04 20:25:04,972 - INFO - root - Initalized logger, log file at: luna.log
Usage: dsa_annotation [OPTIONS] INPUT_DSA_ENDPOINT

  A cli tool

  Inputs:
      input_dsa_endpoint: Path to the DSA endpoint like http://localhost:8080/dsa/api/v1
  
  Outputs:
      slide_annotation_dataset
  
  Example:
      export DSA_USERNAME=username
      export DSA_PASSWORD=password
      dsa_annotation_etl http://localhost:8080/dsa/api/v1
          --collection-name tcga-data
          --annotation-name TumorVsOther
          -o /data/annotations/

Options:
  -o, --output_dir TEXT         path to output directory to save results
  -c, --collection-name TEXT    name of the collection to pull data from in
                                DSA
  -a, --annotation-name TEXT    name of the annotations to pull from DSA (same
                                annotation name for all slides)
  -u, --username TEXT           DSA username, can be inferred from
                       
```
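If you drive the ETL from Python rather than a `!` shell escape, one option is to build the argument list yourself. This hypothetical helper uses only options that appear in the help output and in the run below, and reads credentials from the `DSA_USERNAME` / `DSA_PASSWORD` environment variables used in the help example instead of hard-coding them in the notebook. It prints the command with the password redacted; the actual `subprocess.run` call is left commented out.

```python
import os
import shlex

# Hypothetical helper, not part of Luna. Flags are taken from the help output
# and the notebook's own dsa_annotation run; credentials come from env vars.
def build_dsa_annotation_cmd(endpoint, collection, annotation, output_dir, num_cores=1):
    cmd = [
        "dsa_annotation", endpoint,
        "--output_dir", output_dir,
        "--collection-name", collection,
        "--annotation-name", annotation,
        "--num_cores", str(num_cores),
    ]
    username = os.environ.get("DSA_USERNAME")
    password = os.environ.get("DSA_PASSWORD")
    if username and password:
        cmd += ["--username", username, "--password", password]
    return cmd

def redact(cmd):
    """Return a copy of cmd with the value after --password replaced by *****."""
    out = list(cmd)
    for i, arg in enumerate(out[:-1]):
        if arg == "--password":
            out[i + 1] = "*****"
    return out

cmd = build_dsa_annotation_cmd("http://girder:8080/api/v1", "TCGA collection",
                               "ov_regional", "../dsa_annotations")
print(shlex.join(redact(cmd)))
# import subprocess; subprocess.run(cmd, check=True)  # uncomment to run the ETL
```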

```python
# ingest annotations
!dsa_annotation http://girder:8080/api/v1 \
--output_dir ../dsa_annotations \
--collection-name 'TCGA collection' \
--annotation-name ov_regional \
--num_cores 1 \
--username admin --password password1
```

```
2023-04-04 20:25:06,400 - INFO - root - Initalized logger, log file at: luna.log
2023-04-04 20:25:06,403 - INFO - luna.common.utils - Started CLI Runner wtih <function dsa_annotation_etl at 0x7fac766f4f70>
2023-04-04 20:25:06,405 - INFO - luna.common.utils - Validating params...
2023-04-04 20:25:06,408 - INFO - luna.common.utils -  -> Set input_dsa_endpoint (<class 'str'>) = http://girder:8080/api/v1
2023-04-04 20:25:06,410 - INFO - luna.common.utils -  -> Set collection_name (<class 'str'>) = TCGA collection
2023-04-04 20:25:06,413 - INFO - luna.common.utils -  -> Set annotation_name (<class 'str'>) = ov_regional
2023-04-04 20:25:06,415 - INFO - luna.common.utils -  -> Set num_cores (<class 'int'>) = 1
2023-04-04 20:25:06,417 - INFO - luna.common.utils -  -> Set username (<class 'str'>) = *****
2023-04-04 20:25:06,418 - INFO - luna.common.utils -  -> Set password (<class 'str'>) = *****
2023-04-04 20:25:06,420 - INFO - luna.common.utils -  -> Set output_dir (<class 'str'>) = ../dsa_an

2023-04-04 20:25:08,595 - INFO - dsa_annotation_etl - 	Created geometry POLYGON ((26095 7561, 25925 7569, 25819 ...
2023-04-04 20:25:08,599 - INFO - dsa_annotation_etl - 	Created geometry POLYGON ((13850 41438, 13762 41478, 1368...
2023-04-04 20:25:08,601 - INFO - dsa_annotation_etl - 	Created geometry POLYGON ((21846 9151, 21863 9224, 21863 ...
2023-04-04 20:25:08,603 - INFO - dsa_annotation_etl - 	Created geometry POLYGON ((88952 17667, 88912 17647, 8884...
2023-04-04 20:25:08,605 - INFO - dsa_annotation_etl - 	Created geometry POLYGON ((38411 18675, 38419 18637, 3843...
2023-04-04 20:25:08,607 - INFO - dsa_annotation_etl - 	Created geometry POLYGON ((86674 27291, 86674 27326, 8666...
2023-04-04 20:25:08,610 - INFO - dsa_annotation_etl - 	Created geometry POLYGON ((38390 23685, 38362 23701, 3832...
2023-04-04 20:25:08,612 - INFO - dsa_annotation_etl - 	Created geometry POLYGON ((96999 26604, 97030 26610, 9706...
2023-04-04 20:25:08,614 - INFO - dsa_annotation_etl - 	Created geometry 
```

```python
# metadata, geojson, parquet table output
!ls -lh ../dsa_annotations/
```

```
total 156K
-rw-r--r-- 1 limr limr 14K Apr  4 20:25  01OV002-bd8cdc70-3d46-40ae-99c4-90ef77.annotation.geojson
-rw-r--r-- 1 limr limr 11K Apr  4 20:25  01OV002-ed65cf94-8bc6-492b-9149-adc16f.annotation.geojson
-rw-r--r-- 1 limr limr 14K Apr  4 20:25  01OV007-9b90eb78-2f50-4aeb-b010-d642f9.annotation.geojson
-rw-r--r-- 1 limr limr 15K Apr  4 20:25  01OV008-308ad404-7079-4ff8-8232-12ee2e.annotation.geojson
-rw-r--r-- 1 limr limr 17K Apr  4 20:25  01OV008-7579323e-2fae-43a9-b00f-a15c28.annotation.geojson
drwxr-xr-x 4 limr limr 128 Apr  4 18:48  bitmask
drwxr-xr-x 4 limr limr 128 Apr  4 20:20  heatmap
-rw-r--r-- 1 limr limr 320 Apr  4 20:25  metadata.yml
drwxr-xr-x 4 limr limr 128 Apr  4 20:01  quppath
-rw-r--r-- 1 limr limr 71K Apr  4 20:25 'slide_annotation_dataset_TCGA collection_ov_regional.parquet'
drwxr-xr-x 4 limr limr 128 Apr  4 18:48  stardist_cell
drwxr-xr-x 4 limr limr 128 Apr  4 18:48  stardist_polygon
```
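To work with the per-slide GeoJSON files directly, a small loader like the sketch below can collect them by slide ID, taken from the file name before `.annotation.geojson`. It assumes each file contains either a GeoJSON FeatureCollection or a bare list of Features; open one file to confirm which before relying on it.

```python
import json
from pathlib import Path

# Sketch: load every *.annotation.geojson file in the ETL output directory,
# keyed by slide ID. Handles a FeatureCollection or a bare list of Features
# (the actual file layout is an assumption; check one file first).
def load_slide_geojson(output_dir):
    suffix = ".annotation.geojson"
    slides = {}
    for path in sorted(Path(output_dir).glob(f"*{suffix}")):
        slide_id = path.name[: -len(suffix)]
        data = json.loads(path.read_text())
        slides[slide_id] = data["features"] if isinstance(data, dict) else data
    return slides

slides = load_slide_geojson("../dsa_annotations")
for slide_id, features in slides.items():
    print(slide_id, len(features))  # number of features per slide
```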


Annotations are saved in Parquet format, where each row represents one annotation element.

We also collect metadata about each annotation, such as its creation timestamp and creator.
Note that different annotation types (point, regional) can be ingested using the same CLI.
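The one-row-per-element layout can be sketched as flattening each annotation document's `elements` list into row dicts. The column names below are borrowed from the table's column list (`annotation_name`, `elementidx`, `type`, `label`, `x_coords`, `y_coords`); how Luna actually populates them is an assumption, and the example document is made up.

```python
# Sketch: flatten one DSA annotation document into one row dict per element.
# Column names are borrowed from the parquet table; the mapping is hypothetical.
def flatten_document(doc, slide_id):
    rows = []
    for idx, element in enumerate(doc["elements"]):
        # polylines carry a list of points; point elements carry a single center
        points = element.get("points") or [element["center"]]
        rows.append({
            "slide_id": slide_id,
            "annotation_name": doc["name"],
            "elementidx": idx,
            "type": element["type"],
            "label": element.get("label", {}).get("value"),
            "x_coords": [p[0] for p in points],
            "y_coords": [p[1] for p in points],
        })
    return rows

# Made-up document: one closed polygon and one point.
example_doc = {
    "name": "ov_regional",
    "elements": [
        {"type": "polyline", "closed": True, "label": {"value": "regional_tumor"},
         "points": [[0, 0, 0], [100, 0, 0], [100, 50, 0]]},
        {"type": "point", "center": [40, 20, 0]},
    ],
}
for row in flatten_document(example_doc, "slide-1"):
    print(row["elementidx"], row["type"], row["x_coords"])
```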

```python
# check annotation metadata table
import pyarrow.parquet as pq

annotation_table = pq.read_table(r'../dsa_annotations/slide_annotation_dataset_TCGA collection_ov_regional.parquet').to_pandas()
print(annotation_table.columns)
annotation_table
```

```
Index(['_id', 'baseParentId', 'baseParentType', 'created', 'creatorId',
       'description', 'folderId', 'largeImage', 'lowerName', 'name', 'size',
       'updated', 'annotation_girder_id', '_modelType', '_version',
       'createdannotation', 'creatorIdannotation', 'public',
       'updatedannotation', 'updatedId', 'groups', 'element_count',
       'element_details', 'annidx', 'elementidx', 'element_girder_id', 'type',
       'group_name', 'label', 'color', 'xmin', 'xmax', 'ymin', 'ymax',
       'bbox_area', 'x_coords', 'y_coords', 'slide_geojson', 'collection_name',
       'annotation_name'],
      dtype='object')
```


| slide_id | _id | baseParentId | baseParentType | created | creatorId | description | folderId | largeImage | lowerName | name | ... | xmin | xmax | ymin | ymax | bbox_area | x_coords | y_coords | slide_geojson | collection_name | annotation_name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 01OV002-bd8cdc70-3d46-40ae-99c4-90ef77 | 642b13a033dd668f85bbc1f0 | 642b13a033dd668f85bbc1ee | collection | 2023-04-03T17:57:52.704000+00:00 | 642b139d3e6bab4e4d6a30a1 |  | 642b13a033dd668f85bbc1ef | {'fileId': '642b13a833dd668f85bbc1f2', 'source... | 01ov002-bd8cdc70-3d46-40ae-99c4-90ef77.svs | 01OV002-bd8cdc70-3d46-40ae-99c4-90ef77.svs | ... | 25250.0 | 28661.0 | 40529.0 | 44372.0 | 13108473.0 | [28211, 28328, 28587, 28630, 28655, 28661, 286... | [42225, 42546, 43126, 43261, 43379, 43607, 437... |  | TCGA collection | ov_regional |
| 01OV002-bd8cdc70-3d46-40ae-99c4-90ef77 | 642b13a033dd668f85bbc1f0 | 642b13a033dd668f85bbc1ee | collection | 2023-04-03T17:57:52.704000+00:00 | 642b139d3e6bab4e4d6a30a1 |  | 642b13a033dd668f85bbc1ef | {'fileId': '642b13a833dd668f85bbc1f2', 'source... | 01ov002-bd8cdc70-3d46-40ae-99c4-90ef77.svs | 01OV002-bd8cdc70-3d46-40ae-99c4-90ef77.svs | ... | 31532.0 | 33932.0 | 35713.0 | 39793.0 | 9792000.0 | [32252, 32252, 32220, 32164, 32140, 32108, 320... | [37097, 37001, 36897, 36745, 36713, 36689, 366... |  | TCGA collection | ov_regional |
| 01OV002-bd8cdc70-3d46-40ae-99c4-90ef77 | 642b13a033dd668f85bbc1f0 | 642b13a033dd668f85bbc1ee | collection | 2023-04-03T17:57:52.704000+00:00 | 642b139d3e6bab4e4d6a30a1 |  | 642b13a033dd668f85bbc1ef | {'fileId': '642b13a833dd668f85bbc1f2', 'source... | 01ov002-bd8cdc70-3d46-40ae-99c4-90ef77.svs | 01OV002-bd8cdc70-3d46-40ae-99c4-90ef77.svs | ... | 23557.0 | 25542.0 | 18922.0 | 21735.0 | 5583805.0 | [24180, 24136, 24063, 23980, 23874, 23813, 237... | [18972, 18978, 19017, 19078, 19161, 19217, 192... |  | TCGA collection | ov_regional |
| 01OV002-bd8cdc70-3d46-40ae-99c4-90ef77 | 642b13a033dd668f85bbc1f0 | 642b13a033dd668f85bbc1ee | collection | 2023-04-03T17:57:52.704000+00:00 | 642b139d3e6bab4e4d6a30a1 |  | 642b13a033dd668f85bbc1ef | {'fileId': '642b13a833dd668f85bbc1f2', 'source... | 01ov002-bd8cdc70-3d46-40ae-99c4-90ef77.svs | 01OV002-bd8cdc70-3d46-40ae-99c4-90ef77.svs | ... | 26951.0 | 30651.0 | 23202.0 | 26990.0 | 14015600.0 | [30133, 30073, 30005, 29951, 29904, 29857, 298... | [26411, 26465, 26506, 26532, 26553, 26566, 265... |  | TCGA collection | ov_regional |
| 01OV002-bd8cdc70-3d46-40ae-99c4-90ef77 | 642b13a033dd668f85bbc1f0 | 642b13a033dd668f85bbc1ee | collection | 2023-04-03T17:57:52.704000+00:00 | 642b139d3e6bab4e4d6a30a1 |  | 642b13a033dd668f85bbc1ef | {'fileId': '642b13a833dd668f85bbc1f2', 'source... | 01ov002-bd8cdc70-3d46-40ae-99c4-90ef77.svs | 01OV002-bd8cdc70-3d46-40ae-99c4-90ef77.svs | ... | 13533.0 | 17365.0 | 26776.0 | 29672.0 | 11097472.0 | [16525, 16597, 16701, 16781, 16845, 16901, 169... | [27088, 26992, 26896, 26840, 26816, 26808, 267... |  | TCGA collection | ov_regional |
| 01OV002-bd8cdc70-3d46-40ae-99c4-90ef77 | 642b13a033dd668f85bbc1f0 | 642b13a033dd668f85bbc1ee | collection | 2023-04-03T17:57:52.704000+00:00 | 642b139d3e6bab4e4d6a30a1 |  | 642b13a033dd668f85bbc1ef | {'fileId': '642b13a833dd668f85bbc1f2', 'source... | 01ov002-bd8cdc70-3d46-40ae-99c4-90ef77.svs | 01OV002-bd8cdc70-3d46-40ae-99c4-90ef77.svs | ... | 21459.0 | 23500.0 | 16457.0 | 19848.0 | 6921031.0 | [23389, 23435, 23463, 23481, 23500, 23500, 234... | [18929, 19021, 19103, 19186, 19250, 19425, 194... |  | TCGA collection | ov_regional |
| 01OV002-bd8cdc70-3d46-40ae-99c4-90ef77 | 642b13a033dd668f85bbc1f0 | 642b13a033dd668f85bbc1ee | collection | 2023-04-03T17:57:52.704000+00:00 | 642b139d3e6bab4e4d6a30a1 |  | 642b13a033dd668f85bbc1ef | {'fileId': '642b13a833dd668f85bbc1f2', 'source... | 01ov002-bd8cdc70-3d46-40ae-99c4-90ef77.svs | 01OV002-bd8cdc70-3d46-40ae-99c4-90ef77.svs | ... |  |  |  |  |  |  |  | ../dsa_annotations/01OV002-bd8cdc70-3d46-40ae-... | TCGA collection | ov_regional |
| 01OV002-ed65cf94-8bc6-492b-9149-adc16f | 642b13a933dd668f85bbc1fc | 642b13a033dd668f85bbc1ee | collection | 2023-04-03T17:58:01.930000+00:00 | 642b139d3e6bab4e4d6a30a1 |  | 642b13a033dd668f85bbc1ef | {'fileId': '642b13b133dd668f85bbc1fe', 'source... | 01ov002-ed65cf94-8bc6-492b-9149-adc16f.svs | 01OV002-ed65cf94-8bc6-492b-9149-adc16f.svs | ... | 20217.0 | 25783.0 | 4477.0 | 8874.0 | 24473702.0 | [24143, 23829, 23663, 23611, 23532, 23436, 233... | [4477, 4477, 4503, 4521, 4556, 4591, 4608, 464... |  | TCGA collection | ov_regional |
| 01OV002-ed65cf94-8bc6-492b-9149-adc16f | 642b13a933dd668f85bbc1fc | 642b13a033dd668f85bbc1ee | collection | 2023-04-03T17:58:01.930000+00:00 | 642b139d3e6bab4e4d6a30a1 |  | 642b13a033dd668f85bbc1ef | {'fileId': '642b13b133dd668f85bbc1fe', 'source... | 01ov002-ed65cf94-8bc6-492b-9149-adc16f.svs | 01OV002-ed65cf94-8bc6-492b-9149-adc16f.svs | ... | 14683.0 | 19778.0 | 4925.0 | 8728.0 | 19376285.0 | [19647, 19691, 19735, 19778, 19778, 19770, 197... | [5125, 5370, 5605, 5876, 6076, 6190, 6268, 634... |  | TCGA collection | ov_regional |
| 01OV002-ed65cf94-8bc6-492b-9149-adc16f | 642b13a933dd668f85bbc1fc | 642b13a033dd668f85bbc1ee | collection | 2023-04-03T17:58:01.930000+00:00 | 642b139d3e6bab4e4d6a30a1 |  | 642b13a033dd668f85bbc1ef | {'fileId': '642b13b133dd668f85bbc1fe', 'source... | 01ov002-ed65cf94-8bc6-492b-9149-adc16f.svs | 01OV002-ed65cf94-8bc6-492b-9149-adc16f.svs | ... | 10240.0 | 12487.0 | 28794.0 | 34122.0 | 11972016.0 | [10388, 10374, 10334, 10294, 10253, 10240, 102... | [30395, 30328, 30207, 30059, 29938, 29830, 296... |  | TCGA collection | ov_regional |
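The bounding-box columns appear to be derived from each element's coordinates. For the first row of the table, `(xmax - xmin) * (ymax - ymin)` = (28661 - 25250) * (44372 - 40529) = 3411 * 3843 = 13108473, which matches `bbox_area`. The sketch below reproduces that formula from coordinate lists. That Luna computes it exactly this way is an inference from the table, not something the CLI documents.

```python
# Sketch: recompute the bounding-box columns from an element's coordinates.
# The formula matches bbox_area in the table above; Luna's exact code is not shown here.
def bbox(x_coords, y_coords):
    xmin, xmax = min(x_coords), max(x_coords)
    ymin, ymax = min(y_coords), max(y_coords)
    return {"xmin": xmin, "xmax": xmax, "ymin": ymin, "ymax": ymax,
            "bbox_area": (xmax - xmin) * (ymax - ymin)}

# Coordinates chosen so the extremes equal the first table row's xmin/xmax/ymin/ymax.
print(bbox([25250, 28661, 27000], [40529, 44372, 42000])["bbox_area"])  # 13108473
```

Rows with empty coordinates, such as the row that carries only a `slide_geojson` path, have no bounding box, so filter them out (for example with `annotation_table.dropna(subset=['x_coords'])`) before applying a check like this.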
