You can download and run this notebook locally, or you can run it for free in a cloud environment using Colab or SageMaker Studio Lab:

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/kirbyju/TCIA_Notebooks/blob/main/TCIA_Segmentations.ipynb)

[![Open In SageMaker Studio Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github.com/kirbyju/TCIA_Notebooks/blob/main/TCIA_Segmentations.ipynb)

# Summary

Access to large, high-quality datasets is essential for researchers to understand disease and precision medicine pathways, especially in cancer. However, HIPAA constraints make sharing medical images outside an individual institution complex. [The Cancer Imaging Archive (TCIA)](https://www.cancerimagingarchive.net/) is a public service funded by the National Cancer Institute that addresses this challenge by providing hosting and de-identification services that take major burdens of data sharing off researchers.

**This notebook is focused on steps to identify an example segmentation file, find the corresponding reference series and visualize them together.**  If you're interested in additional TCIA notebooks and coding examples, check out the tutorials at https://github.com/kirbyju/TCIA_Notebooks.

# 1 Learn about Available Collections on the TCIA Website

[Browsing Collections](https://www.cancerimagingarchive.net/collections) and viewing [Analysis Results](https://www.cancerimagingarchive.net/tcia-analysis-results/) of TCIA datasets are the easiest ways to become familiar with what is available. These pages will help you quickly identify datasets of interest, find valuable supporting data that are not available via our APIs (e.g. clinical spreadsheets and non-DICOM segmentation data), and answer the most common questions you might have about the datasets.  

# 2 Setup

The following installs and imports **[tcia_utils](https://pypi.org/project/tcia-utils/)**, which contains a variety of useful functions for accessing TCIA via Python and Jupyter Notebooks.  It also ensures that the necessary imports are performed and logging settings are adjusted for Google Colab.

In [None]:
import sys

# install tcia utils
!{sys.executable} -m pip install --upgrade -q tcia_utils

In [None]:
import sys  # repeated here so this cell also runs on its own

import requests
import pandas as pd
from tcia_utils import nbia

# set logging level to INFO in Google Colab (not necessary in Jupyter)
if 'google.colab' in sys.modules:
  import logging

  for handler in logging.root.handlers[:]:
      logging.root.removeHandler(handler)

  # Set handler with level = info
  logging.basicConfig(format='%(asctime)s:%(levelname)s:%(message)s',
                      level=logging.INFO)

  print("Google Colab Logging set to INFO")

# 3 Download and visualize a sample DICOM SEG
Here we'll walk through some steps to identify an example segmentation file, find the corresponding reference series and visualize them together in the notebook.

First, let's pull a list of segmentation series UIDs of interest.  We'll use the [C4KC-KiTS](https://doi.org/10.7937/TCIA.2019.IX49E8NX) collection as an example, which contains CT scans and segmentations from subjects from the training set of the [2019 Kidney and Kidney Tumor Segmentation Challenge (KiTS19)](https://kits19.grand-challenge.org/) in DICOM SEG format.  

We can get an inventory of all scans in the collection using **nbia.getSeries()**.

In [None]:
df = nbia.getSeries(collection = "C4KC-KiTS", format = "df")
# avoid naming the result "sorted", which would shadow Python's built-in sorted()
series_sorted = df.sort_values(["PatientID", "SeriesDescription"])
series_sorted.head(4)

Here we can see that patient KiTS-00000 has three CT series and one SEG series.  How do we know which of the CTs goes with the SEG?  In many cases you can figure this out by looking at the referenced series UID recorded in the segmentation series.  Let's try it by saving the SEG series UID to a variable.

In [None]:
# take the first SEG series in the sorted inventory
segSeries = series_sorted.loc[series_sorted['Modality'] == 'SEG', 'SeriesInstanceUID'].iloc[0]

print(segSeries)

Next, let's determine the Reference Series Instance UID of the CT scan that goes with the segmentation using **nbia.getSegRefSeries()**.

In [None]:
refSeries = nbia.getSegRefSeries(segSeries)

print(refSeries)
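
For readers curious where this link lives in the files themselves, here is a minimal sketch. It is not the tcia_utils implementation. It assumes the standard DICOM attribute names: a SEG object points to its source images through `ReferencedSeriesSequence`, while an RTSTRUCT nests the reference under `ReferencedFrameOfReferenceSequence`, `RTReferencedStudySequence` and `RTReferencedSeriesSequence`. The helper name `referenced_series_uids` is hypothetical, and `SimpleNamespace` objects with made-up UIDs stand in for a dataset you would normally read with `pydicom.dcmread()`.

```python
from types import SimpleNamespace


def referenced_series_uids(ds):
    """Return the SeriesInstanceUIDs that a SEG or RTSTRUCT dataset points to.

    Hypothetical helper: `ds` is any object exposing DICOM keywords as
    attributes, such as a dataset returned by pydicom.dcmread().
    """
    uids = []

    # DICOM SEG: ReferencedSeriesSequence -> SeriesInstanceUID
    for item in getattr(ds, "ReferencedSeriesSequence", []):
        uids.append(item.SeriesInstanceUID)

    # RTSTRUCT: the series reference is nested three sequences deep
    for frame in getattr(ds, "ReferencedFrameOfReferenceSequence", []):
        for study in getattr(frame, "RTReferencedStudySequence", []):
            for series in getattr(study, "RTReferencedSeriesSequence", []):
                uids.append(series.SeriesInstanceUID)

    return uids


# Stand-in objects with made-up UIDs, used here instead of real DICOM files
fake_seg = SimpleNamespace(
    ReferencedSeriesSequence=[SimpleNamespace(SeriesInstanceUID="1.2.3.4")]
)
fake_rtstruct = SimpleNamespace(
    ReferencedFrameOfReferenceSequence=[
        SimpleNamespace(
            RTReferencedStudySequence=[
                SimpleNamespace(
                    RTReferencedSeriesSequence=[
                        SimpleNamespace(SeriesInstanceUID="5.6.7.8")
                    ]
                )
            ]
        )
    ]
)

print(referenced_series_uids(fake_seg))       # ['1.2.3.4']
print(referenced_series_uids(fake_rtstruct))  # ['5.6.7.8']
```

An empty list means the object carries no series reference in either location, which is one reason the lookup only works "in many cases".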

Now let's download these two series.

In [None]:
nbia.downloadSeries([refSeries, segSeries], input_type= "list", format = "df")
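
To confirm that both series arrived, you can count the files in each series folder. The sketch below assumes that `downloadSeries()` writes each series to `tciaDownload/<SeriesInstanceUID>/`. That is an assumption about the default output location, so check it against your own file listing. `count_dicom_files` is a hypothetical helper. To keep the example runnable without a download, it builds a temporary folder containing empty placeholder files.

```python
import tempfile
from pathlib import Path


def count_dicom_files(root, series_uids):
    """Map each series UID to the number of .dcm files in root/<uid>/."""
    counts = {}
    for uid in series_uids:
        folder = Path(root) / uid
        # a missing folder is reported as 0 files instead of raising an error
        counts[uid] = len(list(folder.glob("*.dcm"))) if folder.is_dir() else 0
    return counts


with tempfile.TemporaryDirectory() as tmp:
    # placeholder layout: two series folders with made-up UIDs
    for uid, n_files in [("1.2.3", 3), ("4.5.6", 1)]:
        folder = Path(tmp) / uid
        folder.mkdir()
        for i in range(n_files):
            (folder / f"{i}.dcm").touch()

    counts = count_dicom_files(tmp, ["1.2.3", "4.5.6", "7.8.9"])

print(counts)  # {'1.2.3': 3, '4.5.6': 1, '7.8.9': 0}
```

In this notebook the equivalent call would be `count_dicom_files("tciaDownload", [refSeries, segSeries])`, again assuming that folder layout. A count of 0 suggests the series was not downloaded.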

Finally, we can look at the images and segmentation together.  You can move the slider to flip through the images and toggle the segmentation layer on/off.  

**Tip:** Once the slider is selected, sometimes it's easier to move between images using the left/right arrow keys on your keyboard than to use your mouse.

In [None]:
nbia.viewSeriesAnnotation(seriesUid = refSeries, annotationUid = segSeries)

# 4 Download and visualize a sample DICOM RTSTRUCT
RTSTRUCT is another common format used to save segmentations.  Let's take a look at the [Annotations for The Clinical Proteomic Tumor Analysis Consortium Pancreatic Ductal Adenocarcinoma Collection (CPTAC-PDA-Tumor-Annotations) dataset](https://doi.org/10.7937/BW9V-BX61) as an example.  This [Analysis Result](https://www.cancerimagingarchive.net/tcia-analysis-results/) dataset analyzed images from the [CPTAC-PDA](https://doi.org/10.7937/K9/TCIA.2018.SC20FO18) collection.

This time around, let's use the **modality** parameter in getSeries() to only return the RTSTRUCT series.

In [None]:
df = nbia.getSeries(collection = "CPTAC-PDA", modality = "RTSTRUCT", format = "df")
display(df)

If you look at the Series Description column you'll note that some of these RTSTRUCT series are listed as "seed point" or "no finding".  These would not be particularly useful to visualize, so let's avoid them. You can update the code below to use any other series UID you prefer, but let's start with **1.2.826.0.1.534147.667.2747872357.2023429821032.4** which has a description of **"Pre-dose, PANCREAS - 1"**.

In [None]:
segSeries = "1.2.826.0.1.534147.667.2747872357.2023429821032.4"
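
If you would rather choose a series in code than copy a UID by hand, the exclusion described above can be sketched as follows. This is only an illustration: the records are made-up stand-ins with placeholder UIDs, the helper `is_useful` is hypothetical, and the exact wording of the descriptions in the real data may differ from the two phrases matched here.

```python
# Made-up records using the same column names as the series inventory
series = [
    {"SeriesInstanceUID": "1.1", "SeriesDescription": "Pre-dose, PANCREAS - 1"},
    {"SeriesInstanceUID": "1.2", "SeriesDescription": "Seed point"},
    {"SeriesInstanceUID": "1.3", "SeriesDescription": "No finding"},
    {"SeriesInstanceUID": "1.4", "SeriesDescription": None},
]

# Descriptions containing these phrases are not worth visualizing
EXCLUDED_PHRASES = ("seed point", "no finding")


def is_useful(record):
    """Keep records with a description that matches none of the excluded phrases."""
    description = (record.get("SeriesDescription") or "").lower()
    return bool(description) and not any(p in description for p in EXCLUDED_PHRASES)


candidates = [r["SeriesInstanceUID"] for r in series if is_useful(r)]
print(candidates)  # ['1.1']
```

On the dataframe itself, a comparable filter would be `df[~df["SeriesDescription"].str.contains("seed point|no finding", case=False, na=True)]`, which also drops rows with a missing description.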

Next, let's determine the Reference Series Instance UID of the CT scan that goes with the segmentation.

In [None]:
refSeries = nbia.getSegRefSeries(segSeries)

print(refSeries)

Now let's download these two series.  

In [None]:
nbia.downloadSeries([refSeries, segSeries], input_type= "list", format = "df")

Finally, we can look at the images and segmentation together.  You can move the slider to flip through the images and toggle the segmentation layer on/off.

In [None]:
nbia.viewSeriesAnnotation(seriesUid = refSeries, annotationUid = segSeries)