<a href="https://colab.research.google.com/github/kirbyju/TCIA_Notebooks/blob/main/ACNS0332.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Accessing DICOM images and annotations from the ACNS0332 dataset hosted on TCIA

This notebook demonstrates how to access the **"Annotations for Chemotherapy and Radiation Therapy in Treating Young Patients With Newly Diagnosed, Previously Untreated, High-Risk Medulloblastoma/PNET (ACNS0332)"** dataset hosted on [The Cancer Imaging Archive (TCIA)](https://cancerimagingarchive.net).


# 1 Learn about and request access to the datasets

The images, annotations (tumor segmentations and seed point labels), and clinical data associated with this trial are described in detail at the following links. These pages are publicly visible without logging in, so you can use them to understand the dataset before requesting access:

1.  [Image Collection Summary](https://doi.org/10.7937/TCIA.582B-XZ89)
2.  [Annotation Summary](https://doi.org/10.7937/D8A8-6252)
3.  [Clinical datasets](https://nctn-data-archive.nci.nih.gov/node/838)

**Note:** Before requesting access, you can use the Clinical datasets link above to view data dictionaries outlining the specific clinical variables that were collected.

### Requesting Access to the data
In order to download the actual data you must request access through the NCTN Data Archive via the following steps:
 
 1. [Register an account on the NCTN Data Archive](https://nctn-data-archive.nci.nih.gov/).  
 2. After logging in, use the "Request Data" link in the left side menu.  
 3. Follow the on-screen instructions, and enter ***NCT00392327*** when asked which trial you want to request.  
 4. In step 2 of the Create Request form, be sure to select “Imaging Data Requested”. 
 
Once you are approved for access you'll be able to download the clinical data from the NCTN Archive.  You will then be asked to create an account on TCIA with the same email address to access the imaging data.  Please contact NCINCTNDataArchive@mail.nih.gov for any questions about access requests.  

# 2 Import TCIA Utils

This imports a variety of useful functions for accessing TCIA via Jupyter/Python.

In [20]:
import requests
import pandas as pd

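# download the tcia_utils helper module from GitHub and save it next to this notebook
tcia_utils_text = requests.get("https://github.com/kirbyju/TCIA_Notebooks/raw/main/tcia_utils.py")
tcia_utils_text.raise_for_status()  # stop here if the download failed
with open('tcia_utils.py', 'wb') as f:
    f.write(tcia_utils_text.content)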

In [21]:
import tcia_utils as tcia

# 3 Downloading images and annotations with NBIA Data Retriever

TCIA uses software called NBIA to manage its DICOM data. One way to download TCIA data is to install the [Linux command-line version of the NBIA Data Retriever](https://wiki.cancerimagingarchive.net/x/2QKPBQ) using the following steps. This tool automatically retries failed downloads, saves data in an organized hierarchy on your hard drive (Collection > Patient > Study > Series > Images), and provides a CSV file containing key DICOM metadata about the images you've downloaded.

**Note:** It's also possible to download these data via our REST API if you can't or don't want to install Data Retriever. This is covered later in the notebook.

## 3.1 Install the NBIA Data Retriever CLI package

In [22]:
# install NBIA Data Retriever CLI software for downloading images later in this notebook

!mkdir -p /usr/share/desktop-directories/
!wget -P /content/NBIA-Data-Retriever https://cbiit-download.nci.nih.gov/nbia/releases/ForTCIA/NBIADataRetriever_4.4/nbia-data-retriever-4.4.deb
!dpkg -i /content/NBIA-Data-Retriever/nbia-data-retriever-4.4.deb

# NOTE: If you're working on a Linux OS that uses RPM packages you can change the wget line above to point to
#       https://cbiit-download.nci.nih.gov/nbia/releases/ForTCIA/NBIADataRetriever_4.4/NBIADataRetriever-4.4-1.x86_64.rpm

--2022-11-20 13:59:51--  https://cbiit-download.nci.nih.gov/nbia/releases/ForTCIA/NBIADataRetriever_4.4/nbia-data-retriever-4.4.deb
Resolving cbiit-download.nci.nih.gov (cbiit-download.nci.nih.gov)... 129.43.254.25, 2607:f220:41d:21c1::812b:fe19
Connecting to cbiit-download.nci.nih.gov (cbiit-download.nci.nih.gov)|129.43.254.25|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 68709292 (66M) [application/x-debian-package]
Saving to: ‘/content/NBIA-Data-Retriever/nbia-data-retriever-4.4.deb’


2022-11-20 14:00:28 (1.79 MB/s) - ‘/content/NBIA-Data-Retriever/nbia-data-retriever-4.4.deb’ saved [68709292/68709292]

Selecting previously unselected package nbia-data-retriever.
(Reading database ... 123991 files and directories currently installed.)
Preparing to unpack .../nbia-data-retriever-4.4.deb ...
Unpacking nbia-data-retriever (4.4) ...
Setting up nbia-data-retriever (4.4) ...
Adding shortcut to the menu


## 3.2 Set up your credential file
Since this Collection requires logging in, you must set up a **credentials.txt** file that contains your user name and password. We will use the **tcia.makeCredentialFile()** function to do this.

In [23]:
tcia.makeCredentialFile()

Enter User: 
kirbyju
Enter Password: ··········
Credential file for NBIA Data Retriever saved: credentials.txt


## 3.3 Download data
The Data Retriever software works by ingesting a "manifest" file that contains the DICOM Series Instance UIDs of the scans you'd like to download. The manifest files can be downloaded from [this page](https://doi.org/10.7937/D8A8-6252), but you can also obtain these manifests with the commands below.

* Annotations -- Segmentations, Seed Points, and Negative Findings Assessments 
* Source Images used to create Segmentations & Seed Points
* Source Images used to create Negative Assessment reports

In [None]:
# Annotations -- Segmentations, Seed Points, and Negative Findings Assessments
manifest = requests.get("https://wiki.cancerimagingarchive.net/download/attachments/119703167/ACNS0332-Tumor-Annotations-manifest_10-04-2022.tcia?api=v2")
with open('ACNS0332-Tumor-Annotations-manifest_10-04-2022.tcia', 'wb') as f:
    f.write(manifest.content)

In [None]:
# Source Images used to create Segmentations & Seed Points
manifest = requests.get("https://wiki.cancerimagingarchive.net/download/attachments/119703167/ACNS0332-OriginalMRs-SEGSandSeedpoints-manifest_10-04-2022.tcia?api=v2")
with open('ACNS0332-OriginalMRs-SEGSandSeedpoints-manifest_10-04-2022.tcia', 'wb') as f:
    f.write(manifest.content)


In [None]:
# Source Images used to create Negative Assessment reports
# (no segmentation or seed points created for the scan)
manifest = requests.get("https://wiki.cancerimagingarchive.net/download/attachments/119703167/ACNS0332-OriginalMRs-NegativeAssessments-manifest_10-04-2022.tcia?api=v2")
with open('ACNS0332-OriginalMRs-NegativeAssessments-manifest_10-04-2022.tcia', 'wb') as f:
    f.write(manifest.content)
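
The three cells above differ only in the manifest URL. As a minimal sketch, the repeated pattern could be driven from a list of URLs. `manifest_filename` is a hypothetical helper (it is not part of tcia_utils) that derives the local file name from the URL by dropping the `?api=v2` query string. The download step is left commented out because it needs network access.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse

def manifest_filename(url):
    """Return the file name at the end of a URL, ignoring any query string."""
    return unquote(PurePosixPath(urlparse(url).path).name)

# The three manifest URLs used in the cells above
MANIFEST_URLS = [
    "https://wiki.cancerimagingarchive.net/download/attachments/119703167/ACNS0332-Tumor-Annotations-manifest_10-04-2022.tcia?api=v2",
    "https://wiki.cancerimagingarchive.net/download/attachments/119703167/ACNS0332-OriginalMRs-SEGSandSeedpoints-manifest_10-04-2022.tcia?api=v2",
    "https://wiki.cancerimagingarchive.net/download/attachments/119703167/ACNS0332-OriginalMRs-NegativeAssessments-manifest_10-04-2022.tcia?api=v2",
]

for url in MANIFEST_URLS:
    print(manifest_filename(url))

# To actually download each manifest (requires network access and requests):
# import requests
# for url in MANIFEST_URLS:
#     response = requests.get(url)
#     response.raise_for_status()
#     with open(manifest_filename(url), "wb") as f:
#         f.write(response.content)
```

Deriving the file name from the URL keeps the saved name and the download link in sync if the manifest is ever re-released under a new date.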

A manifest containing sample images and annotations for a single subject has also been created for use with this notebook to facilitate quick testing and demonstrations.

In [24]:
# Single subject manifest containing examples of each annotation type
# Use this one for a quick demo
manifest = requests.get("https://github.com/kirbyju/TCIA_Notebooks/raw/main/ACNS0332/acns0332-demo-PARJIR.tcia")
with open('acns0332-demo-PARJIR.tcia', 'wb') as f:
    f.write(manifest.content)

Now we can open the sample manifest file with the NBIA Data Retriever to download the actual data. You can repeat this step for each manifest you'd like to download by changing the path.

**<font color='red'>After running the following command, click in the output cell, type "y," and press Enter to agree with the TCIA Data Usage Policy and start the download.</font>**

In [25]:
!/opt/nbia-data-retriever/nbia-data-retriever --cli '/content/acns0332-demo-PARJIR.tcia' -d /content/ -l /content/credentials.txt

The download log can be found at /content/NBIADataRetrieverCLI-20220120020105.log
2022-11-20 14:01:05: INFO: Using manifiest file: /content/acns0332-demo-PARJIR.tcia

2022-11-20 14:01:05: INFO: Running with option: quiet = false; verbose = false; force = false

2022-11-20 14:01:05: INFO: The type of data downloading is DICOM

Data Usage Policy

Any user accessing TCIA data must agree to:
- Not use the requested datasets, either alone or in concert with any other information, to identify or contact individual participants from whom data and/or samples were collected and follow all other conditions specified in the TCIA Site Disclaimer. Approved Users also agree not to generate and use information (e.g., facial images or comparable representations) in a manner that could allow the identities of research participants to be readily ascertained. These provisions do not apply to research investigators operating with specific IRB approval, pursuant to 45 CFR 46, to contact individuals within 

# 4 Accessing the REST APIs 
The NBIA REST APIs allow TCIA users to query metadata and download image data.  Since this dataset is "limited access" we'll need to use the "NBIA Search with Authentication REST API" described at https://wiki.cancerimagingarchive.net/x/X4ATBg which enables you to use your login credentials to create API tokens to access this Collection.  However, we'll rely heavily on [**tcia_utils**](https://github.com/kirbyju/TCIA_Notebooks/blob/main/tcia_utils.py) to simplify things.  

## 4.1 Create an API token

We'll use **tcia.getToken()** to generate an access token to query restricted Collections on TCIA.  **<font color='red'>Tokens are valid for 2 hours and must be refreshed after that point.</font>** See https://wiki.cancerimagingarchive.net/x/X4ATBg for more details. 

In [26]:
access_token = tcia.getToken()

Enter User: 
kirbyju
Enter Password: ··········
Success - Token saved to api_call_headers variable:  {'Authorization': 'Bearer f1b5c90d-3e6d-4392-89a2-5163481a377f'}
Token expires at 2022-11-20 16:06:27.989169
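
Because tokens expire after 2 hours, a long-running session may want to check the token's age before each batch of queries. The sketch below is only an illustration: `token_needs_refresh` is a hypothetical helper, not a tcia_utils function, and it assumes you record the time at which you called **tcia.getToken()** yourself. The 5-minute safety margin is an arbitrary choice.

```python
from datetime import datetime, timedelta

TOKEN_LIFETIME = timedelta(hours=2)  # lifetime stated in this notebook

def token_needs_refresh(issued_at, now=None, margin=timedelta(minutes=5)):
    """Return True once the token is within `margin` of its expiry time."""
    now = now or datetime.now()
    return now >= issued_at + TOKEN_LIFETIME - margin

# Example with fixed timestamps so the result does not depend on the clock
issued = datetime(2022, 11, 20, 14, 6, 27)
print(token_needs_refresh(issued, now=datetime(2022, 11, 20, 15, 0, 0)))  # False
print(token_needs_refresh(issued, now=datetime(2022, 11, 20, 16, 5, 0)))  # True
```

When the check returns True, calling **tcia.getToken()** again would be the way to obtain a fresh token.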


## 4.2 Explore the data with REST API Queries

Let's start by looking at which body parts and modalities the collection contains. In this dataset the 3D segmentations are stored with the SEG modality, while RTSTRUCT series record seed points and mark scans where no tumor was found.

In [27]:
# count patients for each modality
data = tcia.getModalityCounts(collection = "ACNS0332")

# format results in dataframe
df = pd.DataFrame(data)
df.rename(columns = {'criteria':'Modality', 'count':'PatientCount'}, inplace = True)
df.PatientCount = df.PatientCount.astype(int)
display(df.sort_values(by='PatientCount', ascending=False, ignore_index = True))

Calling...  https://services.cancerimagingarchive.net/nbia-api/services/getModalityValuesAndCounts with parameters {'Collection': 'ACNS0332'}


Unnamed: 0,Modality,PatientCount
0,MR,85
1,RTSTRUCT,85
2,SEG,85
3,CT,5


In [28]:
# Count patients for each body part examined
data = tcia.getBodyPartCounts(collection = "ACNS0332")

# format results in dataframe
df = pd.DataFrame(data)
df.rename(columns = {'criteria':'BodyPartExamined', 'count':'PatientCount'}, inplace = True)
df.PatientCount = df.PatientCount.astype(int)
display(df.sort_values(by='PatientCount', ascending=False, ignore_index = True))

Calling...  https://services.cancerimagingarchive.net/nbia-api/services/getBodyPartValuesAndCounts with parameters {'Collection': 'ACNS0332'}


Unnamed: 0,BodyPartExamined,PatientCount
0,NOT SPECIFIED,85
1,HEAD,30
2,BRAIN,17
3,SPINE,12
4,CSPINE,5
5,TSPINE,4
6,WHOLESPINE,3
7,ORBIT,2


Now let's run some queries to see what we can learn about the patient cohort from the DICOM metadata.  This information can include things like age, gender, and ethnicity.  However, in the case of ACNS0332, most of this information is also available in the clinical data at https://nctn-data-archive.nci.nih.gov/node/838.

In [29]:
# obtain patient details (e.g. species, gender, ethnicity) for the collection 
data = tcia.getPatient(collection = "ACNS0332", api_url = "restricted")

# format as dataframe
df = pd.DataFrame(data)
display(df)

# optional - save to CSV file
# df.to_csv('patient_metadata.csv')

Calling...  https://services.cancerimagingarchive.net/nbia-api/services/v2/getPatient with parameters {'Collection': 'ACNS0332'}


Unnamed: 0,PatientId,PatientName,PatientSex,Collection,Phantom,SpeciesCode,SpeciesDescription,EthnicGroup
0,ACNS0332_PASFHY,ACNS0332_PASFHY,F,ACNS0332,NO,337915000,Homo sapiens,
1,ACNS0332_PARMZW,ACNS0332_PARMZW,F,ACNS0332,NO,337915000,Homo sapiens,
2,ACNS0332_PASEUA,ACNS0332_PASEUA,M,ACNS0332,NO,337915000,Homo sapiens,
3,ACNS0332_PASFUN,ACNS0332_PASFUN,F,ACNS0332,NO,337915000,Homo sapiens,
4,ACNS0332_PARJKJ,ACNS0332_PARJKJ,F,ACNS0332,NO,337915000,Homo sapiens,
...,...,...,...,...,...,...,...,...
80,ACNS0332_PAUEPE,ACNS0332_PAUEPE,M,ACNS0332,NO,337915000,Homo sapiens,Non-Hispanic
81,ACNS0332_PATETZ,ACNS0332_PATETZ,F,ACNS0332,NO,337915000,Homo sapiens,
82,ACNS0332_PATUKV,ACNS0332_PATUKV,M,ACNS0332,NO,337915000,Homo sapiens,W
83,ACNS0332_PAUFRD,ACNS0332_PAUFRD,M,ACNS0332,NO,337915000,Homo sapiens,


In [30]:
# obtain study/visit details (e.g. anonymized study date, age at the time of visit)
data = tcia.getStudy(collection = "ACNS0332", api_url = "restricted")

# format as dataframe
df = pd.DataFrame(data)
display(df)

# optional - save to CSV file
# df.to_csv('study_metadata.csv')

Calling...  https://services.cancerimagingarchive.net/nbia-api/services/v2/getPatientStudy with parameters {'Collection': 'ACNS0332'}


Unnamed: 0,StudyInstanceUID,StudyDate,StudyDescription,PatientID,PatientName,PatientSex,Collection,SeriesCount,PatientAge,AdmittingDiagnosesDescription,EthnicGroup
0,1.3.6.1.4.1.14519.5.2.1.1610.1210.962621379139...,1960-03-23 00:00:00.0,IRM cérébrale + colonne complète C+,ACNS0332_PASFHY,ACNS0332_PASFHY,F,ACNS0332,24,,,
1,1.3.6.1.4.1.14519.5.2.1.1610.1210.579827085764...,1959-12-04 00:00:00.0,MRI BRAIN-C SPINE,ACNS0332_PARMZW,ACNS0332_PARMZW,F,ACNS0332,27,,,
2,1.3.6.1.4.1.14519.5.2.1.1610.1210.631553018438...,1959-12-13 00:00:00.0,MRI THORACO-LUMBAR SPINE [SP],ACNS0332_PASEUA,ACNS0332_PASEUA,M,ACNS0332,15,005Y,,
3,1.3.6.1.4.1.14519.5.2.1.1610.1210.479997672838...,1960-10-10 00:00:00.0,MRI BRAIN W&W/O,ACNS0332_PASFUN,ACNS0332_PASFUN,F,ACNS0332,21,006Y,,
4,1.3.6.1.4.1.14519.5.2.1.1610.1210.238485496393...,1960-03-21 00:00:00.0,MRI BRAIN W & WO CONTRAST,ACNS0332_PARJKJ,ACNS0332_PARJKJ,F,ACNS0332,22,,,
...,...,...,...,...,...,...,...,...,...,...,...
683,1.3.6.1.4.1.14519.5.2.1.1610.1210.234859294674...,1963-04-22 00:00:00.0,MRI THORACIC SPINE W/WO,ACNS0332_PATJAX,ACNS0332_PATJAX,F,ACNS0332,3,008Y,,W
684,1.3.6.1.4.1.14519.5.2.1.1610.1210.263334701210...,1963-04-22 00:00:00.0,MRI CERVICAL SPINE W/WO,ACNS0332_PATJAX,ACNS0332_PATJAX,F,ACNS0332,5,008Y,,W
685,1.3.6.1.4.1.14519.5.2.1.1610.1210.201151845894...,1960-03-26 00:00:00.0,MRI LUMBAR SPINE W/WO,ACNS0332_PATJAX,ACNS0332_PATJAX,F,ACNS0332,6,005Y,,W
686,1.3.6.1.4.1.14519.5.2.1.1610.1210.161183035460...,1963-04-22 00:00:00.0,MRI LUMBAR SPINE W/WO,ACNS0332_PATJAX,ACNS0332_PATJAX,F,ACNS0332,3,008Y,,W
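
The PatientAge values above (e.g. 005Y) use the DICOM Age String format: three digits followed by D, W, M, or Y for days, weeks, months, or years. If you want numeric ages for analysis, a conversion along the following lines may help. `age_in_years` is a hypothetical helper written for this illustration, and the day and week conversions use an approximate 365.25-day year.

```python
import re

# How many of each unit make up one year (approximate for days and weeks)
_UNITS_PER_YEAR = {"D": 365.25, "W": 365.25 / 7, "M": 12.0, "Y": 1.0}

def age_in_years(age_string):
    """Convert a DICOM Age String such as '005Y' to years; None if missing or malformed."""
    if not isinstance(age_string, str):
        return None  # covers None and NaN values from a dataframe
    match = re.fullmatch(r"(\d{3})([DWMY])", age_string.strip())
    if match is None:
        return None
    value, unit = match.groups()
    return int(value) / _UNITS_PER_YEAR[unit]

for raw in ["005Y", "008Y", "006M", ""]:
    print(repr(raw), "->", age_in_years(raw))
```

With the study dataframe from the cell above, something like `df["PatientAge"].map(age_in_years)` would apply the conversion to every row, leaving missing ages as empty values.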


We can also create a report that gives useful metadata about each scan in the dataset (e.g. series description, modality, scanner manufacturer & software version, number of images).

In [31]:
# obtain scan/series metadata for a collection as JSON
collection_series = tcia.getSeries(collection = "ACNS0332", api_url = "restricted")

# format as dataframe
df = pd.DataFrame(collection_series)
display(df)

# optional - save to CSV file
# df.to_csv('series_metadata.csv')

Calling...  https://services.cancerimagingarchive.net/nbia-api/services/v2/getSeries with parameters {'Collection': 'ACNS0332'}


Unnamed: 0,SeriesInstanceUID,StudyInstanceUID,Modality,ProtocolName,SeriesDate,SeriesDescription,SeriesNumber,Collection,PatientID,Manufacturer,ManufacturerModelName,SoftwareVersions,ImageCount,TimeStamp,BodyPartExamined
0,1.3.6.1.4.1.14519.5.2.1.1610.1210.201677653774...,1.3.6.1.4.1.14519.5.2.1.1610.1210.255904210784...,MR,CERVICAL CTL/,1960-09-15 00:00:00.0,3 PL LOC,1,ACNS0332,ACNS0332_PARJXF,GE MEDICAL SYSTEMS,GENESIS_SIGNA,09,21,2021-04-02 18:43:55.0,
1,1.3.6.1.4.1.14519.5.2.1.1610.1210.194512030268...,1.3.6.1.4.1.14519.5.2.1.1610.1210.736815988173...,MR,CTL-LSP/,1960-03-24 00:00:00.0,AX T2 (top),5,ACNS0332,ACNS0332_PARJXF,GE MEDICAL SYSTEMS,GENESIS_SIGNA,09,39,2021-04-02 18:46:15.0,
2,1.3.6.1.4.1.14519.5.2.1.1610.1210.969034421107...,1.3.6.1.4.1.14519.5.2.1.1610.1210.164586532274...,MR,THORACIC SPINE/6,1960-03-31 00:00:00.0,Axial T1 Survey+C Upper,14,ACNS0332,ACNS0332_PASDZF,GE MEDICAL SYSTEMS,GENESIS_SIGNA,09,42,2021-04-02 18:46:44.0,
3,1.3.6.1.4.1.14519.5.2.1.1610.1210.474335544217...,1.3.6.1.4.1.14519.5.2.1.1610.1210.138753276907...,MR,,1959-12-07 00:00:00.0,Sag T1 GADO,10,ACNS0332,ACNS0332_PARWGR,GE MEDICAL SYSTEMS,GENESIS_SIGNA,09,15,2021-04-02 17:29:58.0,
4,1.3.6.1.4.1.14519.5.2.1.1610.1210.135652871183...,1.3.6.1.4.1.14519.5.2.1.1610.1210.242950619555...,MR,,1959-12-16 00:00:00.0,AX T1(LOW),12,ACNS0332,ACNS0332_PARJXF,GE MEDICAL SYSTEMS,GENESIS_SIGNA,09,38,2021-04-02 18:48:39.0,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
11955,2.25.746725799395800412155734761275600547204,1.3.6.1.4.1.14519.5.2.1.1610.1210.130842300940...,SEG,,2022-03-05 00:00:00.0,"Post-operative, 2 - FLAIR - L Parietal - V1",603,ACNS0332,ACNS0332_PAVFYR,Philips Medical Systems,Ingenia,4.1.2,1,2022-09-27 12:28:35.0,
11956,2.25.416888458822390997925483369955069571587,1.3.6.1.4.1.14519.5.2.1.1610.1210.995615149443...,SEG,,2022-03-05 00:00:00.0,"Post-radiation, 3 - FLAIR - L Parietal - V1",602,ACNS0332,ACNS0332_PAVFYR,Philips Medical Systems,Ingenia,4.1.2,1,2022-09-27 12:23:04.0,
11957,2.25.701985526089272443691923292317804909336,1.3.6.1.4.1.14519.5.2.1.1610.1210.651528408107...,SEG,,2022-03-06 00:00:00.0,"Post-operative, 2 - FLAIR - R Frontal lobe - V1",5,ACNS0332,ACNS0332_PAVHLY,SIEMENS,TrioTim,syngo MR B17,1,2022-09-27 12:32:02.0,
11958,2.25.717041541199265087103681199531085143462,1.3.6.1.4.1.14519.5.2.1.1610.1210.204192488038...,SEG,,2022-03-11 00:00:00.0,"Recurrence, 5 - T1 post - 3D - R occipital met...",17,ACNS0332,ACNS0332_PARZJN,SIEMENS,Aera,syngo MR D13,1,2022-09-27 12:31:24.0,


Finally, we can use the results from the getSeries() query to generate some summary statistics about the scans in the collection.

In [32]:
# Calculate summary statistics for a given collection 
tcia.makeSeriesReport(collection_series)

Summary Statistics

Subjects:  85 subjects
Subjects:  688 studies
Subjects:  11960 series
Images:  379346 images

Series Counts - Modality:
MR          8606
RTSTRUCT    2257
SEG         1086
CT            11
Name: Modality, dtype: int64 

Series Counts - Body Parts Examined:
NaN           9319
HEAD          1343
BRAIN          535
SPINE          410
TSPINE         132
CSPINE          91
WHOLESPINE      88
ORBIT           42
Name: BodyPartExamined, dtype: int64 

Series Counts - Device Manufacturers:
SIEMENS                          4731
GE MEDICAL SYSTEMS               3399
NaN                              1402
Philips Medical Systems          1294
Unspecified                       670
Philips Healthcare                251
Philips Medical Systems, Inc.     187
Hitachi Medical Corporation        16
Philips                             4
AMICAS Inc.                         3
Toshiba                             2
Picker                              1
Name: Manufacturer, dtype: int64
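
If you'd like similar counts for a field that the report does not cover, the same idea can be sketched in plain Python. This assumes the series metadata is a list of dictionaries keyed by the column names shown earlier; `count_by` is a hypothetical helper and the sample records are made up for illustration.

```python
from collections import Counter

def count_by(records, key):
    """Count records by the value stored under `key`; missing values are counted as None."""
    return Counter(rec.get(key) for rec in records)

# Made-up records shaped like series metadata
sample_series = [
    {"Modality": "MR", "Manufacturer": "SIEMENS", "ImageCount": 32},
    {"Modality": "MR", "Manufacturer": "GE MEDICAL SYSTEMS", "ImageCount": 21},
    {"Modality": "SEG", "Manufacturer": "SIEMENS", "ImageCount": 1},
    {"Modality": "RTSTRUCT", "ImageCount": 1},  # no Manufacturer recorded
]

print(count_by(sample_series, "Modality").most_common())
print(count_by(sample_series, "Manufacturer").most_common())
print("Images:", sum(rec["ImageCount"] for rec in sample_series))
```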


## 4.3 Downloading data with the REST API
Next we'll cover using the API to download data.  This can be useful if you'd like to download results from API queries rather than using an existing manifest file.  It's also useful if you can't install the NBIA Data Retriever or want to integrate TCIA downloads into other pipelines/tools.  

As a reminder, many of the scans in the ACNS0332 Collection were not annotated by the authors of https://doi.org/10.7937/D8A8-6252.  The reasons are outlined in the Annotation Protocol on that page.  As a result, you may wish to download only a subset of the scans such as:

1. Seed point labels (RTSTRUCTs)
2. 3D segmentation labels (SEGs)
3. All source images used to create either seed points or segmentations
4. Source images used to create segmentations
5. Source images with negative finding assessments

The following examples will demonstrate how to tackle each of these use cases. 

### 4.3.1 tcia_utils Download Functions
**tcia_utils** contains two functions for downloading data: **downloadSampleSeries()** and **downloadSeries()**. The only difference between them is that **downloadSampleSeries()** downloads only the first 3 scans in the list, which is useful for demonstration and testing.

Both functions take a set of Series Instance UIDs to download. By default, they expect JSON data containing "SeriesInstanceUID" elements, which can be generated using **getSeries()**. However, if you have a list of Series UIDs from some other source, you can set **input_type = "list"** to pass a regular Python list instead of JSON. Both functions return a dataframe of series metadata describing the data that were downloaded. You can optionally export this metadata to a CSV file by specifying the **csv_filename** parameter.
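
To make the two input shapes concrete, here is a small sketch. It assumes the JSON form is a list of dictionaries that each contain a "SeriesInstanceUID" key; `series_uids` is a hypothetical helper, and the two UIDs are copied from the demo download log later in this notebook.

```python
def series_uids(series_json):
    """Extract SeriesInstanceUID values from getSeries()-style JSON (a list of dicts)."""
    return [row["SeriesInstanceUID"] for row in series_json]

# Two records shaped like getSeries() output (other fields omitted)
sample_json = [
    {"SeriesInstanceUID": "1.3.6.1.4.1.14519.5.2.1.1610.1210.118865946438136982211944976786", "Modality": "MR"},
    {"SeriesInstanceUID": "1.3.6.1.4.1.14519.5.2.1.1610.1210.310223825916677578694289004355", "Modality": "MR"},
]

uid_list = series_uids(sample_json)
print(uid_list)

# Either form could then be handed to the download functions, for example:
# tcia.downloadSeries(sample_json, api_url = "restricted")
# tcia.downloadSeries(uid_list, input_type = "list", api_url = "restricted")
```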

### 4.3.2 Download a sample subject
Before downloading data for the entire cohort let's take a quick look at the various types of data for a single subject.  We'll use the **acns0332-demo-PARJIR.tcia** manifest file in this example to tell **tcia.downloadSeries()** what to download.

TCIA manifest files contain several lines of download parameters that precede a list of Series Instance UIDs to download.  The steps below will put the Series UIDs into a list and ignore the parameters in the first 6 lines of the manifest so that we have a clean list to feed to **tcia.downloadSeries()**.

In [33]:
# initialize variable
series_data = []

# open file
with open("acns0332-demo-PARJIR.tcia") as f:
    for line in f:
        series_data.append(line.rstrip())

# remove the parameters from the list
del series_data[:6]
#print(series_data)

print("Result contains", len(series_data), "Series Instance UIDs (scans).")


Result contains 15 Series Instance UIDs (scans).
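
Deleting a fixed number of header lines works for this manifest but would silently break if the header ever gained or lost a line. As an alternative sketch, you could keep only the lines that look like DICOM UIDs (groups of digits separated by dots). `uids_from_manifest_lines` is a hypothetical helper, and the header keys in the sample text are placeholders rather than the real manifest parameters.

```python
import re

# A UID here means at least two groups of digits joined by dots
UID_PATTERN = re.compile(r"[0-9]+(\.[0-9]+)+")

def uids_from_manifest_lines(lines):
    """Keep only lines that look like DICOM UIDs, dropping parameters and blanks."""
    stripped = (line.strip() for line in lines)
    return [line for line in stripped if UID_PATTERN.fullmatch(line)]

# Made-up manifest text for illustration only
sample = """someSetting=value
anotherSetting=4
listHeader=
1.3.6.1.4.1.14519.5.2.1.1610.1210.118865946438136982211944976786
2.25.628029196530005029948410628875495202616

"""

print(uids_from_manifest_lines(sample.splitlines()))
```

Because a file object yields lines when iterated, the same function could be applied directly inside `with open("acns0332-demo-PARJIR.tcia") as f:` by passing `f`.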


Now we can download the series in the list and store the returned metadata in a dataframe.

In [34]:
df = tcia.downloadSeries(series_data, input_type = "list", api_url = "restricted")


Downloading 15 Series Instance UIDs (scans).
Downloading... https://services.cancerimagingarchive.net/nbia-api/services/v2/getImage?NewFileNames=Yes&SeriesInstanceUID=1.3.6.1.4.1.14519.5.2.1.1610.1210.118865946438136982211944976786
Downloading... https://services.cancerimagingarchive.net/nbia-api/services/v2/getImage?NewFileNames=Yes&SeriesInstanceUID=1.3.6.1.4.1.14519.5.2.1.1610.1210.310223825916677578694289004355
Downloading... https://services.cancerimagingarchive.net/nbia-api/services/v2/getImage?NewFileNames=Yes&SeriesInstanceUID=1.3.6.1.4.1.14519.5.2.1.1610.1210.133707348426261300194248861386
Downloading... https://services.cancerimagingarchive.net/nbia-api/services/v2/getImage?NewFileNames=Yes&SeriesInstanceUID=1.3.6.1.4.1.14519.5.2.1.1610.1210.585371091614335524524223923423
Downloading... https://services.cancerimagingarchive.net/nbia-api/services/v2/getImage?NewFileNames=Yes&SeriesInstanceUID=1.3.6.1.4.1.14519.5.2.1.1610.1210.324701521225812390399406353568
Downloading... https

Here's what the metadata looks like in the dataframe.

In [35]:
display(df)

Unnamed: 0,Series UID,Collection,3rd Party Analysis,Data Description URI,Subject ID,Study UID,Study Description,Study Date,Series Description,Manufacturer,Modality,SOP Class UID,Number of Images,File Size,Series Number,License Name,License URL,Annotation Size
0,1.3.6.1.4.1.14519.5.2.1.1610.1210.118865946438...,ACNS0332,NO,https://doi.org/10.7937/TCIA.582B-XZ89,ACNS0332_PARJIR,1.3.6.1.4.1.14519.5.2.1.1610.1210.276497763958...,MR BRAIN WWO CONTRAST,12-07-1959,axT2FlairFIL,SIEMENS,MR,1.2.840.10008.5.1.4.1.1.4,32,3908660,6.0,NCTN Data Archive License,https://nctn-data-archive.nci.nih.gov/,0
1,1.3.6.1.4.1.14519.5.2.1.1610.1210.310223825916...,ACNS0332,NO,https://doi.org/10.7937/TCIA.582B-XZ89,ACNS0332_PARJIR,1.3.6.1.4.1.14519.5.2.1.1610.1210.276497763958...,MR BRAIN WWO CONTRAST,12-07-1959,ep2ddiff3scantracep2ADC,SIEMENS,MR,1.2.840.10008.5.1.4.1.1.4,33,2543760,26.0,NCTN Data Archive License,https://nctn-data-archive.nci.nih.gov/,0
2,1.3.6.1.4.1.14519.5.2.1.1610.1210.133707348426...,ACNS0332,NO,https://doi.org/10.7937/TCIA.582B-XZ89,ACNS0332_PARJIR,1.3.6.1.4.1.14519.5.2.1.1610.1210.276497763958...,MR BRAIN WWO CONTRAST,12-07-1959,axT1MPRAGEcrfm,SIEMENS,MR,1.2.840.10008.5.1.4.1.1.4,50,26368988,30.0,NCTN Data Archive License,https://nctn-data-archive.nci.nih.gov/,0
3,1.3.6.1.4.1.14519.5.2.1.1610.1210.585371091614...,ACNS0332,NO,https://doi.org/10.7937/TCIA.582B-XZ89,ACNS0332_PARJIR,1.3.6.1.4.1.14519.5.2.1.1610.1210.276497763958...,MR BRAIN WWO CONTRAST,12-07-1959,sagT1MPRAGEcFIL,SIEMENS,MR,1.2.840.10008.5.1.4.1.1.4,160,73942660,29.0,NCTN Data Archive License,https://nctn-data-archive.nci.nih.gov/,0
4,1.3.6.1.4.1.14519.5.2.1.1610.1210.324701521225...,ACNS0332,NO,https://doi.org/10.7937/TCIA.582B-XZ89,ACNS0332_PARJIR,1.3.6.1.4.1.14519.5.2.1.1610.1210.276497763958...,MR BRAIN WWO CONTRAST,12-07-1959,corT1MPRAGEcrfm,SIEMENS,MR,1.2.840.10008.5.1.4.1.1.4,62,32697616,31.0,NCTN Data Archive License,https://nctn-data-archive.nci.nih.gov/,0
5,2.25.628029196530005029948410628875495202616,ACNS0332,yes,https://doi.org/10.7937/d8a8-6252,ACNS0332_PARJIR,1.3.6.1.4.1.14519.5.2.1.1610.1210.276497763958...,MR BRAIN WWO CONTRAST,12-07-1959,Pre-operative 1 - FLAIR - pineal - V1,Unspecified,SEG,1.2.840.10008.5.1.4.1.1.66.4,1,55852,6.0,NCTN Data Archive License,https://nctn-data-archive.nci.nih.gov/,0
6,1.2.826.0.1.534147.638.2321637500.202111221857...,ACNS0332,yes,https://doi.org/10.7937/d8a8-6252,ACNS0332_PARJIR,1.3.6.1.4.1.14519.5.2.1.1610.1210.276497763958...,MR BRAIN WWO CONTRAST,12-07-1959,Pre-operative 1 - DIFF - pineal - V1,,RTSTRUCT,1.2.840.10008.5.1.4.1.1.481.3,1,3784,26.0,NCTN Data Archive License,https://nctn-data-archive.nci.nih.gov/,0
7,2.25.921705153868212963662000820258234252513,ACNS0332,yes,https://doi.org/10.7937/d8a8-6252,ACNS0332_PARJIR,1.3.6.1.4.1.14519.5.2.1.1610.1210.276497763958...,MR BRAIN WWO CONTRAST,12-07-1959,Pre-operative 1 - T1 post - AX - pineal - V1,SIEMENS,SEG,1.2.840.10008.5.1.4.1.1.66.4,1,310354,30.0,NCTN Data Archive License,https://nctn-data-archive.nci.nih.gov/,0
8,2.25.665103211538931597565389780998787135632,ACNS0332,yes,https://doi.org/10.7937/d8a8-6252,ACNS0332_PARJIR,1.3.6.1.4.1.14519.5.2.1.1610.1210.276497763958...,MR BRAIN WWO CONTRAST,12-07-1959,Pre-operative 1 - T1 post - SAG - pineal - V1,SIEMENS,SEG,1.2.840.10008.5.1.4.1.1.66.4,1,550182,29.0,NCTN Data Archive License,https://nctn-data-archive.nci.nih.gov/,0
9,1.2.826.0.1.534147.638.2321637500.202111221855...,ACNS0332,yes,https://doi.org/10.7937/d8a8-6252,ACNS0332_PARJIR,1.3.6.1.4.1.14519.5.2.1.1610.1210.276497763958...,MR BRAIN WWO CONTRAST,12-07-1959,Pre-operative 1 - FLAIR - pineal - V1,,RTSTRUCT,1.2.840.10008.5.1.4.1.1.481.3,1,3744,6.0,NCTN Data Archive License,https://nctn-data-archive.nci.nih.gov/,0


### 4.3.3 Download subsets of the data
To identify the subsets for the other use cases we'll leverage the **annotation metadata** spreadsheet the authors provided, which you can download from https://doi.org/10.7937/D8A8-6252 or retrieve directly into a dataframe with the code below.

In [36]:
# load annotation metadata spreadsheet to df

annotation_Metadata = pd.read_csv('https://wiki.cancerimagingarchive.net/download/attachments/119703167/ACNS0332_annotations_metadata-2022-10-04.csv?api=v2')

display(annotation_Metadata)

Unnamed: 0,PatientID,ClinicalTrialTimePointID,SeriesInstanceUID,SeriesDescription,DICOM Type,StructureSetLabel,Segment Label,Volume,Anatomic Region,Anatomic Region Modifier,ReferencedSeriesInstanceUID
0,ACNS0332_PASLXC,Post-chemotherapy,2.25.506823838552100253390355168579717781332,"Post-chemotherapy, 4 - T1 post - AX - R Pariet...",SEG,,Enhancing Lesion,1672.021200,Right parietal lobe,Cerebellopontine angle,1.3.6.1.4.1.14519.5.2.1.1610.1210.718042271595...
1,ACNS0332_PASLXC,Post-chemotherapy,2.25.730989031651263725336422681483644119601,"Post-chemotherapy, 4 - T1 Post - COR - L Tempo...",SEG,,Enhancing Lesion,2124.926702,Left temporal lobe,Cerebellopontine angle,1.3.6.1.4.1.14519.5.2.1.1610.1210.135379308549...
2,ACNS0332_PASLXC,Post-chemotherapy,1.2.826.0.1.534147.638.2321637500.202221115333...,4 - T1 post - SAG - R Parietal Met - V1 - seed...,RTSS,Seed Points,,,,,1.3.6.1.4.1.14519.5.2.1.1610.1210.505613098334...
3,ACNS0332_PASLXC,Post-chemotherapy,1.2.826.0.1.534147.638.2321637500.202221115483...,4 - T1 Post - AX - L Temporal lobe met - V1 - ...,RTSS,Seed Points,,,,,1.3.6.1.4.1.14519.5.2.1.1610.1210.718042271595...
4,ACNS0332_PASLXC,Post-chemotherapy,1.2.826.0.1.534147.638.2321637500.202221115283...,4 - T1 Post - COR - R Temporal lobe met - V1 -...,RTSS,Seed Points,,,,,1.3.6.1.4.1.14519.5.2.1.1610.1210.135379308549...
...,...,...,...,...,...,...,...,...,...,...,...
3338,ACNS0332_PASTAK,Pre-operative,1.2.826.0.1.534147.638.2321637500.202212442836...,"Pre-operative, No Findings",RTSS,No Findings,,,,,1.3.6.1.4.1.14519.5.2.1.1610.1210.338086694515...
3339,ACNS0332_PASTAK,Pre-operative,1.2.826.0.1.534147.638.2321637500.202212442458394,"Pre-operative, No Findings",RTSS,No Findings,,,,,1.3.6.1.4.1.14519.5.2.1.1610.1210.104859860317...
3340,ACNS0332_PASTAK,Pre-operative,1.2.826.0.1.534147.638.2321637500.202212442742662,"Pre-operative, No Findings",RTSS,No Findings,,,,,1.3.6.1.4.1.14519.5.2.1.1610.1210.339964117902...
3341,ACNS0332_PASTAK,Pre-operative,1.2.826.0.1.534147.638.2321637500.202212442245...,"Pre-operative, No Findings",RTSS,No Findings,,,,,1.3.6.1.4.1.14519.5.2.1.1610.1210.311742493043...
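
Judging from the rows shown above, the spreadsheet distinguishes three kinds of annotation using two columns: a DICOM Type of SEG marks a segmentation, while RTSS rows carry a StructureSetLabel of either Seed Points or No Findings. The sketch below classifies rows on that assumption. `annotation_kind` is a hypothetical helper and the sample rows are made up.

```python
from collections import Counter

def annotation_kind(row):
    """Classify one annotation-metadata row using the column values shown above."""
    if row.get("DICOM Type") == "SEG":
        return "segmentation"
    label = row.get("StructureSetLabel")
    if label == "Seed Points":
        return "seed points"
    if label == "No Findings":
        return "no findings"
    return "other"

# Made-up rows, e.g. as produced by csv.DictReader
sample_rows = [
    {"DICOM Type": "SEG", "StructureSetLabel": ""},
    {"DICOM Type": "RTSS", "StructureSetLabel": "Seed Points"},
    {"DICOM Type": "RTSS", "StructureSetLabel": "Seed Points"},
    {"DICOM Type": "RTSS", "StructureSetLabel": "No Findings"},
]

print(Counter(annotation_kind(row) for row in sample_rows))
```

Counting rows per kind is a quick sanity check before downloading, since it shows roughly how many series each subset will contain.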


#### Download seed points
Since we're working with Series UIDs from a dataframe rather than JSON output from the API, we'll use the **input_type = "list"** parameter in the remaining download steps. The code includes options to download either a sample (3 scans) or the entire dataset. We'll also specify a **csv_filename** to save the related metadata to a file.

In [37]:
# filter dataframe to only include seed point rows
seedPoints = annotation_Metadata[annotation_Metadata['StructureSetLabel'] == 'Seed Points']
#display(seedPoints)

# extract series UID column to list for downloading
series_data = seedPoints["SeriesInstanceUID"].tolist()

# download a sample set of 3 scans 
# return metadata dataframe as df
# save a CSV of the metadata 
df = tcia.downloadSampleSeries(series_data, api_url = "restricted", input_type = "list", csv_filename = "acns0332_seedPoints")

# If you'd rather download the full dataset...
# df = tcia.downloadSeries(series_data, api_url = "restricted", input_type = "list", csv_filename = "acns0332_seedPoints")

Downloading first 3 scans out of 1258 Series Instance UIDs (scans).
Downloading... https://services.cancerimagingarchive.net/nbia-api/services/v2/getImage?NewFileNames=Yes&SeriesInstanceUID=1.2.826.0.1.534147.638.2321637500.2022211153339761.4
Downloading... https://services.cancerimagingarchive.net/nbia-api/services/v2/getImage?NewFileNames=Yes&SeriesInstanceUID=1.2.826.0.1.534147.638.2321637500.2022211154830678.4
Downloading... https://services.cancerimagingarchive.net/nbia-api/services/v2/getImage?NewFileNames=Yes&SeriesInstanceUID=1.2.826.0.1.534147.638.2321637500.2022211152830998.4
Sample download complete.
Manifest CSV saved as acns0332_seedPoints.csv


#### Download 3D Segmentations

In [38]:
# filter dataframe to only include segmentations
segs = annotation_Metadata[annotation_Metadata['DICOM Type'] == 'SEG']
# display(segs)

# extract series UID column to list for downloading
series_data = segs["SeriesInstanceUID"].tolist()

# download a sample set of 3 scans 
# return metadata dataframe as df
# save a CSV of the metadata 
df = tcia.downloadSampleSeries(series_data, api_url = "restricted", input_type = "list", csv_filename = "acns0332_SEGs")

# If you'd rather download the full dataset...
# df = tcia.downloadSeries(series_data, api_url = "restricted", input_type = "list", csv_filename = "acns0332_SEGs")

Downloading first 3 scans out of 1086 Series Instance UIDs (scans).
Downloading... https://services.cancerimagingarchive.net/nbia-api/services/v2/getImage?NewFileNames=Yes&SeriesInstanceUID=2.25.506823838552100253390355168579717781332
Downloading... https://services.cancerimagingarchive.net/nbia-api/services/v2/getImage?NewFileNames=Yes&SeriesInstanceUID=2.25.730989031651263725336422681483644119601
Downloading... https://services.cancerimagingarchive.net/nbia-api/services/v2/getImage?NewFileNames=Yes&SeriesInstanceUID=2.25.442235324505669768680972302988530823444
Sample download complete.
Manifest CSV saved as acns0332_SEGs.csv
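
Before switching from the sample to the full download, it may help to check how many series a filter selects. This is a minimal sketch on a made-up dataframe; the `RTSTRUCT` value is an assumption about how the non-SEG annotations are typed and may not match the real metadata.

```python
import pandas as pd

# Toy stand-in for annotation_Metadata; values are invented for illustration
toy_metadata = pd.DataFrame({
    "SeriesInstanceUID": ["2.25.1", "2.25.2", "1.2.3.1", "1.2.3.2", "1.2.3.3"],
    "DICOM Type": ["SEG", "SEG", "RTSTRUCT", "RTSTRUCT", "RTSTRUCT"],
})

# Count rows per DICOM Type, then read off the number of segmentation series
counts = toy_metadata["DICOM Type"].value_counts()
n_seg = int(counts.get("SEG", 0))
print(f"{n_seg} segmentation series would be downloaded")
```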


#### Download source images for seed points AND segmentations

In [None]:
# filter dataframe to only include seg and seed point rows (remove "no findings")
ref_Series = annotation_Metadata[(annotation_Metadata['StructureSetLabel'] == 'Seed Points') |
                                 (annotation_Metadata['DICOM Type'] == 'SEG')]

# keep one row per ReferencedSeriesInstanceUID so each source series is downloaded only once
clean_refSeries = ref_Series.drop_duplicates(subset='ReferencedSeriesInstanceUID')
# display(clean_refSeries)

# extract series UID column to list for downloading
series_data = clean_refSeries["ReferencedSeriesInstanceUID"].tolist()

# download a sample set of 3 scans 
# return metadata dataframe as df
# save a CSV of the metadata 
df = tcia.downloadSampleSeries(series_data, api_url = "restricted", input_type = "list", csv_filename = "acns0332_seg_seed_MRIs")

# If you'd rather download the full dataset...
# df = tcia.downloadSeries(series_data, api_url = "restricted", input_type = "list", csv_filename = "acns0332_seg_seed_MRIs")
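
The de-duplication step matters because one source MRI series can be referenced by more than one annotation, for example by a seed point object and a segmentation. The sketch below uses an invented dataframe to show the effect; the `RTSTRUCT` type value and all UIDs are assumptions for illustration.

```python
import pandas as pd

# Toy annotation rows: two annotations point at the same source series "9.9.1"
ref_rows = pd.DataFrame({
    "SeriesInstanceUID": ["1.2.3.1", "2.25.1", "1.2.3.2"],
    "DICOM Type": ["RTSTRUCT", "SEG", "RTSTRUCT"],
    "ReferencedSeriesInstanceUID": ["9.9.1", "9.9.1", "9.9.2"],
})

# drop_duplicates keeps the first row for each referenced series
unique_refs = ref_rows.drop_duplicates(subset="ReferencedSeriesInstanceUID")
source_uids = unique_refs["ReferencedSeriesInstanceUID"].tolist()

print(len(ref_rows), "annotation rows ->", len(source_uids), "source series")
```

Without this step the same source series would appear in the download list once per annotation that references it.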

#### Download source images for segmentations (excludes MRIs that only had seed points without segmentations)

In [None]:
# filter dataframe to only include seg
ref_Series = annotation_Metadata[annotation_Metadata['DICOM Type'] == 'SEG']
# display(ref_Series)

# keep one row per ReferencedSeriesInstanceUID so each source series is downloaded only once
clean_refSeries = ref_Series.drop_duplicates(subset='ReferencedSeriesInstanceUID')
# display(clean_refSeries)

# extract series UID column to list for downloading
series_data = clean_refSeries["ReferencedSeriesInstanceUID"].tolist()

# download a sample set of 3 scans 
# return metadata dataframe as df
# save a CSV of the metadata 
df = tcia.downloadSampleSeries(series_data, api_url = "restricted", input_type = "list", csv_filename = "acns0332_seg_MRIs")

# If you'd rather download the full dataset...
# df = tcia.downloadSeries(series_data, api_url = "restricted", input_type = "list", csv_filename = "acns0332_seg_MRIs")
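
To see which source series this filter leaves out, the referenced UIDs of the seed point rows can be compared with those of the segmentation rows. This is a sketch under stated assumptions: the dataframe is invented, and it assumes segmentation rows carry no `StructureSetLabel` value, which may not hold for the real metadata.

```python
import pandas as pd

# Toy stand-in for annotation_Metadata; all values are invented for illustration
toy = pd.DataFrame({
    "StructureSetLabel": ["Seed Points", None, "Seed Points"],
    "DICOM Type": ["RTSTRUCT", "SEG", "RTSTRUCT"],
    "ReferencedSeriesInstanceUID": ["9.9.1", "9.9.1", "9.9.2"],
})

# Source series referenced by seed points and by segmentations, as sets
seed_refs = set(toy.loc[toy["StructureSetLabel"] == "Seed Points",
                        "ReferencedSeriesInstanceUID"])
seg_refs = set(toy.loc[toy["DICOM Type"] == "SEG",
                       "ReferencedSeriesInstanceUID"])

# Series that have seed points but no segmentation
seed_only = sorted(seed_refs - seg_refs)
print(seed_only)
```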

#### Download source images with negative findings

The following code downloads the MRI scans that have negative finding assessments. These are cases where the authors of the dataset did not find anything that could be annotated. Downloading these scans could be useful as negative examples if you are training a tumor/metastasis detection model.

In [None]:
# filter dataframe to only include MRIs with "no findings"
ref_Series = annotation_Metadata[annotation_Metadata['StructureSetLabel'] == 'No Findings']

# keep one row per ReferencedSeriesInstanceUID so each source series is downloaded only once
clean_refSeries = ref_Series.drop_duplicates(subset='ReferencedSeriesInstanceUID')

# extract series UID column to list for downloading
series_data = clean_refSeries["ReferencedSeriesInstanceUID"].tolist()

# download a sample set of 3 scans 
# return metadata dataframe as df
# save a CSV of the metadata 
df = tcia.downloadSampleSeries(series_data, api_url = "restricted", input_type = "list", csv_filename = "acns0332_noFinding_MRIs")

# If you'd rather download the full dataset...
# df = tcia.downloadSeries(series_data, api_url = "restricted", input_type = "list", csv_filename = "acns0332_noFinding_MRIs")
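
If the negative scans are meant for a detection model, one possible next step is a per-series label table. The sketch below is an assumption about how such labels could be derived, not part of the dataset's documentation; the dataframe, the `has_finding` column, and `label_by_series` are hypothetical names introduced here.

```python
import pandas as pd

# Toy stand-in for annotation_Metadata; all values are invented for illustration
toy = pd.DataFrame({
    "ReferencedSeriesInstanceUID": ["9.9.1", "9.9.2", "9.9.3"],
    "StructureSetLabel": ["Seed Points", "No Findings", "No Findings"],
})

# One row per source series, then flag everything not marked "No Findings"
labels = toy.drop_duplicates(subset="ReferencedSeriesInstanceUID").copy()
labels["has_finding"] = labels["StructureSetLabel"] != "No Findings"

# Map each source series UID to a plain Python bool
label_by_series = dict(zip(labels["ReferencedSeriesInstanceUID"].tolist(),
                           labels["has_finding"].tolist()))
print(label_by_series)
```

This assumes a source series is never annotated with both a finding and "No Findings"; if that can happen in the real metadata, the de-duplication step would need a rule for which row wins.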

# Acknowledgements
[The Cancer Imaging Archive (TCIA)](https://www.cancerimagingarchive.net/) is a service that de-identifies and hosts a large, publicly available archive of medical images of cancer. TCIA is funded by the [Cancer Imaging Program (CIP)](https://imaging.cancer.gov/), a part of the United States [National Cancer Institute (NCI)](https://www.cancer.gov/), and is managed by the [Frederick National Laboratory for Cancer Research (FNLCR)](https://frederick.cancer.gov/).

This notebook was created by [Justin Kirby](https://www.linkedin.com/in/justinkirby82/). If you leverage TCIA datasets in your work, please be sure to comply with the [TCIA Data Usage Policy](https://wiki.cancerimagingarchive.net/x/c4hF). Once you receive access, you must also abide by the terms of the NCTN/NCORP Data Archive's Data Use Agreement (DUA). You are not allowed to redistribute the data or use it for other purposes. Attribution should include the following citations:

## Data Citations

1. Hwang, E. I., Kool, M., Burger, P. C., Capper, D., Chavez, L., Brabetz, S., Williams-Hughes, C., Billups, C., Heier, L., Jaju, A., Michalski, J., Li, Y., Leary, S., Zhou, T., von Deimling, A., Jones, D. T. W., Fouladi, M., Pollack, I. F., Gajjar, A., … Olson, J. M. (2021). Chemotherapy and Radiation Therapy in Treating Young Patients With Newly Diagnosed, Previously Untreated, High-Risk Medulloblastoma/PNET (ACNS0332) [Data set]. The Cancer Imaging Archive. https://doi.org/10.7937/TCIA.582B-XZ89
2. Rozenfeld, M., & Jordan, P. (2022). Annotations for Chemotherapy and Radiation Therapy in Treating Young Patients With Newly Diagnosed, Previously Untreated, High-Risk Medulloblastoma/PNET (ACNS0332-Tumor-Annotations) (Version 1) [Data set]. The Cancer Imaging Archive. https://doi.org/10.7937/D8A8-6252

## Publication Citation

Hwang, E. I., Kool, M., Burger, P. C., Capper, D., Chavez, L., Brabetz, S., Williams-Hughes, C., Billups, C., Heier, L., Jaju, A., Michalski, J., Li, Y., Leary, S., Zhou, T., von Deimling, A., Jones, D. T. W., Fouladi, M., Pollack, I. F., Gajjar, A., … Olson, J. M. (2018). Extensive Molecular and Clinical Heterogeneity in Patients With Histologically Diagnosed CNS-PNET Treated as a Single Entity: A Report From the Children’s Oncology Group Randomized ACNS0332 Trial. Journal of Clinical Oncology, 36(34), 3388–3395. https://doi.org/10.1200/jco.2017.76.4720. PMID: 30332335.

## TCIA Citation

Clark, K., Vendt, B., Smith, K., Freymann, J., Kirby, J., Koppel, P., Moore, S., Phillips, S., Maffitt, D., Pringle, M., Tarbox, L., & Prior, F. (2013). The Cancer Imaging Archive (TCIA): Maintaining and Operating a Public Information Repository. Journal of Digital Imaging, 26(6), 1045–1057. https://doi.org/10.1007/s10278-013-9622-7