<a href="https://colab.research.google.com/github/kirbyju/TCIA_Notebooks/blob/main/ACNS0332/ACNS0332.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Accessing DICOM images and annotations from the ACNS0332 dataset hosted on TCIA

This notebook shows how to access the **"Chemotherapy and Radiation Therapy in Treating Young Patients With Newly Diagnosed, Previously Untreated, High-Risk Medulloblastoma/PNET (ACNS0332)"** Collection hosted on [The Cancer Imaging Archive (TCIA)](https://cancerimagingarchive.net).  This dataset includes [DICOM MRI images](https://doi.org/10.7937/TCIA.582B-XZ89) hosted on TCIA and [clinical data](https://nctn-data-archive.nci.nih.gov/node/838) hosted by the NCTN Data Archive.  The National Cancer Institute has also funded an activity to generate and publish [annotations (3D segmentation labels and seed points)](https://doi.org/10.7937/D8A8-6252) on TCIA to help jumpstart research on tumor detection and auto-segmentation methods.


# 1 Learn about and request access to the ACNS0332 datasets

The imaging, clinical and annotation data for ACNS0332 are described in detail at the following links.  These pages are publicly visible without logging in, so you can review the dataset before requesting access:

1.  [ACNS0332 Collection Summary](https://doi.org/10.7937/TCIA.582B-XZ89)
2.  [ACNS0332 Annotation Summary](https://doi.org/10.7937/D8A8-6252)
3.  Descriptions of the 3 clinical datasets can be viewed at https://nctn-data-archive.nci.nih.gov/node/838.  Clicking on each dataset also allows you to view a detailed Data Dictionary outlining the types of clinical variables that were collected.

### Requesting Access to the data
In order to download the actual data you must request access through the NCTN Data Archive via the following steps:
 
 1. [Register an account on the NCTN Data Archive](https://nctn-data-archive.nci.nih.gov/).  
 2. After logging in, use the "Request Data" link in the left side menu.  
 3. Follow the on-screen instructions, and enter ***NCT00392327*** when asked which trial you want to request.  
 4. In step 2 of the Create Request form, be sure to select “Imaging Data Requested”. 
 
Once you are approved for access you'll be able to download the clinical data from the NCTN Archive.  You will then be asked to create an account on TCIA with the same email address so that you can access the imaging data.  Please contact NCINCTNDataArchive@mail.nih.gov for any questions about access requests.  

# 2 Set up your TCIA credential file

Since the ACNS0332 collection requires logging in, you must set up a TCIA credential file that contains your user name and password. 

**NOTE:** You must enter your real user name and password before you run this, or go and edit the resulting text file with your real credentials after it's created. 

In [2]:
# Create the credential file

lines = ['userName=YourUserName', 'passWord=YourPassword']
with open('credentials.txt', 'w') as f:
    f.write('\n'.join(lines))
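
If you would rather not type your password into a notebook cell, one option is to prompt for it instead. The following is a minimal sketch of that idea, assuming the same two-line `userName=`/`passWord=` layout as the cell above. The helper names `write_credentials` and `prompt_and_write` are hypothetical and not part of any TCIA tooling.

```python
import os
from getpass import getpass


def write_credentials(path, user_name, password):
    """Write a two-line credential file in the userName=/passWord= layout used above."""
    lines = ['userName=' + user_name, 'passWord=' + password]
    with open(path, 'w') as f:
        f.write('\n'.join(lines))
    # Best effort: restrict the file to the current user (has limited effect on Windows).
    os.chmod(path, 0o600)
    return path


def prompt_and_write(path='credentials.txt'):
    """Ask for the user name and password interactively, then write the file."""
    return write_credentials(path, input('TCIA user name: '), getpass('TCIA password: '))
```

Calling `prompt_and_write()` in a cell asks for both values without echoing the password, and writes `credentials.txt` in the current directory.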

# 3 Downloading images and annotations with NBIA Data Retriever

TCIA utilizes software called NBIA to manage its DICOM data.  One way to download TCIA data is to install the [Linux command-line version of the NBIA Data Retriever](https://wiki.cancerimagingarchive.net/x/2QKPBQ) using the following steps.  This tool provides a number of useful features such as auto-retry if there are any problems, saving data in an organized hierarchy on your hard drive (Collection > Patient > Study > Series > Images) and providing a CSV file containing key DICOM metadata about the images you've downloaded.

### 3.1 Install the NBIA Data Retriever CLI package

In [None]:
# install NBIA Data Retriever CLI software for downloading images later in this notebook

# -p avoids an error if the directory already exists
!mkdir -p /usr/share/desktop-directories/
!wget -P /content/NBIA-Data-Retriever https://cbiit-download.nci.nih.gov/nbia/releases/ForTCIA/NBIADataRetriever_4.4/nbia-data-retriever-4.4.deb
!dpkg -i /content/NBIA-Data-Retriever/nbia-data-retriever-4.4.deb

# NOTE: If you're working on a Linux OS that uses RPM packages you can change the wget line above to point to
#       https://cbiit-download.nci.nih.gov/nbia/releases/ForTCIA/NBIADataRetriever_4.4/NBIADataRetriever-4.4-1.x86_64.rpm

--2022-10-05 11:09:39--  https://cbiit-download.nci.nih.gov/nbia/releases/ForTCIA/NBIADataRetriever_4.4/nbia-data-retriever-4.4.deb
Resolving cbiit-download.nci.nih.gov (cbiit-download.nci.nih.gov)... 129.43.254.25, 2607:f220:41d:21c1::812b:fe19
Connecting to cbiit-download.nci.nih.gov (cbiit-download.nci.nih.gov)|129.43.254.25|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 68709292 (66M) [application/x-debian-package]
Saving to: ‘/content/NBIA-Data-Retriever/nbia-data-retriever-4.4.deb’


2022-10-05 11:10:28 (1.34 MB/s) - ‘/content/NBIA-Data-Retriever/nbia-data-retriever-4.4.deb’ saved [68709292/68709292]

Selecting previously unselected package nbia-data-retriever.
(Reading database ... 159447 files and directories currently installed.)
Preparing to unpack .../nbia-data-retriever-4.4.deb ...
Unpacking nbia-data-retriever (4.4) ...
Setting up nbia-data-retriever (4.4) ...
Adding shortcut to the menu


### 3.2 Download the full dataset using NBIA Data Retriever CLI
The Data Retriever software works by ingesting a "manifest" file that contains the DICOM Series Instance UIDs of the scans you'd like to download. The manifest files can be downloaded from [this page](https://doi.org/10.7937/D8A8-6252), but you can also use wget to obtain these manifests with the commands below.

* ACNS0332 Annotations -- Segmentations, Seed Points, and Negative Findings Assessments 
* Original ACNS0332 Images used to create Segmentations & Seed Points
* Original ACNS0332 Images used to create Negative Assessment reports
* Manifest containing examples of each annotation type for a single subject/study (useful for quick testing/demos)

In [None]:
# ACNS0332 Annotations -- Segmentations, Seed Points, and Negative Findings Assessments
!wget -O /content/ACNS0332-Tumor-Annotations-manifest_10-04-2022.tcia https://wiki.cancerimagingarchive.net/download/attachments/119703167/ACNS0332-Tumor-Annotations-manifest_10-04-2022.tcia?api=v2

# Original ACNS0332 Images used to create Segmentations & Seed Points
!wget -O /content/ACNS0332-OriginalMRs-SEGSandSeedpoints-manifest_10-04-2022.tcia https://wiki.cancerimagingarchive.net/download/attachments/119703167/ACNS0332-OriginalMRs-SEGSandSeedpoints-manifest_10-04-2022.tcia?api=v2

# Original ACNS0332 Images used to create Negative Assessment reports
# (no segmentation or seed points created for the scan)
!wget -O /content/ACNS0332-OriginalMRs-NegativeAssessments-manifest_10-04-2022.tcia https://wiki.cancerimagingarchive.net/download/attachments/119703167/ACNS0332-OriginalMRs-NegativeAssessments-manifest_10-04-2022.tcia?api=v2

# Single subject manifest containing examples of each annotation type
# Use this one for a quick demo
!wget -O /content/acns0332-demo-PARJIR.tcia https://github.com/kirbyju/TCIA_Notebooks/raw/main/ACNS0332/acns0332-demo-PARJIR.tcia

--2022-10-05 16:02:25--  https://wiki.cancerimagingarchive.net/download/attachments/119703167/ACNS0332-Tumor-Annotations-manifest_10-04-2022.tcia?api=v2
Resolving wiki.cancerimagingarchive.net (wiki.cancerimagingarchive.net)... 144.30.169.13
Connecting to wiki.cancerimagingarchive.net (wiki.cancerimagingarchive.net)|144.30.169.13|:443... connected.
HTTP request sent, awaiting response... 200 
Length: 164627 (161K) [application/octet-stream]
Saving to: ‘/content/ACNS0332-Tumor-Annotations-manifest_10-04-2022.tcia’


2022-10-05 16:02:25 (419 KB/s) - ‘/content/ACNS0332-Tumor-Annotations-manifest_10-04-2022.tcia’ saved [164627/164627]

--2022-10-05 16:02:26--  https://wiki.cancerimagingarchive.net/download/attachments/119703167/ACNS0332-OriginalMRs-SEGSandSeedpoints-manifest_10-04-2022.tcia?api=v2
Resolving wiki.cancerimagingarchive.net (wiki.cancerimagingarchive.net)... 144.30.169.13
Connecting to wiki.cancerimagingarchive.net (wiki.cancerimagingarchive.net)|144.30.169.13|:443... connecte
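
Before handing a manifest to the Data Retriever, it can help to check how many series it lists. The sketch below assumes the manifest is a plain-text file whose header lines are `key=value` pairs and whose Series Instance UIDs follow a `ListOfSeriesToDownload=` line, one per line. That layout and the helper name `read_manifest_uids` are assumptions; open a manifest in a text editor to confirm before relying on it.

```python
def read_manifest_uids(manifest_text):
    """Return the Series Instance UIDs listed in the text of a .tcia manifest."""
    uids = []
    in_list = False
    for line in manifest_text.splitlines():
        line = line.strip()
        if line.startswith('ListOfSeriesToDownload='):
            # Everything after this header line is treated as a UID (assumed layout).
            in_list = True
            continue
        if in_list and line:
            uids.append(line)
    return uids


# Made-up manifest text that follows the assumed layout
sample = """downloadServerUrl=https://example.org/download
includeAnnotation=true
ListOfSeriesToDownload=
1.2.3.4
1.2.3.5
"""
print(len(read_manifest_uids(sample)), 'series listed')  # 2 series listed
```

For a real file you could pass `open('/content/acns0332-demo-PARJIR.tcia').read()` to the same function.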

Now we can open the manifest file(s) with the NBIA Data Retriever to download the actual data. ***Please note*** that after running the following command you have to ***click in the output cell and type "y"*** to agree with the TCIA Data Usage Policy to start the download.

You can repeat this step for each manifest you'd like to download by changing the path.  For demonstration purposes we'll use the single subject manifest.

In [None]:
# download the original MRI scans, seed points, and segmentations

!/opt/nbia-data-retriever/nbia-data-retriever --cli '/content/acns0332-demo-PARJIR.tcia' -d /content/ -l /content/credentials.txt

The download log can be found at /content/NBIADataRetrieverCLI-20221105111119.log
2022-10-05 11:11:19: INFO: Using manifiest file: /content/acns0332-demo-PARJIR.tcia

2022-10-05 11:11:19: INFO: Running with option: quiet = false; verbose = false; force = false

2022-10-05 11:11:19: INFO: The type of data downloading is DICOM

Data Usage Policy

Any user accessing TCIA data must agree to:
- Not use the requested datasets, either alone or in concert with any other information, to identify or contact individual participants from whom data and/or samples were collected and follow all other conditions specified in the TCIA Site Disclaimer. Approved Users also agree not to generate and use information (e.g., facial images or comparable representations) in a manner that could allow the identities of research participants to be readily ascertained. These provisions do not apply to research investigators operating with specific IRB approval, pursuant to 45 CFR 46, to contact individuals within 
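
Once a download finishes, a quick sanity check is to count the image files in each folder of the Collection > Patient > Study > Series hierarchy described earlier. This is a minimal sketch; the helper name `count_files_per_folder` is hypothetical, and the `.dcm` file extension is an assumption you should verify against your own download.

```python
import os
from collections import Counter


def count_files_per_folder(root, extension='.dcm'):
    """Map each folder under root to the number of files with the given extension."""
    counts = Counter()
    for folder, _subdirs, files in os.walk(root):
        n = sum(1 for name in files if name.lower().endswith(extension))
        if n:
            # Store paths relative to root so the keys are short and readable.
            counts[os.path.relpath(folder, root)] = n
    return counts
```

For example, `count_files_per_folder('/content/')` would cover the download directory passed to the Data Retriever with `-d` above.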

# 4 Accessing the REST APIs 
The NBIA REST APIs are another way to query metadata and download image data.  Since the ACNS0332 dataset is "limited access" we'll need to use the "NBIA Search with Authentication REST API" described at https://wiki.cancerimagingarchive.net/x/X4ATBg which enables you to use your login credentials to create API tokens to access this Collection.

In the following examples we'll use the API to construct queries to explore and download ACNS0332 data.  Many of the queries shown below accept additional parameters to refine your search results.  These are covered in the aforementioned documentation.

In [1]:
# imports

import requests
import pandas as pd

### 4.1 Use credential file to create an API token

These steps use the credential file you created previously to generate an access token to query restricted Collections on TCIA.  

***Note:*** Tokens are valid for 2 hours and must be refreshed after that point. See https://wiki.cancerimagingarchive.net/x/X4ATBg for more details. 

In [3]:
# extract the user/pw from the credential file to variables for use in subsequent API calls and downloads          

credentialFilePath = 'credentials.txt'
mylines = []                                  
with open (credentialFilePath, 'rt') as myfile: 
    for myline in myfile:                     
        mylines.append(myline)   

userName = mylines[0].rstrip('\n').split(r'userName=')[1]
passWord = mylines[1].rstrip('\n').split(r'passWord=')[1]  
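
The cell above relies on the user name being on the first line and the password on the second. If you edit the file by hand, a slightly more forgiving parser may be useful. The sketch below is one way to do that; `parse_credentials` is a hypothetical helper, and it assumes the file still uses `key=value` lines.

```python
def parse_credentials(text):
    """Parse key=value lines into a dict, ignoring blank lines and line order."""
    creds = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or '=' not in line:
            continue
        # Split only once so an '=' inside the password is preserved.
        key, value = line.split('=', 1)
        creds[key.strip()] = value
    return creds
```

With this helper, `creds = parse_credentials(open('credentials.txt').read())` gives you `creds['userName']` and `creds['passWord']`.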

In [15]:
# request token

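# passing the values via params lets requests URL-encode special characters (e.g. "&" in a password)
token_url = "https://services.cancerimagingarchive.net/nbia-api/oauth/token"
token_params = {"username": userName, "password": passWord, "grant_type": "password",
                "client_id": "nbiaRestAPIClient", "client_secret": "ItsBetweenUAndMe"}
access_token = requests.get(token_url, params=token_params).json()["access_token"]
print (access_token)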


3aff1d11-1d1a-4cf6-af31-b1b583046a0d
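
Because tokens expire after 2 hours, a long-running session needs to request a new one at some point. The following is a minimal sketch of one way to handle that: a small cache that calls a function you supply whenever the stored token is too old. The class name `TokenCache` and its parameters are hypothetical, and the sketch only tracks elapsed time; it does not inspect the server's response.

```python
import time


class TokenCache:
    """Cache an access token and fetch a new one once it is older than max_age_seconds."""

    def __init__(self, fetch_token, max_age_seconds=2 * 60 * 60 - 60, clock=time.monotonic):
        self._fetch_token = fetch_token   # callable that returns a fresh token string
        self._max_age = max_age_seconds   # default is just under the 2-hour validity
        self._clock = clock               # injectable so the logic can be tested
        self._token = None
        self._issued_at = None

    def get(self):
        """Return the cached token, refreshing it first if it is missing or too old."""
        now = self._clock()
        if self._token is None or now - self._issued_at >= self._max_age:
            self._token = self._fetch_token()
            self._issued_at = now
        return self._token
```

To use it, wrap the token request from the cell above in a function, pass that function as `fetch_token`, and build the `Authorization` header from `cache.get()` before each batch of queries.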


### 4.2 Explore the data with REST API Queries

Now we'll set some variables that will apply to the remaining queries.

In [16]:
# set base URL to use the NBIA Search API w/ Authentication.
# Documentation about this API is at https://wiki.cancerimagingarchive.net/x/X4ATBg
base_url = "https://services.cancerimagingarchive.net/nbia-api/services/v2/"

# set Advanced URL to use the NBIA Advanced API.
# Documentation about this API is at https://wiki.cancerimagingarchive.net/x/YoATBg
adv_url = "https://services.cancerimagingarchive.net/nbia-api/services/"

# set collection you want to explore
collection = "ACNS0332"

# set API call headers to use the access token we created
api_call_headers = {'Authorization': 'Bearer ' + access_token}
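
The queries in this section are built by concatenating strings, which works for simple values such as `ACNS0332`. If you add parameters whose values contain spaces or other special characters, URL-encoding them is safer. The sketch below shows one way to do that with the standard library; `build_query_url` is a hypothetical helper, and which parameters each endpoint accepts (for example `Modality` on `getSeries`) should be checked against the API documentation linked above.

```python
from urllib.parse import urlencode


def build_query_url(base, endpoint, **params):
    """Join a base URL, an endpoint name, and URL-encoded query parameters."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return base + endpoint + ('?' + query if query else '')


example_base = "https://services.cancerimagingarchive.net/nbia-api/services/v2/"
print(build_query_url(example_base, "getSeries", Collection="ACNS0332", Modality="SEG"))
# https://services.cancerimagingarchive.net/nbia-api/services/v2/getSeries?Collection=ACNS0332&Modality=SEG
```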

Next let's run some queries to learn about what types of images are available in this Collection.

In [17]:
# print body part(s) examined in the collection as JSON

data_url = base_url + "getBodyPartValues?Collection=" + collection
data = requests.get(data_url, headers = api_call_headers)
if data.text != "":
    data = data.json()
    print (data)
else:
    print("Collection not found")

[{}, {'BodyPartExamined': 'BRAIN'}, {'BodyPartExamined': 'CSPINE'}, {'BodyPartExamined': 'HEAD'}, {'BodyPartExamined': 'ORBIT'}, {'BodyPartExamined': 'SPINE'}, {'BodyPartExamined': 'TSPINE'}, {'BodyPartExamined': 'WHOLESPINE'}]
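
Note the empty `{}` entry at the start of this result, which presumably corresponds to series that have no Body Part Examined value (an assumption based on the output, not on the API documentation). A minimal sketch for turning such a response into a clean list follows; `distinct_values` is a hypothetical helper.

```python
def distinct_values(records, key):
    """Collect the non-empty values stored under key in a list of dicts, sorted."""
    return sorted({r[key] for r in records if r.get(key)})


# The response printed above
body_parts = [{}, {'BodyPartExamined': 'BRAIN'}, {'BodyPartExamined': 'CSPINE'},
              {'BodyPartExamined': 'HEAD'}, {'BodyPartExamined': 'ORBIT'},
              {'BodyPartExamined': 'SPINE'}, {'BodyPartExamined': 'TSPINE'},
              {'BodyPartExamined': 'WHOLESPINE'}]
print(distinct_values(body_parts, 'BodyPartExamined'))
# ['BRAIN', 'CSPINE', 'HEAD', 'ORBIT', 'SPINE', 'TSPINE', 'WHOLESPINE']
```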


In [None]:
# print modalities in the collection as JSON

data_url = base_url + "getModalityValues?Collection=" + collection
data = requests.get(data_url, headers = api_call_headers)
if data.text != "":
    data = data.json()
    print (data)
else:
    print("Collection not found")

[{'Modality': 'CT'}, {'Modality': 'MR'}, {'Modality': 'RTSTRUCT'}, {'Modality': 'SEG'}]


In [18]:
# Count the number of patients with a given modality in the collection
# For ACNS0332 the 3D segmentations are SEG modality. 
# RTSTRUCT was used to record seed points and scans where no tumor was found.

# get patient counts for each modality
data_url = adv_url + "getModalityValuesAndCounts?Collection=" + collection
data = requests.get(data_url, headers = api_call_headers)

# count unique patients for each modality
if data.text != "":
    df = pd.DataFrame(data.json())
    df.rename(columns = {'criteria':'Modality', 'count':'PatientCount'}, inplace = True)
    df.PatientCount = df.PatientCount.astype(int)
    display(df.sort_values(by='PatientCount', ascending=False))
else:
    print("Collection not found.")

| | Modality | PatientCount |
|---|---|---|
| 1 | MR | 85 |
| 2 | RTSTRUCT | 85 |
| 3 | SEG | 85 |
| 0 | CT | 5 |


In [None]:
# Count the number of patients with a given body part examined in the collection

# get list of available body parts examined
data_url = adv_url + "getBodyPartValuesAndCounts?Collection=" + collection
data = requests.get(data_url, headers = api_call_headers)

# count unique patients for each body part examined
if data.text != "":
    df = pd.DataFrame(data.json())
    df.rename(columns = {'criteria':'BodyPartExamined', 'count':'PatientCount'}, inplace = True)
    df.PatientCount = df.PatientCount.astype(int)
    display(df.sort_values(by='PatientCount', ascending=False))
else:
    print("Collection not found.")

| | BodyPartExamined | PatientCount |
|---|---|---|
| 0 | NOT SPECIFIED | 85 |
| 3 | HEAD | 30 |
| 1 | BRAIN | 17 |
| 5 | SPINE | 12 |
| 2 | CSPINE | 5 |
| 6 | TSPINE | 4 |
| 7 | WHOLESPINE | 3 |
| 4 | ORBIT | 2 |


Now let's run some queries to see what we can learn about the patient cohort from the DICOM metadata.  This information can include things like age, gender, and ethnicity.  However, in the case of ACNS0332, most of this information is also available in the clinical data at https://nctn-data-archive.nci.nih.gov/node/838.

In [None]:
# obtain patient details (e.g. species, gender, ethnicity) for the collection 
# as JSON and create pandas dataframe w/ optional file export

data_url = base_url + "getPatient?Collection=" + collection
data = requests.get(data_url, headers = api_call_headers)
if data.text != "":
    df = pd.DataFrame(data.json())
    display(df)
    # optional - to save to JSON or CSV file
    df.to_csv(collection + '_patient_metadata.csv')
    # df.to_json(collection + '_patient_metadata.json')
else:
    print("Collection not found.")

| | PatientId | PatientName | PatientSex | Collection | Phantom | SpeciesCode | SpeciesDescription | EthnicGroup |
|---|---|---|---|---|---|---|---|---|
| 0 | ACNS0332_PASFHY | ACNS0332_PASFHY | F | ACNS0332 | NO | 337915000 | Homo sapiens | |
| 1 | ACNS0332_PARMZW | ACNS0332_PARMZW | F | ACNS0332 | NO | 337915000 | Homo sapiens | |
| 2 | ACNS0332_PASEUA | ACNS0332_PASEUA | M | ACNS0332 | NO | 337915000 | Homo sapiens | |
| 3 | ACNS0332_PASFUN | ACNS0332_PASFUN | F | ACNS0332 | NO | 337915000 | Homo sapiens | |
| 4 | ACNS0332_PARJKJ | ACNS0332_PARJKJ | F | ACNS0332 | NO | 337915000 | Homo sapiens | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 80 | ACNS0332_PAUEPE | ACNS0332_PAUEPE | M | ACNS0332 | NO | 337915000 | Homo sapiens | Non-Hispanic |
| 81 | ACNS0332_PATETZ | ACNS0332_PATETZ | F | ACNS0332 | NO | 337915000 | Homo sapiens | |
| 82 | ACNS0332_PATUKV | ACNS0332_PATUKV | M | ACNS0332 | NO | 337915000 | Homo sapiens | W |
| 83 | ACNS0332_PAUFRD | ACNS0332_PAUFRD | M | ACNS0332 | NO | 337915000 | Homo sapiens | |


In [None]:
# obtain study/visit details (e.g. anonymized study date, age at the time of visit) for 
# each patient in a given collection as JSON and create pandas dataframe w/ optional file export

data_url = base_url + "getPatientStudy?Collection=" + collection
data = requests.get(data_url, headers = api_call_headers)
if data.text != "":
    df = pd.DataFrame(data.json()).sort_values(by=['PatientID','StudyDate'])
    display(df)
    # optional - to save to JSON or CSV file
    df.to_csv(collection + '_study_metadata.csv')
    # df.to_json(collection + '_study_metadata.json')
else:
    print("Collection not found.")

Unnamed: 0,StudyInstanceUID,StudyDate,StudyDescription,PatientID,PatientName,PatientSex,Collection,SeriesCount,PatientAge,AdmittingDiagnosesDescription,EthnicGroup
170,1.3.6.1.4.1.14519.5.2.1.1610.1210.153722891610...,1960-03-16 00:00:00.0,MA,ACNS0332_PARFPN,ACNS0332_PARFPN,M,ACNS0332,19,014Y,,
230,1.3.6.1.4.1.14519.5.2.1.1610.1210.102717324321...,1960-03-16 00:00:00.0,,ACNS0332_PARFPN,ACNS0332_PARFPN,M,ACNS0332,11,014Y,,
101,1.3.6.1.4.1.14519.5.2.1.1610.1210.129723426299...,1960-03-18 00:00:00.0,,ACNS0332_PARFPN,ACNS0332_PARFPN,M,ACNS0332,11,014Y,,
181,1.3.6.1.4.1.14519.5.2.1.1610.1210.313422624304...,1960-10-05 00:00:00.0,,ACNS0332_PARFPN,ACNS0332_PARFPN,M,ACNS0332,12,014Y,,
304,1.3.6.1.4.1.14519.5.2.1.1610.1210.208081782081...,1960-10-05 00:00:00.0,NP,ACNS0332_PARFPN,ACNS0332_PARFPN,M,ACNS0332,20,014Y,,
...,...,...,...,...,...,...,...,...,...,...,...
437,1.3.6.1.4.1.14519.5.2.1.1610.1210.330599648879...,1960-03-17 00:00:00.0,MRI Cervical Spine w/ + w/o Contrast,ACNS0332_PAVURC,ACNS0332_PAVURC,F,ACNS0332,23,,,
389,1.3.6.1.4.1.14519.5.2.1.1610.1210.249877034081...,1960-10-13 00:00:00.0,,ACNS0332_PAVURC,ACNS0332_PAVURC,F,ACNS0332,23,,,
434,1.3.6.1.4.1.14519.5.2.1.1610.1210.320503336485...,1960-10-13 00:00:00.0,,ACNS0332_PAVURC,ACNS0332_PAVURC,F,ACNS0332,22,,,
395,1.3.6.1.4.1.14519.5.2.1.1610.1210.461481942519...,1961-05-12 00:00:00.0,,ACNS0332_PAVURC,ACNS0332_PAVURC,F,ACNS0332,25,,,
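
The `PatientAge` values above (e.g. `014Y`) use the DICOM Age String format: three digits followed by `D`, `W`, `M`, or `Y` for days, weeks, months, or years. To work with ages numerically you can convert them to years. The sketch below is one way to do that; `dicom_age_to_years` is a hypothetical helper, and the day, week, and month conversions are approximations.

```python
import re

# Approximate length of each DICOM age unit, expressed in years
_AGE_UNITS_IN_YEARS = {'D': 1 / 365.25, 'W': 7 / 365.25, 'M': 1 / 12, 'Y': 1.0}


def dicom_age_to_years(age):
    """Convert a DICOM Age String such as '014Y' to years; return None if missing or malformed."""
    if not isinstance(age, str):
        return None  # covers NaN and None from empty cells
    match = re.fullmatch(r'(\d{3})([DWMY])', age.strip())
    if match is None:
        return None
    return int(match.group(1)) * _AGE_UNITS_IN_YEARS[match.group(2)]


print(dicom_age_to_years('014Y'))  # 14.0
```

Applied to the study dataframe, this could look like `df['PatientAge'].map(dicom_age_to_years)`.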


We can also create a report that gives useful metadata about each scan in the dataset (e.g. series description, modality, scanner manufacturer & software version, number of images).  

***Note:*** We'll define a function for this so it can be used later in the notebook.

In [None]:
# obtain scan/series metadata for a collection as JSON

def getSeries(collection):
    data_url = base_url + "getSeries?Collection=" + collection
    data = requests.get(data_url, headers = api_call_headers)
    if data.text != "":
        df = pd.DataFrame(data.json()).sort_values(by=['PatientID','SeriesDate'])
        # optional - save to CSV file
        df.to_csv(collection + '_scan_metadata.csv')
        return df
    else:
        print("Collection not found.")

df = getSeries(collection)
display(df)

Unnamed: 0,SeriesInstanceUID,StudyInstanceUID,Modality,ProtocolName,SeriesDate,SeriesDescription,SeriesNumber,Collection,PatientID,Manufacturer,ManufacturerModelName,SoftwareVersions,ImageCount,TimeStamp,BodyPartExamined
297,1.3.6.1.4.1.14519.5.2.1.1610.1210.334474787156...,1.3.6.1.4.1.14519.5.2.1.1610.1210.153722891610...,MR,C:ADC TRACE DFSN/T/SE EPI/FSAT,1960-03-16 00:00:00.0,C:ADC TRACE DFSN/T/SE EPI/FSAT,5,ACNS0332,ACNS0332_PARFPN,"Philips Medical Systems, Inc.",Infinion 1.5T,VIA5.2,20,2021-12-03 08:17:50.0,
582,1.3.6.1.4.1.14519.5.2.1.1610.1210.263873262899...,1.3.6.1.4.1.14519.5.2.1.1610.1210.102717324321...,MR,SE T1 AXIAL FC,1960-03-16 00:00:00.0,SE T1 AXIAL FC,8,ACNS0332,ACNS0332_PARFPN,"Philips Medical Systems, Inc.",Infinion 1.5T,VIA5.2,48,2021-12-03 08:17:44.0,
924,1.3.6.1.4.1.14519.5.2.1.1610.1210.204471693976...,1.3.6.1.4.1.14519.5.2.1.1610.1210.102717324321...,MR,FFE MT/FC T2* AXIAL,1960-03-16 00:00:00.0,FFE MT/FC T2* AXIAL,7,ACNS0332,ACNS0332_PARFPN,"Philips Medical Systems, Inc.",Infinion 1.5T,VIA5.2,48,2021-12-03 08:17:49.0,
953,1.3.6.1.4.1.14519.5.2.1.1610.1210.308416868687...,1.3.6.1.4.1.14519.5.2.1.1610.1210.153722891610...,MR,SE T1 SAG P/GAD,1960-03-16 00:00:00.0,SE T1 SAG P/GAD,11,ACNS0332,ACNS0332_PARFPN,"Philips Medical Systems, Inc.",Infinion 1.5T,VIA5.2,19,2021-12-03 08:17:52.0,
1216,1.3.6.1.4.1.14519.5.2.1.1610.1210.495608523540...,1.3.6.1.4.1.14519.5.2.1.1610.1210.102717324321...,MR,SE T1 SAG P/GAD,1960-03-16 00:00:00.0,SE T1 SAG P/GAD,9,ACNS0332,ACNS0332_PARFPN,"Philips Medical Systems, Inc.",Infinion 1.5T,VIA5.2,14,2021-12-03 08:17:49.0,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
11262,1.2.826.0.1.534147.638.2321637500.202229029461...,1.3.6.1.4.1.14519.5.2.1.1610.1210.216747553501...,RTSTRUCT,,2022-03-09 00:00:00.0,"Pre-operative, No Findings",3401,ACNS0332,ACNS0332_PAVURC,Philips Medical Systems,Achieva,3.2.1,1,2022-09-27 12:24:24.0,
11557,1.2.826.0.1.534147.638.2321637500.2022290375424.4,1.3.6.1.4.1.14519.5.2.1.1610.1210.187492950291...,RTSTRUCT,,2022-03-09 00:00:00.0,"Recurrence, Spine Mets",1401,ACNS0332,ACNS0332_PAVURC,Philips Medical Systems,Achieva,3.2.1,1,2022-09-27 12:25:59.0,
11678,1.2.826.0.1.534147.638.2321637500.2022290337768.4,1.3.6.1.4.1.14519.5.2.1.1610.1210.249877034081...,RTSTRUCT,,2022-03-09 00:00:00.0,"Post-chemotherapy, No Findings",1701,ACNS0332,ACNS0332_PAVURC,Philips Medical Systems,Achieva,3.2.1,1,2022-09-27 12:29:11.0,
11811,1.2.826.0.1.534147.638.2321637500.20222903717378,1.3.6.1.4.1.14519.5.2.1.1610.1210.187492950291...,RTSTRUCT,,2022-03-09 00:00:00.0,"Recurrence, No Findings",1201,ACNS0332,ACNS0332_PAVURC,Philips Medical Systems,Achieva,3.2.1,1,2022-09-27 12:34:02.0,


Finally, we can use that scan report dataframe to generate some helpful summary statistics about the Collection.

In [None]:
# Calculate summary statistics for a given collection 

# Summarize patients
print('Summary Statistics for', collection,'\n')
print('Subjects: ', df['PatientID'].nunique(), 'subjects')
print('Studies: ', df['StudyInstanceUID'].nunique(), 'studies')
print('Series: ', df['SeriesInstanceUID'].nunique(), 'series')
print('Images: ', df['ImageCount'].sum(), 'images\n')

# Summarize modalities
print("Series Counts - Modality:")
print(df['Modality'].value_counts(dropna=False),'\n')

# Summarize body parts
print("Series Counts - Body Parts Examined:")
print(df['BodyPartExamined'].value_counts(dropna=False),'\n')

# Summarize manufacturers
print("Series Counts - Device Manufacturers:")
print(df['Manufacturer'].value_counts(dropna=False))

Summary Statistics for ACNS0332 

Subjects:  85 subjects
Studies:  688 studies
Series:  11960 series
Images:  379346 images

Series Counts - Modality:
MR          8606
RTSTRUCT    2257
SEG         1086
CT            11
Name: Modality, dtype: int64 

Series Counts - Body Parts Examined:
NaN           9319
HEAD          1343
BRAIN          535
SPINE          410
TSPINE         132
CSPINE          91
WHOLESPINE      88
ORBIT           42
Name: BodyPartExamined, dtype: int64 

Series Counts - Device Manufacturers:
SIEMENS                          4731
GE MEDICAL SYSTEMS               3399
NaN                              1402
Philips Medical Systems          1294
Unspecified                       670
Philips Healthcare                251
Philips Medical Systems, Inc.     187
Hitachi Medical Corporation        16
Philips                             4
AMICAS Inc.                         3
Toshiba                             2
Picker                              1
Name: Manufacturer, dtype
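
The manufacturer counts above spread one vendor across several spellings (for example four different Philips strings). If you want per-vendor totals, you can normalize the names first. The sketch below shows one possible grouping using the counts printed above; `normalize_manufacturer` and its vendor list are hypothetical choices, not an official mapping.

```python
from collections import Counter


def normalize_manufacturer(name):
    """Map a raw Manufacturer string to a coarse vendor label (hypothetical grouping)."""
    if not isinstance(name, str) or not name.strip():
        return 'Unknown'  # covers NaN/None and empty strings
    lowered = name.lower()
    for vendor in ('siemens', 'philips', 'hitachi', 'toshiba'):
        if vendor in lowered:
            return vendor.capitalize()
    if lowered.startswith('ge '):
        return 'GE'
    return name.strip()


# Series counts copied from the output above; None stands in for NaN
raw_counts = {
    'SIEMENS': 4731, 'GE MEDICAL SYSTEMS': 3399, None: 1402,
    'Philips Medical Systems': 1294, 'Unspecified': 670, 'Philips Healthcare': 251,
    'Philips Medical Systems, Inc.': 187, 'Hitachi Medical Corporation': 16,
    'Philips': 4, 'AMICAS Inc.': 3, 'Toshiba': 2, 'Picker': 1,
}
grouped = Counter()
for name, n in raw_counts.items():
    grouped[normalize_manufacturer(name)] += n
print(grouped['Philips'])  # 1736
```

On the scan report dataframe itself, the equivalent would be `df['Manufacturer'].map(normalize_manufacturer).value_counts()`.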

### 4.3 Downloading data with the REST API
Now we'll walk through using the API to download data.  This can be useful if you'd like to download specific scans from previous API queries rather than using an existing manifest file or if you can't install the NBIA Data Retriever.  

As a reminder, many of the scans in the ACNS0332 Collection were not annotated by the authors of https://doi.org/10.7937/D8A8-6252.  The reasons for this are outlined in the Annotation Protocol on that page.  As a result, you may wish to download only a subset of the scans such as:

1. Seed points
2. 3D segmentations
3. All MRI images used to create either seed points or segmentations
4. Only MRI images used to create seed points
5. Only MRI images used to create segmentations
6. Only MRI images with negative finding assessments

The following examples will demonstrate how to download the full collection as well as how to tackle each of these specialized use cases. 

In [None]:
# download imports
import requests, zipfile
from io import BytesIO

First, let's define a generic download function that we can re-use for the various use cases.  This will take a list of series UIDs as the input, download each scan, and create a dataframe/CSV that contains the metadata about each of those scans.  It also accepts an optional parameter to specify a file name if you'd like a CSV export of the dataframe.

***Note: By default only the first 3 scans for each use case below will be downloaded for demonstration purposes. If you'd like to download the full collection you must comment out the relevant lines.***

In [None]:
# define a function to accept a list of seriesInstanceUIDs and download them
# reminder: this only downloads the first 3 scans unless you comment out that section

def downloadSeries(series_data, csv_filename=""):
    manifestDF = pd.DataFrame()
    count = 0
    for seriesUID in series_data:
        data_url = base_url + "getImage?SeriesInstanceUID=" + seriesUID
        print("Downloading " + data_url)
        data = requests.get(data_url, headers = api_call_headers)
        file = zipfile.ZipFile(BytesIO(data.content))
        # print(file.namelist())
        file.extractall(path = "apiDownload/" + collection + "/" + seriesUID)
        # append the series metadata to the manifest dataframe
        metadata_url = base_url + "getSeriesMetaData?SeriesInstanceUID=" + seriesUID
        metadata = requests.get(metadata_url, headers = api_call_headers).json()
        newRow = pd.DataFrame.from_dict(metadata)
        manifestDF = pd.concat([manifestDF, newRow], ignore_index = True)
        # Stop after 3 scans for demo purposes - comment out these next 3 lines to download the full results
        count += 1
        if count == 3:
            break
    # save the manifest to a CSV file if a file name was given, then display it
    if csv_filename != "":
        manifestDF.to_csv(csv_filename + '.csv')
    display(manifestDF)
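
If a request fails (for example because the token has expired), the response body may not be a ZIP archive, and `zipfile.ZipFile` would then raise `BadZipFile`. The following is a minimal sketch of a guard you could add before extracting. The helper name `extract_if_zip` is hypothetical, and what the server actually returns on failure is not covered here.

```python
import zipfile
from io import BytesIO


def extract_if_zip(content, dest):
    """Extract content to dest if it is a ZIP archive; return member names, or None otherwise."""
    buffer = BytesIO(content)
    if not zipfile.is_zipfile(buffer):
        return None
    with zipfile.ZipFile(buffer) as archive:
        archive.extractall(path=dest)
        return archive.namelist()
```

Inside a download loop you could call `extract_if_zip(data.content, target_folder)` and skip or retry the series when it returns `None`.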

The most basic use case would be to simply download the entire Collection.  This will provide all of the annotation data (seed points, segmentations, negative finding reports) as well as all of the original scans in the ACNS0332 collection.  Make sure you have enough disk space (~95 GBytes) if you comment out the code that limits the download to the first 3 scans! 

In [None]:
# call getSeries function to retrieve scan metadata for the whole collection
df = getSeries(collection)

# extract the SeriesInstanceUID column
series_data = list(df['SeriesInstanceUID'])

# feed series_data to our downloadSeries function
downloadSeries(series_data, collection + "_full_Collection")

Downloading https://services.cancerimagingarchive.net/nbia-api/services/v2/getImage?SeriesInstanceUID=1.3.6.1.4.1.14519.5.2.1.1610.1210.334474787156668475073572184314
Downloading https://services.cancerimagingarchive.net/nbia-api/services/v2/getImage?SeriesInstanceUID=1.3.6.1.4.1.14519.5.2.1.1610.1210.263873262899365520466222721148
Downloading https://services.cancerimagingarchive.net/nbia-api/services/v2/getImage?SeriesInstanceUID=1.3.6.1.4.1.14519.5.2.1.1610.1210.204471693976824958653054017321


Unnamed: 0,Series UID,Collection,3rd Party Analysis,Data Description URI,Subject ID,Study UID,Study Description,Study Date,Series Description,Manufacturer,Modality,SOP Class UID,Number of Images,File Size,Series Number,License Name,License URL,Annotation Size
0,1.3.6.1.4.1.14519.5.2.1.1610.1210.334474787156...,ACNS0332,NO,https://doi.org/10.7937/TCIA.582B-XZ89,ACNS0332_PARFPN,1.3.6.1.4.1.14519.5.2.1.1610.1210.153722891610...,MA,03-16-1960,CADC TRACE DFSNTSE EPIFSAT,Philips Medical Systems Inc.,MR,1.2.840.10008.5.1.4.1.1.4,20,710456,5.0,NCTN Data Archive License,https://nctn-data-archive.nci.nih.gov/,0
1,1.3.6.1.4.1.14519.5.2.1.1610.1210.263873262899...,ACNS0332,NO,https://doi.org/10.7937/TCIA.582B-XZ89,ACNS0332_PARFPN,1.3.6.1.4.1.14519.5.2.1.1610.1210.102717324321...,,03-16-1960,SE T1 AXIAL FC,Philips Medical Systems Inc.,MR,1.2.840.10008.5.1.4.1.1.4,48,6423862,8.0,NCTN Data Archive License,https://nctn-data-archive.nci.nih.gov/,0
2,1.3.6.1.4.1.14519.5.2.1.1610.1210.204471693976...,ACNS0332,NO,https://doi.org/10.7937/TCIA.582B-XZ89,ACNS0332_PARFPN,1.3.6.1.4.1.14519.5.2.1.1610.1210.102717324321...,,03-16-1960,FFE MTFC T2 AXIAL,Philips Medical Systems Inc.,MR,1.2.840.10008.5.1.4.1.1.4,48,6424648,7.0,NCTN Data Archive License,https://nctn-data-archive.nci.nih.gov/,0


To identify the subsets for the other use cases we'll leverage the supplemental spreadsheet the authors provided, which you can download from https://doi.org/10.7937/D8A8-6252 or retrieve directly with ***wget*** using the command below.

In [None]:
# wget ACNS0332 annotation metadata file

!wget -O /content/ACNS0332_annotations_metadata-2022-10-04.csv https://wiki.cancerimagingarchive.net/download/attachments/119703167/ACNS0332_annotations_metadata-2022-10-04.csv?api=v2

--2022-10-05 11:29:43--  https://wiki.cancerimagingarchive.net/download/attachments/119703167/ACNS0332_annotations_metadata-2022-10-04.csv?api=v2
Resolving wiki.cancerimagingarchive.net (wiki.cancerimagingarchive.net)... 144.30.169.13
Connecting to wiki.cancerimagingarchive.net (wiki.cancerimagingarchive.net)|144.30.169.13|:443... connected.
HTTP request sent, awaiting response... 200 
Length: 752921 (735K) [text/csv]
Saving to: ‘/content/ACNS0332_annotations_metadata-2022-10-04.csv’


2022-10-05 11:29:45 (638 KB/s) - ‘/content/ACNS0332_annotations_metadata-2022-10-04.csv’ saved [752921/752921]



Let's take a look at the contents of the spreadsheet using a Pandas dataframe.  

In [None]:
# load annotation metadata spreadsheet to df

annotation_Metadata = pd.read_csv('ACNS0332_annotations_metadata-2022-10-04.csv')

display(annotation_Metadata)

Unnamed: 0,PatientID,ClinicalTrialTimePointID,SeriesInstanceUID,SeriesDescription,DICOM Type,StructureSetLabel,Segment Label,Volume,Anatomic Region,Anatomic Region Modifier,ReferencedSeriesInstanceUID
0,ACNS0332_PASLXC,Post-chemotherapy,2.25.506823838552100253390355168579717781332,"Post-chemotherapy, 4 - T1 post - AX - R Pariet...",SEG,,Enhancing Lesion,1672.021200,Right parietal lobe,Cerebellopontine angle,1.3.6.1.4.1.14519.5.2.1.1610.1210.718042271595...
1,ACNS0332_PASLXC,Post-chemotherapy,2.25.730989031651263725336422681483644119601,"Post-chemotherapy, 4 - T1 Post - COR - L Tempo...",SEG,,Enhancing Lesion,2124.926702,Left temporal lobe,Cerebellopontine angle,1.3.6.1.4.1.14519.5.2.1.1610.1210.135379308549...
2,ACNS0332_PASLXC,Post-chemotherapy,1.2.826.0.1.534147.638.2321637500.202221115333...,4 - T1 post - SAG - R Parietal Met - V1 - seed...,RTSS,Seed Points,,,,,1.3.6.1.4.1.14519.5.2.1.1610.1210.505613098334...
3,ACNS0332_PASLXC,Post-chemotherapy,1.2.826.0.1.534147.638.2321637500.202221115483...,4 - T1 Post - AX - L Temporal lobe met - V1 - ...,RTSS,Seed Points,,,,,1.3.6.1.4.1.14519.5.2.1.1610.1210.718042271595...
4,ACNS0332_PASLXC,Post-chemotherapy,1.2.826.0.1.534147.638.2321637500.202221115283...,4 - T1 Post - COR - R Temporal lobe met - V1 -...,RTSS,Seed Points,,,,,1.3.6.1.4.1.14519.5.2.1.1610.1210.135379308549...
...,...,...,...,...,...,...,...,...,...,...,...
3338,ACNS0332_PASTAK,Pre-operative,1.2.826.0.1.534147.638.2321637500.202212442836...,"Pre-operative, No Findings",RTSS,No Findings,,,,,1.3.6.1.4.1.14519.5.2.1.1610.1210.338086694515...
3339,ACNS0332_PASTAK,Pre-operative,1.2.826.0.1.534147.638.2321637500.202212442458394,"Pre-operative, No Findings",RTSS,No Findings,,,,,1.3.6.1.4.1.14519.5.2.1.1610.1210.104859860317...
3340,ACNS0332_PASTAK,Pre-operative,1.2.826.0.1.534147.638.2321637500.202212442742662,"Pre-operative, No Findings",RTSS,No Findings,,,,,1.3.6.1.4.1.14519.5.2.1.1610.1210.339964117902...
3341,ACNS0332_PASTAK,Pre-operative,1.2.826.0.1.534147.638.2321637500.202212442245...,"Pre-operative, No Findings",RTSS,No Findings,,,,,1.3.6.1.4.1.14519.5.2.1.1610.1210.311742493043...
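
Before filtering, it can help to see how many rows fall into each annotation category. The sketch below is not part of the original notebook. It assumes only the `DICOM Type` and `StructureSetLabel` columns shown above; `summarize_annotations` is a hypothetical helper name, and the toy rows use placeholder UIDs rather than real data.

```python
import pandas as pd

def summarize_annotations(df):
    """Count rows per (DICOM Type, StructureSetLabel) pair."""
    # SEG rows have no StructureSetLabel, so give missing labels a visible name
    labels = df["StructureSetLabel"].fillna("(none)")
    return (
        df.assign(StructureSetLabel=labels)
          .groupby(["DICOM Type", "StructureSetLabel"])
          .size()
          .reset_index(name="rows")
    )

# Toy stand-in for annotation_Metadata; the UIDs are placeholders, not real data.
toy = pd.DataFrame({
    "SeriesInstanceUID": ["seg-1", "seg-2", "rt-1", "rt-2", "rt-3"],
    "DICOM Type": ["SEG", "SEG", "RTSS", "RTSS", "RTSS"],
    "StructureSetLabel": [None, None, "Seed Points", "Seed Points", "No Findings"],
})

summary = summarize_annotations(toy)
print(summary)
```

In the notebook itself you would call `summarize_annotations(annotation_Metadata)` instead of passing the toy frame.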


The following cells will let you build a list of Series Instance UIDs to download based on the previously mentioned use cases.

In [None]:
# Use case: Download seed point RTSTRUCTs

# filter dataframe to only include seed point rows
seedPoints = annotation_Metadata[annotation_Metadata['StructureSetLabel'] == 'Seed Points']
#display(seedPoints)

# extract series UID column to list for downloading
series_data = seedPoints["SeriesInstanceUID"].tolist()

# feed series_data to our downloadSeries function
downloadSeries(series_data, collection + "_seedPoints")

Downloading https://services.cancerimagingarchive.net/nbia-api/services/v2/getImage?SeriesInstanceUID=1.2.826.0.1.534147.638.2321637500.2022211153339761.4
Downloading https://services.cancerimagingarchive.net/nbia-api/services/v2/getImage?SeriesInstanceUID=1.2.826.0.1.534147.638.2321637500.2022211154830678.4
Downloading https://services.cancerimagingarchive.net/nbia-api/services/v2/getImage?SeriesInstanceUID=1.2.826.0.1.534147.638.2321637500.2022211152830998.4


Unnamed: 0,Series UID,Collection,3rd Party Analysis,Data Description URI,Subject ID,Study UID,Study Description,Study Date,Series Description,Manufacturer,Modality,SOP Class UID,Number of Images,File Size,Series Number,License Name,License URL,Annotation Size
0,1.2.826.0.1.534147.638.2321637500.202221115333...,ACNS0332,yes,https://doi.org/10.7937/d8a8-6252,ACNS0332_PASLXC,1.3.6.1.4.1.14519.5.2.1.1610.1210.210266444704...,BRAIN WWO CONTRDIFF MRA PROT,10-03-1960,4 - T1 post - SAG - R Parietal Met - V1 - seed...,SIEMENS,RTSTRUCT,1.2.840.10008.5.1.4.1.1.481.3,1,3846,18.0,NCTN Data Archive License,https://nctn-data-archive.nci.nih.gov/,0
1,1.2.826.0.1.534147.638.2321637500.202221115483...,ACNS0332,yes,https://doi.org/10.7937/d8a8-6252,ACNS0332_PASLXC,1.3.6.1.4.1.14519.5.2.1.1610.1210.210266444704...,BRAIN WWO CONTRDIFF MRA PROT,10-03-1960,4 - T1 Post - AX - L Temporal lobe met - V1 - ...,SIEMENS,RTSTRUCT,1.2.840.10008.5.1.4.1.1.481.3,1,3848,17.0,NCTN Data Archive License,https://nctn-data-archive.nci.nih.gov/,0
2,1.2.826.0.1.534147.638.2321637500.202221115283...,ACNS0332,yes,https://doi.org/10.7937/d8a8-6252,ACNS0332_PASLXC,1.3.6.1.4.1.14519.5.2.1.1610.1210.210266444704...,BRAIN WWO CONTRDIFF MRA PROT,10-03-1960,4 - T1 Post - COR - R Temporal lobe met - V1 -...,SIEMENS,RTSTRUCT,1.2.840.10008.5.1.4.1.1.481.3,1,3852,19.0,NCTN Data Archive License,https://nctn-data-archive.nci.nih.gov/,0


In [None]:
# Use case: Download 3d segmentations

# filter dataframe to only include segmentations
segs = annotation_Metadata[annotation_Metadata['DICOM Type'] == 'SEG']
# display(segs)

# extract series UID column to list for downloading
series_data = segs["SeriesInstanceUID"].tolist()

# feed series_data to our downloadSeries function
downloadSeries(series_data, collection + "_segs")

Downloading https://services.cancerimagingarchive.net/nbia-api/services/v2/getImage?SeriesInstanceUID=2.25.506823838552100253390355168579717781332
Downloading https://services.cancerimagingarchive.net/nbia-api/services/v2/getImage?SeriesInstanceUID=2.25.730989031651263725336422681483644119601
Downloading https://services.cancerimagingarchive.net/nbia-api/services/v2/getImage?SeriesInstanceUID=2.25.442235324505669768680972302988530823444


Unnamed: 0,Series UID,Collection,3rd Party Analysis,Data Description URI,Subject ID,Study UID,Study Description,Study Date,Series Description,Manufacturer,Modality,SOP Class UID,Number of Images,File Size,Series Number,License Name,License URL,Annotation Size
0,2.25.506823838552100253390355168579717781332,ACNS0332,yes,https://doi.org/10.7937/d8a8-6252,ACNS0332_PASLXC,1.3.6.1.4.1.14519.5.2.1.1610.1210.210266444704...,BRAIN WWO CONTRDIFF MRA PROT,10-03-1960,Post-chemotherapy 4 - T1 post - AX - R Parieta...,SIEMENS,SEG,1.2.840.10008.5.1.4.1.1.66.4,1,23238,17.0,NCTN Data Archive License,https://nctn-data-archive.nci.nih.gov/,0
1,2.25.730989031651263725336422681483644119601,ACNS0332,yes,https://doi.org/10.7937/d8a8-6252,ACNS0332_PASLXC,1.3.6.1.4.1.14519.5.2.1.1610.1210.210266444704...,BRAIN WWO CONTRDIFF MRA PROT,10-03-1960,Post-chemotherapy 4 - T1 Post - COR - L Tempor...,SIEMENS,SEG,1.2.840.10008.5.1.4.1.1.66.4,1,63600,19.0,NCTN Data Archive License,https://nctn-data-archive.nci.nih.gov/,0
2,2.25.442235324505669768680972302988530823444,ACNS0332,yes,https://doi.org/10.7937/d8a8-6252,ACNS0332_PASLXC,1.3.6.1.4.1.14519.5.2.1.1610.1210.210266444704...,BRAIN WWO CONTRDIFF MRA PROT,10-03-1960,Post-chemotherapy 4 - T1 Post - AX - R Tempora...,SIEMENS,SEG,1.2.840.10008.5.1.4.1.1.66.4,1,30478,17.0,NCTN Data Archive License,https://nctn-data-archive.nci.nih.gov/,0


The following cells download the MRIs that the annotations reference.  ***The segmentations and seed points share many of the same MRIs, so if you're doing full downloads you should run only one of the three cells below, depending on your use case:***
1. All MRIs used for segmentations ***and*** seed points
2. Only MRIs used for seed points
3. Only MRIs used for segmentations
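
To decide which option fits, you may want to measure the overlap first. The following is a minimal sketch, assuming the `DICOM Type`, `StructureSetLabel` and `ReferencedSeriesInstanceUID` columns from the metadata spreadsheet. `referenced_series` is a hypothetical helper, and the toy frame uses placeholder UIDs purely for illustration.

```python
import pandas as pd

def referenced_series(df, mask):
    """Return the set of non-null ReferencedSeriesInstanceUID values for the rows in mask."""
    return set(df.loc[mask, "ReferencedSeriesInstanceUID"].dropna())

# Toy stand-in for annotation_Metadata; the UIDs are placeholders, not real data.
toy = pd.DataFrame({
    "DICOM Type": ["SEG", "SEG", "RTSS", "RTSS", "RTSS"],
    "StructureSetLabel": [None, None, "Seed Points", "Seed Points", "No Findings"],
    "ReferencedSeriesInstanceUID": ["mr-A", "mr-B", "mr-A", "mr-C", "mr-D"],
})

seg_mris = referenced_series(toy, toy["DICOM Type"] == "SEG")
seed_mris = referenced_series(toy, toy["StructureSetLabel"] == "Seed Points")

shared = seg_mris & seed_mris      # MRIs referenced by both annotation types
combined = seg_mris | seed_mris    # what the "all MRIs" cell would download

print(f"segmentation MRIs: {len(seg_mris)}")
print(f"seed point MRIs:   {len(seed_mris)}")
print(f"shared:            {len(shared)}")
print(f"combined:          {len(combined)}")
```

Running the same calls against `annotation_Metadata` shows how many series you would download twice by running more than one of the cells below.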

In [None]:
# Use case: Download all MRIs for segmentations AND seed points

# filter dataframe to only include seg and seed point rows (remove "no findings")
ref_Series = annotation_Metadata[(annotation_Metadata['StructureSetLabel'] == 'Seed Points') |
                                 (annotation_Metadata['DICOM Type'] == 'SEG')]

# drop rows that repeat a ReferencedSeriesInstanceUID
clean_refSeries = ref_Series.drop_duplicates(subset='ReferencedSeriesInstanceUID')
# display(clean_refSeries)

# extract series UID column to list for downloading
series_data = clean_refSeries["ReferencedSeriesInstanceUID"].tolist()

# feed series_data to our downloadSeries function
downloadSeries(series_data, collection + "_seg_seed_MRIs")

Downloading https://services.cancerimagingarchive.net/nbia-api/services/v2/getImage?SeriesInstanceUID=1.3.6.1.4.1.14519.5.2.1.1610.1210.718042271595624707876216415765
Downloading https://services.cancerimagingarchive.net/nbia-api/services/v2/getImage?SeriesInstanceUID=1.3.6.1.4.1.14519.5.2.1.1610.1210.135379308549682819864285580214
Downloading https://services.cancerimagingarchive.net/nbia-api/services/v2/getImage?SeriesInstanceUID=1.3.6.1.4.1.14519.5.2.1.1610.1210.505613098334484642800163537050


Unnamed: 0,Series UID,Collection,3rd Party Analysis,Data Description URI,Subject ID,Study UID,Study Description,Study Date,Series Description,Manufacturer,Modality,SOP Class UID,Number of Images,File Size,Series Number,License Name,License URL,Annotation Size
0,1.3.6.1.4.1.14519.5.2.1.1610.1210.718042271595...,ACNS0332,NO,https://doi.org/10.7937/TCIA.582B-XZ89,ACNS0332_PASLXC,1.3.6.1.4.1.14519.5.2.1.1610.1210.210266444704...,BRAIN WWO CONTRDIFF MRA PROT,10-03-1960,T1 AX post,SIEMENS,MR,1.2.840.10008.5.1.4.1.1.4,38,4186976,17.0,NCTN Data Archive License,https://nctn-data-archive.nci.nih.gov/,0
1,1.3.6.1.4.1.14519.5.2.1.1610.1210.135379308549...,ACNS0332,NO,https://doi.org/10.7937/TCIA.582B-XZ89,ACNS0332_PASLXC,1.3.6.1.4.1.14519.5.2.1.1610.1210.210266444704...,BRAIN WWO CONTRDIFF MRA PROT,10-03-1960,T1 Coro Post,SIEMENS,MR,1.2.840.10008.5.1.4.1.1.4,33,14450190,19.0,NCTN Data Archive License,https://nctn-data-archive.nci.nih.gov/,0
2,1.3.6.1.4.1.14519.5.2.1.1610.1210.505613098334...,ACNS0332,NO,https://doi.org/10.7937/TCIA.582B-XZ89,ACNS0332_PASLXC,1.3.6.1.4.1.14519.5.2.1.1610.1210.210266444704...,BRAIN WWO CONTRDIFF MRA PROT,10-03-1960,T1 SAG post gad,SIEMENS,MR,1.2.840.10008.5.1.4.1.1.4,38,20063194,18.0,NCTN Data Archive License,https://nctn-data-archive.nci.nih.gov/,0


In [None]:
# Use case: Download only MRI images used to create seed points

# filter dataframe to only include seed point rows
ref_Series = annotation_Metadata[annotation_Metadata['StructureSetLabel'] == 'Seed Points']
# display(ref_Series)

# drop rows that repeat a ReferencedSeriesInstanceUID
clean_refSeries = ref_Series.drop_duplicates(subset='ReferencedSeriesInstanceUID')
# display(clean_refSeries)

# extract series UID column to list for downloading
series_data = clean_refSeries["ReferencedSeriesInstanceUID"].tolist()

# feed series_data to our downloadSeries function
downloadSeries(series_data, collection + "_seed_MRIs")

Downloading https://services.cancerimagingarchive.net/nbia-api/services/v2/getImage?SeriesInstanceUID=1.3.6.1.4.1.14519.5.2.1.1610.1210.505613098334484642800163537050
Downloading https://services.cancerimagingarchive.net/nbia-api/services/v2/getImage?SeriesInstanceUID=1.3.6.1.4.1.14519.5.2.1.1610.1210.718042271595624707876216415765
Downloading https://services.cancerimagingarchive.net/nbia-api/services/v2/getImage?SeriesInstanceUID=1.3.6.1.4.1.14519.5.2.1.1610.1210.135379308549682819864285580214


Unnamed: 0,Series UID,Collection,3rd Party Analysis,Data Description URI,Subject ID,Study UID,Study Description,Study Date,Series Description,Manufacturer,Modality,SOP Class UID,Number of Images,File Size,Series Number,License Name,License URL,Annotation Size
0,1.3.6.1.4.1.14519.5.2.1.1610.1210.505613098334...,ACNS0332,NO,https://doi.org/10.7937/TCIA.582B-XZ89,ACNS0332_PASLXC,1.3.6.1.4.1.14519.5.2.1.1610.1210.210266444704...,BRAIN WWO CONTRDIFF MRA PROT,10-03-1960,T1 SAG post gad,SIEMENS,MR,1.2.840.10008.5.1.4.1.1.4,38,20063194,18.0,NCTN Data Archive License,https://nctn-data-archive.nci.nih.gov/,0
1,1.3.6.1.4.1.14519.5.2.1.1610.1210.718042271595...,ACNS0332,NO,https://doi.org/10.7937/TCIA.582B-XZ89,ACNS0332_PASLXC,1.3.6.1.4.1.14519.5.2.1.1610.1210.210266444704...,BRAIN WWO CONTRDIFF MRA PROT,10-03-1960,T1 AX post,SIEMENS,MR,1.2.840.10008.5.1.4.1.1.4,38,4186976,17.0,NCTN Data Archive License,https://nctn-data-archive.nci.nih.gov/,0
2,1.3.6.1.4.1.14519.5.2.1.1610.1210.135379308549...,ACNS0332,NO,https://doi.org/10.7937/TCIA.582B-XZ89,ACNS0332_PASLXC,1.3.6.1.4.1.14519.5.2.1.1610.1210.210266444704...,BRAIN WWO CONTRDIFF MRA PROT,10-03-1960,T1 Coro Post,SIEMENS,MR,1.2.840.10008.5.1.4.1.1.4,33,14450190,19.0,NCTN Data Archive License,https://nctn-data-archive.nci.nih.gov/,0


In [None]:
# Use case: Download only MRI images used to create 3D segmentations
# USE THIS OPTION IF YOU DO NOT WANT THE ADDITIONAL 
# MRI SCANS USED FOR SEED POINTS 

# filter dataframe to only include seg
ref_Series = annotation_Metadata[annotation_Metadata['DICOM Type'] == 'SEG']
# display(ref_Series)

# drop rows that repeat a ReferencedSeriesInstanceUID
clean_refSeries = ref_Series.drop_duplicates(subset='ReferencedSeriesInstanceUID')
# display(clean_refSeries)

# extract series UID column to list for downloading
series_data = clean_refSeries["ReferencedSeriesInstanceUID"].tolist()

# feed series_data to our downloadSeries function
downloadSeries(series_data, collection + "_seg_MRIs")

Downloading https://services.cancerimagingarchive.net/nbia-api/services/v2/getImage?SeriesInstanceUID=1.3.6.1.4.1.14519.5.2.1.1610.1210.718042271595624707876216415765
Downloading https://services.cancerimagingarchive.net/nbia-api/services/v2/getImage?SeriesInstanceUID=1.3.6.1.4.1.14519.5.2.1.1610.1210.135379308549682819864285580214
Downloading https://services.cancerimagingarchive.net/nbia-api/services/v2/getImage?SeriesInstanceUID=1.3.6.1.4.1.14519.5.2.1.1610.1210.505613098334484642800163537050


Unnamed: 0,Series UID,Collection,3rd Party Analysis,Data Description URI,Subject ID,Study UID,Study Description,Study Date,Series Description,Manufacturer,Modality,SOP Class UID,Number of Images,File Size,Series Number,License Name,License URL,Annotation Size
0,1.3.6.1.4.1.14519.5.2.1.1610.1210.718042271595...,ACNS0332,NO,https://doi.org/10.7937/TCIA.582B-XZ89,ACNS0332_PASLXC,1.3.6.1.4.1.14519.5.2.1.1610.1210.210266444704...,BRAIN WWO CONTRDIFF MRA PROT,10-03-1960,T1 AX post,SIEMENS,MR,1.2.840.10008.5.1.4.1.1.4,38,4186976,17.0,NCTN Data Archive License,https://nctn-data-archive.nci.nih.gov/,0
1,1.3.6.1.4.1.14519.5.2.1.1610.1210.135379308549...,ACNS0332,NO,https://doi.org/10.7937/TCIA.582B-XZ89,ACNS0332_PASLXC,1.3.6.1.4.1.14519.5.2.1.1610.1210.210266444704...,BRAIN WWO CONTRDIFF MRA PROT,10-03-1960,T1 Coro Post,SIEMENS,MR,1.2.840.10008.5.1.4.1.1.4,33,14450190,19.0,NCTN Data Archive License,https://nctn-data-archive.nci.nih.gov/,0
2,1.3.6.1.4.1.14519.5.2.1.1610.1210.505613098334...,ACNS0332,NO,https://doi.org/10.7937/TCIA.582B-XZ89,ACNS0332_PASLXC,1.3.6.1.4.1.14519.5.2.1.1610.1210.210266444704...,BRAIN WWO CONTRDIFF MRA PROT,10-03-1960,T1 SAG post gad,SIEMENS,MR,1.2.840.10008.5.1.4.1.1.4,38,20063194,18.0,NCTN Data Archive License,https://nctn-data-archive.nci.nih.gov/,0


The following code downloads the MRI scans that received a negative finding assessment, meaning the authors of the dataset found nothing in them to annotate.  These scans can serve as negative examples if you are training a tumor or metastasis detection model.

In [None]:
# Use case: Download only images with negative finding assessments
# USE THIS OPTION IF YOU WANT TO REVIEW THE ADDITIONAL 
# MRI SCANS WHERE THE AUTHORS INDICATED THERE WAS NOTHING TO ANNOTATE 

# filter dataframe to only include MRIs with "no findings"
ref_Series = annotation_Metadata[annotation_Metadata['StructureSetLabel'] == 'No Findings']

# drop rows that repeat a ReferencedSeriesInstanceUID
clean_refSeries = ref_Series.drop_duplicates(subset='ReferencedSeriesInstanceUID')

# extract series UID column to list for downloading
series_data = clean_refSeries["ReferencedSeriesInstanceUID"].tolist()

# feed series_data to our downloadSeries function
downloadSeries(series_data, collection + "_noFinding_MRIs")

Downloading https://services.cancerimagingarchive.net/nbia-api/services/v2/getImage?SeriesInstanceUID=1.3.6.1.4.1.14519.5.2.1.1610.1210.255212824149495026818639807813
Downloading https://services.cancerimagingarchive.net/nbia-api/services/v2/getImage?SeriesInstanceUID=1.3.6.1.4.1.14519.5.2.1.1610.1210.139264764389857635566195644225
Downloading https://services.cancerimagingarchive.net/nbia-api/services/v2/getImage?SeriesInstanceUID=1.3.6.1.4.1.14519.5.2.1.1610.1210.182833106195538721839114305481


Unnamed: 0,Series UID,Collection,3rd Party Analysis,Data Description URI,Subject ID,Study UID,Study Description,Study Date,Series Description,Manufacturer,Modality,SOP Class UID,Number of Images,File Size,Series Number,License Name,License URL,Annotation Size
0,1.3.6.1.4.1.14519.5.2.1.1610.1210.255212824149...,ACNS0332,NO,https://doi.org/10.7937/TCIA.582B-XZ89,ACNS0332_PASLXC,1.3.6.1.4.1.14519.5.2.1.1610.1210.116504503026...,MRI C-SPINE W WO CONTRAST,10-25-1960,T1 AX Post bottom,GE MEDICAL SYSTEMS,MR,1.2.840.10008.5.1.4.1.1.4,26,3494660,17.0,NCTN Data Archive License,https://nctn-data-archive.nci.nih.gov/,0
1,1.3.6.1.4.1.14519.5.2.1.1610.1210.139264764389...,ACNS0332,NO,https://doi.org/10.7937/TCIA.582B-XZ89,ACNS0332_PASLXC,1.3.6.1.4.1.14519.5.2.1.1610.1210.116504503026...,MRI C-SPINE W WO CONTRAST,10-25-1960,T1 Sag POST,GE MEDICAL SYSTEMS,MR,1.2.840.10008.5.1.4.1.1.4,14,7386330,14.0,NCTN Data Archive License,https://nctn-data-archive.nci.nih.gov/,0
2,1.3.6.1.4.1.14519.5.2.1.1610.1210.182833106195...,ACNS0332,NO,https://doi.org/10.7937/TCIA.582B-XZ89,ACNS0332_PASLXC,1.3.6.1.4.1.14519.5.2.1.1610.1210.210266444704...,BRAIN WWO CONTRDIFF MRA PROT,10-03-1960,DTI AXADC,SIEMENS,MR,1.2.840.10008.5.1.4.1.1.4,30,2321322,11.0,NCTN Data Archive License,https://nctn-data-archive.nci.nih.gov/,0
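
If you plan to use the negative scans for training a detection model, one possible way to organize them is a per-series label table. The sketch below is an illustration rather than the authors' method. It assumes that a series counts as positive when any SEG or seed point row references it; `build_finding_labels` and the `has_finding` column are hypothetical names, and the toy rows use placeholder UIDs.

```python
import pandas as pd

def build_finding_labels(df):
    """Return a boolean Series indexed by referenced MRI series UID.

    True means at least one SEG or seed point row references the series;
    False means no such row references it.
    """
    annotated = (df["DICOM Type"] == "SEG") | (df["StructureSetLabel"] == "Seed Points")
    rows = df.assign(has_finding=annotated).dropna(subset=["ReferencedSeriesInstanceUID"])
    # a series is positive if any of its rows is an annotation
    return rows.groupby("ReferencedSeriesInstanceUID")["has_finding"].any()

# Toy stand-in for annotation_Metadata; the UIDs are placeholders, not real data.
toy = pd.DataFrame({
    "DICOM Type": ["SEG", "SEG", "RTSS", "RTSS", "RTSS"],
    "StructureSetLabel": [None, None, "Seed Points", "Seed Points", "No Findings"],
    "ReferencedSeriesInstanceUID": ["mr-A", "mr-B", "mr-A", "mr-C", "mr-D"],
})

labels = build_finding_labels(toy)
print(labels)
```

Whether a series-level label is appropriate depends on your model; the annotation summary page is the authoritative description of what "No Findings" covers.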


# Acknowledgements
[The Cancer Imaging Archive (TCIA)](https://www.cancerimagingarchive.net/) is a service which de-identifies and hosts a large publicly available archive of medical images of cancer.  TCIA is funded by the [Cancer Imaging Program (CIP)](https://imaging.cancer.gov/), a part of the United States [National Cancer Institute (NCI)](https://www.cancer.gov/), and is managed by the [Frederick National Laboratory for Cancer Research (FNLCR)](https://frederick.cancer.gov/).

This notebook was created by [Justin Kirby](https://www.linkedin.com/in/justinkirby82/), [Petr Jordan](https://www.linkedin.com/in/petrjordan/) and Qinyan Pan.  If you use ACNS0332 or any other TCIA dataset in your work, please be sure to comply with the [TCIA Data Usage Policy](https://wiki.cancerimagingarchive.net/x/c4hF). Upon receiving access, you must also abide by the terms of the NCTN/NCORP Data Archive’s Data Use Agreement (DUA). You are not allowed to redistribute the data or use it for other purposes. Attribution should include references to the following citations:

## Data Citations

1. Hwang, E. I., Kool, M., Burger, P. C., Capper, D., Chavez, L., Brabetz, S., Williams-Hughes, C., Billups, C., Heier, L., Jaju, A., Michalski, J., Li, Y., Leary, S., Zhou, T., von Deimling, A., Jones, D. T. W., Fouladi, M., Pollack, I. F., Gajjar, A., … Olson, J. M. (2021). Chemotherapy and Radiation Therapy in Treating Young Patients With Newly Diagnosed, Previously Untreated, High-Risk Medulloblastoma/PNET (ACNS0332) [Data set]. The Cancer Imaging Archive. https://doi.org/10.7937/TCIA.582B-XZ89
2. Rozenfeld, M., & Jordan, P. (2022). Annotations for Chemotherapy and Radiation Therapy in Treating Young Patients With Newly Diagnosed, Previously Untreated, High-Risk Medulloblastoma/PNET (ACNS0332-Tumor-Annotations) (Version 1) [Data set]. The Cancer Imaging Archive. https://doi.org/10.7937/D8A8-6252

## Publication Citation

Hwang, E. I., Kool, M., Burger, P. C., Capper, D., Chavez, L., Brabetz, S., Williams-Hughes, C., Billups, C., Heier, L., Jaju, A., Michalski, J., Li, Y., Leary, S., Zhou, T., von Deimling, A., Jones, D. T. W., Fouladi, M., Pollack, I. F., Gajjar, A., … Olson, J. M. (2018). Extensive Molecular and Clinical Heterogeneity in Patients With Histologically Diagnosed CNS-PNET Treated as a Single Entity: A Report From the Children’s Oncology Group Randomized ACNS0332 Trial. Journal of Clinical Oncology, 36(34), 3388–3395. https://doi.org/10.1200/jco.2017.76.4720. PMID: 30332335.

## TCIA Citation

Clark, K., Vendt, B., Smith, K., Freymann, J., Kirby, J., Koppel, P., Moore, S., Phillips, S., Maffitt, D., Pringle, M., Tarbox, L., & Prior, F. (2013). The Cancer Imaging Archive (TCIA): Maintaining and Operating a Public Information Repository. Journal of Digital Imaging, 26(6), 1045–1057. https://doi.org/10.1007/s10278-013-9622-7