# TUTORIAL

## Setup 

By default, the client uses the NBIA Public API with the guest login credentials, which have access to all public data.

To use your own credentials, pass them to the client as parameters:

 `NBIAClient(username="YOUR_USERNAME", password="YOUR_PASSWORD")`
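To avoid hard-coding a password in a notebook, one option is to read the credentials from environment variables. The variable names `NBIA_USERNAME` and `NBIA_PASSWORD` and the helper `credentials_from_env` are hypothetical and not part of nbiatoolkit. The sketch only assumes that `NBIAClient` accepts the `username` and `password` keywords shown above.

```python
import os
from typing import Dict, Mapping, Optional


def credentials_from_env(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build NBIAClient keyword arguments from environment variables.

    NBIA_USERNAME / NBIA_PASSWORD are hypothetical names chosen for this
    example. If either is missing, an empty dict is returned so the client
    keeps its default guest credentials.
    """
    env = os.environ if env is None else env
    username = env.get("NBIA_USERNAME")
    password = env.get("NBIA_PASSWORD")
    if username and password:
        return {"username": username, "password": password}
    return {}
```

Usage: `client = NBIAClient(**credentials_from_env())` falls back to the guest login when the variables are not set.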

In [3]:
# Install nbiatoolkit using pip

!pip install nbiatoolkit;



In [1]:
from nbiatoolkit import NBIAClient
from pprint import pprint

# Instantiate the client. 
client = NBIAClient(log_level='info')


In [3]:
# TODO: expose nbiatoolkit.__version__ so the following works:
# import nbiatoolkit
# print(nbiatoolkit.__version__)
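Until `nbiatoolkit.__version__` is available, the installed version can be read from the package metadata with the standard library (`importlib.metadata`, Python 3.8+). The helper name `installed_version` is illustrative only.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "nbiatoolkit") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


print("nbiatoolkit version:", installed_version())
```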

## Get Collection Methods

### get list of collections (names only)
`getCollections(prefix: str = "")`

In [4]:
collections = client.getCollections()
print("Total collections: ", len(collections))
print(collections[0:5])

Total collections:  123
['4D-Lung', 'ACRIN-6698', 'ACRIN-Contralateral-Breast-MR', 'ACRIN-FLT-Breast', 'ACRIN-NSCLC-FDG-PET']


In [5]:
collections = client.getCollections(prefix = "TCGA")
print(collections)

['TCGA-BLCA', 'TCGA-BRCA', 'TCGA-CESC', 'TCGA-COAD', 'TCGA-ESCA', 'TCGA-KICH', 'TCGA-KIRC', 'TCGA-KIRP', 'TCGA-LIHC', 'TCGA-LUAD', 'TCGA-LUSC', 'TCGA-OV', 'TCGA-PRAD', 'TCGA-READ', 'TCGA-SARC', 'TCGA-STAD', 'TCGA-THCA', 'TCGA-UCEC']
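The `prefix` argument narrows the list to collections whose names start with the given string. If you already hold the full list, a comparable filter can be applied locally. This sketch assumes a case-sensitive match at the start of the name, which may differ from how the server compares names; `filter_by_prefix` is a hypothetical helper.

```python
from typing import Iterable, List


def filter_by_prefix(names: Iterable[str], prefix: str = "") -> List[str]:
    """Keep names that start with prefix (case-sensitive; "" keeps everything)."""
    return [name for name in names if name.startswith(prefix)]


sample = ["4D-Lung", "ACRIN-6698", "TCGA-BLCA", "TCGA-BRCA", "ACRIN-FLT-Breast"]
print(filter_by_prefix(sample, "TCGA"))  # ['TCGA-BLCA', 'TCGA-BRCA']
```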


### get Counts of Patients for each collection
`getCollectionPatientCount(prefix: str = "")`

In [6]:
collectionsPatientCount = client.getCollectionPatientCount()
pprint(collectionsPatientCount[0:5])

[{'Collection': '4D-Lung', 'PatientCount': 20},
 {'Collection': 'ACRIN-6698', 'PatientCount': 385},
 {'Collection': 'ACRIN-Contralateral-Breast-MR', 'PatientCount': 984},
 {'Collection': 'ACRIN-FLT-Breast', 'PatientCount': 83},
 {'Collection': 'ACRIN-NSCLC-FDG-PET', 'PatientCount': 242}]


In [7]:
collectionsPatientCount = client.getCollectionPatientCount(prefix="TCGA")
pprint(collectionsPatientCount)

# get the collection with max PatientCount
print("Collection with max PatientCount: ", max(collectionsPatientCount, key=lambda x:x['PatientCount']))

[{'Collection': 'TCGA-BLCA', 'PatientCount': 120},
 {'Collection': 'TCGA-BRCA', 'PatientCount': 139},
 {'Collection': 'TCGA-CESC', 'PatientCount': 54},
 {'Collection': 'TCGA-COAD', 'PatientCount': 25},
 {'Collection': 'TCGA-ESCA', 'PatientCount': 16},
 {'Collection': 'TCGA-KICH', 'PatientCount': 15},
 {'Collection': 'TCGA-KIRC', 'PatientCount': 267},
 {'Collection': 'TCGA-KIRP', 'PatientCount': 33},
 {'Collection': 'TCGA-LIHC', 'PatientCount': 97},
 {'Collection': 'TCGA-LUAD', 'PatientCount': 69},
 {'Collection': 'TCGA-LUSC', 'PatientCount': 37},
 {'Collection': 'TCGA-OV', 'PatientCount': 143},
 {'Collection': 'TCGA-PRAD', 'PatientCount': 14},
 {'Collection': 'TCGA-READ', 'PatientCount': 3},
 {'Collection': 'TCGA-SARC', 'PatientCount': 5},
 {'Collection': 'TCGA-STAD', 'PatientCount': 46},
 {'Collection': 'TCGA-THCA', 'PatientCount': 6},
 {'Collection': 'TCGA-UCEC', 'PatientCount': 65}]
Collection with max PatientCount:  {'Collection': 'TCGA-KIRC', 'PatientCount': 267}
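`max` returns only the single largest collection. To rank several collections, the same list of dictionaries can be sorted by `PatientCount`. The records below are a subset of the TCGA output above, and `top_collections` is a hypothetical helper, not part of nbiatoolkit. Converting with `int()` also handles counts that arrive as strings.

```python
from typing import Dict, List


def top_collections(counts: List[Dict], n: int = 3) -> List[Dict]:
    """Return the n records with the largest PatientCount, largest first."""
    return sorted(counts, key=lambda rec: int(rec["PatientCount"]), reverse=True)[:n]


counts = [
    {"Collection": "TCGA-BLCA", "PatientCount": 120},
    {"Collection": "TCGA-BRCA", "PatientCount": 139},
    {"Collection": "TCGA-KIRC", "PatientCount": 267},
    {"Collection": "TCGA-OV", "PatientCount": 143},
    {"Collection": "TCGA-READ", "PatientCount": 3},
]
for rec in top_collections(counts):
    print(rec["Collection"], rec["PatientCount"])
# TCGA-KIRC 267
# TCGA-OV 143
# TCGA-BRCA 139
```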


### get Counts of Patients grouped by Body Parts
`getBodyPartCounts(Collection: str = "", Modality: str = "")`

In [8]:
bodypart_count = client.getBodyPartCounts()
print("Total Number of body parts:" + str(len(bodypart_count)))

print("First 5 body parts:")
pprint(bodypart_count[0:5])

Total Number of body parts:60
First 5 body parts:
[{'BodyPartExamined': 'NOT SPECIFIED', 'Count': '7839'},
 {'BodyPartExamined': 'ABDOMEN', 'Count': '1731'},
 {'BodyPartExamined': 'ABDOMEN CAVIT', 'Count': '2'},
 {'BodyPartExamined': 'ABDOMENPELVIC', 'Count': '2'},
 {'BodyPartExamined': 'ABDOMENPELVIS', 'Count': '50'}]
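Note that `Count` comes back as a string (for example `'7839'`), so sorting on it directly is lexicographic (`'50'` would rank above `'1731'`). A minimal sketch that converts the counts to integers before ranking, using a few records copied from the output above; `ranked_body_parts` is a hypothetical helper:

```python
from typing import Dict, List, Tuple


def ranked_body_parts(records: List[Dict[str, str]]) -> List[Tuple[str, int]]:
    """Return (BodyPartExamined, count) pairs sorted by count, largest first."""
    pairs = [(rec["BodyPartExamined"], int(rec["Count"])) for rec in records]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


records = [
    {"BodyPartExamined": "ABDOMEN CAVIT", "Count": "2"},
    {"BodyPartExamined": "ABDOMEN", "Count": "1731"},
    {"BodyPartExamined": "ABDOMENPELVIS", "Count": "50"},
    {"BodyPartExamined": "NOT SPECIFIED", "Count": "7839"},
]
print(ranked_body_parts(records))
# [('NOT SPECIFIED', 7839), ('ABDOMEN', 1731), ('ABDOMENPELVIS', 50), ('ABDOMEN CAVIT', 2)]
```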


In [9]:
bodypart_count = client.getBodyPartCounts(Collection = '4D-Lung')
print("Total Number of body parts:" + str(len(bodypart_count)))

print("Number of patients for each body part in 4D-Lung collection:")
pprint(bodypart_count)

Total Number of body parts:1
Number of patients for each body part in 4D-Lung collection:
[{'BodyPartExamined': 'LUNG', 'Count': '20'}]


In [10]:
bodypart_count = client.getBodyPartCounts(Collection = 'NSCLC Radiogenomics', Modality='CT')
print("Total Number of body parts:" + str(len(bodypart_count)))

print("Number of patients for each body part in NSCLC Radiogenomics collection (CT):")
pprint(bodypart_count)

Total Number of body parts:5
Number of patients for each body part in NSCLC Radiogenomics collection (CT):
[{'BodyPartExamined': 'NOT SPECIFIED', 'Count': '194'},
 {'BodyPartExamined': 'ABDOMEN', 'Count': '11'},
 {'BodyPartExamined': 'CHEST', 'Count': '54'},
 {'BodyPartExamined': 'HEART', 'Count': '2'},
 {'BodyPartExamined': 'THORAX', 'Count': '1'}]


## Get Patient Methods

### get Patient IDs by Collection and Modality
`getPatients(Collection: str, Modality: str)`

In [11]:
patients = client.getPatients(Collection="4D-Lung", Modality="CT")
print(patients)

['100_HM10395', '101_HM10395', '102_HM10395', '103_HM10395', '104_HM10395', '105_HM10395', '106_HM10395', '107_HM10395', '108_HM10395', '109_HM10395', '110_HM10395', '111_HM10395', '112_HM10395', '113_HM10395', '114_HM10395', '115_HM10395', '116_HM10395', '117_HM10395', '118_HM10395', '119_HM10395']
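To collect patient IDs for several collections at once, `getPatients` can be called in a loop. The helper below is a hypothetical sketch that reuses the keyword arguments from the call above (`Collection`, `Modality`); each iteration makes one API request.

```python
from typing import Dict, Iterable, List


def patients_by_collection(client, collections: Iterable[str], modality: str) -> Dict[str, List[str]]:
    """Map each collection name to the patient IDs returned by client.getPatients()."""
    return {
        name: client.getPatients(Collection=name, Modality=modality)
        for name in collections
    }
```

Usage: `patients_by_collection(client, ["4D-Lung", "NSCLC Radiogenomics"], "CT")`.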


## Get Series Methods

### get Series Data using parameters

``` python
getSeries(
        Collection: str = "", 
        PatientID: str = "",
        StudyInstanceUID: str = "",
        Modality: str = "",
        SeriesInstanceUID: str = "",
        BodyPartExamined: str = "",
        ManufacturerModelName: str = "",
        Manufacturer: str = "") 
```

In [12]:
# Get all the series in the NSCLC Radiogenomics collection
seriesJSON = client.getSeries(Collection="NSCLC Radiogenomics")
print(f"There are {len(seriesJSON)} series in the NSCLC Radiogenomics collection.")
print("First series:")
pprint(seriesJSON[0])

There are 1351 series in the NSCLC Radiogenomics collection.
First series:
{'Collection': 'NSCLC Radiogenomics',
 'CollectionURI': 'https://doi.org/10.7937/K9/TCIA.2017.7hs46erv',
 'FileSize': 146022934,
 'ImageCount': 277,
 'LicenseName': 'Creative Commons Attribution 3.0 Unported License',
 'LicenseURI': 'http://creativecommons.org/licenses/by/3.0/',
 'Manufacturer': 'GE MEDICAL SYSTEMS',
 'ManufacturerModelName': 'LightSpeed16',
 'Modality': 'CT',
 'PatientID': 'R01-016',
 'ProtocolName': '5.1 CT CHEST ROUTINE',
 'SeriesDate': '1990-10-16 00:00:00.0',
 'SeriesDescription': 'Recon 3: CT CHEST W/O',
 'SeriesInstanceUID': '1.3.6.1.4.1.14519.5.2.1.4334.1501.447728746190490431768087181101',
 'SeriesNumber': 4,
 'SoftwareVersions': 'LightSpeedverrel',
 'StudyInstanceUID': '1.3.6.1.4.1.14519.5.2.1.4334.1501.162279367121844086709629964588',
 'TimeStamp': '2017-12-12 12:58:26.0'}
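Each series record includes `Modality`, `ImageCount`, and `FileSize`, which is enough to estimate the size of a download before starting it. A minimal sketch, assuming `FileSize` is in bytes (146022934 for 277 CT images, roughly 0.5 MB per image, is consistent with that, but the unit is not stated in the output). `summarize_series` is a hypothetical helper; the two sample records reuse values from the outputs in this section.

```python
from collections import defaultdict
from typing import Dict, List


def summarize_series(series: List[dict]) -> Dict[str, dict]:
    """Aggregate series count, image count, and size in MB per modality."""
    totals: Dict[str, dict] = defaultdict(lambda: {"series": 0, "images": 0, "bytes": 0})
    for rec in series:
        entry = totals[rec.get("Modality", "UNKNOWN")]
        entry["series"] += 1
        entry["images"] += int(rec.get("ImageCount", 0))
        entry["bytes"] += int(rec.get("FileSize", 0))  # assumed to be bytes
    return {
        modality: {"series": e["series"], "images": e["images"], "MB": round(e["bytes"] / 1e6, 1)}
        for modality, e in totals.items()
    }


sample = [
    {"Modality": "CT", "ImageCount": 277, "FileSize": 146022934},
    {"Modality": "CT", "ImageCount": 372, "FileSize": 196233190},
]
print(summarize_series(sample))
# {'CT': {'series': 2, 'images': 649, 'MB': 342.3}}
```

In the tutorial, `summarize_series(seriesJSON)` would summarize the whole collection.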


In [13]:
# Get all the series in the NSCLC Radiogenomics collection for a given body part
seriesByBodyPartJSON = client.getSeries(Collection="NSCLC Radiogenomics", BodyPartExamined="HEART")
print(f"There are {len(seriesByBodyPartJSON)} series in the NSCLC Radiogenomics collection for HEART.")
print("First series:")
pprint(seriesByBodyPartJSON[0])

There are 8 series in the NSCLC Radiogenomics collection for HEART.
First series:
{'BodyPartExamined': 'HEART',
 'Collection': 'NSCLC Radiogenomics',
 'CollectionURI': 'https://doi.org/10.7937/K9/TCIA.2017.7hs46erv',
 'FileSize': 196233190,
 'ImageCount': 372,
 'LicenseName': 'Creative Commons Attribution 3.0 Unported License',
 'LicenseURI': 'http://creativecommons.org/licenses/by/3.0/',
 'Manufacturer': 'SIEMENS',
 'Modality': 'CT',
 'PatientID': 'AMC-015',
 'ProtocolName': 'GATED_CHEST_CTA',
 'SeriesDate': '1992-02-04 00:00:00.0',
 'SeriesDescription': 'Gated Chest  1.0  B25f  BestDiast 70 %',
 'SeriesInstanceUID': '1.3.6.1.4.1.14519.5.2.1.4334.1501.253298261882254993527951068007',
 'SeriesNumber': 5,
 'SoftwareVersions': 'syngo CT 2008G',
 'StudyInstanceUID': '1.3.6.1.4.1.14519.5.2.1.4334.1501.119531128953610472040332469413',
 'TimeStamp': '2017-12-12 13:58:34.0'}
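`BodyPartExamined` appears in this record but not in the first series printed earlier, which suggests the key is omitted when it has no value. If you filter an already retrieved series list locally instead of making another request, use `dict.get` so records without the key are skipped rather than raising `KeyError`. A sketch with a hypothetical helper `filter_series`; note that local exact matching may behave differently from the server-side filter (for example, with respect to case).

```python
from typing import List


def filter_series(series: List[dict], **criteria: str) -> List[dict]:
    """Keep records whose fields equal every given criterion; missing keys never match."""
    return [
        rec for rec in series
        if all(rec.get(field) == value for field, value in criteria.items())
    ]


sample = [
    {"PatientID": "R01-016", "Modality": "CT"},  # no BodyPartExamined key
    {"PatientID": "AMC-015", "Modality": "CT", "BodyPartExamined": "HEART"},
]
print([rec["PatientID"] for rec in filter_series(sample, BodyPartExamined="HEART")])
# ['AMC-015']
```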


## Download Series Methods
``` python
downloadSeries(
    SeriesInstanceUID: Union[str, list],
    downloadDir: str,
    filePattern: str,
    overwrite: bool,
    nParallel: int)
```

In [2]:
# Get all the series in the NSCLC Radiogenomics collection
seriesJSON = client.getSeries(Collection="NSCLC Radiogenomics")

# first get a list of the SeriesInstanceUIDs
seriesUIDS = [series['SeriesInstanceUID'] for series in seriesJSON]
pprint(seriesUIDS[0:5])

['1.3.6.1.4.1.14519.5.2.1.4334.1501.447728746190490431768087181101',
 '1.3.6.1.4.1.14519.5.2.1.4334.1501.322979353264523657170838529817',
 '1.3.6.1.4.1.14519.5.2.1.4334.1501.240772414306783411390403639800',
 '1.3.6.1.4.1.14519.5.2.1.4334.1501.173425581038606308807499332185',
 '1.3.6.1.4.1.14519.5.2.1.4334.1501.137861564599845711151768978695']
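Instead of taking the first five UIDs, you may want every series for particular patients. This sketch groups `SeriesInstanceUID`s by `PatientID` from the `seriesJSON` records; `uids_by_patient` is a hypothetical helper and the sample UIDs are placeholders.

```python
from collections import defaultdict
from typing import Dict, List


def uids_by_patient(series: List[dict]) -> Dict[str, List[str]]:
    """Group SeriesInstanceUIDs by PatientID, preserving input order."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for rec in series:
        groups[rec["PatientID"]].append(rec["SeriesInstanceUID"])
    return dict(groups)


sample = [
    {"PatientID": "R01-016", "SeriesInstanceUID": "1.2.3.1"},  # placeholder UIDs
    {"PatientID": "AMC-015", "SeriesInstanceUID": "1.2.3.2"},
    {"PatientID": "R01-016", "SeriesInstanceUID": "1.2.3.3"},
]
print(uids_by_patient(sample))
# {'R01-016': ['1.2.3.1', '1.2.3.3'], 'AMC-015': ['1.2.3.2']}
```

The resulting list for one patient, e.g. `uids_by_patient(seriesJSON)["R01-016"]`, can then be passed to `client.downloadSeries` in place of `seriesUIDS[0:5]`.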


In [3]:
# download the first 5 series (downloadSeries accepts a single UID or a list of UIDs)
import os
downloadDir = "./data"
os.makedirs(downloadDir, exist_ok=True)
print("Downloading to: " + os.path.abspath(downloadDir))

cores = 4   # number of parallel downloads
client.downloadSeries(
    seriesUIDS[0:5], downloadDir, overwrite=True, nParallel=cores)
    
pprint(os.listdir(downloadDir))
    

Downloading to: /Users/bhklab/Documents/GitHub/NBIA-toolkit/docs/data



Downloading 5 series: 100%|██████████| 5/5 [00:26<00:00,  5.31s/it]

['R01-016',
 'P100',
 'R01-010',
 'R01-019',
 'R01-044',
 'R01-022',
 'UnknownCollection']
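To check what actually landed on disk, you can count the `.dcm` files under each top-level folder of `downloadDir`. This uses only the standard library and assumes nothing about nbiatoolkit beyond files being written with a `.dcm` extension; `count_dicom_files` is a hypothetical helper.

```python
import os
from typing import Dict


def count_dicom_files(root: str) -> Dict[str, int]:
    """Count files ending in .dcm (any case) under each top-level subdirectory of root."""
    counts: Dict[str, int] = {}
    for entry in sorted(os.listdir(root)):
        top = os.path.join(root, entry)
        if not os.path.isdir(top):
            continue  # skip loose files at the top level
        counts[entry] = sum(
            1
            for _dirpath, _dirnames, filenames in os.walk(top)
            for name in filenames
            if name.lower().endswith(".dcm")
        )
    return counts
```

Usage: `pprint(count_dicom_files(downloadDir))`.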





### Configure file names during download

Because NBIA data comes from many different collections, the original file names are not always consistent.

To configure the file names during download, pass a `filePattern` argument to the `downloadSeries` method; the pattern is used by the `DICOMSorter`. For more information on how to write a `filePattern`, see the `nbiatoolkit.DICOMSorter()` class.

The `filePattern` is a string of DICOM tag names, each prefixed with `%`. The tag values are extracted from each DICOM file's metadata and substituted into the file name, and `/` separates directory levels:
- e.g. `%PatientName_%SeriesInstanceUID.dcm` will create a file name from the PatientName and SeriesInstanceUID.
  - note: UIDs are shortened to their final 5 characters to avoid long file names.

The default `filePattern` is `%PatientName/%StudyDescription-%StudyDate/%SeriesNumber-%SeriesDescription-%SeriesInstanceUID/%InstanceNumber.dcm`, which creates the following directory tree:

``` text
PatientName
└── StudyDescription-StudyDate
    └── SeriesNumber-SeriesDescription-SeriesInstanceUID
        └── InstanceNumber.dcm
```
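To see how a pattern turns into a path, here is a simplified re-implementation of the substitution described above. It is a sketch, not `DICOMSorter`'s actual code. It assumes tag names consist of letters only, renders a missing tag as `Unknown` (a hypothetical choice), and applies the five-character shortening to every tag whose name ends in `UID`. The tag values are illustrative.

```python
import re
from typing import Mapping

# A tag is '%' followed by letters, e.g. %PatientName or %SeriesInstanceUID.
TAG_RE = re.compile(r"%([A-Za-z]+)")


def expand_pattern(pattern: str, tags: Mapping[str, object]) -> str:
    """Replace each %Tag in pattern with its value from tags."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        value = str(tags.get(name, "Unknown"))
        # Shorten UIDs to their last five characters to keep paths short.
        return value[-5:] if name.endswith("UID") else value

    return TAG_RE.sub(substitute, pattern)


tags = {
    "PatientName": "R01-016",
    "SeriesNumber": 4,
    "SeriesInstanceUID": "1.3.6.1.4.1.14519.5.2.1.4334.1501.447728746190490431768087181101",
    "InstanceNumber": 1,
}
print(expand_pattern("%PatientName_%SeriesInstanceUID.dcm", tags))
# R01-016_81101.dcm
print(expand_pattern("%PatientName/%SeriesNumber-%SeriesInstanceUID/%InstanceNumber.dcm", tags))
# R01-016/4-81101/1.dcm
```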

In [5]:
client.downloadSeries(
    seriesUIDS[0:5], 
    downloadDir, 
    filePattern="%PatientName/%SeriesNumber-%SeriesInstanceUID/%InstanceNumber-%SOPInstanceUID.dcm",
    overwrite=True, nParallel=5)

Downloading 5 series: 100%|██████████| 5/5 [00:18<00:00,  3.69s/it]


True