You can download and run this notebook locally, or run it for free in a cloud environment using Colab or SageMaker Studio Lab:

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/kirbyju/TCIA_Notebooks/blob/main/TCIA_Series_UID_Report.ipynb)

[![Open In SageMaker Studio Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github.com/kirbyju/TCIA_Notebooks/blob/main/TCIA_Series_UID_Report.ipynb)

# Summary

This notebook generates summary reports that help you understand the contents of a TCIA manifest or a spreadsheet/list of TCIA Series Instance UIDs.

[TCIA manifest files are used with the NBIA Data Retriever](https://wiki.cancerimagingarchive.net/x/egOnAg) to download DICOM data from TCIA. Manifest files for downloading full collections are available on each collection's homepage. Custom manifests can also be created via our search portal at https://nbia.cancerimagingarchive.net.

You can also create a spreadsheet or list of Series Instance UIDs of interest with the Export Metadata function on the "cart" page of https://nbia.cancerimagingarchive.net or with the [REST API](https://wiki.cancerimagingarchive.net/x/NIIiAQ).

This notebook first creates a series-level metadata report and then uses that metadata as input to the **reportCollectionSummary()** or **reportDoiSummary()** functions in **tcia_utils**. These functions ingest the Series Instance UIDs or their metadata (series_data) and return the following:

    - Modalities: List of unique values by collection
    - Licenses: List of unique values by collection
    - Manufacturers: List of unique values by collection
    - Body Parts: List of unique values by collection
    - Subjects: Number of subjects by collection
    - Studies: Number of studies by collection
    - Series: Number of series by collection
    - Images: Number of images by collection
    - Disk Space: Formatted as KB/MB/GB/TB/PB by collection

Parameters:

    series_data: The input data to be summarized (expects JSON by default).
    input_type: Defaults to dataframe if not populated.
                Set to 'list' for a Python list, or 'manifest' for a *.TCIA manifest file.
                If manifest is used, series_data should be the path to the TCIA manifest file.
    format (str): Output format (default is dataframe, 'csv' for CSV file, 'chart' for charts).
    api_url: Only necessary if input_type = list or manifest.
             Set to 'restricted' for limited-access collections or
             'nlst' for the National Lung Screening Trial.
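For illustration, a per-collection summary like the one described above can be sketched with pandas `groupby`. This is not the tcia_utils implementation: the column names (`Collection`, `PatientID`, `StudyInstanceUID`, `SeriesInstanceUID`, `Modality`, `ImageCount`, `FileSize`) and the rows are hypothetical, only a subset of the fields is computed, and the disk-space formatter assumes 1024-based units.

```python
import pandas as pd


def format_bytes(num_bytes: float) -> str:
    """Format a byte count as KB/MB/GB/TB/PB, assuming 1024-based steps."""
    units = ["KB", "MB", "GB", "TB", "PB"]
    value = num_bytes / 1024  # start at KB
    for unit in units:
        # stop at the first unit that keeps the value below 1024 (or at PB)
        if value < 1024 or unit == units[-1]:
            return f"{value:.2f} {unit}"
        value /= 1024


def summarize_by_collection(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical per-collection summary of series-level metadata."""
    grouped = df.groupby("Collection")
    return pd.DataFrame({
        "Modalities": grouped["Modality"].unique().apply(sorted),
        "Subjects": grouped["PatientID"].nunique(),
        "Studies": grouped["StudyInstanceUID"].nunique(),
        "Series": grouped["SeriesInstanceUID"].nunique(),
        "Images": grouped["ImageCount"].sum(),
        "Disk Space": grouped["FileSize"].sum().apply(format_bytes),
    })


# Toy metadata with hypothetical column names and values (not real TCIA data)
series = pd.DataFrame({
    "Collection": ["A", "A", "A", "B"],
    "PatientID": ["p1", "p1", "p2", "p3"],
    "StudyInstanceUID": ["s1", "s1", "s2", "s3"],
    "SeriesInstanceUID": ["1.1", "1.2", "1.3", "1.4"],
    "Modality": ["CT", "SEG", "CT", "MR"],
    "ImageCount": [100, 1, 120, 30],
    "FileSize": [52_428_800, 1_048_576, 62_914_560, 3_145_728],
})

print(summarize_by_collection(series))
```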

# 1 Setup

Install the latest release of [**tcia_utils**](https://pypi.org/project/tcia-utils/) and import the modules we need.

In [None]:
import sys

# install tcia_utils
!{sys.executable} -m pip install --upgrade -q tcia_utils

In [None]:
import sys
import requests
import pandas as pd
from tcia_utils import nbia

# set logging level to INFO in Google Colab (not necessary in Jupyter)
if 'google.colab' in sys.modules:
    import logging

    # remove any existing handlers so basicConfig() takes effect
    for handler in logging.root.handlers[:]:
        logging.root.removeHandler(handler)

    # set handler with level = INFO
    logging.basicConfig(format='%(asctime)s:%(levelname)s:%(message)s',
                        level=logging.INFO)

    print("Google Colab Logging set to INFO")

# 2 Create a Token (optional)

If you're working with any restricted collections, you must enter your TCIA login and password to create a token. If not, you can skip this step.

In [None]:
nbia.getToken()

# 3 Prepare your Series UIDs

To use a file from your hard drive in Colab, upload it with the file browser in the left sidebar, then skip the optional web-download cell below.

To import a file from the web (e.g., TCIA), replace the URL and filename in the next cell with those of the file you want to analyze, then run it.

In [None]:
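import requests

# OPTIONAL: only needed to import your UID file directly from the web
url = "https://cancerimagingarchive.net/wp-content/uploads/YourManifest.tcia"
downloaded_filename = "YourManifest.tcia"

# download the file and stop with an error if the request failed
response = requests.get(url, timeout=60)
response.raise_for_status()

# save the raw bytes to disk
with open(downloaded_filename, 'wb') as f:
    f.write(response.content)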

Next, read the UIDs from your file into a Python list. If you're using a manifest file, the code below collects the Series UIDs and ignores the parameter lines at the top of the file.

If you're using a custom text/CSV file of UIDs, every row is added to the list, so verify that the file is formatted correctly **(one UID per row with no column header or commas)**; otherwise you may encounter errors.
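The parsing rule described above can be sketched in plain Python. This is an illustrative stand-in, not the code behind `nbia.manifestToList()`: it assumes the typical manifest layout, where parameter lines such as `downloadServerUrl=...` precede a `ListOfSeriesToDownload=` line followed by one Series UID per line, and it treats any file without that marker as a plain one-UID-per-row list. It also flags rows that are not syntactically valid DICOM UIDs (dot-separated numeric components, no leading zeros, at most 64 characters), which catches stray headers or commas. The helper names, the placeholder URL, and the sample UIDs are hypothetical.

```python
import re

# DICOM UID syntax: dot-separated numeric components, no leading zeros
# (a component may be exactly "0"), and at most 64 characters in total.
UID_PATTERN = re.compile(r"(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*))*")
MANIFEST_MARKER = "ListOfSeriesToDownload="


def read_series_uids(text: str) -> list[str]:
    """Hypothetical helper: return the Series UIDs in a manifest or plain UID list."""
    lines = [line.strip() for line in text.splitlines()]
    for i, line in enumerate(lines):
        if line.startswith(MANIFEST_MARKER):
            # Manifest: drop the parameter lines and the marker line itself
            lines = lines[i + 1:]
            break
    # Plain lists (and manifests after the marker) keep every non-blank row
    return [line for line in lines if line]


def invalid_uids(uids: list[str]) -> list[str]:
    """Hypothetical helper: return entries that are not syntactically valid DICOM UIDs."""
    return [uid for uid in uids if len(uid) > 64 or not UID_PATTERN.fullmatch(uid)]


# Placeholder manifest text in the typical layout (not a real TCIA manifest)
sample_manifest = """downloadServerUrl=https://example.invalid/servlet
includeAnnotation=true
ListOfSeriesToDownload=
1.2.3.4.5.6.1
1.2.3.4.5.6.2
"""

sample_uids = read_series_uids(sample_manifest)
print(sample_uids)
# A header row, a leading-zero component, and a comma are all flagged
print(invalid_uids(sample_uids + ["SeriesInstanceUID", "1.2.03", "1.2,3"]))
```

After running the next cell, a check such as `invalid_uids(uids)` on the list returned by `nbia.manifestToList()` could help spot malformed rows before requesting metadata.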

In [None]:
# enter manifest (or UID list) path/filename
manifest = "YourManifest.tcia"

# convert the manifest to a list of Series UIDs
uids = nbia.manifestToList(manifest)

print(f"Your data has been imported: {len(uids)} Series UIDs.")

# 4 Download series metadata

The next step creates a dataframe and saves it as **series_metadata.csv**. For each series, it contains the Collection Name, Subject ID, Study UID, Study Description, Study Date, Series UID, Series Description, Series Number, Number of Images, File Size (Bytes), Modality, Manufacturer, Data Description URI (DOI), third-party analysis status, License Name, and License URL.

**Note:** Because of its size (more than 26,000 patients!), the [National Lung Screening Trial](https://doi.org/10.7937/TCIA.HMQ8-J677) resides on a separate server. To create a report about this collection, run the second cell below instead of the first.

In [None]:
# use an empty api_url to default to the main NBIA server
api_url = ""

df = nbia.getSeriesList(uids, api_url = api_url)
df = nbia.formatSeriesInput(df, input_type = "df", api_url = api_url)

display(df)
df.to_csv("series_metadata.csv")

In [None]:
# set api_url for NLST server
api_url = "nlst"

df = nbia.getSeriesList(uids, api_url = api_url)
df = nbia.formatSeriesInput(df, input_type = "df", api_url = api_url)

display(df)
df.to_csv("series_metadata.csv")
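Before summarizing, it can be worth confirming that metadata came back for every UID you requested; a mistyped UID, a restricted series requested without a token, or an NLST UID sent to the main server might simply be absent from the results. The sketch below compares the two with a Python set. It assumes the dataframe stores UIDs in a `SeriesInstanceUID` column (check this against the column names shown by `display(df)`), and the helper name and toy values are hypothetical.

```python
import pandas as pd


def missing_series(requested: list[str], metadata: pd.DataFrame,
                   uid_column: str = "SeriesInstanceUID") -> list[str]:
    """Hypothetical helper: return requested UIDs that have no row in the metadata."""
    returned = set(metadata[uid_column].astype(str))
    # dict.fromkeys drops repeated UIDs while keeping their original order
    return [uid for uid in dict.fromkeys(requested) if uid not in returned]


# Hypothetical example: three UIDs requested, metadata returned for two
requested_uids = ["1.2.3.1", "1.2.3.2", "1.2.3.3"]
metadata = pd.DataFrame({"SeriesInstanceUID": ["1.2.3.1", "1.2.3.3"]})
print(missing_series(requested_uids, metadata))
```

In this notebook, `missing_series(uids, df)` would run the same check on your own data, provided the UID column really has that name.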

# 5 Create the summary report
Now we can use the downloaded metadata to create the summary report. You can divide the report by collection or by DOI.

In [None]:
nbia.reportCollectionSummary(df, format = "chart")

DOI-based reports are particularly useful when trying to understand manifests or series UID lists that contain [Analysis Result datasets](https://www.cancerimagingarchive.net/tcia-analysis-results/).

In [None]:
nbia.reportDoiSummary(df, format = "chart")
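A DOI-level view can separate series that a collection-level view lumps together, such as an analysis result stored alongside the original images. The sketch below illustrates the difference on toy data with pandas; the `DataDescriptionURI` column name and the placeholder DOIs (using the invalid `10.0000` prefix) are hypothetical and are not taken from the tcia_utils output.

```python
import pandas as pd

# Hypothetical series metadata: collection "A" holds series from two DOIs,
# e.g. original images plus an analysis result (placeholder DOIs)
series = pd.DataFrame({
    "Collection": ["A", "A", "A", "B"],
    "DataDescriptionURI": [
        "https://doi.org/10.0000/original-a",
        "https://doi.org/10.0000/original-a",
        "https://doi.org/10.0000/analysis-x",
        "https://doi.org/10.0000/original-b",
    ],
    "SeriesInstanceUID": ["1.1", "1.2", "1.3", "1.4"],
})

# Counting series by collection folds the analysis result into "A" ...
by_collection = series.groupby("Collection")["SeriesInstanceUID"].nunique()
# ... while counting by DOI reports it on its own row
by_doi = series.groupby("DataDescriptionURI")["SeriesInstanceUID"].nunique()

print(by_collection)
print(by_doi)
# A collection-by-DOI cross-tabulation shows both views at once
print(pd.crosstab(series["Collection"], series["DataDescriptionURI"]))
```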

# Acknowledgements
TCIA is funded by the [Cancer Imaging Program (CIP)](https://imaging.cancer.gov/), a part of the United States [National Cancer Institute (NCI)](https://www.cancer.gov/), and is managed by the [Frederick National Laboratory for Cancer Research (FNLCR)](https://frederick.cancer.gov/).

This notebook was created by [Justin Kirby](https://www.linkedin.com/in/justinkirby82/).  If you leverage this notebook or any TCIA datasets in your work, please be sure to comply with the [TCIA Data Usage Policy](https://wiki.cancerimagingarchive.net/x/c4hF). In particular, make sure to cite the DOI(s) for the specific TCIA datasets you used in addition to the following paper!

## TCIA Citation

Clark, K., Vendt, B., Smith, K., Freymann, J., Kirby, J., Koppel, P., Moore, S., Phillips, S., Maffitt, D., Pringle, M., Tarbox, L., & Prior, F. (2013). The Cancer Imaging Archive (TCIA): Maintaining and Operating a Public Information Repository. Journal of Digital Imaging, 26(6), 1045–1057. https://doi.org/10.1007/s10278-013-9622-7