<a href="https://colab.research.google.com/github/stratis-forge/radiomics-workflows/blob/main/demo_CERR_Radiomics_on_IDC_dataset.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction
This tutorial demonstrates radiomics calculations using STRATIS on a cohort of patients from the [NCI's IDC](https://portal.imaging.datacommons.cancer.gov/about/) repository. 

We extract features from CT scans of 10 non-small cell lung cancer (NSCLC) patients from the [Lung1](https://wiki.cancerimagingarchive.net/display/Public/NSCLC-Radiomics) dataset.




## Requirements


* GNU Octave with the `statistics`, `io`, and `image` packages (a Debian/Linux build is distributed via MSKCC Box).
* [CERR](https://github.com/cerr/CERR/tree/octave_dev)
* Python libraries bridging Octave & Python: `oct2py` and `octave_kernel`

 *See [installation instructions for different operating systems](https://github.com/stratis-forge/installation-and-dependencies).*  
 
 *Note: Octave and CERR can be downloaded to Google Drive and [mounted](https://colab.research.google.com/notebooks/io.ipynb) to reduce runtime overheads.*

---


## Octave and CERR locations

In [1]:
# Specify paths
octave_path = '/content/octave'
cerr_path = '/content/CERR'

## Install dependencies

Uncomment (`Ctrl + /`) and evaluate the following cells to download GNU Octave and CERR and to install their dependencies.

In [2]:
%%capture
# # Download Octave
# oct_build_box = 'https://mskcc.box.com/shared/static/ylfkha0p66oc8v5kh2z1qx9m13n0ijcx.gz'
# oct_save_path = '/content/octave_7.3.0.tar.gz'
# ! wget {oct_build_box} -O {oct_save_path}
# ! tar xf {oct_save_path}
# ! rm {oct_save_path}

In [3]:
#%%capture
# Download CERR
# ! cd "$(dirname -- "$cerr_path")" && git clone --depth 1 --single-branch \
# --branch octave_dev https://www.github.com/cerr/CERR.git \
# && cd "$cerr_path" && git checkout e888db94b74b2a1b409c9eac52110fe9b001a21e

In [4]:
#%%capture 
# # Download Octave dependencies
# ! apt-get update
# ! cd /usr/lib/x86_64-linux-gnu/ && ln -s libhdf5_serial.so.103 libhdf5_serial.so.100 && ln -s libreadline.so.8 libreadline.so.7
# ! apt-get install libgraphicsmagick++-q16-12 libcholmod3 libcxsparse3 \
# libumfpack5 libspqr2 libqrupdate1 libfftw3-3 libgfortran4 gnuplot openjdk-8-jdk

# # Install Oct2py package for Python-Octave communication
# ! pip3 install octave_kernel
# ! pip3 install oct2py==5.3.0

In [5]:
# # Set path to Octave executable
import os, urllib.request, json
os.environ['OCTAVE_EXECUTABLE'] = octave_path + '/bin/octave-cli'
os.environ['PATH'] = octave_path + '/bin:' + os.environ['PATH']

# Enable Octave magic
%load_ext oct2py.ipython
from oct2py import octave
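
If the Octave magic fails to load, a wrong `octave_path` is a likely cause. The following is a minimal sketch of a pre-flight check, assuming the layout used in the cell above (`<octave_path>/bin/octave-cli`). `find_octave_cli` is a hypothetical helper written for this tutorial, not part of Oct2py.

```python
import os


def find_octave_cli(octave_path):
    """Return the path to octave-cli under octave_path, or raise if unusable.

    Hypothetical helper: assumes the executable lives at
    <octave_path>/bin/octave-cli, as in the cell above.
    """
    exe = os.path.join(octave_path, "bin", "octave-cli")
    if not os.path.isfile(exe):
        raise FileNotFoundError(f"octave-cli not found at {exe}")
    if not os.access(exe, os.X_OK):
        raise PermissionError(f"{exe} exists but is not executable")
    return exe
```

Calling `find_octave_cli(octave_path)` before `%load_ext oct2py.ipython` turns a confusing kernel error into a message that names the missing file.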

## Set up a GCP BigQuery project (Adapted from [IDC tutorials](https://github.com/ImagingDataCommons/IDC-Tutorials/blob/master/notebooks/getting_started/part1_prerequisites.ipynb))





1.   ####  A Google account  
  Log in to an existing Google account or see [instructions](https://accounts.google.com/signup/v2/webcreateaccount?dsh=308321458437252901&continue=https%3A%2F%2Faccounts.google.com%2FManageAccount&flowName=GlifWebSignIn&flowEntry=SignUp#FirstName=&LastName=) to create a new account.

2. #### Activate GCP and create a GCP project  
 
  *  Go to https://console.cloud.google.com/, and accept Terms and conditions.
  *  Click `Select a project` in the upper-left corner of the GCP console.
  *  Click `Create new project`.
  *  Open the GCP console menu icon `☰` and select `Dashboard` to display project information. Copy your `Project ID` and insert it in place of `REPLACE_ME_WITH_YOUR_PROJECT_ID` in the cell below.   


3. #### Add the bigquery-public-data project
  * Open the BigQuery console: https://console.cloud.google.com/bigquery, and click the `+ ADD DATA` button.
  * Select `Star project by name` and type in `bigquery-public-data` as the project name.


## Authentication for Google services

In [6]:
# Initialize Google Cloud Project ID
my_ProjectID = "REPLACE_ME_WITH_YOUR_PROJECT_ID"  # Replace with your project ID

import os
os.environ["GCP_PROJECT_ID"] = my_ProjectID
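
A stray space or an unreplaced placeholder in the project ID typically surfaces only later, as an authentication or query error. The sketch below checks the value up front. `clean_project_id` is a hypothetical helper, and the pattern encodes an assumption about the GCP project ID format (6 to 30 characters, lowercase letters, digits, and hyphens, starting with a letter and not ending with a hyphen).

```python
import re

PLACEHOLDER = "REPLACE_ME_WITH_YOUR_PROJECT_ID"
# Assumed format: 6-30 characters, lowercase letters, digits and hyphens,
# starting with a letter and not ending with a hyphen.
_PROJECT_ID_RE = re.compile(r"[a-z][a-z0-9-]{4,28}[a-z0-9]")


def clean_project_id(raw):
    """Strip whitespace and reject values that cannot be a real project ID."""
    project_id = raw.strip()
    if project_id == PLACEHOLDER:
        raise ValueError("Replace the placeholder with your own GCP project ID")
    if not _PROJECT_ID_RE.fullmatch(project_id):
        raise ValueError(f"{project_id!r} does not look like a GCP project ID")
    return project_id
```

One way to use it is `my_ProjectID = clean_project_id(my_ProjectID)` before the value is passed to any Google client.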

In [7]:
from google.colab import auth
auth.authenticate_user()

## Download selected cohort

Query the `idc_current` dataset and output a dataframe with URLs for patients from the `Lung1` dataset.   

In [8]:
from google.cloud import bigquery

bq_client = bigquery.Client(my_ProjectID) # BigQuery client is initialized with 
                                          # user-input project ID  

selection_query = """
SELECT
  PatientID,
  StudyInstanceUID,
  SeriesInstanceUID,
  collection_id,
  gcs_url
FROM
  `bigquery-public-data.idc_current.dicom_all`
WHERE
  Modality IN ("CT",
    "RTSTRUCT")
  AND Collection_ID = "nsclc_radiomics"
"""
selection_result = bq_client.query(selection_query)
cohort_df = selection_result.result().to_dataframe()

#cohort_df
display(cohort_df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 51652 entries, 0 to 51651
Data columns (total 5 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   PatientID          51652 non-null  object
 1   StudyInstanceUID   51652 non-null  object
 2   SeriesInstanceUID  51652 non-null  object
 3   collection_id      51652 non-null  object
 4   gcs_url            51652 non-null  object
dtypes: object(5)
memory usage: 2.0+ MB


None

We will use CT scans and GTV segmentations from a random subset of 10 patients for this demonstration.

In [9]:
import random

num_sample = 10
# random.sample() needs a sequence; sampling directly from a set is deprecated
# since Python 3.9 and raises TypeError from Python 3.11.
all_patients = sorted(set(cohort_df["PatientID"]))
sample_patients = random.sample(all_patients, num_sample)
selected_df = cohort_df[cohort_df["PatientID"].isin(sample_patients)]

display(selected_df)


Unnamed: 0,PatientID,StudyInstanceUID,SeriesInstanceUID,collection_id,gcs_url
2,LUNG1-096,1.3.6.1.4.1.32722.99.99.4293513726905508119378...,1.3.6.1.4.1.32722.99.99.1755398672387103079081...,nsclc_radiomics,gs://idc-open-cr/0529b666-4081-4a7b-a2de-21dc0...
98,LUNG1-096,1.3.6.1.4.1.32722.99.99.4293513726905508119378...,1.3.6.1.4.1.32722.99.99.1755398672387103079081...,nsclc_radiomics,gs://idc-open-cr/5ff7ad1a-cce5-4d60-a363-75f16...
128,LUNG1-204,1.3.6.1.4.1.32722.99.99.1671472585921516437259...,1.3.6.1.4.1.32722.99.99.2942659339146695041486...,nsclc_radiomics,gs://idc-open-cr/9e3ba571-b75d-40c8-8350-f6605...
210,LUNG1-096,1.3.6.1.4.1.32722.99.99.4293513726905508119378...,1.3.6.1.4.1.32722.99.99.1755398672387103079081...,nsclc_radiomics,gs://idc-open-cr/23f5e1fe-2ccd-4126-b630-307b3...
256,LUNG1-130,1.3.6.1.4.1.32722.99.99.2247831625448944617268...,1.3.6.1.4.1.32722.99.99.1168237517724326299245...,nsclc_radiomics,gs://idc-open-cr/148abc96-b4ad-4480-9516-7d8eb...
...,...,...,...,...,...
50845,LUNG1-022,1.3.6.1.4.1.32722.99.99.2542014380649713731465...,1.3.6.1.4.1.32722.99.99.8681460013110355772969...,nsclc_radiomics,gs://idc-open-cr/46c4b7d0-43e3-44f4-a792-e83c4...
50851,LUNG1-204,1.3.6.1.4.1.32722.99.99.1671472585921516437259...,1.3.6.1.4.1.32722.99.99.2942659339146695041486...,nsclc_radiomics,gs://idc-open-cr/e66c9c2f-7011-4ed7-bb0b-0839d...
50903,LUNG1-239,1.3.6.1.4.1.32722.99.99.1027586798959164522298...,1.3.6.1.4.1.32722.99.99.4811146435637138599915...,nsclc_radiomics,gs://idc-open-cr/cb6d4fc1-09e4-4bd8-8ea9-1242d...
50923,LUNG1-096,1.3.6.1.4.1.32722.99.99.4293513726905508119378...,1.3.6.1.4.1.32722.99.99.1755398672387103079081...,nsclc_radiomics,gs://idc-open-cr/3d7c1be2-3888-4023-88f8-8a65c...
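
Because the subset is drawn at random, it changes on every run, so the feature tables further down cannot be reproduced exactly. A minimal sketch of a seeded alternative follows. `sample_patients_reproducibly` is a hypothetical helper and the default seed is arbitrary.

```python
import random


def sample_patients_reproducibly(patient_ids, k, seed=0):
    """Pick k distinct patient IDs; the same seed always gives the same subset."""
    # Sorting gives a stable order and a list, which random.sample() requires.
    unique_ids = sorted(set(patient_ids))
    if k > len(unique_ids):
        raise ValueError(
            f"Requested {k} patients but only {len(unique_ids)} are available"
        )
    # A private generator leaves the global random state untouched.
    rng = random.Random(seed)
    return rng.sample(unique_ids, k)


# Patient IDs taken from the output above; duplicates are collapsed first.
ids = ["LUNG1-096", "LUNG1-204", "LUNG1-096", "LUNG1-130", "LUNG1-022", "LUNG1-239"]
subset = sample_patients_reproducibly(ids, k=3, seed=42)
```

In the notebook, the same call could take `cohort_df["PatientID"]` as input, with the result passed to `isin()` exactly as before.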


Save URLs into a manifest file for download using `gsutil` from the Google Cloud SDK*. 

*Note: Google Cloud SDK is pre-installed on Colab, but will need to be installed if downloading directly to your computer.* 

In [10]:
%%capture
#Save URLs to manifest file
selected_df["gcs_url"].to_csv("manifest.txt", header=False, index=False)

#Download images to /content/downloaded_cohort_files
!rm -rf downloaded_cohort_files && mkdir downloaded_cohort_files
!cat manifest.txt | gsutil -m cp -I downloaded_cohort_files
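
The download cell runs under `%%capture`, so a partial transfer would pass unnoticed. The sketch below compares the manifest with the download folder. `missing_downloads` is a hypothetical helper, and it assumes each object is saved under the last path component of its URL.

```python
import os


def missing_downloads(manifest_path, download_dir):
    """Return manifest URLs whose file name is absent from download_dir."""
    with open(manifest_path) as fh:
        urls = [line.strip() for line in fh if line.strip()]
    downloaded = set(os.listdir(download_dir))
    # Assumption: each object is stored under the last component of its URL.
    return [url for url in urls if url.rsplit("/", 1)[-1] not in downloaded]
```

An empty list from `missing_downloads("manifest.txt", "downloaded_cohort_files")` suggests the transfer completed. Any URLs it returns can be written to a new manifest and retried.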

## Install *DICOMSort* to organize downloaded data



In [11]:
%%capture
!git clone https://github.com/pieper/dicomsort src/dicomsort
!pip install -r /content/src/dicomsort/requirements.txt

Organize DICOM files into one directory per patient ID.

In [12]:
%%capture
# Run  DICOMSort 
!python src/dicomsort/dicomsort.py -u /content/downloaded_cohort_files/ /content/organized_cohort_files/dicom/%PatientID/%SOPInstanceUID.dcm

# Delete temporary directory for unsorted DICOM data 
!rm -rf /content/downloaded_cohort_files/
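
With the target pattern `%PatientID/%SOPInstanceUID.dcm`, each patient should end up with one folder of `.dcm` files. A quick sketch for inspecting the result before conversion follows. `count_dicom_per_patient` is a hypothetical helper.

```python
from pathlib import Path


def count_dicom_per_patient(dicom_root):
    """Map each patient folder under dicom_root to its number of .dcm files."""
    counts = {}
    for patient_dir in sorted(Path(dicom_root).iterdir()):
        if patient_dir.is_dir():
            counts[patient_dir.name] = sum(1 for _ in patient_dir.glob("*.dcm"))
    return counts
```

Running `count_dicom_per_patient("/content/organized_cohort_files/dicom")` should list as many folders as sampled patients. A folder with very few files may indicate a missing CT series.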

## Add CERR to GNU Octave path and load required Octave packages

In [13]:
%octave_push cerr_path

In [14]:
%%capture
%%octave
# Load required Octave packages
pkg load statistics
pkg load image
pkg load io

# Add CERR to path
cd(cerr_path)
addToPath2(cerr_path)

## Import DICOM data to CERR's `planC` format

In [15]:
%%capture
%%octave
sourceDir = '/content/organized_cohort_files/dicom'  
destDir = '/content/organized_cohort_files/cerr/'  
zipFlag = 'No';
mergeFlag = 'No';
singleCerrFileFlag = 'No';
init_ML_DICOM;
batchConvertWithSubDirs(sourceDir,destDir,zipFlag,mergeFlag,singleCerrFileFlag);

# Extract radiomics features

In [16]:
%%octave

#Path to radiomics settings
#-- Extract CT radiomic features as defined in IBSI-1 
paramFileNameIBSI1 = fullfile(getCERRPath,'ModelImplementationLibrary','RadiomicsModels',...
                          'Settings','paramsForIBSI1CtRadiomics.json'); 
#-- Extract first-order features from IBSI2-compatible filter response maps
paramFileNameIBSI2 = fullfile(getCERRPath,'ModelImplementationLibrary','RadiomicsModels',...
                          'Settings','paramsForCtRadiomicsWithIBSI2Filters.json');


strName = 'GTV-1';

#Extract features
destDir = '/content/organized_cohort_files/cerr/';
[featIBSI1S,err1C] = batchExtractRadiomics(destDir,paramFileNameIBSI1);
[featIBSI2S,err2C] = batchExtractRadiomics(destDir,paramFileNameIBSI2);


#Write to CSV
outputFile1 = '/content/radiomicFeaturesIbsi1.csv'; 
outputFile2 = '/content/radiomicFeaturesIbsi2.csv'; 
selectField = ['struct_',strName];
selectField = strrep(selectField,'-','_');
featS = featIBSI1S.(selectField);
writeFeaturesToCSV(featS,outputFile1);
featS = featIBSI2S.(selectField);
writeFeaturesToCSV(featS,outputFile2);
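
The cell above selects results through a field name derived from the structure name: it prefixes `struct_` and replaces hyphens with underscores. The same rule is sketched in Python below, which may help when choosing a different structure. `struct_field_name` is a hypothetical helper that mirrors only those two steps. Whether CERR applies further substitutions for other characters is not covered here.

```python
def struct_field_name(struct_name):
    """Mirror the Octave cell: prefix 'struct_' and replace '-' with '_'."""
    return ("struct_" + struct_name).replace("-", "_")


print(struct_field_name("GTV-1"))  # struct_GTV_1
```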

## View results

In [17]:
%octave_pull outputFile1 outputFile2

In [18]:
import pandas as pd

# Display IBSI1-compatible radiomics 
df = pd.read_csv(outputFile1)
df.head(10) # shows top 10 rows

Unnamed: 0,id,Shape_majorAxis,Shape_minorAxis,Shape_leastAxis,Shape_flatness,Shape_elongation,Shape_max3dDiameter,Shape_max2dDiameterAxialPlane,Shape_max2dDiameterSagittalPlane,Shape_max2dDiameterCoronalPlane,...,Original_ivhFeaturesS_MOCx90,Original_ivhFeaturesS_Vx10,Original_ivhFeaturesS_Vx20,Original_ivhFeaturesS_Vx30,Original_ivhFeaturesS_Vx40,Original_ivhFeaturesS_Vx50,Original_ivhFeaturesS_Vx60,Original_ivhFeaturesS_Vx70,Original_ivhFeaturesS_Vx80,Original_ivhFeaturesS_Vx90
0,Pt1,2.0522,1.9696,1.325,0.64562,0.95973,2.7423,2.2718,2.3356,2.4107,...,-113.24,0.96733,0.92187,0.875,0.83097,0.78196,0.70526,0.61577,0.52557,0.32244


In [20]:
# Display first-order statistics from IBSI2-compatible filter responses
df = pd.read_csv(outputFile2)
# shows top 10 rows
df.head(10)

Unnamed: 0,id,Shape_majorAxis,Shape_minorAxis,Shape_leastAxis,Shape_flatness,Shape_elongation,Shape_max3dDiameter,Shape_max2dDiameterAxialPlane,Shape_max2dDiameterSagittalPlane,Shape_max2dDiameterCoronalPlane,...,Gabor_voxSz111mm_Sigma5mm_AR1_5_wavLen2mm_OrientAvg_22_54567_590112_5135157_5180202_5225247_5270292_5315337_5360_firstOrderS_totalEnergy,Gabor_voxSz111mm_Sigma5mm_AR1_5_wavLen2mm_OrientAvg_22_54567_590112_5135157_5180202_5225247_5270292_5315337_5360_firstOrderS_meanAbsDev,Gabor_voxSz111mm_Sigma5mm_AR1_5_wavLen2mm_OrientAvg_22_54567_590112_5135157_5180202_5225247_5270292_5315337_5360_firstOrderS_medianAbsDev,Gabor_voxSz111mm_Sigma5mm_AR1_5_wavLen2mm_OrientAvg_22_54567_590112_5135157_5180202_5225247_5270292_5315337_5360_firstOrderS_P10,Gabor_voxSz111mm_Sigma5mm_AR1_5_wavLen2mm_OrientAvg_22_54567_590112_5135157_5180202_5225247_5270292_5315337_5360_firstOrderS_P90,Gabor_voxSz111mm_Sigma5mm_AR1_5_wavLen2mm_OrientAvg_22_54567_590112_5135157_5180202_5225247_5270292_5315337_5360_firstOrderS_robustMeanAbsDev,Gabor_voxSz111mm_Sigma5mm_AR1_5_wavLen2mm_OrientAvg_22_54567_590112_5135157_5180202_5225247_5270292_5315337_5360_firstOrderS_robustMedianAbsDev,Gabor_voxSz111mm_Sigma5mm_AR1_5_wavLen2mm_OrientAvg_22_54567_590112_5135157_5180202_5225247_5270292_5315337_5360_firstOrderS_interQuartileRange,Gabor_voxSz111mm_Sigma5mm_AR1_5_wavLen2mm_OrientAvg_22_54567_590112_5135157_5180202_5225247_5270292_5315337_5360_firstOrderS_coeffDispersion,Gabor_voxSz111mm_Sigma5mm_AR1_5_wavLen2mm_OrientAvg_22_54567_590112_5135157_5180202_5225247_5270292_5315337_5360_firstOrderS_coeffVariation
0,Pt1,2.1212,2.0707,1.4201,0.66947,0.9762,2.8231,2.3707,2.3707,2.4352,...,633710000.0,72.594,72.174,263.63,497.16,50.073,49.98,118.45,0.15005,0.23639
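
Both tables are very wide, so it can help to group columns by the prefix before the first underscore (`Shape`, `Original`, `Gabor`, and so on). The sketch below uses only the standard library. `group_columns_by_family` is a hypothetical helper, and the sample uses three columns and values copied from the IBSI-1 output above.

```python
import csv
import io
from collections import defaultdict


def group_columns_by_family(header):
    """Group feature column names by the text before the first underscore."""
    families = defaultdict(list)
    for name in header:
        if "_" in name:  # skips non-feature columns such as 'id'
            families[name.split("_", 1)[0]].append(name)
    return dict(families)


# Three columns and values copied from the IBSI-1 output shown above.
sample_csv = (
    "id,Shape_majorAxis,Shape_minorAxis,Original_ivhFeaturesS_Vx10\n"
    "Pt1,2.0522,1.9696,0.96733\n"
)
reader = csv.DictReader(io.StringIO(sample_csv))
families = group_columns_by_family(reader.fieldnames)
print({family: len(cols) for family, cols in families.items()})
# {'Shape': 2, 'Original': 1}
```

With pandas, passing `df.columns` to the same function gives the grouping for the full table, and `df[families["Shape"]]` then selects one family at a time.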
