<a href="https://colab.research.google.com/github/stratis-forge/radiomics-workflows/blob/main/demo_CERR_Radiomics_on_IDC_dataset.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction
This tutorial demonstrates radiomics calculations using STRATIS on a cohort of patients from the [NCI's IDC](https://portal.imaging.datacommons.cancer.gov/about/) repository. 

We extract features from CT scans of 10 non-small cell lung cancer (NSCLC) patients from the [Lung1](https://wiki.cancerimagingarchive.net/display/Public/NSCLC-Radiomics) dataset.




## Requirements


* GNU Octave for Debian/Linux with the `statistics`, `io` & `image` packages (distributed via MSKCC Box).
* [CERR](https://github.com/cerr/CERR/tree/octave_dev)
* Python libraries bridging Octave & Python: `oct2py` and `octave_kernel`

 *See [installation instructions for different operating systems](https://github.com/stratis-forge/installation-and-dependencies).*  
 
 *Note: Octave and CERR can be downloaded to Google Drive and [mounted](https://colab.research.google.com/notebooks/io.ipynb) to reduce runtime overheads.*

---


## Octave and CERR locations

In [1]:
# Specify paths
octave_path = '/content/octave'
cerr_path = '/content/CERR'

## Install dependencies

Uncomment (`Ctrl + /`) and evaluate the following cells to download GNU Octave
and CERR.

In [2]:
%%capture
# # Download Octave
# oct_build_box = 'https://mskcc.box.com/shared/static/ylfkha0p66oc8v5kh2z1qx9m13n0ijcx.gz'
# oct_save_path = '/content/octave_7.3.0.tar.gz'
# ! wget {oct_build_box} -O {oct_save_path}
# ! tar xf {oct_save_path}
# ! rm {oct_save_path}

In [3]:
#%%capture
# # Download CERR
# ! cd "$(dirname -- "$cerr_path")" && git clone --depth 1 --single-branch \
# --branch octave_dev https://www.github.com/cerr/CERR.git \
# && cd "$cerr_path" && git checkout e888db94b74b2a1b409c9eac52110fe9b001a21e

Cloning into 'CERR'...
remote: Enumerating objects: 3786, done.
remote: Counting objects: 100% (3786/3786), done.
remote: Compressing objects: 100% (3001/3001), done.
remote: Total 3786 (delta 897), reused 2838 (delta 641), pack-reused 0
Receiving objects: 100% (3786/3786), 418.87 MiB | 18.21 MiB/s, done.
Resolving deltas: 100% (897/897), done.
Updating files: 100% (3596/3596), done.


In [4]:
#%%capture 
# # Download Octave dependencies
# ! apt-get update
# ! cd /usr/lib/x86_64-linux-gnu/ && ln -s libhdf5_serial.so.103 libhdf5_serial.so.100 && ln -s libreadline.so.8 libreadline.so.7
# ! apt-get install libgraphicsmagick++-q16-12 libcholmod3 libcxsparse3 \
# libumfpack5 libspqr2 libqrupdate1 libfftw3-3 libgfortran4 gnuplot openjdk-8-jdk

# # Install Oct2py package for Python-Octave communication
# ! pip3 install octave_kernel
# ! pip3 install oct2py==5.3.0

In [5]:
# Set path to Octave executable
import os, urllib.request, json
os.environ['OCTAVE_EXECUTABLE'] = octave_path + '/bin/octave-cli'
os.environ['PATH'] = octave_path + '/bin:' + os.environ['PATH']

# Enable Octave magic
%load_ext oct2py.ipython
from oct2py import octave
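
Before loading `oct2py`, it can help to confirm that the paths set above actually exist. The following is a minimal sketch and not part of the original workflow: `missing_paths` is a hypothetical helper name, and the `bin/octave-cli` layout is taken from the cell above.

```python
import os

def missing_paths(octave_path, cerr_path):
    """Return the expected paths that do not exist on disk."""
    expected = [
        os.path.join(octave_path, "bin", "octave-cli"),  # Octave executable used above
        cerr_path,                                        # CERR checkout
    ]
    return [p for p in expected if not os.path.exists(p)]

# Example with the Colab defaults used in this notebook
for p in missing_paths("/content/octave", "/content/CERR"):
    print("Missing:", p)
```

If nothing is printed, both locations were found and the Octave magic should be able to start.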

## Set up a GCP BigQuery project (Adapted from [IDC tutorials](https://github.com/ImagingDataCommons/IDC-Tutorials/blob/master/notebooks/getting_started/part1_prerequisites.ipynb))





1.   ####  A Google account  
  Log in to an existing Google account, or see [instructions](https://accounts.google.com/signup/v2/webcreateaccount?dsh=308321458437252901&continue=https%3A%2F%2Faccounts.google.com%2FManageAccount&flowName=GlifWebSignIn&flowEntry=SignUp#FirstName=&LastName=) to create a new one.

2. #### Activate GCP and create a GCP project  
 
  *  Go to https://console.cloud.google.com/, and accept Terms and conditions.
  *  Click `Select a project` in the upper-left corner of the GCP console.
  *  Click `Create new project`.
  *  Open the GCP console menu icon `☰` and select `Dashboard` to display project information. Copy your `Project ID` and insert it in place of `REPLACE_ME_WITH_YOUR_PROJECT_ID` in the cell below.   


3. #### Add the bigquery-public-data project
  * Open the BigQuery console: https://console.cloud.google.com/bigquery, and click the `+ ADD DATA` button.
  * Select `Star project by name` and type in `bigquery-public-data` as the project name.


## Authentication for Google services

In [6]:
# Initialize Google Cloud Project ID
my_ProjectID = "REPLACE_ME_WITH_YOUR_PROJECT_ID"  # Replace with your project ID

import os
os.environ["GCP_PROJECT_ID"] = my_ProjectID
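
A stray space or an unreplaced placeholder in the project ID tends to surface only later as an opaque BigQuery error. The sketch below checks the value up front. `clean_project_id` is a hypothetical helper, the example ID is made up, and the format rule in the regular expression (6 to 30 characters, lowercase letters, digits and hyphens, starting with a letter) is an assumption about GCP project IDs that you may want to confirm in the GCP documentation.

```python
import re

PLACEHOLDER = "REPLACE_ME_WITH_YOUR_PROJECT_ID"

def clean_project_id(raw):
    """Strip whitespace and reject the placeholder or malformed IDs."""
    project_id = raw.strip()
    if project_id == PLACEHOLDER:
        raise ValueError("Project ID placeholder was not replaced")
    # Assumed format: 6-30 chars, lowercase letters, digits, hyphens,
    # starting with a letter and not ending with a hyphen.
    if not re.fullmatch(r"[a-z][a-z0-9-]{4,28}[a-z0-9]", project_id):
        raise ValueError(f"Unexpected project ID format: {project_id!r}")
    return project_id

# Hypothetical project ID with accidental surrounding spaces
print(clean_project_id(" my-idc-project-123 "))
```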

In [7]:
from google.colab import auth
auth.authenticate_user()

## Download selected cohort

Query the `idc_current` dataset and output a dataframe with URLs for patients from the `Lung1` dataset.   

In [8]:
from google.cloud import bigquery

bq_client = bigquery.Client(my_ProjectID) # BigQuery client is initialized with 
                                          # user-input project ID  

selection_query = """
SELECT
  PatientID,
  StudyInstanceUID,
  SeriesInstanceUID,
  collection_id,
  gcs_url
FROM
  `bigquery-public-data.idc_current.dicom_all`
WHERE
  Modality IN ("CT",
    "RTSTRUCT")
  AND Collection_ID = "nsclc_radiomics"
"""
selection_result = bq_client.query(selection_query)
cohort_df = selection_result.result().to_dataframe()

#cohort_df
cohort_df.info()  # info() prints its summary and returns None

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 51652 entries, 0 to 51651
Data columns (total 5 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   PatientID          51652 non-null  object
 1   StudyInstanceUID   51652 non-null  object
 2   SeriesInstanceUID  51652 non-null  object
 3   collection_id      51652 non-null  object
 4   gcs_url            51652 non-null  object
dtypes: object(5)
memory usage: 2.0+ MB
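
The query result has many more rows than patients, since each row appears to correspond to one DICOM file. A per-patient summary makes this easier to inspect. The sketch below uses a small toy dataframe with the same column names as `cohort_df`; `summarize_cohort` is a hypothetical helper and the UIDs and URLs are placeholders.

```python
import pandas as pd

# Toy stand-in for cohort_df with the same columns as the query result
toy_df = pd.DataFrame({
    "PatientID": ["LUNG1-001", "LUNG1-001", "LUNG1-001", "LUNG1-002"],
    "SeriesInstanceUID": ["1.1", "1.1", "1.2", "2.1"],
    "gcs_url": ["gs://bucket/a", "gs://bucket/b", "gs://bucket/c", "gs://bucket/d"],
})

def summarize_cohort(df):
    """Count files and distinct series per patient."""
    return df.groupby("PatientID").agg(
        n_files=("gcs_url", "size"),
        n_series=("SeriesInstanceUID", "nunique"),
    )

summary = summarize_cohort(toy_df)
print(summary)
```

Calling `summarize_cohort(cohort_df)` on the real result should work the same way, assuming the column names shown in the query above.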

We will use CT scans and GTV segmentations from a random subset of 10 patients for this demonstration.

In [9]:
import pandas as pd
import random

num_sample = 10
# random.sample needs a sequence; sampling from a set is deprecated since Python 3.9
all_patients = sorted(set(cohort_df["PatientID"]))
sample_patients = random.sample(all_patients, num_sample)
selected_df = cohort_df[cohort_df["PatientID"].isin(sample_patients)]

display(selected_df)


Unnamed: 0,PatientID,StudyInstanceUID,SeriesInstanceUID,collection_id,gcs_url
1,LUNG1-023,1.3.6.1.4.1.32722.99.99.7486199564895728038835...,1.3.6.1.4.1.32722.99.99.1724029149513556880669...,nsclc_radiomics,gs://idc-open-cr/eb4c012a-1204-4da3-8669-3f197...
101,LUNG1-112,1.3.6.1.4.1.32722.99.99.1049352317400962548827...,1.3.6.1.4.1.32722.99.99.5493778159864037311318...,nsclc_radiomics,gs://idc-open-cr/620f606d-487e-485c-998d-cfc75...
124,LUNG1-188,1.3.6.1.4.1.32722.99.99.3273357274711148574189...,1.3.6.1.4.1.32722.99.99.2407852723406492228554...,nsclc_radiomics,gs://idc-open-cr/5917e5bb-6100-4b2c-b75f-6cf15...
138,LUNG1-128,1.3.6.1.4.1.32722.99.99.2376717734365131368170...,1.3.6.1.4.1.32722.99.99.5451739623359018242391...,nsclc_radiomics,gs://idc-open-cr/25abc243-ea56-4d94-bb9e-b8e64...
189,LUNG1-383,1.3.6.1.4.1.32722.99.99.9008629664281541137327...,1.3.6.1.4.1.32722.99.99.9883740591383182586083...,nsclc_radiomics,gs://idc-open-cr/8d7c0ffc-5def-4f25-84ee-e4f14...
...,...,...,...,...,...
50936,LUNG1-214,1.3.6.1.4.1.32722.99.99.6659142950812631671418...,1.3.6.1.4.1.32722.99.99.3217785867892313959650...,nsclc_radiomics,gs://idc-open-cr/8f5853fa-7bbe-4760-984f-60adf...
50952,LUNG1-188,1.3.6.1.4.1.32722.99.99.3273357274711148574189...,1.3.6.1.4.1.32722.99.99.2407852723406492228554...,nsclc_radiomics,gs://idc-open-cr/9e5b6940-cad1-42a5-8e33-c42d9...
50964,LUNG1-023,1.3.6.1.4.1.32722.99.99.7486199564895728038835...,1.3.6.1.4.1.32722.99.99.1724029149513556880669...,nsclc_radiomics,gs://idc-open-cr/e1537c80-c5c3-48e9-80fa-61daa...
50994,LUNG1-383,1.3.6.1.4.1.32722.99.99.9008629664281541137327...,1.3.6.1.4.1.32722.99.99.9883740591383182586083...,nsclc_radiomics,gs://idc-open-cr/92f48245-64a4-4384-94b1-0b783...
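
Because the subset is drawn at random, every run of the notebook selects different patients. If a repeatable selection is preferred, one option is to sort the IDs and sample with a seeded generator. This is a sketch of that idea, not what the notebook does; `sample_patients_reproducibly` and the seed value are illustrative, and the ID list is synthetic.

```python
import random

def sample_patients_reproducibly(patient_ids, k, seed=0):
    """Draw k distinct patient IDs; the same seed gives the same draw."""
    ordered = sorted(set(patient_ids))   # sets are unordered, so sort first
    rng = random.Random(seed)            # local generator, leaves global state alone
    return rng.sample(ordered, k)

# Synthetic IDs in the style of the Lung1 collection
ids = [f"LUNG1-{i:03d}" for i in range(1, 51)]
print(sample_patients_reproducibly(ids, 10, seed=42))
```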


Save URLs into a manifest file for download using `gsutil` from the Google Cloud SDK*. 

*Note: Google Cloud SDK is pre-installed on Colab, but will need to be installed if downloading directly to your computer.* 

In [10]:
%%capture
#Save URLs to manifest file
selected_df["gcs_url"].to_csv("manifest.txt", header=False, index=False)

#Download images to /content/downloaded_cohort_files
!rm -rf downloaded_cohort_files && mkdir downloaded_cohort_files
!cat manifest.txt | gsutil -m cp -I downloaded_cohort_files
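
Since the download cell runs under `%%capture`, failures are easy to miss. One simple safeguard is to record how many URLs went into the manifest and compare that with the number of files that arrive. The sketch below shows the manifest half on toy data; `write_manifest` is a hypothetical helper and the bucket name is a placeholder.

```python
import os
import tempfile
import pandas as pd

def write_manifest(df, path):
    """Write one gs:// URL per line and return the number of lines written."""
    urls = df["gcs_url"].dropna().drop_duplicates()
    urls.to_csv(path, header=False, index=False)
    return len(urls)

# Toy data with one duplicated URL
toy_df = pd.DataFrame({"gcs_url": ["gs://example-bucket/a.dcm",
                                   "gs://example-bucket/b.dcm",
                                   "gs://example-bucket/a.dcm"]})
manifest_path = os.path.join(tempfile.mkdtemp(), "manifest.txt")
n_written = write_manifest(toy_df, manifest_path)

# Read the manifest back to confirm its contents
with open(manifest_path) as fh:
    lines = [line.strip() for line in fh if line.strip()]
print(n_written, "URLs written;", len(lines), "lines read back")
```

After the download, `len(os.listdir("downloaded_cohort_files"))` can be compared against `n_written`, assuming each URL yields exactly one file.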

## Install *DICOMSort* to organize downloaded data



In [11]:
%%capture
!git clone https://github.com/pieper/dicomsort src/dicomsort
!pip install -r /content/src/dicomsort/requirements.txt

Organize DICOM files into one directory per patient ID.

In [12]:
%%capture
# Run  DICOMSort 
!python src/dicomsort/dicomsort.py -u /content/downloaded_cohort_files/ /content/organized_cohort_files/dicom/%PatientID/%SOPInstanceUID.dcm

# Delete temporary directory for unsorted DICOM data 
!rm -rf /content/downloaded_cohort_files/
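
To confirm that sorting produced one folder per patient, the file counts can be tallied from Python. This is a minimal sketch on a temporary toy directory tree; `count_dicom_per_patient` is a hypothetical helper, and it assumes the `<PatientID>/<SOPInstanceUID>.dcm` layout requested in the DICOMSort call above.

```python
from pathlib import Path
import tempfile

def count_dicom_per_patient(dicom_root):
    """Map each patient sub-directory to its number of .dcm files."""
    root = Path(dicom_root)
    return {
        patient_dir.name: sum(1 for _ in patient_dir.glob("*.dcm"))
        for patient_dir in sorted(root.iterdir())
        if patient_dir.is_dir()
    }

# Toy directory tree standing in for /content/organized_cohort_files/dicom
root = Path(tempfile.mkdtemp())
for patient, n in [("LUNG1-023", 3), ("LUNG1-112", 2)]:
    (root / patient).mkdir()
    for i in range(n):
        (root / patient / f"{i}.dcm").touch()

print(count_dicom_per_patient(root))
```

On the real data, the number of keys should equal the number of sampled patients.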

## Add CERR to GNU Octave path and load required Octave packages

In [13]:
%octave_push cerr_path

In [14]:
%%capture
%%octave
# Load required Octave packages
pkg load statistics
pkg load image
pkg load io

# Add CERR to path
cd(cerr_path)
addToPath2(cerr_path)

## Import DICOM data to CERR's `planC` format

In [15]:
%%capture
%%octave
sourceDir = '/content/organized_cohort_files/dicom';
destDir = '/content/organized_cohort_files/cerr/';
zipFlag = 'No';
mergeFlag = 'No';
singleCerrFileFlag = 'No';
init_ML_DICOM;
batchConvertWithSubDirs(sourceDir,destDir,zipFlag,mergeFlag,singleCerrFileFlag);

# Extract radiomics features

In [16]:
%%capture
%%octave

#Path to radiomics settings
#-- Extract CT radiomic features as defined in IBSI-1 
paramFileNameIBSI1 = fullfile(getCERRPath,'ModelImplementationLibrary','RadiomicsModels',...
                          'Settings','paramsForIBSI1CtRadiomics.json'); 
#-- Extract first-order features from IBSI2-compatible filter response maps
paramFileNameIBSI2 = fullfile(getCERRPath,'ModelImplementationLibrary','RadiomicsModels',...
                          'Settings','paramsForCtRadiomicsWithIBSI2Filters.json');


strName = 'GTV-1';

#Extract features
destDir = '/content/organized_cohort_files/cerr/';
[featIBSI1S,err1C] = batchExtractRadiomics(destDir,paramFileNameIBSI1);
[featIBSI2S,err2C] = batchExtractRadiomics(destDir,paramFileNameIBSI2);


#Write to CSV
outputFile1 = '/content/radiomicFeaturesIbsi1.csv'; 
outputFile2 = '/content/radiomicFeaturesIbsi2.csv'; 
selectField = ['struct_',strName];
selectField = strrep(selectField,'-','_');
featS = [featIBSI1S.(selectField)];
writeFeaturesToCSV(featS,outputFile1);
featS = [featIBSI2S.(selectField)];
writeFeaturesToCSV(featS,outputFile2);
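
The cell above looks up features under a field name derived from the structure name: it prepends `struct_` and replaces hyphens with underscores, presumably so the name is usable as a struct field. The Python sketch below mirrors those two Octave lines so the mapping is easy to check for other structure names; `structure_field_name` is a hypothetical helper, not a CERR function.

```python
def structure_field_name(structure_name):
    """Mirror the Octave lines above: prefix with 'struct_' and replace '-' with '_'."""
    return ("struct_" + structure_name).replace("-", "_")

print(structure_field_name("GTV-1"))  # struct_GTV_1
```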

#### View results

In [17]:
%octave_pull outputFile1 outputFile2

In [18]:
import pandas as pd

# Display IBSI1-compatible radiomics 
df = pd.read_csv(outputFile1)
df.head(10) # shows top 10 rows

Unnamed: 0,id,Shape_majorAxis,Shape_minorAxis,Shape_leastAxis,Shape_flatness,Shape_elongation,Shape_max3dDiameter,Shape_max2dDiameterAxialPlane,Shape_max2dDiameterSagittalPlane,Shape_max2dDiameterCoronalPlane,...,Original_ivhFeaturesS_MOCx90,Original_ivhFeaturesS_Vx10,Original_ivhFeaturesS_Vx20,Original_ivhFeaturesS_Vx30,Original_ivhFeaturesS_Vx40,Original_ivhFeaturesS_Vx50,Original_ivhFeaturesS_Vx60,Original_ivhFeaturesS_Vx70,Original_ivhFeaturesS_Vx80,Original_ivhFeaturesS_Vx90
0,Pt1,4.1278,2.6151,2.2414,0.543,0.63354,4.5646,4.0618,3.2565,4.4792,...,-13.056,0.99002,0.97922,0.96088,0.94438,0.8857,0.48166,0.027099,0.00652,0.002445
1,Pt2,11.884,8.9577,8.5502,0.71946,0.75375,13.245,13.076,13.147,10.851,...,25.808,0.9912,0.98087,0.9689,0.95441,0.93473,0.67382,0.01268,0.003951,0.001736
2,Pt3,3.1185,2.0705,1.9923,0.63889,0.66396,3.6434,2.7081,3.5358,3.5019,...,55.291,0.99858,0.99716,0.99574,0.99183,0.98863,0.97974,0.96731,0.70185,0.01919
3,Pt4,1.8146,1.5612,1.2366,0.68147,0.86036,2.0997,1.8015,2.03,1.7198,...,-118.74,0.9621,0.90587,0.84841,0.79829,0.70905,0.61369,0.48166,0.19315,0.018337
4,Pt5,6.7549,3.9893,3.7535,0.55567,0.59058,8.4657,7.4026,7.9612,4.966,...,-25.704,0.98085,0.95509,0.92783,0.89191,0.81913,0.35764,0.019382,0.004613,0.001584
5,Pt6,4.8503,3.551,2.297,0.47358,0.73213,5.8994,5.0414,4.9588,3.7859,...,-30.463,0.99315,0.98294,0.96298,0.9392,0.89557,0.81608,0.55686,0.11708,0.003133
6,Pt7,12.833,9.5395,6.2367,0.486,0.74336,14.872,11.53,12.627,14.825,...,9.318,0.99962,0.99905,0.99796,0.99581,0.98105,0.78015,0.061853,0.000906,5.7e-05
7,Pt8,3.5576,1.6615,1.1609,0.32631,0.46702,3.5573,1.8961,3.5358,3.323,...,-107.95,0.9649,0.93789,0.89829,0.85509,0.77318,0.68137,0.55986,0.28983,0.029703
8,Pt9,11.358,9.3159,6.9069,0.60811,0.82021,13.312,11.836,12.733,12.021,...,24.663,0.99972,0.99893,0.99749,0.99498,0.99014,0.9684,0.006976,0.000492,6.9e-05
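
The feature table is wide, and column names carry a prefix that identifies the feature group (for example `Shape_`). A prefix filter is one simple way to inspect a single group. The sketch below uses two rows and a few columns copied from the table above; `select_feature_group` is a hypothetical helper.

```python
import pandas as pd

# Two rows copied from the table above (a few columns only)
toy = pd.DataFrame({
    "id": ["Pt1", "Pt2"],
    "Shape_majorAxis": [4.1278, 11.884],
    "Shape_minorAxis": [2.6151, 8.9577],
    "Original_ivhFeaturesS_Vx10": [0.99002, 0.9912],
})

def select_feature_group(df, prefix):
    """Return the id column plus all feature columns starting with prefix."""
    cols = [c for c in df.columns if c.startswith(prefix)]
    return df[["id"] + cols]

shape_df = select_feature_group(toy, "Shape_")
print(shape_df.columns.tolist())
```

The same call on the full dataframe, `select_feature_group(df, "Shape_")`, should return all shape features, assuming the `id` column is present as shown.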


In [20]:
# Display first-order statistics from IBSI2-compatible filter responses
df = pd.read_csv(outputFile2)
# shows top 10 rows
df.head(10)

Unnamed: 0,id,Mean_kernelSize555_voxelSize_mm111_firstOrderS_min,Mean_kernelSize555_voxelSize_mm111_firstOrderS_max,Mean_kernelSize555_voxelSize_mm111_firstOrderS_mean,Mean_kernelSize555_voxelSize_mm111_firstOrderS_range,Mean_kernelSize555_voxelSize_mm111_firstOrderS_std,Mean_kernelSize555_voxelSize_mm111_firstOrderS_var,Mean_kernelSize555_voxelSize_mm111_firstOrderS_median,Mean_kernelSize555_voxelSize_mm111_firstOrderS_skewness,Mean_kernelSize555_voxelSize_mm111_firstOrderS_kurtosis,...,Gabor_voxSz111mm_Sigma5mm_AR1_5_wavLen2mm_OrientAvg_22_54567_590112_5135157_5180202_5225247_5270292_5315337_5360_firstOrderS_totalEnergy,Gabor_voxSz111mm_Sigma5mm_AR1_5_wavLen2mm_OrientAvg_22_54567_590112_5135157_5180202_5225247_5270292_5315337_5360_firstOrderS_meanAbsDev,Gabor_voxSz111mm_Sigma5mm_AR1_5_wavLen2mm_OrientAvg_22_54567_590112_5135157_5180202_5225247_5270292_5315337_5360_firstOrderS_medianAbsDev,Gabor_voxSz111mm_Sigma5mm_AR1_5_wavLen2mm_OrientAvg_22_54567_590112_5135157_5180202_5225247_5270292_5315337_5360_firstOrderS_P10,Gabor_voxSz111mm_Sigma5mm_AR1_5_wavLen2mm_OrientAvg_22_54567_590112_5135157_5180202_5225247_5270292_5315337_5360_firstOrderS_P90,Gabor_voxSz111mm_Sigma5mm_AR1_5_wavLen2mm_OrientAvg_22_54567_590112_5135157_5180202_5225247_5270292_5315337_5360_firstOrderS_robustMeanAbsDev,Gabor_voxSz111mm_Sigma5mm_AR1_5_wavLen2mm_OrientAvg_22_54567_590112_5135157_5180202_5225247_5270292_5315337_5360_firstOrderS_robustMedianAbsDev,Gabor_voxSz111mm_Sigma5mm_AR1_5_wavLen2mm_OrientAvg_22_54567_590112_5135157_5180202_5225247_5270292_5315337_5360_firstOrderS_interQuartileRange,Gabor_voxSz111mm_Sigma5mm_AR1_5_wavLen2mm_OrientAvg_22_54567_590112_5135157_5180202_5225247_5270292_5315337_5360_firstOrderS_coeffDispersion,Gabor_voxSz111mm_Sigma5mm_AR1_5_wavLen2mm_OrientAvg_22_54567_590112_5135157_5180202_5225247_5270292_5315337_5360_firstOrderS_coeffVariation
0,Pt1,-866.47,199.13,-30.779,1065.6,115.59,13361.0,5.5555,-2.9605,10.851,...,176770000.0,45.606,42.963,37.646,174.29,29.183,28.333,68.115,0.40757,0.65942
1,Pt2,-771.69,406.46,16.889,1178.2,111.16,12357.0,49.251,-3.1694,11.592,...,10156000000.0,50.699,41.408,50.868,200.43,23.539,20.999,40.651,0.25691,0.75119
2,Pt3,-367.06,132.13,56.45,499.19,47.3,2237.3,65.644,-4.5657,25.031,...,43250000.0,10.586,10.053,58.12,87.62,5.2144,5.1888,11.877,0.085917,0.23375
3,Pt4,-650.92,33.118,-176.89,684.03,141.44,20006.0,-163.24,-0.48187,-0.54373,...,221720000.0,32.287,32.284,251.39,355.82,22.329,22.326,53.248,0.08726,0.13344
4,Pt5,-846.69,183.26,-64.414,1030.0,158.83,25225.0,8.4727,-1.8631,3.0199,...,2691800000.0,99.159,92.808,52.565,348.35,71.342,68.023,165.47,0.55038,0.74896
5,Pt6,-597.32,187.92,-45.294,785.24,101.09,10219.0,-13.117,-1.5696,2.5982,...,584340000.0,65.538,63.882,44.563,249.57,47.369,46.797,114.96,0.46857,0.61548
6,Pt7,-679.84,186.46,13.188,866.3,37.516,1407.5,22.912,-4.3537,34.132,...,2026100000.0,22.352,18.651,24.472,68.135,6.9409,6.8617,15.419,0.2042,0.8747
7,Pt8,-733.24,46.678,-218.74,779.92,165.21,27294.0,-195.44,-0.58162,-0.31782,...,689950000.0,70.808,70.272,334.36,564.9,51.243,50.982,123.4,0.14081,0.19636
8,Pt9,-724.75,328.48,24.721,1053.2,36.837,1357.0,32.099,-5.4911,42.519,...,2363300000.0,30.324,22.521,27.985,108.23,10.683,8.9671,12.218,0.16228,0.91321
