# Radiomics Features Extraction

In this notebook, we extract radiomics features for prostate lesions from every mpMRI modality in the dataset. The lesions were manually segmented by radiologists, and each modality uses its own feature extractor settings.

Radiomics is the process of extracting quantitative features from medical imaging data. These features can provide important information about the characteristics of a lesion, and can be used for a variety of purposes, including diagnosis, prognosis, and treatment planning.
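
To make "quantitative features" concrete, here is a minimal sketch that computes three first-order statistics (mean, variance, and Shannon entropy of binned intensities) over a masked region. It is a toy illustration in pure Python on a 2D list, not the method used by the extractors in this notebook; the function name `first_order_features` and the bin width of 25 are assumptions chosen for the example.

```python
import math
from collections import Counter


def first_order_features(image, mask, bin_width=25):
    """Compute a few first-order statistics over the voxels where mask is non-zero."""
    # Keep only the intensities inside the region of interest
    voxels = [v for img_row, mask_row in zip(image, mask)
              for v, m in zip(img_row, mask_row) if m]
    if not voxels:
        raise ValueError("mask selects no voxels")

    n = len(voxels)
    mean = sum(voxels) / n
    variance = sum((v - mean) ** 2 for v in voxels) / n  # population variance

    # Discretize intensities into fixed-width bins, then take the Shannon entropy
    counts = Counter(math.floor(v / bin_width) for v in voxels)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {"mean": mean, "variance": variance, "entropy": entropy}


# Toy 3x3 "image" with a 2x2 lesion mask in the bottom-right corner
image = [[10, 20, 30],
         [40, 100, 100],
         [50, 200, 200]]
mask = [[0, 0, 0],
        [0, 1, 1],
        [0, 1, 1]]

features = first_order_features(image, mask)
print(features)
```

Real extractors operate on 3D volumes and add texture and shape feature classes on top of statistics like these.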

Before we begin, note that extracting radiomics features can be computationally intensive, so expect the extraction cells in this notebook to take a while to run.

Now, let's get started by setting up our environment and importing the necessary libraries.



In [None]:
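import os
import sys

from config import config  # For reading the config files

# Set up the notebook: reload edited modules automatically
%load_ext autoreload
%autoreload 2

# Make sure we run from the repository root
# (assumes the notebook lives one level below it)
if os.path.basename(os.getcwd()) != 'ai4ar-radiomics':
    os.chdir('..')

# Add src to the import path
if 'src' not in sys.path:
    sys.path.append('src')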

cfg = config(
    ('json', 'config/config.json', True),
    ('json', 'config/config-ext.json', True),
    ('json', 'config/radiomics-test.json', True),
    ignore_missing_paths=True
)

In [None]:
import ai4ar # AI4AR Helper package
from extractor_utils import construct_feature_extractor, extract # Extractor utils from src folder

import pandas as pd # For saving the features

## Dataset initialization

In [None]:
dataset = ai4ar.Dataset(cfg['data_dir'])

In [None]:
# Clinical metadata
dataset[dataset.case_ids[0]].clinical_metadata()

In [None]:
# Radiological metadata
dataset[dataset.case_ids[0]].radiological_metadata()

## Feature Extraction

### Construct the extractors

Extractors are created from the config/radiomics-test.json configuration, one per modality listed under radiomics.settings.extractor.

In [None]:

# Possible feature class names: ['firstorder', 'glcm', 'gldm', 'glrlm', 'glszm', 'ngtdm', 'shape', 'shape2D']

extractors = {}

for modality in cfg['radiomics.settings.extractor'].keys():
    extractors[modality] = construct_feature_extractor(cfg['radiomics.settings.extractor'][modality])
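
Because a misspelled feature class name in the configuration may only surface once extraction starts, it can help to check the names up front. The sketch below validates a per-modality settings mapping against the class names listed in the cell above. The `feature_classes` key, the `unknown_feature_classes` helper, and the `adc` modality are hypothetical; the real layout of config/radiomics-test.json and the behavior of `construct_feature_extractor` are not shown in this notebook.

```python
# Feature class names as listed in the cell above
KNOWN_FEATURE_CLASSES = {
    'firstorder', 'glcm', 'gldm', 'glrlm', 'glszm', 'ngtdm', 'shape', 'shape2D',
}


def unknown_feature_classes(extractor_settings):
    """Return {modality: sorted unknown class names} for a per-modality settings mapping."""
    problems = {}
    for modality, settings in extractor_settings.items():
        requested = settings.get('feature_classes', [])  # hypothetical key name
        unknown = sorted(set(requested) - KNOWN_FEATURE_CLASSES)
        if unknown:
            problems[modality] = unknown
    return problems


# Made-up settings for illustration; 'glmc' is a deliberate typo for 'glcm'
example_settings = {
    't2w': {'feature_classes': ['firstorder', 'glcm', 'shape']},
    'adc': {'feature_classes': ['firstorder', 'glmc']},
}

problems = unknown_feature_classes(example_settings)
print(problems)
```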


### Extract the radiomics features

Create the jobs: for each modality, a copy of the radiological_metadata dataframe that holds the mask path and image data path for every lesion.

In [None]:

jobs_dfs = {}

# Create jobs for modalities in the dataset with proper extractor and store them in a dictionary of dataframes
for modality in extractors.keys():    
    # Create a dataframe with the jobs for this modality
    jobs_dfs[modality] = dataset.radiological_metadata[['patient_id', 'lesion_id', 'radiologist_id', 'label_'+modality]].copy()
    # Rename the label column to mask_path 
    jobs_dfs[modality].rename(columns={'label_'+modality: 'mask_path'}, inplace=True)
    # Add the data path column
    jobs_dfs[modality]['data_path'] = 'data/'+modality
    
    # Drop rows with no mask
    jobs_dfs[modality].dropna(subset=['mask_path'], inplace=True)
    
    # If dataset is empty, remove it
    if jobs_dfs[modality].empty:
        del jobs_dfs[modality]
        continue
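
The same job table can be built on a toy dataframe to see what each step does. This sketch uses a made-up stand-in for `dataset.radiological_metadata` (all values are invented) and a chained style that avoids `inplace=True`; it is an equivalent illustration, not a replacement for the cell above.

```python
import pandas as pd

# Toy stand-in for dataset.radiological_metadata (all values are made up)
metadata = pd.DataFrame({
    'patient_id': ['001', '001', '002'],
    'lesion_id': [1, 2, 1],
    'radiologist_id': ['U_A', 'U_A', 'U_B'],
    'label_t2w': ['masks/001_1.nii.gz', None, 'masks/002_1.nii.gz'],
})

modality = 't2w'
jobs = (
    metadata[['patient_id', 'lesion_id', 'radiologist_id', f'label_{modality}']]
    .rename(columns={f'label_{modality}': 'mask_path'})  # label column becomes mask_path
    .assign(data_path=f'data/{modality}')                # same data path for every row
    .dropna(subset=['mask_path'])                        # drop lesions without a mask
)
print(jobs)
```

The second lesion of patient 001 has no mask for this modality, so it is dropped and the job table keeps two rows with their original index labels.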


Extract the features and write the results for each modality to a CSV file in the dataset's tmp directory. Modalities whose CSV file already exists are skipped.

In [None]:
# Target directory for the features
floc_dir = os.path.join('.', dataset.tmp_dir)

for modality, jobs_df in jobs_dfs.items():
    floc = os.path.join(floc_dir, f'radiomics_{modality}.csv')
    
    if not os.path.exists(floc):
        print(f'Extracting features for {modality}')
        features = extract(dataset, extractors[modality], jobs_df, n_jobs=4)
        print(f'Features for {modality} extracted, saving')
        
        # Save only the successful extractions (failed ones are None)
        pd.DataFrame([f for f in features if f is not None]).to_csv(floc, index=False)

        # Report the lesions for which extraction failed
        failed = [f is None for f in features]
        if any(failed):
            print(f'Extraction failed for {sum(failed)} job(s):')
            print(jobs_df.loc[failed, ['patient_id', 'lesion_id', 'radiologist_id']])
    else:
        print(f'Features for {modality} already extracted, skipping')
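
One caveat of the skip-if-exists pattern is that a run interrupted while writing could leave a partial CSV that later looks finished. A possible safeguard, sketched here with the standard library only, is to write to a temporary file and rename it into place once it is complete. The helper `write_csv_once` and the `fake_extract` stand-in are hypothetical and not part of this repository.

```python
import csv
import os
import tempfile


def write_csv_once(path, compute_rows):
    """Write compute_rows() to path unless the file already exists.

    Returns True if the rows were computed, False if the existing file was reused.
    """
    if os.path.exists(path):
        return False

    rows = compute_rows()  # the expensive step, e.g. feature extraction
    target_dir = os.path.dirname(path) or '.'
    os.makedirs(target_dir, exist_ok=True)

    # Write to a temporary file in the same directory, then rename it into place,
    # so an interrupted run never leaves a partial CSV behind under the final name.
    fd, tmp_path = tempfile.mkstemp(dir=target_dir, suffix='.tmp')
    try:
        with os.fdopen(fd, 'w', newline='') as handle:
            fieldnames = list(rows[0].keys()) if rows else []
            writer = csv.DictWriter(handle, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
        os.replace(tmp_path, path)
    except BaseException:
        os.remove(tmp_path)
        raise
    return True


# Usage with a fake extraction step that records how often it runs
calls = []


def fake_extract():
    calls.append(1)
    return [{'patient_id': '001', 'mean': 150.0}]


with tempfile.TemporaryDirectory() as tmp_dir:
    target = os.path.join(tmp_dir, 'radiomics_t2w.csv')
    first = write_csv_once(target, fake_extract)   # computes and writes
    second = write_csv_once(target, fake_extract)  # finds the file and skips
    with open(target, newline='') as handle:
        saved = list(csv.DictReader(handle))
```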
        

In [None]:
# Preview the extracted T2W features
pd.read_csv(os.path.join(dataset.tmp_dir, 'radiomics_t2w.csv')).head()