<a href="https://colab.research.google.com/github/retico/cmepda_medphys/blob/master/L5_code/Lecture5_demo1_extract_features_pyradiomics.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Extracting features from a segmented lesion with PyRadiomics

This demo shows how to extract intensity- and shape-based features from segmented masses in mammography using the [PyRadiomics](https://www.radiomics.io/pyradiomics.html) Python package.
A number of mammography images and mass segmentation masks are available in the [shared folder on Drive](https://drive.google.com/drive/folders/1YqK7ZkM-P2IrqfD7Pj-SCmjz-GWd_1-Y), in IMAGES/Mammography_masses/.


# Reading data from Google Drive

In [None]:
from google.colab import drive
drive.mount('/content/gdrive')

In [None]:
!unzip -q /content/gdrive/MyDrive/CMEPDA_MedPhys_datasets/IMAGES/Mammography_masses/small_sample_Im_segmented_ref.zip -d /content/

dataset_path = "/content/small_sample_Im_segmented_ref"

In [None]:
!ls /content/small_sample_Im_segmented_ref

# Overview of the dataset
We use [Pillow](https://pypi.org/project/Pillow/), a Python imaging library that adds image processing capabilities to the Python interpreter, to display one image and its mask.

In [None]:
import os
import PIL.Image  # plain "import PIL" does not guarantee that the Image submodule is loaded

In [None]:
PIL.Image.open(os.path.join(dataset_path, "0008p1_3_1_2_resized.pgm"))

In [None]:
PIL.Image.open(os.path.join(dataset_path, "0008p1_3_1_2_mass_mask.pgm"))

# Install PyRadiomics

In [None]:
pip list

In [None]:
!pip install pyradiomics

# Use PyRadiomics for feature extraction

In [None]:
import os
import numpy as np
import six
from radiomics import featureextractor, getFeatureClasses
import radiomics

The feature extractor handles preprocessing and then calls the required feature classes to calculate the features.

In [None]:
featureClasses = getFeatureClasses()

In [None]:
featureClasses

We have to initialize the feature extractor, and we can customize the extraction settings. Here we use a configuration file, Params_tol_0_0001.yaml.

In [None]:
!ls /content/gdrive/MyDrive/Colab\ Notebooks/CMEPDA_MedPhys/Params_tol_0_0001.yaml

In [None]:
!cp /content/gdrive/MyDrive/Colab\ Notebooks/CMEPDA_MedPhys/Params_tol_0_0001.yaml /content/.

In [None]:
!head /content/Params_tol_0_0001.yaml
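
The contents of Params_tol_0_0001.yaml are not reproduced in this notebook. Purely as an illustration, a PyRadiomics parameter file is usually organised in three top-level sections (`setting`, `imageType`, `featureClass`). The sketch below writes a hypothetical file called `example_params.yaml`; its values, including the `geometryTolerance` of 0.0001 guessed from the file name, are assumptions and not the contents of the course file.

```python
import os
import tempfile

# Hypothetical parameter file: NOT the contents of Params_tol_0_0001.yaml.
example_params = """\
setting:
  geometryTolerance: 0.0001   # assumed from the file name

imageType:
  Original: {}                # unfiltered image only

featureClass:
  firstorder: []              # empty list = all features of the class
"""

# Written to the temporary directory so that nothing in /content is overwritten.
example_path = os.path.join(tempfile.gettempdir(), "example_params.yaml")
with open(example_path, "w") as f:
    f.write(example_params)

print(example_path)
```

A file of this kind could then be passed to `featureextractor.RadiomicsFeatureExtractor` in place of `paramPath`.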

In [None]:
paramPath = 'Params_tol_0_0001.yaml'

In [None]:
extractor = featureextractor.RadiomicsFeatureExtractor(paramPath)
verbose = True
if verbose:
    print('Extraction parameters:\n\t', extractor.settings)
    print('Enabled filters:\n\t', extractor.enabledImagetypes)
    print('Enabled features:\n\t', extractor.enabledFeatures)

Input images: by default, and according to our settings, only the 'Original' (unfiltered) image type is enabled. Other image types can optionally be enabled, e.g.:

In [None]:
extractor.enableImageTypeByName('Wavelet')

To check the enabled input image types:

In [None]:
for imageType in extractor.enabledImagetypes.keys():
    print('\t' + imageType)

To disable all feature classes:

In [None]:
extractor.disableAllFeatures()

To enable all features in firstorder:

In [None]:
extractor.enableFeatureClassByName('firstorder')

Alternatively, enable only the 'Mean' and 'Skewness' features in firstorder:

In [None]:
extractor.enableFeaturesByName(firstorder=['Mean', 'Skewness'])

In [None]:
extractor.enabledFeatures
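
To make the two selected features concrete, here is a minimal pure-Python sketch of a mean and a skewness computed on a handful of made-up intensity values. It assumes the usual population (1/N) definitions, with skewness taken as the third central moment divided by the cubed standard deviation. The feature docstrings printed by PyRadiomics remain the authoritative reference, and the function names below are hypothetical.

```python
import math

def first_order_mean(values):
    """Arithmetic mean of the intensities inside the region of interest."""
    return sum(values) / len(values)

def first_order_skewness(values):
    """Third central moment divided by the cubed population standard deviation."""
    n = len(values)
    mean = first_order_mean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n  # population variance
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        return 0.0  # flat region: this sketch returns 0 instead of dividing by zero
    return m3 / math.pow(m2, 1.5)

roi_values = [1, 2, 3, 4, 10]  # made-up intensities, not taken from the dataset
print(first_order_mean(roi_values))                # 4.0
print(round(first_order_skewness(roi_values), 4))  # 1.1384
```

The single bright value (10) pulls the tail of the distribution to the right, hence the positive skewness.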

We can get the docstrings of the active features:

In [None]:
print('Active features:')
for cls, features in six.iteritems(extractor.enabledFeatures):
  if len(features) == 0:
    features = [f for f, deprecated in six.iteritems(featureClasses[cls].getFeatureNames()) if not deprecated]
  for f in features:
    print(f)
    print(getattr(featureClasses[cls], 'get%sFeatureValue' % f).__doc__)

# Reading the images in memory

PyRadiomics accepts NIfTI files ('*.nii.gz') directly, or [SimpleITK](https://simpleitk.org/) objects, as input. It needs both the image and the corresponding mask.

In [None]:
import SimpleITK as sitk

Our images are ".pgm" files. This format is not supported by SimpleITK

In [None]:
os.path.join(dataset_path, "0008p1_3_1_2_resized.pgm")

We read the images with Pillow and store them in NumPy arrays, then we convert them to SimpleITK objects.

In [None]:
def read_pgm_as_sitk(image_path):
  """ Read a pgm image as sitk image """
  np_array = np.asarray(PIL.Image.open(image_path))
  sitk_image = sitk.GetImageFromArray(np_array)
  return sitk_image

In [None]:
im_1 = read_pgm_as_sitk(os.path.join(dataset_path, "0008p1_3_1_2_resized.pgm"))
im_1_mask = read_pgm_as_sitk(os.path.join(dataset_path, "0008p1_3_1_2_mass_mask.pgm"))

In [None]:
type(im_1)

# Calculating the values of active features

In [None]:
print('Calculating features')
featureDict = extractor.execute(im_1,im_1_mask,label=255)
type(featureDict)
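
The `label=255` argument in the call above tells the extractor which mask value marks the lesion, so the call presumes that the mask has the same shape as the image and actually contains that value. A small check of this kind can be sketched with NumPy alone; `check_mask` is a hypothetical helper, and the toy arrays stand in for what `sitk.GetArrayFromImage(im_1)` and `sitk.GetArrayFromImage(im_1_mask)` would return.

```python
import numpy as np

def check_mask(image_array, mask_array, label=255):
    """Return the number of ROI pixels; raise if image and mask are inconsistent."""
    if image_array.shape != mask_array.shape:
        raise ValueError("image and mask have different shapes: %s vs %s"
                         % (image_array.shape, mask_array.shape))
    n_roi = int(np.count_nonzero(mask_array == label))
    if n_roi == 0:
        raise ValueError("label %d not found in mask (values present: %s)"
                         % (label, np.unique(mask_array).tolist()))
    return n_roi

# Toy 4x4 image with a 2x2 "lesion" marked as 255 in the mask
toy_image = np.arange(16, dtype=np.uint8).reshape(4, 4)
toy_mask = np.zeros((4, 4), dtype=np.uint8)
toy_mask[1:3, 1:3] = 255

print(check_mask(toy_image, toy_mask))  # 4
```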

Features are stored in a dictionary

In [None]:
featureDict
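
Besides the feature values, the returned dictionary also holds entries whose keys start with `diagnostics_` (versions, settings, image and mask information), while feature keys follow the pattern `<imageType>_<featureClass>_<featureName>`. A minimal sketch for separating the two groups is shown below; `split_feature_dict` is a hypothetical helper, and the example values are made up.

```python
def split_feature_dict(feature_dict):
    """Split a PyRadiomics-style result into (diagnostics, features) by key prefix."""
    diagnostics = {k: v for k, v in feature_dict.items()
                   if k.startswith("diagnostics_")}
    features = {k: v for k, v in feature_dict.items()
                if not k.startswith("diagnostics_")}
    return diagnostics, features

# Made-up example with the same key layout as a PyRadiomics result
example_result = {
    "diagnostics_Versions_PyRadiomics": "x.y.z",  # placeholder, not a real version
    "original_firstorder_Mean": 101.5,
    "original_firstorder_Skewness": -0.2,
}

diagnostics, features = split_feature_dict(example_result)
print(sorted(features))
```

In the notebook, the same helper could be applied to `featureDict`.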

# Compute the features for the whole dataset and store them in a file

We will compute the features for the whole dataset and add the case IDs to each feature dictionary. We will use the [csv](https://docs.python.org/3/library/csv.html#module-csv) module to export the features as a csv file.


In [None]:
import glob
import csv

In [None]:
images_fnames = glob.glob(os.path.join(dataset_path,'*_resized.pgm'))

In [None]:
extracted_data = []
for image_fname in images_fnames:
  mask_fname = image_fname.replace('resized', 'mass_mask')
  image = read_pgm_as_sitk(image_fname)
  mask = read_pgm_as_sitk(mask_fname)
  featureVector = extractor.execute(image, mask, label=255)
  featureVector['image_ID'] = os.path.basename(image_fname)
  featureVector['mask_ID'] = os.path.basename(mask_fname)
  extracted_data.append(featureVector)
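
The loop above derives the mask file name with `str.replace` on the full path, which works here because 'resized' appears only in the file name. A slightly more defensive variant, sketched below with a hypothetical helper `mask_path_for`, rewrites only the suffix of the file name and leaves the folder part untouched.

```python
import os

def mask_path_for(image_path, image_tag="resized", mask_tag="mass_mask"):
    """Map '<case>_resized.pgm' to '<case>_mass_mask.pgm' in the same folder."""
    folder, fname = os.path.split(image_path)
    stem, ext = os.path.splitext(fname)
    if not stem.endswith("_" + image_tag):
        raise ValueError("unexpected image file name: %s" % fname)
    # Replace only the trailing tag, so folder names containing it are not altered
    mask_stem = stem[: -len(image_tag)] + mask_tag
    return os.path.join(folder, mask_stem + ext)

# The folder name below is made up and contains 'resized' on purpose
example_image = os.path.join("data_resized", "0008p1_3_1_2_resized.pgm")
print(mask_path_for(example_image))
```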


In [None]:
extracted_data

The extracted data is a list of dictionaries.

In [None]:
type(extracted_data)

We will identify the variables we will store in the .csv file.

In [None]:
list(extracted_data[0].keys())

We can either store them all:

In [None]:
csv_columns =  list(extracted_data[0].keys())

or we can select a number of interesting features:

In [None]:
selected_features_name = [x for x in list(extracted_data[0].keys()) if 'firstorder' in x]
csv_columns = [ 'image_ID', 'mask_ID' ] + selected_features_name
csv_columns

We will store the selected features for further analysis in a .csv file

In [None]:
csv_file = "extracted_features.csv"

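# newline='' lets the csv module control line endings, as recommended in its documentation
with open(csv_file, 'w', newline='') as csvfile: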
    writer = csv.DictWriter(csvfile, fieldnames=csv_columns, extrasaction='ignore' )
    writer.writeheader()
    writer.writerows(extracted_data)
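
As a self-contained illustration of the export step, the sketch below writes a few made-up rows with `csv.DictWriter` and reads them back with `csv.DictReader`. It shows two things: `extrasaction='ignore'` silently drops keys that are not listed in `fieldnames`, and all values come back as strings. The rows, the column list and the file name `toy_features.csv` are hypothetical.

```python
import csv
import os
import tempfile

# Made-up rows with the same layout as the extracted feature dictionaries
rows = [
    {"image_ID": "a_resized.pgm", "mask_ID": "a_mass_mask.pgm",
     "original_firstorder_Mean": 101.5, "diagnostics_note": "not exported"},
    {"image_ID": "b_resized.pgm", "mask_ID": "b_mass_mask.pgm",
     "original_firstorder_Mean": 87.25, "diagnostics_note": "not exported"},
]
columns = ["image_ID", "mask_ID", "original_firstorder_Mean"]

toy_csv = os.path.join(tempfile.gettempdir(), "toy_features.csv")
with open(toy_csv, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)

with open(toy_csv, newline="") as f:
    read_back = list(csv.DictReader(f))

print(list(read_back[0].keys()))
print(read_back[0]["original_firstorder_Mean"])  # '101.5' (a string)
```

Numeric columns therefore need an explicit conversion (for example with `float`) when the file is loaded again for analysis.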

In [None]:
!ls

In [None]:
!cat extracted_features.csv

We can copy the output file to our Google Drive folder, since the content of the /content/ folder is reset at the end of the session.

In [None]:
!cp extracted_features.csv /content/gdrive/MyDrive/.