# RSNA-MICCAI Brain Tumor Radiogenomic Classification: A simple EDA (work in progress, more to come soon)

In this competition, we are provided with MRI scans of patients with glioblastoma (a malignant brain tumor) and asked to predict MGMT promoter methylation status, a genetic marker that is a strong predictor of a patient's responsiveness to chemotherapy.

Currently, determining the genetic status of the tumor requires a biopsy obtained during surgery, and it may take several weeks to genetically characterize the specimen. In this competition, we predict the genetic status directly from MRI images, which could reduce the number of surgeries required, speed up treatment, and improve the selection of chemotherapeutics. Together, these could lead to better management of patients with brain cancer.

There is a paper [here](https://arxiv.org/abs/2107.02314) with more details about this competition.

I'll provide a quick and simple EDA to help you get started with this very interesting competition!

# Imports

Let's start by setting up our environment and importing the required modules:

In [None]:
! conda install -c conda-forge gdcm -y

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import os
import pydicom
import glob
from tqdm.notebook import tqdm
from pydicom.pixel_data_handlers.util import apply_voi_lut
import matplotlib.pyplot as plt
from skimage import exposure
import cv2
import warnings
from fastai.vision.all import *
from fastai.medical.imaging import *
warnings.filterwarnings('ignore')

# A look at the provided data

Let's check what data is available to us:

In [None]:
dataset_path = Path('../input/rsna-miccai-brain-tumor-radiogenomic-classification')

In [None]:
dataset_path.ls()

In [None]:
train_df = pd.read_csv(dataset_path/'train_labels.csv')
print(f'There are {len(train_df)} patients in the dataset')

In [None]:
train_df.head(10)

We can see we have:

* train_labels.csv - a file containing the target `MGMT_value` for each of the 585 patients in the training data.
* sample_submission.csv - a sample submission file showing the format in which to predict `MGMT_value`.
* train folder - MRI scans for the 585 training patients, in DICOM format.
* test folder - the test scans. The hidden test dataset is >5x the size of the training set.

In [None]:
train_df['MGMT_value'].hist()

The distribution of the MGMT methylation status is fairly balanced.
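To put a number on "fairly balanced", the class fractions can be computed directly rather than read off the histogram. The following is a minimal sketch: `class_balance` is a hypothetical helper, and the toy frame holds made-up labels that only demonstrate the call.

```python
import pandas as pd


def class_balance(labels: pd.DataFrame, target: str = "MGMT_value") -> pd.Series:
    """Return the fraction of rows in each class of `target`, sorted by class label."""
    return labels[target].value_counts(normalize=True).sort_index()


# Toy frame with made-up labels, used only to show the call.
# In the notebook you would pass train_df instead.
toy = pd.DataFrame({"MGMT_value": [1, 1, 0, 1]})
balance = class_balance(toy)
print(balance)
```

Calling `class_balance(train_df)` in the notebook would give the real fractions for the two classes.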

# Dataset organization and MRI images

This competition dataset provides four types of MRI images:

1. [Fluid Attenuated Inversion Recovery (FLAIR)](https://en.wikipedia.org/wiki/Fluid-attenuated_inversion_recovery)
2. [T1-weighted pre-contrast (T1w)](https://radiopaedia.org/articles/t1-weighted-image?lang=us)
3. [T1-weighted post-contrast (T1Gd)](https://radiopaedia.org/articles/t1-weighted-image?lang=us)
4. [T2-weighted (T2)](https://radiopaedia.org/articles/t2-weighted-image?lang=us)
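Each patient folder holds one sub-folder per MRI type, and the number of slices can differ between them. Here is a small sketch for checking that, assuming the slices are stored as `.dcm` files; `slices_per_series` is a hypothetical helper, not part of any library.

```python
from pathlib import Path


def slices_per_series(patient_dir):
    """Map each sub-folder of `patient_dir` to the number of .dcm files it contains."""
    patient_dir = Path(patient_dir)
    return {
        series.name: len(list(series.glob("*.dcm")))  # assumes a .dcm extension
        for series in sorted(patient_dir.iterdir())
        if series.is_dir()
    }


# Example call in the notebook (path taken from the cells in this kernel):
# slices_per_series(dataset_path/'train'/'00688')
```

Running this over a few patients is a quick way to see how much the series lengths vary before deciding how to sample slices.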

Let's look at some example images:

In [None]:
def dicom2array(path, voi_lut=True, fix_monochrome=True):
    """Read a DICOM file and return its pixels as a uint8 array scaled to 0-255."""
    dicom = pydicom.dcmread(path)  # read_file is a deprecated alias of dcmread
    # VOI LUT (if available by DICOM device) is used to
    # transform raw DICOM data to "human-friendly" view
    if voi_lut:
        data = apply_voi_lut(dicom.pixel_array, dicom)
    else:
        data = dicom.pixel_array
    # MONOCHROME1 stores low values as bright, so the image may look inverted - fix that:
    if fix_monochrome and dicom.PhotometricInterpretation == "MONOCHROME1":
        data = np.amax(data) - data
    # Min-max scale to [0, 1]; guard against blank slices, where max is 0
    data = data - np.min(data)
    max_val = np.max(data)
    if max_val > 0:
        data = data / max_val
    data = (data * 255).astype(np.uint8)
    return data


def plot_img(img, size=(7, 7), is_rgb=True, title="", cmap='gray'):
    plt.figure(figsize=size)
    plt.imshow(img, cmap=cmap)
    plt.suptitle(title)
    plt.show()


def plot_imgs(imgs, cols=4, size=7, is_rgb=True, title="", cmap='gray', img_size=(500,500)):
    # Ceiling division, so a full last row does not add an empty extra row
    rows = max(1, (len(imgs) + cols - 1) // cols)
    fig = plt.figure(figsize=(cols*size, rows*size))
    for i, img in enumerate(imgs):
        if img_size is not None:
            img = cv2.resize(img, img_size)
        fig.add_subplot(rows, cols, i+1)
        plt.imshow(img, cmap=cmap)
    plt.suptitle(title)
    plt.show()

In [None]:
dicom_paths = [i.ls()[0] for i in (dataset_path/'train'/'00688').ls()]
imgs = [dicom2array(path) for path in dicom_paths]
plot_imgs(imgs)
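The cell above takes whichever file is listed first in each series folder, which may be a slice at the edge of the volume that shows little or no brain. Below is a sketch of one alternative that picks the middle slice instead. It assumes the file names end in a slice number (for example `Image-12.dcm`) and that this number tracks slice order; `slice_number` and `middle_slice` are hypothetical helpers.

```python
import re
from pathlib import Path


def slice_number(path):
    """Return the last integer in the file stem, or -1 if there is none."""
    numbers = re.findall(r"\d+", Path(path).stem)
    return int(numbers[-1]) if numbers else -1


def middle_slice(paths):
    """Sort paths by slice number and return the one in the middle."""
    ordered = sorted(paths, key=slice_number)
    if not ordered:
        raise ValueError("no slice paths given")
    return ordered[len(ordered) // 2]


# Made-up file names, used only to show the numeric (not alphabetical) ordering.
example = [Path(f"Image-{i}.dcm") for i in (10, 2, 1, 33, 7)]
picked = middle_slice(example)
print(picked)  # Image-7.dcm
```

In the notebook, replacing `i.ls()[0]` with `middle_slice(i.ls())` in the list comprehension would plot the central slice of each series.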

I hope this is helpful (more to come soon). Please upvote if you enjoyed this kernel! :)