<a id="top"></a>

<div class="list-group" id="list-tab" role="tablist">
<h2 class="list-group-item list-group-item-action active alert alert-info" data-toggle="list" role="tab" aria-controls="home"><center>Navigation</center></h2>

* [File structure](#1)
* [Data Visualization](#2)
* [Animation](#3)
    
    
* [Modeling](#20)
    
    
* [LIME](#30)

This work uses some ideas from: 

https://www.kaggle.com/code/ihelon/brain-tumor-eda-with-animations-and-modeling - navigation & animation technique

https://www.kaggle.com/code/cedricsoares/tf-efficientnet-transfer-learning-strat-split/notebook?scriptVersionId=77118556 - model 

https://www.kaggle.com/competitions/rsna-miccai-brain-tumor-radiogenomic-classification/leaderboard - leaderboard

<a id="1"></a>
## <div class="alert alert-warning" style="border:0;margin:0"><center> File structure </center></div>

The competition data is defined by three cohorts: Training, Validation (Public), and Testing (Private). The “Training” and the “Validation” cohorts are provided to the participants, whereas the “Testing” cohort is kept hidden at all times, during and after the competition.

These 3 cohorts are structured as follows: Each independent case has a dedicated folder identified by a five-digit number. Within each of these “case” folders, there are four sub-folders, each of them corresponding to each of the structural multi-parametric MRI (mpMRI) scans, in DICOM format. The exact mpMRI scans included are:

- Fluid Attenuated Inversion Recovery (FLAIR)
- T1-weighted pre-contrast (T1w)
- T1-weighted post-contrast (T1Gd)
- T2-weighted (T2)

Exact folder structure:

```
Training/Validation/Testing
│
└─── 00000
│   │
│   └─── FLAIR
│   │   │ Image-1.dcm
│   │   │ Image-2.dcm
│   │   │ ...
│   │   
│   └─── T1w
│   │   │ Image-1.dcm
│   │   │ Image-2.dcm
│   │   │ ...
│   │   
│   └─── T1wCE
│   │   │ Image-1.dcm
│   │   │ Image-2.dcm
│   │   │ ...
│   │   
│   └─── T2w
│   │   │ Image-1.dcm
│   │   │ Image-2.dcm
│   │   │ .....
│   
└─── 00001
│   │ ...
│   
│ ...   
│   
└─── 00002
│   │ ...
```
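
The layout above can be checked programmatically. The following is a minimal sketch, assuming the four sub-folder names shown in the tree; `missing_modalities` is a hypothetical helper written for illustration, not part of the competition tooling.

```python
from pathlib import Path

# Sub-folder names as they appear in the tree above.
MODALITIES = ("FLAIR", "T1w", "T1wCE", "T2w")


def missing_modalities(cohort_dir):
    """Map each case folder to the modality sub-folders it lacks.

    Only cases with at least one missing sub-folder are returned.
    """
    problems = {}
    for case_dir in sorted(Path(cohort_dir).iterdir()):
        if not case_dir.is_dir():
            continue
        missing = [m for m in MODALITIES if not (case_dir / m).is_dir()]
        if missing:
            problems[case_dir.name] = missing
    return problems
```

Called on a cohort folder such as `train/`, it should return an empty dictionary when every case contains all four scan types.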

## Files

- **train/** - folder containing the training files, with each top-level folder representing a subject. **NOTE:** There are some unexpected issues with the following three cases in the training dataset, participants can exclude the cases during training: `[00109, 00123, 00709]`. We have checked and confirmed that the testing dataset is free from such issues.
- **train\_labels.csv** - file containing the target `MGMT_value` for each subject in the training data (e.g. the presence of MGMT promoter methylation)
- **test/** - the test files, which use the same structure as `train/`; your task is to predict the `MGMT_value` for each subject in the test data. **NOTE**: the total size of the rerun test set (Public and Private) is ~5x the size of the Public test set
- **sample\_submission.csv** - a sample submission file in the correct format
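
One way to act on the note about the three problematic cases is to filter them out by ID before building the training set. This is only a sketch: `EXCLUDED_CASES` and `keep_usable_cases` are hypothetical names, and the modeling section of this notebook instead skips these cases only for the affected MRI type.

```python
# Case IDs flagged in the note above.
EXCLUDED_CASES = {"00109", "00123", "00709"}


def keep_usable_cases(case_ids):
    """Return zero-padded case IDs with the flagged cases removed.

    IDs may arrive as ints (as in train_labels.csv) or as strings.
    """
    padded = (str(case_id).zfill(5) for case_id in case_ids)
    return [case_id for case_id in padded if case_id not in EXCLUDED_CASES]
```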


In [None]:
# import os
# path = '../input/rsna-miccai-brain-tumor-radiogenomic-classification/train'

# sum(os.path.isdir(os.path.join(path, i)) for i in os.listdir(path))

In [None]:
# subjects = !ls /kaggle/input/rsna-miccai-brain-tumor-radiogenomic-classification/train/* -d
# len(subjects)

We're given 585 subjects in the training dataset

In [None]:
# path = '../input/rsna-miccai-brain-tumor-radiogenomic-classification/test'

# sum(os.path.isdir(os.path.join(path, i)) for i in os.listdir(path))

We're given 87 subjects in the test dataset

In [None]:
# import numpy as np
# import pandas as pd
# import pydicom
# import os
# import glob 
# # The glob module finds all the pathnames matching a specified pattern according to the rules used by the Unix shell, 
# #although results are returned in arbitrary order.
# import matplotlib.pyplot as plt

In [None]:
# path = '../input/rsna-miccai-brain-tumor-radiogenomic-classification/train/*/*/*'

# len(glob.glob(path))

In [None]:
# path = '../input/rsna-miccai-brain-tumor-radiogenomic-classification/test/*/*/*'

# len(glob.glob(path))

In [None]:
# path_name = '../input/rsna-miccai-brain-tumor-radiogenomic-classification/test/*/*/*'
# path = glob.glob(path_name)
# len(path)

In [None]:
# slices = 0
# for filename in path:
#     data = load_dicom(filename)
#     # Exclude the blank images
#     if data.max() == 0:
#         slices += 1 
# print(slices)

In [None]:
# (14404/51473)*100

In [None]:
# path_name = '../input/rsna-miccai-brain-tumor-radiogenomic-classification/train/*/*/*'
# path = glob.glob(path_name)
# len(path)

In [None]:
# slices = 0
# for filename in path:
#     data = load_dicom(filename)
#     # Exclude the blank images
#     if data.max() == 0:
#         slices += 1 
# print(slices)

In [None]:
# (94752/348641)*100

In total we have 348 641 slices (train) and 51 473 slices (test). Of these, the following are blank (all-zero) images:

- 94 752 (train, 27.18%)
- 14 404 (test, 27.98%)
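
The percentages follow directly from the counts. A small sketch of the arithmetic (`blank_share` is a hypothetical helper name):

```python
def blank_share(blank, total):
    """Percentage of blank slices, rounded to two decimals."""
    return round(100 * blank / total, 2)


print(blank_share(94752, 348641))  # train: 27.18
print(blank_share(14404, 51473))   # test: 27.98
```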

In [None]:
# scans = !ls /kaggle/input/rsna-miccai-brain-tumor-radiogenomic-classification/test/*/FLAIR/* -d
# len(scans)

In [None]:
# from tqdm import tqdm
# count_min = 10e6
# count_max = 0
# for s in tqdm(scans):
#     cnt, = !ls {s}/* | wc -l # this is super-slow.. probably avoiding "!" would be a good idea
#     count_min = min(count_min, int(cnt))
#     count_max = max(count_max, int(cnt))
# count_min, count_max

Each scan consists of a number of slices, between 15 and 514.
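
The slice counts can also be gathered without shelling out to `ls`. Below is a minimal sketch using only the standard library; `slice_count_range` is a hypothetical helper, and the list of scan folders is assumed to come from something like `glob.glob(".../train/*/*")`.

```python
import os


def slice_count_range(scan_dirs):
    """Return (min, max) number of .dcm files over the given scan folders."""
    counts = []
    for scan_dir in scan_dirs:
        with os.scandir(scan_dir) as entries:
            counts.append(sum(1 for e in entries if e.name.endswith(".dcm")))
    if not counts:
        raise ValueError("no scan folders given")
    return min(counts), max(counts)
```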

<a id="2"></a>
## <div class="alert alert-warning" style="border:0;margin:0"><center> Data visualization </center></div>

In [None]:
import numpy as np
import pandas as pd
import pydicom
import os
import glob 
# The glob module finds all the pathnames matching a specified pattern according to the rules used by the Unix shell, 
#although results are returned in arbitrary order.
import matplotlib.pyplot as plt

In [None]:
def load_dicom(path):
    ds = pydicom.dcmread(path)
    
    # Convert to float to avoid overflow or underflow losses.
    image_2d = ds.pixel_array.astype(float)

    # Convert 16-bit pixels to 8-bit
    if image_2d.max() != 0.0:
        image_2d_scaled = (np.maximum(image_2d, 0) / image_2d.max()) * 255.0
    else:
        image_2d_scaled = image_2d  # blank image

    # Convert to uint
    image_2d_scaled = np.uint8(image_2d_scaled)
    
    return image_2d_scaled

def get_middle_image(file_directory):
    list_of_files = os.listdir(file_directory)
    #  sorts the list of image directories by image number in a path
    list_of_files.sort(key=lambda x: int(x.split('/')[-1].split('-')[-1].split('.')[0]))  
    num_of_scans = len(list_of_files)
    middle_file = list_of_files[num_of_scans//2]
    return middle_file


def full_ids(data):
    # Zero-pad the numeric IDs to five digits so that they match the folder names,
    # e.g. 0 -> 00000; 1010 -> 01010
    return data.astype(str).str.zfill(5)


def check_MGMT_value(subject_folder):
    train_labels = pd.read_csv("../input/rsna-miccai-brain-tumor-radiogenomic-classification/train_labels.csv")
    mgmt_value = train_labels[full_ids(train_labels.BraTS21ID) == subject_folder]["MGMT_value"].values[0]
    return mgmt_value


def diffrent_types_subplot(directory_path, dataset_type, subject_folder, scans_types):
    plt.figure(figsize=(16, 5))
    for i, scan_type in enumerate(scans_types, 1):
        file_directory = os.path.join(directory_path, dataset_type, subject_folder, scan_type)   
        file_name = get_middle_image(file_directory)
        plt.subplot(1, 4, i)
        plt.title(f"{scan_type}", fontsize=16)
 
        plt.xticks([])
        plt.yticks([])
        plt.xlabel(file_name, fontsize=14)
        ds = load_dicom(os.path.join(file_directory, file_name))
        plt.imshow(ds, cmap="gray") 
    
    mgmt_value = check_MGMT_value(subject_folder)
    plt.suptitle(f"MGMT_value: {mgmt_value}", fontsize=16);
    plt.text(0.45, 0.88, f"Subject: {subject_folder}", fontsize=15, transform=plt.gcf().transFigure);
    plt.show()

In [None]:
# # data visualization
# directory_path = '../input/rsna-miccai-brain-tumor-radiogenomic-classification'
# dataset_type = 'train'
# subject_folder = '00000'
# scans_types = ['FLAIR','T1w','T1wCE','T2w']


# diffrent_types_subplot(directory_path, dataset_type, subject_folder, scans_types)

In [None]:
# subject_folder = '00003'

# diffrent_types_subplot(directory_path, dataset_type, subject_folder, scans_types)

<a id="3"></a>
## <div class="alert alert-warning" style="border:0;margin:0"><center> Animation </center></div>

In [None]:
from matplotlib import animation, rc
rc('animation', html='jshtml')


def create_animation(ims):
    fig = plt.figure(figsize=(6, 6))
    plt.axis('off')
    im = plt.imshow(ims[0], cmap="gray");

    def animate_func(i):
        im.set_array(ims[i])
        return [im]

    return animation.FuncAnimation(fig, animate_func, frames = len(ims), interval = 100)


def read_scan(path, include_empty=True):
    t_paths = sorted(glob.glob(os.path.join(path, "*")), 
                     key=lambda x: int(x[:-4].split("-")[-1]))
    slices = []
    for filename in t_paths:
        data = load_dicom(filename)
        # Exclude the blank images
        if not include_empty and data.max() == 0:
            continue  # choose next image
        slices.append(data)
    return slices

In [None]:
def whole_scan_subplot(directory_path, dataset_type, subject_folder, scan_type, include_empty):
    folder_directory = os.path.join(directory_path, dataset_type, subject_folder, scan_type)  
    print(folder_directory)
    images = read_scan(folder_directory,include_empty=include_empty)

    print('No of images:', len(images))
    mgmt_value = check_MGMT_value(subject_folder)
    print('MGMT: ', mgmt_value)

    fig = plt.figure(figsize=(30,10))

    c = 1
    for image in images:
        ax = fig.add_subplot(len(images)//10+1, 10, c)
        ax.imshow(image, cmap='gray')
        c+=1
        plt.axis('off')
        plt.suptitle(f"Folder: {dataset_type}/{subject_folder}/{scan_type} \nMGMT_value: {mgmt_value}", fontsize=35);
        fig.tight_layout()

In [None]:
# directory_path = '../input/rsna-miccai-brain-tumor-radiogenomic-classification'
# dataset_type = 'train'
# subject_folder = '00000'
# scan_type = 'T1w'

# whole_scan_subplot(directory_path, dataset_type, subject_folder, scan_type, include_empty=True)

In [None]:
# images = read_scan("../input/rsna-miccai-brain-tumor-radiogenomic-classification/train/00000/T1w",include_empty=False)
# print(len(images))
# anim = create_animation(images)

In [None]:
# anim

In [None]:
# anim.save('im.mp4')

<a id="20"></a>
## <div class="alert alert-danger" style="border:0;margin:0"><center> Modeling </center></div>

**3rd place solution: https://www.kaggle.com/code/cedricsoares/tf-efficientnet-transfer-learning-strat-split/notebook?scriptVersionId=77118556**

His approach:
- He used a stratified split on the train dataset, based on patient IDs and class, to sample a validation dataset.
- He trained four EfficientNet-B3 models, one for each kind of MRI scan (FLAIR, T1w, T1wCE, T2w).
- He aggregated the predictions per patient ID: he compared the difference between the maximum prediction and the average prediction with the difference between the average and the minimum prediction, and kept the prediction (maximum or minimum) linked to the larger difference.
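
The aggregation rule in the last bullet can be sketched in isolation as follows. This mirrors the logic used in the evaluation loop later in this notebook, where the mean is taken over all predictions for one MRI type; the function name `aggregate_patient_predictions` is hypothetical.

```python
def aggregate_patient_predictions(patient_preds, mean_pred):
    """Keep the extreme prediction (max or min) that lies farther from mean_pred.

    Ties fall back to the minimum, as in the notebook's lambda.
    """
    highest = max(patient_preds)
    lowest = min(patient_preds)
    if (highest - mean_pred) > (mean_pred - lowest):
        return highest
    return lowest
```

For example, with a mean of 0.5, the predictions `[0.4, 0.55, 0.9]` collapse to 0.9, while `[0.1, 0.45, 0.6]` collapse to 0.1.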

https://www.tensorflow.org/tutorials/keras/save_and_load#options - how to load model

In [None]:
import os                          # Iterate over dataset directories
import math                        # For ceil method
import cv2                         # Resize for image files
import pydicom                     # Read dcm files
import matplotlib.pyplot as plt    # Plot images
import numpy as np                 # Linear algebra
import pandas as pd                # Data processing, CSV file I/O (e.g. pd.read_csv)
import tensorflow as tf            # Load model
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from keras_preprocessing.image.dataframe_iterator import DataFrameIterator

In [None]:
root_dir = '../input/rsna-miccai-brain-tumor-radiogenomic-classification/'
df_train = pd.read_csv(root_dir+'train_labels.csv')

In [None]:
df_train['BraTS21ID_full'] = full_ids(df_train['BraTS21ID'])

# Add all the paths to the df for easy access
df_train['flair'] = df_train['BraTS21ID_full'].apply(lambda file_id : root_dir+'train/'+file_id+'/FLAIR/')
df_train['t1w'] = df_train['BraTS21ID_full'].apply(lambda file_id : root_dir+'train/'+file_id+'/T1w/')
df_train['t1wce'] = df_train['BraTS21ID_full'].apply(lambda file_id : root_dir+'train/'+file_id+'/T1wCE/')
df_train['t2w'] = df_train['BraTS21ID_full'].apply(lambda file_id : root_dir+'train/'+file_id+'/T2w/')

In [None]:
df_train.head(5)

In [None]:
df_test = pd.read_csv(root_dir+'sample_submission.csv')

df_test['BraTS21ID_full'] = full_ids(df_test['BraTS21ID'])

# Add all the paths to the df for easy access
df_test['flair'] = df_test['BraTS21ID_full'].apply(lambda file_id : root_dir+'test/'+file_id+'/FLAIR/')
df_test['t1w'] = df_test['BraTS21ID_full'].apply(lambda file_id : root_dir+'test/'+file_id+'/T1w/')
df_test['t1wce'] = df_test['BraTS21ID_full'].apply(lambda file_id : root_dir+'test/'+file_id+'/T1wCE/')
df_test['t2w'] = df_test['BraTS21ID_full'].apply(lambda file_id : root_dir+'test/'+file_id+'/T2w/')

In [None]:
df_test

In [None]:
# 00109 (FLAIR images are blank) 00123 (T1w images are blank) 00709 (FLAIR images are blank)
def get_train_val_dataframe(mri_type):
    all_img_files = []
    all_img_labels = []
    all_img_patient_ids = []
    for row in df_train.iterrows():
        if row[1]['BraTS21ID_full'] == '00109' and mri_type == 'flair':
            continue
        if row[1]['BraTS21ID_full'] == '00123' and mri_type == 't1w':
            continue
        if row[1]['BraTS21ID_full'] == '00709' and mri_type == 'flair':
            continue
        img_dir = row[1][mri_type]
        img_files = os.listdir(img_dir)
        img_nums = sorted([int(ele.replace('Image-', '').replace('.dcm', '')) for ele in img_files])
        mid_point = int(len(img_nums)/2)
        start_point = mid_point - max(int(mid_point*0.1), 1)
        end_point = mid_point + max(int(mid_point*0.1), 1)
        img_names = [f'Image-{img_nums[i]}.dcm' for i in range(start_point, end_point+1)]
        img_paths = [img_dir+ele for ele in img_names]
        img_labels = [row[1]['MGMT_value']]*len(img_paths)
        img_patient_ids = [row[1]['BraTS21ID']]*len(img_paths)
        all_img_files.extend(img_paths)
        all_img_labels.extend(img_labels)
        all_img_patient_ids.extend(img_patient_ids)

    train_val_df = pd.DataFrame({'patient_ids': all_img_patient_ids,
                  'labels': all_img_labels,
                  'file_paths': all_img_files})

    train_val_df['labels'] = train_val_df['labels'].map({1: '1', 0: '0'})
    
    # stratified 90% split on patient_ids and labels
    class_prop= 0.90
    classes_splits  = {}
    for i in range(2):
        train_val_label_class = train_val_df[train_val_df['labels']==f'{i}']
        train_val_list_ids =  list(train_val_label_class['patient_ids'].unique())
        train_threshold = math.ceil(class_prop*len(train_val_list_ids))
        train_ids = train_val_list_ids[:train_threshold]
        val_ids = train_val_list_ids[train_threshold:]
        classes_splits[f'train_{i}'] = train_val_label_class[train_val_label_class['patient_ids'].isin(train_ids)]
        classes_splits[f'val_{i}'] = train_val_label_class[train_val_label_class['patient_ids'].isin(val_ids)]
        
    train_df = pd.concat([classes_splits['train_0'], classes_splits['train_1']], axis=0)
    val_df = pd.concat([classes_splits['val_0'], classes_splits['val_1']], axis=0)
  
    return train_df, val_df

In [None]:
get_train_val_dataframe('t2w')
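
The split performed inside `get_train_val_dataframe` can be read in isolation as the sketch below. It assumes a plain mapping from patient ID to label; `stratified_patient_split` is a hypothetical name. Like the function above, it does not shuffle: within each class, the first 90% of patients (rounded up) go to training and the rest to validation, so all slices of one patient end up on the same side.

```python
import math


def stratified_patient_split(labels_by_patient, train_fraction=0.90):
    """Split patient IDs into train/validation lists, class by class.

    Patients keep their input order; the first ceil(train_fraction * n)
    patients of each class go to training, the rest to validation.
    """
    train_ids, val_ids = [], []
    for label in sorted(set(labels_by_patient.values())):
        ids = [pid for pid, lab in labels_by_patient.items() if lab == label]
        threshold = math.ceil(train_fraction * len(ids))
        train_ids.extend(ids[:threshold])
        val_ids.extend(ids[threshold:])
    return train_ids, val_ids
```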

In [None]:
def get_test_dataframe(mri_type):
    
    all_test_img_files = []
    all_test_img_labels = []
    all_test_img_patient_ids = []
    for row in df_test.iterrows():
        img_dir = row[1][mri_type]
        img_files = os.listdir(img_dir)
        img_nums = sorted([int(ele.replace('Image-', '').replace('.dcm', '')) for ele in img_files])
        mid_point = int(len(img_nums)/2)
        start_point = mid_point - max(int(mid_point*0.1), 1)
        end_point = mid_point + max(int(mid_point*0.1), 1)
        img_names = [f'Image-{img_nums[i]}.dcm' for i in range(start_point, end_point+1)]
        img_paths = [img_dir+ele for ele in img_names]
        img_labels = [row[1]['MGMT_value']]*len(img_paths)
        img_patient_ids = [row[1]['BraTS21ID']]*len(img_paths)
        all_test_img_files.extend(img_paths)
        all_test_img_labels.extend(img_labels)
        all_test_img_patient_ids.extend(img_patient_ids)

    test_df = pd.DataFrame({'patient_ids': all_test_img_patient_ids,
                  'labels': all_test_img_labels,
                  'file_paths': all_test_img_files})
    
    test_df['labels'] = ['1']*(len(test_df)-1) + ['0'] # workaround for testing data gen
    
    return test_df

In [None]:
get_test_dataframe("t2w")

In [None]:
class DCMDataFrameIterator(DataFrameIterator):
    def __init__(self, *arg, **kwargs):
        self.white_list_formats = ('dcm',)  # one-element tuple; ('dcm') would be a plain string
        super(DCMDataFrameIterator, self).__init__(*arg, **kwargs)
        self.dataframe = kwargs['dataframe']
        self.x = self.dataframe[kwargs['x_col']]
        self.y = self.dataframe[kwargs['y_col']]
        self.color_mode = kwargs['color_mode']
        self.target_size = kwargs['target_size']

    def _get_batches_of_transformed_samples(self, indices_array):
        # get batch of images
        batch_x = np.array([self.read_dcm_as_array(dcm_path, self.target_size, color_mode=self.color_mode)
                            for dcm_path in self.x.iloc[indices_array]])

        batch_y = np.array(self.y.iloc[indices_array].astype(np.uint8))  # astype because y was passed as str

        # transform images
        if self.image_data_generator is not None:
            for i, (x, y) in enumerate(zip(batch_x, batch_y)):
                transform_params = self.image_data_generator.get_random_transform(x.shape)
                batch_x[i] = self.image_data_generator.apply_transform(x, transform_params)
                # you can change y here as well, eg: in semantic segmentation you want to transform masks as well 
                # using the same image_data_generator transformations.

        return batch_x, batch_y

    
    #####################
    @staticmethod
    def read_dcm_as_array(dcm_path, target_size=(300, 300), color_mode='rgb'):
        image_array = pydicom.dcmread(dcm_path).pixel_array
        # Min-max normalize to [0, 1]; blank (constant) slices stay all zeros
        pixels = (image_array - np.min(image_array)).astype(float)
        max_value = np.max(pixels)
        if max_value > 0:
            pixels = pixels / max_value
        image_manual_norm = (pixels * 255).astype(np.uint8)
        image_array = cv2.resize(image_manual_norm, target_size, interpolation=cv2.INTER_NEAREST)  # returns a 2D array
        if color_mode == 'rgb':
            # Put the grayscale slice in the first channel and zero-fill the other two
            image_array = np.dstack((image_array, np.zeros_like(image_array), np.zeros_like(image_array)))
        return image_array

In [None]:
SEED = 369
BATCH_SIZE = 128
CLASS_MODE = 'binary'
COLOR_MODE = 'rgb'
TARGET_SIZE = (300, 300)

In [None]:
def get_data_generators(train_df,val_df, test_df):
    train_augmentation_parameters = dict(
        rescale=1.0/255,
        zoom_range=0.2,
        rotation_range=0.2,
        fill_mode='nearest',
        height_shift_range= 0.1,
        width_shift_range=0.1,
        horizontal_flip=True,
        brightness_range = [0.8, 1.2]
    )
    
    val_augmentation_parameters = dict(
        rescale=1.0/255.0
    )

    test_augmentation_parameters = dict(
        rescale=1.0/255.0
    )

    train_consts = {
        'seed': SEED,
        'batch_size': BATCH_SIZE,
        'class_mode': CLASS_MODE,
        'color_mode': COLOR_MODE,
        'target_size': TARGET_SIZE,  
    }
    
    val_consts = {
    'batch_size': BATCH_SIZE,
    'class_mode': CLASS_MODE,
    'color_mode': COLOR_MODE,
    'target_size': TARGET_SIZE,
    'shuffle': False
    }

    test_consts = {
        'batch_size': BATCH_SIZE,
        'class_mode': CLASS_MODE,
        'color_mode': COLOR_MODE,
        'target_size': TARGET_SIZE,
        'shuffle': False
    }

    train_augmenter = ImageDataGenerator(**train_augmentation_parameters)
    val_augmenter = ImageDataGenerator(**val_augmentation_parameters)
    test_augmenter = ImageDataGenerator(**test_augmentation_parameters)

    train_generator = DCMDataFrameIterator(dataframe=train_df,
                                 x_col='file_paths',
                                 y_col='labels',
                                 image_data_generator=train_augmenter,
                                 **train_consts)
    
    val_generator = DCMDataFrameIterator(dataframe=val_df,
                                 x_col='file_paths',
                                 y_col='labels',
                                 image_data_generator=val_augmenter,
                                 **val_consts)
    
    test_generator = DCMDataFrameIterator(dataframe=test_df,
                                 x_col='file_paths',
                                 y_col='labels',
                                 image_data_generator=test_augmenter,
                                 **test_consts)
    
    return train_generator, val_generator, test_generator

In [None]:
# Recreate the exact same model, including its weights and the optimizer
best_model = tf.keras.models.load_model('../input/cedric-soares-best-model/best_model.h5')

# # Show the model architecture
# best_model.summary()

In [None]:
# len(best_model.layers)

In [None]:
# tf.keras.utils.plot_model(best_model)

In [None]:
%%time
# run the restored model on each of the MRI types and then ensemble the predictions
all_test_preds = []

# Re-evaluate the model
for mt in ['flair', 't1w', 't1wce', 't2w']:
    print(mt.upper())
    train_df, val_df = get_train_val_dataframe(mt)
    test_df = get_test_dataframe(mt)
    train_g, val_g, test_g = get_data_generators(train_df, val_df, test_df)

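    # The test labels are placeholders (see get_test_dataframe), so the metrics printed here are not meaningful
    results = best_model.evaluate(test_g, steps=len(test_g), verbose=2)
    print(f"Restored model. Test loss, test acc, test AUC: {results}")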
    test_pred = best_model.predict(test_g, steps=len(test_g))
    test_df['pred_y'] = test_pred
    # aggregate the predictions on all image for each person (take the most confident prediction out of all image predictions)
    mean_pred = test_pred.mean()
    test_pred_agg = test_df.groupby('patient_ids').apply(
        lambda x: x['pred_y'].max()
        if (x['pred_y'].max() - mean_pred) > (mean_pred - x['pred_y'].min()) 
        else x['pred_y'].min())
    all_test_preds.append(test_pred_agg.values)

In [None]:
all_test_preds = np.array(all_test_preds)
plt.hist(all_test_preds.mean(0))

In [None]:
subm = pd.read_csv(root_dir+'sample_submission.csv')
subm['MGMT_value'] = all_test_preds.mean(0)
subm.to_csv("submission.csv", index=False)

In [None]:
subm

In [None]:
def check_MGMT_value_test(subject_folder):
    mgmt_value = subm[subm.BraTS21ID.astype(str).str.zfill(5) == subject_folder]["MGMT_value"].values[0]
    return mgmt_value


<a id="30"></a>
## <div class="alert alert-warning" style="border:0;margin:0"><center> LIME </center></div>

https://towardsdatascience.com/interpreting-image-classification-model-with-lime-1e7064a2f2e5

In [None]:
!pip install lime

In [None]:
from lime import lime_image
from skimage.segmentation import mark_boundaries
from tqdm import tqdm

explainer = lime_image.LimeImageExplainer()

In [None]:
def diffrent_types_subplot_LIME_steps(directory_path, dataset_type, subject_folder, scans_types):
    plt.figure(figsize=(16, 22))
    
    for i, scan_type in enumerate(scans_types, 1):
        file_directory = os.path.join(directory_path, dataset_type, subject_folder, scan_type)   
        file_name = get_middle_image(file_directory)
        
        image = load_dicom(os.path.join(file_directory, file_name))
        plt.subplot(5, len(scans_types), i)
        plt.title(f"{scan_type}", fontsize=16)
        plt.xticks([])
        plt.yticks([])
        plt.xlabel(file_name, fontsize=14)
        plt.imshow(image, cmap="gray") 
        
        explanation = explainer.explain_instance(image, best_model.predict, top_labels=2, hide_color=0, num_samples=500)
        plt.subplot(5, len(scans_types), i + len(scans_types))
        plt.title(f"{scan_type}", fontsize=16)
        plt.xticks([])
        plt.yticks([])
        plt.xlabel(file_name, fontsize=14)
        plt.imshow(mark_boundaries(image, explanation.segments))

        
        # print(explanation.__dict__.keys())
        # print(explanation.top_labels)
        temp_1, mask_1 = explanation.get_image_and_mask(explanation.top_labels[0], positive_only=True, num_features=5, hide_rest=True)
        temp_2, mask_2 = explanation.get_image_and_mask(explanation.top_labels[0], positive_only=False, num_features=10, hide_rest=False)
        plt.subplot(5, len(scans_types), i + len(scans_types)*2)
        plt.title(f"{scan_type}", fontsize=16)
        plt.xticks([])
        plt.yticks([])
        plt.xlabel(file_name, fontsize=14)
        plt.imshow(mark_boundaries(temp_2, explanation.segments))
        

        plt.subplot(5, len(scans_types), i + len(scans_types)*3)
        plt.title(f"{scan_type}", fontsize=16)
        plt.xticks([])
        plt.yticks([])
        plt.xlabel(file_name, fontsize=14)
        plt.imshow(mark_boundaries(temp_2, mask_2))
        
        
        # Fifth row: only the super-pixels that support the top label
        plt.subplot(5, len(scans_types), i + len(scans_types)*4)
        plt.title(f"{scan_type}", fontsize=16)
        plt.xticks([])
        plt.yticks([])
        plt.xlabel(file_name, fontsize=14)
        plt.imshow(mark_boundaries(temp_1, mask_1))

    
    mgmt_value = check_MGMT_value(subject_folder)
    plt.suptitle(f"MGMT_value: {mgmt_value}", fontsize=16);
    plt.text(0.45, 0.95, f"Subject: {subject_folder}", fontsize=15, transform=plt.gcf().transFigure);
    plt.show()

In [None]:
directory_path = '../input/rsna-miccai-brain-tumor-radiogenomic-classification'
dataset_type = 'train'
subject_folder = '00000'
scans_types = ['FLAIR','T1w', 'T1wCE','T2w']


diffrent_types_subplot_LIME_steps(directory_path, dataset_type, subject_folder, scans_types)

In [None]:
# subject_to_check = '../input/rsna-miccai-brain-tumor-radiogenomic-classification/train/*'

# subject_to_check = glob.glob(subject_to_check)
# subject_to_check = [p[-5:] for p in subject_to_check]
# subject_to_check.sort()

In [None]:
# for subject_folder in tqdm(subject_to_check):
#     directory_path = '../input/rsna-miccai-brain-tumor-radiogenomic-classification'
#     dataset_type = 'train'
#     # subject_folder = '00000'
#     scans_types = ['FLAIR','T1w','T1wCE','T2w']


#     diffrent_types_subplot_with_LIME_2(directory_path, dataset_type, subject_folder, scans_types)
#     diffrent_types_subplot(directory_path, dataset_type, subject_folder, scans_types)

The main arguments of `explainer.explain_instance`:

- `image` - the image that we want LIME to explain.
- `classifier_fn` - the image classifier's prediction function.
- `top_labels` - the number of labels that we want LIME to explain. If it is 3, LIME only considers the 3 labels with the highest predicted probabilities and ignores the rest.
- `num_samples` - the number of artificial data points, similar to our input, that LIME generates to fit its local explanation.
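
LIME calls `classifier_fn` on a batch of images and expects one row of class probabilities per image. If the model returns a single positive-class probability per image (an assumption here, suggested by `CLASS_MODE = 'binary'`), a thin wrapper can make both classes explicit. The sketch below is dependency-free for readability, and `make_two_class_fn` is a hypothetical helper, not part of LIME.

```python
def make_two_class_fn(predict_fn):
    """Wrap a single-output predictor as a two-column probability function.

    predict_fn is assumed to return one positive-class probability per image,
    i.e. rows like [p]. The wrapper returns rows [1 - p, p].
    """
    def classifier_fn(images):
        rows = []
        for row in predict_fn(images):
            p = float(row[0])
            rows.append([1.0 - p, p])
        return rows
    return classifier_fn
```

It could then be passed as `classifier_fn=make_two_class_fn(best_model.predict)`; converting the result with `np.asarray` may be preferable where an array is required.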

In [None]:
from skimage.segmentation import mark_boundaries

def diffrent_types_subplot_with_LIME_2(directory_path, dataset_type, subject_folder, scans_types):
    for scan_type in scans_types:
        file_directory = os.path.join(directory_path, dataset_type, subject_folder, scan_type)   
        file_name = get_middle_image(file_directory)
        path_to_file = os.path.join(file_directory, file_name)
        image = load_dicom(path_to_file)

        explanation = explainer.explain_instance(image, best_model.predict, top_labels=2, hide_color=0, num_samples=500)


        temp_1, mask_1 = explanation.get_image_and_mask(explanation.top_labels[0], positive_only=True, num_features=5, hide_rest=True)
        temp_2, mask_2 = explanation.get_image_and_mask(explanation.top_labels[0], positive_only=False, num_features=10, hide_rest=False)

        fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 5))
        ax1.imshow(mark_boundaries(temp_1, mask_1))
        ax2.imshow(mark_boundaries(temp_2, mask_2))
        ax1.axis('off')
        ax2.axis('off')
        mgmt_value = check_MGMT_value(subject_folder)
        plt.suptitle(f"MGMT_value: {mgmt_value}\nImage: {dataset_type}/{subject_folder}/{scan_type}/{file_name}", fontsize=16);
        plt.show()

In [None]:
from skimage.segmentation import mark_boundaries


def diffrent_types_subplot_with_LIME(directory_path, dataset_type, subject_folder, scans_types):
    plt.figure(figsize=(16, 6))
    for i, scan_type in enumerate(scans_types, 1):
        file_directory = os.path.join(directory_path, dataset_type, subject_folder, scan_type)   
        file_name = get_middle_image(file_directory)
        path_to_file = os.path.join(file_directory, file_name)
        image = load_dicom(path_to_file)

        explanation = explainer.explain_instance(image, best_model.predict, top_labels=2, hide_color=0, num_samples=500)

        temp_2, mask_2 = explanation.get_image_and_mask(explanation.top_labels[0], positive_only=False, num_features=10, hide_rest=False)

        plt.subplot(1, len(scans_types), i)
        plt.title(f"{scan_type}", fontsize=16)
        plt.xticks([])
        plt.yticks([])
        plt.xlabel(file_name, fontsize=14)
        plt.imshow(mark_boundaries(temp_2, mask_2))
    
    mgmt_value = check_MGMT_value(subject_folder)
    plt.suptitle(f"MGMT_value: {mgmt_value}", fontsize=16);
    plt.text(0.45, 0.88, f"Subject: {subject_folder}", fontsize=15, transform=plt.gcf().transFigure);
    plt.show()

In [None]:
directory_path = '../input/rsna-miccai-brain-tumor-radiogenomic-classification'
dataset_type = 'train'
subject_folder = '00285'
scans_types = ['FLAIR','T1w','T1wCE','T2w']


diffrent_types_subplot_with_LIME(directory_path, dataset_type, subject_folder, scans_types)
diffrent_types_subplot(directory_path, dataset_type, subject_folder, scans_types)

In [None]:
# directory_path = '../input/rsna-miccai-brain-tumor-radiogenomic-classification'
# dataset_type = 'train'
# subject_folder = '00003'
# scans_types = ['FLAIR','T1w','T1wCE','T2w']

# diffrent_types_subplot(directory_path, dataset_type, subject_folder, scans_types)
# diffrent_types_subplot_with_LIME(directory_path, dataset_type, subject_folder, scans_types)

In [None]:
def whole_scan_subplot_with_LIME(directory_path, dataset_type, subject_folder, scan_type, include_empty):
    folder_directory = os.path.join(directory_path, dataset_type, subject_folder, scan_type)  
    print(folder_directory)
    images = read_scan(folder_directory,include_empty=include_empty)

    print('No of images:', len(images))
    mgmt_value = check_MGMT_value(subject_folder)
    print('MGMT: ', mgmt_value)

    fig = plt.figure(figsize=(30,10))

    c = 1
    masks= []
    temps = []
    for image in images:
        explanation = explainer.explain_instance(image, best_model.predict, top_labels=2, hide_color=0, num_samples=500)

        #temp_1, mask_1 = explanation.get_image_and_mask(explanation.top_labels[0], positive_only=True, num_features=5, hide_rest=True)
        temp_2, mask_2 = explanation.get_image_and_mask(explanation.top_labels[0], positive_only=False, num_features=10, hide_rest=False)
        masks.append(mask_2)
        temps.append(temp_2)
        
        ax = fig.add_subplot(len(images)//10+1, 10, c)
        
        ax.imshow(mark_boundaries(temp_2, mask_2))
        c+=1
        plt.axis('off')
        plt.suptitle(f"Folder: {dataset_type}/{subject_folder}/{scan_type} \nMGMT_value: {mgmt_value}", fontsize=35);
        fig.tight_layout()
    return masks, temps

In [None]:
# directory_path = '../input/rsna-miccai-brain-tumor-radiogenomic-classification'
# dataset_type = 'train'
# subject_folder = '00000'
# scan_type = 'T1wCE'

# masks, temps = whole_scan_subplot_with_LIME(directory_path, dataset_type, subject_folder, scan_type, include_empty=False)

In [None]:
def create_animation_LIME(masks, temps):
    fig = plt.figure(figsize=(6, 6))
    plt.axis('off')
    im = plt.imshow(mark_boundaries(temps[0], masks[0]));

    def animate_func(i):
        # Redraw the explanation boundaries for every frame, not just the first
        im.set_array(mark_boundaries(temps[i], masks[i]))
        return [im]

    return animation.FuncAnimation(fig, animate_func, frames = len(temps), interval = 100)


In [None]:
# anim = create_animation_LIME(masks, temps)

In [None]:
# anim

In [None]:
# anim.save('im_lime_T1wCE_1.mp4')

How to read the LIME output: in the image produced with `positive_only=True` and `hide_rest=True`, only the super-pixels that support the top predicted class are shown. These are the regions that the model relies on most for its prediction.

In the image produced with `positive_only=False`, the super-pixels colored green are the ones that increase the probability that the image belongs to the top predicted class, while the super-pixels colored red are the ones that decrease it.

In [None]:
# directory_path = '../input/rsna-miccai-brain-tumor-radiogenomic-classification'
# dataset_type = 'train'
# subject_folder = '00003'
# scan_type = 'T1wCE'

# masks, temps = whole_scan_subplot_with_LIME(directory_path, dataset_type, subject_folder, scan_type, include_empty=False)
# anim = create_animation_LIME(masks, temps)
# anim

In [None]:
# anim.save('im_lime_T1wCE_0.mp4')

In [None]:
# directory_path = '../input/rsna-miccai-brain-tumor-radiogenomic-classification'
# dataset_type = 'train'
# subject_folder = '00000'
# scan_type = 'T1w'

# masks_T1w_1, temps_T1w_1 = whole_scan_subplot_with_LIME(directory_path, dataset_type, subject_folder, scan_type, include_empty=False)

In [None]:
# anim_T1w_1 = create_animation_LIME(masks_T1w_1, temps_T1w_1)
# anim_T1w_1.save('im_lime_T1w_1.mp4')
# anim_T1w_1

In [None]:
# directory_path = '../input/rsna-miccai-brain-tumor-radiogenomic-classification'
# dataset_type = 'train'
# subject_folder = '00003'
# scan_type = 'T1w'

# masks_T1w_0, temps_T1w_0 = whole_scan_subplot_with_LIME(directory_path, dataset_type, subject_folder, scan_type, include_empty=False)

In [None]:
# anim_T1w_0 = create_animation_LIME(masks_T1w_0, temps_T1w_0)
# anim_T1w_0.save('im_lime_T1w_0.mp4')
# anim_T1w_0