**This notebook uses PyTorch Lightning to speed up the modeling process.
A large part of this code was taken from Roland Leuthy; his notebook is [here](https://www.kaggle.com/rluethy/efficientnet3d-with-one-mri-type). It's a great notebook, and he has achieved a great score on the leaderboard.**

This notebook uses [PyTorch Lightning](https://www.pytorchlightning.ai/). If you know PyTorch, you should be able to pick it up quite quickly. I'll explain as much as I can along the way.



![](https://images.unsplash.com/photo-1559757175-5700dde675bc?ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&ixlib=rb-1.2.1&auto=format&fit=crop&w=889&q=80)

# **Dependencies**

In [None]:
import torch 
import torch.nn as nn
import cv2
import pytorch_lightning as pl
from pytorch_lightning import LightningModule  # top-level import; the core.lightning path is gone in newer releases

from torch.utils.data import Dataset,DataLoader 
import pydicom
import matplotlib.pyplot as plt
import numpy as np
import os
import sys
import random
import glob
import time
import pandas as pd
from sklearn.model_selection import train_test_split


import seaborn as sns

The goal of this notebook is to build a working model for the **FLAIR** MRI type. Once a model has been built for each MRI type, a separate notebook can combine them.

In [None]:
if os.path.exists("../input/rsna-miccai-brain-tumor-radiogenomic-classification"):
    data_directory = '../input/rsna-miccai-brain-tumor-radiogenomic-classification'
    pytorch3dpath = "../input/efficientnet3d/EfficientNet-PyTorch-3D-master"
else:
    data_directory = '/media/roland/data/kaggle/rsna-miccai-brain-tumor-radiogenomic-classification'
    pytorch3dpath = "EfficientNet-PyTorch-3D"
    
mri_types = ['FLAIR','T1w','T1wCE','T2w']
SIZE = 256
NUM_IMAGES = 64


# **Data**

In [None]:
df = pd.read_csv("../input/rsna-miccai-brain-tumor-radiogenomic-classification/train_labels.csv")
df = df.loc[df['BraTS21ID'] != 109]
df = df.loc[df['BraTS21ID'] != 709]
df = df.reset_index(drop=True)
df

In [None]:
sample_df = pd.read_csv("../input/rsna-miccai-brain-tumor-radiogenomic-classification/sample_submission.csv")

In [None]:
X = df[["BraTS21ID"]]
y = df[["MGMT_value"]]
train_x,test_x,train_y,test_y = train_test_split(X,y,test_size = 0.25 , random_state = 42,stratify=y["MGMT_value"])
test_x,val_x,test_y,val_y = train_test_split(test_x,test_y,test_size = 0.25 , random_state = 42, stratify=test_y["MGMT_value"])
print(len(train_x) , "\n" , len(test_x) , "\n" , len(val_x))
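
To make the proportions of the two chained splits above concrete, here is a small arithmetic sketch. It assumes the hold-out size is rounded up (which is how scikit-learn is understood to handle a float `test_size`), and the sample count of 1000 is purely hypothetical. Note that the larger part of the hold-out ends up in `test_x` and the smaller part in `val_x`.

```python
import math

def two_stage_split_sizes(n_samples, test_size=0.25):
    """Return (train, larger_holdout_part, smaller_holdout_part) sizes.

    Assumption: each split rounds its held-out portion up.
    """
    holdout = math.ceil(n_samples * test_size)   # first split: 25% held out
    train = n_samples - holdout
    smaller = math.ceil(holdout * test_size)     # second split: 25% of the hold-out
    larger = holdout - smaller
    return train, larger, smaller

# Hypothetical dataset of 1000 patients
print(two_stage_split_sizes(1000))
```

Roughly 75% of the patients are used for training, about 19% land in the larger hold-out part and about 6% in the smaller one.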

# **Data Preprocessing**

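The preprocessing stacks each patient's FLAIR DICOM slices into a single 3D volume: the middle 64 slices are kept, each one is resized to 256x256, and shorter scans are zero-padded. The resulting array has shape (1, 256, 256, 64), where the leading axis is the channel.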

In [None]:
def load_dicom_image(path, img_size=SIZE):
    dicom = pydicom.dcmread(path)  # dcmread is the current name for read_file
    data = dicom.pixel_array
    if np.min(data)==np.max(data):
        data = np.zeros((img_size,img_size))
        return data
    data = data - np.min(data)
    if np.max(data) != 0:
        data = data / np.max(data)
    
    #data = (data * 255).astype(np.uint8)
    data = cv2.resize(data, (img_size, img_size))
    return data

def load_dicom_images_3d(scan_id, num_imgs=NUM_IMAGES, img_size=SIZE, mri_type="FLAIR", split="train"):

    files = sorted(glob.glob(f"{data_directory}/{split}/{scan_id}/{mri_type}/*.dcm"))
    
    middle = len(files)//2
    num_imgs2 = num_imgs//2
    p1 = max(0, middle - num_imgs2)
    p2 = min(len(files), middle + num_imgs2)
    # Pass img_size through so a non-default size is applied to every slice
    img3d = np.stack([load_dicom_image(f, img_size) for f in files[p1:p2]]).T
    if img3d.shape[-1] < num_imgs:
        n_zero = np.zeros((img_size, img_size, num_imgs - img3d.shape[-1]))
        img3d = np.concatenate((img3d,  n_zero), axis = -1)
            
    return np.expand_dims(img3d,0)


In [None]:
load_dicom_images_3d("00002").shape
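
The slice selection and zero-padding can be checked without any DICOM files. The sketch below mirrors the logic of `load_dicom_images_3d` on synthetic arrays; the helper name `center_slices` and the tiny 8x8 image size are made up for illustration.

```python
import numpy as np

def center_slices(volume, num_imgs=64):
    """Keep the middle num_imgs slices and zero-pad the slice axis if short.

    volume: array of shape (n_slices, height, width).
    Returns an array of shape (1, width, height, num_imgs).
    """
    middle = volume.shape[0] // 2
    half = num_imgs // 2
    start = max(0, middle - half)
    stop = min(volume.shape[0], middle + half)
    # .T reverses the axes, so the slice axis ends up last
    img3d = volume[start:stop].T
    missing = num_imgs - img3d.shape[-1]
    if missing > 0:
        pad = np.zeros(img3d.shape[:2] + (missing,))
        img3d = np.concatenate((img3d, pad), axis=-1)
    # Add the leading channel axis expected by the 3D network
    return np.expand_dims(img3d, 0)

long_scan = np.ones((100, 8, 8))   # more slices than needed: cropped to the middle 64
short_scan = np.ones((10, 8, 8))   # fewer slices than needed: padded with zeros
print(center_slices(long_scan).shape, center_slices(short_scan).shape)
```

Both calls return the same shape, which is what lets scans with different slice counts be batched together.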

# **Dataset**

In [None]:
class RSNA_Dataset(Dataset):
    def __init__(self, paths, targets=None, mri_type="FLAIR", label_smoothing=0.0, split="train"):
        self.paths = paths
        self.targets = targets
        self.mri_type = mri_type  # must match the folder name on disk, e.g. "FLAIR"
        self.label_smoothing = label_smoothing
        self.split = split

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, index):
        scan_id = str(self.paths[index]).zfill(5)
        # Unlabelled samples are read from the competition's test folder
        split = "test" if self.targets is None else self.split
        data = load_dicom_images_3d(scan_id, mri_type=self.mri_type, split=split)

        if self.targets is None:
            return torch.tensor(data).float()
        else:
            # Label smoothing pulls the target away from exactly 0 or 1
            y = torch.tensor(abs(self.targets[index]-self.label_smoothing), dtype=torch.float)
            return torch.tensor(data).float(),y
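
The dataset's label smoothing comes down to `abs(target - label_smoothing)`. A minimal sketch of what that does to binary targets is shown here; the smoothing value of 0.1 is only an illustrative choice, since the notebook itself leaves it at 0.0.

```python
def smooth_label(target, label_smoothing=0.1):
    """Move a binary target away from the extremes: 0 goes up, 1 goes down."""
    return abs(target - label_smoothing)

for target in (0, 1):
    print(target, "->", smooth_label(target))
```

With `label_smoothing=0.0` the targets are returned unchanged.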

In [None]:
train_dataset = RSNA_Dataset(
                train_x["BraTS21ID"].values,
                train_y["MGMT_value"].values
)

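# Note: the second split returns its larger part as test_x and its smaller part
# as val_x, so val_x backs the test set and test_x backs the validation set.
test_dataset = RSNA_Dataset(
                    val_x["BraTS21ID"].values,
                    val_y["MGMT_value"].values
)

validation_dataset = RSNA_Dataset(
                    test_x["BraTS21ID"].values,
                    test_y["MGMT_value"].values
)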

predict_dataset = RSNA_Dataset(
                    sample_df["BraTS21ID"].values,
)

You could pass the dataloaders directly to the trainer, but I find that wrapping them in a LightningDataModule keeps the code cleaner and easier to read.

In [None]:
class RSNA_DataModule(pl.LightningDataModule):
    def __init__(self):
        super().__init__()
        self.train = train_dataset
        self.val = validation_dataset
        self.test = test_dataset
        self.predict = predict_dataset
        
    def train_dataloader(self):
        return DataLoader(self.train,batch_size = 20,shuffle = True,num_workers=1)
    def val_dataloader(self):  
        return DataLoader(self.val,batch_size = 20,shuffle = False,num_workers=1)
    def test_dataloader(self):
        return DataLoader(self.test,batch_size = 22,shuffle = False,num_workers=1)
    def predict_dataloader(self):
        return DataLoader(self.predict,batch_size = 1,shuffle = False,num_workers=1)

Sanity check: inspect one sample to confirm what is being sent to the model.

In [None]:
image , label  = next(iter(DataLoader(train_dataset,batch_size = 1,shuffle = True)))
print(image,label)

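We'll use an EfficientNet3D (efficientnet-b1) as the model and set the number of classes to 1, since this is binary classification. Note that `from_name` only builds the architecture; as far as I can tell it does not load pretrained weights, so any trained weights have to come from a checkpoint.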

In [None]:
package_path = "../input/efficientnet3d/EfficientNet-PyTorch-3D-master"
sys.path.append(package_path)
from efficientnet_pytorch_3d import EfficientNet3D
neural_network = EfficientNet3D.from_name("efficientnet-b1", override_params={'num_classes': 1}, in_channels=1) 

The cell below defines a function that returns the AUC score on the validation data. We use AUC because it is the metric used in the competition. AUC and the ROC curve are explained very well in this [video](https://www.youtube.com/watch?v=4jRBRDbJemM&vl=en).

In [None]:
from sklearn.metrics import roc_curve,auc

# Binary cross entropy works on logits, so Sigmoid converts them into probabilities
probs = nn.Sigmoid()

def get_score(y_pred,y):
    # y_pred: logits of shape (batch, 1); y: targets of shape (batch, 1)
    probabilities = probs(y_pred).detach().cpu().flatten().tolist()
    targets = y.detach().cpu().flatten().tolist()
    fpr, tpr, _ = roc_curve(targets, probabilities)
    # The score is undefined (nan) if the batch contains only one class
    auc_score = auc(fpr, tpr)
    return auc_score
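
For intuition, the AUC can also be read as the probability that a randomly chosen positive sample gets a higher score than a randomly chosen negative one. The sketch below computes it that way in plain Python; `pairwise_auc` is a hypothetical helper and the scores are made-up numbers, so treat it as an illustration rather than a replacement for scikit-learn.

```python
import math

def pairwise_auc(targets, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a win. Returns nan when only one class is present,
    because the AUC is undefined in that case.
    """
    pos = [s for t, s in zip(targets, scores) if t == 1]
    neg = [s for t, s in zip(targets, scores) if t == 0]
    if not pos or not neg:
        return math.nan
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(pairwise_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
print(pairwise_auc([1, 1, 1], [0.2, 0.6, 0.9]))  # single class: undefined
```

The second call shows a caveat worth keeping in mind when AUC is computed on small batches: a batch that happens to contain only one class has no defined AUC.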

The model, optimizer and learning-rate scheduler are all set up in the LightningModule class. Notice the "auc_score" being logged in the validation step; we'll use it to monitor the model's performance.

In [None]:
class RSNA_Model(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.neural_net = neural_network
        
    def forward(self,x):
        return self.neural_net(x)
    
    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters() , lr = 1e-4)
        # Halve the learning rate every 20 epochs (the remaining arguments were defaults)
        sch = torch.optim.lr_scheduler.StepLR(optimizer, step_size = 20, gamma=0.5)
        return {
        "optimizer": optimizer,
        "lr_scheduler": {
            "scheduler": sch,
          #  "monitor": "",
        },
    }
    
    def training_step(self,batch,batch_idx):
        x,y = batch
        y_pred = self(x)
        y = y.unsqueeze(-1)
        loss = torch.nn.functional.binary_cross_entropy_with_logits(y_pred,y)
        return loss
    
    def validation_step(self,batch,batch_idx):
        x,y = batch
        y_pred = self(x)
        y = y.unsqueeze(-1)
        loss = torch.nn.functional.binary_cross_entropy_with_logits(y_pred,y)
        self.log("auc_score" ,get_score(y_pred,y) )
        return loss
    def test_step(self,batch,batch_idx):
        x,y = batch
        y_pred = self(x)
        y = y.unsqueeze(-1)
        loss = torch.nn.functional.binary_cross_entropy_with_logits(y_pred,y)
        self.log("test_loss", loss)
        return loss
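
To see what the scheduler configured above does, the step decay rule can be written out directly: the learning rate is multiplied by `gamma` once every `step_size` epochs. This is a plain-Python sketch of that rule, not the PyTorch implementation, and `step_lr` is a hypothetical helper.

```python
def step_lr(epoch, base_lr=1e-4, step_size=20, gamma=0.5):
    """Learning rate at a given epoch under step decay."""
    return base_lr * gamma ** (epoch // step_size)

for epoch in (0, 14, 20, 40):
    print(epoch, step_lr(epoch))
```

With a step size of 20, a 15-epoch run never reaches the first decay, so the learning rate stays at 1e-4 for the whole run.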

# **Training**

The cell below defines a checkpoint callback that is passed to the trainer and keeps the model with the highest "auc_score".

In [None]:
from pytorch_lightning.callbacks import ModelCheckpoint
checkpoint_callback = ModelCheckpoint(
                            monitor = "auc_score",
                            mode = "max",
)

I have already trained this model and saved a checkpoint. Loading a checkpoint means you don't have to start training from scratch. The best part about Lightning is that you only have to declare the number of GPUs you want to use and it handles the rest. Note that 15 epochs take around 45 minutes on a GPU and about 6 hours on a CPU.

In [None]:
%%time
from pytorch_lightning import Trainer
module = RSNA_DataModule()
# load_from_checkpoint is a classmethod that returns a new model, so keep its result
model = RSNA_Model.load_from_checkpoint('../input/d/aristotle609/efficient3d-checkpoint/FLAIR-Best_Checkpoint.ckpt')
trainer = Trainer(max_epochs=15,gpus = 1,  callbacks = [checkpoint_callback])
trainer.fit(model,module)

You can evaluate the model on the test set below. For more reliable results I would recommend increasing the test size.

In [None]:
result = trainer.test()
print(result)

# **Predictions**

In [None]:
predictions = trainer.predict()

In [None]:
probabilities = []
for x in predictions:
    # Each x holds the logit for one scan (the predict dataloader uses batch_size = 1)
    probabilities.append(float(probs(x)))

In [None]:
# Save the checkpoint so it can be reloaded later
trainer.save_checkpoint("FLAIR-Best_Checkpoint.ckpt")

In [None]:
import shutil
# Delete the training logs; keep them instead if you want to plot the metrics later
shutil.rmtree("./lightning_logs")

In [None]:
data = {
    'BraTS21ID' : list(sample_df["BraTS21ID"]),
    'MGMT_value' : probabilities
}
submission = pd.DataFrame(data)
submission.to_csv("submission.csv", index=False)

In [None]:
display(pd.read_csv("./submission.csv"))
sns.displot(submission["MGMT_value"])