# Brain Tumor Detector with ResNet (using Fastai)




0. [Introduction](#0)

1.  [Preparation](#1)

    1.1 [Packages](#1.1)
    
    1.2 [Data](#1.2)

2. [Classification model](#2)

## 0. Introduction <a id=0></a>

In modern medicine, neuroimaging is an essential tool for physicians diagnosing intracranial injuries and diseases such as tumors. Computer vision and machine learning models can be trained to help clinicians and radiologists analyze patients' scans, and their use is increasingly widespread. Using fastai, we show how to train a convolutional neural network that detects the presence of a tumor in a brain scan with over 99% accuracy.

## 1. Preparation <a id=1></a>

### 1.1 Packages <a id=1.1></a>

In [None]:
import numpy as np
import pandas as pd

import plotly
import plotly.express as px
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
import cufflinks as cf

init_notebook_mode(connected=True)
cf.set_config_file(sharing='public',theme='white',offline=True)

!pip install -Uqq fastbook

import fastbook
fastbook.setup_book()

from fastai.data.all import *
from fastai.vision.all import *
from fastbook import *

import warnings
# Silence UserWarnings to keep the notebook output readable
warnings.filterwarnings(action='ignore', category=UserWarning)

### 1.2 Data <a id=1.2></a>

We merge the following two datasets, both containing images of brain scans.

1. https://www.kaggle.com/navoneel/brain-mri-images-for-brain-tumor-detection
2. https://www.kaggle.com/preetviradiya/brian-tumor-dataset

For convenience, we create a dataframe that records each image's path, its label (Tumor/No tumor) and its size in pixels.

In [None]:
path1 = Path('../input/brian-tumor-dataset/Brain Tumor Data Set/Brain Tumor Data Set')
path2 = Path('../input/brain-mri-images-for-brain-tumor-detection')
im_paths = L(*get_image_files(path1), *get_image_files(path2))

lbl_dct = {'no': 'No tumor', 'yes': 'Tumor', 'Healthy':'No tumor', 'Brain Tumor':'Tumor'}

df = pd.DataFrame({'image':im_paths})
df['px_size'] = df.apply(lambda x: PILImage.create(x.image).size, axis=1)
df['label'] = df.apply(lambda x: lbl_dct[parent_label(x.image)], axis=1)

df.head()
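The labels come from fastai's `parent_label`, which returns the name of the folder that contains each image; `lbl_dct` then maps the four folder names used by the two datasets onto our two classes. A minimal plain-Python sketch of the same mapping, using hypothetical file paths that only mimic the datasets' folder layout:

```python
from pathlib import Path

# Folder names used by the two datasets, mapped onto two classes (same dict as above)
lbl_dct = {'no': 'No tumor', 'yes': 'Tumor', 'Healthy': 'No tumor', 'Brain Tumor': 'Tumor'}

# Hypothetical paths: only the name of the enclosing folder matters
paths = [
    Path('brain-mri/yes/Y1.jpg'),
    Path('brain-mri/no/N1.jpg'),
    Path('Brain Tumor Data Set/Healthy/img_001.jpg'),
    Path('Brain Tumor Data Set/Brain Tumor/img_002.jpg'),
]

def parent_folder_label(path):
    """Take the parent folder name (like fastai's parent_label), then map it through lbl_dct."""
    return lbl_dct[path.parent.name]

labels = [parent_folder_label(p) for p in paths]
print(labels)  # ['Tumor', 'No tumor', 'No tumor', 'Tumor']
```

An unexpected folder name raises a `KeyError`, which doubles as a cheap sanity check that no stray directory slipped into the list of images.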

In [None]:
df.value_counts('label')

The dataset is not perfectly balanced: there are more scans from patients with tumors than without.
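One way to put the imbalance in numbers is to compute the class fractions together with the accuracy of a trivial classifier that always predicts the majority class; a useful model has to beat that baseline. A short sketch, where the counts are hypothetical placeholders (with the real data one could pass `df['label'].value_counts().to_dict()` instead):

```python
def class_summary(counts):
    """Return class fractions and the majority-class baseline accuracy.

    `counts` maps each label to its number of images.
    """
    total = sum(counts.values())
    fractions = {label: n / total for label, n in counts.items()}
    # A model that always outputs the most frequent label is right this often
    baseline = max(counts.values()) / total
    return fractions, baseline

# Hypothetical counts, not taken from the real dataset
fractions, baseline = class_summary({'Tumor': 600, 'No tumor': 400})
print(fractions)  # {'Tumor': 0.6, 'No tumor': 0.4}
print(f"Majority-class baseline accuracy: {baseline:.1%}")  # 60.0%
```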

To collate images into batches for the neural network, all images must have the same size. We look at the distribution of image sizes to choose the target size for resizing.

In [None]:
w, h = list(zip(*df['px_size'].values))

sizesdf = pd.DataFrame({'width':w, 'height':h, 'label':df.label.values})

# Axis labels are keyed by column name ('width', 'height')
fig = px.scatter(sizesdf, x='width', y='height', color='label',
                 labels={'width': 'Width (px)', 'height': 'Height (px)'},
                 marginal_x='violin', marginal_y='violin', title='Image sizes (pixel)', height=600)

fig.show()

The images are, for the most part, approximately square, so we resize all of them to 360×360 pixels. We resize by squishing rather than cropping: a crop could cut out the tumor, leaving an image labeled "Tumor" with no visible tumor, which would add label noise to the data.
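The trade-off between squishing and cropping can be quantified for a single image. Squishing keeps every pixel but rescales width and height by different factors, which distorts the aspect ratio; a center crop to a square (a simplified stand-in for cropping, assumed here) keeps the aspect ratio but discards a band along the longer side, which is exactly where a tumor near the edge could be lost. A small sketch for a hypothetical 512×440 scan:

```python
def squish_vs_crop(width, height, target=360):
    """Compare squish-resizing and center-cropping a width x height image to a target x target square."""
    # Squish: every pixel is kept, but each axis gets its own scale factor
    scale_x, scale_y = target / width, target / height
    distortion = max(scale_x, scale_y) / min(scale_x, scale_y)
    # Center crop to a min(width, height) square, then resize: the aspect ratio
    # is preserved, but only this fraction of the original area survives
    side = min(width, height)
    kept_fraction = side * side / (width * height)
    return distortion, kept_fraction

distortion, kept = squish_vs_crop(512, 440)
print(f"squish: aspect ratio distorted by a factor {distortion:.3f}")
print(f"crop: {kept:.1%} of the image kept, {1 - kept:.1%} discarded")
```

For a perfectly square image both numbers equal 1, so the closer the scans are to square, the less distortion squishing introduces.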

## 2. Classification model <a id=2></a>

Next we create the training and validation dataloaders that feed the neural network, using fastai's high-level `DataBlock` API, which automatically handles image preprocessing and label numericalization.

In [None]:
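dblock = DataBlock.from_columns(blocks=(ImageBlock, CategoryBlock),
                               # get_items receives the dataframe; its two columns are zipped into (path, label) pairs
                               get_items=lambda x: (x.image, x.label),
                               # squish every image to 360x360 instead of cropping it
                               item_tfms=Resize(360, method='squish'),
                               # standardize input channels with ImageNet statistics (input normalization, not batch normalization)
                               batch_tfms=Normalize.from_stats(*imagenet_stats))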

dls = dblock.dataloaders(df)

dls.show_batch(max_n=15)
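`Normalize.from_stats(*imagenet_stats)` does not add a batch-normalization layer: after pixel values are scaled to [0, 1], it standardizes each input channel with the mean and standard deviation of ImageNet, the dataset the pretrained ResNet was trained on. A plain-Python sketch of that per-channel arithmetic for a single RGB pixel (the constants are the standard ImageNet statistics stored in fastai's `imagenet_stats`):

```python
# ImageNet per-channel statistics (R, G, B), as stored in fastai's imagenet_stats
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def normalize_pixel(rgb):
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
    return tuple(
        (value / 255 - mean) / std
        for value, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    )

# A mid-grey pixel
print(normalize_pixel((128, 128, 128)))
```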

In [None]:
print(f"Size of training set: {len(dls.train.items)}")
print(f"Size of validation set: {len(dls.valid.items)}")
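Since no `splitter` is passed to the `DataBlock`, fastai falls back to its default `RandomSplitter`, which holds out a random 20% of the items for validation. The sketch below illustrates that idea in plain Python (it is not fastai's code) on hypothetical (path, label) pairs, the same shape of items that `DataBlock.from_columns` builds by zipping the two columns returned by `get_items`:

```python
import random

def random_split(items, valid_pct=0.2, seed=None):
    """Shuffle the item indices and hold out the first valid_pct fraction for validation."""
    rng = random.Random(seed)
    idx = list(range(len(items)))
    rng.shuffle(idx)
    cut = int(valid_pct * len(items))
    train = [items[i] for i in idx[cut:]]
    valid = [items[i] for i in idx[:cut]]
    return train, valid

# Hypothetical items: (image path, label) pairs obtained by zipping two columns
paths = [f"scan_{i}.jpg" for i in range(10)]
labels = ["Tumor" if i % 2 else "No tumor" for i in range(10)]
items = list(zip(paths, labels))

train, valid = random_split(items, seed=42)
print(f"Size of training set: {len(train)}")    # 8
print(f"Size of validation set: {len(valid)}")  # 2
```

Because the split is random, the validation set (and the reported accuracy) changes between runs; passing `splitter=RandomSplitter(valid_pct=0.2, seed=42)` to the `DataBlock` would make fastai's split reproducible.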

We train a residual network with the `resnet34` architecture, using cross entropy as the loss function. We use *transfer learning*: starting from weights pretrained on ImageNet, we first train only the newly added head for two epochs, then unfreeze the pretrained layers and train all weights for 8 more epochs. fastai's `Learner.fine_tune` method performs both steps.

In [None]:
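# In fastai >= 2.6 this function is named `vision_learner`; `cnn_learner` remains as a deprecated alias
learn = cnn_learner(dls, resnet34, loss_func=CrossEntropyLossFlat(), metrics=accuracy)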

learn.model

In [None]:
learn.fine_tune(8, freeze_epochs=2)
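For reference, `fine_tune` runs two one-cycle phases: first the pretrained body is frozen and only the head is trained at `base_lr`; then the learning rate is halved, the whole network is unfrozen, and it is trained with discriminative learning rates, from `base_lr/lr_mult` for the earliest layers up to `base_lr` for the head. The defaults below (`base_lr=2e-3`, `lr_mult=100`) reflect fastai v2's signature and are worth checking against the installed version. This plain-Python sketch only spells out the schedule; it trains nothing:

```python
def fine_tune_plan(epochs, freeze_epochs=1, base_lr=2e-3, lr_mult=100):
    """Describe the two phases of fastai's Learner.fine_tune (schedule only)."""
    plan = [{
        "phase": "frozen body, train head",
        "epochs": freeze_epochs,
        "lr": base_lr,  # learning rate of the head
    }]
    base_lr /= 2  # the learning rate is halved before unfreezing
    plan.append({
        "phase": "unfrozen, train all layers",
        "epochs": epochs,
        # discriminative learning rates: (earliest layer group, head)
        "lr": (base_lr / lr_mult, base_lr),
    })
    return plan

for step in fine_tune_plan(8, freeze_epochs=2):
    print(step)
```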

Without much tweaking, the CNN-based approach yields a model with 99.3% accuracy on the validation set. Let us look at the confusion matrix and the corresponding classification report.

In [None]:
interp = ClassificationInterpretation.from_learner(learn)

interp.plot_confusion_matrix()

interp.print_classification_report()
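The classification report summarizes the confusion matrix class by class: precision is the fraction of images predicted as a class that truly belong to it, and recall is the fraction of images of a class that the model recovers. A plain-Python sketch of both computations, on hypothetical labels and predictions (not the notebook's results):

```python
from collections import Counter

def confusion_and_report(actual, predicted, classes):
    """Build a confusion matrix {(actual, predicted): count} and per-class precision/recall."""
    cm = Counter(zip(actual, predicted))
    report = {}
    for c in classes:
        tp = cm[(c, c)]
        predicted_c = sum(cm[(a, c)] for a in classes)  # column total
        actual_c = sum(cm[(c, p)] for p in classes)     # row total (support)
        report[c] = {
            "precision": tp / predicted_c if predicted_c else 0.0,
            "recall": tp / actual_c if actual_c else 0.0,
            "support": actual_c,
        }
    return cm, report

# Hypothetical labels for 10 validation images
actual = ["Tumor"] * 6 + ["No tumor"] * 4
predicted = ["Tumor"] * 5 + ["No tumor"] + ["No tumor"] * 3 + ["Tumor"]
classes = ["No tumor", "Tumor"]

cm, report = confusion_and_report(actual, predicted, classes)
for c in classes:
    print(c, report[c])
```

Similar recall values for the two classes are what "errors spread evenly between the labels" amounts to in these terms.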

The few errors are spread evenly between the two labels, so the class imbalance does not appear to have biased the model. Let us look at the seven misclassified images.

In [None]:
interp.plot_top_losses(k=7, figsize=(15,6))
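`plot_top_losses` ranks validation images by their individual loss. With cross entropy, an image's loss is the negative log of the probability the model assigns to its true label, so confident mistakes rank highest. A minimal sketch of that ranking on hypothetical predicted probabilities:

```python
import math

def top_losses(probs, targets, k):
    """Return the k (index, loss) pairs with the largest cross-entropy loss.

    probs[i] maps each class to the predicted probability for sample i;
    targets[i] is the true class of sample i.
    """
    losses = [(i, -math.log(p[t])) for i, (p, t) in enumerate(zip(probs, targets))]
    return sorted(losses, key=lambda pair: pair[1], reverse=True)[:k]

# Hypothetical softmax outputs for four validation images
probs = [
    {"Tumor": 0.98, "No tumor": 0.02},
    {"Tumor": 0.10, "No tumor": 0.90},  # confident mistake: true label is Tumor
    {"Tumor": 0.55, "No tumor": 0.45},  # marginal mistake: true label is No tumor
    {"Tumor": 0.01, "No tumor": 0.99},
]
targets = ["Tumor", "Tumor", "No tumor", "No tumor"]

for idx, loss in top_losses(probs, targets, k=2):
    print(f"image {idx}: loss {loss:.3f}")
```

With two classes, a misclassified image has probability below 0.5 for its true label, hence a loss above ln 2 ≈ 0.693, while a correctly classified one has a loss below ln 2; so when `k` equals the number of errors, the top `k` losses are exactly the misclassified images.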