<img style="width:20%;float: left; margin-right: 10px;" src="https://upload.wikimedia.org/wikipedia/en/a/ae/CERN_logo.svg"/>

# 2D MRI images preprocessing

Data preprocessing is a fundamental part of data analysis: it helps us understand the data that we are going to feed to the machine learning model.

For this case, we will use a public dataset <a href="#1">[1]</a> of magnetic resonance images (MRI) from patients with brain tumors (meningioma, glioma, and pituitary) and from patients without tumors.



<hr>

# Let's get started!

The first step is to import the necessary modules. These modules handle numerical arrays, plotting, and image transformations.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from skimage import color
from skimage.transform import resize
from skimage import io
from tqdm import tqdm
import glob
import os


# Defining the categories

The second step is to define the categories for our problem. In our case, this is a list of four labels: meningioma, glioma, and pituitary tumors, plus no tumor.


In [None]:
categories = ['meningioma_tumor', 'glioma_tumor', 'pituitary_tumor', 'no_tumor']

# Function to load the dataset

This function loads the training or testing dataset.


It returns a dictionary that holds, for each category, the loaded images and the set of distinct image shapes. The `dataset` parameter must be the string "training" or "testing" to select the corresponding dataset.

In [None]:
def get_dataset_original(dataset="training"):  # the other option is "testing"
    data_orig = {}
    if dataset == "training":
        print("processing training dataset")
        path = "initial/Training"
    elif dataset == "testing":
        print("processing testing dataset")
        path = "initial/Testing"
    else:
        # fail early instead of silently searching an empty path
        raise ValueError('dataset must be "training" or "testing"')

    for category in categories:
        cat_path = f"{path}/{category}"
        print(f"processing category {category} from path {cat_path}")
        imgs_files = glob.glob(f"{cat_path}/*")
        imgs_np = []
        shapes = []
        for img in imgs_files:
            # decode the image file into a NumPy array
            mat = plt.imread(img, format='jpeg')
            imgs_np.append(mat)
            shapes.append(mat.shape)
        data_orig[category] = {}
        data_orig[category]["shapes"] = set(shapes)  # distinct image shapes
        data_orig[category]["data"] = imgs_np  # list of image arrays
    return data_orig
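
The layout of the returned dictionary can be illustrated without touching any files. The following is only a sketch with synthetic arrays: the image sizes and the `example` variable are assumptions made for illustration, not values taken from the dataset.

```python
import numpy as np

# Synthetic stand-ins for decoded images (hypothetical sizes).
fake_images = [
    np.zeros((512, 512, 3), dtype=np.uint8),
    np.zeros((512, 512, 3), dtype=np.uint8),
    np.zeros((236, 236, 3), dtype=np.uint8),
]

# Same layout that get_dataset_original builds for every category:
# a list of arrays under "data" and a set of distinct shapes under "shapes".
example = {
    "no_tumor": {
        "data": fake_images,
        "shapes": {img.shape for img in fake_images},
    }
}

n_images = len(example["no_tumor"]["data"])
n_shapes = len(example["no_tumor"]["shapes"])
print(f"images = {n_images}, distinct shapes = {n_shapes}")
```

Because "shapes" is a set, repeated shapes are stored only once, which is why it can be used later to count how many different image sizes a category contains.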

# Function to preprocess the images

This function performs some basic preprocessing: it converts every image to a single (grayscale) channel and resizes it to 64x64 pixels.
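
To make "one channel" concrete, here is a minimal NumPy sketch of a weighted RGB-to-gray conversion. The helper name `to_gray` is hypothetical; the weights are the luminance coefficients listed in the scikit-image documentation for `rgb2gray`, and the division by 255 assumes 8-bit input.

```python
import numpy as np

def to_gray(img):
    """Hypothetical helper: collapse an RGB image to a single channel."""
    img = np.asarray(img, dtype=np.float64)
    if img.ndim == 2:  # already single-channel, nothing to do
        return img
    # weighted sum of the R, G and B channels
    weights = np.array([0.2125, 0.7154, 0.0721])
    return img[..., :3] @ weights

# A white 4x4 RGB image (8-bit), scaled to [0, 1] after conversion.
rgb = np.full((4, 4, 3), 255, dtype=np.uint8)
gray = to_gray(rgb) / 255.0
print(gray.shape)
```

The result has two dimensions instead of three, which is the form the rest of the tutorial works with.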

More sophisticated procedures could be applied using advanced tools for MRI and CT normalization, such as ANTs <a href="#2">[2]</a>. Skull stripping <a href="#3">[3]</a>, which removes the skull from the images, can also be performed to improve accuracy. You can also apply transforms to remove noise, correct contrast, etc.

Due to limited time and computational resources, we are doing something basic here.
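
As an idea of what a very simple intensity correction could look like, the sketch below rescales an image to the range [0, 1] (min-max normalization). It is not one of the tools cited above and is not used in this notebook; the helper name `minmax_normalize` and its `eps` parameter are assumptions for illustration.

```python
import numpy as np

def minmax_normalize(img, eps=1e-8):
    """Hypothetical helper: rescale intensities to the range [0, 1]."""
    img = np.asarray(img, dtype=np.float64)
    lo, hi = img.min(), img.max()
    if hi - lo < eps:  # constant image: avoid dividing by zero
        return np.zeros_like(img)
    return (img - lo) / (hi - lo)

sample = np.array([[10.0, 20.0], [30.0, 50.0]])
normalized = minmax_normalize(sample)
print(normalized)
```

Min-max rescaling only makes intensity ranges comparable between images; it does not correct scanner-specific bias or remove noise.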

In [None]:
def preprocess(dataset,img_shape = (64,64)):
    data = {}
    for category in dataset:
        raw_data = dataset[category]["data"]
        new_data = []
        for img in tqdm(raw_data):  # tqdm displays a progress bar
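            # convert RGB images to one channel; 2D images are already single-channel
            nimg = color.rgb2gray(img) if img.ndim == 3 else img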
            nimg = resize(nimg,img_shape)
            new_data.append(nimg)
        data[category] = {}
        data[category]["data"] = new_data
    return data
    
    

# Function to save the preprocessed dataset

This function saves the preprocessed data as NumPy array files (`.npy`).
The data will be saved in folders called **"preprocessed/Training"** and **"preprocessed/Testing"**.

In [None]:
def save_dataset_preprocessed(data, dataset="training"):
    if dataset == "training":
        path = "preprocessed/Training"
        print(f"saving training dataset to {path}")
    elif dataset == "testing":
        path = "preprocessed/Testing"
        print(f"saving testing dataset to {path}")
    else:
        raise ValueError('dataset must be "training" or "testing"')

    # iterate over the data passed in, not over the global variable `train`
    for category in data:
        print(f"saving images for {category}")
        cat_path = f"{path}/{category}"
        os.makedirs(cat_path, exist_ok=True)  # create the folder if needed
        # store each image as <index>.npy
        for img_n, img in enumerate(data[category]["data"]):
            np.save(f'{cat_path}/{img_n}.npy', img)


# Loading the initial training images

Load the images in their initial state for preprocessing by calling the previously defined function.

In [None]:
train = get_dataset_original("training")

# Printing information about the dataset

In the next two cells you can find the number of images and the number of distinct image shapes per category.

In [None]:
for category in train:
    img_size = len(train[category]["data"])
    print(f"category = {category}  images = {img_size}")

In [None]:
for category in train:
    img_shapes = len(train[category]["shapes"])
    print(f"category = {category}  images shapes = {img_shapes}")

# Analysing some images

Let's look at some of the images by running the cell below.
You will see that the images have **different sizes**, are **not skull stripped**, and are probably not normalized.

Another important thing to keep in mind is the orientation of the patient. As the plots produced by the cell below show, the dataset contains 2D images with **mixed orientations**.
As a reminder, the following image shows the different orientations <a href="#4">[4]</a>:
<img src="https://my-ms.org/images/mri_planes_gnu.jpg" style="width:40%" />

Finally, it is important to keep in mind that MRIs come in different types, called sequences: T1-weighted, T2-weighted, and FLAIR <a href="#5">[5]</a>.
<img src="https://case.edu/med/neurology/NR/t1t2flairbrain.jpg" style=""/>

According to this <a href="https://www.kaggle.com/sartajbhuvaji/brain-tumor-classification-mri/discussion/214801">post</a>, this dataset contains **all three**.



In [None]:
for j in range(2):
    fig = plt.figure(figsize=(20,20))
    plt.gray()  # show the filtered result in grayscale
    subplots=[]
    subplots.append(fig.add_subplot(141))
    subplots.append(fig.add_subplot(142))
    subplots.append(fig.add_subplot(143))
    subplots.append(fig.add_subplot(144))
    for i in range(len(categories)):
        subplots[i].set_title(f'Train for {categories[i]} ')
        subplots[i].imshow(train[categories[i]]["data"][j])

# Preprocessing train dataset

Calling the preprocess function for the training dataset.

In [None]:
train = preprocess(train)

In [None]:
fig = plt.figure(figsize=(20,20))
plt.gray()  # show the filtered result in grayscale
subplots=[]
subplots.append(fig.add_subplot(141))
subplots.append(fig.add_subplot(142))
subplots.append(fig.add_subplot(143))
subplots.append(fig.add_subplot(144))

for i in range(len(categories)):
    subplots[i].set_title(f'Train for {categories[i]} ')
    subplots[i].imshow(train[categories[i]]["data"][0])
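
Since every preprocessed image now has the same shape, the per-category lists can be stacked into a single array with integer labels. The sketch below uses synthetic data with the same layout as the output of `preprocess`; the helper `to_arrays` and the `demo` variables are hypothetical and not part of this notebook.

```python
import numpy as np

def to_arrays(data, categories):
    """Hypothetical helper: stack images into X and build integer labels y."""
    images, labels = [], []
    for label, category in enumerate(categories):
        for img in data[category]["data"]:
            images.append(img)
            labels.append(label)  # label = position of the category in the list
    # np.stack raises an error if the images do not all share one shape
    return np.stack(images), np.array(labels)

# Synthetic stand-in for preprocessed data (two categories, 64x64 images).
demo_categories = ["glioma_tumor", "no_tumor"]
demo = {
    "glioma_tumor": {"data": [np.zeros((64, 64)), np.ones((64, 64))]},
    "no_tumor": {"data": [np.full((64, 64), 0.5)]},
}
X, y = to_arrays(demo, demo_categories)
print(X.shape, y)
```

Stacking also works as a sanity check: if any image had kept its original size, `np.stack` would fail instead of producing a ragged result.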

# Doing the same for the testing

As with the training dataset, let's do the same for the testing dataset.

In [None]:
test = get_dataset_original("testing")

# Plotting some images
Let's see what some images from the test dataset look like.

In [None]:
for j in range(2):
    fig = plt.figure(figsize=(20,20))
    plt.gray()  # show the filtered result in grayscale
    subplots=[]
    subplots.append(fig.add_subplot(141))
    subplots.append(fig.add_subplot(142))
    subplots.append(fig.add_subplot(143))
    subplots.append(fig.add_subplot(144))
    for i in range(len(categories)):
        subplots[i].set_title(f'Test for {categories[i]} ')
        subplots[i].imshow(test[categories[i]]["data"][j])

# Preprocessing test dataset

Calling the preprocess function for the test dataset.

In [None]:
test = preprocess(test)

In [None]:
fig = plt.figure(figsize=(20,20))
plt.gray()  # show the filtered result in grayscale
subplots=[]
subplots.append(fig.add_subplot(141))
subplots.append(fig.add_subplot(142))
subplots.append(fig.add_subplot(143))
subplots.append(fig.add_subplot(144))

for i in range(len(categories)):
    subplots[i].set_title(f'Test for {categories[i]} ')
    subplots[i].imshow(test[categories[i]]["data"][0])

# Saving the preprocessed dataset

Saving the datasets for the next part of the tutorial.

In [None]:
save_dataset_preprocessed(train,"training")

In [None]:
save_dataset_preprocessed(test,"testing")
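
The saved `.npy` files can be read back with `np.load`. The following is a minimal sketch, not the loader used in the next part of the tutorial: the helper `load_category` is hypothetical, and the demo writes to a temporary folder rather than to `preprocessed/Training`.

```python
import glob
import os
import tempfile

import numpy as np

def load_category(cat_path):
    """Hypothetical helper: read every .npy file in one category folder."""
    files = sorted(glob.glob(os.path.join(cat_path, "*.npy")))
    return [np.load(f) for f in files]

# Self-contained demo: write three small arrays, then read them back.
with tempfile.TemporaryDirectory() as tmp:
    cat_path = os.path.join(tmp, "no_tumor")
    os.makedirs(cat_path, exist_ok=True)
    for i in range(3):
        np.save(os.path.join(cat_path, f"{i}.npy"), np.full((64, 64), float(i)))
    loaded = load_category(cat_path)

print(len(loaded), loaded[0].shape)
```

Note that `sorted` orders file names as text, so `10.npy` comes before `2.npy`; this matters only if the original image order needs to be preserved.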


# References

<a id="1">[1] </a> https://www.kaggle.com/sartajbhuvaji/brain-tumor-classification-mri

<a id="2">[2] </a>https://github.com/ANTsX/ANTs

<a id="3">[3] </a> https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4879034/

<a id="4">[4] </a> https://my-ms.org/mri_planes.htm

<a id="5">[5] </a> https://case.edu/med/neurology/NR/MRI%20Basics.htm

### Back to the index

Let's go back to the index to continue with the tutorial.
* [Index](index.ipynb)