# Brain MRI Images for Brain Tumor Detection
## Image Augmentation

The Brain MRI Images for Brain Tumor Detection dataset contains two types of data: tumorous and non-tumorous. The creator of
the dataset did not specify which types of tumors are present in the tumorous data, so we treat all tumorous images as a
single class.

The data is divided into two folders on download: a "yes" folder containing the tumorous data and a "no" folder containing the non-tumorous data.
In total there are 253 brain MRI images, a small number by deep learning standards. Of these, 155 samples are tumorous
and 98 are non-tumorous.
Because of this low sample count, it is imperative that we use some kind of image augmentation to a) provide more samples and b)
provide more variety in the population to help prevent overfitting. We will also correct for the class imbalance of the
dataset during the augmentation phase.
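
As a quick sanity check of the class counts above, a minimal sketch like the following could tally the files in each folder. The `count_images` helper is hypothetical (it is not part of the original notebook), and the `./data/raw/` location assumes the folder layout used in the augmentation cells of this notebook.

```python
import glob
import os


def count_images(folder):
    """Count regular files directly inside `folder` (hypothetical helper)."""
    return sum(os.path.isfile(p) for p in glob.glob(os.path.join(folder, "*")))


# Assumed layout: the downloaded "yes" and "no" folders live under ./data/raw/
for label in ("yes", "no"):
    path = os.path.join("./data/raw", label)
    print(label, count_images(path))
```

If the download matches the description above, this should report 155 files for `yes` and 98 for `no`. A folder that does not exist simply counts as 0.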

You can find the original data [here](https://www.kaggle.com/navoneel/brain-mri-images-for-brain-tumor-detection).

In [7]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import glob
import cv2

%matplotlib inline

In [16]:
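import os

def augment_data(file_loc, n_samples, save_loc):
    data_gen = ImageDataGenerator(rotation_range=45,
                                  width_shift_range=0.15,
                                  height_shift_range=0.15,
                                  shear_range=0.15,
                                  brightness_range=(0.3, 1.0),
                                  horizontal_flip=True,
                                  vertical_flip=True,
                                  fill_mode="nearest")

    # flow() does not create save_to_dir, so make sure it exists
    os.makedirs(save_loc, exist_ok=True)

    for file in glob.glob(file_loc + "*"):
        image = cv2.imread(file)
        if image is None:
            # skip files OpenCV cannot decode
            continue
        # add a leading batch dimension: (1, height, width, channels)
        image = image.reshape((1,) + image.shape)

        # prefix outputs with the source file name so that images generated
        # from different inputs cannot overwrite each other
        prefix = os.path.splitext(os.path.basename(file))[0]

        # flow() returns a lazy iterator; an augmented image is only generated
        # and written to save_loc when a batch is drawn from it
        flow = data_gen.flow(x=image, batch_size=1, save_to_dir=save_loc,
                             save_prefix=prefix, save_format="jpg")
        for i in range(n_samples):
            next(flow)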

To correct for the class imbalance described above, we will generate 9 new images for every image that belongs to the
negative (non-tumorous) class and 6 images for every image that belongs to the positive (tumorous) class.
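
To see why these multipliers roughly balance the classes, the arithmetic can be sketched as follows. It uses the class counts stated earlier (155 tumorous, 98 non-tumorous) and assumes that only the augmented images are counted. The variable names are illustrative.

```python
# Class counts stated earlier in this notebook
n_yes, n_no = 155, 98
# Augmented images generated per original image
aug_per_yes, aug_per_no = 6, 9

new_yes = n_yes * aug_per_yes  # 930 augmented tumorous images
new_no = n_no * aug_per_no     # 882 augmented non-tumorous images

print(f"tumorous:     {new_yes} augmented images")
print(f"non-tumorous: {new_no} augmented images")
print(f"yes/no ratio before: {n_yes / n_no:.2f}, after: {new_yes / new_no:.2f}")
```

The ratio of tumorous to non-tumorous samples drops from about 1.58 to about 1.05, so the augmented set is close to balanced without being exactly even.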

In [17]:
yes_path = "./data/raw/yes/"
no_path = "./data/raw/no/"

aug_data_path = "./data/augmented/"

# augment tumorous examples
augment_data(file_loc=yes_path, n_samples=6, save_loc=aug_data_path + "yes/")
# augment non-tumorous examples
augment_data(file_loc=no_path, n_samples=9, save_loc=aug_data_path + "no/")

Our data is now prepared for training our deep CNN.