<a href="https://www.kaggle.com/code/emanafi/brain-tumor-classification?scriptVersionId=169812238" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

<a name='T'>

<p style="padding: 20px;
          background-color: #0000FF;
          font-family: computermodern;
          color: #EEECEC;
          font-size: 300%;
          text-align: center;
          border-radius: 20px 0px;
          "> CNN Deep Learning Network for Multi-Class Brain Tumor Classification. </p>

    
***
## <span style='border-left: 4px solid #0000FF; padding-left: 10px;'>  Table of Contents <b>

1. [`About The Data`](#data)
2. [`Import Data`](#import_data)
3. [`Data Visualization`](#vis)
4. [`Analysis of CNN Output`](#a_cnn)


Author: [Nesrine Wagaa]
    

***
<a name='data'>
  
# 1 <span style='color:blue'>|</span>  About The Data 

## <b> 1.1 <span style='border-left: 4px solid #0000FF; padding-left: 10px;'>  What is a Brain Tumor? <b>
A brain tumor refers to an abnormal collection or mass of cells within the brain. The skull, which encloses the brain, has limited space, and any growth within this confined area can lead to complications. Brain tumors can be either cancerous (malignant) or noncancerous (benign). As benign or malignant tumors grow, they can increase the pressure inside the skull. This elevated pressure can cause brain damage and pose a life-threatening risk.

### <b> 1.1.1 <span style='border-left: 4px solid #0000FF; padding-left: 10px;'>  The Importance of Brain Tumor Classification <b>
The early detection and classification of brain tumors are crucial areas of research in medical imaging. Accurate classification aids in selecting the most suitable treatment method, potentially saving patients' lives.

### <b> 1.1.2 <span style='border-left: 4px solid #0000FF; padding-left: 10px;'>  Methods <b>
The application of deep learning approaches in healthcare has yielded significant advancements in health diagnosis. According to the World Health Organization (WHO), effective brain tumor diagnosis involves detecting the tumor, identifying its location within the brain, and classifying it based on malignancy, grade, and type. This experimental work focuses on diagnosing brain tumors using Magnetic Resonance Imaging (MRI). The process entails tumor detection, classification by grade and type, and identification of the tumor's location. Instead of employing individual models for each classification task, this method utilizes a single model for classifying brain MRI images across different classification tasks. The classification and detection of tumors employ a Convolutional Neural Network (CNN)-based multi-task approach. Additionally, a CNN-based model is employed to segment the brain and identify the location of the tumor.


## <b> 1.2 <span style='border-left: 4px solid #0000FF; padding-left: 10px;'>  About the Dataset <b>
This dataset is compiled from three primary sources: figshare, Br35H, and a third source that was removed because of bad data.

### <b> 1.2.1 <span style='border-left: 4px solid #0000FF; padding-left: 10px;'>  Dataset Description <b>
The dataset comprises a total of `7023` human **brain MRI images**, categorized into four distinct classes. The dataset focuses on brain tumors and their classification. The four classes are as follows:

**Glioma**: Cancerous brain tumors in glial cells.

**Meningioma**: Tumors originating from the meninges, most of which are non-cancerous.

**No Tumor**: Normal brain scans without detectable tumors.

**Pituitary**: Tumors affecting the pituitary gland, which can be cancerous or non-cancerous.

Machine learning models for tumor classification have the potential to support medical research, improve diagnostic accuracy, and contribute to more effective treatment strategies, which in turn can improve outcomes for people affected by brain tumors.
    
The "No Tumor" class images were obtained from the `Br35H dataset`.

Note: The images in this dataset have varying sizes. After pre-processing and removing excess margins, you can resize the images to the desired dimensions.
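One possible way to remove excess margins is to crop each scan to the bounding box of its non-background pixels before resizing. The sketch below assumes a grayscale image stored as a NumPy array with a near-black background; the function name `crop_dark_margins` and the threshold of 10 are illustrative choices, not part of the dataset's own pipeline.

```python
import numpy as np

def crop_dark_margins(image, threshold=10):
    """Crop a 2-D grayscale array to the bounding box of pixels brighter than threshold."""
    mask = image > threshold
    if not mask.any():
        return image  # nothing brighter than the threshold; leave the image unchanged
    rows = np.where(mask.any(axis=1))[0]  # indices of rows that contain bright pixels
    cols = np.where(mask.any(axis=0))[0]  # indices of columns that contain bright pixels
    return image[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]

# Tiny synthetic example: a 6x6 black image with a bright 2x3 patch
demo = np.zeros((6, 6), dtype=np.uint8)
demo[2:4, 1:4] = 200
cropped = crop_dark_margins(demo)
print(cropped.shape)  # (2, 3)
```

Real MRI slices usually need a more careful threshold, so treat the value above as a starting point only.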

The data link and complete description are available here: [`Brain Tumor Data on Kaggle`](https://www.kaggle.com/datasets/masoudnickparvar/brain-tumor-mri-dataset)
***
    

In [None]:
import os
import pandas as pd
# Generate data paths with labels
train_data_dir = '/kaggle/input/brain-tumor-mri-dataset/Training' # This path points to the directory where your training data is stored.
filepaths = []
labels = []    # Collect all file paths and corresponding labels

folds = os.listdir(train_data_dir) # Get a list of directories (folds) within train_data_dir
#print(folds)

for fold in folds:
    foldpath = os.path.join(train_data_dir, fold)
    filelist = os.listdir(foldpath) # returns a list containing the names of the entries in the directory given by foldpath
    for file in filelist:
        fpath = os.path.join(foldpath, file)
        
        filepaths.append(fpath)
        labels.append(fold)

# Concatenate data paths with labels into one dataframe
Fseries = pd.Series(filepaths, name= 'filepaths')
Lseries = pd.Series(labels, name='labels')

train_df = pd.concat([Fseries, Lseries], axis= 1)

In [None]:
# Generate Data paths with labels (TEST)

test_data_dir = '/kaggle/input/brain-tumor-mri-dataset/Testing'
filepaths = []
labels = []

folds = os.listdir(test_data_dir)
for fold in folds:
    foldpath = os.path.join(test_data_dir, fold)
    filelist = os.listdir(foldpath)
    for file in filelist:
        fpath = os.path.join(foldpath, file)
        
        filepaths.append(fpath)
        labels.append(fold)
        
FSeries = pd.Series(filepaths, name='filepaths')
LSeries = pd.Series(labels, name='labels')

ts_df = pd.concat([FSeries, LSeries], axis=1)
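The training and testing cells above repeat the same directory walk. As a minimal sketch, the shared logic could be factored into one function, assuming each class has its own sub-folder. The helper name `collect_image_paths` is hypothetical and does not appear in the original notebook.

```python
import os

def collect_image_paths(data_dir):
    """Return (filepaths, labels) for a directory with one sub-folder per class."""
    filepaths, labels = [], []
    # sorted() makes the order reproducible; os.listdir returns entries in arbitrary order
    for fold in sorted(os.listdir(data_dir)):
        foldpath = os.path.join(data_dir, fold)
        if not os.path.isdir(foldpath):
            continue  # skip stray files at the top level
        for file in sorted(os.listdir(foldpath)):
            filepaths.append(os.path.join(foldpath, file))
            labels.append(fold)
    return filepaths, labels
```

The two returned lists can then be passed to `pd.DataFrame({'filepaths': filepaths, 'labels': labels})` to obtain the same two-column layout as `train_df` and `ts_df`.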

In [None]:
print(train_df)

In [None]:
print(ts_df)

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

fig, ax = plt.subplots(figsize=(9, 3)) # The figsize parameter specifies the width and height of the figure in inches.
fig.patch.set_facecolor("#f6f5f7") # Setting the face color of the figure
ax.set_facecolor("#f6f5f7")

# data=train_df is the DataFrame to plot from; y="labels" is the categorical column, drawn on the y-axis.
x = sns.countplot(data=train_df, y="labels")

plt.title("\nThe Count of Images in Each Training Folder\n")
plt.show()

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

fig, ax = plt.subplots(figsize=(9, 3)) # The figsize parameter specifies the width and height of the figure in inches.
fig.patch.set_facecolor("#f6f5f7") # Setting the face color of the figure
ax.set_facecolor("#f6f5f7")

# data=ts_df is the DataFrame to plot from; y="labels" is the categorical column, drawn on the y-axis.
x = sns.countplot(data=ts_df, y="labels")

plt.title("\nThe Count of Images in Each Testing Folder\n")
plt.show()

In [None]:
from sklearn.model_selection import train_test_split
valid_df,tst_df=train_test_split(ts_df,test_size=0.5,random_state=50,stratify=ts_df["labels"])
print(f"ts_df shape: {ts_df.shape}")
print("---"*10)
print(f"valid data shape: {valid_df.shape}")
print(f"test data shape: {tst_df.shape}")
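To make the effect of `stratify=` concrete: a stratified split divides each class separately, so both halves keep the original class proportions. The following standard-library sketch illustrates the idea only; it is not scikit-learn's implementation, and the name `stratified_halves` and the demo labels are hypothetical.

```python
import random
from collections import Counter, defaultdict

def stratified_halves(labels, seed=50):
    """Split indices into two halves while keeping class proportions."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    first, second = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)          # shuffle within the class only
        half = len(indices) // 2
        first.extend(indices[:half])  # one half of this class goes to each side
        second.extend(indices[half:])
    return first, second

# Hypothetical labels: 6 of one class, 4 of another
demo_labels = ["glioma"] * 6 + ["notumor"] * 4
a, b = stratified_halves(demo_labels)
print(Counter(demo_labels[i] for i in a))  # 3 glioma, 2 notumor
print(Counter(demo_labels[i] for i in b))  # 3 glioma, 2 notumor
```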

**Create image data generator:**

In [None]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator # This class is commonly used for image data augmentation and preprocessing when working with image datasets in deep learning tasks.
 
img_size=(224,224) # Specifying the size to which you want to resize your images

# Creating ImageDataGenerator instances for the training data and for the validation/testing data
tr=ImageDataGenerator()
ts=ImageDataGenerator()

# x_col holds the file paths and y_col the labels; target_size is the size images are resized to.
# batch_size is the number of samples per batch; shuffle=True reshuffles the data after each epoch.
train_gen=tr.flow_from_dataframe(train_df,x_col="filepaths",y_col="labels",
                                 target_size=img_size,
                                 batch_size=16,shuffle=True,
                                 class_mode='categorical',color_mode="rgb")

# Validation generator: built from valid_df, the validation half of the split above
valid_gen=ts.flow_from_dataframe(valid_df,x_col='filepaths',y_col='labels',
                                 target_size=img_size,
                                 class_mode="categorical",color_mode="rgb",
                                 shuffle=True,batch_size=16,)

# Test generator: built from tst_df; shuffle=False keeps predictions aligned with test_gen.classes
test_gen=ts.flow_from_dataframe(tst_df, x_col='filepaths', y_col='labels',
                                 target_size=img_size,
                                 batch_size=16, shuffle=False,
                                 color_mode="rgb", class_mode="categorical")

**Show sample from train data**


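The cell below divides each image by 255 so that the pixel values lie between 0 and 1, the range `plt.imshow` expects for floating-point RGB data. Scaling pixels to [0, 1] is also a common preprocessing step in deep learning, but here it is applied only for display: the generators above are created without a `rescale` argument, so the model itself receives raw values in the 0-255 range.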

In [None]:
import numpy as np

gen_dict = train_gen.class_indices
classes = list(gen_dict.keys())
images , labels = next(train_gen)

plt.figure(figsize= (20,20))

for i in range(16):
    plt.subplot(4,4,i+1)
    image = images[i] / 255
    plt.imshow(image)
    index = np.argmax(labels[i])
    class_name = classes[index]
    plt.title(class_name )
    plt.axis('off')
    
plt.show()

**Building Deep Learning Model**

The model in this notebook is a custom `Sequential` CNN built from stacked 3x3 convolutions and trained from scratch, so it uses none of the parameters below. They belong to the pre-trained Keras application models (for example `Xception`) and are listed here as background for transfer learning:

`include_top=False`: This parameter specifies whether to include the fully connected layers at the top of the network. Setting it to False excludes these layers, which is common when you want to use a pre-trained model such as Xception as a feature extractor and add your own custom fully connected layers on top.

`weights='imagenet'`: This parameter specifies the weights to be loaded into the model. Setting it to 'imagenet' initializes the model with weights pre-trained on the ImageNet dataset, which is typical when you want to reuse the knowledge learned on ImageNet for transfer learning.

`input_shape=img_shape`: This parameter specifies the shape of the input images that the model expects. img_shape should be a tuple representing the dimensions of the input images (height, width, channels).

`pooling='max'`: This parameter specifies the pooling applied to the output of the last convolutional block when `include_top=False`. With 'max', global max pooling is used, which takes the maximum value from each feature map.
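To make the `pooling='max'` description concrete: global max pooling reduces a feature tensor of shape (height, width, channels) to a single value per channel. The NumPy sketch below uses made-up numbers and is only an illustration of the operation, not Keras code.

```python
import numpy as np

# Hypothetical feature maps: height 2, width 2, 3 channels
features = np.array([
    [[1.0, 5.0, 0.0], [2.0, 1.0, 7.0]],
    [[4.0, 3.0, 2.0], [0.0, 9.0, 1.0]],
])

# Global max pooling: take the maximum over the two spatial axes (0 and 1)
pooled = features.max(axis=(0, 1))
print(pooled)  # [4. 9. 7.]
```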

In [None]:
from tensorflow.keras.layers import Conv2D,MaxPooling2D, Flatten
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adamax



img_shape = (224,224, 3)

model = Sequential([
    Conv2D(filters=64, kernel_size=(3,3), padding="same", activation="relu", input_shape= img_shape),
    Conv2D(filters=64, kernel_size=(3,3), padding="same", activation="relu"),
    MaxPooling2D((2, 2)),
    
    Conv2D(filters=128, kernel_size=(3,3), padding="same", activation="relu"),
    Conv2D(filters=128, kernel_size=(3,3), padding="same", activation="relu"),
    MaxPooling2D((2, 2)),
    
    Conv2D(filters=256, kernel_size=(3,3), padding="same", activation="relu"),
    Conv2D(filters=256, kernel_size=(3,3), padding="same", activation="relu"),
    Conv2D(filters=256, kernel_size=(3,3), padding="same", activation="relu"),
    MaxPooling2D((2, 2)),
    
    Conv2D(filters=256, kernel_size=(3,3), padding="same", activation="relu"),
    Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"),
    Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"),
    Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"),
    MaxPooling2D((2, 2)),
    
    Flatten(),
    Dense(128,activation = "relu"),
    Dense(64,activation = "relu"),
    Dense(4, activation = "softmax")
])

model.compile(Adamax(learning_rate= 0.001), loss= 'categorical_crossentropy', metrics= ['accuracy'])

model.summary()
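The shapes reported by `model.summary()` can be checked by hand. Convolutions with `padding="same"` keep the spatial size, and each of the four `MaxPooling2D((2, 2))` layers halves it. The short calculation below follows the layer list above and is only an arithmetic check.

```python
size = 224                 # input height and width
for _ in range(4):         # four MaxPooling2D((2, 2)) layers
    size //= 2             # 'same'-padded convolutions leave the size unchanged

flat_units = size * size * 512          # the last convolution has 512 filters
dense_params = flat_units * 128 + 128   # weights plus biases of Dense(128)
print(size, flat_units, dense_params)   # 14 100352 12845184
```

The first `Dense` layer therefore holds roughly 12.8 million parameters, which is where most of the model's weights sit.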

In [None]:
# Validate on the validation generator; fit() ignores its shuffle argument for generator input, so it is omitted
history=model.fit(train_gen,epochs=10,
                  validation_data=valid_gen)

In [None]:
# Plot training and validation accuracy and loss per epoch
fig = plt.figure()
plt.subplot(2,1,1)
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='lower right')


plt.subplot(2,1,2)
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper right')

plt.tight_layout()  # keep the two subplots from overlapping
plt.show()

In [None]:
train_score=model.evaluate(train_gen)
valid_score=model.evaluate(valid_gen)
test_score=model.evaluate(test_gen)

print(f"Train Loss : {train_score[0]:.3f}")
print(f"Train Accuracy : {train_score[1]*100:.2f}%")
print("-"*20)
print(f"Validation Loss : {valid_score[0]:.3f}")
print(f"Validation Accuracy : {valid_score[1]*100:.2f}%")
print("-"*20)
print(f"Test Loss: {test_score[0]:.3f}")
print(f"Test Accuracy: {test_score[1]*100:.2f}%")

In [None]:
preds=model.predict(test_gen)
y_pred=np.argmax(preds,axis=1)

In [None]:
from sklearn.metrics import confusion_matrix,classification_report
plt.style.use('default')  # set the style before creating the figure
plt.figure(figsize=(10,5))
cm=confusion_matrix(test_gen.classes,y_pred)
labels = list(test_gen.class_indices.keys())
sns.heatmap(cm,annot=True,fmt="d",xticklabels=labels,yticklabels=labels,cmap="Blues", linewidths=.5)
plt.xlabel('\nPredicted Label',fontsize=13)
plt.ylabel('Actual Label\n',fontsize=13)
plt.show()

# Per-class precision, recall and F1 score
print(classification_report(test_gen.classes,y_pred,target_names=labels))
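The per-class precision and recall that scikit-learn's `classification_report` reports can be derived directly from a confusion matrix like the one plotted above. The sketch below uses a small hypothetical 3-class matrix, not results from this model, with rows as actual labels and columns as predicted labels.

```python
import numpy as np

# Hypothetical 3-class confusion matrix (rows: actual, columns: predicted)
cm_demo = np.array([
    [8, 1, 1],
    [2, 6, 2],
    [0, 0, 10],
])

true_pos = np.diag(cm_demo)                 # correctly classified samples per class
precision = true_pos / cm_demo.sum(axis=0)  # column sums: everything predicted as the class
recall = true_pos / cm_demo.sum(axis=1)     # row sums: everything that truly is the class
accuracy = true_pos.sum() / cm_demo.sum()

print(np.round(precision, 3))
print(np.round(recall, 3))
print(round(float(accuracy), 3))
```

Reading the matrix this way shows which classes are confused with each other, something a single accuracy number hides.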