In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

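import os

# Collect every file path under the input directory
# (named file_paths so the built-in `list` is not shadowed)
file_paths = []
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        file_paths.append(os.path.join(dirname, filename))

# Print only the first 5 paths; slicing is safe even if fewer than 5 files exist
for path in file_paths[:5]:
    print(path)

print("...")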

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

### Project Setup and Initial File Exploration

In this cell, we begin by importing essential libraries such as:
- `NumPy` for linear algebra operations.
- `Pandas` for data processing and file input/output.

The code then explores the directory structure using the `os` library to locate and list all input files available in the read-only `/kaggle/input/` directory. The first five file paths are printed as a preview to understand the dataset structure. This step is critical for understanding the available data before performing any operations on it.

Additionally, it reminds the user about Kaggle's working directory limits and temporary storage.
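
To go beyond a five-path preview, the same `os.walk` traversal can also count the files in each folder. The following is a minimal sketch; the helper name `count_files_per_folder` is hypothetical and not part of the original notebook.

```python
import os
from collections import Counter


def count_files_per_folder(root):
    """Return a Counter mapping each directory under `root` to its file count."""
    counts = Counter()
    for dirname, _, filenames in os.walk(root):
        if filenames:
            counts[dirname] = len(filenames)
    return counts


if __name__ == "__main__":
    # '/kaggle/input' is the path used in the cell above; for a missing
    # directory os.walk yields nothing, so the loop simply prints nothing.
    for folder, n_files in sorted(count_files_per_folder("/kaggle/input").items()):
        print(f"{n_files:6d}  {folder}")
```

Folders that contain only subfolders are left out, so the output lists just the directories that actually hold files.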

---

In [None]:
import keras
from keras.layers import Conv2D, Flatten, Dense, Dropout, MaxPooling2D
from keras.models import Sequential

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.utils import shuffle

from PIL import Image

import cv2
import tqdm
import io
import ipywidgets as widgets
import tensorflow as tf

### Importing Essential Libraries for Model Development

In this cell, various libraries are imported to facilitate the construction and evaluation of the neural network model:

- `Keras`: A high-level neural network API, used here for the convolutional (`Conv2D`), pooling (`MaxPooling2D`), `Flatten`, `Dense`, and `Dropout` layers.
- `Sequential`: The Keras model type for building a network as a linear stack of layers.
- `Scikit-learn`: The `train_test_split`, `accuracy_score`, and `shuffle` utilities for splitting, evaluating, and shuffling the data.
- `PIL (Python Imaging Library)` and `OpenCV (cv2)`: For image processing tasks.
- `tqdm`: To display progress bars in loops.
- `TensorFlow`: The backend library that powers Keras.

These imports are fundamental for building the convolutional neural network (CNN) model and for processing and splitting the image data.

---


In [None]:
image_res = 150

X_train = []
y_train = []

labels = ["glioma_tumor", "meningioma_tumor", "no_tumor", "pituitary_tumor"]
data_dir = "/kaggle/input/brain-tumor-classification-mri"

# Both the Training and Testing folders are pooled into one dataset here;
# the train/test split is made later with train_test_split.
for split in ["Training", "Testing"]:
    for label in labels:
        folder = os.path.join(data_dir, split, label)

        for file_name in os.listdir(folder):
            image = cv2.imread(os.path.join(folder, file_name))
            if image is None:
                # cv2.imread returns None for unreadable files; skip them
                continue
            image = cv2.resize(image, (image_res, image_res))
            X_train.append(image)
            y_train.append(label)

X_train = np.array(X_train)
y_train = np.array(y_train)

### Loading and Preprocessing the Image Data

This cell handles the loading, resizing, and organizing of MRI images into training data for the model:

- The image resolution is set to `150x150` (`image_res = 150`).
- Two lists, `X_train` (for images) and `y_train` (for labels), are initialized to store the processed data.
- The `labels` array contains the four types of tumors: `"glioma_tumor"`, `"meningioma_tumor"`, `"no_tumor"`, and `"pituitary_tumor"`.

For both the training and testing datasets, the code:
1. Loads images from the respective directories (`Training` and `Testing`) for each tumor category.
2. Uses OpenCV (`cv2.imread`) to read the images and resizes them to `150x150` pixels (`cv2.resize`).
3. Appends the resized images to `X_train` and their corresponding labels to `y_train`.

Finally, both `X_train` and `y_train` are converted to NumPy arrays for efficient manipulation in the model.

This step prepares the image data for further processing and model training.
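
Once the labels are in a NumPy array, a quick class count can confirm that every folder was read. The sketch below uses a small made-up label array in place of the real `y_train`, so the counts are illustrative only.

```python
import numpy as np

# Hypothetical stand-in for y_train; the real array holds one label per image.
y_demo = np.array(["glioma_tumor", "no_tumor", "glioma_tumor", "pituitary_tumor"])

# np.unique with return_counts=True gives each distinct label and its frequency.
classes, counts = np.unique(y_demo, return_counts=True)
class_counts = dict(zip(classes.tolist(), counts.tolist()))

for name, n in class_counts.items():
    print(f"{name}: {n}")
```

A label that is missing from the output, or a count far smaller than the others, would be worth checking before training.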

---


In [None]:
X_train, y_train = shuffle(X_train, y_train, random_state = 42)
X_train.shape

### Shuffling the Dataset and Checking the Shape

In this cell, `X_train` and `y_train` are shuffled together, so each image stays paired with its label while the order is randomized. Because the loading loop appends images one class folder at a time, the unshuffled arrays are grouped by class; shuffling breaks up that ordering before the data is split and used for training.

- The `shuffle` function from `sklearn.utils` is used, with a `random_state` of `42` to ensure reproducibility.
- The shape of `X_train` is displayed to verify the dimensions of the image data after shuffling. This step helps confirm that the dataset is prepared and organized correctly before feeding it into the model.
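
The idea of shuffling two arrays in unison can be sketched with plain NumPy. This illustrates the principle only, not how `sklearn.utils.shuffle` is implemented, and the permutation it produces will not match the one from `shuffle(..., random_state=42)`. The toy arrays are hypothetical.

```python
import numpy as np

# Hypothetical toy data: four "images" (here just numbers) and their labels.
X_demo = np.array([10, 20, 30, 40])
y_demo = np.array(["a", "b", "c", "d"])

# One permutation of the indices, applied to both arrays, keeps pairs aligned.
rng = np.random.default_rng(42)
order = rng.permutation(len(X_demo))
X_shuffled, y_shuffled = X_demo[order], y_demo[order]

# Every (image, label) pair from the original data is still present.
pairs_before = set(zip(X_demo.tolist(), y_demo.tolist()))
pairs_after = set(zip(X_shuffled.tolist(), y_shuffled.tolist()))
print(pairs_before == pairs_after)
```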

---


In [None]:
X_train, X_test, y_train, y_test = train_test_split(X_train, y_train, test_size=0.1,random_state = 42)

### Splitting the Dataset into Training and Testing Sets

Here, the `train_test_split` function from `sklearn.model_selection` is used to divide the shuffled data into training and testing subsets:
- `X_train` and `y_train` contain the majority of the data, while `X_test` and `y_test` hold a smaller held-out portion for testing.
- The `test_size=0.1` parameter means that 10% of the data is reserved for testing, ensuring that the model's performance can be evaluated on unseen data.
- The `random_state=42` ensures that the split is reproducible.

This step is essential for evaluating the generalization ability of the model by separating a portion of the data that the model won't see during training.
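
The sizes produced by a fractional `test_size` can be worked out by hand. The sketch below assumes the test count is rounded up and the remainder goes to training, which is how scikit-learn is understood to size the split; the helper `split_sizes` and the dataset size of 1000 are hypothetical.

```python
import math


def split_sizes(n_samples, test_size=0.1):
    """Return (n_train, n_test) for a fractional test_size.

    Assumption: the test count is rounded up and the remainder goes to
    training, which is how scikit-learn is understood to size the split.
    """
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test


# 1000 is a hypothetical dataset size, not the size of the MRI dataset.
n_train, n_test = split_sizes(1000, test_size=0.1)
print(n_train, n_test)
```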

---

In [None]:
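# Map each string label to its integer index in `labels`, then one-hot encode.
# num_classes is passed explicitly so both arrays have len(labels) columns
# even if a class happens to be absent from one split.
y_train = [labels.index(i) for i in y_train]
y_train = tf.keras.utils.to_categorical(y_train, num_classes=len(labels))

y_test = [labels.index(i) for i in y_test]
y_test = tf.keras.utils.to_categorical(y_test, num_classes=len(labels))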

### Encoding the Labels and One-Hot Encoding

This cell handles the conversion of the tumor type labels into numerical format and then applies one-hot encoding, which is crucial for categorical classification tasks:

1. **Label Conversion**: 
   - The list method call `labels.index(i)` converts each label in `y_train` and `y_test` into its integer position in `labels` (e.g., "glioma_tumor" becomes 0, "meningioma_tumor" becomes 1, etc.).
   - This step is required because machine learning models work with numerical data, not strings.

2. **One-Hot Encoding**:
   - After converting the labels into integers, `tf.keras.utils.to_categorical` is applied to transform these integers into one-hot encoded vectors.
   - One-hot encoding turns each label into a binary vector whose length equals the number of classes (here 4), with a single 1 at the class index. This is the target format expected by the `categorical_crossentropy` loss used when the model is compiled.

The result is that `y_train` and `y_test` are transformed into matrices where each row is a one-hot encoded representation of the original label.
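
The two steps can be reproduced with NumPy alone, which makes the resulting matrix easy to inspect. This is a sketch of the same idea, not a replacement for `tf.keras.utils.to_categorical`; the three sample labels are hypothetical.

```python
import numpy as np

labels = ["glioma_tumor", "meningioma_tumor", "no_tumor", "pituitary_tumor"]

# Hypothetical string labels standing in for y_train.
y_demo = ["no_tumor", "glioma_tumor", "pituitary_tumor"]

# Step 1: string label -> integer index in `labels`.
y_int = [labels.index(name) for name in y_demo]

# Step 2: integer index -> one-hot row, by picking rows of an identity matrix.
y_one_hot = np.eye(len(labels), dtype="float32")[y_int]

print(y_int)
print(y_one_hot)
```

Each row contains a single 1 in the column of its class, so the matrix has one row per image and one column per label.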

---

In [None]:
model = Sequential()

# 1st block
model.add(Conv2D(32, (3,3), activation='relu', input_shape=(150,150,3)))
model.add(Conv2D(32, (3,3), activation='relu'))
model.add(MaxPooling2D(2,2))
model.add(Dropout(0.2))

# 2nd block
model.add(Conv2D(64, (3,3), activation='relu'))
model.add(Conv2D(64, (3,3), activation='relu'))
model.add(MaxPooling2D(2,2))

# 3rd block
model.add(Conv2D(128, (3,3), activation='relu'))
model.add(Conv2D(128, (3,3), activation='relu'))
model.add(MaxPooling2D(2,2))

# 4th block
model.add(Conv2D(256, (3,3), activation='relu'))
model.add(MaxPooling2D(2,2))
model.add(Dropout(0.3))

# Fully connected layers
model.add(Flatten())
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))

# Output layer (for multi-class classification)
model.add(Dense(4, activation='softmax'))

### Building the Convolutional Neural Network (CNN) Model

In this cell, a deep Convolutional Neural Network (CNN) is constructed using the `Sequential` model from Keras. The architecture consists of several convolutional blocks followed by fully connected layers:

1. **1st Block**:
   - Two convolutional layers with 32 filters, each using a 3x3 kernel, and ReLU activation.
   - Followed by a Max Pooling layer to downsample the spatial dimensions and a Dropout layer (0.2) to prevent overfitting.

2. **2nd Block**:
   - Two convolutional layers with 64 filters, again with 3x3 kernels and ReLU activation.
   - Followed by a Max Pooling layer for further downsampling.

3. **3rd Block**:
   - Two convolutional layers with 128 filters and ReLU activation, followed by Max Pooling.

4. **4th Block**:
   - One convolutional layer with 256 filters and ReLU activation.
   - Followed by Max Pooling and a stronger Dropout (0.3) to further reduce overfitting.

5. **Fully Connected Layers**:
   - After flattening the feature maps, a dense layer with 512 units and ReLU activation is added.
   - A Dropout layer (0.5) is applied for regularization.
   - Another dense layer with 256 units and ReLU activation is added, followed by a Dropout (0.5).
   
6. **Output Layer**:
   - The final dense layer has 4 units (corresponding to the 4 tumor categories) and uses the `softmax` activation function, which is suited for multi-class classification.

This CNN architecture is designed to progressively extract high-level features from the images while incorporating dropout layers to mitigate overfitting. 
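
To see how the spatial size shrinks through the blocks, the output shape after each layer can be computed by hand. The sketch below assumes the Keras defaults that the model code does not override: stride 1 and `'valid'` padding for `Conv2D`, and a pooling stride equal to the pool size. The helper names are hypothetical, and Dropout layers are omitted because they do not change shapes.

```python
def conv_out(size, kernel=3):
    """Output size of a stride-1 convolution with 'valid' (no) padding."""
    return size - kernel + 1


def pool_out(size, pool=2):
    """Output size of non-overlapping max pooling with 'valid' padding."""
    return size // pool


# (layer type, number of filters) in the order used by the model above.
stack = [
    ("conv", 32), ("conv", 32), ("pool", None),
    ("conv", 64), ("conv", 64), ("pool", None),
    ("conv", 128), ("conv", 128), ("pool", None),
    ("conv", 256), ("pool", None),
]

size, channels = 150, 3
for kind, filters in stack:
    if kind == "conv":
        size, channels = conv_out(size), filters
    else:
        size = pool_out(size)
    print(f"{kind:4s} -> ({size}, {size}, {channels})")

flattened = size * size * channels
print("flattened features:", flattened)
```

Under these assumptions the feature map entering `Flatten` is 6x6 with 256 channels, which is the input width of the first dense layer.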

---

In [None]:
model.summary()

### Model Summary

The `model.summary()` function prints a detailed summary of the CNN model architecture, showing:
- The layer-by-layer breakdown, including the type of layers (Conv2D, MaxPooling2D, Dense, etc.).
- The number of parameters for each layer (trainable parameters such as weights and biases).
- The output shape after each layer, providing a clear understanding of how the input image is transformed as it passes through the network.
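
The parameter counts reported by `model.summary()` can be cross-checked by hand. The sketch below assumes 3x3 kernels with one bias per filter and the default `'valid'` padding, which leaves a 6x6x256 feature map before `Flatten`. The helper names are hypothetical.

```python
def conv2d_params(kernel, in_channels, filters):
    """Weights (kernel*kernel*in_channels per filter) plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * filters


def dense_params(in_units, out_units):
    """One weight per input-output pair plus one bias per output unit."""
    return (in_units + 1) * out_units


# Conv layers of the model above as (in_channels, filters), all 3x3 kernels.
conv_layers = [(3, 32), (32, 32), (32, 64), (64, 64),
               (64, 128), (128, 128), (128, 256)]
# Dense layers as (in_units, out_units). The 6*6*256 input assumes the default
# 'valid' padding, which shrinks the 150x150 input to 6x6 before Flatten.
dense_layers = [(6 * 6 * 256, 512), (512, 256), (256, 4)]

conv_counts = [conv2d_params(3, c_in, f) for c_in, f in conv_layers]
dense_counts = [dense_params(n_in, n_out) for n_in, n_out in dense_layers]
total_params = sum(conv_counts) + sum(dense_counts)

print("conv layers: ", conv_counts)
print("dense layers:", dense_counts)
print("total:", total_params)
```

Pooling, dropout, and flatten layers have no trainable parameters, so they do not appear in the lists. Most of the parameters sit in the first dense layer.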

---

In [None]:
model.compile(loss='categorical_crossentropy',optimizer='Adam',metrics=['accuracy'])

### Compiling the Model

In this cell, the model is compiled using the following parameters:

- **Loss Function**: 
  - `categorical_crossentropy` is chosen as the loss function, which is suitable for multi-class classification problems where the output is one-hot encoded. It measures the dissimilarity between the predicted and true label distributions.

- **Optimizer**: 
  - The `Adam` optimizer is used with its default settings. Adam adapts the learning rate for each parameter, which often leads to faster convergence.

- **Metrics**: 
  - The model will track `accuracy` as a performance metric during training and evaluation, providing a straightforward indication of the model's predictive performance on the dataset.

Compiling the model is a critical step before training, as it sets the optimization strategy and evaluation criteria.
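
For a one-hot target, categorical cross-entropy reduces to the negative log of the probability assigned to the true class. The sketch below computes it in plain Python for two made-up predictions; the function is an illustration of the formula, not the Keras implementation, and the `eps` value is an arbitrary choice.

```python
import math


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy between a one-hot target and a predicted distribution.

    eps guards against log(0); its value here is an illustrative choice.
    """
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


# Hypothetical example: the true class is index 3 ("pituitary_tumor").
y_true = [0.0, 0.0, 0.0, 1.0]
confident = [0.05, 0.05, 0.05, 0.85]
unsure = [0.25, 0.25, 0.25, 0.25]

loss_confident = categorical_crossentropy(y_true, confident)
loss_unsure = categorical_crossentropy(y_true, unsure)
print(loss_confident, loss_unsure)
```

The confident, correct prediction gives the smaller loss, which is the behavior the optimizer exploits during training.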

---

In [None]:
history = model.fit(X_train, y_train, epochs = 30, validation_split = 0.1)

### Training the Model

In this cell, the model is trained using the `fit` method with the following parameters:

- **Training Data**: 
  - `X_train` (input images) and `y_train` (one-hot encoded labels) are used for training the model.

- **Epochs**: 
  - The model will be trained for 30 epochs, meaning the entire training dataset will be passed through the model 30 times. This allows the model to learn from the data iteratively.

- **Validation Split**: 
  - A validation split of 0.1 reserves 10% of the training data for validation. Keras takes this fraction from the end of the supplied arrays, before any shuffling, and does not train on it. The model is evaluated on this subset at the end of each epoch, which helps monitor overfitting.

The `history` variable will store the training process details, including the loss and accuracy metrics for both training and validation sets over the epochs. This information is crucial for analyzing the model's performance and making necessary adjustments.
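
The effect of `validation_split` can be pictured as slicing off the tail of the training arrays. The sketch below assumes the cut point is `floor(n * (1 - validation_split))`; the exact rounding inside Keras may differ, and the helper name and the sample count of 900 are hypothetical.

```python
import math


def validation_slices(n_samples, validation_split=0.1):
    """Return (train_slice, val_slice) with validation taken from the end.

    Assumption: the cut point is floor(n_samples * (1 - validation_split)).
    The exact rounding inside Keras may differ; this only illustrates the idea.
    """
    split_at = math.floor(n_samples * (1.0 - validation_split))
    return slice(0, split_at), slice(split_at, n_samples)


# 900 is a hypothetical number of training images.
samples = list(range(900))
train_slice, val_slice = validation_slices(len(samples), 0.1)
print(len(samples[train_slice]), len(samples[val_slice]))
```

Because the validation samples come from the end of the arrays, shuffling the data beforehand (as done earlier in this notebook) keeps the validation subset from being dominated by one class.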

---

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
accuracy = history.history["accuracy"]
validation_accuracy = history.history["val_accuracy"]

epochs = range(len(accuracy))

fig = plt.figure(figsize = (14, 7))
plt.plot(epochs, accuracy, "r", label = "Accuracy of Training")
plt.plot(epochs, validation_accuracy, "b", label = "Accuracy of Validation")
plt.legend(loc = 'upper left')

plt.show()

### Visualizing Model Accuracy

In this cell, the training and validation accuracy of the model are visualized using a line plot. The following steps are performed:

- **Extracting Accuracy Data**:
  - The accuracy of the training set is obtained from `history.history["accuracy"]`.
  - The validation accuracy is obtained from `history.history["val_accuracy"]`.

- **Creating Epochs Range**:
  - A range of epochs is generated to correspond with the accuracy values.

- **Plotting**:
  - A figure is created with a size of 14x7 inches.
  - The training accuracy is plotted in red, while the validation accuracy is plotted in blue.
  - A legend is included to distinguish between the training and validation accuracy curves.

This visualization shows how well the model learned during training and how it performs on the held-out validation split. A widening gap between the two curves points to overfitting, while two low, flat curves point to underfitting; either observation can guide further model improvements.

---

In [None]:
loss = history.history["loss"]
validation_loss = history.history["val_loss"]

epochs = range(len(loss))

fig = plt.figure(figsize = (14, 7))
plt.plot(epochs, loss, "r", label = "Loss of Training")
plt.plot(epochs, validation_loss, "b", label = "Loss of Validation")
plt.legend(loc = 'upper left')

plt.show()

### Visualizing Model Loss

In this cell, the training and validation loss of the model are visualized using a line plot. The following steps are carried out:

- **Extracting Loss Data**:
  - The training loss is retrieved from `history.history["loss"]`.
  - The validation loss is retrieved from `history.history["val_loss"]`.

- **Creating Epochs Range**:
  - A range of epochs is generated to align with the loss values.

- **Plotting**:
  - A figure is created with a size of 14x7 inches.
  - The training loss is plotted in red, while the validation loss is plotted in blue.
  - A legend is included to differentiate between the training and validation loss curves.

This visualization is crucial for assessing the model's performance during training. A decreasing trend in both training and validation loss indicates that the model is learning effectively. If the validation loss begins to rise while the training loss continues to decrease, this could signal overfitting, suggesting that further adjustments may be needed.
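
The same check can be done numerically by locating the epoch with the lowest validation loss and testing whether the loss rises afterwards. The values below are made up for illustration; in the notebook they would come from `history.history["val_loss"]`.

```python
# Hypothetical per-epoch validation losses, made up for illustration;
# in the notebook they would come from history.history["val_loss"].
val_loss_demo = [1.20, 0.90, 0.70, 0.65, 0.72, 0.80]

# Epoch (0-based) with the lowest validation loss.
best_epoch = min(range(len(val_loss_demo)), key=val_loss_demo.__getitem__)

# True only if there are later epochs and the loss rises at every one of them.
tail = val_loss_demo[best_epoch:]
rising_after_best = len(tail) > 1 and all(
    later > earlier for earlier, later in zip(tail, tail[1:])
)

print("best epoch:", best_epoch)
print("validation loss rises afterwards:", rising_after_best)
```

A steady rise after the best epoch, while the training loss keeps falling, is the overfitting pattern described above.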

---

In [None]:
image = cv2.imread('/kaggle/input/brain-tumor-classification-mri/Training/pituitary_tumor/p (112).jpg')
image = cv2.resize(image, (150, 150))
image_array = np.array(image)
image_array.shape

### Loading and Preprocessing a Sample Image

In this cell, a sample image from the training dataset is loaded and preprocessed for evaluation. The following steps are performed:

- **Image Loading**:
  - The image is read from the specified file path using `cv2.imread()`.

- **Image Resizing**:
  - The image is resized to 150x150 pixels using `cv2.resize()`. This is necessary to ensure the image dimensions match the input shape expected by the model.

- **Converting to Array**:
  - `cv2.imread()` already returns a NumPy array, so `np.array()` here simply copies the resized image into `image_array`, the variable that is later passed to the model.

- **Displaying Shape**:
  - The shape of the resulting image array is displayed. It gives the height, width, and number of color channels (e.g., `(150, 150, 3)` for a 150x150 color image; OpenCV stores the three channels in BGR order, not RGB).
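
Since OpenCV stores channels in BGR order while libraries such as Matplotlib expect RGB, the channel order matters when an OpenCV image is displayed. For the model, what matters is that training and prediction use the same order, which they do here because both read images with `cv2.imread`. The sketch below shows the swap on a tiny hypothetical array.

```python
import numpy as np

# Hypothetical 1x2 "image" in OpenCV's BGR channel order:
# the first pixel is pure blue, the second is pure red.
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reversing the last axis swaps the B and R channels, giving RGB order.
rgb = bgr[..., ::-1]

print(bgr.shape, rgb.shape)
print(rgb[0, 0], rgb[0, 1])
```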

---

In [None]:
image_array = image_array.reshape(1, 150, 150, 3)
image_array.shape

### Reshaping the Image Array for Model Input

In this cell, the shape of the previously prepared image array is modified to make it suitable for input into the model. The following steps are performed:

- **Reshaping the Array**:
  - The image array, originally with the shape `(150, 150, 3)`, is reshaped using `reshape(1, 150, 150, 3)`. This adds a new dimension at the beginning, transforming the array into the shape `(1, 150, 150, 3)`.
  - The new shape indicates that the array now contains one image with dimensions 150x150 pixels and 3 color channels (in BGR order, since the image was read with OpenCV).

This reshaping step is essential because the model expects input data to be in batches, even if there is only a single image. It ensures compatibility with the model's input layer, allowing the model to process the image correctly for prediction.
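
An equivalent way to add the batch dimension is `np.expand_dims`, which does not require repeating the image size. The sketch below compares both on a hypothetical blank image.

```python
import numpy as np

# Hypothetical blank image with the same shape as the resized MRI scan.
single_image = np.zeros((150, 150, 3), dtype=np.uint8)

# Both calls add a leading batch axis of length 1.
batch_reshape = single_image.reshape(1, 150, 150, 3)
batch_expand = np.expand_dims(single_image, axis=0)

print(batch_reshape.shape, batch_expand.shape)
```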

---

In [None]:
from tensorflow.keras.preprocessing import image
img = image.load_img('/kaggle/input/brain-tumor-classification-mri/Training/glioma_tumor/gg (107).jpg')
plt.imshow(img,interpolation='nearest')
plt.show()

### Visualizing a Sample Image from the Dataset

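In this cell, a sample image from the training dataset is loaded and displayed. Note that this is a glioma image (`gg (107).jpg`), not the pituitary image (`p (112).jpg`) that was preprocessed into `image_array` in the earlier cells. The following steps are performed: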

- **Image Loading**:
  - The image is loaded using `image.load_img()` from TensorFlow's Keras preprocessing module. This function allows for loading images directly from file paths while automatically handling different formats.

- **Displaying the Image**:
  - The loaded image is displayed using `plt.imshow()`, with the `interpolation` parameter set to 'nearest' to control the rendering of the image.
  - `plt.show()` is called to render the image in the output cell.

This visualization step helps in understanding the type of data being used for model training. By inspecting sample images, one can gain insights into the quality and characteristics of the dataset, which is important for evaluating the model's potential performance.

---

In [None]:
pre = model.predict(image_array)
indices = pre.argmax()
indices

### Making Predictions with the Model

In this cell, the model is used to make predictions on the preprocessed image. The following steps are performed:

- **Model Prediction**:
  - The model's `predict()` method is called with the reshaped `image_array` as input. For a single image it returns an array of shape `(1, 4)` holding the softmax probability of each of the four classes.

- **Determining the Class Index**:
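  - The predictions are processed using `argmax()` to find the index of the class with the highest predicted probability. Because `argmax()` is called without an `axis`, it works on the flattened array; that is fine for a single image, but a batch of images would need `argmax(axis=1)`.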

- **Displaying the Class Index**:
  - The resulting index (`indices`) is displayed. This index can be mapped back to the corresponding tumor type using the predefined `labels` list.
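
Mapping the index back to a tumor type is a simple lookup in `labels`. The sketch below uses a made-up probability array in place of the real output of `model.predict`, so the predicted class and confidence are illustrative only.

```python
import numpy as np

labels = ["glioma_tumor", "meningioma_tumor", "no_tumor", "pituitary_tumor"]

# Hypothetical softmax output for one image, shaped (1, 4) like model.predict's
# result; the numbers are made up for illustration.
pre_demo = np.array([[0.05, 0.10, 0.15, 0.70]])

index = int(pre_demo.argmax(axis=1)[0])  # class with the highest probability
predicted_label = labels[index]
confidence = float(pre_demo[0, index])

print(predicted_label, confidence)
```

Reporting the probability next to the label makes it easier to spot predictions the model is unsure about.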

---

### Brain Tumor Classification Project Overview

This project aims to develop a machine learning model for the classification of brain tumor types using MRI images. The primary goal is to accurately predict tumor types, specifically glioma, meningioma, pituitary, and no tumor. Below is an overview of the key steps undertaken in this project:

1. **Library Imports**:
   - Essential libraries such as TensorFlow, Keras, OpenCV, and others are imported to facilitate data processing, model building, and image handling.

2. **Data Exploration**:
   - The dataset is explored by walking the input directory and printing the first few file paths, which gives a first look at how the data is organized.

3. **Data Preparation**:
   - MRI images from both the `Training` and `Testing` folders are read, resized to a uniform 150x150 pixels, and pooled into one dataset, with a label recorded for each image to prepare for supervised learning.
   - The pooled dataset is shuffled and then split 90/10 into training and testing sets.

4. **Label Encoding**:
   - The categorical labels are converted into a numerical format, and one-hot encoding is applied to prepare the labels for model training.

5. **Model Architecture**:
   - A Convolutional Neural Network (CNN) is built using Keras with multiple convolutional layers, pooling layers, and dropout layers to enhance feature extraction and reduce overfitting.

6. **Model Compilation and Training**:
   - The model is compiled with categorical cross-entropy loss and the Adam optimizer. It is then trained for 30 epochs with a portion of the data reserved for validation.

7. **Performance Evaluation**:
   - The training and validation accuracy and loss are plotted to visualize the model's performance and convergence behavior over the epochs. The held-out `X_test`/`y_test` split is not evaluated in the cells shown.

8. **Predictions**:
   - Finally, the model is used to predict the tumor type of a sample image taken from the dataset's `Training` folder.

This project illustrates the end-to-end process of building a deep learning model for image classification, focusing on medical applications in brain tumor diagnosis.

---