# **Modeling and Evaluation**

## Objectives

* Build and train a custom convolutional neural network (CNN) from scratch for tumor detection in CT scans.
* Tune hyperparameters to optimize model performance.
* Evaluate the model using accuracy, recall, and inference time metrics.
* Generate model predictions and confidence scores for downstream visualization.
* Prepare the model and outputs for integration with the Streamlit dashboard.
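The inference-time objective can be measured by timing repeated single-sample predictions with a monotonic clock. Below is a minimal stdlib sketch. It assumes a hypothetical `predict_fn` callable that stands in for a single-image prediction, for example a wrapper around `model.predict` on one preprocessed image. The 1.5 s/sample budget comes from this notebook's Additional Comments.

```python
import statistics
import time


def time_inference(predict_fn, samples, warmup=1):
    """Return per-sample latencies in seconds for predict_fn over samples."""
    # Untimed warm-up calls: the first prediction often carries one-off
    # setup cost that would skew the per-sample figure.
    for sample in samples[:warmup]:
        predict_fn(sample)

    latencies = []
    for sample in samples:
        start = time.perf_counter()  # monotonic, high-resolution clock
        predict_fn(sample)
        latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies, budget_s=1.5):
    """Mean/max latency and whether every sample met the per-sample budget."""
    return {
        "mean_s": statistics.mean(latencies),
        "max_s": max(latencies),
        "within_budget": max(latencies) < budget_s,
    }


# Hypothetical stand-in for a single-image model prediction.
def dummy_predict(image):
    time.sleep(0.01)  # pretend the model takes about 10 ms
    return 0.5


print(summarize(time_inference(dummy_predict, [None] * 3)))
```

Timing single-sample calls matches a per-sample budget. Batched prediction would report a lower amortized time per image, which is not what a real-time dashboard request sees.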

## Inputs

* Preprocessed and augmented image data and metadata from the DataCollection notebook.
* Train/validation/test splits.
* Any configuration files or parameters for model training.

## Outputs

* Trained custom CNN model, saved in a suitable format (e.g., the native .keras format or legacy .h5; under Keras 3, a SavedModel .pb export requires model.export()).
* Evaluation metrics (accuracy, recall, inference time) and confusion matrix.
* Model predictions and confidence scores for each sample.
* Artifacts for dashboard integration (e.g., prediction results, model files).
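The data generators in this notebook use `class_mode='binary'`. Assuming the CNN ends in a single sigmoid unit, each prediction is a tumor probability. The minimal stdlib sketch below turns such probabilities into labels, per-sample confidence scores, and the accuracy, recall and confusion-matrix outputs listed above. Two choices in it are assumptions and are not fixed by this notebook: the 0.5 decision threshold, and defining confidence as the probability of the predicted class. In the notebook itself, the already-imported `sklearn.metrics` functions `accuracy_score`, `recall_score` and `confusion_matrix` compute the same metrics.

```python
def predict_with_confidence(probs, threshold=0.5):
    """Map tumor probabilities to (label, confidence) pairs.

    Assumes probs are sigmoid outputs P(tumor), with label 1 = 'tumor' and
    0 = 'notumor'. Confidence is the probability of the predicted class.
    """
    results = []
    for p in probs:
        label = 1 if p >= threshold else 0
        confidence = p if label == 1 else 1.0 - p
        results.append((label, confidence))
    return results


def binary_metrics(y_true, y_pred):
    """Accuracy, recall (sensitivity for class 1) and a 2x2 confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Same layout as sklearn's confusion_matrix: rows = true, cols = predicted
    matrix = [[tn, fp], [fn, tp]]
    return accuracy, recall, matrix


probs = [0.92, 0.15, 0.60, 0.40]  # illustrative sigmoid outputs
y_true = [1, 0, 0, 1]             # illustrative ground-truth labels
preds = predict_with_confidence(probs)
y_pred = [label for label, _ in preds]
print(preds)
print(binary_metrics(y_true, y_pred))
```

For tumor detection, recall on the tumor class is the metric most sensitive to missed tumors. This is why it is reported alongside accuracy rather than replaced by it.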

## Additional Comments

* The model should be compact enough for real-time inference (<1.5 sec/sample).
* Early stopping and validation loss monitoring should be used to prevent overfitting.
* All outputs will be used in the DataVisualization notebook and Streamlit dashboard.
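In Keras, early stopping on validation loss is typically configured with `keras.callbacks.EarlyStopping(monitor='val_loss', patience=..., restore_best_weights=True)` and passed to `model.fit(..., callbacks=[...])`. The stdlib sketch below only illustrates the patience rule behind that callback, using a made-up list of per-epoch validation losses. The loss values and `patience=3` are illustrative. Keras's own callback also offers further options such as `min_delta` and `baseline`.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a patience-based stopping rule.

    Training stops once val_loss has failed to improve by more than
    min_delta for `patience` consecutive epochs. Epochs are 0-indexed;
    if the rule never triggers, stop_epoch is the last epoch.
    """
    best_loss = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # Improvement: remember this epoch and reset the patience counter
            best_loss = loss
            best_epoch = epoch
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


# Illustrative validation losses: improvement stalls after epoch 3.
losses = [0.69, 0.52, 0.41, 0.38, 0.39, 0.40, 0.42, 0.37]
stop, best = early_stopping_epoch(losses, patience=3)
print(f"stop after epoch {stop}, restore weights from epoch {best}")
```

The example also shows the trade-off of a small patience: the later improvement at the last epoch is never reached.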

---

# Change working directory

* We are assuming you will store the notebooks in a subfolder; therefore, when running the notebook in the editor, you will need to change the working directory.

We need to change the working directory from its current folder to its parent folder.
* We access the current directory with os.getcwd().

In [1]:
import os
current_dir = os.getcwd()
current_dir

'/workspaces/brain-tumor-classification/jupyter_notebooks'

We want to make the parent of the current directory the new current directory.
* os.path.dirname() gets the parent directory
* os.chdir() sets the new current directory

In [2]:
os.chdir(os.path.dirname(current_dir))
print("You set a new current directory")

You set a new current directory


Confirm the new current directory

In [3]:
current_dir = os.getcwd()
current_dir

'/workspaces/brain-tumor-classification'
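Re-running the `os.chdir(os.path.dirname(current_dir))` cell would move up one more directory level each time. Below is a hedged sketch of a guard that only changes directory while the notebook is still inside its `jupyter_notebooks` subfolder. The helper name `move_to_project_root` is hypothetical, and the folder name is taken from the path printed above. The demonstration runs in a temporary directory tree instead of the real project.

```python
import os
import tempfile


def move_to_project_root(notebook_folder="jupyter_notebooks"):
    """chdir to the parent only while still inside notebook_folder.

    Safe to call repeatedly: once at the project root it does nothing.
    Returns the resulting working directory.
    """
    current = os.getcwd()
    if os.path.basename(current) == notebook_folder:
        os.chdir(os.path.dirname(current))
    return os.getcwd()


original = os.getcwd()
with tempfile.TemporaryDirectory() as root:
    nb_dir = os.path.join(root, "jupyter_notebooks")
    os.makedirs(nb_dir)
    os.chdir(nb_dir)
    try:
        first = move_to_project_root()
        second = move_to_project_root()  # second call is a no-op
        print(first == second)
    finally:
        os.chdir(original)  # leave the temp tree before it is deleted
```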

**Environment Setup, Data Loading and Preparation**

Core libraries

In [5]:
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf
from tensorflow import keras
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, recall_score, confusion_matrix
import time
from tqdm import tqdm
import random
import warnings

SEED = 42
np.random.seed(SEED)
tf.random.set_seed(SEED)
random.seed(SEED)

warnings.filterwarnings('ignore')

print(f"TensorFlow version: {tf.__version__}")
print(f"Keras version: {keras.__version__}")
print(f"Numpy version: {np.__version__}")
print(f"Pandas version: {pd.__version__}")


TensorFlow version: 2.19.0
Keras version: 3.10.0
Numpy version: 1.26.1
Pandas version: 2.1.1


Data Loading

In [6]:
train_dir = "inputs/brain_tumor_dataset/images/train"
val_dir = "inputs/brain_tumor_dataset/images/val"
test_dir = "inputs/brain_tumor_dataset/images/test"

Use Keras utilities to load images and labels directly from directories

In [7]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

IMG_SIZE = (224, 224)
BATCH_SIZE = 32

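# Rescale pixel values to [0, 1] only; no augmentation is added here because
# the input data was already augmented in the DataCollection notebook
train_datagen = ImageDataGenerator(rescale=1./255)
val_test_datagen = ImageDataGenerator(rescale=1./255)

train_gen = train_datagen.flow_from_directory(
    train_dir,
    target_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
    class_mode='binary',
    seed=SEED  # reproducible shuffling order
)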

val_gen = val_test_datagen.flow_from_directory(
    val_dir,
    target_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
    class_mode='binary',
    shuffle=False
)

test_gen = val_test_datagen.flow_from_directory(
    test_dir,
    target_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
    class_mode='binary',
    shuffle=False
)

Found 7030 images belonging to 2 classes.
Found 1054 images belonging to 2 classes.
Found 1054 images belonging to 2 classes.
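`flow_from_directory` infers the classes from the subdirectory names of each split. When `classes` is not given, it assigns label indices in alphanumeric order, which is why `notumor` maps to 0 and `tumor` to 1 in the class indices printed further down. The stdlib sketch below reproduces that mapping against a temporary folder tree that mimics an assumed `train/notumor` and `train/tumor` layout. The helper `infer_class_indices` is hypothetical and is not part of Keras.

```python
import os
import tempfile


def infer_class_indices(split_dir):
    """Mimic Keras' default: one class per subdirectory, sorted alphanumerically."""
    class_names = sorted(
        entry for entry in os.listdir(split_dir)
        if os.path.isdir(os.path.join(split_dir, entry))
    )
    return {name: index for index, name in enumerate(class_names)}


with tempfile.TemporaryDirectory() as tmp_split:
    # Hypothetical stand-in for inputs/brain_tumor_dataset/images/train
    for name in ("tumor", "notumor"):
        os.makedirs(os.path.join(tmp_split, name))
    indices = infer_class_indices(tmp_split)
    print(indices)  # {'notumor': 0, 'tumor': 1}
```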


Check the number of samples in each split

In [8]:
print(f"Train samples: {train_gen.samples}")
print(f"Validation samples: {val_gen.samples}")
print(f"Test samples: {test_gen.samples}")
print(f"Class indices: {train_gen.class_indices}")

Train samples: 7030
Validation samples: 1054
Test samples: 1054
Class indices: {'notumor': 0, 'tumor': 1}
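`len(train_gen)`, which the class-balance loop further down uses, is the number of batches per epoch. It equals the sample count divided by `BATCH_SIZE`, rounded up, and the final batch is smaller when the count is not a multiple of the batch size. The sketch below works this through for the counts above. The helper name `batch_layout` is illustrative.

```python
import math


def batch_layout(n_samples, batch_size):
    """Return (number of batches, size of the final batch)."""
    n_batches = math.ceil(n_samples / batch_size)
    # The final batch holds whatever is left after the full batches.
    last_batch = n_samples - (n_batches - 1) * batch_size
    return n_batches, last_batch


for split, n in {"train": 7030, "validation": 1054, "test": 1054}.items():
    n_batches, last = batch_layout(n, 32)
    print(f"{split}: {n_batches} batches, last batch has {last} images")
```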


Confirm image shape and normalization from the generator

In [9]:
batch_x, batch_y = next(train_gen)
print("Batch image shape:", batch_x.shape) 
print("Pixel value range: min =", batch_x.min(), ", max =", batch_x.max())  

Batch image shape: (32, 224, 224, 3)
Pixel value range: min = 0.0 , max = 1.0
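The [0, 1] range follows from `rescale=1./255`: `ImageDataGenerator` multiplies every pixel by the rescale factor, so 8-bit intensities in [0, 255] land in [0, 1]. Below is a tiny stdlib illustration on a few example intensities.

```python
RESCALE = 1.0 / 255  # same factor as ImageDataGenerator(rescale=1./255)

pixels = [0, 64, 128, 255]  # example 8-bit intensities
scaled = [p * RESCALE for p in pixels]
print([round(v, 4) for v in scaled])
```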


Check class balance in the training set

In [10]:
# train_gen.classes already holds every training label, so there is
# no need to load all image batches just to count classes
counts = np.bincount(train_gen.classes, minlength=len(train_gen.class_indices))
class_balance = {name: int(counts[idx]) for name, idx in train_gen.class_indices.items()}
print("Class balance in training set:", class_balance)

Class balance in training set: {'notumor': 3515, 'tumor': 3515}
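The training split is perfectly balanced, so no class weighting is needed. If a future data version were imbalanced, one common remedy is to pass `class_weight` to `model.fit`, using the "balanced" heuristic `n_samples / (n_classes * count_c)`. This is the formula behind scikit-learn's `compute_class_weight` with `class_weight='balanced'`. Below is a stdlib sketch; the imbalanced counts in the second call are invented for illustration.

```python
def balanced_class_weights(counts):
    """Map class index -> weight using n_samples / (n_classes * count)."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Counts from the training set above (0 = 'notumor', 1 = 'tumor')
print(balanced_class_weights({0: 3515, 1: 3515}))
# Hypothetical imbalanced split, for comparison only
print(balanced_class_weights({0: 3000, 1: 1000}))
```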


---

NOTE

* You may add as many sections as you want, as long as it supports your project workflow.
* All of the notebook's cells should run top-down: don't create a workflow in which, at a given point, you need to go back to a previous cell to execute some task, such as refreshing a variable's contents.

---

# Push files to Repo

* If you don't need to push files to Repo, you may replace this section with "Conclusions and Next Steps" and state your conclusions and next steps.

In [None]:
import os
try:
    # Create your output folder here, for example:
    # os.makedirs(name='')
    pass  # placeholder so the cell runs until a folder is created
except Exception as e:
    print(e)
