# **(Modeling And Evaluation)**

## Objectives

* Build and train a custom convolutional neural network (CNN) from scratch for tumor detection in CT scans.
* Tune hyperparameters to optimize model performance.
* Evaluate the model using accuracy, recall, and inference time metrics.
* Generate model predictions and confidence scores for downstream visualization.
* Prepare the model and outputs for integration with the Streamlit dashboard.

## Inputs

* Preprocessed and augmented image data and metadata from the DataCollection notebook.
* Train/validation/test splits.
* Any configuration files or parameters for model training.

## Outputs

* Trained custom CNN model (saved in a suitable format, e.g., .h5 or .pb).
* Evaluation metrics (accuracy, recall, inference time) and confusion matrix.
* Model predictions and confidence scores for each sample.
* Artifacts for dashboard integration (e.g., prediction results, model files).

## Additional Comments

* The model should be compact enough for real-time inference (<1.5 sec/sample).
* Early stopping and validation loss monitoring should be used to prevent overfitting.
* All outputs will be used in the DataVisualization notebook and Streamlit dashboard.
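
The real-time target above (<1.5 sec/sample) can be checked with a simple timing harness. The sketch below is a minimal, framework-agnostic illustration: `mean_seconds_per_sample`, `predict_fn`, and `dummy_predict` are hypothetical names, and in this notebook `predict_fn` would presumably wrap a call to the trained model on a single image.

```python
import time

def mean_seconds_per_sample(predict_fn, samples, warmup=1):
    """Return the mean wall-clock seconds predict_fn takes per sample."""
    samples = list(samples)
    if not samples:
        raise ValueError("samples must not be empty")
    # Warm-up calls are excluded from the timing (first calls are often slower).
    for sample in samples[:warmup]:
        predict_fn(sample)
    start = time.perf_counter()
    for sample in samples:
        predict_fn(sample)
    elapsed = time.perf_counter() - start
    return elapsed / len(samples)

# Stand-in predictor (hypothetical); replace with a call to the trained model.
def dummy_predict(sample):
    return sum(sample) / len(sample)

latency = mean_seconds_per_sample(dummy_predict, [[0.1, 0.2, 0.3]] * 20)
print(f"Mean inference time: {latency:.6f} sec/sample (target < 1.5)")
```

Timing one sample at a time reflects the per-sample requirement more closely than dividing a batched prediction time by the batch size.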

---

# Change working directory

* We assume the notebooks are stored in a subfolder, so when running a notebook in the editor you need to change the working directory.

We need to change the working directory from its current folder to its parent folder.
* We access the current directory with os.getcwd()

In [None]:
import os
current_dir = os.getcwd()
current_dir

We want to make the parent of the current directory the new current directory.
* os.path.dirname() gets the parent directory
* os.chdir() defines the new current directory

In [None]:
os.chdir('/workspaces/brain-tumor-classification')
print("Current working directory:", os.getcwd())

Confirm the new current directory
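
The cell above uses an absolute path, which only works in this workspace. A minimal sketch of the parent-directory approach described earlier, assuming the notebook is run from a subfolder one level below the project root (`change_to_parent_dir` is a hypothetical helper name):

```python
import os

def change_to_parent_dir():
    """Make the parent of the current working directory the new working directory."""
    parent_dir = os.path.dirname(os.getcwd())
    os.chdir(parent_dir)
    # Return the new working directory so the caller can confirm it.
    return os.getcwd()
```

Calling `print("Current working directory:", change_to_parent_dir())` once would both switch and confirm the directory. Running it a second time moves up another level, so it should be executed only once per session.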

**Environment Setup, Data loading and preparation**

Core libraries

In [None]:
import os
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow import keras
import random
import warnings

# Set random seeds for reproducibility
SEED = 42
np.random.seed(SEED)
tf.random.set_seed(SEED)
random.seed(SEED)

# Suppress warnings for cleaner output
warnings.filterwarnings('ignore')

print(f"TensorFlow version: {tf.__version__}")
print(f"Keras version: {keras.__version__}")
print(f"Numpy version: {np.__version__}")
print(f"Pandas version: {pd.__version__}")

Data Loading & Splitting

In [None]:
train_dir = "inputs/brain_tumor_dataset/images/train"
val_dir = "inputs/brain_tumor_dataset/images/val"
test_dir = "inputs/brain_tumor_dataset/images/test"

In [None]:
def count_images(directory):
    total = 0
    for label in os.listdir(directory):
        class_path = os.path.join(directory, label)
        if os.path.isdir(class_path):
            total += len([f for f in os.listdir(class_path) if f.lower().endswith(('.png', '.jpg', '.jpeg'))])
    return total

print("Train images:", count_images(train_dir))
print("Validation images:", count_images(val_dir))
print("Test images:", count_images(test_dir))

---

Data Preparation & Normalization

Define Constants for the tf.data Pipeline

In [None]:

IMG_SIZE = (224, 224)
BATCH_SIZE = 8



Build File Path and Label Lists

In [None]:
import glob

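def get_file_paths_and_labels(data_dir):
    """Return image paths, integer labels, and the sorted class names for data_dir."""
    # Only subdirectories count as classes, so stray files cannot shift the label indices
    class_names = sorted(
        d for d in os.listdir(data_dir) if os.path.isdir(os.path.join(data_dir, d))
    )
    file_paths = []
    labels = []
    for idx, class_name in enumerate(class_names):
        class_dir = os.path.join(data_dir, class_name)
        # Sort the matches so the file order is deterministic across runs
        files = sorted(glob.glob(os.path.join(class_dir, '*')))
        file_paths.extend(files)
        labels.extend([idx] * len(files))
    return file_paths, labels, class_names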

train_files, train_labels, class_names = get_file_paths_and_labels(train_dir)
val_files, val_labels, _ = get_file_paths_and_labels(val_dir)
test_files, test_labels, _ = get_file_paths_and_labels(test_dir)

print("Classes:", class_names)
print("Train samples:", len(train_files))
print("Validation samples:", len(val_files))
print("Test samples:", len(test_files))

All images in the dataset are PNG files, so the preprocessing function decodes them with tf.image.decode_png. This choice came out of debugging the pipeline.
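
The PNG assumption can be checked before building the pipeline, since the file list is collected with a `*` pattern and would include any non-PNG file as well. The sketch below is a minimal illustration; `count_extensions` is a hypothetical helper, not part of the original notebook.

```python
import os
from collections import Counter

def count_extensions(data_dir):
    """Count file extensions (lower-cased) across all class subfolders of data_dir."""
    counts = Counter()
    for class_name in sorted(os.listdir(data_dir)):
        class_dir = os.path.join(data_dir, class_name)
        if not os.path.isdir(class_dir):
            continue  # skip stray files next to the class folders
        for file_name in os.listdir(class_dir):
            if os.path.isfile(os.path.join(class_dir, file_name)):
                extension = os.path.splitext(file_name)[1].lower()
                counts[extension] += 1
    return counts
```

If `count_extensions(train_dir)` returns anything other than a single `.png` entry, the PNG-only decoding step would need revisiting.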

In [None]:
def preprocess_image(file_path, label):
    img = tf.io.read_file(file_path)
    img = tf.image.decode_png(img, channels=3)
    img = tf.image.resize(img, IMG_SIZE)
    img = tf.cast(img, tf.float32) / 255.0
    return img, label

Create tf.data Datasets

In [None]:
train_ds = tf.data.Dataset.from_tensor_slices((train_files, train_labels))
val_ds = tf.data.Dataset.from_tensor_slices((val_files, val_labels))
test_ds = tf.data.Dataset.from_tensor_slices((test_files, test_labels))

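# Shuffle the file paths before decoding: the lists are ordered by class, so the
# buffer should cover the whole training set (paths are cheap to keep in memory)
train_ds = train_ds.shuffle(buffer_size=len(train_files), seed=SEED)

train_ds = train_ds.map(preprocess_image, num_parallel_calls=tf.data.AUTOTUNE)
val_ds = val_ds.map(preprocess_image, num_parallel_calls=tf.data.AUTOTUNE)
test_ds = test_ds.map(preprocess_image, num_parallel_calls=tf.data.AUTOTUNE)

train_ds = train_ds.batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)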
val_ds = val_ds.batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)
test_ds = test_ds.batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)

Confirm Class Balance in Training Set

In [None]:
import numpy as np

unique, counts = np.unique(train_labels, return_counts=True)
class_balance = dict(zip(class_names, counts))
print("Class balance in training set:", class_balance)
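
If the class counts turn out to be uneven, one common option is to weight each class inversely to its frequency. The sketch below uses the widely used "balanced" heuristic, n_samples / (n_classes * class_count); `balanced_class_weights` is a hypothetical helper and the labels are illustrative, not the notebook's data. The resulting dictionary could be passed to `model.fit` through its `class_weight` argument, although the original notebook does not do this.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        cls: n_samples / (n_classes * count)
        for cls, count in sorted(counts.items())
    }

example_labels = [0, 0, 0, 1]  # illustrative labels only
example_weights = balanced_class_weights(example_labels)
print("Class weights:", example_weights)
```

With this heuristic the rarer class receives the larger weight, which tends to push training toward higher recall on that class.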

---

Model Architecture Design

In [None]:
from tensorflow.keras import layers, models

def build_custom_cnn(input_shape=(224, 224, 3)):
    model = models.Sequential([
        layers.InputLayer(input_shape=input_shape),
        layers.Normalization(),  # Input normalization layer

        layers.Conv2D(32, (3, 3), activation='relu', padding='same'),
        layers.BatchNormalization(),
        layers.MaxPooling2D((2, 2)),

        layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
        layers.BatchNormalization(),
        layers.MaxPooling2D((2, 2)),

        layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
        layers.BatchNormalization(),
        layers.MaxPooling2D((2, 2)),

        layers.Flatten(),
        layers.Dense(128, activation='relu'),
        layers.Dropout(0.5),
        layers.Dense(1, activation='sigmoid')
    ])
    return model

model = build_custom_cnn()
model.summary()
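
To see where most of the parameters in this architecture sit, the convolutional and dense layer sizes can be worked out by hand. The sketch below is plain arithmetic under the settings shown above (3x3 kernels, 'same' padding, 2x2 pooling); it deliberately leaves out the normalization and batch-normalization layers, so its totals will not match `model.summary()` exactly. The helper names are hypothetical.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus one bias per filter."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(in_units, out_units):
    """Weights plus one bias per output unit."""
    return (in_units + 1) * out_units

side, channels = 224, 3
conv_total = 0
for filters in (32, 64, 128):
    conv_total += conv2d_params(3, 3, channels, filters)
    channels = filters
    side //= 2  # each 2x2 max-pooling halves height and width

flat_units = side * side * channels  # size of the Flatten output
dense_total = dense_params(flat_units, 128) + dense_params(128, 1)

print("Flattened units:", flat_units)
print("Conv2D parameters:", conv_total)
print("Dense parameters:", dense_total)
```

The first dense layer accounts for almost all parameters because it connects every flattened unit to 128 neurons, which is worth keeping in mind for the compactness goal.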

Model Compilation

Choose an optimizer (Adam here), a loss function (binary cross-entropy, matching the single sigmoid output), and evaluation metrics (accuracy and recall), then compile the model with these settings.

Compile the Model

In [None]:
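from tensorflow.keras import optimizers, metrics

model.compile(
    optimizer=optimizers.Adam(learning_rate=1e-4),
    loss='binary_crossentropy',
    # Name the metric explicitly so the 'val_recall' key used by the callbacks
    # stays the same even if this cell is re-run
    metrics=['accuracy', metrics.Recall(name='recall')]
)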

If the editor reports a "tensorflow could not be resolved" error, TensorFlow may not be installed in the interpreter the notebook is using. Run the cell below to check which Python executable is active. In this case TensorFlow is installed, so there is no problem.

In [None]:
import sys
print(sys.executable)

# Recall-Focused Model Training

Train the model on the training set, validating on the validation set.
Use callbacks such as EarlyStopping and ModelCheckpoint, monitoring validation recall.
Log training and validation metrics for each epoch.
Track the precision-recall tradeoff.
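
The precision-recall tradeoff can be made concrete by sweeping the decision threshold applied to the sigmoid output. The sketch below is a minimal pure-Python illustration with made-up labels and scores; `precision_recall_at` is a hypothetical helper, and in practice the scores would come from the model's predictions on the validation set.

```python
def precision_recall_at(y_true, scores, threshold):
    """Return (precision, recall) when scores >= threshold count as positive."""
    tp = fp = fn = 0
    for truth, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and truth == 1:
            tp += 1
        elif predicted == 1 and truth == 0:
            fp += 1
        elif predicted == 0 and truth == 1:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

y_true = [1, 1, 1, 0, 0, 0]              # made-up ground truth
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.1]  # made-up sigmoid outputs

for threshold in (0.3, 0.5, 0.7):
    precision, recall = precision_recall_at(y_true, scores, threshold)
    print(f"threshold={threshold:.1f} precision={precision:.2f} recall={recall:.2f}")
```

Lowering the threshold raises recall at the cost of precision, which is the direction a recall-focused tumor detector would usually lean.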


Set Up Callbacks

In [None]:
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

callbacks = [
    EarlyStopping(monitor='val_recall', patience=5, mode='max', restore_best_weights=True, verbose=1),
    ModelCheckpoint('best_model.keras', monitor='val_recall', mode='max', save_best_only=True, verbose=1)
]
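
To illustrate what `patience=5` with `mode='max'` means in practice, the sketch below simulates a simplified version of the patience rule on made-up validation recall values. It is a simplified model for intuition, not Keras's implementation, and `simulate_early_stopping` is a hypothetical name.

```python
def simulate_early_stopping(values, patience=5):
    """Return (stopped_epoch, best_epoch) for a monitor that should increase.

    Epoch indices are zero-based; stopped_epoch is None if training
    would run through every epoch without triggering the rule.
    """
    best = float("-inf")
    best_epoch = None
    wait = 0
    for epoch, current in enumerate(values):
        if current > best:  # only a strict improvement resets the counter
            best, best_epoch, wait = current, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch

val_recall = [0.5, 0.6, 0.7, 0.7, 0.69, 0.68, 0.7, 0.66, 0.9]  # made-up values
stopped, best = simulate_early_stopping(val_recall, patience=5)
print(f"Stopped at epoch index {stopped}; best epoch index {best}")
```

Note that the last value in the example is never reached: a plateau that merely ties the best score does not reset the counter, so training stops before a late improvement could appear.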

In [None]:
# Keep saved model files out of version control (note: this appends on every run of the cell)
with open(".gitignore", "a") as f:
    f.write("\n# Ignore model files\n*.h5\n*.keras\nbest_model.h5\nmy_model.keras\n")

In [None]:
history = model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=30,
    callbacks=callbacks
)

In [None]:
model.save("my_model.keras")
print("Model saved locally as my_model.keras")
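
The objectives also call for predictions and confidence scores for downstream visualization. A minimal sketch of turning a single sigmoid output into a label and a confidence value follows; `to_prediction` and the class names are hypothetical, and the mapping assumes that index 1 is the second entry of the sorted `class_names` list, as produced by the label-building step earlier.

```python
def to_prediction(probability, class_names, threshold=0.5):
    """Map a sigmoid probability for class index 1 to a label and confidence."""
    index = 1 if probability >= threshold else 0
    # Confidence is the probability assigned to the predicted class.
    confidence = probability if index == 1 else 1.0 - probability
    return {"label": class_names[index], "confidence": confidence}

example_classes = ["no_tumor", "tumor"]  # hypothetical class names
for probability in (0.92, 0.35):
    print(to_prediction(probability, example_classes))
```

Storing the raw probability alongside the label keeps the threshold adjustable later, for example in the dashboard.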