# Training & Evaluation
* In this notebook we are going to train 2 kinds of object detection models:
    * An object detection CNN from scratch. The hypothesis is that since the dataset is simple, we won't need an expert model for object detection; a simple CNN should be able to give us decent accuracy.
    * Object detection using transfer learning. This is going to be more of a hands-on experience with transfer learning.

## Import Libraries

In [1]:
## import libraries
import pandas as pd
import numpy as np
from pathlib import Path
import tensorflow as tf
import matplotlib.pyplot as plt

2025-07-29 23:35:28.517643: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1753857328.533077  438852 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1753857328.537871  438852 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1753857328.550861  438852 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.

In [2]:
## validate tensorflow 
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

Num GPUs Available:  1


## Constants

In [3]:
data_dir = Path("..","data")
models_dir = Path("..","models")

## Import Scripts

In [4]:
import os
import sys
# Build an absolute path from this notebook's parent directory
module_path = os.path.abspath(os.path.join('..'))

# Add to sys.path if not already present
if module_path not in sys.path:
    sys.path.append(module_path)
    
from src import data_generator,training_utils

## logic to auto reload scripts without restarting the kernel
%load_ext autoreload
%autoreload 2

## Training Object Detection CNN

### Step 1: Import Data

In [5]:
data = pd.read_csv(Path(data_dir,"raw","raw_mnist_data.csv"))

### Step 2: Split Data

In [6]:
raw_images = data.drop(columns=["class"])
raw_labels = data["class"]
raw_images.shape, raw_labels.shape

((70000, 784), (70000,))

### Step 3: Initialize Map Pipeline

In [7]:
X_tensor = tf.convert_to_tensor(raw_images, dtype=tf.float32)
X_tensor = tf.reshape(X_tensor,shape=(-1,28,28,1))
y_tensor = tf.convert_to_tensor(raw_labels, dtype=tf.float32)


raw_dataset = tf.data.Dataset.from_tensor_slices((X_tensor,y_tensor))

processed_dataset = raw_dataset.map(
    lambda X, y: tf.numpy_function(
        data_generator.generate_training_example,
        inp=[X, y],
        Tout=(tf.float32, tf.float32),
    ),
    num_parallel_calls=15,
)

I0000 00:00:1753857337.672225  438852 gpu_device.cc:2019] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 6055 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 2080 SUPER, pci bus id: 0000:2d:00.0, compute capability: 7.5


### Step 4: Initialize the Model

In [8]:
model = tf.keras.Sequential([

    tf.keras.layers.Rescaling(scale=1./255),

    ## starting with a larger filter since we are dealing with 100x100x1 image
    tf.keras.layers.Conv2D(filters=8, kernel_size=5, padding='same', activation='relu'),
    tf.keras.layers.Conv2D(filters=8, kernel_size=5, padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    ## rest of the layers are same as our original mnist classifier
    tf.keras.layers.Conv2D(filters=8, kernel_size=3, padding='same', activation='relu'),
    tf.keras.layers.Conv2D(filters=8, kernel_size=3, padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(filters=16, kernel_size=3, padding='same', activation='relu'),
    tf.keras.layers.Conv2D(filters=16, kernel_size=3, padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(filters=32, kernel_size=3, padding='same', activation='relu'),
    tf.keras.layers.Conv2D(filters=32, kernel_size=3, padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D(),

    ## final layer to output a 6x6x45 grid of predictions
    tf.keras.layers.Conv2D(filters=45, kernel_size=1, padding='same', activation='linear'),

])

In [9]:
model.summary()

### Step 4.1 Define Custom Loss Function
Right now our model output has a shape of 6x6x45. This means we have 36 grid cells, each with 3 possible bounding boxes, and each bounding box is described by 15 values: 1 for objectness (i.e. is an object present in this box), 4 for the bounding box dimensions (2 for the center coordinates and 2 for the width and height), and finally 10 for the one-hot encoded classification of the digits from 0 to 9. That gives 3 x 15 = 45 values per grid cell.
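To make the layout concrete, here is a minimal NumPy sketch that splits a 6x6x45 output into 3 anchor boxes of 15 values each. The `(3, 15)` grouping matches the `anchor_boxes.shape` printed by the loss function later in this notebook; the order of values inside each anchor (objectness first, then the 4 box values, then the 10 class scores) is an assumption based on the description above, and random numbers stand in for real model output.

```python
import numpy as np

NUM_ANCHORS = 3
VALUES_PER_ANCHOR = 15  # 1 objectness + 4 box values + 10 class scores

# Random stand-in for a batch of 8 raw model outputs of shape (6, 6, 45).
rng = np.random.default_rng(0)
y_pred = rng.normal(size=(8, 6, 6, NUM_ANCHORS * VALUES_PER_ANCHOR)).astype(np.float32)

# Split the 45 channels into 3 anchors x 15 values.
anchor_boxes = y_pred.reshape(8, 6, 6, NUM_ANCHORS, VALUES_PER_ANCHOR)

# Assumed order inside each anchor: [objectness, cx, cy, w, h, 10 class scores].
objectness = anchor_boxes[..., 0]
boxes = anchor_boxes[..., 1:5]
class_scores = anchor_boxes[..., 5:]

print(anchor_boxes.shape)
print(objectness.shape, boxes.shape, class_scores.shape)
```

The same split can be done on TensorFlow tensors with `tf.reshape` and the same slice expressions.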

We can break the prediction of these slices down into different machine learning problems, e.g.

- The 0/1 objectness score is a binary classification problem
- The 4-value bounding box prediction is a regression problem
- The 10-class digit prediction is a multi-class classification problem

Since these are different ML problems, we cannot use the same activation function for all of them. Our final layer therefore needs to be linear (i.e. no activation), and we’ll then apply a specific activation to each slice based on the prediction we want. So

- The 0/1 problem will be activated using the `Sigmoid Activation` function
- The 4-value bounding box will stay as it is, i.e. a linear activation
- The 10-class digit problem will be activated using `Softmax Activation`
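The per-slice activations listed above can be sketched in plain NumPy as follows. `sigmoid`, `softmax`, and `activate_slices` are illustrative helpers written for this example, not functions from `training_utils`, and the slice order (objectness, box, classes) is assumed.

```python
import numpy as np

def sigmoid(x):
    """Squash raw scores into (0, 1) for the objectness slice."""
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x, axis=-1):
    """Turn raw class scores into probabilities that sum to 1."""
    shifted = x - np.max(x, axis=axis, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)

def activate_slices(raw):
    """Apply a different activation to each slice of (..., 15) linear outputs."""
    objectness = sigmoid(raw[..., 0])    # binary classification
    boxes = raw[..., 1:5]                # regression: left linear
    class_probs = softmax(raw[..., 5:])  # multi-class classification
    return objectness, boxes, class_probs

# Two fake anchor boxes; the first one leans towards "object present, digit 7".
raw = np.zeros((2, 15), dtype=np.float32)
raw[0, 0] = 2.0
raw[0, 5 + 7] = 3.0

obj, box, probs = activate_slices(raw)
print(obj, probs.argmax(axis=-1))
```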

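Similarly, we’ll use a different loss function for each of the 3, so

- The 0/1 problem will use the log loss (binary cross-entropy) function
- The 4-value bounding box will use the RMSE or MSE loss function
- The 10-class digit prediction will use categorical cross-entropy. Because the class targets here are one-hot encoded, the matching Keras loss is `CategoricalCrossentropy`; `SparseCategoricalCrossentropy` expects integer class indices instead.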
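As a rough illustration of how the three losses above can be combined, here is a NumPy sketch that works on already activated predictions for matched boxes. It assumes rows of 15 values in the order objectness, box, one-hot class; the weights `w_obj`, `w_box`, and `w_cls` are hypothetical knobs added for this example. It is not the implementation in `training_utils.calculate_model_loss`.

```python
import numpy as np

EPS = 1e-7  # keeps log() away from zero

def binary_cross_entropy(y_true, p):
    p = np.clip(p, EPS, 1.0 - EPS)
    return float(np.mean(-(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))))

def mean_squared_error(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

def categorical_cross_entropy(y_true_one_hot, probs):
    probs = np.clip(probs, EPS, 1.0)
    return float(np.mean(-np.sum(y_true_one_hot * np.log(probs), axis=-1)))

def combined_loss(y_true, y_pred, w_obj=1.0, w_box=1.0, w_cls=1.0):
    """y_true, y_pred: (N, 15) rows of [objectness, 4 box values, 10 class values]."""
    obj = binary_cross_entropy(y_true[:, 0], y_pred[:, 0])
    box = mean_squared_error(y_true[:, 1:5], y_pred[:, 1:5])
    cls = categorical_cross_entropy(y_true[:, 5:], y_pred[:, 5:])
    return w_obj * obj + w_box * box + w_cls * cls

# One ground-truth box containing the digit 3.
y_true = np.zeros((1, 15))
y_true[0, 0] = 1.0
y_true[0, 1:5] = [0.5, 0.5, 0.2, 0.2]
y_true[0, 5 + 3] = 1.0

good = y_true.copy()  # a perfect prediction
bad = y_true.copy()   # unsure about objectness and class
bad[0, 0] = 0.1
bad[0, 5:] = 0.1

good_loss = combined_loss(y_true, good)
bad_loss = combined_loss(y_true, bad)
print(good_loss, bad_loss)
```

A perfect prediction gives a loss close to zero, while the uncertain one is penalised by both cross-entropy terms.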

#### Match Making
* We'll also need to find the right grid cell to calculate the loss against. We do that by finding the grid cell in which the center of each ground-truth box lies.
* In our case we have a 6x6 grid, so 36 cells. Let's say we have just 2 objects in our ground truth and 3 anchor boxes per cell.
* In our first step, out of the 36 cells we find the one or maybe 2 cells in which the centers of these objects lie.
* Each of these cells has 3 anchor boxes, so we calculate the IOU of each anchor box against the ground truth, and for each object the anchor box with the maximum IOU wins.
* We take those 2 winning anchor boxes, slice the output, and apply the activation functions for objectness and the 10-class digit classification. The coordinates are kept as they are, since the output of the final layer is already linear.
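The matching steps above can be sketched in NumPy as follows. This is a simplified illustration for a single ground-truth box, assuming boxes are given as `(cx, cy, w, h)` in pixels on a 100x100 image; the helper names are hypothetical and the real logic in `training_utils` may differ.

```python
import numpy as np

GRID_SIZE = 6
IMAGE_SIZE = 100

def grid_cell_for_center(cx, cy, image_size=IMAGE_SIZE, grid_size=GRID_SIZE):
    """Return the (row, col) of the grid cell that contains the box center."""
    col = min(int(cx * grid_size / image_size), grid_size - 1)
    row = min(int(cy * grid_size / image_size), grid_size - 1)
    return row, col

def to_corners(box):
    """Convert (cx, cy, w, h) boxes to (x1, y1, x2, y2) corners."""
    cx, cy, w, h = box[..., 0], box[..., 1], box[..., 2], box[..., 3]
    return np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=-1)

def iou(true_box, pred_boxes):
    """IOU of one ground-truth box against each predicted anchor box."""
    t, p = to_corners(true_box), to_corners(pred_boxes)
    x1 = np.maximum(t[..., 0], p[..., 0])
    y1 = np.maximum(t[..., 1], p[..., 1])
    x2 = np.minimum(t[..., 2], p[..., 2])
    y2 = np.minimum(t[..., 3], p[..., 3])
    intersection = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = true_box[..., 2] * true_box[..., 3] + pred_boxes[..., 2] * pred_boxes[..., 3] - intersection
    return intersection / np.maximum(union, 1e-9)

# One ground-truth digit and the 3 anchor predictions of its grid cell.
true_box = np.array([55.0, 20.0, 28.0, 28.0])
anchor_preds = np.array([
    [10.0, 80.0, 20.0, 20.0],  # no overlap
    [57.0, 21.0, 28.0, 28.0],  # close match
    [70.0, 35.0, 28.0, 28.0],  # partial overlap
])

cell = grid_cell_for_center(true_box[0], true_box[1])
ious = iou(true_box, anchor_preds)
best_anchor = int(np.argmax(ious))
print(cell, ious, best_anchor)
```

Only the winning anchor of each matched cell is then passed on to the activation and loss steps.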

Quick notes on how to match make
* Create an empty 100x100 grid (we can also create a constant for the grid cells)
* For each cell check if 

##### Custom Training Loop

* The cell below manually runs the model on one batch of the dataset. This was used to verify and debug the custom loss and metrics functions.

In [10]:
# This is temp code to test the loss function do not use this for training.
X_tensor = tf.convert_to_tensor(raw_images.iloc[0:10], dtype=tf.float32)
X_tensor = tf.reshape(X_tensor, shape=(-1, 28, 28, 1))
y_tensor = tf.convert_to_tensor(raw_labels.iloc[0:10], dtype=tf.float32)


raw_dataset = tf.data.Dataset.from_tensor_slices((X_tensor, y_tensor))


def generative_py_function(func, inp, Tout, shape_out):
    # This is the bridge that calls your NumPy code
    y = tf.numpy_function(func, inp, Tout)
    # This is the crucial step: re-apply the shape information
    y[0].set_shape(shape_out[0]) # Set shape for the image
    y[1].set_shape(shape_out[1]) # Set shape for the labels
    return y

# Define the exact output shapes you expect
output_shapes = ([100, 100, 1], [5, 15]) 
# Define the exact output data types you expect
output_types = (tf.float32, tf.float32)



# Use the wrapper inside the map
processed_dataset = raw_dataset.map(lambda X, y: generative_py_function(
    data_generator.generate_training_example, 
    inp=[X, y], 
    Tout=output_types, # Pass the dtypes to Tout
    shape_out=output_shapes # Pass the shapes to our new argument
)).batch(batch_size=8)

model.compile(optimizer='adam',
              loss=training_utils.calculate_model_loss,
              metrics=[training_utils.objectness_metrics, training_utils.bounding_box_metrics, training_utils.classification_metrics])
# Step 1: Get one batch of data from your dataset pipeline
# The .take(1) method creates a new dataset with only the first element.
one_batch = processed_dataset.take(1)

# Step 2: Iterate over the single batch to get the tensors
for images, labels in one_batch:
    
    # --- THIS IS YOUR DEBUGGING ZONE ---
    # Now you have the concrete tensors for one batch.
    # You can inspect them with regular print() and .numpy()
    
    # print("--- Inspecting Data Before Loss Calculation ---")
    # print("Shape of images (X_batch):", images.shape)
    # print("Shape of labels (y_true_batch):", labels.shape)
    # print("\nSample y_true label tensor:\n", labels.numpy()[0]) # Print the first label in the batch
    # ------------------------------------

    # Step 3: Manually run the forward pass and gradient calculation
    with tf.GradientTape() as tape:
        
        # Get the model's raw predictions for this batch
        y_pred = model(images, training=True)  # Pass the images through the model
        
        # --- MORE DEBUGGING ---
        print("\n--- Inspecting Tensors Passed to Loss Function ---")
        print("Shape of y_pred from model:", y_pred.shape)
        # print("\nSample y_pred tensor (first 5 values of first anchor):\n", y_pred.numpy()[0, 0, 0, :5])
        # ----------------------

        # Call your custom loss function
        # You can now add print statements INSIDE your loss function too!
        loss_value = training_utils.calculate_model_loss(labels, y_pred)
        
        print("\n--- Final Calculated Loss ---")
        
        print("Total Loss for the batch:", loss_value.numpy())
        # -----------------------------

    # Step 4 (Optional): Calculate and apply gradients to see the full loop
    # grads = tape.gradient(loss_value, model.trainable_variables)
    # model.optimizer.apply_gradients(zip(grads, model.trainable_variables))
    
    print("\n--- Manual Step Complete ---")
    

I0000 00:00:1753857339.485985  438852 cuda_dnn.cc:529] Loaded cuDNN version 90300



--- Inspecting Tensors Passed to Loss Function ---
Shape of y_pred from model: (8, 6, 6, 45)
----- True Values -----
y_true.shape (8, 5, 15)
----- Pred Values -----
y_pred.shape (8, 6, 6, 45)
anchor_boxes.shape (8, 6, 6, 3, 15)
grid indices shape (8, 5, 2)
selected_anchor_boxes.shape :(8, 5, 3, 15)
anchor_boxes.shape (8, 6, 6, 3, 15)
selected_anchor_boxes.shape :(8, 5, 3, 15)
y_true_boxes.shape (8, 5, 1, 4)
y_pred_boxes.shape (8, 5, 3, 4)
intersection_box_corners.shape (8, 5, 3, 4)
intersection_area.shape (8, 5, 3)
union_area.shape (8, 5, 3)
iou.shape (8, 5, 3)
highest_iou_index.shape (8, 5)
expanded highest_iou_index.shape (8, 5, 1)
best_anchor_boxes.shape (8, 5, 1, 15)
true_values_with_objects.shape : (16, 15)
predicted_values_with_objects.shape : (16, 15)
y_pred_objectness.shape : (16,)
y_pred_bounding_box.shape : (16, 4)
y_pred_classification.shape : (16, 10)
y_true_objectness.shape : (16,)
y_true_bounding_box.shape : (16, 4)
y_true_classification.shape : (16, 10)
Post Activation 

2025-07-29 23:35:40.748492: I tensorflow/core/framework/local_rendezvous.cc:407] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence


### Step 4.2 Train the Model for 5 Epochs

In [11]:
## train test split
from sklearn.model_selection import train_test_split, StratifiedShuffleSplit

splitter = StratifiedShuffleSplit(n_splits=1, test_size=0.1, random_state=42)

for train_idx, test_idx in splitter.split(data,data["class"]):
    mnist_dev_set = data.iloc[train_idx].reset_index(drop=True)
    mnist_test_set = data.iloc[test_idx].reset_index(drop=True)

mnist_dev_set.shape,mnist_test_set.shape

((63000, 785), (7000, 785))

In [12]:
raw_images = mnist_dev_set.drop(columns=["class"])
raw_labels = mnist_dev_set["class"]

In [13]:
X_tensor = tf.convert_to_tensor(raw_images, dtype=tf.float32)
X_tensor = tf.reshape(X_tensor,shape=(-1,28,28,1))
y_tensor = tf.convert_to_tensor(raw_labels, dtype=tf.float32)
batch_size = 32

raw_dataset = tf.data.Dataset.from_tensor_slices((X_tensor, y_tensor))

def generative_py_function(func, inp, Tout, shape_out):
    # This is the bridge that calls your NumPy code
    y = tf.numpy_function(func, inp, Tout)
    # This is the crucial step: re-apply the shape information
    y[0].set_shape(shape_out[0]) # Set shape for the image
    y[1].set_shape(shape_out[1]) # Set shape for the labels
    return y

# Define the exact output shapes you expect
output_shapes = ([100, 100, 1], [5,15]) 
# Define the exact output data types you expect
output_types = (tf.float32, tf.float32)

# Use the wrapper inside the map
processed_dataset = raw_dataset.map(lambda X, y: generative_py_function(
    data_generator.generate_training_example, 
    inp=[X, y], 
    Tout=output_types, # Pass the dtypes to Tout
    shape_out=output_shapes # Pass the shapes to our new argument
)).batch(batch_size=batch_size)

In [14]:
inputs = tf.keras.Input(shape=(100,100,1),batch_size=batch_size ,name="input_layer")

x = tf.keras.layers.Rescaling(scale=1./255, name="rescaling")(inputs)

x = tf.keras.layers.Conv2D(filters=8, kernel_size=5, padding='same', activation='relu')(x)
x = tf.keras.layers.Conv2D(filters=8, kernel_size=5, padding='same', activation='relu')(x)
x = tf.keras.layers.MaxPooling2D()(x)

x = tf.keras.layers.Conv2D(filters=8, kernel_size=3, padding='same', activation='relu')(x)
x = tf.keras.layers.Conv2D(filters=8, kernel_size=3, padding='same', activation='relu')(x)
x = tf.keras.layers.MaxPooling2D()(x)

x = tf.keras.layers.Conv2D(filters=16, kernel_size=3, padding='same', activation='relu')(x)
x = tf.keras.layers.Conv2D(filters=16, kernel_size=3, padding='same', activation='relu')(x)
x = tf.keras.layers.MaxPooling2D()(x)

x = tf.keras.layers.Conv2D(filters=32, kernel_size=3, padding='same', activation='relu')(x)
x = tf.keras.layers.Conv2D(filters=32, kernel_size=3, padding='same', activation='relu')(x)
x = tf.keras.layers.MaxPooling2D()(x)

outputs = tf.keras.layers.Conv2D(filters=45, kernel_size=1, padding='same', activation='linear')(x)

# Define the final model by specifying its inputs and outputs
model = tf.keras.Model(inputs=inputs, outputs=outputs)

model.summary()

In [15]:
# step 4: Define the callbacks
checkpoint_filepath = '../models/experiment_1_{epoch:02d}.keras'
model_checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_filepath,
    monitor='loss',
    mode='min',
    save_best_only=True,
    save_freq="epoch",
    verbose=1,
    )

In [29]:
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001,clipnorm=1.0),
              loss=training_utils.calculate_model_loss,
              metrics=[training_utils.objectness_metrics, training_utils.bounding_box_metrics, training_utils.classification_metrics])
## step 5: Fit the model
epochs=5

history = model.fit(
  processed_dataset,
  epochs=epochs,
  callbacks=[model_checkpoint_callback]
)

Epoch 1/5
----- True Values -----
y_true.shape (None, 5, 15)
----- Pred Values -----
y_pred.shape (None, 6, 6, 45)
anchor_boxes.shape (None, 6, 6, 3, 15)
grid indices shape (None, 5, 2)
selected_anchor_boxes.shape :(None, 5, 3, 15)
anchor_boxes.shape (None, 6, 6, 3, 15)
selected_anchor_boxes.shape :(None, 5, 3, 15)
y_true_boxes.shape (None, 5, 1, 4)
y_pred_boxes.shape (None, 5, 3, 4)
intersection_box_corners.shape (None, 5, 3, 4)
intersection_area.shape (None, 5, 3)
union_area.shape (None, 5, 3)
iou.shape (None, 5, 3)
highest_iou_index.shape (None, 5)
expanded highest_iou_index.shape (None, 5, 1)
best_anchor_boxes.shape (None, 5, 1, 15)
true_values_with_objects.shape : (None, 15)
predicted_values_with_objects.shape : (None, 15)
y_pred_objectness.shape : (None,)
y_pred_bounding_box.shape : (None, 4)
y_pred_classification.shape : (None, 10)
y_true_objectness.shape : (None,)
y_true_bounding_box.shape : (None, 4)
y_true_classification.shape : (None, 10)
Post Activation y_pred_objectness.sh

NotImplementedError: Cannot convert a symbolic tf.Tensor (compile_loss/calculate_model_loss/strided_slice_46:0) to a numpy array. This error may indicate that you're trying to pass a Tensor to a NumPy call, which is not supported.
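
The error above is raised while `model.fit` traces the loss function into a graph. During tracing the tensors are symbolic (note the `None` batch dimension in the shapes printed above), so anything that needs concrete values, such as `.numpy()` or a NumPy function applied to a tensor, fails. The manual loop earlier ran eagerly, which is why the same loss worked there. Since `training_utils` is not shown here, the exact offending line is unknown; the sketch below only illustrates the general pattern, using a hypothetical `graph_safe_loss` that sticks to TensorFlow ops and uses `tf.print` for debugging.

```python
import tensorflow as tf

def graph_safe_loss(y_true, y_pred):
    """Toy loss that only uses TensorFlow ops, so it can be traced into a graph."""
    # tf.print runs when the graph executes, so it sees real values;
    # a Python print() of tensor.numpy() would fail during tracing.
    tf.print("y_pred shape:", tf.shape(y_pred))
    return tf.reduce_mean(tf.square(y_true - y_pred))

# tf.function traces the loss much like model.fit does.
traced_loss = tf.function(graph_safe_loss)

y_true = tf.constant([[0.0, 0.0]])
y_pred = tf.constant([[1.0, 2.0]])
loss_value = traced_loss(y_true, y_pred)
print(float(loss_value))
```

For debugging only, another option is to pass `run_eagerly=True` to `model.compile`, which makes `fit` execute the loss eagerly at the cost of speed.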