# Training & Evaluation
* In this notebook we are going to train 2 kinds of object detection models:
    * An object detection CNN from scratch. The hypothesis is that since the dataset is simple, we won't need an expert model for object detection; a simple CNN should be able to give us decent accuracy.
    * Object detection using transfer learning. This is going to be more of a hands-on experience with transfer learning.

## Import Libraries

In [1]:
## import libraries
import pandas as pd
import numpy as np
from pathlib import Path
import tensorflow as tf
import matplotlib.pyplot as plt


In [2]:
## validate tensorflow 
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

Num GPUs Available:  1


## Constants

In [3]:
data_dir = Path("..","data")
models_dir = Path("..","models")

## Import Scripts

In [4]:
import os
import sys
# Build an absolute path from this notebook's parent directory
module_path = os.path.abspath(os.path.join('..'))

# Add to sys.path if not already present
if module_path not in sys.path:
    sys.path.append(module_path)
    
from src import data_generator

## logic to auto reload scripts without restarting the kernel
%load_ext autoreload
%autoreload 2

## Training Object Detection CNN

### Step 1: Import Data

In [5]:
data = pd.read_csv(Path(data_dir,"raw","raw_mnist_data.csv"))

### Step 2: Split Data

In [6]:
raw_images = data.drop(columns=["class"])
raw_labels = data["class"]
raw_images.shape, raw_labels.shape

((70000, 784), (70000,))

### Step 3: Initialize Map Pipeline

In [7]:
X_tensor = tf.convert_to_tensor(raw_images, dtype=tf.float32)
X_tensor = tf.reshape(X_tensor,shape=(-1,28,28,1))
y_tensor = tf.convert_to_tensor(raw_labels, dtype=tf.float32)


raw_dataset = tf.data.Dataset.from_tensor_slices((X_tensor,y_tensor))

processed_dataset = raw_dataset.map(lambda X,y: tf.numpy_function(data_generator.generate_training_example, inp=[X,y],Tout=(tf.float32,tf.float32)), num_parallel_calls=15);

I0000 00:00:1753304954.919051   62514 gpu_device.cc:2019] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 6055 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 2080 SUPER, pci bus id: 0000:2d:00.0, compute capability: 7.5


### Step 4: Initialize the Model

In [85]:
model = tf.keras.Sequential([

    tf.keras.layers.Rescaling(scale=1./255),

    ## starting with a larger filter since we are dealing with 100x100x1 image
    tf.keras.layers.Conv2D(filters=8, kernel_size=5, padding='same', activation='relu'),
    tf.keras.layers.Conv2D(filters=8, kernel_size=5, padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    ## the rest of the layers are the same as in our original MNIST classifier
    tf.keras.layers.Conv2D(filters=8, kernel_size=3, padding='same', activation='relu'),
    tf.keras.layers.Conv2D(filters=8, kernel_size=3, padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(filters=16, kernel_size=3, padding='same', activation='relu'),
    tf.keras.layers.Conv2D(filters=16, kernel_size=3, padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(filters=32, kernel_size=3, padding='same', activation='relu'),
    tf.keras.layers.Conv2D(filters=32, kernel_size=3, padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D(),

    ## final layer to output a 6x6x45 grid of predictions
    tf.keras.layers.Conv2D(filters=45, kernel_size=1, padding='same', activation='linear'),

])

In [86]:
model.summary()
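
The 6x6 output grid follows from the input size and the four pooling layers. As a quick check, assuming the 100x100 input mentioned in the model comments and the Keras defaults for `MaxPooling2D` (pool size 2, stride 2, `'valid'` padding), each pooling layer halves the spatial size (rounding down), while the `'same'`-padded convolutions leave it unchanged. The helper `pooled_size` below is a hypothetical name used only for this sketch:

```python
def pooled_size(size, num_pools, pool=2):
    """Spatial size after `num_pools` pooling layers with default settings."""
    for _ in range(num_pools):
        # 'valid' padding drops the leftover row/column when the size is odd
        size = size // pool
    return size


# Size after 0, 1, 2, 3 and 4 pooling layers for a 100x100 input
sizes = [pooled_size(100, n) for n in range(5)]
print(sizes)  # [100, 50, 25, 12, 6]
```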

### Step 4.1: Define Custom Loss Function
Right now our model output has the shape 6x6x45. This means we have 36 grid cells, each with 3 possible bounding boxes, and each bounding box is described by 15 values: 1 for objectness (i.e. is an object present in this box), 4 for the bounding box (2 for the center coordinates and 2 for the width and height), and finally 10 for the one-hot encoded classification of digits from 0 to 9. That gives 3 x 15 = 45 values per grid cell.

We can break down the prediction of these slices into different machine learning problems, e.g.

- The 0/1 objectness score is a binary classification problem
- The 4-value bounding box prediction is a regression problem
- The 10-value digit prediction is a multi-class classification problem

Since these are different ML problems, we cannot use the same activation function for all of them. Our final layer therefore needs to be linear (i.e. no activation), and then we'll apply a specific activation to each slice based on the prediction we want. So:

- The 0/1 objectness slice will be activated using the `Sigmoid Activation` function
- The 4-value bounding box slice will stay as it is, i.e. a linear activation
- The 10-value digit slice will be activated using `Softmax Activation`
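
The slicing and per-slice activations described above can be sketched as follows. This is only an illustration written in NumPy rather than TensorFlow so that it runs on its own. The channel layout (anchor by anchor, each as `[objectness, x, y, w, h, 10 class logits]`) and the helper name `decode_predictions` are assumptions, not something the model above enforces.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)


def decode_predictions(y_pred, num_anchors=3):
    """Split a raw (batch, 6, 6, 45) output into activated slices.

    Assumed layout per anchor: [objectness, x, y, w, h, 10 class logits].
    """
    batch, rows, cols, depth = y_pred.shape
    per_anchor = depth // num_anchors                 # 45 // 3 = 15
    y = y_pred.reshape(batch, rows, cols, num_anchors, per_anchor)
    objectness = sigmoid(y[..., 0])                   # binary classification
    boxes = y[..., 1:5]                               # regression, kept linear
    class_probs = softmax(y[..., 5:], axis=-1)        # multi-class classification
    return objectness, boxes, class_probs


# Random values standing in for the raw model output
rng = np.random.default_rng(0)
raw = rng.normal(size=(2, 6, 6, 45)).astype(np.float32)
obj, boxes, probs = decode_predictions(raw)
print(obj.shape, boxes.shape, probs.shape)
# (2, 6, 6, 3) (2, 6, 6, 3, 4) (2, 6, 6, 3, 10)
```

In the real loss function the same steps would use `tf.reshape`, `tf.sigmoid` and `tf.nn.softmax` so that gradients can flow through them.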

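Similarly, we'll use a different loss function for each of the 3 slices, so:

- The 0/1 objectness slice will use log loss (binary cross-entropy)
- The 4-value bounding box slice will use an RMSE or MSE loss
- The 10-value class prediction will use categorical cross-entropy: `SparseCategoricalCrossentropy` when the target is an integer class index, or `CategoricalCrossentropy` when the target stays one-hot encoded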
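
As a rough sketch of how the three losses could be combined for the matched anchor boxes, the NumPy snippet below adds them up with optional weights. The function names, the weights `box_weight` and `cls_weight`, and the example values are all hypothetical. The class loss is written for one-hot targets, which is mathematically the same quantity that the sparse variant computes from integer labels.

```python
import numpy as np

EPS = 1e-7  # keeps log() away from zero


def binary_cross_entropy(y_true, p):
    p = np.clip(p, EPS, 1.0 - EPS)
    return float(np.mean(-(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))))


def mean_squared_error(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))


def categorical_cross_entropy(one_hot, probs):
    probs = np.clip(probs, EPS, 1.0)
    return float(np.mean(-np.sum(one_hot * np.log(probs), axis=-1)))


def detection_loss(true_obj, pred_obj, true_box, pred_box, true_cls, pred_cls,
                   box_weight=1.0, cls_weight=1.0):
    """Weighted sum of the objectness, box and class losses."""
    obj_loss = binary_cross_entropy(true_obj, pred_obj)
    box_loss = mean_squared_error(true_box, pred_box)
    cls_loss = categorical_cross_entropy(true_cls, pred_cls)
    return obj_loss + box_weight * box_loss + cls_weight * cls_loss


# Two matched anchor boxes with made-up, already activated predictions
true_obj = np.array([1.0, 1.0])
pred_obj = np.array([0.9, 0.6])
true_box = np.array([[0.5, 0.5, 0.3, 0.3], [0.2, 0.7, 0.25, 0.3]])
pred_box = true_box + 0.05
true_cls = np.eye(10)[[3, 7]]            # one-hot rows for digits 3 and 7
pred_cls = np.full((2, 10), 0.02)
pred_cls[[0, 1], [3, 7]] = 0.82          # each row still sums to 1

loss = detection_loss(true_obj, pred_obj, true_box, pred_box, true_cls, pred_cls)
print(round(loss, 4))
```

How the three terms should be weighted against each other is a tuning decision that this sketch does not settle.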

#### Match Making
* We'll also need to find the right grid cell and anchor box to calculate the loss against. We do that by finding the grid cell in which the center of each ground truth box lies.
* In our case we have a 6x6 grid, so 36 cells. Let's say we have just 2 digits in our ground truth and 3 anchor boxes per cell.
* In the first step, out of the 36 cells we find the one or maybe 2 cells in which the centers of these digits lie.
* Each of these cells has 3 anchor boxes, so we calculate the IOU of each anchor box with the ground truth box, and for each of the 2 digits the anchor box with the maximum IOU wins.
* We take those 2 anchor boxes, slice the output, and apply the activation functions for objectness and the 10-class digit classification. The coordinates are kept as they are, since the output of the final layer is already linear.
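
The matching steps above can be sketched in plain Python. The sketch assumes boxes are given as `(center_x, center_y, width, height)` in pixels of a 100x100 image, and the anchor sizes in `ANCHORS` are made up for illustration. Since an anchor is centered on the same cell as the ground truth box, the IOU here compares only widths and heights, which is a common simplification rather than something the notebook prescribes.

```python
def center_cell(cx, cy, image_size=100, grid_size=6):
    """Return (row, col) of the grid cell that contains the box center."""
    cell = image_size / grid_size
    col = min(int(cx // cell), grid_size - 1)
    row = min(int(cy // cell), grid_size - 1)
    return row, col


def shape_iou(box_wh, anchor_wh):
    """IOU of two boxes sharing the same center, so only width/height matter."""
    inter = min(box_wh[0], anchor_wh[0]) * min(box_wh[1], anchor_wh[1])
    union = box_wh[0] * box_wh[1] + anchor_wh[0] * anchor_wh[1] - inter
    return inter / union


def match_anchor(box, anchors):
    """Return (row, col, anchor_index) of the best matching anchor box."""
    cx, cy, w, h = box
    row, col = center_cell(cx, cy)
    ious = [shape_iou((w, h), anchor) for anchor in anchors]
    best = max(range(len(anchors)), key=lambda i: ious[i])
    return row, col, best


# Hypothetical anchor sizes (width, height) in pixels
ANCHORS = [(14, 14), (28, 28), (56, 56)]

# Two ground truth digits as (center_x, center_y, width, height)
ground_truth = [(20.0, 30.0, 28.0, 28.0), (75.0, 60.0, 50.0, 52.0)]

for box in ground_truth:
    print(match_anchor(box, ANCHORS))
# (1, 1, 1)
# (3, 4, 2)
```

Each returned triple identifies the 15-value slice of the 6x6x45 output that the loss for that digit would be computed against.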

In [87]:


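def calculate_model_loss(y_true, y_pred):
    # Placeholder loss used only to check that the pipeline runs end to end.
    # It prints the inputs and returns the sum of the raw predictions,
    # so the reported loss value is not meaningful yet.
    total_loss = 0
    print(f"y_true.shape {y_true.shape}")
    print(f"y_pred.shape {y_pred}")
    
    return tf.reduce_sum(y_pred)


def calculate_model_metrics(y_true, y_pred):
    # Placeholder metric: always reports a constant value.
    return 0.9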

In [88]:
# This is temporary code to test the loss function. Do not use it for training.
X_tensor = tf.convert_to_tensor(raw_images.iloc[0:10], dtype=tf.float32)
X_tensor = tf.reshape(X_tensor, shape=(-1, 28, 28, 1))
y_tensor = tf.convert_to_tensor(raw_labels.iloc[0:10], dtype=tf.float32)


raw_dataset = tf.data.Dataset.from_tensor_slices((X_tensor, y_tensor))


def generative_py_function(func, inp, Tout, shape_out):
    # This is the bridge that calls your NumPy code
    y = tf.numpy_function(func, inp, Tout)
    # This is the crucial step: re-apply the shape information
    y[0].set_shape(shape_out[0]) # Set shape for the image
    y[1].set_shape(shape_out[1]) # Set shape for the labels
    return y

# Define the exact output shapes you expect
output_shapes = ([100, 100, 1], [5, 15]) 
# Define the exact output data types you expect
output_types = (tf.float32, tf.float32)


# processed_dataset = raw_dataset.map(lambda X, y: tf.numpy_function(data_generator.generate_training_example, inp=[X, y], Tout=(
#     tf.float32, tf.float32)), num_parallel_calls=15);

# Use the wrapper inside the map
processed_dataset = raw_dataset.map(lambda X, y: generative_py_function(
    data_generator.generate_training_example, 
    inp=[X, y], 
    Tout=output_types, # Pass the dtypes to Tout
    shape_out=output_shapes # Pass the shapes to our new argument
)).batch(batch_size=5)

In [91]:
model.compile(optimizer='adam',
              loss=calculate_model_loss,
              metrics=[calculate_model_metrics])

In [92]:
epochs = 1

model.fit(processed_dataset,epochs=epochs)

y_true.shape (None, 5, 15)
y_pred.shape Tensor("sequential_5_1/conv2d_53_1/BiasAdd:0", shape=(None, 6, 6, 45), dtype=float32)
y_true.shape (None, 5, 15)
y_pred.shape Tensor("sequential_5_1/conv2d_53_1/BiasAdd:0", shape=(None, 6, 6, 45), dtype=float32)



2/2 ━━━━━━━━━━━━━━━━━━━━ 5s 18ms/step - calculate_model_metrics: 0.9000 - loss: -17.2397




<keras.src.callbacks.history.History at 0x7f0fc7009700>