# Deep Learning Pet Classifier: A Comprehensive Guide
## From First Principles to Production-Grade CNNs

---

**Original notebook:** Basic TF 1.x CNN with 2 convolutional layers for cats vs. dogs (40 training images).  
**This version:** A complete, modern TensorFlow 2 / Keras rewrite covering theory, three model tiers, interpretability, and production deployment.

---

## Section 1: Introduction & Learning Objectives

### What You Will Learn

This notebook is a self-contained deep-learning course built around a single, concrete task: **classifying images of cats and dogs**. By the time you reach the final cell you will have:

1. **Understood** how Convolutional Neural Networks (CNNs) see images -- from raw pixels, through learned filters, to high-level features.
2. **Built** three progressively more powerful classifiers and compared them head-to-head.
3. **Interpreted** model decisions with Grad-CAM heat-maps and layer-wise feature-map visualisations.
4. **Prepared** a model for production: SavedModel export, TensorFlow Lite conversion, and quantisation.

### Skill-Level Road Map

| Level | Sections | What You Will Do |
|-------|----------|------------------|
| **Beginner** | 1 -- 6 | Set up the environment, explore the data, build and train a simple CNN. |
| **Intermediate** | 7 | Add Batch Normalisation, learning-rate schedules, callbacks, and deeper architectures. |
| **Advanced** | 8 | Apply transfer learning with MobileNetV2 (feature extraction + fine-tuning). |
| **Architect** | 9 -- 12 | Evaluate models with Grad-CAM, export for mobile, discuss serving, and tackle design challenges. |

### Prerequisites

- Python 3.8+
- Basic familiarity with NumPy and Matplotlib
- Conceptual understanding of what a neural network is (activation functions, backpropagation)
- A GPU runtime is **strongly recommended** (Google Colab provides one for free)

### How to Use This Notebook

Run cells **in order**. Each section builds on the previous one. Markdown cells provide the theory; code cells let you experiment. Exercises at the end encourage you to go further.

---
## Section 2: Environment Setup

We use **TensorFlow 2.x** with the integrated `tf.keras` API throughout this notebook. No legacy `tf.compat.v1` code is needed.

In [None]:
# ============================================================
# 2-A  Core imports
# ============================================================
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import os
import pathlib

# Reproducibility
tf.random.set_seed(42)
np.random.seed(42)

# Version check
print(f"TensorFlow Version : {tf.__version__}")
print(f"NumPy Version      : {np.__version__}")
print(f"GPU available      : {tf.config.list_physical_devices('GPU')}")

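# Mixed-precision (optional speed boost on modern GPUs)
# Uncomment the last line of this cell if you have a GPU with compute capability >= 7.0.
# If you do, give each model's final Dense layer dtype='float32' so the
# sigmoid output and the loss are computed in full precision.
# tf.keras.mixed_precision.set_global_policy('mixed_float16')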

In [None]:
# ============================================================
# 2-B  Install tensorflow_datasets if not already available
# ============================================================
try:
    import tensorflow_datasets as tfds
    print(f"tensorflow_datasets version: {tfds.__version__}")
except ImportError:
    import subprocess, sys
    subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", "tensorflow_datasets"])
    import tensorflow_datasets as tfds
    print(f"tensorflow_datasets version: {tfds.__version__}")

In [None]:
# ============================================================
# 2-C  Global constants used across the notebook
# ============================================================
IMG_SIZE_CUSTOM = 128          # For custom CNN models (Sections 6 & 7)
IMG_SIZE_TRANSFER = 224        # For transfer-learning models (Section 8)
BATCH_SIZE = 32
AUTOTUNE = tf.data.AUTOTUNE
CLASS_NAMES = ["cat", "dog"]
NUM_CLASSES = len(CLASS_NAMES)

---
## Section 3: Understanding CNNs -- Visual Theory

Before writing any model code, let us build intuition for the building blocks of a Convolutional Neural Network.

### 3.1 What Is a Convolution?

A **convolution** slides a small matrix (called a **kernel** or **filter**) across the input image. At every position it computes the element-wise product and sums the results to produce one value in the output **feature map**.

```
Input image (5x5)          Kernel (3x3)         Output feature map (3x3)
+---+---+---+---+---+      +---+---+---+        +----+----+----+
| 1 | 0 | 1 | 0 | 1 |      | 1 | 0 | 1 |        | 5  | 0  | 5  |
+---+---+---+---+---+      +---+---+---+        +----+----+----+
| 0 | 1 | 0 | 1 | 0 |      | 0 | 1 | 0 |        | 0  | 5  | 0  |
+---+---+---+---+---+      +---+---+---+        +----+----+----+
| 1 | 0 | 1 | 0 | 1 |      | 1 | 0 | 1 |        | 5  | 0  | 5  |
+---+---+---+---+---+                            +----+----+----+
| 0 | 1 | 0 | 1 | 0 |
+---+---+---+---+---+
| 1 | 0 | 1 | 0 | 1 |
+---+---+---+---+---+
```

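With **valid** padding (no padding at all) the output shrinks by `kernel_size - 1` along each axis, which is why a 5x5 input and a 3x3 kernel give a 3x3 feature map. With **same** padding we zero-pad the border so that, at stride 1, the output keeps the input's height and width.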
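
To make the sliding-window arithmetic concrete, here is a minimal NumPy sketch applied to a 5x5 checkerboard input and an X-shaped 3x3 kernel. The helper `conv2d_valid` is a hypothetical name introduced for illustration, not a library function. Like Keras' `Conv2D`, it does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=image.dtype)
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)  # element-wise product, then sum
    return out

# 5x5 checkerboard input and a 3x3 X-shaped kernel
image = np.array([[1, 0, 1, 0, 1],
                  [0, 1, 0, 1, 0],
                  [1, 0, 1, 0, 1],
                  [0, 1, 0, 1, 0],
                  [1, 0, 1, 0, 1]])
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])

feature_map = conv2d_valid(image, kernel)
print(feature_map)
# [[5 0 5]
#  [0 5 0]
#  [5 0 5]]

# "same" padding: zero-pad one pixel on every side so the output stays 5x5
padded = np.pad(image, 1, mode="constant")
same_output = conv2d_valid(padded, kernel)
print(same_output.shape)  # (5, 5)
```

The kernel responds strongly (5) wherever the patch matches its own X pattern and not at all (0) where the patch is the inverse pattern, which is the sense in which a filter "detects" a feature.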

### 3.2 Stride

The **stride** controls how far the kernel moves at each step.

- Stride 1: move one pixel at a time (output size ~ input size).
- Stride 2: move two pixels at a time (output size ~ input size / 2).

```
Stride = 1                          Stride = 2
[X][X][X][ ][ ][ ]  -> step 1      [X][X][X][ ][ ][ ]  -> step 1
[ ][X][X][X][ ][ ]  -> step 2      [ ][ ][X][X][X][ ]  -> step 2  (only 2 steps vs 4)
[ ][ ][X][X][X][ ]  -> step 3      (a third step would run past the right edge)
[ ][ ][ ][X][X][X]  -> step 4
```
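
The number of kernel positions along one axis follows the standard formula `floor((n + 2p - k) / s) + 1`. The sketch below evaluates it for a 6-wide row with a 3-wide kernel and for a 128-pixel axis. `conv_output_size` is a hypothetical helper, and the explicit `padding` argument is a simplification: Keras' `padding='same'` chooses the padding for you.

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    """Output length along one axis: floor((n + 2*padding - kernel) / stride) + 1."""
    return (n + 2 * padding - kernel) // stride + 1

# A 6-wide row with a 3-wide kernel and no padding
steps_stride1 = conv_output_size(6, kernel=3, stride=1)
steps_stride2 = conv_output_size(6, kernel=3, stride=2)
print(steps_stride1, steps_stride2)  # 4 2

# A 128-pixel axis with a 3-wide kernel and one pixel of zero-padding per side
print(conv_output_size(128, kernel=3, stride=1, padding=1))  # 128
print(conv_output_size(128, kernel=3, stride=2, padding=1))  # 64
```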

### 3.3 Pooling

**Max pooling** (most common) picks the largest value in each local patch, reducing spatial dimensions while preserving the most activated features.

```
Input (4x4)            MaxPool 2x2, stride 2         Output (2x2)
+---+---+---+---+                                    +---+---+
| 1 | 3 | 2 | 1 |      max(1,3,0,2)=3               | 3 | 2 |
+---+---+---+---+      max(2,1,1,2)=2               +---+---+
| 0 | 2 | 1 | 2 |      max(3,1,0,2)=3               | 3 | 4 |
+---+---+---+---+      max(0,4,1,2)=4               +---+---+
| 3 | 1 | 0 | 4 |
+---+---+---+---+
| 0 | 2 | 1 | 2 |
+---+---+---+---+
```
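
The same 4x4 example can be reproduced in a few lines of NumPy. This is a sketch that assumes a single-channel input with even height and width. `max_pool_2x2` is a hypothetical helper, not the Keras layer.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on a 2-D array with even height and width."""
    h, w = x.shape
    # Split into (h/2, 2, w/2, 2) blocks, then take the max inside each 2x2 block
    blocks = x.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

x = np.array([[1, 3, 2, 1],
              [0, 2, 1, 2],
              [3, 1, 0, 4],
              [0, 2, 1, 2]])

pooled = max_pool_2x2(x)
print(pooled)
# [[3 2]
#  [3 4]]
```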

### 3.4 Receptive Field

The **receptive field** of a neuron is the region of the original input that influences its value. Deeper layers have larger receptive fields, allowing them to capture more global patterns:

```
Layer 1 neuron  ->  sees a 3x3 patch of the input (edges, textures)
Layer 2 neuron  ->  sees a 5x5 patch of the input (parts of objects)
Layer 3 neuron  ->  sees a 7x7+ patch of the input (whole objects)
```

This hierarchical feature extraction is the key power of CNNs:

```
Input Image  -->  Edges/Textures  -->  Eyes/Ears/Fur  -->  Cat or Dog
  (pixels)        (low-level)          (mid-level)         (high-level)
```
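
The 3x3, 5x5, 7x7 progression comes from a simple recurrence: each layer widens the receptive field by `(kernel - 1)` times the current spacing between neighbouring units, and each stride multiplies that spacing. A minimal sketch follows; `receptive_field` is a hypothetical helper, and layers are described only by `(kernel, stride)` pairs.

```python
def receptive_field(layers):
    """Receptive field of one output unit after a stack of (kernel, stride) layers."""
    rf, jump = 1, 1  # jump = distance in input pixels between adjacent units
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf

conv = (3, 1)  # 3x3 convolution, stride 1
pool = (2, 2)  # 2x2 max pooling, stride 2

print(receptive_field([conv]))                          # 3
print(receptive_field([conv, conv]))                    # 5
print(receptive_field([conv, conv, conv]))              # 7
print(receptive_field([conv, pool, conv, pool, conv]))  # 18
```

Interleaving pooling layers is what makes the receptive field grow quickly: three convolutions alone see a 7x7 patch, while the same three convolutions with two poolings in between see 18x18.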

### 3.5 Typical CNN Architecture

```
INPUT (224x224x3)
  |
  v
[CONV 3x3, 32 filters] -> [BatchNorm] -> [ReLU] -> [MaxPool 2x2]
  |  (112x112x32)
  v
[CONV 3x3, 64 filters] -> [BatchNorm] -> [ReLU] -> [MaxPool 2x2]
  |  (56x56x64)
  v
[CONV 3x3, 128 filters] -> [BatchNorm] -> [ReLU] -> [MaxPool 2x2]
  |  (28x28x128)
  v
[GlobalAveragePooling2D]  -> (128,)
  |
  v
[Dense 128] -> [Dropout 0.5] -> [Dense 1, sigmoid]
  |
  v
OUTPUT: probability(dog)
```
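
The shapes in the diagram, and a rough parameter budget, can be traced by hand. The sketch below assumes 3x3 convolutions with `same` padding and bias terms, a `BatchNormalization` layer with four values per channel, and 2x2 pooling after every block. `trace_cnn` is a hypothetical helper written for this illustration.

```python
def trace_cnn(input_hw, in_channels, filters_per_block):
    """Trace shapes and parameter counts for Conv(3x3, same) + BN + MaxPool(2x2) blocks."""
    hw, channels, total = input_hw, in_channels, 0
    for filters in filters_per_block:
        conv_params = 3 * 3 * channels * filters + filters  # weights + biases
        bn_params = 4 * filters  # gamma, beta, moving mean, moving variance
        hw //= 2                 # "same" conv keeps the size; 2x2 pooling halves it
        channels = filters
        total += conv_params + bn_params
        print(f"block -> {hw}x{hw}x{channels}  (+{conv_params + bn_params:,} params)")
    return hw, channels, total

hw, channels, conv_total = trace_cnn(224, 3, [32, 64, 128])

# GlobalAveragePooling2D leaves one value per channel, so the head stays small
head_params = (channels * 128 + 128) + (128 * 1 + 1)
print(f"feature vector: ({channels},)  head params: {head_params:,}")
print(f"total params  : {conv_total + head_params:,}")
```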

---
## Section 4: Dataset -- Cats vs. Dogs

We use the **cats_vs_dogs** dataset from `tensorflow_datasets`. It contains **23,262 labelled JPEG images** -- orders of magnitude more than the 40-image toy set in the original notebook. A realistic dataset size is essential for meaningful training and evaluation.

### Split strategy

| Split | Percentage | Purpose |
|-------|------------|---------|
| Train | 70 % | Model weight updates |
| Validation | 15 % | Hyperparameter tuning and early stopping |
| Test | 15 % | Final, unbiased evaluation |
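
For a rough idea of what those percentages mean in absolute numbers, the sketch below slices 23,262 examples at the 70 % and 85 % marks. It assumes each boundary is rounded to the nearest index, so treat the result as an estimate and rely on the sizes printed by the loading cell for the real values. `split_sizes` is a hypothetical helper.

```python
def split_sizes(total, boundaries_pct=(70, 85)):
    """Approximate sizes of consecutive percent slices (boundaries rounded to nearest index)."""
    edges = [0] + [round(total * p / 100) for p in boundaries_pct] + [total]
    return [hi - lo for lo, hi in zip(edges, edges[1:])]

TOTAL_IMAGES = 23_262  # dataset size quoted above
train_n, val_n, test_n = split_sizes(TOTAL_IMAGES)
print(f"train={train_n}  val={val_n}  test={test_n}")
```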

In [None]:
# ============================================================
# 4-A  Download and split the dataset
# ============================================================
# tensorflow_datasets provides named splits.
# cats_vs_dogs only ships a single 'train' split, so we
# carve out our own train / val / test partitions.

(raw_train, raw_val, raw_test), ds_info = tfds.load(
    'cats_vs_dogs',
    split=['train[:70%]', 'train[70%:85%]', 'train[85%:]'],
    as_supervised=True,   # returns (image, label) tuples
    with_info=True,
)

print(ds_info.description)
print(f"\nTotal examples : {ds_info.splits['train'].num_examples}")
print(f"Label names    : {ds_info.features['label'].names}")
print(f"Train size     : {raw_train.cardinality().numpy()}")
print(f"Val size       : {raw_val.cardinality().numpy()}")
print(f"Test size      : {raw_test.cardinality().numpy()}")

In [None]:
# ============================================================
# 4-B  Explore sample images
# ============================================================
fig, axes = plt.subplots(3, 6, figsize=(18, 9))
fig.suptitle("Sample Images from the Training Set", fontsize=16, y=1.02)

for i, (image, label) in enumerate(raw_train.take(18)):
    ax = axes[i // 6, i % 6]
    ax.imshow(image.numpy())
    ax.set_title(CLASS_NAMES[label.numpy()], fontsize=12)
    ax.axis('off')

plt.tight_layout()
plt.show()

In [None]:
# ============================================================
# 4-C  Class distribution analysis
# ============================================================
def count_classes(dataset):
    """Count number of samples per class in a tf.data.Dataset."""
    cats, dogs = 0, 0
    for _, label in dataset:
        if label.numpy() == 0:
            cats += 1
        else:
            dogs += 1
    return cats, dogs

train_cats, train_dogs = count_classes(raw_train)
val_cats, val_dogs = count_classes(raw_val)
test_cats, test_dogs = count_classes(raw_test)

print(f"Train -- cats: {train_cats}, dogs: {train_dogs}")
print(f"Val   -- cats: {val_cats},   dogs: {val_dogs}")
print(f"Test  -- cats: {test_cats},  dogs: {test_dogs}")

fig, axes = plt.subplots(1, 3, figsize=(14, 4))
for ax, (title, counts) in zip(axes, [
    ("Train", [train_cats, train_dogs]),
    ("Validation", [val_cats, val_dogs]),
    ("Test", [test_cats, test_dogs]),
]):
    bars = ax.bar(CLASS_NAMES, counts, color=['#4C72B0', '#DD8452'])
    ax.set_title(title, fontsize=13)
    ax.set_ylabel("Count")
    for bar, c in zip(bars, counts):
        ax.text(bar.get_x() + bar.get_width() / 2, bar.get_height() + 20,
                str(c), ha='center', fontsize=11)

plt.suptitle("Class Distribution Across Splits", fontsize=15, y=1.04)
plt.tight_layout()
plt.show()

In [None]:
# ============================================================
# 4-D  Image dimension analysis
# ============================================================
heights, widths = [], []
for image, _ in raw_train.take(500):
    h, w, _ = image.shape
    heights.append(h)
    widths.append(w)

fig, axes = plt.subplots(1, 2, figsize=(12, 4))
axes[0].hist(heights, bins=40, color='steelblue', edgecolor='white')
axes[0].set_title('Height Distribution (first 500 images)')
axes[0].set_xlabel('Pixels')
axes[1].hist(widths, bins=40, color='coral', edgecolor='white')
axes[1].set_title('Width Distribution (first 500 images)')
axes[1].set_xlabel('Pixels')
plt.tight_layout()
plt.show()

print(f"Height -- min: {min(heights)}, max: {max(heights)}, median: {np.median(heights):.0f}")
print(f"Width  -- min: {min(widths)},  max: {max(widths)},  median: {np.median(widths):.0f}")

---
## Section 5: Data Pipeline & Augmentation

### Why does this matter?

A well-built `tf.data` pipeline keeps the GPU fed with data and avoids I/O bottlenecks. **Data augmentation** artificially expands the effective training set, reducing overfitting.

### Pipeline recipe

```
raw images
  -> resize to fixed shape
  -> normalise pixel values to [0, 1]
  -> (training only) random augmentation
  -> batch
  -> prefetch
```

In [None]:
# ============================================================
# 5-A  Preprocessing functions
# ============================================================

def preprocess(image, label, img_size):
    """Resize and normalise an image."""
    image = tf.image.resize(image, [img_size, img_size])
    image = tf.cast(image, tf.float32) / 255.0
    return image, label


def make_dataset(raw_ds, img_size, batch_size, augment=False, shuffle=False, cache=True):
    """Build a performant tf.data pipeline."""
    ds = raw_ds.map(lambda img, lbl: preprocess(img, lbl, img_size),
                    num_parallel_calls=AUTOTUNE)
    if cache:
        ds = ds.cache()
    if shuffle:
        ds = ds.shuffle(buffer_size=1000)
    ds = ds.batch(batch_size)
    if augment:
        # Augmentation is applied *after* batching for efficiency
        data_augmentation = tf.keras.Sequential([
            tf.keras.layers.RandomFlip("horizontal"),
            tf.keras.layers.RandomRotation(0.15),
            tf.keras.layers.RandomZoom(0.15),
            tf.keras.layers.RandomContrast(0.2),
        ])
        ds = ds.map(lambda x, y: (data_augmentation(x, training=True), y),
                    num_parallel_calls=AUTOTUNE)
    ds = ds.prefetch(AUTOTUNE)
    return ds

In [None]:
# ============================================================
# 5-B  Build pipelines for custom CNN models (128x128)
# ============================================================
train_ds_128 = make_dataset(raw_train, IMG_SIZE_CUSTOM, BATCH_SIZE,
                            augment=True, shuffle=True)
val_ds_128   = make_dataset(raw_val,   IMG_SIZE_CUSTOM, BATCH_SIZE)
test_ds_128  = make_dataset(raw_test,  IMG_SIZE_CUSTOM, BATCH_SIZE)

# Quick sanity check
for images, labels in train_ds_128.take(1):
    print(f"Batch shape : {images.shape}")
    print(f"Label batch : {labels.numpy()[:10]}")
    print(f"Pixel range : [{images.numpy().min():.3f}, {images.numpy().max():.3f}]")

In [None]:
# ============================================================
# 5-C  Build pipelines for transfer learning models (224x224)
# ============================================================
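# Caching 224x224 float32 tensors for the whole training split would need
# roughly 10 GB of RAM, so caching is switched off for these pipelines.
train_ds_224 = make_dataset(raw_train, IMG_SIZE_TRANSFER, BATCH_SIZE,
                            augment=True, shuffle=True, cache=False)
val_ds_224   = make_dataset(raw_val,   IMG_SIZE_TRANSFER, BATCH_SIZE, cache=False)
test_ds_224  = make_dataset(raw_test,  IMG_SIZE_TRANSFER, BATCH_SIZE, cache=False)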

for images, labels in train_ds_224.take(1):
    print(f"Batch shape : {images.shape}")

In [None]:
# ============================================================
# 5-D  Visualise augmented samples
# ============================================================
# Take one original image and show it alongside several augmented versions.

augmenter = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.15),
    tf.keras.layers.RandomZoom(0.15),
    tf.keras.layers.RandomContrast(0.2),
])

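# Grab a single un-augmented image. train_ds_128 already applies random
# augmentation, so we preprocess one raw training image directly instead.
for image, label in raw_train.take(1):
    original, _ = preprocess(image, label, IMG_SIZE_CUSTOM)  # shape (128, 128, 3)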

fig, axes = plt.subplots(2, 5, figsize=(16, 7))
axes[0, 0].imshow(original.numpy())
axes[0, 0].set_title("Original", fontsize=12, fontweight='bold')
axes[0, 0].axis('off')

for i in range(1, 10):
    ax = axes[i // 5, i % 5]
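    augmented = augmenter(tf.expand_dims(original, 0), training=True)
    # RandomContrast can push values slightly outside [0, 1]; clip before display
    ax.imshow(tf.clip_by_value(tf.squeeze(augmented), 0.0, 1.0).numpy())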
    ax.set_title(f"Aug {i}", fontsize=11)
    ax.axis('off')

plt.suptitle("Data Augmentation Examples", fontsize=15, y=1.02)
plt.tight_layout()
plt.show()

---
## Section 6: Model 1 -- Basic CNN (Beginner Level)

Our first model is intentionally simple: three convolutional blocks followed by a dense classifier. This mirrors the architecture style from the original notebook but uses the modern `tf.keras.Sequential` API.

### Architecture

```
Conv2D(32, 3x3) -> ReLU -> MaxPool(2x2)
Conv2D(64, 3x3) -> ReLU -> MaxPool(2x2)
Conv2D(128, 3x3) -> ReLU -> MaxPool(2x2)
Flatten -> Dense(128) -> ReLU -> Dropout(0.5) -> Dense(1, sigmoid)
```

In [None]:
# ============================================================
# 6-A  Define Model 1
# ============================================================

def build_basic_cnn(input_shape=(IMG_SIZE_CUSTOM, IMG_SIZE_CUSTOM, 3)):
    """A straightforward 3-block CNN for binary classification."""
    model = tf.keras.Sequential([
        # Explicit Input object (preferred over passing input_shape to the first layer)
        tf.keras.Input(shape=input_shape),

        # Block 1
        tf.keras.layers.Conv2D(32, (3, 3), activation='relu', padding='same'),
        tf.keras.layers.MaxPooling2D((2, 2)),

        # Block 2
        tf.keras.layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
        tf.keras.layers.MaxPooling2D((2, 2)),

        # Block 3
        tf.keras.layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
        tf.keras.layers.MaxPooling2D((2, 2)),

        # Classifier head
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(1, activation='sigmoid'),
    ], name='basic_cnn')
    return model

model1 = build_basic_cnn()
model1.summary()

In [None]:
# ============================================================
# 6-B  Compile Model 1
# ============================================================
model1.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss='binary_crossentropy',
    metrics=['accuracy'],
)

In [None]:
# ============================================================
# 6-C  Train Model 1
# ============================================================
EPOCHS_M1 = 15

history1 = model1.fit(
    train_ds_128,
    validation_data=val_ds_128,
    epochs=EPOCHS_M1,
)

print(f"\nFinal train accuracy : {history1.history['accuracy'][-1]:.4f}")
print(f"Final val accuracy   : {history1.history['val_accuracy'][-1]:.4f}")

In [None]:
# ============================================================
# 6-D  Plot training curves -- reusable helper
# ============================================================

def plot_training_history(history, title="Training History"):
    """Plot accuracy and loss curves for training and validation."""
    fig, axes = plt.subplots(1, 2, figsize=(14, 5))

    # Accuracy
    axes[0].plot(history.history['accuracy'], label='Train', linewidth=2)
    axes[0].plot(history.history['val_accuracy'], label='Validation', linewidth=2)
    axes[0].set_title('Accuracy')
    axes[0].set_xlabel('Epoch')
    axes[0].set_ylabel('Accuracy')
    axes[0].legend()
    axes[0].grid(True, alpha=0.3)

    # Loss
    axes[1].plot(history.history['loss'], label='Train', linewidth=2)
    axes[1].plot(history.history['val_loss'], label='Validation', linewidth=2)
    axes[1].set_title('Loss')
    axes[1].set_xlabel('Epoch')
    axes[1].set_ylabel('Loss')
    axes[1].legend()
    axes[1].grid(True, alpha=0.3)

    fig.suptitle(title, fontsize=15, y=1.02)
    plt.tight_layout()
    plt.show()


plot_training_history(history1, title="Model 1 -- Basic CNN")

### Observations (Model 1)

With a large `Flatten` + `Dense` head and no normalisation, this model is likely to overfit within a handful of epochs: training accuracy keeps climbing while validation accuracy plateaus or drops, and validation loss starts to rise. This motivates the enhancements in Section 7.
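
One way to put numbers on that pattern is to read the best validation epoch and the final train/validation gap off the history dictionary. The sketch below uses a hypothetical helper, `summarise_history`, and invented accuracy values purely for illustration. In the notebook you would pass `history1.history` instead.

```python
def summarise_history(history_dict):
    """Report the best validation epoch and the final train/validation accuracy gap."""
    val_acc = history_dict["val_accuracy"]
    train_acc = history_dict["accuracy"]
    best_epoch = max(range(len(val_acc)), key=val_acc.__getitem__)
    return {
        "best_epoch": best_epoch + 1,              # 1-based, as Keras prints it
        "best_val_accuracy": val_acc[best_epoch],
        "final_gap": train_acc[-1] - val_acc[-1],  # a large gap suggests overfitting
    }

# Made-up numbers for illustration only
example = {
    "accuracy":     [0.62, 0.74, 0.83, 0.90, 0.95],
    "val_accuracy": [0.64, 0.72, 0.76, 0.75, 0.74],
}
summary = summarise_history(example)
print(summary)
```

In this invented run, validation accuracy peaks at epoch 3 while training accuracy keeps rising, which is the signature that early stopping is designed to catch.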

---
## Section 7: Model 2 -- Enhanced CNN (Intermediate Level)

We improve on Model 1 by adding:

| Technique | Benefit |
|-----------|---------|
| **Batch Normalisation** | Stabilises and accelerates training by normalising layer inputs |
| **More convolutional blocks** | Increases model capacity and receptive field |
| **GlobalAveragePooling2D** | Replaces Flatten -- dramatically reduces parameters and overfitting |
| **Learning-rate schedule** | Reduces LR when validation loss plateaus |
| **EarlyStopping** | Halts training automatically when the model stops improving |
| **ModelCheckpoint** | Saves the best weights to disk |
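
To see why swapping `Flatten` for `GlobalAveragePooling2D` matters, compare the first dense layer of each head. The arithmetic below assumes Model 1's 16x16x128 feature maps (a 128x128 input after three 2x2 poolings) and the 256-channel final block of the enhanced model defined in this section. `dense_params` is a hypothetical helper.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

# Model 1: Flatten over 16x16x128 feature maps, then Dense(128)
flatten_features = 16 * 16 * 128
flatten_head = dense_params(flatten_features, 128)

# Model 2: GlobalAveragePooling2D over 256 channels, then Dense(256)
gap_features = 256
gap_head = dense_params(gap_features, 256)

print(f"Flatten -> Dense(128): {flatten_head:,} parameters")  # 4,194,432
print(f"GAP     -> Dense(256): {gap_head:,} parameters")      # 65,792
print(f"ratio: {flatten_head / gap_head:.0f}x")
```

Even though the second head is wider, it needs roughly 64 times fewer parameters, which leaves far less room to memorise the training set.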

In [None]:
# ============================================================
# 7-A  Define Model 2
# ============================================================

def build_enhanced_cnn(input_shape=(IMG_SIZE_CUSTOM, IMG_SIZE_CUSTOM, 3)):
    """A deeper CNN with BatchNorm, GlobalAveragePooling, and Dropout."""
    model = tf.keras.Sequential([
        # Explicit Input object (preferred over passing input_shape to the first layer)
        tf.keras.Input(shape=input_shape),

        # Block 1
        tf.keras.layers.Conv2D(32, (3, 3), padding='same'),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Activation('relu'),
        tf.keras.layers.Conv2D(32, (3, 3), padding='same'),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Activation('relu'),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Dropout(0.25),

        # Block 2
        tf.keras.layers.Conv2D(64, (3, 3), padding='same'),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Activation('relu'),
        tf.keras.layers.Conv2D(64, (3, 3), padding='same'),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Activation('relu'),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Dropout(0.25),

        # Block 3
        tf.keras.layers.Conv2D(128, (3, 3), padding='same'),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Activation('relu'),
        tf.keras.layers.Conv2D(128, (3, 3), padding='same'),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Activation('relu'),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Dropout(0.25),

        # Block 4
        tf.keras.layers.Conv2D(256, (3, 3), padding='same'),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Activation('relu'),
        tf.keras.layers.Conv2D(256, (3, 3), padding='same'),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Activation('relu'),
        tf.keras.layers.GlobalAveragePooling2D(),

        # Classifier head
        tf.keras.layers.Dense(256, activation='relu'),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(1, activation='sigmoid'),
    ], name='enhanced_cnn')
    return model

model2 = build_enhanced_cnn()
model2.summary()

In [None]:
# ============================================================
# 7-B  Compile Model 2 with callbacks
# ============================================================
model2.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss='binary_crossentropy',
    metrics=['accuracy'],
)

callbacks_m2 = [
    # Reduce LR when val_loss stops improving
    tf.keras.callbacks.ReduceLROnPlateau(
        monitor='val_loss', factor=0.5, patience=3, min_lr=1e-6, verbose=1),

    # Stop training when no improvement for 7 epochs
    tf.keras.callbacks.EarlyStopping(
        monitor='val_loss', patience=7, restore_best_weights=True, verbose=1),

    # Save the full model (architecture + weights) whenever val_loss improves
    tf.keras.callbacks.ModelCheckpoint(
        'best_model2.keras', monitor='val_loss',
        save_best_only=True, verbose=1),
]

In [None]:
# ============================================================
# 7-C  Train Model 2
# ============================================================
EPOCHS_M2 = 30

history2 = model2.fit(
    train_ds_128,
    validation_data=val_ds_128,
    epochs=EPOCHS_M2,
    callbacks=callbacks_m2,
)

print(f"\nBest val accuracy achieved: {max(history2.history['val_accuracy']):.4f}")

In [None]:
# ============================================================
# 7-D  Plot training curves for Model 2
# ============================================================
plot_training_history(history2, title="Model 2 -- Enhanced CNN")

In [None]:
# ============================================================
# 7-E  Compare Model 1 vs Model 2 on the validation set
# ============================================================
m1_val_loss, m1_val_acc = model1.evaluate(val_ds_128, verbose=0)
m2_val_loss, m2_val_acc = model2.evaluate(val_ds_128, verbose=0)

print(f"Model 1 (Basic CNN)    -- Val Loss: {m1_val_loss:.4f}  Val Acc: {m1_val_acc:.4f}")
print(f"Model 2 (Enhanced CNN) -- Val Loss: {m2_val_loss:.4f}  Val Acc: {m2_val_acc:.4f}")

fig, ax = plt.subplots(figsize=(6, 4))
bars = ax.bar(['Model 1\n(Basic CNN)', 'Model 2\n(Enhanced CNN)'],
              [m1_val_acc, m2_val_acc],
              color=['#4C72B0', '#55A868'], edgecolor='white', width=0.5)
for bar, acc in zip(bars, [m1_val_acc, m2_val_acc]):
    ax.text(bar.get_x() + bar.get_width() / 2, bar.get_height() + 0.005,
            f"{acc:.3f}", ha='center', fontsize=13, fontweight='bold')
ax.set_ylim(0, 1.05)
ax.set_ylabel('Validation Accuracy')
ax.set_title('Model Comparison (so far)', fontsize=14)
ax.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()

---
## Section 8: Model 3 -- Transfer Learning with MobileNetV2 (Advanced Level)

Training a CNN from scratch requires a large dataset and significant compute. **Transfer learning** lets us leverage a model that has already been trained on millions of images (ImageNet) and re-use its learned feature representations.

### Strategy

1. **Feature extraction** -- Freeze the entire MobileNetV2 base and only train the new classification head.
2. **Fine-tuning** -- Unfreeze the top layers of MobileNetV2 and train them with a very small learning rate.

We use **MobileNetV2** because it is:
- Lightweight (3.4 M parameters vs 138 M for VGG-16)
- Fast at inference (designed for mobile devices)
- Still very accurate on ImageNet
- The natural choice if we later want to deploy on mobile (Section 11)
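
One practical detail when reusing ImageNet weights: the pretrained MobileNetV2 expects inputs in [-1, 1]. `mobilenet_v2.preprocess_input` gets there by computing `x / 127.5 - 1`, which assumes raw pixels in [0, 255]. The pipelines from Section 5 already scale images to [0, 1], so the matching transform for them is `2x - 1`. The plain-Python sketch below compares the two mappings on a few sample values.

```python
pixels_01 = [0.0, 0.5, 1.0]  # values as produced by the Section 5 pipeline

# Formula used by mobilenet_v2.preprocess_input (intended for [0, 255] input)
as_if_0_255 = [x / 127.5 - 1.0 for x in pixels_01]

# Equivalent mapping for inputs that are already in [0, 1]
from_0_1 = [2.0 * x - 1.0 for x in pixels_01]

print(as_if_0_255)  # every value stays below -0.99
print(from_0_1)     # [-1.0, 0.0, 1.0]
```

Feeding [0, 1] data through the [0, 255] formula collapses every image into a sliver of the expected range, so the frozen base sees almost constant input.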

In [None]:
# ============================================================
# 8-A  Load MobileNetV2 as the base model
# ============================================================
base_model = tf.keras.applications.MobileNetV2(
    input_shape=(IMG_SIZE_TRANSFER, IMG_SIZE_TRANSFER, 3),
    include_top=False,          # discard the ImageNet classifier head
    weights='imagenet',
)

# Freeze all base model layers
base_model.trainable = False

print(f"Base model layers      : {len(base_model.layers)}")
print(f"Trainable variables    : {len(base_model.trainable_variables)}")
print(f"Non-trainable variables: {len(base_model.non_trainable_variables)}")

In [None]:
# ============================================================
# 8-B  Build Model 3 (feature extraction head)
# ============================================================

def build_transfer_model(base):
    """Add a classification head on top of a frozen base model."""
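    # MobileNetV2 expects pixels in [-1, 1]. Our pipeline already scales images
    # to [0, 1], so we map them with 2x - 1. (mobilenet_v2.preprocess_input
    # assumes raw [0, 255] input and would squash [0, 1] data into ~[-1, -0.99].)
    rescale = tf.keras.layers.Rescaling(scale=2.0, offset=-1.0)

    inputs = tf.keras.Input(shape=(IMG_SIZE_TRANSFER, IMG_SIZE_TRANSFER, 3))
    x = rescale(inputs)                 # rescale from [0,1] to [-1,1]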
    x = base(x, training=False)         # keep BN layers in inference mode
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    x = tf.keras.layers.Dropout(0.3)(x)
    outputs = tf.keras.layers.Dense(1, activation='sigmoid')(x)

    model = tf.keras.Model(inputs, outputs, name='mobilenetv2_transfer')
    return model

model3 = build_transfer_model(base_model)
model3.summary()

In [None]:
# ============================================================
# 8-C  Phase 1: Feature extraction (frozen base)
# ============================================================
model3.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss='binary_crossentropy',
    metrics=['accuracy'],
)

EPOCHS_FE = 5  # Feature extraction only needs a few epochs

history3_fe = model3.fit(
    train_ds_224,
    validation_data=val_ds_224,
    epochs=EPOCHS_FE,
)

print(f"\nAfter feature extraction -- Val Acc: {history3_fe.history['val_accuracy'][-1]:.4f}")

In [None]:
# ============================================================
# 8-D  Phase 2: Fine-tuning (unfreeze top layers)
# ============================================================
# Unfreeze the top portion of MobileNetV2.
# The model has 154 layers; we unfreeze from layer 120 onward.

base_model.trainable = True

FINE_TUNE_AT = 120
for layer in base_model.layers[:FINE_TUNE_AT]:
    layer.trainable = False

trainable_count = sum(1 for l in base_model.layers if l.trainable)
print(f"Fine-tuning {trainable_count} layers (out of {len(base_model.layers)})")

# Re-compile with a much lower learning rate to avoid catastrophic forgetting
model3.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
    loss='binary_crossentropy',
    metrics=['accuracy'],
)

callbacks_m3 = [
    tf.keras.callbacks.ReduceLROnPlateau(
        monitor='val_loss', factor=0.5, patience=2, min_lr=1e-7, verbose=1),
    tf.keras.callbacks.EarlyStopping(
        monitor='val_loss', patience=5, restore_best_weights=True, verbose=1),
    tf.keras.callbacks.ModelCheckpoint(
        'best_model3.keras', monitor='val_loss',
        save_best_only=True, verbose=1),
]

EPOCHS_FT = 15
total_epochs = EPOCHS_FE + EPOCHS_FT

history3_ft = model3.fit(
    train_ds_224,
    validation_data=val_ds_224,
    epochs=total_epochs,
    initial_epoch=EPOCHS_FE,  # continue from where feature extraction left off
    callbacks=callbacks_m3,
)

print(f"\nAfter fine-tuning -- Val Acc: {max(history3_ft.history['val_accuracy']):.4f}")

In [None]:
# ============================================================
# 8-E  Plot combined training history for Model 3
# ============================================================
# Merge the two phases into a single history for plotting.

def merge_histories(h1, h2):
    """Concatenate two Keras history objects."""
    merged = {}
    for key in h1.history:
        merged[key] = h1.history[key] + h2.history[key]
    class MergedHistory:
        pass
    mh = MergedHistory()
    mh.history = merged
    return mh

history3_full = merge_histories(history3_fe, history3_ft)

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

for ax, metric, title in zip(axes, ['accuracy', 'loss'], ['Accuracy', 'Loss']):
    ax.plot(history3_full.history[metric], label='Train', linewidth=2)
    ax.plot(history3_full.history[f'val_{metric}'], label='Validation', linewidth=2)
    # Draw the marker between the last feature-extraction epoch and the first fine-tuning epoch
    ax.axvline(x=EPOCHS_FE - 0.5, color='gray', linestyle='--', label='Fine-tune start')
    ax.set_title(title)
    ax.set_xlabel('Epoch')
    ax.set_ylabel(title)
    ax.legend()
    ax.grid(True, alpha=0.3)

fig.suptitle("Model 3 -- MobileNetV2 Transfer Learning", fontsize=15, y=1.02)
plt.tight_layout()
plt.show()

---
## Section 9: Model Evaluation & Interpretation (Architect Level)

Accuracy alone rarely tells the full story. In this section we:

1. Evaluate all three models on the **held-out test set**.
2. Visualise the **confusion matrix**.
3. Generate a **classification report** (precision, recall, F1).
4. Implement **Grad-CAM** from scratch to see *where* the model is looking.
5. Visualise intermediate **feature maps**.
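
Before leaning on scikit-learn, it helps to see how precision, recall, and F1 fall out of the four confusion-matrix counts. The sketch below is a plain-Python illustration with invented labels. `binary_metrics` is a hypothetical helper, and class 1 (dog) is treated as the positive class.

```python
def binary_metrics(y_true, y_pred):
    """Confusion-matrix counts and derived metrics, with label 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "confusion": [[tn, fp], [fn, tp]],  # rows = true class, columns = predicted class
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Toy labels for illustration only (0 = cat, 1 = dog)
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0]

metrics = binary_metrics(y_true, y_pred)
print(metrics["confusion"])  # [[3, 1], [2, 4]]
print({k: round(v, 3) for k, v in metrics.items() if k != "confusion"})
```

Here accuracy is 0.7, but recall for dogs is only about 0.67 because two of the six dogs were labelled as cats. That is the kind of asymmetry a single accuracy figure hides.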

In [None]:
# ============================================================
# 9-A  Evaluate all models on the test set
# ============================================================
results = {}

for name, model, ds in [
    ('Model 1 (Basic CNN)',      model1, test_ds_128),
    ('Model 2 (Enhanced CNN)',   model2, test_ds_128),
    ('Model 3 (MobileNetV2)',    model3, test_ds_224),
]:
    loss, acc = model.evaluate(ds, verbose=0)
    results[name] = {'loss': loss, 'accuracy': acc}
    print(f"{name:<30s}  Test Loss: {loss:.4f}  Test Acc: {acc:.4f}")

In [None]:
# ============================================================
# 9-B  Model comparison bar chart
# ============================================================
names = list(results.keys())
accs  = [results[n]['accuracy'] for n in names]

fig, ax = plt.subplots(figsize=(8, 5))
colors = ['#4C72B0', '#55A868', '#C44E52']
bars = ax.barh(names, accs, color=colors, edgecolor='white', height=0.5)
for bar, acc in zip(bars, accs):
    ax.text(bar.get_width() + 0.005, bar.get_y() + bar.get_height() / 2,
            f"{acc:.3f}", va='center', fontsize=13, fontweight='bold')
ax.set_xlim(0, 1.08)
ax.set_xlabel('Test Accuracy', fontsize=12)
ax.set_title('All Models -- Test Set Comparison', fontsize=14)
ax.grid(axis='x', alpha=0.3)
plt.tight_layout()
plt.show()

In [None]:
# ============================================================
# 9-C  Confusion matrix & classification report
# ============================================================
from sklearn.metrics import confusion_matrix, classification_report, ConfusionMatrixDisplay

def evaluate_model_detailed(model, dataset, model_name, class_names=CLASS_NAMES):
    """Generate predictions and display confusion matrix + classification report."""
    y_true, y_pred_prob = [], []
    for images, labels in dataset:
        preds = model.predict(images, verbose=0)
        y_pred_prob.extend(preds.flatten())
        y_true.extend(labels.numpy().flatten())

    y_true = np.array(y_true)
    y_pred = (np.array(y_pred_prob) >= 0.5).astype(int)

    # Confusion matrix
    cm = confusion_matrix(y_true, y_pred)
    fig, ax = plt.subplots(figsize=(5, 4))
    disp = ConfusionMatrixDisplay(cm, display_labels=class_names)
    disp.plot(ax=ax, cmap='Blues', values_format='d')
    ax.set_title(f'Confusion Matrix -- {model_name}', fontsize=13)
    plt.tight_layout()
    plt.show()

    # Classification report
    print(f"\nClassification Report -- {model_name}")
    print(classification_report(y_true, y_pred, target_names=class_names))

    return y_true, y_pred, y_pred_prob


# We focus the detailed evaluation on Model 3 (best expected performance)
y_true_m3, y_pred_m3, y_prob_m3 = evaluate_model_detailed(
    model3, test_ds_224, 'Model 3 (MobileNetV2)')

In [None]:
# ============================================================
# 9-D  Confusion matrices for Models 1 & 2 as well
# ============================================================
_ = evaluate_model_detailed(model1, test_ds_128, 'Model 1 (Basic CNN)')
_ = evaluate_model_detailed(model2, test_ds_128, 'Model 2 (Enhanced CNN)')

### 9.1 Grad-CAM: Where Is the Model Looking?

**Gradient-weighted Class Activation Mapping (Grad-CAM)** produces a heat-map highlighting the image regions that most influenced the model's prediction. The algorithm:

1. Forward-pass the image to get the prediction.
2. Compute the gradient of the predicted class score with respect to the feature maps of the last convolutional layer.
3. Global-average-pool these gradients to obtain per-channel importance weights.
4. Compute a weighted combination of the feature maps and apply ReLU.
5. Upscale and overlay the heat-map on the original image.
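
In the notation of the Grad-CAM paper, steps 2 to 4 can be written compactly. Here $A^k$ denotes the $k$-th feature map of the chosen convolutional layer (spatial size $H \times W$) and $y^c$ the score for class $c$. In this notebook's binary setting there is a single output, so $c$ is simply index 0.

$$
\alpha_k^c = \frac{1}{H W} \sum_{i=1}^{H} \sum_{j=1}^{W} \frac{\partial y^c}{\partial A^k_{ij}},
\qquad
L^c_{\text{Grad-CAM}} = \mathrm{ReLU}\!\left( \sum_k \alpha_k^c \, A^k \right)
$$

The weights $\alpha_k^c$ are the global-average-pooled gradients from step 3. The ReLU keeps only the regions that push the class score up.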

Reference: Selvaraju et al., "Grad-CAM: Visual Explanations from Deep Networks" (ICCV 2017).

In [None]:
# ============================================================
# 9-E  Grad-CAM implementation
# ============================================================

def make_gradcam_heatmap(img_array, model, last_conv_layer_name, pred_index=None):
    """
    Generate a Grad-CAM heatmap for a given image and model.

    Parameters
    ----------
    img_array : tf.Tensor
        Preprocessed image tensor of shape (1, H, W, 3).
    model : tf.keras.Model
        The trained Keras model.
    last_conv_layer_name : str
        Name of the last convolutional layer to compute Grad-CAM for.
    pred_index : int or None
        Class index.  For binary (sigmoid) output, use 0.

    Returns
    -------
    heatmap : np.ndarray
        2-D array of values in [0, 1].
    """
    # Build a sub-model that outputs both the conv layer output and the final prediction
    grad_model = tf.keras.Model(
        inputs=model.inputs,
        outputs=[model.get_layer(last_conv_layer_name).output, model.output],
    )

    with tf.GradientTape() as tape:
        conv_outputs, predictions = grad_model(img_array)
        if pred_index is None:
            pred_index = 0  # binary classification
        class_channel = predictions[:, pred_index]

    # Gradient of the predicted class w.r.t. the conv layer output
    grads = tape.gradient(class_channel, conv_outputs)

    # Global average pooling of gradients -> channel importance weights
    pooled_grads = tf.reduce_mean(grads, axis=(0, 1, 2))

    # Weighted combination of feature maps
    conv_outputs = conv_outputs[0]
    heatmap = conv_outputs @ pooled_grads[..., tf.newaxis]
    heatmap = tf.squeeze(heatmap)

    # Apply ReLU first, then normalise the remaining positive values to [0, 1]
    heatmap = tf.maximum(heatmap, 0)
    heatmap = heatmap / (tf.math.reduce_max(heatmap) + 1e-8)
    return heatmap.numpy()


def display_gradcam(img, heatmap, alpha=0.4):
    """
    Overlay the Grad-CAM heatmap on the original image.
    """
    # Resize heatmap to match image
    heatmap_resized = tf.image.resize(
        heatmap[..., tf.newaxis], (img.shape[0], img.shape[1])
    ).numpy().squeeze()

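    # Apply a colourmap (looked up via pyplot so the name `cm` stays free
    # for the confusion-matrix variable used in 9-C)
    heatmap_colour = plt.get_cmap('jet')(heatmap_resized)[:, :, :3]  # drop alpha channel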

    # Superimpose
    superimposed = heatmap_colour * alpha + img * (1 - alpha)
    superimposed = np.clip(superimposed, 0, 1)
    return superimposed

In [None]:
# ============================================================
# 9-F  Apply Grad-CAM to Model 3 (MobileNetV2)
# ============================================================

# Find the last convolutional layer in the base model
last_conv_layer = None
for layer in reversed(base_model.layers):
    if isinstance(layer, tf.keras.layers.Conv2D):
        last_conv_layer = layer.name
        break
print(f"Last conv layer for Grad-CAM: {last_conv_layer}")

# model3 wraps base_model as a single nested layer, so the inner conv output
# cannot be reached from model3.inputs (Keras typically reports a
# "Graph disconnected" error). We therefore run the forward pass in stages.
# This assumes model3 is a linear stack:
#   [optional pre-layers] -> base_model -> [head layers]
conv_model = tf.keras.Model(
    inputs=base_model.inputs,
    outputs=[base_model.get_layer(last_conv_layer).output, base_model.output],
)
base_idx = [layer.name for layer in model3.layers].index(base_model.name)
pre_layers = [layer for layer in model3.layers[:base_idx]
              if not isinstance(layer, tf.keras.layers.InputLayer)]
head_layers = model3.layers[base_idx + 1:]


def gradcam_forward(x):
    """Return (last-conv feature maps, prediction) for a batch of images."""
    for layer in pre_layers:
        x = layer(x, training=False)
    conv_out, x = conv_model(x, training=False)
    for layer in head_layers:
        x = layer(x, training=False)
    return conv_out, x


# Grab a batch from the test set
for test_images, test_labels in test_ds_224.take(1):
    break

fig, axes = plt.subplots(3, 4, figsize=(16, 12))
fig.suptitle("Grad-CAM Visualisations (Model 3 -- MobileNetV2)", fontsize=16, y=1.02)

for i in range(min(6, test_images.shape[0])):
    img = test_images[i]
    label = int(test_labels[i].numpy())
    img_batch = tf.expand_dims(img, 0)

    with tf.GradientTape() as tape:
        # Watching the input keeps conv_out on the tape even if early layers are frozen
        tape.watch(img_batch)
        conv_out, pred = gradcam_forward(img_batch)
        class_channel = pred[:, 0]

    grads = tape.gradient(class_channel, conv_out)
    pooled = tf.reduce_mean(grads, axis=(0, 1, 2))
    heatmap = tf.squeeze(conv_out[0] @ pooled[..., tf.newaxis])
    heatmap = tf.maximum(heatmap, 0)
    heatmap = (heatmap / (tf.math.reduce_max(heatmap) + 1e-8)).numpy()

    superimposed = display_gradcam(img.numpy(), heatmap)

    pred_val = pred.numpy()[0, 0]
    pred_label = CLASS_NAMES[int(pred_val >= 0.5)]
    true_label = CLASS_NAMES[label]

    row = i // 2
    col = (i % 2) * 2

    axes[row, col].imshow(img.numpy())
    axes[row, col].set_title(f"True: {true_label}", fontsize=11)
    axes[row, col].axis('off')

    axes[row, col + 1].imshow(superimposed)
    axes[row, col + 1].set_title(f"Pred: {pred_label} ({pred_val:.2f})", fontsize=11)
    axes[row, col + 1].axis('off')

plt.tight_layout()
plt.show()

In [None]:
# ============================================================
# 9-G  Feature map visualisation (Model 1 -- Basic CNN)
# ============================================================
# Show what the first few convolutional layers "see".

# Pick a sample image
for sample_images, sample_labels in test_ds_128.take(1):
    sample_img = sample_images[0:1]  # keep batch dim
    sample_label = sample_labels[0].numpy()
    break

# Build sub-models that output each conv layer's activations
# Select by layer type rather than by name, so custom layer names still match
conv_layer_names = [l.name for l in model1.layers
                    if isinstance(l, tf.keras.layers.Conv2D)]
print(f"Conv layers in Model 1: {conv_layer_names}")

fig_cols = 8
for layer_name in conv_layer_names:
    sub_model = tf.keras.Model(inputs=model1.inputs,
                                outputs=model1.get_layer(layer_name).output)
    feature_maps = sub_model.predict(sample_img, verbose=0)
    n_filters = feature_maps.shape[-1]
    n_show = min(n_filters, fig_cols * 2)  # show up to 16 filters
    fig_rows = (n_show + fig_cols - 1) // fig_cols

    # squeeze=False always returns a 2-D array of axes, so one flatten() suffices
    fig, axes = plt.subplots(fig_rows, fig_cols, figsize=(16, 2.5 * fig_rows),
                             squeeze=False)
    fig.suptitle(f"Feature Maps -- {layer_name}  (shape: {feature_maps.shape[1:]})",
                 fontsize=13, y=1.02)
    axes = axes.flatten()

    for j in range(n_show):
        axes[j].imshow(feature_maps[0, :, :, j], cmap='viridis')
        axes[j].axis('off')
        axes[j].set_title(f'Filter {j}', fontsize=9)
    # Hide unused axes
    for j in range(n_show, len(axes)):
        axes[j].axis('off')

    plt.tight_layout()
    plt.show()

In [None]:
# ============================================================
# 9-H  Model comparison summary table
# ============================================================

comparison_data = []
for name, model, ds in [
    ('Model 1 (Basic CNN)',      model1, test_ds_128),
    ('Model 2 (Enhanced CNN)',   model2, test_ds_128),
    ('Model 3 (MobileNetV2)',    model3, test_ds_224),
]:
    loss, acc = model.evaluate(ds, verbose=0)
    n_params = model.count_params()
    comparison_data.append({
        'Model': name,
        'Parameters': f"{n_params:,}",
        'Test Accuracy': f"{acc:.4f}",
        'Test Loss': f"{loss:.4f}",
    })

# Print as a formatted table
header = f"{'Model':<30s} {'Parameters':>15s} {'Test Accuracy':>15s} {'Test Loss':>12s}"
print(header)
print('-' * len(header))
for row in comparison_data:
    print(f"{row['Model']:<30s} {row['Parameters']:>15s} {row['Test Accuracy']:>15s} {row['Test Loss']:>12s}")

---
## Section 10: Hyperparameter Tuning

Hyperparameters are the knobs you turn *before* training starts. Finding good settings can make the difference between a mediocre model and a great one.

### 10.1 Learning Rate -- The Single Most Important Hyperparameter

A **learning rate finder** trains for a few hundred steps while gradually increasing the learning rate from a very small value to a very large one, recording the loss at each step. The ideal learning rate sits in the steepest-descent region of the resulting curve -- typically one order of magnitude below the point where the loss starts exploding.

```
Loss
 |
 |-----.                 /    <- LR too high: loss explodes
 |      \               /
 |       \             /
 |        \           /       <- steepest descent (left slope): pick an LR here
 |         \         /
 |          \_______/         <- minimum: LR is already too aggressive
 +----+----+----+----+---->
   1e-7  1e-5  1e-3  1e-1   Learning Rate (log scale)
```

In [None]:
# ============================================================
# 10-A  Learning rate finder
# ============================================================

def lr_finder(model_fn, train_dataset, start_lr=1e-7, end_lr=1.0, num_steps=200):
    """
    Train a fresh model for `num_steps` steps while exponentially
    increasing the learning rate from `start_lr` to `end_lr`.
    Returns the learning rates and corresponding losses.
    """
    model = model_fn()
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=start_lr),
        loss='binary_crossentropy',
    )

    # Multiplicative factor so the LR reaches `end_lr` on the final step
    factor = (end_lr / start_lr) ** (1.0 / max(num_steps - 1, 1))

    lrs, losses = [], []
    step = 0
    best_loss = float('inf')

    for images, labels in train_dataset.repeat():
        if step >= num_steps:
            break

        # Set the learning rate *before* the update so that each recorded
        # (lr, loss) pair describes the same training step.
        current_lr = start_lr * (factor ** step)
        model.optimizer.learning_rate.assign(current_lr)

        with tf.GradientTape() as tape:
            preds = model(images, training=True)
            loss = tf.keras.losses.binary_crossentropy(labels, tf.squeeze(preds, axis=-1))
            loss = tf.reduce_mean(loss)

        grads = tape.gradient(loss, model.trainable_variables)
        model.optimizer.apply_gradients(zip(grads, model.trainable_variables))

        loss_value = float(loss.numpy())
        lrs.append(current_lr)
        losses.append(loss_value)

        # Stop if the loss diverges (NaN or > 4x the best seen so far)
        if np.isnan(loss_value) or loss_value > best_loss * 4:
            break
        best_loss = min(best_loss, loss_value)

        step += 1

    return lrs, losses


# Run the finder on a fresh Basic CNN
lrs, losses = lr_finder(build_basic_cnn, train_ds_128, num_steps=300)

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(lrs, losses, linewidth=2)
ax.set_xscale('log')
ax.set_xlabel('Learning Rate (log scale)', fontsize=12)
ax.set_ylabel('Loss', fontsize=12)
ax.set_title('Learning Rate Finder', fontsize=14)
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print("Tip: Pick a learning rate from the steepest downward slope, "
      "typically 1/10 of the LR where loss begins rising.")

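Reading the steepest slope off the plot by eye can be replaced by a small heuristic. The sketch below smooths the recorded losses with an exponential moving average and returns the learning rate at which the smoothed loss falls fastest per decade of learning rate. The helper `suggest_lr`, its `smooth` parameter, and the synthetic demo curve are illustrative assumptions, not part of the notebook or of any library. It expects strictly increasing learning rates, as produced by `lr_finder`.

```python
import math


def suggest_lr(lrs, losses, smooth=0.9):
    """Return the LR at the steepest descent of a smoothed loss curve."""
    # Exponential moving average with bias correction to tame batch noise.
    smoothed, avg = [], 0.0
    for i, loss in enumerate(losses, start=1):
        avg = smooth * avg + (1 - smooth) * loss
        smoothed.append(avg / (1 - smooth ** i))

    # Most negative slope of the smoothed loss with respect to log10(lr).
    best_idx, best_slope = None, 0.0
    for i in range(1, len(lrs)):
        d_log_lr = math.log10(lrs[i]) - math.log10(lrs[i - 1])
        slope = (smoothed[i] - smoothed[i - 1]) / d_log_lr
        if slope < best_slope:
            best_idx, best_slope = i, slope
    return None if best_idx is None else lrs[best_idx]


# Synthetic curve (not real training output): flat, falling, flat, exploding.
demo_lrs = [1e-7 * 10 ** (i / 10) for i in range(70)]
demo_losses = ([0.7] * 20
               + [0.7 - 0.02 * k for k in range(1, 21)]
               + [0.3] * 10
               + [0.3 + 0.1 * k for k in range(1, 21)])
demo_suggestion = suggest_lr(demo_lrs, demo_losses)
print(f"Suggested LR: {demo_suggestion:.1e}")
```

In the notebook this could be called as `suggest_lr(lrs, losses)` on the finder's output. Because the smoothing lags behind the raw curve, treat the result as a starting point and check it against the plot.
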
### 10.2 Batch Size Impact

| Batch Size | Pros | Cons |
|------------|------|------|
| Small (8--16) | Better generalisation, acts as regulariser | Slower training, noisier gradients |
| Medium (32--64) | Good balance of speed and generalisation | -- |
| Large (128--512) | Faster per-epoch time on GPUs | May generalise worse, requires LR scaling |

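**Rule of thumb:** When you double the batch size, double the learning rate (the "linear scaling rule"). The rule was established for SGD with a warm-up phase (Goyal et al., 2017); with adaptive optimisers such as Adam, treat it as a starting point and re-check with the learning rate finder.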
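
A small sketch of the arithmetic behind the rule of thumb. The function name `scale_learning_rate` and the baseline values are hypothetical. The `"sqrt"` option is an alternative that is sometimes suggested for adaptive optimisers and is included only for comparison.

```python
import math


def scale_learning_rate(base_lr, base_batch_size, new_batch_size, rule="linear"):
    """Rescale a tuned learning rate for a different batch size."""
    ratio = new_batch_size / base_batch_size
    if rule == "linear":
        return base_lr * ratio
    if rule == "sqrt":
        return base_lr * math.sqrt(ratio)
    raise ValueError(f"unknown rule: {rule!r}")


# Hypothetical baseline: lr=1e-3 was tuned at batch size 32.
for batch_size in (32, 64, 128, 256):
    linear = scale_learning_rate(1e-3, 32, batch_size)
    sqrt = scale_learning_rate(1e-3, 32, batch_size, rule="sqrt")
    print(f"batch={batch_size:<4d} linear={linear:.2e}  sqrt={sqrt:.2e}")
```

Either value is only an initial guess; a short run of the learning rate finder at the new batch size is the safer check.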

### 10.3 Architecture Search (Discussion)

Beyond manual tuning, systematic approaches include:

- **Grid search**: exhaustive but expensive.
- **Random search**: surprisingly effective (Bergstra & Bengio, 2012).
- **Bayesian optimisation**: models the objective function to pick the next trial intelligently (e.g., Keras Tuner with `BayesianOptimization`).
- **Neural Architecture Search (NAS)**: uses RL, evolutionary, or gradient-based methods to design architectures automatically (e.g., the EfficientNet-B0 baseline was found via NAS and then scaled up to the larger variants).
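
To make the random-search idea concrete, here is a minimal standard-library sketch. The search ranges, the function names, and above all `placeholder_objective` are assumptions for illustration. In a real run the objective would train a model briefly with the sampled configuration and return its validation loss.

```python
import math
import random


def sample_config(rng):
    """Draw one hyperparameter configuration (ranges are illustrative)."""
    return {
        # Log-uniform: every decade between 1e-5 and 1e-2 is equally likely.
        "learning_rate": 10 ** rng.uniform(-5, -2),
        "batch_size": rng.choice([16, 32, 64, 128]),
        "dropout": rng.uniform(0.1, 0.5),
    }


def placeholder_objective(config):
    """Stand-in for 'train briefly and return the validation loss'."""
    return ((math.log10(config["learning_rate"]) + 3) ** 2
            + (config["dropout"] - 0.3) ** 2)


def random_search(objective, n_trials=20, seed=0):
    """Evaluate n_trials random configurations and return (best_score, best_config)."""
    rng = random.Random(seed)
    trials = [sample_config(rng) for _ in range(n_trials)]
    scored = [(objective(cfg), cfg) for cfg in trials]
    return min(scored, key=lambda pair: pair[0])


best_score, best_config = random_search(placeholder_objective)
print(f"best score: {best_score:.4f}")
print(f"best config: {best_config}")
```

Sampling the learning rate on a log scale matters: a uniform draw between 1e-5 and 1e-2 would spend almost all trials near the top of the range.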

---
## Section 11: Production Considerations (Architect Level)

Training a model is only half the battle. Getting it into users' hands reliably, efficiently, and at scale requires additional engineering.

### 11.1 Model Export Formats

| Format | Use Case |
|--------|----------|
| **SavedModel** | TensorFlow Serving, cloud deployment |
| **TF Lite (.tflite)** | Mobile and edge devices |
| **TF.js** | In-browser inference |
| **ONNX** | Cross-framework interoperability |

In [None]:
# ============================================================
# 11-A  Export as SavedModel
# ============================================================
SAVED_MODEL_DIR = 'pet_classifier_savedmodel'

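# With Keras 2 (TF <= 2.15) this writes a SavedModel directory. Keras 3 requires a
# `.keras` or `.h5` suffix for `save()`; there, use `model3.export(SAVED_MODEL_DIR)`.
model3.save(SAVED_MODEL_DIR)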
print(f"SavedModel exported to: {SAVED_MODEL_DIR}/")

# Verify we can reload it
reloaded = tf.keras.models.load_model(SAVED_MODEL_DIR)
reloaded_loss, reloaded_acc = reloaded.evaluate(test_ds_224, verbose=0)
print(f"Reloaded model -- Test Acc: {reloaded_acc:.4f} (should match original)")

In [None]:
# ============================================================
# 11-B  Convert to TensorFlow Lite
# ============================================================
converter = tf.lite.TFLiteConverter.from_saved_model(SAVED_MODEL_DIR)
tflite_model = converter.convert()

tflite_path = 'pet_classifier.tflite'
with open(tflite_path, 'wb') as f:
    f.write(tflite_model)

tflite_size_mb = os.path.getsize(tflite_path) / (1024 * 1024)
print(f"TFLite model saved to: {tflite_path}")
print(f"TFLite model size    : {tflite_size_mb:.2f} MB")

In [None]:
# ============================================================
# 11-C  Post-training dynamic-range quantisation (8-bit weights)
# ============================================================
# Dynamic range quantisation -- simplest form, no calibration data needed.
# Weights are stored as 8-bit integers; activations remain in float.

converter_quant = tf.lite.TFLiteConverter.from_saved_model(SAVED_MODEL_DIR)
converter_quant.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_quant = converter_quant.convert()

quant_path = 'pet_classifier_quant.tflite'
with open(quant_path, 'wb') as f:
    f.write(tflite_quant)

quant_size_mb = os.path.getsize(quant_path) / (1024 * 1024)
print(f"Quantised model saved to: {quant_path}")
print(f"Quantised model size    : {quant_size_mb:.2f} MB")
print(f"Size reduction          : {(1 - quant_size_mb / tflite_size_mb) * 100:.1f}%")

In [None]:
# ============================================================
# 11-D  TFLite inference benchmark
# ============================================================
import time

def tflite_predict(tflite_path, images):
    """Run inference with a TFLite model on a batch of images."""
    interpreter = tf.lite.Interpreter(model_path=tflite_path)
    interpreter.allocate_tensors()

    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()

    predictions = []
    for img in images:
        img_input = np.expand_dims(img, axis=0).astype(np.float32)
        interpreter.set_tensor(input_details[0]['index'], img_input)
        interpreter.invoke()
        output = interpreter.get_tensor(output_details[0]['index'])
        predictions.append(output[0])

    return np.array(predictions)


# Take a small batch for benchmarking
for bench_images, bench_labels in test_ds_224.take(1):
    break

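# Time the full-precision TFLite model.
# Each timing includes loading the interpreter, so treat the numbers as a
# rough comparison rather than pure inference latency.
start = time.perf_counter()
preds_fp = tflite_predict(tflite_path, bench_images.numpy())
time_fp = time.perf_counter() - start

# Time the quantised TFLite model
start = time.perf_counter()
preds_q = tflite_predict(quant_path, bench_images.numpy())
time_q = time.perf_counter() - start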

print(f"Full-precision TFLite -- {bench_images.shape[0]} images in {time_fp:.3f}s  "
      f"({time_fp / bench_images.shape[0] * 1000:.1f} ms/image)")
print(f"Quantised TFLite      -- {bench_images.shape[0]} images in {time_q:.3f}s  "
      f"({time_q / bench_images.shape[0] * 1000:.1f} ms/image)")

### 11.2 Full-Integer Quantisation (with representative dataset)

For maximum speed on edge TPUs or integer-only hardware, you can perform **full integer quantisation** by providing a representative calibration dataset:

In [None]:
# ============================================================
# 11-E  Full integer quantisation
# ============================================================

def representative_dataset():
    """Yield representative samples for calibration."""
    for images, _ in test_ds_224.take(50):
        for img in images:
            yield [tf.expand_dims(img, 0)]

converter_full = tf.lite.TFLiteConverter.from_saved_model(SAVED_MODEL_DIR)
converter_full.optimizations = [tf.lite.Optimize.DEFAULT]
converter_full.representative_dataset = representative_dataset
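# Restrict conversion to integer-only ops and make the model's input and
# output tensors uint8 (needed for integer-only accelerators).
# To keep float inputs/outputs instead, remove the two inference_*_type lines.
converter_full.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS_INT8
]
converter_full.inference_input_type = tf.uint8
converter_full.inference_output_type = tf.uint8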

try:
    tflite_full_int = converter_full.convert()
    full_int_path = 'pet_classifier_full_int8.tflite'
    with open(full_int_path, 'wb') as f:
        f.write(tflite_full_int)
    print(f"Full INT8 model saved: {full_int_path}  "
          f"({os.path.getsize(full_int_path) / 1024 / 1024:.2f} MB)")
except Exception as e:
    print(f"Full INT8 conversion failed (this is expected on some ops): {e}")
    print("Dynamic-range quantisation (11-C) is still available.")

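To make the role of the calibration data concrete: full-integer quantisation maps each real value to an integer through a scale and a zero point, `real = scale * (q - zero_point)`, and the representative dataset is what lets the converter estimate the value range behind those two numbers. The sketch below shows that mapping in plain Python. The helper names and the observed range of `[-1.0, 3.0]` are assumptions for illustration, not values taken from the converter.

```python
def quant_params(x_min, x_max, q_min=0, q_max=255):
    """Compute (scale, zero_point) mapping [x_min, x_max] onto [q_min, q_max]."""
    # The range must contain 0.0 so that zero is represented exactly.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / (q_max - q_min)
    zero_point = int(round(q_min - x_min / scale))
    return scale, zero_point


def quantize(x, scale, zero_point, q_min=0, q_max=255):
    """Map a real value to its integer code, clamping out-of-range inputs."""
    q = int(round(x / scale)) + zero_point
    return max(q_min, min(q_max, q))


def dequantize(q, scale, zero_point):
    """Recover the (approximate) real value from an integer code."""
    return scale * (q - zero_point)


# Hypothetical calibration result: values observed in [-1.0, 3.0].
scale, zero_point = quant_params(-1.0, 3.0)
for value in (-1.0, 0.0, 0.5, 3.0, 5.0):
    q = quantize(value, scale, zero_point)
    print(f"{value:>5.2f} -> q={q:>3d} -> {dequantize(q, scale, zero_point):.4f}")
```

Values outside the calibrated range (5.0 here) are clamped, which is why the calibration samples should resemble production inputs. When a converted model has integer inputs, the scale and zero point to apply can be read from `interpreter.get_input_details()[0]['quantization']`.
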
### 11.3 Serving Architecture (Discussion)

A production serving stack for an image classification model typically looks like this:

```
                       +------------------+
  Client (mobile/web)  |   API Gateway    |   (rate limiting, auth)
         |             +--------+---------+
         |                      |
         v                      v
  +------+-------+      +-------+--------+
  | TFLite model |      | TF Serving     |   (SavedModel, gRPC/REST)
  | (on device)  |      | (in container) |
  +--------------+      +-------+--------+
                                |
                        +-------+--------+
                        | Model Registry |   (versioning, A/B tests)
                        +----------------+
```

**Key considerations:**

- **Latency budget:** On-device TFLite eliminates network round-trips; server-side allows larger models.
- **Batching:** TF Serving can batch multiple requests together for higher GPU utilisation.
- **Model versioning:** Roll out new models gradually; roll back instantly if metrics degrade.
- **Monitoring:** Track prediction distributions, latency percentiles, and data drift in production.
- **Preprocessing consistency:** The same resize/normalise pipeline must run in both training and serving.
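
For the server-side path, TF Serving's REST API accepts a JSON body with an `instances` list, posted to a `/v1/models/<name>:predict` endpoint. The sketch below only builds the URL and body and sends nothing over the network. The model name `pet_classifier`, the host, and the helper `build_predict_request` are assumptions for illustration; port 8501 is the REST port conventionally exposed by the TF Serving container.

```python
import json


def build_predict_request(image, model_name="pet_classifier",
                          host="localhost", port=8501, version=None):
    """Return (url, body) for a TF Serving REST predict call.

    `image` is a nested list of shape (H, W, 3) that has already been
    resized and normalised exactly as during training.
    """
    path = f"/v1/models/{model_name}"
    if version is not None:
        path += f"/versions/{version}"  # pin a specific model version
    url = f"http://{host}:{port}{path}:predict"
    body = json.dumps({"instances": [image]})  # a batch of one image
    return url, body


# Tiny 2x2 stand-in image; a real request would carry a 224x224x3 array.
dummy_image = [[[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]],
               [[0.5, 0.5, 0.5], [0.25, 0.25, 0.25]]]
url, body = build_predict_request(dummy_image, version=2)
print(url)
print(f"{len(body)} bytes of JSON")
```

Pinning the version in the URL is one simple way to support the gradual roll-outs and rollbacks mentioned above. Omitting it lets the server choose which version to serve.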

---
## Section 12: Exercises & Challenges

### Beginner Exercises

1. **Experiment with Model 1:** Change the number of filters in each convolutional layer (e.g., 16-32-64 instead of 32-64-128). How does this affect accuracy and training speed?

2. **Augmentation ablation:** Remove one augmentation at a time from the pipeline (e.g., disable RandomRotation). Re-train Model 1 and observe the effect on validation accuracy.

3. **Learning rate experiment:** Train Model 1 with three different learning rates: 1e-2, 1e-3, and 1e-4. Plot all three training curves on the same graph. What do you observe?

### Intermediate Exercises

4. **Kernel size comparison:** In Model 2, replace all 3x3 kernels with 5x5 kernels. Compare the parameter count and accuracy. Why do modern architectures prefer 3x3 kernels?

5. **Regularisation showdown:** Add L2 weight regularisation (`kernel_regularizer=tf.keras.regularizers.l2(1e-4)`) to Model 2's convolutional layers. Does it improve generalisation beyond what BatchNorm and Dropout already provide?

6. **Custom learning rate schedule:** Implement a **cosine annealing** schedule using `tf.keras.optimizers.schedules.CosineDecay`. Compare it with ReduceLROnPlateau.

### Advanced Exercises

7. **Different backbones:** Replace MobileNetV2 with **EfficientNetB0** (`tf.keras.applications.EfficientNetB0`). Compare accuracy, parameter count, and inference speed.

8. **Multi-class extension:** Extend the pipeline to classify **cats, dogs, and horses** using a custom dataset or the cat, dog, and horse classes of CIFAR-10. Change the output layer to softmax with 3 units and switch the loss to (sparse) categorical cross-entropy.

9. **Grad-CAM for all models:** Apply the Grad-CAM implementation to Models 1 and 2. How do the attention maps differ between a shallow CNN and a deep pre-trained model?

### Architect Challenges

10. **Serving pipeline:** Deploy Model 3 using TensorFlow Serving in a Docker container. Write a Python client that sends an image and receives a prediction.

11. **Quantisation-aware training (QAT):** Instead of post-training quantisation, use the TensorFlow Model Optimization Toolkit (`tfmot.quantization.keras.quantize_model`) to insert fake-quantisation nodes during training. Compare accuracy with post-training quantisation.

12. **Design a real-world system:** You are asked to build a pet breed classifier (120 breeds) that runs on both a mobile app and a web dashboard. Write a one-page architecture document covering:
    - Data collection and labelling strategy
    - Model selection and training infrastructure
    - On-device vs. server-side inference trade-offs
    - Monitoring, retraining triggers, and A/B testing
    - Privacy considerations (user-uploaded pet photos)

---
## Summary

| Section | Key Takeaway |
|---------|-------------|
| 3. CNN Theory | Convolutions extract hierarchical features: edges -> parts -> objects |
| 4. Dataset | Real-world datasets are large and messy; proper splits prevent data leakage |
| 5. Pipeline | `tf.data` with caching, prefetching, and augmentation keeps GPUs busy |
| 6. Basic CNN | A simple model establishes a baseline but overfits quickly |
| 7. Enhanced CNN | BatchNorm, deeper architecture, and callbacks significantly improve results |
| 8. Transfer Learning | Pre-trained models deliver the best accuracy with minimal training |
| 9. Evaluation | Confusion matrices, classification reports, and Grad-CAM reveal *why* a model succeeds or fails |
| 10. Tuning | The learning rate is usually the most impactful hyperparameter; a learning rate finder narrows it down quickly |
| 11. Production | SavedModel for serving, TFLite for mobile, quantisation for speed |
| 12. Exercises | Progressive challenges reinforce concepts and prepare you for real-world projects |

---

**Congratulations!** You have progressed from a basic 40-image classifier to a production-ready, interpretable deep learning system. The skills you have practised -- data engineering, model design, transfer learning, interpretability, and deployment -- form the core toolkit of a modern ML engineer.