<a href="https://colab.research.google.com/github/sreent/machine-learning/blob/main/Final%20DNN%20Code%20Examples/CatVsDog/CatVsDog%20-%20TFDS%20Color%20Binary%20Image%20Classification%20Example.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# CatVsDog - TFDS Colour Binary Image Classification Example

This notebook demonstrates the **Universal ML Workflow** for binary image classification using TensorFlow Datasets.

## Learning Objectives

By the end of this notebook, you will be able to:
- Load image datasets from TensorFlow Datasets
- Apply binary classification to image data
- Preprocess colour images: Resize → Grayscale → Flatten → Normalise
- Compare binary image classification with multi-class approaches

---

## Technique Scope

| Aspect | What We Use | What We Don't Use (Yet) |
|--------|-------------|------------------------|
| **Architecture** | Dense layers only | CNNs, pooling, feature extractors |
| **Regularisation** | L2 + Dropout | Early stopping, data augmentation |
| **Optimiser** | Adam | SGD with momentum, learning rate schedules |
| **Tuning** | Hyperband | Bayesian optimisation, neural architecture search |

> **Note**: Dense networks applied to flattened images serve as a baseline. CNNs (Chapter 8) are the standard approach for image classification and would significantly improve performance.

---

## Dataset Overview

| Attribute | Description |
|-----------|-------------|
| **Source** | [TensorFlow Datasets - cats_vs_dogs](https://www.tensorflow.org/datasets/catalog/cats_vs_dogs) |
| **Problem Type** | Binary Classification (Cat vs Dog) |
| **Data Balance** | Nearly Balanced (~50:50) |
| **Total Images** | ~23,000 images |
| **Preprocessing** | Resize to 16×16 → Grayscale → Flatten (256 features) |

### Binary Image Classification vs Multi-class

| Aspect | Binary (CatVsDog) | Multi-class (Fashion MNIST) |
|--------|-------------------|----------------------------|
| **Output Layer** | 1 neuron, sigmoid | K neurons, softmax |
| **Loss Function** | binary_crossentropy | categorical_crossentropy |
| **Prediction** | Threshold at 0.5 | argmax of probabilities |
| **Primary Metric** | Accuracy (balanced) | Accuracy |
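
The difference in the **Prediction** row can be made concrete with a small, dependency-free sketch. The helper names (`predict_binary`, `predict_multiclass`) and the example logits are illustrative assumptions, not part of this notebook's pipeline.

```python
import math

def sigmoid(z):
    """Logistic function: maps a real-valued logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits):
    """Convert K logits into K probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_binary(logit, threshold=0.5):
    """Binary head: one sigmoid output, thresholded to 0 (Cat) or 1 (Dog)."""
    return int(sigmoid(logit) > threshold)

def predict_multiclass(logits):
    """Multi-class head: index of the largest softmax probability."""
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__)

binary_label = predict_binary(1.2)                   # positive logit -> class 1
multi_label = predict_multiclass([0.1, 2.0, -1.0])   # largest logit is at index 1
```

A single sigmoid neuron is enough for two classes because the probability of the other class is simply one minus the output.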

---

## Code Reuse Philosophy

This notebook follows a **"Same Code, Different Data"** philosophy. The core ML pipeline remains consistent across different classification tasks:

```
┌─────────────────────────────────────────────────────────────────┐
│                    UNIVERSAL ML PIPELINE                        │
├─────────────────────────────────────────────────────────────────┤
│  Data Loading → Preprocessing → Train/Val/Test Split → Model   │
│  → Baseline → Overfitting → Regularisation → Evaluation        │
└─────────────────────────────────────────────────────────────────┘
```

**What changes:** Data source, preprocessing, output configuration  
**What stays the same:** Model architecture pattern, training loop, evaluation code

---

## 1. Defining the Problem and Assembling a Dataset

**Problem:** Classify images as either cats or dogs - a classic binary image classification task.

**Why this dataset?** Cats vs. Dogs is one of the most well-known binary image classification benchmarks. It's challenging because:
- High intra-class variation (many breeds, poses, backgrounds)
- Requires distinguishing subtle features between two similar-looking animals
- Real-world photographs with varying quality and composition

## 2. Choosing a Measure of Success

### Data-Driven Metric Selection

| Criterion | This Dataset | Decision |
|-----------|--------------|----------|
| **Class Balance** | ~50:50 (Cat:Dog) | Balanced |
| **Imbalance Ratio** | ~1:1 | < 3:1 threshold |
| **Primary Metric** | Accuracy | Standard for balanced data |
| **Secondary Metrics** | Precision, Recall, AUC | Additional insight |

**Decision:** Since the dataset is balanced, **Accuracy** is an appropriate primary metric.
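
To see why class balance matters for this decision, the sketch below scores a trivial "always predict class 0" model on a balanced and on a skewed label set. The label lists are made-up toy data, and the helper names are assumptions used for illustration only.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def imbalance_ratio(labels):
    """Ratio of the largest class count to the smallest class count."""
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return max(counts.values()) / min(counts.values())

balanced = [0] * 50 + [1] * 50   # toy data, 50:50
skewed = [0] * 95 + [1] * 5      # toy data, 95:5

# A model that ignores its input and always predicts class 0
acc_balanced = accuracy(balanced, [0] * len(balanced))
acc_skewed = accuracy(skewed, [0] * len(skewed))
```

On the balanced set the trivial model scores 0.50, so any accuracy well above that reflects real learning. On the skewed set the same model scores 0.95, which is why accuracy alone would be misleading there.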

## 3. Deciding on an Evaluation Protocol

### Data-Driven Protocol Selection

| Criterion | This Dataset | Decision |
|-----------|--------------|----------|
| **Sample Size** | ~23,000 images | Large |
| **Threshold** | > 10,000 | Use Hold-Out |
| **Protocol** | Train/Validation/Test | 80%/10%/10% split |

**Decision:** With ~23,000 samples, **Hold-Out validation** is appropriate.
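
As a rough check of the split sizes, the sketch below computes the hold-out partition for an approximate dataset size. It assumes the test count is rounded up and that the validation set gets the same number of samples as the test set; `holdout_sizes` is a hypothetical helper, and 23,000 is only the approximate size quoted above.

```python
def holdout_sizes(n_total, test_percent=10):
    """Return (train, validation, test) sizes for a hold-out split.

    Assumes the test set takes `test_percent` of all samples (rounded up)
    and the validation set has the same number of samples as the test set.
    """
    n_test = (n_total * test_percent + 99) // 100  # integer ceiling
    n_val = n_test
    n_train = n_total - n_test - n_val
    return n_train, n_val, n_test

# Approximate dataset size; the exact count is printed when the data is loaded
n_train, n_val, n_test = holdout_sizes(23_000)
```

With these assumptions the split is roughly 18,400 / 2,300 / 2,300, which matches the 80%/10%/10% protocol.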

## 4. Preparing Your Data

### 4.1 Import Libraries and Load TFDS Dataset

In [None]:
import os
import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.utils.class_weight import compute_class_weight
from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

from skimage.color import rgb2gray
from skimage.transform import resize

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, regularizers
import tensorflow_datasets as tfds
from tensorflow.keras.models import Sequential  # same Keras namespace as the imports above
from tensorflow.keras.layers import Dense, Dropout

# Keras Tuner for hyperparameter search
%pip install -q -U keras-tuner
import keras_tuner as kt

import matplotlib.pyplot as plt

# ============================================================
# RANDOM SEED - Set once, use everywhere
# ============================================================
SEED = 204

tf.random.set_seed(SEED)
np.random.seed(SEED)

import warnings
warnings.filterwarnings('ignore')

### 4.2 Configuration

In [None]:
# ============================================================
# DATASET CONFIGURATION
# ============================================================
DATASET = 'cats_vs_dogs'
RESIZE = (16, 16)  # Smaller size for dense network
GRAY_SCALE = True  # Convert to grayscale for simplicity

# ============================================================
# CLASS NAMES
# ============================================================
CLASS_NAMES = ['Cat', 'Dog']

### 4.3 Load and Preprocess Data
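
Before running the real pipeline, it may help to see the Grayscale → Flatten steps on a tiny hand-made image. This is a minimal sketch in plain Python, assuming the luminance weights documented for `skimage.color.rgb2gray`; the 2×2 image and the helper names are illustrative only.

```python
# Luminance weights assumed from the skimage.color.rgb2gray documentation
R_WEIGHT, G_WEIGHT, B_WEIGHT = 0.2125, 0.7154, 0.0721

def to_gray(image):
    """Collapse each (r, g, b) pixel to one luminance value."""
    return [
        [R_WEIGHT * r + G_WEIGHT * g + B_WEIGHT * b for (r, g, b) in row]
        for row in image
    ]

def flatten(gray):
    """Row-major flatten: an H x W grid becomes a vector of H * W features."""
    return [pixel for row in gray for pixel in row]

# Toy 2x2 RGB image with channel values already in [0, 1]
tiny_image = [
    [(1.0, 1.0, 1.0), (0.0, 0.0, 0.0)],   # white, black
    [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],   # pure red, pure green
]

features = flatten(to_gray(tiny_image))   # 4 features for a 2x2 image
```

The same logic applied to a 16×16 image gives the 256 features used by the dense network. Note that flattening discards which pixels were neighbours, which is the main limitation discussed in the Key Takeaways.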

In [None]:
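# Load dataset from TensorFlow Datasets.
# shuffle_files=False keeps the read order deterministic; train_test_split
# shuffles the samples later using SEED, so runs stay reproducible.
ds = tfds.load(DATASET, split='train', shuffle_files=False)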

# Process images
images, labels = [], []
for entry in ds:
    image, label = entry['image'].numpy(), entry['label'].numpy()
    
    # Resize to target size
    image = resize(image, (*RESIZE, 3), anti_aliasing=True)
    
    # Convert to grayscale if specified
    if GRAY_SCALE:
        image = rgb2gray(image)
    
    images.append(image)
    labels.append(label)

print(f"Loaded {len(images)} images")

In [None]:
# Convert to numpy arrays
X = np.array(images)
y = np.array(labels)

# Flatten images: (N, 16, 16) -> (N, 256)
X = X.reshape((X.shape[0], -1))

print(f"Features shape: {X.shape}")
print(f"Labels shape: {y.shape}")

### 4.4 Verify Class Balance

In [None]:
# Check class distribution
unique, counts = np.unique(y, return_counts=True)
class_dist = dict(zip([CLASS_NAMES[i] for i in unique], counts))

print("Class Distribution:")
for class_name, count in class_dist.items():
    print(f"  {class_name}: {count} ({100*count/len(y):.1f}%)")

# Calculate imbalance ratio
imbalance_ratio = max(counts) / min(counts)
print(f"\nImbalance Ratio: {imbalance_ratio:.2f}:1")
print(f"Decision: {'Use Accuracy (balanced)' if imbalance_ratio < 3 else 'Use F1-Score (imbalanced)'}")

### 4.5 Train/Test Split

In [None]:
# ============================================================
# TRAIN/TEST SPLIT (90%/10%)
# ============================================================
TEST_SIZE = 0.10

X_train_full, X_test, y_train_full, y_test = train_test_split(
    X, y, 
    test_size=TEST_SIZE, 
    stratify=y,
    random_state=SEED, 
    shuffle=True
)

print(f"Training + Validation: {X_train_full.shape[0]} samples")
print(f"Test: {X_test.shape[0]} samples")

### 4.6 Normalise Features

In [None]:
# ============================================================
# NORMALISE PIXEL VALUES [0, 1]
# ============================================================
# Note: skimage's resize() already returns floats in [0, 1], so no rescaling
# is needed here. We only cast to float32 and verify the range below.
X_train_full = X_train_full.astype('float32')
X_test = X_test.astype('float32')

# Verify normalisation
print(f"Feature range: [{X_train_full.min():.3f}, {X_train_full.max():.3f}]")

### 4.7 Train/Validation Split

In [None]:
# ============================================================
# TRAIN/VALIDATION SPLIT
# ============================================================
# Use same number of samples for validation as test
VALIDATION_SIZE = X_test.shape[0]

X_train, X_val, y_train, y_val = train_test_split(
    X_train_full, y_train_full,
    test_size=VALIDATION_SIZE,
    stratify=y_train_full,
    random_state=SEED,
    shuffle=True
)

print(f"Training: {X_train.shape[0]} samples")
print(f"Validation: {X_val.shape[0]} samples")
print(f"Test: {X_test.shape[0]} samples")

### 4.8 Visualise Sample Images

In [None]:
# Display sample images from each class
fig, axes = plt.subplots(2, 5, figsize=(12, 5))
fig.suptitle('Sample Images (16×16 Grayscale)', fontsize=14)

for class_idx, class_name in enumerate(CLASS_NAMES):
    # Get indices for this class
    class_indices = np.where(y_train == class_idx)[0][:5]
    
    for i, idx in enumerate(class_indices):
        ax = axes[class_idx, i]
        # Reshape flattened image back to 2D
        img = X_train[idx].reshape(RESIZE)
        ax.imshow(img, cmap='gray')
        # Hide the ticks rather than the whole axis, so the class label stays visible
        ax.set_xticks([])
        ax.set_yticks([])
        if i == 0:
            ax.set_ylabel(class_name, fontsize=12)

plt.tight_layout()
plt.show()

## 5. Developing a Model That Does Better Than a Baseline

**Baseline for balanced binary classification:** 50% accuracy (random guessing)
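
A slightly more general way to obtain this baseline is to compute the accuracy of always predicting the most frequent class, which equals 50% only when the classes are exactly balanced. The sketch below uses toy labels and a hypothetical helper name.

```python
from collections import Counter

def majority_class_baseline(labels):
    """Return (majority_label, accuracy of always predicting that label)."""
    counts = Counter(labels)
    majority_label, majority_count = counts.most_common(1)[0]
    return majority_label, majority_count / len(labels)

# Toy labels with a slight imbalance (48 cats, 52 dogs)
toy_labels = [0] * 48 + [1] * 52
label, baseline = majority_class_baseline(toy_labels)
```

For a nearly balanced dataset the result sits close to 0.5, so the fixed 50% figure is a reasonable approximation.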

In [None]:
# ============================================================
# MODEL CONFIGURATION
# ============================================================
INPUT_DIMENSION = X_train.shape[1]  # 256 features (16x16)
OUTPUT_DIMENSION = 1  # Binary classification

OPTIMIZER = 'adam'
LOSS_FUNC = 'binary_crossentropy'
METRICS = ['accuracy']

# Training configuration
BATCH_SIZE = 512
EPOCHS_BASELINE = 100
EPOCHS_REGULARIZED = 150

print(f"Input Dimension: {INPUT_DIMENSION}")
print(f"Output Dimension: {OUTPUT_DIMENSION}")
print(f"Batch Size: {BATCH_SIZE}")

In [None]:
# ============================================================
# ESTABLISH BASELINE
# ============================================================
# For balanced binary classification, random guessing = 50%
baseline_accuracy = 0.5

print(f"Baseline Accuracy (random guessing): {baseline_accuracy:.2f}")

In [None]:
# ============================================================
# CLASS WEIGHTS (for balanced training)
# ============================================================
weights = compute_class_weight('balanced', classes=np.unique(y_train), y=y_train)
CLASS_WEIGHTS = dict(enumerate(weights))

print("Class Weights:")
for class_idx, weight in CLASS_WEIGHTS.items():
    print(f"  {CLASS_NAMES[class_idx]}: {weight:.4f}")

In [None]:
# ============================================================
# SINGLE LAYER PERCEPTRON (SLP) - Simplest possible model
# ============================================================
slp_model = Sequential(name='Single_Layer_Perceptron')
slp_model.add(Dense(OUTPUT_DIMENSION, activation='sigmoid', input_shape=(INPUT_DIMENSION,)))
slp_model.compile(optimizer=OPTIMIZER, loss=LOSS_FUNC, metrics=METRICS)

slp_model.summary()

In [None]:
# Train SLP
slp_history = slp_model.fit(
    X_train, y_train,
    class_weight=CLASS_WEIGHTS,
    batch_size=BATCH_SIZE,
    epochs=EPOCHS_BASELINE,
    validation_data=(X_val, y_val),
    verbose=0
)

slp_val_acc = slp_model.evaluate(X_val, y_val, verbose=0)[1]
print(f"SLP Validation Accuracy: {slp_val_acc:.4f} (baseline: {baseline_accuracy:.2f})")

In [None]:
# ============================================================
# PLOT TRAINING HISTORY
# ============================================================
def plot_training_history(history, title='Training History'):
    """Plot training and validation loss/accuracy curves."""
    fig, axes = plt.subplots(1, 2, figsize=(14, 5))
    
    # Loss
    axes[0].plot(history.history['loss'], 'b-', label='Training Loss')
    axes[0].plot(history.history['val_loss'], 'r-', label='Validation Loss')
    axes[0].set_title('Training and Validation Loss')
    axes[0].set_xlabel('Epoch')
    axes[0].set_ylabel('Loss')
    axes[0].legend()
    axes[0].grid(True, alpha=0.3)
    
    # Accuracy
    axes[1].plot(history.history['accuracy'], 'b-', label='Training Accuracy')
    axes[1].plot(history.history['val_accuracy'], 'r-', label='Validation Accuracy')
    axes[1].set_title('Training and Validation Accuracy')
    axes[1].set_xlabel('Epoch')
    axes[1].set_ylabel('Accuracy')
    axes[1].legend()
    axes[1].grid(True, alpha=0.3)
    
    plt.suptitle(title, fontsize=14)
    plt.tight_layout()
    plt.show()

plot_training_history(slp_history, 'Single Layer Perceptron')

## 6. Scaling Up: Developing a Model That Overfits

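We add a hidden layer so the network can learn more complex features for distinguishing cats from dogs. Training it without regularisation lets us check whether the model has enough capacity to overfit the training data.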

In [None]:
# ============================================================
# MULTI-LAYER PERCEPTRON (MLP) - Standard architecture
# ============================================================
HIDDEN_NEURONS = 64

mlp_model = Sequential(name='Multi_Layer_Perceptron')
mlp_model.add(Dense(HIDDEN_NEURONS, activation='relu', input_shape=(INPUT_DIMENSION,)))
mlp_model.add(Dense(OUTPUT_DIMENSION, activation='sigmoid'))
mlp_model.compile(optimizer=OPTIMIZER, loss=LOSS_FUNC, metrics=METRICS)

mlp_model.summary()

In [None]:
# Train MLP
mlp_history = mlp_model.fit(
    X_train, y_train,
    class_weight=CLASS_WEIGHTS,
    batch_size=BATCH_SIZE,
    epochs=EPOCHS_BASELINE,
    validation_data=(X_val, y_val),
    verbose=0
)

mlp_val_acc = mlp_model.evaluate(X_val, y_val, verbose=0)[1]
print(f"MLP Validation Accuracy: {mlp_val_acc:.4f} (baseline: {baseline_accuracy:.2f})")
print(f"Improvement over SLP: {(mlp_val_acc - slp_val_acc)*100:.2f} percentage points")

In [None]:
plot_training_history(mlp_history, 'Multi-Layer Perceptron (1 Hidden Layer)')

## 7. Regularising Your Model and Tuning Hyperparameters

Using **Hyperband** for efficient hyperparameter tuning with L2 regularisation and Dropout.

### Why Hyperband?

**Hyperband** is more efficient than grid search because it:
1. Starts training many configurations for a few epochs
2. Eliminates poor performers early
3. Allocates more resources to promising configurations
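
The three steps above correspond to successive halving, the inner loop that Hyperband repeats with different starting budgets. The sketch below is a simplified illustration, not Keras Tuner's implementation: `successive_halving`, `toy_score`, and the candidate dropout rates are all hypothetical, and the score function stands in for "train for this many epochs and return validation accuracy".

```python
def successive_halving(configs, score_fn, min_epochs=2, factor=3, max_epochs=50):
    """Keep the top 1/factor of configurations each round, training survivors longer.

    configs:  list of hyperparameter settings
    score_fn: callable (config, epochs) -> validation score, higher is better
    Returns the best configuration and a list of (epochs, n_configs) per round.
    """
    survivors = list(configs)
    epochs = min_epochs
    history = []
    while True:
        ranked = sorted(survivors, key=lambda c: score_fn(c, epochs), reverse=True)
        history.append((epochs, len(ranked)))
        if len(ranked) == 1 or epochs >= max_epochs:
            return ranked[0], history
        keep = max(1, len(ranked) // factor)       # eliminate poor performers
        survivors = ranked[:keep]
        epochs = min(epochs * factor, max_epochs)  # give survivors more budget

def toy_score(dropout_rate, epochs):
    """Hypothetical stand-in for a real training run; peaks at dropout 0.3."""
    return 0.8 - (dropout_rate - 0.3) ** 2 - 0.1 / epochs

configs = [round(0.1 * i, 1) for i in range(9)]   # dropout rates 0.0 .. 0.8
best, history = successive_halving(configs, toy_score)

# Epochs spent if every round trains its configurations from scratch
total_epochs = sum(epochs * n for epochs, n in history)
```

In this toy run the rounds are 9 configurations for 2 epochs, 3 for 6 epochs, and 1 for 18 epochs, for 54 epochs in total. Training all 9 configurations for the full 50 epochs would cost 450.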

### Regularisation Strategy

| Technique | Purpose | How It Works |
|-----------|---------|-------------|
| **L2 Regularisation** | Prevent large weights | Adds penalty term to loss |
| **Dropout** | Prevent co-adaptation | Randomly zeros neurons during training |
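
Both techniques in the table are simple enough to sketch in plain Python. The version below assumes the usual conventions: an L2 penalty of `l2_reg` times the sum of squared weights, and "inverted" dropout that rescales surviving activations by `1 / (1 - rate)` during training. The function names and example values are illustrative.

```python
import random

def l2_penalty(weights, l2_reg):
    """Penalty added to the loss: l2_reg * sum of squared weights."""
    return l2_reg * sum(w * w for w in weights)

def dropout(activations, rate, rng):
    """Inverted dropout (training mode only).

    Each activation is zeroed with probability `rate`; survivors are scaled
    by 1 / (1 - rate) so the expected activation stays the same.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    scale = 1.0 / (1.0 - rate)
    return [a * scale if rng.random() >= rate else 0.0 for a in activations]

weights = [0.5, -1.0, 2.0]
penalty = l2_penalty(weights, l2_reg=0.01)   # 0.01 * (0.25 + 1.0 + 4.0)

rng = random.Random(204)                     # seeded for repeatability
dropped = dropout([1.0] * 8, rate=0.5, rng=rng)
```

At inference time dropout is switched off and activations pass through unchanged, which is why the rescaling is done during training.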

In [None]:
# ============================================================
# HYPERBAND MODEL BUILDER
# ============================================================
def build_model_hyperband(hp):
    """
    Build CatVsDog model with FROZEN architecture (1 hidden layer, 64 neurons).
    Tunes: L2 regularisation, Dropout rate, Learning rate.
    """
    model = keras.Sequential()
    model.add(layers.Input(shape=(INPUT_DIMENSION,)))
    
    # Hyperparameters to tune
    l2_reg = hp.Float('l2_reg', min_value=1e-5, max_value=1e-2, sampling='log')
    dropout_rate = hp.Float('dropout_rate', min_value=0.0, max_value=0.5, step=0.1)
    learning_rate = hp.Float('learning_rate', min_value=1e-4, max_value=1e-2, sampling='log')
    
    # Hidden layer with L2 regularisation
    model.add(layers.Dense(
        HIDDEN_NEURONS,
        activation='relu',
        kernel_regularizer=regularizers.l2(l2_reg)
    ))
    model.add(layers.Dropout(dropout_rate))
    
    # Output layer
    model.add(layers.Dense(OUTPUT_DIMENSION, activation='sigmoid'))
    
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
        loss=LOSS_FUNC,
        metrics=METRICS
    )
    
    return model

In [None]:
# ============================================================
# CONFIGURE AND RUN HYPERBAND TUNER
# ============================================================
tuner = kt.Hyperband(
    build_model_hyperband,
    objective='val_accuracy',
    max_epochs=50,
    factor=3,
    seed=SEED,  # reuse the global seed for the tuner's random sampling of configurations
    directory='catvsdog_hyperband',
    project_name='catvsdog_tuning',
    overwrite=True
)

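# Run search.
# Hyperband assigns each trial its own epoch budget (up to max_epochs),
# which takes precedence over the `epochs` value passed here.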
tuner.search(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=50,
    batch_size=BATCH_SIZE,
    class_weight=CLASS_WEIGHTS,
    verbose=0
)

In [None]:
# ============================================================
# GET BEST HYPERPARAMETERS
# ============================================================
best_hp = tuner.get_best_hyperparameters(num_trials=1)[0]

print("Best Hyperparameters:")
print(f"  L2 Regularisation: {best_hp.get('l2_reg'):.6f}")
print(f"  Dropout Rate: {best_hp.get('dropout_rate'):.2f}")
print(f"  Learning Rate: {best_hp.get('learning_rate'):.6f}")

In [None]:
# ============================================================
# BUILD AND TRAIN BEST MODEL
# ============================================================
best_model = tuner.hypermodel.build(best_hp)
best_model.summary()

In [None]:
# Train the best model with more epochs
best_history = best_model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=EPOCHS_REGULARIZED,
    batch_size=BATCH_SIZE,
    class_weight=CLASS_WEIGHTS,
    verbose=0
)

best_val_acc = best_model.evaluate(X_val, y_val, verbose=0)[1]
print(f"Best Model Validation Accuracy: {best_val_acc:.4f}")
print(f"Improvement over MLP: {(best_val_acc - mlp_val_acc)*100:.2f} percentage points")

In [None]:
plot_training_history(best_history, 'Regularised Model (L2 + Dropout)')

## 8. Final Evaluation

Evaluate the best model on the held-out test set.
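
The metrics reported in this section all derive from the four confusion-matrix counts. The sketch below shows the arithmetic on toy labels, assuming class 1 (Dog) is treated as the positive class; `binary_metrics` is a hypothetical helper and the labels are made up.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, and recall with class 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # dogs predicted as dogs
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # cats predicted as cats
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # cats predicted as dogs
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # dogs predicted as cats
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }

# Toy example: 4 cats (0) followed by 4 dogs (1)
y_true_toy = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred_toy = [0, 0, 0, 1, 1, 1, 0, 0]

metrics = binary_metrics(y_true_toy, y_pred_toy)
```

Here 5 of 8 predictions are correct (accuracy 0.625), 2 of the 3 predicted dogs are real dogs (precision about 0.667), and 2 of the 4 real dogs are found (recall 0.5).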

In [None]:
# ============================================================
# TEST SET EVALUATION
# ============================================================
# Get predictions
y_pred_proba = best_model.predict(X_test, verbose=0).ravel()  # (N, 1) -> (N,)
y_pred = (y_pred_proba > 0.5).astype('int32')

# Calculate metrics
test_accuracy = accuracy_score(y_test, y_pred)
test_precision = precision_score(y_test, y_pred)
test_recall = recall_score(y_test, y_pred)
test_auc = roc_auc_score(y_test, y_pred_proba)

print("="*50)
print("FINAL TEST SET RESULTS")
print("="*50)
print(f"Accuracy:  {test_accuracy:.4f} (baseline: {baseline_accuracy:.2f})")
print(f"Precision: {test_precision:.4f}")
print(f"Recall:    {test_recall:.4f}")
print(f"AUC:       {test_auc:.4f}")
print("="*50)

In [None]:
# ============================================================
# CONFUSION MATRIX
# ============================================================
cm = confusion_matrix(y_test, y_pred)
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=CLASS_NAMES)

fig, ax = plt.subplots(figsize=(8, 6))
disp.plot(ax=ax, cmap='Blues')
plt.title('Confusion Matrix - Test Set')
plt.show()

# Print detailed breakdown
print("\nConfusion Matrix Breakdown:")
print(f"  True Cats correctly classified: {cm[0,0]}")
print(f"  True Dogs correctly classified: {cm[1,1]}")
print(f"  Cats misclassified as Dogs: {cm[0,1]}")
print(f"  Dogs misclassified as Cats: {cm[1,0]}")

## Model Comparison Summary

In [None]:
# ============================================================
# MODEL COMPARISON
# ============================================================
print("\nModel Comparison (Validation Accuracy):")
print("="*50)
print(f"{'Model':<30} {'Accuracy':>10}")
print("-"*50)
print(f"{'Baseline (random)':<30} {baseline_accuracy:>10.4f}")
print(f"{'Single Layer Perceptron':<30} {slp_val_acc:>10.4f}")
print(f"{'Multi-Layer Perceptron':<30} {mlp_val_acc:>10.4f}")
print(f"{'Regularised (L2 + Dropout)':<30} {best_val_acc:>10.4f}")
print("="*50)

---

## Key Takeaways

1. **Binary Image Classification:** Uses sigmoid output with a single neuron and binary cross-entropy loss

2. **Preprocessing Pipeline:** Resize → Grayscale → Flatten → Normalise converts images to vectors for dense networks

3. **Balanced Data:** With ~50:50 class distribution, accuracy is an appropriate metric

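4. **Dense Network Limitations:** Dense networks on flattened pixels discard spatial structure and struggle with complex image features; CNNs would be expected to perform significantly better

5. **Regularisation Helps:** L2 + Dropout is intended to reduce overfitting and improve generalisation; compare the MLP and regularised rows in the Model Comparison Summary to check whether it did on this run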

### Next Steps for Better Performance

- **Use CNNs** (Chapter 8) - preserves spatial structure
- **Data augmentation** - artificially increase training data
- **Transfer learning** - use pre-trained models (VGG, ResNet)
- **Higher resolution** - more detail, but requires more compute
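
As one concrete example of the data augmentation idea, the sketch below doubles a training set with horizontally flipped copies. It is a minimal illustration in plain Python with hypothetical helper names. It assumes images are still 2D grids (so it would be applied before flattening) and that only the training split is augmented.

```python
def horizontal_flip(image):
    """Mirror a 2D image (list of rows) left to right."""
    return [list(reversed(row)) for row in image]

def augment_with_flips(images, labels):
    """Return the originals plus one flipped copy of each image.

    A mirrored cat is still a cat, so each copy keeps its original label.
    """
    augmented_images = list(images) + [horizontal_flip(img) for img in images]
    augmented_labels = list(labels) + list(labels)
    return augmented_images, augmented_labels

# Toy 2x2 grayscale "images" with labels 0 (Cat) and 1 (Dog)
toy_images = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
toy_labels = [0, 1]

aug_images, aug_labels = augment_with_flips(toy_images, toy_labels)
```

Flips are a natural choice here because a cat or dog remains recognisable when mirrored; the validation and test sets are left untouched so that the evaluation still reflects real images.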