<a href="https://colab.research.google.com/github/sreent/machine-learning/blob/main/Final%20DNN%20Code%20Examples/Imagenette/Imagenette%20-%20TFDS%20Color%20Image%20Example.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Imagenette - TFDS Colour Image Example

This notebook demonstrates the **Universal ML Workflow** for multi-class image classification using TensorFlow Datasets (TFDS).

## Learning Objectives

By the end of this notebook, you will be able to:
- Load image datasets from **TensorFlow Datasets (TFDS)**
- Preprocess colour images: Resize → Grayscale → Flatten → Normalise
- Handle 10-class image classification with Top-K accuracy
- Apply Hyperband for hyperparameter tuning on image data

---

## Technique Scope

| Aspect | What We Use | What We Don't Use (Yet) |
|--------|-------------|------------------------|
| **Architecture** | Dense layers only | CNNs, pooling, feature extractors |
| **Regularisation** | L2 + Dropout | Early stopping, data augmentation |
| **Optimiser** | Adam | SGD with momentum, learning rate schedules |
| **Tuning** | Hyperband | Bayesian optimisation, neural architecture search |

> **Note**: Dense networks applied to flattened images serve as a baseline. CNNs (Chapter 8) are the standard approach for image classification and would significantly improve performance on this challenging dataset.

---

## Dataset Overview

| Attribute | Description |
|-----------|-------------|
| **Source** | [TensorFlow Datasets - imagenette/160px](https://www.tensorflow.org/datasets/catalog/imagenette) |
| **Problem Type** | Multi-Class Classification (10 classes) |
| **Classes** | Tench, English springer, Cassette player, Chain saw, Church, French horn, Garbage truck, Gas pump, Golf ball, Parachute |
| **Data Balance** | Nearly Balanced |
| **Total Images** | ~13,000 images |
| **Preprocessing** | Resize to 16×16 → Grayscale → Flatten (256 features) |
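
The preprocessing row above can be sketched on a synthetic image to make the shapes concrete. This is a minimal illustration, not the notebook's actual pipeline: it assumes a random stand-in image and swaps in simple 10×10 block averaging for scikit-image's anti-aliased `resize`. The grayscale weights are the luminance coefficients documented for `rgb2gray`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for one 160x160 RGB image with uint8 pixels (not real data).
image = rng.integers(0, 256, size=(160, 160, 3), dtype=np.uint8)

# 1) Scale pixel values to floats in [0, 1].
image = image.astype("float32") / 255.0

# 2) Downsample 160x160 -> 16x16 by averaging each 10x10 block.
#    Assumption: block mean used here in place of skimage's anti-aliased resize.
small = image.reshape(16, 10, 16, 10, 3).mean(axis=(1, 3))

# 3) Convert to grayscale with luminance weights (R, G, B).
gray = small @ np.array([0.2125, 0.7154, 0.0721], dtype="float32")

# 4) Flatten the 16x16 grid into a 256-element feature vector.
features = gray.reshape(-1)

print(small.shape, gray.shape, features.shape)
```

Each image ends up as 256 numbers in [0, 1], which is the input size the dense models in this notebook expect.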

### Imagenette vs Fashion MNIST

| Aspect | Imagenette | Fashion MNIST |
|--------|------------|---------------|
| **Original Images** | 160×160 colour | 28×28 grayscale |
| **Number of Classes** | 10 | 10 |
| **Image Content** | Real-world photos | Product photos of clothing (centred, plain background) |
| **Difficulty** | Harder (high variation) | Moderate |
| **Dataset Size** | ~13,000 | 70,000 |

---

## Code Reuse Philosophy

This notebook follows a **"Same Code, Different Data"** philosophy. The core ML pipeline remains consistent across different classification tasks:

```
┌─────────────────────────────────────────────────────────────────┐
│                    UNIVERSAL ML PIPELINE                        │
├─────────────────────────────────────────────────────────────────┤
│  Data Loading → Preprocessing → Train/Val/Test Split → Model   │
│  → Baseline → Overfitting → Regularisation → Evaluation        │
└─────────────────────────────────────────────────────────────────┘
```

**What changes:** Data source, preprocessing, number of output classes  
**What stays the same:** Model architecture pattern, training loop, evaluation code

---

## 1. Defining the Problem and Assembling a Dataset

**Problem:** Classify images of 10 different objects from the Imagenette dataset - a smaller subset of ImageNet designed for faster experimentation.

**Why Imagenette?** The ImageNet classification benchmark has 1,000 classes and more than a million training images, which makes iteration slow. Imagenette provides a 10-class subset that is large enough to be challenging but small enough for rapid prototyping.

**Why TFDS?** TensorFlow Datasets provides easy access to common ML datasets with consistent APIs, automatic caching, and preprocessing utilities.

## 2. Choosing a Measure of Success

### Data-Driven Metric Selection

| Criterion | This Dataset | Decision |
|-----------|--------------|----------|
| **Class Balance** | ~Equal across 10 classes | Balanced |
| **Number of Classes** | 10 | Multi-class |
| **Primary Metric** | Accuracy | Standard for balanced multi-class |
| **Secondary Metrics** | Top-K Accuracy | Additional insight for multi-class |

**Decision:** Since the dataset is balanced, **Accuracy** is the primary metric. **Top-K Accuracy** shows if the correct class was among the model's top K predictions.
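
To make the Top-K idea concrete, here is a small worked example on made-up probabilities for a 3-class problem. The helper name `toy_top_k_accuracy` and the numbers are illustrative assumptions only.

```python
import numpy as np

# Made-up predicted probabilities for 4 samples over 3 classes.
probs = np.array([
    [0.7, 0.2, 0.1],   # true class 0: correct at top-1
    [0.4, 0.5, 0.1],   # true class 0: wrong at top-1, correct at top-2
    [0.1, 0.3, 0.6],   # true class 2: correct at top-1
    [0.5, 0.3, 0.2],   # true class 2: wrong even at top-2
])
true_labels = np.array([0, 0, 2, 2])

def toy_top_k_accuracy(y_true, y_prob, k):
    """Fraction of rows whose true label is among the k highest-probability classes."""
    top_k = np.argsort(y_prob, axis=1)[:, -k:]        # indices of the k largest per row
    hits = (top_k == y_true[:, None]).any(axis=1)     # is the true label one of them?
    return float(hits.mean())

top1 = toy_top_k_accuracy(true_labels, probs, k=1)
top2 = toy_top_k_accuracy(true_labels, probs, k=2)
print(f"Top-1: {top1:.2f}, Top-2: {top2:.2f}")
```

Top-1 is ordinary accuracy; the gap between Top-1 and Top-K indicates how often the model is "nearly right".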

## 3. Deciding on an Evaluation Protocol

### Data-Driven Protocol Selection

| Criterion | This Dataset | Decision |
|-----------|--------------|----------|
| **Sample Size** | ~13,000 images | Large |
| **Threshold** | > 10,000 | Use Hold-Out |
| **Protocol** | Train/Validation/Test | 80%/10%/10% split |

**Decision:** With ~13,000 samples, **Hold-Out validation** is appropriate.
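
As a rough check of what the 80%/10%/10% protocol means in sample counts, the sketch below works through the arithmetic. The helper `holdout_sizes` is hypothetical, it uses the rounded figure of 13,000 from the table, and its rounding rule (round the test share up) is an assumption.

```python
def holdout_sizes(n_samples, test_percent=10):
    """Split sizes when the validation set is made the same size as the test set."""
    # Integer ceiling division; the rounding rule is an assumption of this sketch.
    n_test = -(-n_samples * test_percent // 100)
    n_val = n_test
    n_train = n_samples - n_test - n_val
    return n_train, n_val, n_test

# 13,000 is the rounded dataset size quoted in the table above.
n_train, n_val, n_test = holdout_sizes(13_000)
print(f"Train: {n_train}, Validation: {n_val}, Test: {n_test}")
```

With about 1,300 images in each held-out set, every class contributes roughly 130 validation and 130 test examples.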

## 4. Preparing Your Data

### 4.1 Import Libraries and Load TFDS Dataset

In [None]:
import os
import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.utils.class_weight import compute_class_weight
from sklearn.metrics import accuracy_score, confusion_matrix, ConfusionMatrixDisplay

from skimage.color import rgb2gray
from skimage.transform import resize

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, regularizers
import tensorflow_datasets as tfds
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Dense, Dropout

# Keras Tuner for hyperparameter search
%pip install -q -U keras-tuner
import keras_tuner as kt

import matplotlib.pyplot as plt

# ============================================================
# RANDOM SEED - Set once, use everywhere
# ============================================================
SEED = 204

tf.random.set_seed(SEED)
np.random.seed(SEED)

import warnings
warnings.filterwarnings('ignore')

### 4.2 Configuration

In [None]:
# ============================================================
# DATASET CONFIGURATION
# ============================================================
DATASET = 'imagenette/160px'
RESIZE = (16, 16)  # Smaller size for dense network
GRAY_SCALE = True  # Convert to grayscale for simplicity

# ============================================================
# CLASS NAMES
# ============================================================
CLASS_NAMES = [
    'Tench', 'English springer', 'Cassette player', 'Chain saw', 'Church',
    'French horn', 'Garbage truck', 'Gas pump', 'Golf ball', 'Parachute'
]

### 4.3 Load and Preprocess Data

In [None]:
# Load dataset from TensorFlow Datasets
# Combine train and validation splits for custom splitting
ds_train = tfds.load(DATASET, split='train', shuffle_files=True)
ds_val = tfds.load(DATASET, split='validation', shuffle_files=True)

# Process images from both splits
images, labels = [], []

for ds in [ds_train, ds_val]:
    for entry in ds:
        image, label = entry['image'].numpy(), entry['label'].numpy()
        
        # Resize to target size
        image = resize(image, (*RESIZE, 3), anti_aliasing=True)
        
        # Convert to grayscale if specified
        if GRAY_SCALE:
            image = rgb2gray(image)
        
        images.append(image)
        labels.append(label)

print(f"Loaded {len(images)} images")

In [None]:
# Convert to numpy arrays
X = np.array(images)
y_raw = np.array(labels)

# Flatten images: (N, 16, 16) -> (N, 256)
X = X.reshape((X.shape[0], -1))

# One-hot encode labels
y = to_categorical(y_raw)

print(f"Features shape: {X.shape}")
print(f"Labels shape: {y.shape}")
print(f"Number of classes: {y.shape[1]}")

### 4.4 Verify Class Balance

In [None]:
# Check class distribution
unique, counts = np.unique(y_raw, return_counts=True)

print("Class Distribution:")
for class_idx, count in zip(unique, counts):
    print(f"  {CLASS_NAMES[class_idx]}: {count} ({100*count/len(y_raw):.1f}%)")

# Calculate imbalance ratio
imbalance_ratio = max(counts) / min(counts)
print(f"\nImbalance Ratio: {imbalance_ratio:.2f}:1")
print(f"Decision: {'Use Accuracy (balanced)' if imbalance_ratio < 3 else 'Use F1-Score (imbalanced)'}")

### 4.5 Train/Test Split

In [None]:
# ============================================================
# TRAIN/TEST SPLIT (90%/10%)
# ============================================================
TEST_SIZE = 0.10

X_train_full, X_test, y_train_full, y_test = train_test_split(
    X, y,
    test_size=TEST_SIZE,
    stratify=y_raw,
    random_state=SEED,
    shuffle=True
)

# Also keep raw (integer) labels; recover them from the one-hot arrays
# instead of running an identical second split
y_train_full_raw = y_train_full.argmax(axis=1)
y_test_raw = y_test.argmax(axis=1)

print(f"Training + Validation: {X_train_full.shape[0]} samples")
print(f"Test: {X_test.shape[0]} samples")

### 4.6 Normalise Features

In [None]:
# ============================================================
# NORMALISE PIXEL VALUES [0, 1]
# ============================================================
# Note: skimage's resize already returns floats in [0, 1], so no rescaling is
# needed here; we only cast to float32 and verify the range below
X_train_full = X_train_full.astype('float32')
X_test = X_test.astype('float32')

# Verify normalisation
print(f"Feature range: [{X_train_full.min():.3f}, {X_train_full.max():.3f}]")

### 4.7 Train/Validation Split

In [None]:
# ============================================================
# TRAIN/VALIDATION SPLIT
# ============================================================
# Use same number of samples for validation as test
VALIDATION_SIZE = X_test.shape[0]

X_train, X_val, y_train, y_val = train_test_split(
    X_train_full, y_train_full,
    test_size=VALIDATION_SIZE,
    stratify=y_train_full.argmax(axis=1),
    random_state=SEED,
    shuffle=True
)

# Keep raw labels for train set (for class weights)
y_train_raw = y_train.argmax(axis=1)

print(f"Training: {X_train.shape[0]} samples")
print(f"Validation: {X_val.shape[0]} samples")
print(f"Test: {X_test.shape[0]} samples")

### 4.8 Visualise Sample Images

In [None]:
# Display sample images from each class
fig, axes = plt.subplots(2, 5, figsize=(15, 6))
fig.suptitle('Sample Images (16×16 Grayscale)', fontsize=14)

for class_idx in range(10):
    # Get first sample of this class
    sample_idx = np.where(y_train_raw == class_idx)[0][0]
    
    ax = axes[class_idx // 5, class_idx % 5]
    # Reshape flattened image back to 2D
    img = X_train[sample_idx].reshape(RESIZE)
    ax.imshow(img, cmap='gray')
    ax.axis('off')
    ax.set_title(CLASS_NAMES[class_idx], fontsize=9)

plt.tight_layout()
plt.show()

## 5. Developing a Model That Does Better Than a Baseline

**Baseline for 10-class balanced problem:** 10% accuracy (random guessing)
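
The 10% figure can be checked with a quick simulation on toy labels. This is a sketch on synthetic, perfectly balanced labels rather than the Imagenette data.

```python
import numpy as np

rng = np.random.default_rng(0)

n_classes, per_class = 10, 1000
# Perfectly balanced toy labels: 1,000 samples for each of 10 classes.
labels = np.repeat(np.arange(n_classes), per_class)

# Strategy 1: guess a class uniformly at random for every sample.
random_guess = rng.integers(0, n_classes, size=labels.size)
random_acc = float((random_guess == labels).mean())

# Strategy 2: always predict the same class (here class 0).
constant_acc = float((labels == 0).mean())

print(f"Random guessing: {random_acc:.3f}")
print(f"Constant prediction: {constant_acc:.3f}")
```

Both trivial strategies land at about 1/10, so any model worth keeping needs to clear that bar comfortably.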

In [None]:
# ============================================================
# MODEL CONFIGURATION
# ============================================================
INPUT_DIMENSION = X_train.shape[1]  # 256 features (16x16)
OUTPUT_CLASSES = y_train.shape[1]   # 10 classes

OPTIMIZER = 'adam'
LOSS_FUNC = 'categorical_crossentropy'
METRICS = ['accuracy']

# Training configuration
BATCH_SIZE = 512
EPOCHS_BASELINE = 100
EPOCHS_REGULARIZED = 150

print(f"Input Dimension: {INPUT_DIMENSION}")
print(f"Output Classes: {OUTPUT_CLASSES}")
print(f"Batch Size: {BATCH_SIZE}")

In [None]:
# ============================================================
# ESTABLISH BASELINE
# ============================================================
# For balanced 10-class classification, random guessing = 10%
baseline_accuracy = 1.0 / OUTPUT_CLASSES

print(f"Baseline Accuracy (random guessing): {baseline_accuracy:.2f}")

In [None]:
# ============================================================
# CLASS WEIGHTS (for balanced training)
# ============================================================
weights = compute_class_weight('balanced', classes=np.unique(y_train_raw), y=y_train_raw)
CLASS_WEIGHTS = dict(enumerate(weights))

print("Class Weights (sample):")
for class_idx in [0, 1, 2]:
    print(f"  {CLASS_NAMES[class_idx]}: {CLASS_WEIGHTS[class_idx]:.4f}")
print("  ...")

In [None]:
# ============================================================
# SINGLE LAYER PERCEPTRON (SLP) - Simplest possible model
# ============================================================
slp_model = Sequential(name='Single_Layer_Perceptron')
slp_model.add(Dense(OUTPUT_CLASSES, activation='softmax', input_shape=(INPUT_DIMENSION,)))
slp_model.compile(optimizer=OPTIMIZER, loss=LOSS_FUNC, metrics=METRICS)

slp_model.summary()

In [None]:
# Train SLP
slp_history = slp_model.fit(
    X_train, y_train,
    class_weight=CLASS_WEIGHTS,
    batch_size=BATCH_SIZE,
    epochs=EPOCHS_BASELINE,
    validation_data=(X_val, y_val),
    verbose=0
)

slp_val_acc = slp_model.evaluate(X_val, y_val, verbose=0)[1]
print(f"SLP Validation Accuracy: {slp_val_acc:.4f} (baseline: {baseline_accuracy:.2f})")

In [None]:
# ============================================================
# PLOT TRAINING HISTORY
# ============================================================
def plot_training_history(history, title='Training History'):
    """Plot training and validation loss/accuracy curves."""
    fig, axes = plt.subplots(1, 2, figsize=(14, 5))
    
    # Loss
    axes[0].plot(history.history['loss'], 'b-', label='Training Loss')
    axes[0].plot(history.history['val_loss'], 'r-', label='Validation Loss')
    axes[0].set_title('Training and Validation Loss')
    axes[0].set_xlabel('Epoch')
    axes[0].set_ylabel('Loss')
    axes[0].legend()
    axes[0].grid(True, alpha=0.3)
    
    # Accuracy
    axes[1].plot(history.history['accuracy'], 'b-', label='Training Accuracy')
    axes[1].plot(history.history['val_accuracy'], 'r-', label='Validation Accuracy')
    axes[1].set_title('Training and Validation Accuracy')
    axes[1].set_xlabel('Epoch')
    axes[1].set_ylabel('Accuracy')
    axes[1].legend()
    axes[1].grid(True, alpha=0.3)
    
    plt.suptitle(title, fontsize=14)
    plt.tight_layout()
    plt.show()

plot_training_history(slp_history, 'Single Layer Perceptron')

## 6. Scaling Up: Developing a Model That Overfits

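We add a hidden layer so the network can learn more complex features for distinguishing between the 10 diverse object classes. The goal at this stage is a model with enough capacity to overfit, which the regularisation in Section 7 then reins in.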
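
To see how much capacity the hidden layer adds, the parameter counts can be worked out by hand. The helper `dense_params` is a hypothetical name; the layer sizes (256 inputs, 64 hidden units, 10 outputs) are the ones used in this notebook, and the full-resolution figure is shown only for comparison.

```python
def dense_params(n_inputs, n_units):
    """Trainable parameters in a dense layer: one weight per input-unit pair plus one bias per unit."""
    return n_inputs * n_units + n_units

# Single layer perceptron: 256 inputs straight to 10 softmax outputs.
slp_params = dense_params(256, 10)

# One hidden layer of 64 units, then the 10-way output layer.
mlp_params = dense_params(256, 64) + dense_params(64, 10)

# For comparison only: the same MLP fed the full 160x160 colour image.
full_res_params = dense_params(160 * 160 * 3, 64) + dense_params(64, 10)

print(f"SLP: {slp_params:,}")
print(f"MLP (16x16 grayscale): {mlp_params:,}")
print(f"MLP (160x160 colour): {full_res_params:,}")
```

The first two numbers should match the totals reported by `model.summary()`; the third shows why flattened full-resolution input is impractical for a dense network.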

In [None]:
# ============================================================
# MULTI-LAYER PERCEPTRON (MLP) - Standard architecture
# ============================================================
HIDDEN_NEURONS = 64

mlp_model = Sequential(name='Multi_Layer_Perceptron')
mlp_model.add(Dense(HIDDEN_NEURONS, activation='relu', input_shape=(INPUT_DIMENSION,)))
mlp_model.add(Dense(OUTPUT_CLASSES, activation='softmax'))
mlp_model.compile(optimizer=OPTIMIZER, loss=LOSS_FUNC, metrics=METRICS)

mlp_model.summary()

In [None]:
# Train MLP
mlp_history = mlp_model.fit(
    X_train, y_train,
    class_weight=CLASS_WEIGHTS,
    batch_size=BATCH_SIZE,
    epochs=EPOCHS_BASELINE,
    validation_data=(X_val, y_val),
    verbose=0
)

mlp_val_acc = mlp_model.evaluate(X_val, y_val, verbose=0)[1]
print(f"MLP Validation Accuracy: {mlp_val_acc:.4f} (baseline: {baseline_accuracy:.2f})")
print(f"Improvement over SLP: {(mlp_val_acc - slp_val_acc)*100:.2f} percentage points")

In [None]:
plot_training_history(mlp_history, 'Multi-Layer Perceptron (1 Hidden Layer)')

## 7. Regularising Your Model and Tuning Hyperparameters

Using **Hyperband** for efficient hyperparameter tuning with L2 regularisation and Dropout.

### Why Hyperband?

**Hyperband** is more efficient than grid search because it:
1. Starts training many configurations for a few epochs
2. Eliminates poor performers early
3. Allocates more resources to promising configurations
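
The three steps above can be sketched as a single round of successive halving, the building block that Hyperband repeats with different starting budgets. This is a simplified illustration, not KerasTuner's exact schedule: `successive_halving` and `toy_evaluate` are hypothetical names, and the toy score stands in for "train for this many epochs and report validation accuracy".

```python
import math
import random

def successive_halving(configs, evaluate, min_epochs=2, max_epochs=50, factor=3):
    """Keep the best 1/factor of the configurations each round while growing the epoch budget."""
    survivors = list(configs)
    epochs = min_epochs
    schedule = []  # (number of configurations, epochs each) for every round
    while True:
        ranked = sorted(survivors, key=lambda cfg: evaluate(cfg, epochs), reverse=True)
        schedule.append((len(ranked), epochs))
        if len(ranked) == 1 or epochs >= max_epochs:
            return ranked[0], schedule
        survivors = ranked[: max(1, len(ranked) // factor)]   # eliminate poor performers
        epochs = min(epochs * factor, max_epochs)             # give survivors more budget

def toy_evaluate(cfg, epochs):
    """Toy score: a hidden 'quality' that longer training reveals more of (assumption)."""
    return cfg["quality"] * (1.0 - math.exp(-epochs / 10.0))

rng = random.Random(0)
configs = [{"id": i, "quality": rng.random()} for i in range(27)]

best, schedule = successive_halving(configs, toy_evaluate)
total_epochs = sum(n * e for n, e in schedule)

print("Rounds (configs, epochs):", schedule)
print(f"Epochs spent: {total_epochs} vs {27 * 50} for training all 27 to completion")
```

In this toy setting the early rounds are cheap, so most of the budget goes to the few configurations that survive.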

### Regularisation Strategy

| Technique | Purpose | How It Works |
|-----------|---------|-------------|
| **L2 Regularisation** | Prevent large weights | Adds penalty term to loss |
| **Dropout** | Prevent co-adaptation | Randomly zeros neurons during training |
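
Both techniques in the table can be written out in a few lines of NumPy. This is a minimal sketch under stated assumptions: the function names are hypothetical, the L2 penalty is taken as the coefficient times the sum of squared weights, and dropout is the "inverted" variant that rescales the surviving activations during training.

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_penalty(weights, coefficient):
    """Extra loss term: coefficient times the sum of squared weights."""
    return coefficient * float(np.sum(weights ** 2))

def inverted_dropout(activations, rate, rng):
    """Zero each activation with probability `rate`, then rescale the survivors."""
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

# L2: large weights are penalised quadratically.
toy_weights = np.array([[0.5, -1.0], [2.0, 0.0]])
penalty = l2_penalty(toy_weights, coefficient=0.01)   # 0.01 * (0.25 + 1 + 4 + 0)

# Dropout: about half the units are silenced, the rest are doubled.
activations = np.ones((1000, 64))
dropped = inverted_dropout(activations, rate=0.5, rng=rng)

print(f"L2 penalty: {penalty:.4f}")
print(f"Mean activation after dropout: {dropped.mean():.3f}")
```

The rescaling keeps the expected activation unchanged, which is why no adjustment is needed at prediction time when dropout is switched off.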

In [None]:
# ============================================================
# HYPERBAND MODEL BUILDER
# ============================================================
def build_model_hyperband(hp):
    """
    Build Imagenette model with FROZEN architecture (1 hidden layer, 64 neurons).
    Tunes: L2 regularisation, Dropout rate, Learning rate.
    """
    model = keras.Sequential()
    model.add(layers.Input(shape=(INPUT_DIMENSION,)))
    
    # Hyperparameters to tune
    l2_reg = hp.Float('l2_reg', min_value=1e-5, max_value=1e-2, sampling='log')
    dropout_rate = hp.Float('dropout_rate', min_value=0.0, max_value=0.5, step=0.1)
    learning_rate = hp.Float('learning_rate', min_value=1e-4, max_value=1e-2, sampling='log')
    
    # Hidden layer with L2 regularisation
    model.add(layers.Dense(
        HIDDEN_NEURONS,
        activation='relu',
        kernel_regularizer=regularizers.l2(l2_reg)
    ))
    model.add(layers.Dropout(dropout_rate))
    
    # Output layer
    model.add(layers.Dense(OUTPUT_CLASSES, activation='softmax'))
    
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
        loss=LOSS_FUNC,
        metrics=METRICS
    )
    
    return model

In [None]:
# ============================================================
# CONFIGURE AND RUN HYPERBAND TUNER
# ============================================================
tuner = kt.Hyperband(
    build_model_hyperband,
    objective='val_accuracy',
    max_epochs=50,
    factor=3,
    directory='imagenette_hyperband',
    project_name='imagenette_tuning',
    overwrite=True
)

# Run search
tuner.search(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=50,  # Hyperband overrides this per trial with its own epoch budget (up to max_epochs)
    batch_size=BATCH_SIZE,
    class_weight=CLASS_WEIGHTS,
    verbose=0
)

In [None]:
# ============================================================
# GET BEST HYPERPARAMETERS
# ============================================================
best_hp = tuner.get_best_hyperparameters(num_trials=1)[0]

print("Best Hyperparameters:")
print(f"  L2 Regularisation: {best_hp.get('l2_reg'):.6f}")
print(f"  Dropout Rate: {best_hp.get('dropout_rate'):.2f}")
print(f"  Learning Rate: {best_hp.get('learning_rate'):.6f}")

In [None]:
# ============================================================
# BUILD AND TRAIN BEST MODEL
# ============================================================
best_model = tuner.hypermodel.build(best_hp)
best_model.summary()

In [None]:
# Train the best model with more epochs
best_history = best_model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=EPOCHS_REGULARIZED,
    batch_size=BATCH_SIZE,
    class_weight=CLASS_WEIGHTS,
    verbose=0
)

best_val_acc = best_model.evaluate(X_val, y_val, verbose=0)[1]
print(f"Best Model Validation Accuracy: {best_val_acc:.4f}")
print(f"Improvement over MLP: {(best_val_acc - mlp_val_acc)*100:.2f} percentage points")

In [None]:
plot_training_history(best_history, 'Regularised Model (L2 + Dropout)')

## 8. Final Evaluation

Evaluate the best model on the held-out test set.

In [None]:
# ============================================================
# TOP-K ACCURACY FUNCTION
# ============================================================
def top_k_accuracy(y_true, y_pred_proba, k):
    """
    Calculate Top-K accuracy: was the true class in the model's top K predictions?
    
    Args:
        y_true: True class labels (integer indices)
        y_pred_proba: Predicted probabilities (N x num_classes)
        k: Number of top predictions to consider
    
    Returns:
        Top-K accuracy score
    """
    top_k_preds = np.argsort(y_pred_proba, axis=1)[:, -k:]
    correct = sum(y_true[i] in top_k_preds[i] for i in range(len(y_true)))
    return correct / len(y_true)

In [None]:
# ============================================================
# TEST SET EVALUATION
# ============================================================
# Get predictions
y_pred_proba = best_model.predict(X_test, verbose=0)
y_pred = y_pred_proba.argmax(axis=1)

# Calculate metrics
test_accuracy = accuracy_score(y_test_raw, y_pred)
top_3_accuracy = top_k_accuracy(y_test_raw, y_pred_proba, k=3)
top_5_accuracy = top_k_accuracy(y_test_raw, y_pred_proba, k=5)

print("="*50)
print("FINAL TEST SET RESULTS")
print("="*50)
print(f"Top-1 Accuracy: {test_accuracy:.4f} (baseline: {baseline_accuracy:.2f})")
print(f"Top-3 Accuracy: {top_3_accuracy:.4f}")
print(f"Top-5 Accuracy: {top_5_accuracy:.4f}")
print("="*50)

In [None]:
# ============================================================
# CONFUSION MATRIX
# ============================================================
cm = confusion_matrix(y_test_raw, y_pred)
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=CLASS_NAMES)

fig, ax = plt.subplots(figsize=(12, 10))
disp.plot(ax=ax, cmap='Blues', xticks_rotation=45)
plt.title('Confusion Matrix - Test Set')
plt.tight_layout()
plt.show()

In [None]:
# ============================================================
# PER-CLASS ACCURACY
# ============================================================
print("\nPer-Class Accuracy:")
print("-"*40)
for class_idx in range(OUTPUT_CLASSES):
    class_mask = y_test_raw == class_idx
    class_correct = (y_pred[class_mask] == class_idx).sum()
    class_total = class_mask.sum()
    class_acc = class_correct / class_total
    print(f"{CLASS_NAMES[class_idx]:<20}: {class_acc:.2f} ({class_correct}/{class_total})")

## Model Comparison Summary

In [None]:
# ============================================================
# MODEL COMPARISON
# ============================================================
print("\nModel Comparison (Validation Accuracy):")
print("="*50)
print(f"{'Model':<30} {'Accuracy':>10}")
print("-"*50)
print(f"{'Baseline (random)':<30} {baseline_accuracy:>10.4f}")
print(f"{'Single Layer Perceptron':<30} {slp_val_acc:>10.4f}")
print(f"{'Multi-Layer Perceptron':<30} {mlp_val_acc:>10.4f}")
print(f"{'Regularised (L2 + Dropout)':<30} {best_val_acc:>10.4f}")
print("="*50)

---

## Key Takeaways

1. **TFDS simplifies data loading:** `tfds.load()` handles download, caching, and parsing automatically

2. **Top-K Accuracy:** Useful metric for multi-class problems - shows if the correct answer was in the model's top K guesses

3. **Image preprocessing:** Resize → Grayscale → Flatten sacrifices spatial information for simplicity (CNNs preserve it)

4. **Challenging dataset:** Real-world photos with high variation are harder than curated datasets such as Fashion MNIST, whose items are centred on plain backgrounds

5. **Dense network limitations:** Dense networks on flattened pixels struggle with complex image features - CNNs would significantly improve performance
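
One way to see what flattening gives up (takeaways 3 and 5) is that a dense layer has no notion of which pixels are neighbours. The sketch below uses random stand-in data and weights: shuffling the pixel order, with the weight rows shuffled the same way, leaves the layer's output unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random stand-ins: 5 flattened 16x16 images and one dense layer (256 -> 10).
X = rng.random((5, 256))
W = rng.normal(size=(256, 10))
b = rng.normal(size=10)

# Apply the same fixed shuffle to the pixels and to the matching weight rows.
perm = rng.permutation(256)
out_original = X @ W + b
out_shuffled = X[:, perm] @ W[perm] + b

same_output = bool(np.allclose(out_original, out_shuffled))
print("Identical outputs after shuffling pixels:", same_output)
```

A dense network trained on consistently scrambled images could therefore do just as well as one trained on the originals, whereas a convolutional layer relies on neighbouring pixels staying together.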

### Next Steps for Better Performance

- **Use CNNs** (Chapter 8) - preserves spatial structure, learns hierarchical features
- **Higher resolution** - 16×16 loses significant detail from 160×160 originals
- **Transfer learning** - use pre-trained models (ResNet, EfficientNet)
- **Data augmentation** - artificially increase training data variety