# MNIST Deep Learning Notebook

**Goal:** Use the MNIST handwritten digits dataset to demonstrate:

- Artificial Neural Network (ANN)
- Activation Functions (ReLU, Softmax, etc.)
- Loss Functions (Categorical Crossentropy)
- Optimization Techniques (SGD, Adam, etc.)
- Regularization Techniques (L2, Dropout, Early Stopping)
- Callbacks (EarlyStopping, ModelCheckpoint, ReduceLROnPlateau, CSVLogger, TensorBoard)

This notebook is written for **technical and non-technical learners**, with **detailed comments** at each step.

In [None]:
#!/usr/bin/env python
"""MNIST Deep Learning Demo with ANN, Activations, Losses, Optimizers, Regularization, and Callbacks.

This notebook can be run cell-by-cell for teaching and learning.
"""

import os
import datetime

import numpy as np
import matplotlib.pyplot as plt

import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.utils import to_categorical
from tensorflow.keras import regularizers
from tensorflow.keras.callbacks import (
    EarlyStopping,
    ModelCheckpoint,
    ReduceLROnPlateau,
    CSVLogger,
    TensorBoard,
)

print("TensorFlow version:", tf.__version__)

## 1. Load and Explore the MNIST Dataset

MNIST is a classic dataset of **28×28 grayscale images** of handwritten digits (0–9), split into 60,000 training images and 10,000 test images.

In [None]:
# Load MNIST data from Keras datasets
(X_train, y_train), (X_test, y_test) = mnist.load_data()

print("Training data shape:", X_train.shape, y_train.shape)
print("Test data shape:", X_test.shape, y_test.shape)

# Number of classes (digits 0–9)
num_classes = len(np.unique(y_train))
print("Number of classes:", num_classes)

### Visualize Sample Digits

We plot a few sample images to understand how the raw data looks.

In [None]:
# Plot some sample images with labels
plt.figure(figsize=(8, 4))
for i in range(10):
    plt.subplot(2, 5, i + 1)
    plt.imshow(X_train[i], cmap='gray')
    plt.title(f"Label: {y_train[i]}")
    plt.axis('off')
plt.tight_layout()
plt.show()

## 2. Data Preprocessing

- Flatten 28×28 images into 784-dimensional vectors
- Scale pixel values to `[0, 1]`
- Convert labels to **one-hot encoded vectors** for multi-class classification.
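
As a dependency-free illustration of the three steps above, the sketch below applies them to a hypothetical 2×2 "image" rather than a real 28×28 digit. The helper names (`flatten`, `scale`, `one_hot`) are made up for this example; the notebook itself relies on NumPy and `to_categorical` for the same work.

```python
def flatten(image):
    """Turn a 2-D list of pixel rows into one flat list."""
    return [pixel for row in image for pixel in row]


def scale(pixels, max_value=255.0):
    """Map raw pixel intensities from [0, max_value] to [0, 1]."""
    return [p / max_value for p in pixels]


def one_hot(label, num_classes):
    """Return a list with 1.0 at index `label` and 0.0 elsewhere."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


# Hypothetical 2x2 image (a real MNIST image is 28x28)
tiny_image = [[0, 255], [51, 102]]

flat = scale(flatten(tiny_image))
print(flat)             # [0.0, 1.0, 0.2, 0.4]
print(one_hot(3, 10))   # 1.0 at index 3, 0.0 everywhere else
```

Scaling keeps all inputs in a similar range, and one-hot labels have the same shape as the 10-way softmax output they are compared against.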

In [None]:
# Flatten images: (n_samples, 28, 28) -> (n_samples, 784)
X_train_flat = X_train.reshape(-1, 28 * 28).astype("float32")
X_test_flat = X_test.reshape(-1, 28 * 28).astype("float32")

# Normalize pixel values to [0, 1]
X_train_flat /= 255.0
X_test_flat /= 255.0

# One-hot encode labels
y_train_cat = to_categorical(y_train, num_classes)
y_test_cat = to_categorical(y_test, num_classes)

print("X_train_flat shape:", X_train_flat.shape)
print("y_train_cat shape:", y_train_cat.shape)

## 3. Utility: Plot Training History

We define a helper function to easily compare **training vs. validation loss and accuracy**.

In [None]:
def plot_history(history, title_prefix="Model"):
    """Plot training & validation loss and accuracy from a Keras History object."""
    if history is None:
        print("No history to plot.")
        return

    hist = history.history
    epochs = range(1, len(hist.get('loss', [])) + 1)

    plt.figure(figsize=(12, 4))

    # Plot loss
    plt.subplot(1, 2, 1)
    plt.plot(epochs, hist.get('loss', []), label='Train Loss')
    if 'val_loss' in hist:
        plt.plot(epochs, hist['val_loss'], label='Val Loss')
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.title(f"{title_prefix} - Loss")
    plt.legend()

    # Plot accuracy (if available)
    plt.subplot(1, 2, 2)
    if 'accuracy' in hist:
        plt.plot(epochs, hist['accuracy'], label='Train Acc')
    if 'val_accuracy' in hist:
        plt.plot(epochs, hist['val_accuracy'], label='Val Acc')
    plt.xlabel('Epoch')
    plt.ylabel('Accuracy')
    plt.title(f"{title_prefix} - Accuracy")
    plt.legend()

    plt.tight_layout()
    plt.show()

## 4. Baseline ANN (No Regularization, Simple Optimizer)

Here we build a simple fully connected network:

- Input: 784 features
- Hidden layer: 128 neurons, **ReLU activation**
- Output layer: 10 neurons, **Softmax activation**
- Loss: **Categorical Crossentropy** (multi-class classification)
- Optimizer: **SGD** (Stochastic Gradient Descent)

This model helps us understand a basic ANN before adding optimizations and regularization.
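
To make the activation and loss functions listed above more concrete, here is a minimal pure-Python sketch of ReLU, Softmax, and categorical crossentropy. The input numbers are hypothetical and the problem has 3 classes instead of 10; Keras computes the same quantities on whole batches with optimized tensor operations.

```python
import math


def relu(values):
    """ReLU: keep positive values, replace negative ones with 0."""
    return [max(0.0, v) for v in values]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Loss = -sum(true * log(predicted)); small when the true class gets high probability."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


# Hypothetical hidden-layer values before activation
print("ReLU:", relu([-1.0, 0.5, 2.0]))   # [0.0, 0.5, 2.0]

# Hypothetical output-layer scores and a one-hot label (true class is index 2)
probs = softmax([1.0, 2.0, 3.0])
y_true = [0.0, 0.0, 1.0]
loss = categorical_crossentropy(y_true, probs)

print("Probabilities:", [round(p, 3) for p in probs])
print("Loss:", round(loss, 3))
```

The loss shrinks toward 0 as the probability assigned to the true class approaches 1, which is what training tries to achieve.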

In [None]:
# Build a simple baseline ANN model
baseline_model = Sequential([
    Dense(128, activation='relu', input_shape=(28 * 28,)),  # ReLU activation in hidden layer
    Dense(num_classes, activation='softmax')                # Softmax for multi-class output
])

# Compile the model
baseline_model.compile(
    optimizer='sgd',                      # Basic optimizer
    loss='categorical_crossentropy',      # Standard loss for multi-class classification
    metrics=['accuracy']
)

baseline_model.summary()

In [None]:
# Train the baseline model (few epochs for demo)
history_baseline = baseline_model.fit(
    X_train_flat,
    y_train_cat,
    epochs=5,
    batch_size=128,
    validation_split=0.1,
    verbose=1
)

plot_history(history_baseline, title_prefix="Baseline ANN (SGD)")

## 5. Improved ANN with Better Optimizer (Adam)

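Now we switch to an adaptive optimizer:

- **Adam** combines the ideas behind Momentum (a running average of past gradients) and RMSProp (per-parameter step sizes scaled by recent gradient magnitudes).
- With default settings it often reaches a given accuracy in fewer epochs than plain SGD.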

We also add a second hidden layer and widen the first, so any difference from the baseline reflects both the optimizer and the larger network.
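
The sketch below shows the mechanics of one SGD step and one Adam step on a toy one-parameter loss, `f(w) = (w - 3)^2`. The toy loss, the learning rate of `0.1`, and the function names are assumptions chosen for illustration; the `beta1`, `beta2`, and `eps` values follow the commonly used Adam defaults.

```python
import math


def grad(w):
    """Gradient of the toy loss f(w) = (w - 3)^2, whose minimum is at w = 3."""
    return 2.0 * (w - 3.0)


def sgd_step(w, g, lr=0.1):
    """Plain gradient descent: step against the gradient."""
    return w - lr * g


def adam_step(w, g, state, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; `state` holds the step count and two running averages."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * g       # momentum-like average of gradients
    state["v"] = beta2 * state["v"] + (1 - beta2) * g * g   # RMSProp-like average of squared gradients
    m_hat = state["m"] / (1 - beta1 ** state["t"])          # bias correction for the early steps
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return w - lr * m_hat / (math.sqrt(v_hat) + eps)


w_sgd, w_adam = 0.0, 0.0
adam_state = {"t": 0, "m": 0.0, "v": 0.0}
for _ in range(200):
    w_sgd = sgd_step(w_sgd, grad(w_sgd))
    w_adam = adam_step(w_adam, grad(w_adam), adam_state)

print(f"SGD:  w = {w_sgd:.4f}")
print(f"Adam: w = {w_adam:.4f}")
```

Both optimizers should end close to `w = 3` here. On a problem this easy plain SGD does just as well; the example is only meant to show how Adam builds its step from the two running averages.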

In [None]:
adam_model = Sequential([
    Dense(256, activation='relu', input_shape=(28 * 28,)),
    Dense(128, activation='relu'),
    Dense(num_classes, activation='softmax')
])

adam_model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

adam_model.summary()

In [None]:
history_adam = adam_model.fit(
    X_train_flat,
    y_train_cat,
    epochs=10,
    batch_size=128,
    validation_split=0.1,
    verbose=1
)

plot_history(history_adam, title_prefix="Improved ANN (Adam)")

## 6. ANN with Regularization + Callbacks

Now we combine multiple **advanced deep learning concepts**:

### Regularization Techniques
- **L2 regularization** (add a penalty on large weights to the loss)
- **Dropout** (randomly drop neurons during training)
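
The two techniques listed above can be sketched in a few lines of plain Python. This is a simplified illustration with hypothetical weights and activations; the penalty is written as `lam * sum(w^2)`, and dropout uses the "inverted" form that rescales surviving values during training.

```python
import random


def l2_penalty(weights, lam=1e-4):
    """L2 regularization term added to the loss: lam * sum of squared weights."""
    return lam * sum(w * w for w in weights)


def dropout(activations, rate=0.3, training=True):
    """Inverted dropout: zero each value with probability `rate` during training
    and scale the survivors by 1 / (1 - rate) so the expected sum is unchanged."""
    if not training:
        return list(activations)  # nothing is dropped at inference time
    keep = 1.0 - rate
    return [a / keep if random.random() < keep else 0.0 for a in activations]


weights = [0.5, -2.0, 1.5]
print("L2 penalty:", l2_penalty(weights))   # 1e-4 * (0.25 + 4.0 + 2.25)

random.seed(0)  # fixed seed so the example is repeatable
activations = [1.0] * 10
print("Training: ", [round(a, 2) for a in dropout(activations)])
print("Inference:", dropout(activations, training=False))
```

Larger weights raise the penalty, nudging the optimizer toward smaller weights, while dropout forces the network not to depend on any single unit.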

### Callbacks
- `EarlyStopping` → stop when validation loss stops improving
- `ModelCheckpoint` → save the best model (lowest validation loss) to disk
- `ReduceLROnPlateau` → reduce the learning rate when validation loss plateaus
- `CSVLogger` → log training metrics to a CSV file
- `TensorBoard` → visualize training (optional)

This section demonstrates techniques that are commonly used when training models in practice.
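
To show how the patience-based callbacks interact, the sketch below replays a hypothetical list of validation losses through simplified versions of `EarlyStopping`, `ModelCheckpoint`, and `ReduceLROnPlateau`. The function `simulate_callbacks` is made up for this example, and it ignores options the real callbacks offer (such as `min_delta` and `cooldown`), so treat it as an approximation of the logic rather than the Keras implementation.

```python
def simulate_callbacks(val_losses, es_patience=5, lr_patience=3,
                       lr=1e-3, factor=0.5, min_lr=1e-6):
    """Replay validation losses and report when training would stop,
    which epoch would be checkpointed, and the final learning rate."""
    best_loss = float("inf")
    best_epoch = None
    es_wait = 0   # epochs since the best loss (early stopping counter)
    lr_wait = 0   # epochs since the last improvement or learning-rate cut

    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            # New best: this is where a checkpoint would be saved
            best_loss, best_epoch = loss, epoch
            es_wait = lr_wait = 0
        else:
            es_wait += 1
            lr_wait += 1
            if lr_wait >= lr_patience:
                lr = max(lr * factor, min_lr)   # cut the learning rate
                lr_wait = 0
            if es_wait >= es_patience:
                return {"stopped_epoch": epoch, "best_epoch": best_epoch, "final_lr": lr}

    return {"stopped_epoch": None, "best_epoch": best_epoch, "final_lr": lr}


# Hypothetical validation losses: improvement stops after epoch 3
losses = [0.50, 0.40, 0.35, 0.36, 0.37, 0.36, 0.38, 0.39, 0.40, 0.41]
print(simulate_callbacks(losses))
# {'stopped_epoch': 8, 'best_epoch': 3, 'final_lr': 0.0005}
```

With these numbers the learning rate is halved once at epoch 6, training stops at epoch 8, and the model from epoch 3 is the one that would be kept.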

In [None]:
# Directory setup for saving models and logs
output_dir = "mnist_training_outputs"
os.makedirs(output_dir, exist_ok=True)

checkpoint_path = os.path.join(output_dir, "best_model.h5")
csv_log_path = os.path.join(output_dir, "training_log.csv")
log_dir = os.path.join(output_dir, "logs_" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))

print("Checkpoint path:", checkpoint_path)
print("CSV log path:", csv_log_path)
print("TensorBoard log dir:", log_dir)

In [None]:
# Build a regularized ANN model

regularized_model = Sequential([
    # First hidden layer with L2 regularization
    Dense(
        512,
        activation='relu',
        input_shape=(28 * 28,),
        kernel_regularizer=regularizers.l2(1e-4)  # L2 penalty on weights
    ),
    Dropout(0.3),  # Dropout: randomly zero 30% of the previous layer's outputs during training

    # Second hidden layer with L2 regularization
    Dense(
        256,
        activation='relu',
        kernel_regularizer=regularizers.l2(1e-4)
    ),
    Dropout(0.3),

    # Output layer (Softmax for multi-class classification)
    Dense(num_classes, activation='softmax')
])

# Compile with Adam optimizer and categorical crossentropy loss
regularized_model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

regularized_model.summary()

In [None]:
# Set up callbacks

early_stopping_cb = EarlyStopping(
    monitor='val_loss',           # watch validation loss
    patience=5,                   # epochs to wait for improvement
    restore_best_weights=True,    # roll back to best weights
    verbose=1
)

model_checkpoint_cb = ModelCheckpoint(
    filepath=checkpoint_path,
    monitor='val_loss',
    save_best_only=True,
    verbose=1
)

reduce_lr_cb = ReduceLROnPlateau(
    monitor='val_loss',
    factor=0.5,           # reduce LR by half
    patience=3,           # after 3 epochs of no improvement
    min_lr=1e-6,
    verbose=1
)

csv_logger_cb = CSVLogger(csv_log_path)

tensorboard_cb = TensorBoard(log_dir=log_dir)

callbacks_list = [
    early_stopping_cb,
    model_checkpoint_cb,
    reduce_lr_cb,
    csv_logger_cb,
    tensorboard_cb,
]

In [None]:
# Train the regularized model with callbacks
history_reg = regularized_model.fit(
    X_train_flat,
    y_train_cat,
    epochs=50,               # allow up to 50 epochs; EarlyStopping will likely stop training sooner
    batch_size=128,
    validation_split=0.1,
    callbacks=callbacks_list,
    verbose=1
)

plot_history(history_reg, title_prefix="Regularized ANN (Adam + L2 + Dropout + Callbacks)")

## 7. Evaluate Best Model on Test Data

We now load the **best saved model** (according to validation loss) and evaluate
it on the **test set** to see how well it generalizes.

In [None]:
# Load the best model from checkpoint (optional, but recommended)
if os.path.exists(checkpoint_path):
    best_model = load_model(checkpoint_path)
    print("Loaded best model from checkpoint.")
else:
    best_model = regularized_model
    print("Checkpoint not found; using the last trained regularized model.")

# Evaluate on test data
test_loss, test_acc = best_model.evaluate(X_test_flat, y_test_cat, verbose=0)
print(f"Test Loss: {test_loss:.4f}")
print(f"Test Accuracy: {test_acc:.4f}")

## 8. Make Predictions and Visualize

Let's predict some digits from the test set and compare predictions with true labels.
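
Turning softmax outputs into predicted labels is just a matter of picking the index with the highest probability. The sketch below does this in plain Python for three hypothetical probability vectors from a 3-class problem; the helper names `argmax` and `accuracy` are made up for this example.

```python
def argmax(values):
    """Index of the largest value (the first one if there is a tie)."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return correct / len(true_labels)


# Hypothetical softmax outputs for three images
pred_probs = [
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.3, 0.4, 0.3],
]
true_labels = [1, 0, 2]

pred_labels = [argmax(p) for p in pred_probs]
print("Predicted:", pred_labels)   # [1, 0, 1]
print("Accuracy:", round(accuracy(true_labels, pred_labels), 3))   # 2 of 3 correct
```

The third image is misclassified even though the model was unsure (0.4 versus 0.3), which is why inspecting the probabilities can be as informative as the final label.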

In [None]:
# Predict probabilities for the first 10 test images
y_pred_probs = best_model.predict(X_test_flat[:10])
y_pred = np.argmax(y_pred_probs, axis=1)

print("Predicted labels:", y_pred)
print("True labels:     ", y_test[:10])

# Visualize the corresponding images
plt.figure(figsize=(8, 4))
for i in range(10):
    plt.subplot(2, 5, i + 1)
    plt.imshow(X_test[i], cmap='gray')
    plt.title(f"Pred: {y_pred[i]}\nTrue: {y_test[i]}")
    plt.axis('off')
plt.tight_layout()
plt.show()

## 9. Summary of Concepts Applied

In this single notebook, we applied the following deep learning concepts on MNIST:

1. **Artificial Neural Network (ANN)**
   - Fully connected layers using `Dense`.

2. **Activation Functions**
   - Hidden layers: `ReLU`
   - Output layer: `Softmax` for multi-class classification.

3. **Loss Function**
   - `categorical_crossentropy` for multi-class classification.

4. **Optimization Techniques**
   - Baseline: `SGD`
   - Improved: `Adam` optimizer with learning rate `1e-3`.

5. **Regularization Techniques**
   - L2 weight regularization (`kernel_regularizer=regularizers.l2(...)`)
   - Dropout layers to reduce overfitting
   - Early stopping (via callback)

6. **Callbacks**
   - `EarlyStopping` → stop training when validation loss stops improving
   - `ModelCheckpoint` → save the best model
   - `ReduceLROnPlateau` → lower the learning rate when validation loss plateaus
   - `CSVLogger` → log training history to CSV
   - `TensorBoard` → visualize training with TensorBoard

You can further extend this notebook by:
- Trying different activation functions (e.g., `tanh`, `LeakyReLU`)
- Changing optimizers (e.g., `RMSprop`, `AdamW`)
- Adding Batch Normalization
- Converting this fully connected ANN to a Convolutional Neural Network (CNN).
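
As a starting point for the first suggestion, this sketch compares ReLU with two of the alternative activations mentioned above on a few sample inputs. The `negative_slope` of `0.01` is an arbitrary illustrative value and not necessarily the default of any particular library.

```python
import math


def relu(x):
    """ReLU: 0 for negative inputs, identity for positive inputs."""
    return max(0.0, x)


def leaky_relu(x, negative_slope=0.01):
    """Like ReLU, but negative inputs are scaled by a small slope instead of zeroed."""
    return x if x > 0 else negative_slope * x


def tanh(x):
    """Hyperbolic tangent: squashes any input into the range (-1, 1)."""
    return math.tanh(x)


for x in [-2.0, -0.5, 0.0, 0.5, 2.0]:
    print(f"x={x:+.1f}  relu={relu(x):.3f}  "
          f"leaky_relu={leaky_relu(x):+.3f}  tanh={tanh(x):+.3f}")
```

Unlike ReLU, both alternatives give a nonzero output for negative inputs, so some gradient can still flow through units that would otherwise be inactive.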