# MNIST Handwritten Digit Classification

This notebook demonstrates a complete pipeline for classifying handwritten digits from the MNIST dataset using neural networks in Keras/TensorFlow. The workflow is modular and mirrors the module layout of the accompanying Python project (`src/`).

## Project Folder Structure

```
L36_HomeWork/
├── main.py
├── src/
│   ├── __init__.py
│   ├── data_loader.py
│   ├── model.py
│   ├── training.py
│   └── evaluation.py
├── output/               # auto-created, stores saved plots
├── MNIST_Classification_Notebook.ipynb
├── README.md
└── .gitignore
```

- **main.py**: Orchestrates the full pipeline.
- **src/data_loader.py**: Loads, explores, and preprocesses MNIST data.
- **src/model.py**: Defines baseline and improved neural network architectures.
- **src/training.py**: Handles model training and training history plotting.
- **src/evaluation.py**: Performs evaluation, prediction visualization, error analysis, and confusion matrix plotting.
- **output/**: Stores generated plots and images.
- **MNIST_Classification_Notebook.ipynb**: The notebook version of the pipeline (this file).
- **README.md**: Project documentation.
- **.gitignore**: Git ignore rules.

---

In [None]:
# Import Dependencies
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.optimizers import Adam
from sklearn.metrics import confusion_matrix
import os


## Data Loading and Exploration

We load the MNIST dataset with Keras and print the shape and label distribution of the training and test sets.

In [None]:
# Load MNIST data
(x_train, y_train), (x_test, y_test) = mnist.load_data()
print(f"Train shape: {x_train.shape}, {y_train.shape}")
print(f"Test shape: {x_test.shape}, {y_test.shape}")
print(f"Train label distribution: {np.bincount(y_train)}")
print(f"Test label distribution: {np.bincount(y_test)}")

## Data Preprocessing

We normalize the images to the [0,1] range, flatten each 28x28 image to a 784-dimensional vector, and one-hot encode the labels for use in categorical classification.

In [None]:
# Normalize, flatten, and one-hot encode
def preprocess_data(x, y):
    x = x.astype('float32') / 255.0
    x = x.reshape((x.shape[0], -1))
    y = to_categorical(y, 10)
    return x, y

x_train_proc, y_train_proc = preprocess_data(x_train, y_train)
x_test_proc, y_test_proc = preprocess_data(x_test, y_test)
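
To make the three preprocessing steps concrete, here is a minimal NumPy-only sketch on a small synthetic batch. The random `images` array and the `labels` values are made up for illustration, and `np.eye(10)[labels]` is used as a stand-in for `to_categorical`, on the assumption that both produce the same one-hot rows.

```python
import numpy as np

# Synthetic stand-in for a batch of MNIST images: 3 images, 28x28, uint8 pixels.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(3, 28, 28), dtype=np.uint8)
labels = np.array([7, 0, 3])  # hypothetical labels

# Scale pixel values from [0, 255] to [0, 1].
scaled = images.astype("float32") / 255.0

# Flatten each 28x28 image into a 784-dimensional row vector.
flat = scaled.reshape((scaled.shape[0], -1))

# One-hot encode: row i has a 1 in column labels[i] and zeros elsewhere.
one_hot = np.eye(10, dtype="float32")[labels]

print(flat.shape)     # (3, 784)
print(one_hot.shape)  # (3, 10)
```

The `-1` in `reshape` lets NumPy infer the flattened size, so the same call would work for images of a different resolution.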

## Visualize Random Samples

Let's visualize 16 random samples from the training set in a 4×4 grid, with their true labels. The figure will also be saved to `output/mnist_samples.png`.

In [None]:
# Visualize 16 random samples
def visualize_samples(x, y, save_path=None):
    idxs = np.random.choice(len(x), 16, replace=False)
    fig, axes = plt.subplots(4, 4, figsize=(6, 6))
    for i, ax in enumerate(axes.flat):
        ax.imshow(x[idxs[i]], cmap='gray')
        ax.set_title(f"Label: {y[idxs[i]]}")
        ax.axis('off')
    plt.tight_layout()
    if save_path:
        plt.savefig(save_path)
        print(f"Saved sample grid to {save_path}")
    plt.show()
    plt.close()

os.makedirs('output', exist_ok=True)
visualize_samples(x_train, y_train, save_path='output/mnist_samples.png')

## Model Architecture: Baseline Model

We define the baseline neural network model as follows:
- Dense(512, ReLU) → Dropout(0.2)
- Dense(256, ReLU) → Dropout(0.2)
- Dense(128, ReLU) → Dropout(0.2)
- Dense(10, Softmax)

Let's build and display the model summary.

In [None]:
# Build baseline model
def build_model(input_dim=784, num_classes=10):
    model = Sequential([
        Dense(512, activation='relu', input_shape=(input_dim,)),
        Dropout(0.2),
        Dense(256, activation='relu'),
        Dropout(0.2),
        Dense(128, activation='relu'),
        Dropout(0.2),
        Dense(num_classes, activation='softmax')
    ])
    return model

baseline_model = build_model()
baseline_model.summary()
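
As a sanity check on the summary, the parameter count of a stack of dense layers can be worked out by hand: each layer contributes `n_in * n_out` weights plus `n_out` biases, and Dropout layers add none. The helper name `dense_param_count` below is hypothetical and not part of the project.

```python
def dense_param_count(layer_sizes):
    """Count weights and biases for a stack of fully connected layers."""
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus bias vector
    return total

# Input dimension followed by the width of each Dense layer in the baseline.
baseline_sizes = [784, 512, 256, 128, 10]
print(dense_param_count(baseline_sizes))  # 567434
```

Most of the parameters (401,920 of them) sit in the first layer, because it connects all 784 input pixels to 512 units.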

## Model Architecture: Improved Model

The improved model adds a fourth hidden layer (64 units), raises the dropout rate from 0.2 to 0.3, and is trained with a lower learning rate (0.0005 instead of 0.001).

In [None]:
# Build improved model
def build_improved_model(input_dim=784, num_classes=10):
    model = Sequential([
        Dense(512, activation='relu', input_shape=(input_dim,)),
        Dropout(0.3),
        Dense(256, activation='relu'),
        Dropout(0.3),
        Dense(128, activation='relu'),
        Dropout(0.3),
        Dense(64, activation='relu'),
        Dropout(0.3),
        Dense(num_classes, activation='softmax')
    ])
    return model

improved_model = build_improved_model()
improved_model.summary()
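
To illustrate what a dropout rate of 0.3 does during training, here is a rough NumPy sketch of inverted dropout: each activation is zeroed with probability `rate`, and the survivors are scaled by `1 / (1 - rate)` so the expected value stays the same. The function `dropout_train` is a hypothetical illustration, not the Keras implementation.

```python
import numpy as np

def dropout_train(x, rate, rng):
    """Zero each activation with probability `rate`, then rescale the rest."""
    keep_mask = rng.random(x.shape) >= rate
    return x * keep_mask / (1.0 - rate)

rng = np.random.default_rng(42)
activations = np.ones((1000, 64), dtype="float32")  # dummy activations
dropped = dropout_train(activations, rate=0.3, rng=rng)

zero_fraction = float((dropped == 0).mean())
print(f"fraction zeroed: {zero_fraction:.3f}")       # close to 0.3
print(f"mean after dropout: {dropped.mean():.3f}")   # close to 1.0
```

At inference time dropout is switched off, which is why no rescaling is needed when calling `predict` or `evaluate`.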

## Model Compilation

We compile both models using the Adam optimizer, categorical crossentropy loss, and accuracy as the metric. The improved model uses a lower learning rate.

In [None]:
# Compile models
def compile_model(model, lr=0.001):
    optimizer = Adam(learning_rate=lr)
    model.compile(optimizer=optimizer,
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])

compile_model(baseline_model)
compile_model(improved_model, lr=0.0005)
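
For intuition about the loss being minimized, categorical crossentropy for one sample is the negative log of the probability assigned to the true class, averaged over the batch. The sketch below is a plain NumPy version with made-up probabilities; the clipping constant `eps` is an assumption chosen for illustration.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum(y_true * log(y_pred))."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

# Two samples, three classes; true classes are 0 and 1.
y_true = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1]])

loss = categorical_crossentropy(y_true, y_pred)
print(f"{loss:.4f}")  # 0.2899, the mean of -ln(0.7) and -ln(0.8)
```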

## Model Training

We train the baseline model for 20 epochs with a batch size of 128 and a 10% validation split.

In [None]:
# Train baseline model
history = baseline_model.fit(
    x_train_proc, y_train_proc,
    epochs=20,
    batch_size=128,
    validation_split=0.1,
    verbose=2
)
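
The settings above determine how many gradient updates happen per epoch. A quick back-of-the-envelope calculation, assuming the standard 60,000-image MNIST training set:

```python
import math

n_train = 60_000          # size of the MNIST training set
validation_split = 0.1
batch_size = 128

n_val = int(n_train * validation_split)          # samples held out for validation
n_fit = n_train - n_val                          # samples used for gradient updates
steps_per_epoch = math.ceil(n_fit / batch_size)  # last batch may be smaller

print(n_val, n_fit, steps_per_epoch)  # 6000 54000 422
```

Note that Keras takes the validation samples from the end of the arrays passed to `fit`, before any shuffling, so the split is the same on every run.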

## Plot Training History

Let's plot the training and validation loss and accuracy curves. The figure will be saved to `output/training_history.png`.

In [None]:
# Plot and save training history
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.plot(history.history['loss'], label='Train Loss')
plt.plot(history.history['val_loss'], label='Val Loss')
plt.title('Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.subplot(1, 2, 2)
plt.plot(history.history['accuracy'], label='Train Acc')
plt.plot(history.history['val_accuracy'], label='Val Acc')
plt.title('Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.tight_layout()
os.makedirs('output', exist_ok=True)
plt.savefig('output/training_history.png')
plt.show()
plt.close()

## Model Evaluation

Evaluate the trained baseline model on the test set and print the test accuracy.

In [None]:
# Evaluate on test set
test_loss, test_acc = baseline_model.evaluate(x_test_proc, y_test_proc, verbose=2)
print(f"Test accuracy: {test_acc*100:.2f}%")

## Single Prediction Visualization

Pick a test image of a specific digit (e.g., 7), display the image and a bar chart of softmax probabilities. Save the figure to `output/prediction.png`.

In [None]:
# Visualize prediction for a specific digit (e.g., 7)
def predict_single(model, x, y, digit=7, save_path=None):
    idxs = np.where(np.argmax(y, axis=1) == digit)[0]
    idx = np.random.choice(idxs)
    img = x[idx].reshape(28, 28)  # use the function argument, not the global x_test
    pred = model.predict(x[idx:idx+1])[0]
    plt.figure(figsize=(8, 4))
    plt.subplot(1, 2, 1)
    plt.imshow(img, cmap='gray')
    plt.title(f"True: {digit}")
    plt.axis('off')
    plt.subplot(1, 2, 2)
    plt.bar(range(10), pred)
    plt.title("Softmax Probabilities")
    plt.xlabel("Digit")
    plt.ylabel("Probability")
    plt.xticks(range(10))
    plt.tight_layout()
    if save_path:
        plt.savefig(save_path)
        print(f"Saved prediction to {save_path}")
    plt.show()
    plt.close()
    return np.argmax(pred)

predict_single(baseline_model, x_test_proc, y_test_proc, digit=7, save_path='output/prediction.png')
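
The bar chart shows the output of the final softmax layer. As a reminder of what that layer computes, here is a small, numerically stable softmax in NumPy applied to made-up logits (the values are hypothetical, chosen so that digit 7 dominates).

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=-1, keepdims=True)

# Hypothetical pre-softmax scores for the ten digit classes.
logits = np.array([0.5, 0.1, 0.2, 0.3, 0.0, 0.1, 0.4, 4.0, 0.2, 1.5])
probs = softmax(logits)

print(probs.argmax())              # 7
print(f"{probs.sum():.6f}")        # 1.000000
```

Subtracting the maximum before exponentiating does not change the result but avoids overflow for large logits.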

## Error Analysis: Misclassified Images

Find all misclassified test images and display up to 25 of them, chosen at random, in a 5×5 grid with predicted (red) and actual (blue) labels. Save the figure to `output/misclassified.png`.

In [None]:
# Analyze misclassified images
def analyze_errors(model, x, y_true, save_path=None):
    y_pred = np.argmax(model.predict(x), axis=1)
    y_true_labels = np.argmax(y_true, axis=1)
    errors = np.where(y_pred != y_true_labels)[0]
    print(f"Total misclassified: {len(errors)}")
    idxs = np.random.choice(errors, min(25, len(errors)), replace=False)
    plt.figure(figsize=(10, 10))
    for i, idx in enumerate(idxs):
        plt.subplot(5, 5, i+1)
        img = x[idx].reshape(28, 28)
        plt.imshow(img, cmap='gray')
        plt.title(f"Pred: {y_pred[idx]}", color='red')
        plt.xlabel(f"True: {y_true_labels[idx]}", color='blue')
        # Hide ticks only; axis('off') would also hide the xlabel set above
        plt.xticks([])
        plt.yticks([])
    plt.tight_layout()
    if save_path:
        plt.savefig(save_path)
        print(f"Saved misclassified grid to {save_path}")
    plt.show()
    plt.close()

analyze_errors(baseline_model, x_test_proc, y_test_proc, save_path='output/misclassified.png')
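
The core of the error analysis is the index selection, which can be tried in isolation. The sketch below uses hypothetical label arrays rather than real model output and prints the selected errors instead of plotting them.

```python
import numpy as np

# Hypothetical labels for 8 samples (illustrative, not real model output).
y_true_labels = np.array([3, 5, 7, 1, 9, 4, 4, 8])
y_pred = np.array([3, 3, 7, 1, 4, 4, 9, 8])

# Indices where the prediction disagrees with the true label.
errors = np.where(y_pred != y_true_labels)[0]
print(errors)  # [1 4 6]

# Sample at most 25 errors without replacement, since the grid has 25 cells.
rng = np.random.default_rng(0)
shown = rng.choice(errors, size=min(25, len(errors)), replace=False)

for idx in sorted(shown):
    print(f"index {idx}: predicted {y_pred[idx]}, true {y_true_labels[idx]}")
```

Capping the sample size with `min(25, len(errors))` keeps `choice` from raising an error when fewer than 25 images are misclassified.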

## Plot Confusion Matrix

Generate and display a 10×10 confusion matrix heatmap using seaborn. Save the figure to `output/confusion_matrix.png`. Print the most confused digit pairs.

In [None]:
# Plot confusion matrix and print top confused pairs
def plot_confusion_matrix(model, x, y_true, save_path=None):
    y_pred = np.argmax(model.predict(x), axis=1)
    y_true_labels = np.argmax(y_true, axis=1)
    cm = confusion_matrix(y_true_labels, y_pred)
    plt.figure(figsize=(8, 6))
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', cbar=False)
    plt.xlabel('Predicted')
    plt.ylabel('Actual')
    plt.title('Confusion Matrix')
    if save_path:
        plt.savefig(save_path)
        print(f"Saved confusion matrix to {save_path}")
    plt.show()
    plt.close()
    # Print top confused pairs
    cm2 = cm.copy()
    np.fill_diagonal(cm2, 0)
    pairs = np.dstack(np.unravel_index(np.argsort(cm2.ravel())[::-1], (10, 10)))[0]
    print("Top confused digit pairs:")
    for i in range(5):
        a, b = pairs[i]
        if cm2[a, b] > 0:
            # Rows of cm are actual digits, columns are predicted digits
            print(f"True {a} predicted as {b}: {cm2[a, b]} times")

plot_confusion_matrix(baseline_model, x_test_proc, y_test_proc, save_path='output/confusion_matrix.png')
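
To see how the matrix and the "top confused pairs" are derived, here is a NumPy-only sketch on a handful of made-up labels. The helper names `confusion_counts` and `top_confused_pairs` are hypothetical; the row/column convention (rows are actual, columns are predicted) is assumed to match `sklearn.metrics.confusion_matrix`.

```python
import numpy as np

def confusion_counts(y_true, y_pred, num_classes):
    """Rows index the true class, columns the predicted class."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)  # accumulate repeated index pairs
    return cm

def top_confused_pairs(cm, k=5):
    """Return up to k (true, predicted, count) triples, largest count first."""
    off_diag = cm.copy()
    np.fill_diagonal(off_diag, 0)  # ignore correct predictions
    order = np.argsort(off_diag, axis=None)[::-1][:k]
    rows, cols = np.unravel_index(order, off_diag.shape)
    return [(int(r), int(c), int(off_diag[r, c]))
            for r, c in zip(rows, cols) if off_diag[r, c] > 0]

# Hypothetical labels: digit 4 is mistaken for 9 twice.
y_true = np.array([4, 4, 4, 9, 9, 3, 3, 5])
y_pred = np.array([9, 9, 4, 9, 4, 3, 5, 5])

cm = confusion_counts(y_true, y_pred, num_classes=10)
pairs = top_confused_pairs(cm)
print(pairs[0])  # (4, 9, 2)
```

The pairs are directional: a true 4 predicted as 9 is counted separately from a true 9 predicted as 4.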

## Model Comparison

Now, let's train the improved model and compare both models' test accuracies. We'll display a results table.

In [None]:
# Train improved model and compare
improved_history = improved_model.fit(
    x_train_proc, y_train_proc,
    epochs=20,
    batch_size=128,
    validation_split=0.1,
    verbose=2
)
improved_loss, improved_acc = improved_model.evaluate(x_test_proc, y_test_proc, verbose=2)
print(f"Improved model test accuracy: {improved_acc*100:.2f}%")

# Results table (column widths aligned with the header)
print("\n| Model    | Test Accuracy |")
print("|----------|---------------|")
print(f"| Baseline | {test_acc*100:>12.2f}% |")
print(f"| Improved | {improved_acc*100:>12.2f}% |")

## Colab GPU Tip

> **Tip:** If running this notebook on [Google Colab](https://colab.research.google.com/), you can enable GPU acceleration for much faster training:
>
> - Go to `Runtime` > `Change runtime type` > set `Hardware accelerator` to `GPU`.
> - Then rerun the notebook cells.

This typically shortens training and evaluation time, although for small dense networks like the ones above the gain may be modest.