---
# 🧠 MNIST & Fashion-MNIST Classification Project
---

##### [**Author**] : Aditya Saxena

This project explores both classic and modern techniques for image classification using two datasets:

- **MNIST**: 70,000 grayscale images of handwritten digits (0–9)
- **Fashion-MNIST**: 70,000 grayscale images of fashion items (10 categories)

We implement and evaluate:
- Linear classifiers from scratch: Perceptron, Average Perceptron, Pegasos (SVM)
- Deep learning models using Keras: Dense Neural Networks (DNN) and Convolutional Neural Networks (CNN)
- Evaluation with hinge loss and accuracy, along with hyperparameter tuning

All experiments are performed using **Python** and **Jupyter Notebooks** for clarity, interactivity, and reproducibility.


---
## Defining Utility Functions
---

### 🔹 `zero_one_loss` – General Classification Loss
---

**Purpose:**  
Measures the fraction of predictions that differ from the true labels.

**Formula:**  
For \( n \) samples:

$$
\text{Loss} = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}[\hat{y}_i \ne y_i]
$$

**Parameters:**  
- `predictions`: 1D NumPy array of predicted labels  
- `labels`: 1D NumPy array of true labels  

**Returns:**  
- Average misclassification rate (float between 0 and 1)


In [15]:
import numpy as np

def zero_one_loss(predictions, labels):
    """
    Computes the proportion of incorrect predictions.

    Parameters:
    - predictions: 1D NumPy array of predicted class labels
    - labels: 1D NumPy array of true class labels

    Returns:
    - Zero-One Loss: float in [0, 1]
    """
    # Boolean mask: True wherever the prediction differs from the true label
    errors = predictions != labels

    # Mean of the boolean mask = number of mismatches / total number of samples
    return np.mean(errors)
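Because the comparison is elementwise, the same function works both for the ±1 labels used by the linear classifiers and for the ten-class digit labels of MNIST. A minimal sketch with hand-picked toy labels (not taken from either dataset); it re-declares `zero_one_loss` so the snippet runs on its own:

```python
import numpy as np

def zero_one_loss(predictions, labels):
    # Fraction of positions where the predicted label differs from the true label
    return np.mean(predictions != labels)

# Toy multiclass example (digit labels 0-9): 2 of 5 predictions are wrong
true_digits = np.array([3, 7, 0, 9, 1])
pred_digits = np.array([3, 1, 0, 9, 7])
print(zero_one_loss(pred_digits, true_digits))  # 0.4

# Toy binary example (+1/-1 labels): 1 of 4 predictions is wrong
print(zero_one_loss(np.array([1, -1, -1, 1]), np.array([1, -1, 1, 1])))  # 0.25
```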


### 🔹 Hinge Loss (Single Example)

---

**Purpose:**  
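Measures the margin-based loss for a single example: it is 0 when the example is correctly classified with a margin of at least 1, and grows linearly as the margin shrinks or becomes negative. This is the loss minimized by SVMs.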

**Formula:**

$$
\text{Loss} = \max(0,\ 1 - y(\theta^\top x + \theta_0))
$$

**Parameters:**  
- `feature_vector (x)`: input features  
- `label (y)`: true class label (±1)  
- `theta`: weight vector  
- `theta_0`: bias term


In [16]:
def hinge_loss_single(feature_vector, label, theta, theta_0):
    # Raw model output (score): linear combination of weights and input, plus bias
    score = theta @ feature_vector + theta_0  # Equivalent to np.dot(theta, feature_vector) + theta_0

    # Hinge loss:
    # If the example is correctly classified with margin at least 1 (label * score >= 1), the loss is 0.
    # Otherwise, the loss grows linearly as the example moves into or past the margin.
    return max(0, 1 - label * score)
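To make the piecewise behaviour concrete, here is a small worked sketch with an arbitrarily chosen `theta = [1, -1]` and `theta_0 = 0.5` (illustrative values only). It covers the four regimes: outside the margin, inside the margin, on the boundary, and on the wrong side.

```python
import numpy as np

def hinge_loss_single(feature_vector, label, theta, theta_0):
    # Score of the linear model, then hinge loss max(0, 1 - y * score)
    score = theta @ feature_vector + theta_0
    return max(0.0, 1.0 - label * score)

theta = np.array([1.0, -1.0])
theta_0 = 0.5

# Correct side, margin 1.5 >= 1: no loss
print(hinge_loss_single(np.array([2.0, 1.0]), 1, theta, theta_0))   # score 1.5 -> 0.0
# Correct side but inside the margin (score 0.5): partial loss
print(hinge_loss_single(np.array([1.0, 1.0]), 1, theta, theta_0))   # score 0.5 -> 0.5
# Exactly on the decision boundary (score 0): loss 1
print(hinge_loss_single(np.array([0.5, 1.0]), 1, theta, theta_0))   # score 0.0 -> 1.0
# Wrong side: label -1 but score 1.5
print(hinge_loss_single(np.array([2.0, 1.0]), -1, theta, theta_0))  # -> 2.5
```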


### 🔹 Hinge Loss (Full Dataset)
---
**Purpose:**  
Computes the **average hinge loss** over all training samples.

**Formula:**  
$$
\text{Average Loss} = \frac{1}{n} \sum_{i=1}^{n} \max(0,\ 1 - y_i(\theta^\top x_i + \theta_0))
$$

**Parameters:**  
- `feature_matrix`: matrix of input features (shape: n × d)  
- `labels`: true class labels (±1)  
- `theta`: weight vector  
- `theta_0`: bias term

In [17]:
def hinge_loss_full(feature_matrix, labels, theta, theta_0):
    # Initialize total loss to accumulate individual hinge losses
    total_loss = 0

    # Get the number of data points (rows) in the dataset
    n_samples = feature_matrix.shape[0]

    # Loop through each example and compute the hinge loss
    for i in range(n_samples):
        # Add the hinge loss for the i-th sample to the total
        total_loss += hinge_loss_single(feature_matrix[i], labels[i], theta, theta_0)

    # Return the average hinge loss over all samples
    return total_loss / n_samples
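The Python loop above is easy to read but slow on 60,000-row datasets. A vectorized variant computes all scores with one matrix-vector product. The sketch below (`hinge_loss_full_vectorized` and `hinge_loss_full_loop` are hypothetical names, and the data are random toy values) checks that both versions agree:

```python
import numpy as np

def hinge_loss_full_vectorized(feature_matrix, labels, theta, theta_0):
    # Scores for all n examples at once: shape (n,)
    scores = feature_matrix @ theta + theta_0
    # Elementwise hinge loss, then the average over examples
    return np.mean(np.maximum(0.0, 1.0 - labels * scores))

def hinge_loss_full_loop(feature_matrix, labels, theta, theta_0):
    # Reference loop version, same logic as the notebook cell above
    total = 0.0
    for x, y in zip(feature_matrix, labels):
        total += max(0.0, 1.0 - y * (theta @ x + theta_0))
    return total / feature_matrix.shape[0]

# Random toy data, used only to compare the two implementations
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = rng.choice([-1, 1], size=100)
theta = rng.normal(size=5)
theta_0 = 0.1

print(np.isclose(hinge_loss_full_vectorized(X, y, theta, theta_0),
                 hinge_loss_full_loop(X, y, theta, theta_0)))  # True
```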


### 🔹 Perceptron Single Step Update

---

**Purpose:**  
Performs a single update of the Perceptron algorithm if the current prediction is incorrect or on the decision boundary.

**Formula:**  

If  

$$
y(\theta^\top x + \theta_0) \leq 0
$$

then update:  

$$
\theta := \theta + yx,\quad \theta_0 := \theta_0 + y
$$

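Otherwise, no change. In code, the test uses a small tolerance (`<= 1e-7` instead of `<= 0`) so that scores that are zero up to floating-point error also trigger an update.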

**Parameters:**  
- `feature_vector (x)`: input feature vector  
- `label (y)`: true class label (±1)  
- `current_theta`: current weight vector  
- `current_theta_0`: current bias term

In [18]:
def perceptron_single_step_update(feature_vector, label, current_theta, current_theta_0):
    # Compute the prediction score (dot product + bias)
    prediction = np.dot(current_theta, feature_vector) + current_theta_0

    # If the prediction is incorrect or on the boundary (y * score <= 0)
    # We use a small epsilon (1e-7) to handle floating point precision issues
    if label * prediction <= 1e-7:
        # Update weights and bias in the direction of the true label
        updated_theta = current_theta + label * feature_vector
        updated_theta_0 = current_theta_0 + label
        return updated_theta, updated_theta_0

    # Otherwise the example is correctly classified (positive margin), so no update is needed
    return current_theta, current_theta_0
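A short trace helps show when the rule fires. The sketch below re-declares the update (same logic as the cell above) and applies it to three hand-picked 2D points, starting from all-zero weights:

```python
import numpy as np

def perceptron_single_step_update(feature_vector, label, current_theta, current_theta_0):
    # Update on a mistake or a (near-)zero margin, exactly as in the cell above
    if label * (current_theta @ feature_vector + current_theta_0) <= 1e-7:
        return current_theta + label * feature_vector, current_theta_0 + label
    return current_theta, current_theta_0

theta, theta_0 = np.zeros(2), 0.0

# Step 1: theta is all zeros, so the score is 0 (on the boundary) -> update
theta, theta_0 = perceptron_single_step_update(np.array([2.0, 0.0]), 1, theta, theta_0)
# theta -> [2, 0], theta_0 -> 1

# Step 2: score = 2*(-2) + 0*1 + 1 = -3 with label -1, margin 3 > 0 -> no change
theta, theta_0 = perceptron_single_step_update(np.array([-2.0, 1.0]), -1, theta, theta_0)
# theta stays [2, 0], theta_0 stays 1

# Step 3: score = 2*1 + 0*3 + 1 = 3 but label is -1 (mistake) -> update
theta, theta_0 = perceptron_single_step_update(np.array([1.0, 3.0]), -1, theta, theta_0)
# theta -> [2, 0] - [1, 3] = [1, -3], theta_0 -> 0
print(theta, theta_0)
```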

### 🔹 Perceptron (Full Algorithm)

---

**Purpose:**  
Trains a linear classifier by iteratively updating weights using the Perceptron learning rule over multiple passes through the dataset.

**Formula:**  
For each training example \((x_i, y_i)\), if:

$$
y_i(\theta^\top x_i + \theta_0) \leq 0
$$

Then update:

$$
\theta := \theta + y_i x_i,\quad \theta_0 := \theta_0 + y_i
$$

Repeat for \( T \) full passes over the training data.

**Parameters:**  
- `feature_matrix`: NumPy matrix (n × d), where each row is a data point  
- `labels`: NumPy array of length \( n \) with true class labels (±1)  
- `T`: number of iterations (epochs) over the full dataset


In [19]:
def perceptron(feature_matrix, labels, T):
    """
    Full Perceptron training loop.

    Parameters:
    - feature_matrix: 2D NumPy array where each row is a feature vector
    - labels: 1D NumPy array of labels (±1)
    - T: number of full passes through the data (epochs)

    Returns:
    - theta: learned weight vector
    - theta_0: learned bias term
    """
    (nsamples, nfeatures) = feature_matrix.shape

    # Initialize weights and bias to zero
    theta = np.zeros(nfeatures)
    theta_0 = 0.0

    # Repeat for T full iterations over the dataset
    for t in range(T):
        # get_order provides a shuffled index order each pass
        for i in get_order(nsamples):
            x_i = feature_matrix[i]
            y_i = labels[i]

            # Apply single-step Perceptron update
            theta, theta_0 = perceptron_single_step_update(x_i, y_i, theta, theta_0)

    return theta, theta_0
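`get_order` is not defined in this section. The self-contained sketch below swaps in a hypothetical stand-in, `get_order_sketch`, that simply draws a fresh random permutation with NumPy each epoch (an assumption, not necessarily what the project's helper does), and trains on a tiny linearly separable toy set. For separable data the perceptron convergence theorem bounds the number of updates, so after enough epochs every training point is classified correctly:

```python
import numpy as np

def get_order_sketch(n_samples, rng):
    # Hypothetical stand-in for get_order: a fresh random permutation of indices
    return rng.permutation(n_samples)

def perceptron_sketch(feature_matrix, labels, T, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros(feature_matrix.shape[1])
    theta_0 = 0.0
    for _ in range(T):  # T full passes (epochs)
        for i in get_order_sketch(feature_matrix.shape[0], rng):
            x_i, y_i = feature_matrix[i], labels[i]
            # Same single-step rule as perceptron_single_step_update
            if y_i * (theta @ x_i + theta_0) <= 1e-7:
                theta = theta + y_i * x_i
                theta_0 = theta_0 + y_i
    return theta, theta_0

# Toy separable data: the sign of the first feature determines the label
X = np.array([[2.0, 0.0], [3.0, 1.0], [-2.0, 0.0], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])

theta, theta_0 = perceptron_sketch(X, y, T=10)
predictions = np.where(X @ theta + theta_0 > 0, 1, -1)
print(np.mean(predictions == y))  # 1.0
```

On real MNIST features the data are generally not linearly separable, so training accuracy will usually stay below 1.0 no matter how large `T` is.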


### 🔹 Pegasos Single Step Update

---

**Purpose:**  
Performs one update step of the Pegasos algorithm for Support Vector Machines (SVM): a stochastic sub-gradient step on the L2-regularized hinge loss.

**Formula:**  
If the margin is violated:  
$$
y(\theta^\top x + \theta_0) \leq 1
$$  
then update:  
$$
\theta := \theta + \eta(yx - \lambda\theta),\quad \theta_0 := \theta_0 + \eta y
$$

Else (no violation):  
$$
\theta := \theta - \eta \lambda \theta,\quad \theta_0 := \theta_0
$$

**Parameters:**  
- `feature_vector (x)`: single input sample  
- `label (y)`: true class label (±1)  
- `L`: regularization parameter (λ)  
- `eta`: learning rate  
- `theta`: current weight vector  
- `theta_0`: current bias term


In [20]:
def pegasos_single_step_update(feature_vector, label, L, eta, theta, theta_0):
    """
    One step of the Pegasos update rule for binary SVM.

    Parameters:
    - feature_vector: 1D NumPy array for a single data point
    - label: True class label (±1)
    - L: Regularization parameter (lambda)
    - eta: Learning rate
    - theta: Current weight vector
    - theta_0: Current bias term

    Returns:
    - Updated (theta, theta_0)
    """

    # Compute margin: how confidently the point is classified
    margin_factor = label * (np.dot(feature_vector, theta) + theta_0)

    # Check for hinge loss violation (i.e., margin ≤ 1)
    is_violation = 1.0 if margin_factor <= 1 else 0.0

    # Update rule:
    # - Move in direction of label * feature_vector if violating margin
    # - Always apply L2 regularization by shrinking theta
    new_theta = theta + eta * (is_violation * label * feature_vector - L * theta)
    new_theta_0 = theta_0 + eta * (is_violation * label * 1.0)

    return new_theta, new_theta_0
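The update can equivalently be read as "shrink, then possibly step": \(\theta + \eta(yx - \lambda\theta) = (1 - \eta\lambda)\theta + \eta y x\). The sketch below is written in that form (`pegasos_step_sketch` is a hypothetical name, and `lam` stands in for the cell's `L` because `lambda` is a Python keyword). It checks both branches with hand-computed numbers:

```python
import numpy as np

def pegasos_step_sketch(x, y, lam, eta, theta, theta_0):
    # Margin violation check, same condition as the cell above
    violated = y * (x @ theta + theta_0) <= 1
    # Always shrink theta by (1 - eta*lam); add eta*y*x only on a violation
    new_theta = (1 - eta * lam) * theta + (eta * y * x if violated else 0.0)
    # The bias is not regularized: it moves only on a violation
    new_theta_0 = theta_0 + (eta * y if violated else 0.0)
    return new_theta, new_theta_0

theta, theta_0 = np.array([1.0, 0.0]), 0.0

# Margin 0.5 <= 1 (violation): 0.95*[1, 0] + 0.5*[0.5, 0] = [1.2, 0], theta_0 = 0.5
print(pegasos_step_sketch(np.array([0.5, 0.0]), 1, lam=0.1, eta=0.5, theta=theta, theta_0=theta_0))

# Margin 2 > 1 (no violation): only the shrinkage applies, theta = [0.95, 0], theta_0 = 0
print(pegasos_step_sketch(np.array([2.0, 0.0]), 1, lam=0.1, eta=0.5, theta=theta, theta_0=theta_0))
```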


### 🔹 Pegasos (Full Algorithm)

---

**Purpose:**  
Trains a binary Support Vector Machine (SVM) using the Pegasos algorithm (a stochastic sub-gradient descent method) with L2 regularization.

**Formula:**  
For each training example \((x_i, y_i)\), with \( t \) counting updates across all epochs (it is not reset at the start of each epoch), use:

- Learning rate:  
$$
\eta = \frac{1}{\sqrt{t}}
$$

- If the margin is violated:  
$$
y_i(\theta^\top x_i + \theta_0) \leq 1
$$  
Then update:  
$$
\theta := \theta + \eta (y_i x_i - \lambda \theta),\quad \theta_0 := \theta_0 + \eta y_i
$$

Else:  
$$
\theta := \theta - \eta \lambda \theta,\quad \theta_0 := \theta_0
$$

**Parameters:**  
- `feature_matrix`: NumPy matrix of shape (n, d)  
- `labels`: NumPy array of true labels (±1)  
- `T`: number of full iterations over the data  
- `L`: regularization parameter (λ)


In [21]:
def pegasos(feature_matrix, labels, T, L):
    """
    Full Pegasos training loop.

    Parameters:
    - feature_matrix: 2D NumPy array (each row is a data point)
    - labels: 1D NumPy array of ±1 class labels
    - T: number of full passes through the data (epochs)
    - L: regularization parameter (lambda)

    Returns:
    - theta: learned weight vector
    - theta_0: learned bias term
    """

    (nsamples, nfeatures) = feature_matrix.shape

    # Initialize weights and bias
    theta = np.zeros(nfeatures)
    theta_0 = 0.0

    # Counter for total updates (used for learning rate scheduling)
    count = 0

    # Loop over T epochs (the epoch index itself is not needed; `count` drives the learning rate)
    for _ in range(T):
        # Visit each sample in a shuffled order
        for i in get_order(nsamples):
            count += 1
            eta = 1.0 / np.sqrt(count)  # Dynamic learning rate

            # Perform single Pegasos update step
            theta, theta_0 = pegasos_single_step_update(
                feature_matrix[i], labels[i], L, eta, theta, theta_0
            )

    return theta, theta_0
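Because `count` is incremented on every sample and never reset, the step size keeps decaying across epochs instead of restarting at 1 each pass. The sketch below (`pegasos_learning_rates` is a hypothetical helper) reproduces just that schedule for a toy size of 3 samples and 2 epochs:

```python
import numpy as np

def pegasos_learning_rates(n_samples, T):
    # Mirrors the counter logic in pegasos(): one global update count across epochs
    rates = []
    count = 0
    for _ in range(T):
        for _ in range(n_samples):
            count += 1
            rates.append(1.0 / np.sqrt(count))
    return np.array(rates)

etas = pegasos_learning_rates(n_samples=3, T=2)
# Rounded values: 1.0, 0.7071, 0.5774, 0.5, 0.4472, 0.4082
# The first step of epoch 2 uses eta = 1/sqrt(4) = 0.5, not 1.0
print(np.round(etas, 4))
```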


### 🔹 `classify` – Linear Prediction Rule
---

**Purpose:**  
Predicts labels using a linear classifier.

**Formula:**  
$$
\hat{y} = \text{sign}(\theta^\top x + \theta_0)
$$

**Parameters:**  
- `feature_matrix`: 2D NumPy array (n_samples × n_features)  
- `theta`: weight vector  
- `theta_0`: bias term  

**Returns:**  
- 1D NumPy array of predicted labels (+1 or -1); a score of exactly 0 is mapped to -1


In [22]:
def classify(feature_matrix, theta, theta_0):
    """
    Predicts labels using a linear decision rule.

    Parameters:
    - feature_matrix: 2D array of shape (n_samples, n_features)
    - theta: Weight vector from trained model
    - theta_0: Bias term from trained model

    Returns:
    - Array of predictions: +1 or -1
    """
    # Compute raw scores (linear combination + bias)
    scores = np.dot(feature_matrix, theta) + theta_0

    # Apply sign rule: +1 if score > 0, else -1
    return np.where(scores > 0, 1, -1)
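`np.sign` returns 0 for a score of exactly 0, which is not a valid ±1 label; `np.where(scores > 0, 1, -1)` breaks the tie toward -1. A small sketch with hand-picked points whose scores are 2.5, -0.3 and 0.0 under an illustrative `theta = [1, -1]`, `theta_0 = 0`:

```python
import numpy as np

def classify(feature_matrix, theta, theta_0):
    # Same decision rule as above: +1 if score > 0, else -1
    scores = feature_matrix @ theta + theta_0
    return np.where(scores > 0, 1, -1)

theta, theta_0 = np.array([1.0, -1.0]), 0.0
X = np.array([[3.0, 0.5], [0.0, 0.3], [2.0, 2.0]])  # scores: 2.5, -0.3, 0.0

print(np.sign(X @ theta + theta_0))  # values 1, -1, 0: the tie yields label 0
print(classify(X, theta, theta_0))   # values 1, -1, -1: the tie goes to -1
```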


### 🔹 `classification_accuracy` – Accuracy Metric
---

**Purpose:**  
Calculates the proportion of correct predictions made by a classifier. It is the complement of the zero-one loss: accuracy = 1 − zero-one loss.

**Formula:**  
$$
\text{Accuracy} = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}[\hat{y}_i = y_i]
$$

**Parameters:**  
- `predictions`: 1D NumPy array of predicted class labels  
- `labels`: 1D NumPy array of ground truth labels  

**Returns:**  
- Float value between 0 and 1 indicating classification accuracy


In [23]:
def classification_accuracy(predictions, labels):
    """
    Computes classification accuracy.

    Parameters:
    - predictions: 1D NumPy array of predicted labels
    - labels: 1D NumPy array of true labels

    Returns:
    - accuracy: Float (proportion of correct predictions)
    """
    return np.mean(predictions == labels)
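A quick check of the complement relationship on toy ±1 predictions (hand-picked values, 4 of 6 correct):

```python
import numpy as np

predictions = np.array([1, -1, 1, 1, -1, -1])
labels      = np.array([1, -1, -1, 1, -1, 1])

accuracy = np.mean(predictions == labels)  # 4 of 6 correct
zero_one = np.mean(predictions != labels)  # 2 of 6 wrong

# The two metrics always sum to 1
print(accuracy, zero_one, np.isclose(accuracy + zero_one, 1.0))
```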


### 🔹 `plot_metrics_over_epochs` – Line Plot of Accuracy or Loss
---

**Purpose:**  
Visualizes the change in a performance metric (e.g., accuracy or loss) over training epochs for both train and test sets.

**Parameters:**  
- `train_metrics`: list of metric values per epoch (e.g., train accuracy)  
- `test_metrics`: list of metric values per epoch (e.g., test accuracy)  
- `metric_name`: string to label the y-axis (e.g., "Accuracy" or "Loss")  
- `model_name`: name of the model, displayed in title and legend  

**Returns:**  
- `None`; the metric curves are displayed with `plt.show()`.


In [24]:
import matplotlib.pyplot as plt

def plot_metrics_over_epochs(train_metrics, test_metrics, metric_name="Accuracy", model_name=""):
    """
    Plots a given metric over epochs for train and test datasets.

    Parameters:
    - train_metrics: list of metric values for training set (1 per epoch)
    - test_metrics: list of metric values for test set (1 per epoch)
    - metric_name: label for the y-axis (e.g., "Accuracy", "Loss")
    - model_name: name of the model being plotted
    """
    epochs = list(range(1, len(train_metrics) + 1))
    plt.figure(figsize=(8, 5))
    plt.plot(epochs, train_metrics, label=f"Train {metric_name}")
    plt.plot(epochs, test_metrics, label=f"Test {metric_name}")
    plt.xlabel("Epoch")
    plt.ylabel(metric_name)
    plt.title(f"{metric_name} Over Epochs – {model_name}")
    plt.legend()
    plt.grid(True)
    plt.tight_layout()
    plt.show()
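Because the function calls `plt.show()` and returns nothing, the figure cannot easily be saved or inspected afterwards. One possible variant, sketched below under the hypothetical name `plot_metrics_ax`, uses the object-oriented Matplotlib API and returns the `Axes` instead. The metric values are dummy numbers for illustration, not experimental results:

```python
import matplotlib.pyplot as plt

def plot_metrics_ax(train_metrics, test_metrics, metric_name="Accuracy", model_name=""):
    # Same plot as plot_metrics_over_epochs, but returns the Axes instead of showing it
    epochs = range(1, len(train_metrics) + 1)
    fig, ax = plt.subplots(figsize=(8, 5))
    ax.plot(epochs, train_metrics, label=f"Train {metric_name}")
    ax.plot(epochs, test_metrics, label=f"Test {metric_name}")
    ax.set_xlabel("Epoch")
    ax.set_ylabel(metric_name)
    ax.set_title(f"{metric_name} Over Epochs – {model_name}")
    ax.legend()
    ax.grid(True)
    fig.tight_layout()
    return ax

# Dummy values for illustration only
ax = plot_metrics_ax([0.80, 0.85, 0.88], [0.78, 0.82, 0.84], "Accuracy", "Example")
```

The returned axes can then be saved with `ax.figure.savefig("accuracy.png")` or shown in the notebook as usual.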


### 🔹 `report_results_table` – Tabular Summary of Model Results
---

**Purpose:**  
Displays a clean table comparing multiple models on evaluation metrics.

**Parameters:**  
- `results_dict`: a dictionary where each key is a model name, and the value is another dictionary with:
  - "Train Accuracy"
  - "Test Accuracy"
  - "Train Loss"
  - "Test Loss"

**Returns:**  
- `None`; a styled table (a pandas `Styler`) is rendered in the notebook with `display()`.


In [None]:
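import pandas as pd
from IPython.display import display  # built into Jupyter; the explicit import makes the dependency visible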

def report_results_table(results_dict):
    """
    Generates a styled summary table of model performance metrics.

    Parameters:
    - results_dict: Dictionary with model names as keys and a dictionary of
                    metrics as values. Example:
                    {
                      "Perceptron": {"Train Accuracy": 0.9, "Test Accuracy": 0.85, ...},
                      "Pegasos": {"Train Accuracy": 0.95, "Test Accuracy": 0.91, ...}
                    }

    Returns:
    - None (the styled table is rendered in the notebook via display())
    """
    df = pd.DataFrame.from_dict(results_dict, orient="index")
    styled_df = df.style.format("{:.4f}").set_caption("📋 Model Performance Summary")
    display(styled_df)
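`Styler` objects and `display()` only render inside a notebook. When the same results need to appear in a plain script or log, a minimal fallback is to print the underlying DataFrame. The sketch below reuses the illustrative numbers from the docstring above (not actual results):

```python
import pandas as pd

# Example dictionary from the docstring above (illustrative numbers only)
results = {
    "Perceptron": {"Train Accuracy": 0.9, "Test Accuracy": 0.85},
    "Pegasos": {"Train Accuracy": 0.95, "Test Accuracy": 0.91},
}

# orient="index": one row per model, one column per metric
df = pd.DataFrame.from_dict(results, orient="index")

# Plain-text rendering that works outside Jupyter (no Styler or display() needed)
print(df.round(4).to_string())
```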


## [ Solution ] Solving Using Linear Classifiers Only

In [None]:
# to be continued