# Fashion-MNIST • Single MLP (Sequential) — Colab Notebook

## Problem Statement
Build and interpret a **single Multi-Layer Perceptron (MLP)** classifier for **Fashion-MNIST**.  
You will understand **why we flatten images (28×28 → 784)** for MLPs, how **scaling** affects learning, and how to interpret **metrics and errors**.

## Learning Objectives
- Load Fashion-MNIST and **inspect structure** with Pandas (`head()` & `info()`).
- **Visualize** sample images, class distribution, and pixel histograms (**before/after scaling**).
- Understand **Flatten vs Image Grid** and **what an MLP is**.
- Build, train, and evaluate a **single MLP (Sequential)**.
- Plot **training curves**, **confusion matrix**, and review **misclassifications**.

## 1) Setup & Imports

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

from sklearn.metrics import confusion_matrix, classification_report

print("TensorFlow:", tf.__version__)

## 2) Load Dataset (Fashion-MNIST)
- Images: 28×28 **grayscale** (shape `(N, 28, 28)`), pixel values **0..255** (uint8).  
- Labels: integers **0..9** mapping to clothing categories.

In [None]:
from tensorflow.keras.datasets import fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

class_names = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"
]

print("\nShapes & dtypes:")
print("  train_images:", train_images.shape, train_images.dtype)   # (60000, 28, 28) uint8
print("  train_labels:", train_labels.shape, train_labels.dtype)   # (60000,) uint8
print("  test_images :", test_images.shape,  test_images.dtype)    # (10000, 28, 28) uint8
print("  test_labels :", test_labels.shape,  test_labels.dtype)    # (10000,) uint8
print("Pixel range BEFORE scaling:", int(train_images.min()), "to", int(train_images.max()))

## 3) Inspect with Pandas (Head & Info)
Flatten a **sample** to a table (28×28 → 784 columns) to make the data feel like a CSV.

In [None]:
N_SAMPLE = 2000
X_flat_sample = train_images[:N_SAMPLE].reshape(N_SAMPLE, -1)
df = pd.DataFrame(X_flat_sample)
df["label"] = train_labels[:N_SAMPLE]

print("\n--- Pandas HEAD (first 5 rows) ---")
print(df.head())

print("\n--- Pandas INFO ---")
df.info()

## 4) Visualize Samples & Class Distribution

In [None]:
def show_grid(images, labels, rows=3, cols=6, title="Sample training images (raw)"):
    plt.figure(figsize=(cols*2.0, rows*2.0))
    for i in range(rows*cols):
        plt.subplot(rows, cols, i+1)
        plt.imshow(images[i], cmap="gray")
        plt.title(class_names[int(labels[i])], fontsize=9)
        plt.axis("off")
    plt.suptitle(title)
    plt.tight_layout()
    plt.show()

show_grid(train_images, train_labels, rows=3, cols=6)

vals, cnts = np.unique(train_labels, return_counts=True)
plt.figure(figsize=(7,4))
plt.bar(vals, cnts)
plt.xlabel("Class")
plt.ylabel("Count")
plt.title("Label distribution (training set)")
plt.xticks(vals, class_names, rotation=45, ha="right")
plt.tight_layout()
plt.show()

## 5) Why Scaling? (0..255 → 0..1)
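Every pixel already shares the same 0..255 range, so the point of scaling is magnitude rather than matching ranges: inputs in `[0,1]` keep first-layer activations and gradients at a size that default initializers and learning rates handle well, which usually makes training faster and more stable.  
We plot a pixel histogram **before** scaling to see the raw spread.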
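
To make the magnitude argument concrete, the sketch below feeds raw and scaled inputs through one randomly initialized dense unit and compares the spread of its pre-activations. The synthetic `raw` array, the weight scale, and the seed are assumptions chosen for illustration; they are not the notebook's data or the initializer Keras uses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for flattened images: 256 samples x 784 pixels in 0..255.
raw = rng.integers(0, 256, size=(256, 784)).astype("float32")
scaled = raw / 255.0

# One dense unit with small random weights (illustrative scale only).
w = rng.normal(0.0, 1.0 / np.sqrt(784), size=784).astype("float32")

z_raw = raw @ w        # pre-activations for unscaled inputs
z_scaled = scaled @ w  # pre-activations for inputs in [0, 1]

print("std of pre-activation, raw   :", float(z_raw.std()))
print("std of pre-activation, scaled:", float(z_scaled.std()))
```

Because the unit is linear in its input, the raw pre-activations are 255 times larger. The gradient of each first-layer weight is proportional to its input pixel, so it grows with input magnitude in the same way.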

In [None]:
sample_pixels = train_images[:500].reshape(-1)  # pixels of the first 500 images (not a random sample)
plt.figure(figsize=(6,4))
plt.hist(sample_pixels, bins=30)
plt.xlabel("Pixel intensity (0..255)")
plt.ylabel("Frequency")
plt.title("Pixel histogram BEFORE scaling")
plt.show()

## 6) Preprocessing: Scale, Flatten, Validation Split
- **Scale** to `[0,1]` by dividing by 255.  
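- **Flatten** each image to **784 features**, because a Dense layer expects one feature vector per sample.  
- Hold out a **validation split** (the first 10% of the training set) to monitor generalization during training.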

In [None]:
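# Scale to [0,1]; the dtype check keeps this cell safe to re-run (no double scaling)
if train_images.dtype == np.uint8:
    train_images = train_images.astype("float32") / 255.0
    test_images  = test_images.astype("float32")  / 255.0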

# Flatten to vectors (28*28 = 784)
x_train = train_images.reshape(len(train_images), -1)
x_test  = test_images.reshape(len(test_images),   -1)

# Validation split: hold out the first 10% of the training set (no shuffling here)
VAL_FRAC = 0.1
val_size = int(len(x_train) * VAL_FRAC)
x_val, y_val = x_train[:val_size], train_labels[:val_size]
x_train2, y_train2 = x_train[val_size:], train_labels[val_size:]

print("\nAfter preprocessing:")
print("  x_train2:", x_train2.shape, " x_val:", x_val.shape, " x_test:", x_test.shape)
print("Pixel range AFTER scaling:", float(x_train2.min()), "to", float(x_train2.max()))
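
The split above takes the first 10% of rows as they come. That is fine when the sample order is already random. If it might not be, a seeded shuffle before splitting is a common alternative. The helper name `shuffled_split`, the seed, and the toy arrays below are hypothetical and only illustrate the idea.

```python
import numpy as np

def shuffled_split(x, y, val_frac=0.1, seed=42):
    """Return (x_train, y_train, x_val, y_val) after a seeded random permutation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    val_size = int(len(x) * val_frac)
    val_idx, train_idx = idx[:val_size], idx[val_size:]
    return x[train_idx], y[train_idx], x[val_idx], y[val_idx]

# Toy data standing in for (x_train, train_labels): 100 samples, 784 features.
x_demo = np.arange(100 * 784, dtype="float32").reshape(100, 784)
y_demo = np.arange(100) % 10

x_tr, y_tr, x_va, y_va = shuffled_split(x_demo, y_demo, val_frac=0.1)
print(x_tr.shape, x_va.shape)
```

Indexing images and labels with the same permutation keeps every row paired with its label.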

In [None]:
# Pixel histogram AFTER scaling (train_images was rescaled to [0,1] in the previous cell)
sample_pixels_scaled = train_images[:500].reshape(-1)
plt.figure(figsize=(6,4))
plt.hist(sample_pixels_scaled, bins=30)
plt.xlabel("Pixel intensity (0..1)")
plt.ylabel("Frequency")
plt.title("Pixel histogram AFTER scaling")
plt.show()

## 7) Concept: Flatten vs Image Grid & What is an MLP?
**Image Grid:** Each image is `(28, 28)` — 2D pixels.  
**Flatten:** Convert to a **1D vector of 784** so a Dense layer can take all pixels as input.
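
As a small illustration of what flattening does to pixel positions, the sketch below uses a made-up 28×28 array whose values encode their own position. It assumes NumPy's default row-major order, which is what `reshape` uses unless told otherwise.

```python
import numpy as np

# Hypothetical 28x28 "image" whose pixel value equals its flat position.
img = np.arange(28 * 28).reshape(28, 28)

flat = img.reshape(-1)           # shape (784,), rows laid end to end
restored = flat.reshape(28, 28)  # reshaping back recovers the grid exactly

row, col = 5, 10
k = row * 28 + col               # flat index of pixel (row, col)
print(flat.shape, k, flat[k] == img[row, col])
print(divmod(k, 28))             # back from flat index to (row, col)
```

Flattening loses no pixel values, but the vector no longer tells the model which pixels were neighbors; an MLP treats each of the 784 entries as an independent feature.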

**MLP (Multi-Layer Perceptron):** A feedforward neural network built from fully connected (Dense) layers:  
- **Input layer:** 784 features (flattened pixels)  
- **Hidden layers:** Dense layers with non-linear activations (e.g., ReLU)  
- **Output layer:** 10 neurons with **Softmax** → class probabilities

We’ll build: `Input(784) → Dense(512, ReLU) → Dropout(0.3) → Dense(256, ReLU) → Dense(10, Softmax)`
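
To show what this stack computes, here is a rough NumPy sketch of the forward pass with random, untrained weights. Dropout is left out because it is inactive at inference time. The weight scale, seed, and fake inputs are assumptions for illustration; this is not how Keras implements the layers internally.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    # Subtract the row-wise max for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
sizes = [784, 512, 256, 10]  # layer widths from the architecture above

# Random placeholder weights, one (W, b) pair per Dense layer.
params = [
    (rng.normal(0.0, 0.05, size=(n_in, n_out)), np.zeros(n_out))
    for n_in, n_out in zip(sizes[:-1], sizes[1:])
]

def forward(x):
    h = x
    for W, b in params[:-1]:
        h = relu(h @ W + b)      # hidden Dense + ReLU
    W, b = params[-1]
    return softmax(h @ W + b)    # output Dense + Softmax

x = rng.random((4, 784))         # 4 fake flattened images in [0, 1)
probs = forward(x)
n_params = sum(W.size + b.size for W, b in params)

print(probs.shape)               # (4, 10)
print(probs.sum(axis=1))         # each row sums to 1
print(n_params)                  # 535818
```

The parameter count follows from the layer sizes: (784·512 + 512) + (512·256 + 256) + (256·10 + 10) = 535,818. This should match the total reported by `model.summary()`.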

## 8) Build, Compile, Train — Single MLP

In [None]:
model = keras.Sequential([
    layers.Input(shape=(784,)),
    layers.Dense(512, activation="relu"),
    layers.Dropout(0.30),
    layers.Dense(256, activation="relu"),
    layers.Dense(10, activation="softmax")
], name="mlp_fashion")

model.summary()

model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-3),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"]
)

EPOCHS = 12
BATCH  = 128
history = model.fit(
    x_train2, y_train2,
    validation_data=(x_val, y_val),
    epochs=EPOCHS,
    batch_size=BATCH,
    verbose=2
)
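
The loss chosen above, sparse categorical cross-entropy, is the mean negative log of the probability the model assigns to the true class; "sparse" means the labels are integers rather than one-hot vectors. The sketch below reproduces the formula by hand on two made-up predictions. The clipping constant `eps` is an assumption, and the function illustrates the math rather than Keras's implementation.

```python
import numpy as np

def sparse_categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -log(probability assigned to the true class)."""
    y_prob = np.clip(y_prob, eps, 1.0)               # avoid log(0)
    picked = y_prob[np.arange(len(y_true)), y_true]  # p(true class) per sample
    return float(-np.log(picked).mean())

# Two hypothetical 3-class predictions with integer labels.
y_true = np.array([0, 2])
y_prob = np.array([
    [0.7, 0.2, 0.1],   # confident and correct
    [0.3, 0.4, 0.3],   # unsure, and the top guess is wrong
])

loss = sparse_categorical_crossentropy(y_true, y_prob)
print(round(loss, 4))
```

The second sample contributes far more to the loss than the first, since the model gave its true class only 0.3.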

## 9) Training Curves (Loss & Accuracy)

In [None]:
plt.figure(figsize=(6,4))
plt.plot(history.history["loss"], label="train_loss")
plt.plot(history.history["val_loss"], label="val_loss")
plt.xlabel("Epoch"); plt.ylabel("Loss")
plt.title("Training vs Validation Loss")
plt.legend(); plt.tight_layout(); plt.show()

plt.figure(figsize=(6,4))
plt.plot(history.history["accuracy"], label="train_acc")
plt.plot(history.history["val_accuracy"], label="val_acc")
plt.xlabel("Epoch"); plt.ylabel("Accuracy")
plt.title("Training vs Validation Accuracy")
plt.legend(); plt.tight_layout(); plt.show()
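
Besides reading the curves by eye, the same dictionary can be summarized numerically. The sketch below finds the epoch with the lowest validation loss and the train/validation accuracy gap at that epoch. The helper `summarize_history` and the numbers in `fake_history` are made up; only the key names match `history.history`.

```python
import numpy as np

def summarize_history(hist):
    """Return the best epoch (1-based, by val_loss), its val_loss, and the accuracy gap there."""
    val_loss = np.asarray(hist["val_loss"])
    best = int(np.argmin(val_loss))
    gap = hist["accuracy"][best] - hist["val_accuracy"][best]
    return best + 1, float(val_loss[best]), float(gap)

# Made-up numbers with the same keys as `history.history`.
fake_history = {
    "loss":         [0.55, 0.40, 0.34, 0.30, 0.27],
    "val_loss":     [0.45, 0.38, 0.35, 0.36, 0.37],
    "accuracy":     [0.80, 0.85, 0.87, 0.89, 0.90],
    "val_accuracy": [0.84, 0.86, 0.87, 0.87, 0.87],
}

epoch, best_val_loss, gap = summarize_history(fake_history)
print(epoch, best_val_loss, round(gap, 3))
```

A training loss that keeps falling while validation loss rises after the best epoch, as in these fabricated numbers, is the usual sign of overfitting.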

## 10) Evaluate on Test Set

In [None]:
test_loss, test_acc = model.evaluate(x_test, test_labels, verbose=0)
print(f"Test accuracy: {test_acc:.4f} | Test loss: {test_loss:.4f}")

## 11) Confusion Matrix & Classification Report

In [None]:
y_prob = model.predict(x_test, verbose=0)
y_pred = np.argmax(y_prob, axis=1)

cm = confusion_matrix(test_labels, y_pred)
plt.figure(figsize=(8,6))
plt.imshow(cm)
plt.title("Confusion Matrix (Test)")
plt.xlabel("Predicted"); plt.ylabel("True")
plt.xticks(range(10), class_names, rotation=45, ha="right")
plt.yticks(range(10), class_names)
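thresh = cm.max() / 2
for i in range(10):
    for j in range(10):
        # The default colormap is dark for low counts: white text there, black on bright cells
        plt.text(j, i, cm[i, j], ha="center", va="center", fontsize=9,
                 color="black" if cm[i, j] > thresh else "white")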
plt.colorbar(); plt.tight_layout(); plt.show()

print("\nClassification report:")
print(classification_report(test_labels, y_pred, target_names=class_names, digits=4))
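
The precision and recall columns of the report can be derived from the confusion matrix itself, where rows are true classes and columns are predicted classes. The sketch below does this on a small hypothetical 3-class matrix and lists the largest off-diagonal cells. The helper names are illustrative, and the code assumes every class appears at least once as both a true and a predicted label, otherwise a division by zero occurs.

```python
import numpy as np

def per_class_metrics(cm):
    """Precision and recall per class (rows = true, columns = predicted)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    recall = tp / cm.sum(axis=1)      # row sums: true samples per class
    precision = tp / cm.sum(axis=0)   # column sums: predictions per class
    return precision, recall

def top_confusions(cm, k=3):
    """Largest off-diagonal cells as (true_index, predicted_index, count)."""
    off = np.array(cm, dtype=int)     # copy so the input is not modified
    np.fill_diagonal(off, 0)
    flat_order = np.argsort(off, axis=None)[::-1][:k]
    rows, cols = np.unravel_index(flat_order, off.shape)
    return [(int(r), int(c), int(off[r, c])) for r, c in zip(rows, cols)]

# Hypothetical 3-class confusion matrix.
cm_demo = np.array([
    [8, 2, 0],
    [1, 9, 0],
    [3, 0, 7],
])

precision, recall = per_class_metrics(cm_demo)
print(np.round(precision, 3), np.round(recall, 3))
print(top_confusions(cm_demo, k=2))
```

Applied to the notebook's `cm`, the indices returned by `top_confusions` can be mapped through `class_names` to see which garment pairs are confused most often.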

## 12) Misclassified Examples (Lowest Confidence)

In [None]:
wrong_idx = np.where(y_pred != test_labels)[0]
if len(wrong_idx) > 0:
    probs_wrong = y_prob[wrong_idx]
    max_probs = probs_wrong.max(axis=1)
    order = np.argsort(max_probs)  # smallest confidence first
    N_SHOW = min(12, len(order))
    sel = wrong_idx[order[:N_SHOW]]

    rows, cols = 3, 4
    plt.figure(figsize=(cols*2.2, rows*2.2))
    for k, idx in enumerate(sel):
        plt.subplot(rows, cols, k+1)
        plt.imshow(test_images[idx], cmap="gray")
        pred = y_pred[idx]; true = test_labels[idx]; conf = y_prob[idx, pred]
        plt.title(f"pred={class_names[pred]}\ntrue={class_names[true]}\np={conf:.2f}", fontsize=9)
        plt.axis("off")
    plt.suptitle("Hard Misclassifications (lowest confidence)")
    plt.tight_layout(); plt.show()
else:
    print("No misclassifications (unlikely).")
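
Low-confidence mistakes show where the model was unsure. Sorting the other way surfaces confident mistakes, which often point to ambiguous or look-alike items. The helper `misclassified_by_confidence` and the toy arrays below are hypothetical and only sketch the ordering logic.

```python
import numpy as np

def misclassified_by_confidence(y_true, y_prob, descending=True):
    """Indices of wrong predictions, ordered by the probability of the predicted class."""
    y_pred = y_prob.argmax(axis=1)
    wrong = np.where(y_pred != y_true)[0]
    conf = y_prob[wrong, y_pred[wrong]]
    order = np.argsort(conf)
    if descending:
        order = order[::-1]
    return wrong[order]

y_true = np.array([0, 1, 2, 1])
y_prob = np.array([
    [0.10, 0.85, 0.05],  # wrong, very confident
    [0.20, 0.70, 0.10],  # correct
    [0.45, 0.15, 0.40],  # wrong, unsure
    [0.30, 0.10, 0.60],  # wrong, fairly confident
])

print(misclassified_by_confidence(y_true, y_prob))                    # most confident first
print(misclassified_by_confidence(y_true, y_prob, descending=False))  # least confident first
```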

## 13) Save & Reload (Basic Deployment Step)

In [None]:
model.save("fashion_mnist_single_mlp.keras")
reloaded = keras.models.load_model("fashion_mnist_single_mlp.keras")
print("Reloaded OK. Params:", reloaded.count_params())