# MNIST Classification Project

## 1. Introduction & Data Loading

The goal of this project is to explore supervised machine learning by building and evaluating a Decision Tree and a Neural Network on the MNIST dataset. MNIST consists of 70,000 grayscale images of handwritten digits (0-9), each 28x28 pixels, split into 60,000 training images and 10,000 test images.

In this notebook, we will:
1. Load and explore the data.
2. Preprocess the data (normalization, flattening, splitting).
3. Train and evaluate a Decision Tree Classifier.
4. Perform hyperparameter tuning for the Decision Tree.
5. Train and evaluate a Multi-Layer Perceptron (Neural Network).
6. Compare the performance of both models.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix, classification_report, ConfusionMatrixDisplay

# Set random seed for reproducibility
np.random.seed(42)

In [None]:
# Load the dataset
(feature_train_full, label_train_full), (feature_test, label_test) = tf.keras.datasets.mnist.load_data()

print(f"Training data shape: {feature_train_full.shape}")
print(f"Test data shape: {feature_test.shape}")

In [None]:
# Visualize 10 random sample images
num_samples = 10
random_indices = np.random.choice(feature_train_full.shape[0], num_samples, replace=False)

plt.figure(figsize=(15, 3))
for i, idx in enumerate(random_indices):
    plt.subplot(1, num_samples, i + 1)
    plt.imshow(feature_train_full[idx], cmap='gray')
    plt.title(f"Label: {label_train_full[idx]}")
    plt.axis('off')
plt.tight_layout()
plt.show()

### 1.1 Preprocessing
- Normalize pixel values to 0-1 range.
- Flatten images from 28x28 to 784 features.
- Split training data into training and validation sets (80/20) using stratified sampling.

In [None]:
# Normalize pixel values
feature_train_full = feature_train_full.astype('float32') / 255.0
feature_test = feature_test.astype('float32') / 255.0

# Flatten the images
feature_train_flat = feature_train_full.reshape(feature_train_full.shape[0], -1)
feature_test_flat = feature_test.reshape(feature_test.shape[0], -1)

# Split into training and validation sets (80% train, 20% validation)
feature_train, feature_val, label_train, label_val = train_test_split(
    feature_train_flat, label_train_full,
    test_size=0.2,
    stratify=label_train_full,
    random_state=42
)

print(f"Feature Train Shape: {feature_train.shape}")
print(f"Feature Val Shape: {feature_val.shape}")
print(f"Feature Test Shape: {feature_test_flat.shape}")

In [None]:
# Visualize Class Distributions
def plot_distribution(labels, title):
    unique, counts = np.unique(labels, return_counts=True)
    plt.bar(unique, counts)
    plt.title(title)
    plt.xlabel('Digit')
    plt.ylabel('Count')
    plt.xticks(unique)

plt.figure(figsize=(20, 5))

plt.subplot(1, 4, 1)
plot_distribution(label_train_full, 'Original Training Set')

plt.subplot(1, 4, 2)
plot_distribution(label_train, 'New Training Set')

plt.subplot(1, 4, 3)
plot_distribution(label_val, 'Validation Set')

plt.subplot(1, 4, 4)
plot_distribution(label_test, 'Test Set')

plt.tight_layout()
plt.show()
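
The bar charts above show raw counts, which are hard to compare across sets of different sizes. A numeric check on class proportions can complement them. The sketch below is a minimal illustration, not part of the original pipeline: the helper names `class_proportions` and `max_proportion_gap` and the demo arrays are hypothetical, and in the notebook you would pass `label_train` and `label_val` instead.

```python
import numpy as np


def class_proportions(labels, num_classes=10):
    """Fraction of samples belonging to each class 0..num_classes-1."""
    counts = np.bincount(labels, minlength=num_classes)
    return counts / counts.sum()


def max_proportion_gap(labels_a, labels_b, num_classes=10):
    """Largest absolute per-class difference in proportion between two label sets."""
    diff = class_proportions(labels_a, num_classes) - class_proportions(labels_b, num_classes)
    return float(np.max(np.abs(diff)))


# Hypothetical stand-in labels; in the notebook, use label_train and label_val.
rng = np.random.default_rng(42)
demo_full = rng.integers(0, 10, size=1000)
demo_subset = demo_full[:800]

print(f"Max per-class proportion gap: {max_proportion_gap(demo_full, demo_subset):.4f}")
```

With a stratified split, the gap between the training and validation sets should be very close to zero.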

## 2. Building the Decision Tree Classifier
We will train a decision tree using Information Gain (criterion='entropy').
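
To make the splitting criterion concrete, here is a minimal NumPy sketch of entropy and information gain for a single threshold split. It illustrates the idea only; scikit-learn's own implementation is different and far more optimized. The helper names and the toy arrays are hypothetical.

```python
import numpy as np


def entropy(labels):
    """Shannon entropy (in bits) of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    probs = counts / counts.sum()
    return float(-np.sum(probs * np.log2(probs)))


def information_gain(feature, labels, threshold):
    """Entropy reduction from splitting the samples on `feature <= threshold`."""
    left = labels[feature <= threshold]
    right = labels[feature > threshold]
    if len(left) == 0 or len(right) == 0:
        return 0.0  # a split that separates nothing gains nothing
    weighted_child_entropy = (
        len(left) / len(labels) * entropy(left)
        + len(right) / len(labels) * entropy(right)
    )
    return entropy(labels) - weighted_child_entropy


# Toy data: one "pixel" intensity per sample and a binary label.
toy_pixel = np.array([0.0, 0.1, 0.2, 0.8, 0.9, 1.0])
toy_label = np.array([0, 0, 0, 1, 1, 1])

print(f"Parent entropy: {entropy(toy_label):.3f} bits")
print(f"Gain at threshold 0.50: {information_gain(toy_pixel, toy_label, 0.50):.3f} bits")
print(f"Gain at threshold 0.15: {information_gain(toy_pixel, toy_label, 0.15):.3f} bits")
```

At each node, the tree picks the pixel and threshold with the highest gain. In the toy data, the threshold at 0.50 separates the classes perfectly, so it scores higher than the threshold at 0.15.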

In [None]:
# Initialize and train Decision Tree Classifier
dt_clf = DecisionTreeClassifier(criterion='entropy', random_state=42)
dt_clf.fit(feature_train, label_train)

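# Evaluate on training and validation sets (a large gap suggests overfitting)
print(f"Training Accuracy (Default DT): {dt_clf.score(feature_train, label_train):.4f}")
print(f"Validation Accuracy (Default DT): {dt_clf.score(feature_val, label_val):.4f}")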

In [None]:
# Visualize the decision tree (limited depth for readability)
plt.figure(figsize=(20, 10))
plot_tree(
    dt_clf,
    max_depth=3,
    feature_names=[f"pixel_{i}" for i in range(784)],
    class_names=[str(i) for i in range(10)],
    filled=True,
    fontsize=10
)
plt.title("Decision Tree Visualization (Top 3 Levels)")
plt.show()

## 3. Hyperparameter Tuning for Decision Tree
We will tune `max_depth`, `min_samples_split`, and `min_samples_leaf` using GridSearchCV.

In [None]:
# Define hyperparameter grid
param_grid = {
    'max_depth': [10, 20, None],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4]
}

# Setup GridSearchCV
grid_search = GridSearchCV(
    DecisionTreeClassifier(criterion='entropy', random_state=42),
    param_grid,
    cv=3,
    scoring='accuracy',
    verbose=1,
    n_jobs=-1
)

# Train with Grid Search
grid_search.fit(feature_train, label_train)

# Best hyperparameters
best_dt_params = grid_search.best_params_
print("Best Hyperparameters:", best_dt_params)

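# Best model (GridSearchCV refits it on all of feature_train by default)
best_dt_model = grid_search.best_estimator_
print(f"Mean CV Accuracy (Best Params): {grid_search.best_score_:.4f}")
print(f"Validation Accuracy (Tuned DT): {best_dt_model.score(feature_val, label_val):.4f}")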
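
The cost of the search above can be estimated before running it. The sketch below enumerates the grid with the standard library only; it assumes the same `param_grid` and `cv=3` as the cell above, and the variable names `cv_folds`, `candidates`, `n_candidates`, and `n_fits` are illustrative.

```python
from itertools import product

# Same grid as in the tuning cell above.
param_grid = {
    'max_depth': [10, 20, None],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
}
cv_folds = 3  # matches cv=3 in the GridSearchCV call

# Every combination of one value per hyperparameter is one candidate model.
candidates = [dict(zip(param_grid, values)) for values in product(*param_grid.values())]
n_candidates = len(candidates)
n_fits = n_candidates * cv_folds

print(f"Candidates: {n_candidates}")        # 27
print(f"Cross-validation fits: {n_fits}")   # 81, plus one final refit of the best candidate
```

Each extra value added to one hyperparameter multiplies the number of fits, so the grid is worth keeping small for a dataset of this size.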

## 4. Building the Neural Network Classifier
We will use `MLPClassifier` with two hidden layers of 128 and 64 neurons.

In [None]:
# Initialize MLP Classifier
mlp_clf = MLPClassifier(
    hidden_layer_sizes=(128, 64),
    activation='relu',
    solver='adam',
    max_iter=50,
    random_state=42,
    verbose=True
)

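# Train MLP
mlp_clf.fit(feature_train, label_train)

# Evaluate on validation set
print(f"Validation Accuracy (MLP): {mlp_clf.score(feature_val, label_val):.4f}")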

# Plot Loss Curve
plt.figure(figsize=(8, 5))
plt.plot(mlp_clf.loss_curve_)
plt.title("MLP Loss Curve")
plt.xlabel("Epoch")  # with the 'adam' solver, loss_curve_ holds one value per epoch
plt.ylabel("Loss")
plt.show()
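
To show what `hidden_layer_sizes=(128, 64)` means for 784 input pixels and 10 output classes, the NumPy sketch below counts the parameters and runs one forward pass. It is an illustration only: the weights are random and untrained, `fake_images` is a hypothetical stand-in for rows of `feature_train`, and the fitted model stores its own learned weights in `mlp_clf.coefs_` and `mlp_clf.intercepts_`.

```python
import numpy as np

layer_sizes = [784, 128, 64, 10]  # input, two hidden layers, output (one unit per digit)
layer_pairs = list(zip(layer_sizes[:-1], layer_sizes[1:]))

# Each layer has a weight matrix (n_in x n_out) and a bias vector (n_out).
param_counts = [n_in * n_out + n_out for n_in, n_out in layer_pairs]
total_params = sum(param_counts)
print(f"Parameters per layer: {param_counts}, total: {total_params}")

# Random (untrained) weights, just to show the shapes involved.
rng = np.random.default_rng(42)
weights = [rng.normal(0.0, 0.05, size=(n_in, n_out)) for n_in, n_out in layer_pairs]
biases = [np.zeros(n_out) for _, n_out in layer_pairs]


def forward(batch):
    """Return class probabilities for a batch of flattened images."""
    activation = batch
    for w, b in zip(weights[:-1], biases[:-1]):
        activation = np.maximum(0.0, activation @ w + b)  # ReLU hidden layers
    logits = activation @ weights[-1] + biases[-1]
    logits = logits - logits.max(axis=1, keepdims=True)   # stabilise the softmax
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)


fake_images = rng.random((5, 784))  # hypothetical stand-in for rows of feature_train
probs = forward(fake_images)
print(f"Output shape: {probs.shape}")
```

Every hidden neuron sees all 784 pixels at once, which is the main structural difference from a decision tree node that tests a single pixel.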

## 5. Performance Evaluation & Comparison
We evaluate both the tuned Decision Tree and the Neural Network on the held-out test set.

In [None]:
models = {"Decision Tree": best_dt_model, "Neural Network": mlp_clf}

for name, model in models.items():
    print(f"\n{'='*20} {name} {'='*20}")
    y_pred = model.predict(feature_test_flat)
    
    # Metrics
    acc = accuracy_score(label_test, y_pred)
    prec = precision_score(label_test, y_pred, average='weighted')
    rec = recall_score(label_test, y_pred, average='weighted')
    f1 = f1_score(label_test, y_pred, average='weighted')
    
    print(f"Accuracy: {acc:.4f}")
    print(f"Precision: {prec:.4f}")
    print(f"Recall: {rec:.4f}")
    print(f"F1 Score: {f1:.4f}")
    
    # Confusion Matrix
    cm = confusion_matrix(label_test, y_pred)
    disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=[str(i) for i in range(10)])
    disp.plot(cmap='Blues')
    plt.title(f"{name} Confusion Matrix")
    plt.show()
    
    # Classification Report
    print(f"Classification Report for {name}:\n")
    print(classification_report(label_test, y_pred))
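
The `average='weighted'` setting used above averages the per-class scores, weighting each class by its number of true samples. The sketch below reproduces that calculation from a confusion matrix in plain NumPy, assuming rows are true classes and columns are predicted classes. The helper name `weighted_metrics` and the 3-class matrix are hypothetical.

```python
import numpy as np


def weighted_metrics(cm):
    """Support-weighted precision, recall, and F1 from a confusion matrix.

    Rows of `cm` are true classes and columns are predicted classes.
    """
    cm = np.asarray(cm, dtype=float)
    true_pos = np.diag(cm)
    support = cm.sum(axis=1)    # true samples per class
    predicted = cm.sum(axis=0)  # predictions per class

    # Classes with no predictions or no samples get a score of 0 instead of NaN.
    precision = np.divide(true_pos, predicted, out=np.zeros_like(true_pos), where=predicted > 0)
    recall = np.divide(true_pos, support, out=np.zeros_like(true_pos), where=support > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(true_pos), where=denom > 0)

    weights = support / support.sum()
    return {
        "accuracy": float(true_pos.sum() / cm.sum()),
        "precision": float(weights @ precision),
        "recall": float(weights @ recall),
        "f1": float(weights @ f1),
    }


# Hypothetical 3-class confusion matrix for illustration.
demo_cm = np.array([
    [8, 2, 0],
    [1, 7, 2],
    [0, 1, 9],
])
demo_scores = weighted_metrics(demo_cm)
for metric_name, value in demo_scores.items():
    print(f"{metric_name}: {value:.4f}")
```

One consequence of this weighting is that weighted recall always equals accuracy, so those two lines of output will match for both models. The per-class rows of the classification report are more informative for comparing individual digits.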

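## Conclusion & Insights

- **Which model performed better?**
    - On image data like MNIST, the Neural Network (MLP) typically outperforms the Decision Tree; the test-set metrics above give the actual numbers for this run. A decision tree splits on one pixel at a time with an axis-aligned threshold, so it is poor at combining evidence across many pixels, whereas each MLP neuron learns a weighted combination of all 784 pixels. Neither model is translation-invariant, since both operate on flattened pixel vectors.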

- **Did the Decision Tree overfit?**
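    - Decision Trees tend to overfit when grown without limits such as `max_depth`, `min_samples_split`, or `min_samples_leaf`. A fully grown tree typically fits the training set almost perfectly, so a large gap between training accuracy and validation accuracy indicates overfitting. Tuning these hyperparameters helps mitigate it.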

- **Which digits were most commonly confused?**
    - Common confusions in MNIST include 4 vs 9, 3 vs 5, and 3 vs 8. The confusion matrices above show the specific misclassifications for each model in this run.
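
The most confused digit pairs can also be read off a confusion matrix programmatically instead of by eye. The sketch below is one way to do it in NumPy, assuming rows are true classes and columns are predicted classes. The helper name `top_confusions` and the 3-class matrix are hypothetical; in the notebook you would pass the `cm` computed for each model.

```python
import numpy as np


def top_confusions(cm, k=5):
    """Return up to k largest off-diagonal entries as (true, predicted, count)."""
    errors = np.array(cm, dtype=int)  # copy so the caller's matrix is untouched
    np.fill_diagonal(errors, 0)       # ignore correct predictions
    flat_order = np.argsort(errors, axis=None)[::-1][:k]
    rows, cols = np.unravel_index(flat_order, errors.shape)
    return [
        (int(r), int(c), int(errors[r, c]))
        for r, c in zip(rows, cols)
        if errors[r, c] > 0
    ]


# Hypothetical 3-class confusion matrix for illustration.
demo_cm = np.array([
    [50, 2, 1],
    [4, 45, 0],
    [0, 7, 40],
])
for true_class, pred_class, count in top_confusions(demo_cm, k=3):
    print(f"true {true_class} predicted as {pred_class}: {count} times")
```

Confusions are directional: a true 4 predicted as 9 and a true 9 predicted as 4 are counted separately, so it can be worth checking both cells for a given pair.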