# AI Skills Hub - Python for AI
## Lesson 7: MNIST Digit Classifier Project

**Build:** Complete ML pipeline from scratch  
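**Dataset:** scikit-learn `load_digits` (1,797 handwritten digits at 8×8 pixels, a small MNIST-style set rather than the full 28×28 MNIST)  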
**Goal:** >90% accuracy  
**Runtime:** ~45 minutes

---

## Setup

In [None]:
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

print('✅ Libraries loaded!')

## Phase 1: Load Data

In [None]:
# Load dataset
digits = load_digits()
X, y = digits.data, digits.target

print(f'Dataset: {X.shape}')
print(f'Labels: {y.shape}')
print(f'Classes: {np.unique(y)}')

# Visualize samples
fig, axes = plt.subplots(2, 5, figsize=(10, 4))
for i, ax in enumerate(axes.flat):
    ax.imshow(X[i].reshape(8, 8), cmap='gray')
    ax.set_title(f'Label: {y[i]}')
    ax.axis('off')
plt.tight_layout()
plt.show()

## Phase 2: Preprocess

In [None]:
# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Normalize
X_train_norm = X_train / 16.0
X_test_norm = X_test / 16.0

print(f'Train: {X_train.shape}')
print(f'Test: {X_test.shape}')
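
The split above is purely random, so the class balance can differ slightly between train and test. As an optional variant (not part of the original pipeline), the sketch below passes `stratify=y` to `train_test_split` and checks that dividing by 16 really maps the pixels into [0, 1]. The `_s` suffix on the variable names is only there to avoid overwriting the lesson's variables.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()
X, y = digits.data, digits.target

# stratify=y keeps the per-digit proportions similar in both splits
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# load_digits pixels are integers in 0..16, so dividing by 16 scales to [0, 1]
X_train_s_norm = X_train_s / 16.0
X_test_s_norm = X_test_s / 16.0

print(f'Scaled range: {X_train_s_norm.min()} to {X_train_s_norm.max()}')
print(f'Train counts per digit: {np.bincount(y_train_s)}')
print(f'Test counts per digit:  {np.bincount(y_test_s)}')
```

If the per-digit counts look close to a 4:1 ratio between the two rows, the stratified split is behaving as intended.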

## Phase 3: Train Model

In [None]:
# Train classifier
model = LogisticRegression(max_iter=1000, random_state=42)
model.fit(X_train_norm, y_train)
print('✅ Model trained!')
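
To see what training actually produced, it can help to look inside the fitted model. The sketch below repeats the lesson's load, split, and fit steps so it runs on its own, then prints the shape of the learned weights and the class probabilities for one test image. Nothing here changes the model; it is only an inspection aid.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Same data, split, and model settings as the lesson cells
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = LogisticRegression(max_iter=1000, random_state=42)
model.fit(X_train / 16.0, y_train)

# One row of 64 pixel weights per digit class
print(f'Weights shape: {model.coef_.shape}')
print(f'Classes: {model.classes_}')

# Probabilities for the first test image; the row sums to 1
proba = model.predict_proba(X_test[:1] / 16.0)
top = model.classes_[proba.argmax(axis=1)]
print(f'Probabilities: {proba.round(3)}')
print(f'Most likely digit: {top[0]}, true label: {y_test[0]}')
```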

## Phase 4: Evaluate

In [None]:
# Predict
y_pred = model.predict(X_test_norm)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f'Test Accuracy: {accuracy:.2%}')

# Confusion matrix
cm = confusion_matrix(y_test, y_pred)
print(f'\nConfusion Matrix:\n{cm}')
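
Overall accuracy can hide a digit that the model handles poorly. As a rough sketch, per-digit recall and precision can be read straight off the confusion matrix, since scikit-learn puts true labels on the rows and predictions on the columns. The block rebuilds `cm` with the lesson's settings so it runs standalone; it assumes every digit is predicted at least once, otherwise the precision division would hit a zero column.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Rebuild the lesson's model and confusion matrix
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = LogisticRegression(max_iter=1000, random_state=42)
model.fit(X_train / 16.0, y_train)
y_pred = model.predict(X_test / 16.0)
cm = confusion_matrix(y_test, y_pred)

# Diagonal = correct predictions; row sums = true counts; column sums = predicted counts
recall = cm.diagonal() / cm.sum(axis=1)
precision = cm.diagonal() / cm.sum(axis=0)

for digit in range(cm.shape[0]):
    print(f'Digit {digit}: recall {recall[digit]:.2%}, precision {precision[digit]:.2%}')

# The diagonal total over all samples equals the overall accuracy
print(f'Accuracy from cm: {cm.diagonal().sum() / cm.sum():.2%}')
```

`sklearn.metrics.classification_report(y_test, y_pred)` prints the same per-class numbers in one call, if a ready-made table is preferred.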

## Phase 5: Visualize

In [None]:
# Plot confusion matrix
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted')
plt.ylabel('True')
plt.title('Confusion Matrix')
plt.show()

In [None]:
# Show misclassified examples
misclassified = np.where(y_test != y_pred)[0]
if len(misclassified) > 0:
    fig, axes = plt.subplots(2, 5, figsize=(12, 5))
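    for i, ax in enumerate(axes.flat):
        ax.axis('off')  # hide every panel's axes, including unused ones
        if i < len(misclassified):
            idx = misclassified[i]
            # Show the unscaled image; scaling does not change how it looks
            ax.imshow(X_test[idx].reshape(8, 8), cmap='gray')
            ax.set_title(f'True: {y_test[idx]}, Pred: {y_pred[idx]}')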
    plt.tight_layout()
    plt.show()
else:
    print('Perfect accuracy! No misclassifications.')

## 🎉 Project Complete!

You built a complete ML pipeline!

**Achieved:**
- ✅ Data loading and exploration
- ✅ Preprocessing pipeline
- ✅ Model training
- ✅ Performance evaluation
- ✅ Result visualization

**Next:** Share on GitHub and build with PyTorch!