# MNIST Handwritten Digit Classification with Feedforward Neural Network

**Objective:** Build and train a simple feedforward neural network to classify handwritten digits (0-9) from the MNIST dataset.

**Framework:** TensorFlow/Keras

## Section 1: Import Libraries

We'll use TensorFlow (Keras API) to build the neural network, NumPy for array operations, and Matplotlib for plotting.

In [None]:
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import matplotlib.pyplot as plt

print(f"TensorFlow version: {tf.__version__}")

## Section 2: Data Loading and Preprocessing

### What we're doing:
1. **Load MNIST**: 60,000 training images and 10,000 test images
2. **Normalize**: Scale pixel values from [0, 255] to [0, 1]
3. **Flatten**: Convert 28×28 images to 784-element vectors
4. **One-hot encode**: Convert labels (0-9) to one-hot vectors (e.g., 3 → [0,0,0,1,0,0,0,0,0,0])

### Why?
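- Scaling inputs to [0, 1] keeps activations and gradients in a moderate range, which generally makes gradient-based training faster and more stable
- Flattened vectors match the input size of a fully connected network (784 values per image)
- One-hot labels match the 10-way Softmax output and are the format `categorical_crossentropy` expects (integer labels would need `sparse_categorical_crossentropy` instead)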
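
As a rough illustration of the steps above, here is a minimal NumPy sketch on a tiny made-up batch (two 2×2 "images" rather than real MNIST data). The pixel values and variable names are assumptions chosen for the example, and `np.eye` stands in for `keras.utils.to_categorical`.

```python
import numpy as np

# Hypothetical toy data: two 2x2 grayscale "images" and their integer labels.
images = np.array([[[0, 255], [128, 64]],
                   [[255, 255], [0, 0]]], dtype=np.uint8)
labels = np.array([3, 0])

# Normalize: uint8 values in [0, 255] become float32 values in [0, 1].
scaled = images.astype("float32") / 255.0

# Flatten: each 2x2 image becomes a 4-element vector (28x28 -> 784 for MNIST).
flat = scaled.reshape(len(scaled), -1)

# One-hot encode: row i of the 10x10 identity matrix is the encoding of digit i.
one_hot = np.eye(10, dtype="float32")[labels]

print(flat.shape)     # (2, 4)
print(one_hot.shape)  # (2, 10)
print(one_hot[0])     # the 1 sits at index 3, the label of the first image
```

The same three operations, applied to arrays of shape (60000, 28, 28) and (60000,), give the (60000, 784) inputs and (60000, 10) targets used in this notebook.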

In [None]:
# ============================================
# STEP 1: DATA LOADING
# ============================================

print("Loading MNIST dataset...")
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

print(f"\nOriginal dataset shapes:")
print(f"  Training images: {x_train.shape}")  # (60000, 28, 28)
print(f"  Training labels: {y_train.shape}")  # (60000,)
print(f"  Test images: {x_test.shape}")       # (10000, 28, 28)
print(f"  Test labels: {y_test.shape}")       # (10000,)

In [None]:
# ============================================
# STEP 2: NORMALIZE PIXEL VALUES
# ============================================

print("\nNormalizing pixel values...")
print(f"  Original pixel range: [{x_train.min()}, {x_train.max()}]")

# Convert to float and divide by 255
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

print(f"  Normalized pixel range: [{x_train.min()}, {x_train.max()}]")
print("  ✓ Pixels now in range [0, 1]")

In [None]:
# ============================================
# STEP 3: FLATTEN IMAGES
# ============================================

print("\nFlattening images from 2D to 1D...")
print(f"  Before: {x_train.shape}  (28×28 = 784 pixels per image)")

x_train = x_train.reshape(-1, 28 * 28)  # -1 means "infer this dimension"
x_test = x_test.reshape(-1, 28 * 28)

print(f"  After: {x_train.shape}   (784 features per image)")
print("  ✓ Ready for neural network input layer")

In [None]:
# ============================================
# STEP 4: ONE-HOT ENCODE LABELS
# ============================================

print("\nOne-hot encoding labels...")
print(f"  Original labels (first 10): {y_train[:10]}")
print(f"  These are integers 0-9")

y_train = keras.utils.to_categorical(y_train, 10)  # 10 classes (digits 0-9)
y_test = keras.utils.to_categorical(y_test, 10)

print(f"\n  One-hot encoded labels shape: {y_train.shape}")
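example_idx = int(np.argmax(y_train[:, 3]))  # first training sample whose label is 3
print(f"  Example: digit 3 (training sample {example_idx}) is encoded as:")
print(f"  {y_train[example_idx]}")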
print("  ✓ Ready for Softmax output layer")

In [None]:
# ============================================
# DATA PREPARATION SUMMARY
# ============================================

print("\n" + "="*60)
print("DATA PREPARATION COMPLETE")
print("="*60)
print(f"Training set:")
print(f"  Images: {x_train.shape}  (60,000 samples of 784 features)")
print(f"  Labels: {y_train.shape}  (60,000 samples of 10 classes)")
print(f"\nTest set:")
print(f"  Images: {x_test.shape}   (10,000 samples of 784 features)")
print(f"  Labels: {y_test.shape}   (10,000 samples of 10 classes)")
print("="*60)

## Section 3: Model Architecture

### Network Structure:
- **Input Layer**: 784 neurons (one per flattened pixel)
- **Hidden Layer 1**: 128 neurons with ReLU activation
- **Hidden Layer 2**: 64 neurons with ReLU activation
- **Output Layer**: 10 neurons with Softmax activation (one per digit 0-9)

### Why these choices?
- **ReLU** in hidden layers: Introduces non-linearity, which lets the network learn patterns a purely linear model cannot
- **Softmax** in output: Converts the 10 raw outputs into a probability distribution (non-negative values that sum to 1)
- **128 → 64**: Layer widths shrink step by step from the 784 inputs down to the 10 outputs
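
To get a feel for the size of this architecture, the number of trainable parameters can be worked out by hand. The sketch below assumes each fully connected layer holds one weight per input-output pair plus one bias per neuron (the default for a Keras `Dense` layer); the helper name `dense_params` is made up for this example.

```python
# Layer widths: input, hidden 1, hidden 2, output.
layer_sizes = [784, 128, 64, 10]


def dense_params(n_in: int, n_out: int) -> int:
    """Weights (n_in * n_out) plus one bias per output neuron."""
    return n_in * n_out + n_out


# Pair each layer with the next one: (784, 128), (128, 64), (64, 10).
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

print(per_layer)  # [100480, 8256, 650]
print(total)      # 109386
```

The total should agree with the "Total params" figure that `model.summary()` prints for this model. Almost all of the parameters sit in the first hidden layer, because it connects to all 784 inputs.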

In [None]:
# ============================================
# STEP 5: BUILD NEURAL NETWORK MODEL
# ============================================

print("Building neural network model...\n")

model = keras.Sequential([
    # Input: 784 features per sample (flattened 28x28 images)
    keras.Input(shape=(784,)),

    # Hidden layer 1: 128 neurons with ReLU
    layers.Dense(128, activation='relu', name='hidden_layer_1'),
    
    # Hidden layer 2: 64 neurons with ReLU
    layers.Dense(64, activation='relu', name='hidden_layer_2'),
    
    # Output layer: 10 neurons (one per digit 0-9) with Softmax
    layers.Dense(10, activation='softmax', name='output_layer')
])

# Display model architecture
model.summary()

## Section 4: Model Compilation

### What we're setting:
- **Loss Function**: Categorical Cross-Entropy (for multi-class classification with one-hot labels)
- **Optimizer**: Adam (adapts the step size for each weight using running averages of past gradients)
- **Metrics**: Accuracy (the fraction of predictions that are correct)
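
To make the loss concrete, here is a small pure-Python sketch of Softmax followed by categorical cross-entropy for a single sample. It is an illustration under simple assumptions, not the Keras implementation; the function names and the example logits are invented for this example.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(one_hot, probs, eps=1e-7):
    """Negative log-probability assigned to the true class."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot, probs))


# True label is digit 3, one-hot encoded over 10 classes.
target = [0.0] * 10
target[3] = 1.0

# An uninformed guess: equal scores for every digit.
uniform_probs = softmax([0.0] * 10)
loss_uniform = categorical_cross_entropy(target, uniform_probs)

# A confident, correct guess: a much higher score for digit 3.
confident_logits = [0.0] * 10
confident_logits[3] = 5.0
loss_confident = categorical_cross_entropy(target, softmax(confident_logits))

print(f"uniform guess loss:   {loss_uniform:.4f}")
print(f"confident guess loss: {loss_confident:.4f}")
```

An uninformed guess over 10 classes gives a loss of ln(10), roughly 2.30, so a first-epoch loss well below that value is a sign the network is already learning.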

In [None]:
# ============================================
# STEP 6: COMPILE MODEL
# ============================================

print("Compiling model...\n")

model.compile(
    loss='categorical_crossentropy',  # For multi-class classification
    optimizer='adam',                  # Adaptive learning rate optimizer
    metrics=['accuracy']               # Track accuracy during training
)

print("✓ Model compiled successfully")
print("  Loss function: categorical_crossentropy")
print("  Optimizer: Adam")
print("  Metrics: accuracy")

## Section 5: Model Training

### Training Parameters:
- **Epochs**: 10 (number of times to iterate through entire training dataset)
- **Batch Size**: 128 (number of samples to process before updating weights)
- **Validation Split**: 0.1 (Keras holds out the last 10% of the training data, 6,000 images, and evaluates on it at the end of each epoch)
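
These parameters determine how many weight updates the model performs. The arithmetic below is a sketch that assumes the validation samples are removed before batching and that the final, smaller batch still counts as one update; the variable names are made up for the example.

```python
import math

n_samples = 60_000
validation_split = 0.1
batch_size = 128
epochs = 10

# Samples held out for validation and samples left for training.
n_val = round(n_samples * validation_split)
n_train = n_samples - n_val

# One weight update per batch; the last batch may be smaller than 128.
steps_per_epoch = math.ceil(n_train / batch_size)
total_updates = steps_per_epoch * epochs

print(n_train, n_val)    # 54000 6000
print(steps_per_epoch)   # 422
print(total_updates)     # 4220
```

With `verbose=1`, the per-epoch step count shown in the Keras progress bar should match `steps_per_epoch`.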

### What's happening:
1. Feed a batch of training data through the network (forward pass)
2. Compute the loss using cross-entropy
3. Compute gradients (how much each weight contributed to the error)
4. Update the weights with the Adam optimizer, using those gradients
5. Repeat for every batch, then for every epoch
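
The loop that `model.fit` runs can be sketched by hand on a much smaller problem. The example below is a deliberately simplified stand-in, not what Keras does internally: it trains a single sigmoid neuron on made-up 1-D data, uses binary cross-entropy, and updates with plain mini-batch gradient descent instead of Adam. All names and values are assumptions for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


random.seed(0)
# Hypothetical toy data: x in [-1, 1], label 1 when x is positive.
xs = [random.uniform(-1, 1) for _ in range(200)]
ys = [1.0 if x > 0 else 0.0 for x in xs]

w, b = 0.0, 0.0          # the two trainable parameters
learning_rate = 0.5
batch_size = 20


def mean_loss(w, b, eps=1e-12):
    """Average binary cross-entropy over the whole toy dataset."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(xs)


initial_loss = mean_loss(w, b)

for epoch in range(10):                           # step 5: repeat for every epoch
    for start in range(0, len(xs), batch_size):   # ...and every batch
        bx = xs[start:start + batch_size]
        by = ys[start:start + batch_size]
        # Step 1: forward pass.
        preds = [sigmoid(w * x + b) for x in bx]
        # Steps 2-3: for sigmoid + cross-entropy, the loss gradient per
        # sample is (p - y) * x for w and (p - y) for b.
        grad_w = sum((p - y) * x for p, y, x in zip(preds, by, bx)) / len(bx)
        grad_b = sum(p - y for p, y in zip(preds, by)) / len(bx)
        # Step 4: move the parameters against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b

final_loss = mean_loss(w, b)
print(f"loss before: {initial_loss:.4f}, after: {final_loss:.4f}")
```

Adam follows the same forward, gradient, update cycle; it differs only in how step 4 turns the gradient into a weight change.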

In [None]:
# ============================================
# STEP 7: TRAIN MODEL
# ============================================

print("Training model...\n")

history = model.fit(
    x_train,                    # Training images
    y_train,                    # Training labels (one-hot encoded)
    epochs=10,                  # Number of passes through entire dataset
    batch_size=128,             # Process 128 samples before updating weights
    validation_split=0.1,       # Use 10% for validation during training
    verbose=1                   # Show progress bar
)

print("\n✓ Training complete!")

## Section 6: Training Visualization

Let's plot how the loss and accuracy changed during training.

In [None]:
# ============================================
# VISUALIZE TRAINING HISTORY
# ============================================

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))

# Plot loss
ax1.plot(history.history['loss'], label='Training Loss', marker='o')
ax1.plot(history.history['val_loss'], label='Validation Loss', marker='s')
ax1.set_xlabel('Epoch')
ax1.set_ylabel('Loss')
ax1.set_title('Model Loss Over Time')
ax1.legend()
ax1.grid(True, alpha=0.3)

# Plot accuracy
ax2.plot(history.history['accuracy'], label='Training Accuracy', marker='o')
ax2.plot(history.history['val_accuracy'], label='Validation Accuracy', marker='s')
ax2.set_xlabel('Epoch')
ax2.set_ylabel('Accuracy')
ax2.set_title('Model Accuracy Over Time')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print(f"\nFinal Training Accuracy: {history.history['accuracy'][-1]:.4f}")
print(f"Final Validation Accuracy: {history.history['val_accuracy'][-1]:.4f}")

## Section 7: Model Evaluation

Evaluate the trained model on the test set (data the model has never seen before).
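
The accuracy metric itself is simple: take the most probable class for each sample and count how often it matches the true label. Below is a minimal pure-Python sketch with invented 3-class probabilities (the real model outputs 10 values per image); the `argmax` helper is defined here only for the example.

```python
def argmax(row):
    """Index of the largest value in a list."""
    return max(range(len(row)), key=row.__getitem__)


# Hypothetical predicted probabilities for four samples and their true classes.
probs = [
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.3, 0.3, 0.4],
    [0.2, 0.5, 0.3],
]
true_labels = [1, 0, 1, 1]

predicted = [argmax(row) for row in probs]
correct = sum(p == t for p, t in zip(predicted, true_labels))
accuracy = correct / len(true_labels)

print(predicted)  # [1, 0, 2, 1]
print(accuracy)   # 0.75
```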

In [None]:
# ============================================
# STEP 8: EVALUATE ON TEST SET
# ============================================

print("Evaluating model on test set...\n")

test_loss, test_accuracy = model.evaluate(x_test, y_test, verbose=0)

print("="*60)
print("TEST SET RESULTS")
print("="*60)
print(f"Test Loss: {test_loss:.4f}")
print(f"Test Accuracy: {test_accuracy:.4f} ({test_accuracy*100:.2f}%)")
print("="*60)

## Section 8: Make Predictions on Sample Images

In [None]:
# ============================================
# MAKE PREDICTIONS ON TEST SAMPLES
# ============================================

# Get predictions for first 10 test samples
sample_predictions = model.predict(x_test[:10])
sample_predictions_classes = np.argmax(sample_predictions, axis=1)

print("Sample Predictions:")
print("="*60)
for i in range(10):
    true_label = np.argmax(y_test[i])
    predicted_label = sample_predictions_classes[i]
    confidence = sample_predictions[i][predicted_label]
    
    match = "✓" if true_label == predicted_label else "✗"
    print(f"Sample {i+1}: True={true_label}, Predicted={predicted_label}, Confidence={confidence:.4f} {match}")

print("="*60)

## Section 9: Visualize Predictions

In [None]:
# ============================================
# VISUALIZE PREDICTIONS ON SAMPLE IMAGES
# ============================================

# Create a 2x5 grid of axes, one per sample image
fig, axes = plt.subplots(2, 5, figsize=(12, 5))
axes = axes.ravel()

for i in range(10):
    # Reshape from 784 back to 28x28
    image = x_test[i].reshape(28, 28)
    true_label = np.argmax(y_test[i])
    predicted_label = sample_predictions_classes[i]
    
    axes[i].imshow(image, cmap='gray')
    
    title = f"True: {true_label}, Pred: {predicted_label}"
    color = 'green' if true_label == predicted_label else 'red'
    axes[i].set_title(title, color=color)
    axes[i].axis('off')

plt.tight_layout()
plt.show()

## Summary

### Model Architecture:
- **Input Layer**: 784 neurons
- **Hidden Layer 1**: 128 neurons (ReLU activation)
- **Hidden Layer 2**: 64 neurons (ReLU activation)
- **Output Layer**: 10 neurons (Softmax activation)

### Training Details:
- **Loss Function**: Categorical Cross-Entropy
- **Optimizer**: Adam
- **Epochs**: 10
- **Batch Size**: 128

### Results:

In [None]:
print("\n" + "="*60)
print("FINAL SUMMARY")
print("="*60)
print(f"\nFinal Test Accuracy: {test_accuracy*100:.2f}%")
print(f"\nThis means the model correctly classified {round(test_accuracy * len(x_test))}/{len(x_test)} test images")
print("\n" + "="*60)