# MNIST Digit Classification with Feedforward Neural Network

This notebook builds, trains, and evaluates a simple neural network to classify handwritten digits (0-9) from the MNIST dataset.

**Framework:** TensorFlow/Keras

**Estimated time to run:** 3-5 minutes

---

## Section 1: Import Libraries

We start by importing the tools we need. Think of these as importing JavaScript libraries into a web app.

In [None]:
# Import NumPy for numerical operations (arrays, math)
import numpy as np

# Import Matplotlib for visualizing images
import matplotlib.pyplot as plt

# Import TensorFlow/Keras (deep learning framework)
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, Sequential
from tensorflow.keras.utils import to_categorical

print("✓ All libraries imported successfully!")
print(f"TensorFlow version: {tf.__version__}")

---

## Section 2: Load and Explore the MNIST Dataset

MNIST is already available in Keras, so we can load it with one line of code.

The dataset contains 70,000 images of handwritten digits, each 28×28 pixels.

In [None]:
# Load MNIST dataset
# Keras includes MNIST with train/test split already done
(train_images, train_labels), (test_images, test_labels) = keras.datasets.mnist.load_data()

print("Dataset loaded!")
print(f"\nTraining set:")
print(f"  - Images shape: {train_images.shape}")
print(f"    (60,000 images, each 28×28 pixels)")
print(f"  - Labels shape: {train_labels.shape}")
print(f"    (60,000 labels, values 0-9)")
print(f"\nTest set:")
print(f"  - Images shape: {test_images.shape}")
print(f"    (10,000 images, each 28×28 pixels)")
print(f"  - Labels shape: {test_labels.shape}")

print(f"\nPixel value range: {train_images.min()} - {train_images.max()}")

### Visualize a Sample Image

In [None]:
# Display a single training image
# This helps us understand what the data looks like

sample_index = 0  # First image in training set
sample_image = train_images[sample_index]
sample_label = train_labels[sample_index]

plt.figure(figsize=(4, 4))
plt.imshow(sample_image, cmap='gray')  # 'gray' shows grayscale images
plt.title(f"Digit: {sample_label}")
plt.axis('off')  # Hide axis labels
plt.tight_layout()
plt.show()

print(f"This image represents the digit: {sample_label}")
print(f"Image shape: {sample_image.shape} (28 pixels × 28 pixels)")

---

## Section 3: Data Preparation

Before we can train the neural network, we need to prepare the data:
1. **Normalize** pixel values (scale to 0-1)
2. **Flatten** images (convert 28×28 to 784-length vector)
3. **One-hot encode** labels (convert 3 to [0,0,0,1,0,0,0,0,0,0])

See `GUIDE_1_Data_Preparation.md` for detailed explanations.

### Step 1: Normalize Pixel Values

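Pixel values range from 0 to 255. We divide by 255 to scale them to the range 0-1.

**Why?** Gradient-based training tends to be faster and more stable when inputs are small and on a consistent scale. Raw values up to 255 produce large weighted sums at the start of training, which can cause large, erratic weight updates.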
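A minimal NumPy sketch of the same scaling on a toy 2×2 "image", assuming `uint8` input as returned by `load_data()`. Casting to `float32` first is an optional choice: `/ 255.0` on `uint8` data yields `float64`, and `float32` uses half the memory. `normalize_pixels` is a hypothetical helper name.

```python
import numpy as np

def normalize_pixels(images: np.ndarray) -> np.ndarray:
    """Scale uint8 pixel values from [0, 255] to float32 values in [0, 1]."""
    return images.astype(np.float32) / 255.0

# A toy 2x2 "image": black (0), two grays (51, 102) and white (255)
toy = np.array([[0, 51], [102, 255]], dtype=np.uint8)
scaled = normalize_pixels(toy)
print(scaled)        # every value now lies between 0 and 1
print(scaled.dtype)  # float32
```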

In [None]:
# Normalize: divide all pixel values by 255
# This converts range [0, 255] to [0, 1]
train_images = train_images / 255.0
test_images = test_images / 255.0

print("✓ Normalization complete")
print(f"New pixel value range: {train_images.min():.4f} - {train_images.max():.4f}")
print(f"Example pixel values: {train_images[0, 0, :5]}")

### Step 2: Flatten Images

A `Dense` layer expects each image as a 1D vector of numbers, not a 2D grid.

We reshape each 28×28 image into a single vector of 784 values.
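To see what `reshape(-1, 784)` does to the pixel layout, here is a small NumPy check on a synthetic batch (the data is made up for illustration). Each row of 28 pixels is laid end to end, so pixel `(row, col)` ends up at position `row * 28 + col` in the flattened vector.

```python
import numpy as np

# A hypothetical batch of 2 images, each 28x28, filled with distinct values
batch = np.arange(2 * 28 * 28).reshape(2, 28, 28)

# Same call as in the notebook: keep the batch axis, merge the two pixel axes
flat = batch.reshape(-1, 784)
print(flat.shape)  # (2, 784)

# NumPy reshapes in row-major (C) order: pixel (row, col) lands at row * 28 + col
row, col = 5, 17
assert flat[1, row * 28 + col] == batch[1, row, col]
```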

In [None]:
# Flatten: reshape 28x28 images into 784-length vectors
# reshape(-1, 784) means: "reshape to have 784 columns, auto-determine rows"
train_images_flat = train_images.reshape(-1, 784)
test_images_flat = test_images.reshape(-1, 784)

print("✓ Flattening complete")
print(f"Original shape: {train_images.shape} - each image is a 28×28 grid")
print(f"New shape: {train_images_flat.shape} - each image is a 784-value vector")
print(f"\nTotal values per image: 28 × 28 = {28 * 28}")

### Step 3: One-Hot Encode Labels

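Instead of storing each label as a single number (3, 5, 7),
we convert it to a vector of 10 values where only one position is 1 and the others are 0.
This matches the network's 10 outputs and is the target format the `categorical_crossentropy` loss expects.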

**Example:**
- Label 3 → [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
- Label 5 → [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
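For intuition, here is what one-hot encoding does under the hood, written as a hypothetical NumPy helper `one_hot`. It is a sketch for comparison, not Keras's implementation; for integer labels 0-9 it should produce the same values as `to_categorical(labels, num_classes=10)`.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Return a (len(labels), num_classes) float32 array with a single 1 per row."""
    labels = np.asarray(labels)
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    # Row i gets a 1 in column labels[i]
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

encoded = one_hot([3, 5])
print(encoded[0])                  # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
print(np.argmax(encoded, axis=1))  # [3 5] (argmax undoes the encoding)
```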

In [None]:
# One-hot encode labels
# num_classes=10 because we have digits 0-9
train_labels_encoded = to_categorical(train_labels, num_classes=10)
test_labels_encoded = to_categorical(test_labels, num_classes=10)

print("✓ One-hot encoding complete")
print(f"Original labels shape: {train_labels.shape}")
print(f"Encoded labels shape: {train_labels_encoded.shape}")
print(f"\nExample:")
print(f"  Original label: {train_labels[0]}")
print(f"  One-hot encoded: {train_labels_encoded[0]}")
print(f"  (Position {np.argmax(train_labels_encoded[0])} has the 1)")

### Summary of Prepared Data

In [None]:
print("Data Preparation Complete!\n")
print("="*50)
print("TRAINING DATA")
print("="*50)
print(f"Images: {train_images_flat.shape}")
print(f"  - 60,000 images")
print(f"  - Each is 784 values (28×28 flattened)")
print(f"  - Values range: 0-1 (normalized)")
print(f"\nLabels: {train_labels_encoded.shape}")
print(f"  - 60,000 labels")
print(f"  - Each is 10 values (one-hot encoded)")
print(f"\n" + "="*50)
print("TEST DATA")
print("="*50)
print(f"Images: {test_images_flat.shape}")
print(f"Labels: {test_labels_encoded.shape}")

---

## Section 4: Build the Neural Network Model

We create a Feedforward Neural Network (FNN) with:
- **Input Layer:** 784 neurons (one per pixel)
- **Hidden Layer 1:** 128 neurons with ReLU activation
- **Hidden Layer 2:** 64 neurons with ReLU activation
- **Output Layer:** 10 neurons with Softmax activation

See `GUIDE_2_Model_Architecture.md` for detailed explanations.
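Each Dense layer computes `activation(inputs @ weights + bias)`. The NumPy sketch below runs one forward pass through layers of the same sizes, using random, untrained weights (an assumption purely for illustration; Keras initializes and trains its own weights). It shows how the shapes flow from 784 to 10 and that softmax turns the last layer's outputs into probabilities that sum to 1.

```python
import numpy as np

def relu(x):
    # max(0, x), applied element-wise
    return np.maximum(0.0, x)

def softmax(z):
    # Subtract the row maximum for numerical stability; the result is unchanged
    z = z - z.max(axis=1, keepdims=True)
    exp_z = np.exp(z)
    return exp_z / exp_z.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
# Random weights with the same shapes as the 784 -> 128 -> 64 -> 10 network
W1, b1 = rng.normal(0, 0.05, (784, 128)), np.zeros(128)
W2, b2 = rng.normal(0, 0.05, (128, 64)), np.zeros(64)
W3, b3 = rng.normal(0, 0.05, (64, 10)), np.zeros(10)

x = rng.random((1, 784))       # one fake flattened image, values in [0, 1)
h1 = relu(x @ W1 + b1)         # shape (1, 128)
h2 = relu(h1 @ W2 + b2)        # shape (1, 64)
probs = softmax(h2 @ W3 + b3)  # shape (1, 10)

print(probs.shape, probs.sum())  # (1, 10) and a sum of (almost exactly) 1.0
```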

### Create the Model Architecture

In [None]:
# Create a sequential model (layers stacked one after another)
model = Sequential([
    # Input layer
    # Declares that each sample is a vector of 784 values (recent Keras versions
    # recommend this instead of passing input_shape to the first Dense layer)
    keras.Input(shape=(784,)),

    # Hidden layer 1
    # Dense = fully connected layer (each neuron connects to every value in the previous layer)
    # ReLU activation: outputs max(0, input), which allows non-linear learning
    layers.Dense(128, activation='relu'),
    
    # Hidden layer 2
    # 64 neurons, ReLU activation
    # Input is automatically (128,) from previous layer
    layers.Dense(64, activation='relu'),
    
    # Output layer
    # 10 neurons (one for each digit 0-9)
    # Softmax activation: converts outputs to probabilities that sum to 1
    layers.Dense(10, activation='softmax')
])

print("✓ Model architecture created!")

### View Model Summary

In [None]:
# Display model architecture
model.summary()

print("\n" + "="*60)
print("ARCHITECTURE EXPLANATION")
print("="*60)
print("\nLayer 1 (Dense): 128 neurons")
print(f"  Input: 784 values (flattened image)")
print(f"  Parameters: 784×128 weights + 128 biases = 100,480")
print(f"  Activation: ReLU (Rectified Linear Unit)")
print(f"  Output: 128 values")
print(f"\nLayer 2 (Dense): 64 neurons")
print(f"  Input: 128 values from Layer 1")
print(f"  Parameters: 128×64 weights + 64 biases = 8,256")
print(f"  Activation: ReLU")
print(f"  Output: 64 values")
print(f"\nLayer 3 (Dense): 10 neurons")
print(f"  Input: 64 values from Layer 2")
print(f"  Parameters: 64×10 weights + 10 biases = 650")
print(f"  Activation: Softmax (probability distribution)")
print(f"  Output: 10 probabilities (sum = 1.0)")
print(f"\nTotal Parameters: 109,386")
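The hard-coded numbers above follow one formula: a Dense layer with `n_inputs` inputs and `n_units` neurons has `n_inputs × n_units` weights plus `n_units` biases. A small Python check (the helper name `dense_params` is hypothetical) reproduces the per-layer counts and the total of 109,386.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Weights (one per input-unit pair) plus one bias per unit."""
    return n_inputs * n_units + n_units

layer_sizes = [784, 128, 64, 10]
# Pair each layer's input size with its number of units: (784,128), (128,64), (64,10)
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
print(per_layer)       # [100480, 8256, 650]
print(sum(per_layer))  # 109386
```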

---

## Section 5: Compile the Model

Compilation tells Keras:
- **Loss function:** How to measure error (Categorical Cross-Entropy)
- **Optimizer:** How to improve weights (Adam)
- **Metrics:** What to report (Accuracy)

See `GUIDE_3_Training.md` for detailed explanations.

In [None]:
# Compile the model
model.compile(
    loss='categorical_crossentropy',  # For multi-class classification
    optimizer='adam',                 # Adaptive learning rate optimizer
    metrics=['accuracy']              # Report accuracy during training
)

print("✓ Model compiled!")
print("\nCompilation settings:")
print(f"  Loss function: Categorical Cross-Entropy")
print(f"  Optimizer: Adam")
print(f"  Metrics: Accuracy")
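Adam keeps a running average of each weight's gradients (momentum) and of their squares, and divides by the square root of the latter, so every weight effectively gets its own step size. Below is a minimal NumPy sketch of the textbook update on a one-variable toy problem. The defaults mirror Keras's documented `Adam` defaults (`learning_rate=0.001`, `beta_1=0.9`, `beta_2=0.999`, `epsilon=1e-7`), but Keras's internal implementation may differ in small details such as where `epsilon` enters, so treat this as an illustration rather than a reimplementation.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """One textbook Adam update. Returns the new (w, m, v)."""
    m = beta1 * m + (1 - beta1) * grad       # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2  # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)             # bias correction for the zero init
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3)
w, m, v = 0.0, 0.0, 0.0
first_w = None
for t in range(1, 3001):
    w, m, v = adam_step(w, 2 * (w - 3), m, v, t, lr=0.01)
    if t == 1:
        first_w = w

print(first_w)  # about 0.01: the first step has size ~lr regardless of gradient scale
print(w)        # close to 3.0
```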

---

## Section 6: Train the Model

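We train for 10 epochs. One epoch is one complete pass through the training data: here 54,000 images, because `validation_split=0.1` holds out the last 6,000 for validation.

During training, for each batch of 32 images, the network:
- Makes predictions
- Calculates the error (loss)
- Adjusts its weights to reduce that error

With 54,000 images and a batch size of 32, that is 1,688 weight updates per epoch, or 16,880 over 10 epochs.

You should see the loss decrease and the accuracy increase.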
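What one epoch means in practice: with `validation_split=0.1`, Keras holds out the last 6,000 of the 60,000 training images, and the remaining 54,000 are shuffled and cut into batches of 32. This NumPy sketch (hypothetical helper `minibatches`; it only produces index batches and does no learning) reproduces that bookkeeping.

```python
import math
import numpy as np

n_total = 60_000
validation_split = 0.1
batch_size = 32

# validation_split holds out the LAST 10% of samples (taken before shuffling)
n_val = int(n_total * validation_split)
n_train = n_total - n_val
train_idx = np.arange(n_train)
val_idx = np.arange(n_train, n_total)

def minibatches(indices, batch_size, rng):
    """Yield shuffled index batches; the final batch may be smaller."""
    shuffled = rng.permutation(indices)
    for start in range(0, len(shuffled), batch_size):
        yield shuffled[start:start + batch_size]

rng = np.random.default_rng(0)
batches = list(minibatches(train_idx, batch_size, rng))
steps_per_epoch = math.ceil(n_train / batch_size)

print(n_train, n_val)                 # 54000 6000
print(len(batches), steps_per_epoch)  # 1688 1688
print(len(batches[-1]))               # 16 (the leftover partial batch)
```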

In [None]:
# Train the model
# This is where the network learns!
history = model.fit(
    train_images_flat,        # Inputs: flattened training images
    train_labels_encoded,     # Targets: one-hot encoded training labels
    epochs=10,                # Number of full passes over the training data
    batch_size=32,            # Update weights after every 32 images
    validation_split=0.1,     # Hold out the last 10% of training data for validation
    verbose=1                 # Show progress bar
)

print("\n✓ Training complete!")

### Visualize Training Progress

In [None]:
# Plot training and validation metrics
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))

# Plot 1: Loss over epochs
ax1.plot(history.history['loss'], label='Training Loss', linewidth=2)
ax1.plot(history.history['val_loss'], label='Validation Loss', linewidth=2)
ax1.set_xlabel('Epoch')
ax1.set_ylabel('Loss')
ax1.set_title('Loss Over Training')
ax1.legend()
ax1.grid(True, alpha=0.3)

# Plot 2: Accuracy over epochs
ax2.plot(history.history['accuracy'], label='Training Accuracy', linewidth=2)
ax2.plot(history.history['val_accuracy'], label='Validation Accuracy', linewidth=2)
ax2.set_xlabel('Epoch')
ax2.set_ylabel('Accuracy')
ax2.set_title('Accuracy Over Training')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Print final metrics
print("\nTraining Summary:")
print(f"  Initial Loss: {history.history['loss'][0]:.4f}")
print(f"  Final Loss: {history.history['loss'][-1]:.4f}")
print(f"\n  Initial Accuracy: {history.history['accuracy'][0]*100:.2f}%")
print(f"  Final Accuracy: {history.history['accuracy'][-1]*100:.2f}%")
print(f"\nValidation Accuracy: {history.history['val_accuracy'][-1]*100:.2f}%")

---

## Section 7: Evaluate on Test Set

Now we test the model on completely new data (test set).
This shows how well the model generalizes to unseen images.

See `GUIDE_4_Evaluation.md` for detailed explanations.
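The accuracy returned by `model.evaluate` is the fraction of images whose highest-probability class matches the true label. Counting matches directly, as in this small NumPy sketch with made-up probabilities (3 classes instead of 10 for brevity), gives an exact integer count and avoids converting a rounded float back into a count.

```python
import numpy as np

# Hypothetical softmax outputs for 4 test images (rows) over 3 classes
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
])
true_labels = np.array([0, 1, 1, 0])

predicted = np.argmax(probs, axis=1)  # [0 1 2 0]
n_correct = int(np.sum(predicted == true_labels))
accuracy = n_correct / len(true_labels)
print(n_correct, accuracy)            # 3 0.75
```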

In [None]:
# Evaluate on test set
# The network has never seen these images before!
test_loss, test_accuracy = model.evaluate(
    test_images_flat,
    test_labels_encoded,
    verbose=0
)

print("="*60)
print("TEST SET EVALUATION")
print("="*60)
print(f"\nTest Loss: {test_loss:.4f}")
print(f"Test Accuracy: {test_accuracy*100:.2f}%")
# round() rather than int(): a float such as 9780.9999 should count as 9781, not 9780
n_correct = round(test_accuracy * len(test_labels))
print(f"\nCorrect Predictions: {n_correct} / {len(test_labels)}")
print(f"Wrong Predictions: {len(test_labels) - n_correct}")

---

## Section 8: Make Predictions on Individual Images

Let's test the model on some specific images from the test set.
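Test 1 and Test 2 below repeat the same reporting code. One option is to factor it into a helper; the sketch below is a hypothetical `describe_prediction` function (not part of Keras) that takes one row of `model.predict` output and builds the same text report. It is exercised here on hand-made probabilities so it runs without a trained model.

```python
import numpy as np

def describe_prediction(probs, true_label, bar_width=40):
    """Return (predicted_digit, text_report) for one image's 10 class probabilities.

    `probs` is a 1-D array such as model.predict(x, verbose=0)[0].
    """
    predicted = int(np.argmax(probs))
    lines = [
        f"True Label: {true_label}",
        f"Predicted Label: {predicted}",
        f"Confidence: {probs[predicted] * 100:.2f}%",
        f"Result: {'✓ CORRECT' if predicted == true_label else '✗ WRONG'}",
        "Probabilities for each digit:",
    ]
    for digit, p in enumerate(probs):
        # Bar length is proportional to the probability
        lines.append(f"  {digit}: {'█' * int(p * bar_width)} {p * 100:6.2f}%")
    return predicted, "\n".join(lines)

# Stand-in probabilities (not real model output): mostly "7", a little "1"
fake = np.zeros(10)
fake[7], fake[1] = 0.9, 0.1
digit, report = describe_prediction(fake, true_label=7)
print(report)
```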

### Test 1: Single Random Image

In [None]:
# Pick a random test image
random_index = np.random.randint(0, len(test_images_flat))
test_image = test_images_flat[random_index]
test_label = test_labels[random_index]

# Make prediction
# model.predict expects batch format, so reshape to (1, 784)
prediction = model.predict(test_image.reshape(1, -1), verbose=0)

# Extract results
predicted_digit = np.argmax(prediction[0])  # Index of highest probability
confidence = prediction[0][predicted_digit]  # Probability of predicted digit

# Display results
print(f"\nTest Case 1: Random Image")
print(f"="*50)
print(f"\nTrue Label: {test_label}")
print(f"Predicted Label: {predicted_digit}")
print(f"Confidence: {confidence*100:.2f}%")
print(f"Result: {'✓ CORRECT' if predicted_digit == test_label else '✗ WRONG'}")

# Show probability distribution
print(f"\nProbabilities for each digit:")
for digit in range(10):
    probability = prediction[0][digit]
    bar_length = int(probability * 40)  # Scale for display
    bar = '█' * bar_length
    print(f"  {digit}: {bar} {probability*100:6.2f}%")

# Show the image
image = test_images[random_index]
plt.figure(figsize=(4, 4))
plt.imshow(image, cmap='gray')
plt.title(f"True: {test_label}, Predicted: {predicted_digit}")
plt.axis('off')
plt.tight_layout()
plt.show()

### Test 2: Different Random Image

In [None]:
# Let's try another random image to see another example
random_index_2 = np.random.randint(0, len(test_images_flat))
test_image_2 = test_images_flat[random_index_2]
test_label_2 = test_labels[random_index_2]

# Make prediction
prediction_2 = model.predict(test_image_2.reshape(1, -1), verbose=0)

# Extract results
predicted_digit_2 = np.argmax(prediction_2[0])
confidence_2 = prediction_2[0][predicted_digit_2]

# Display results
print(f"\nTest Case 2: Another Random Image")
print(f"="*50)
print(f"\nTrue Label: {test_label_2}")
print(f"Predicted Label: {predicted_digit_2}")
print(f"Confidence: {confidence_2*100:.2f}%")
print(f"Result: {'✓ CORRECT' if predicted_digit_2 == test_label_2 else '✗ WRONG'}")

# Show probability distribution
print(f"\nProbabilities for each digit:")
for digit in range(10):
    probability = prediction_2[0][digit]
    bar_length = int(probability * 40)
    bar = '█' * bar_length
    print(f"  {digit}: {bar} {probability*100:6.2f}%")

# Show the image
image_2 = test_images[random_index_2]
plt.figure(figsize=(4, 4))
plt.imshow(image_2, cmap='gray')
plt.title(f"True: {test_label_2}, Predicted: {predicted_digit_2}")
plt.axis('off')
plt.tight_layout()
plt.show()

---

## Section 9: Detailed Analysis

Let's analyze the model's performance in more detail.
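The per-digit accuracies computed below show how often each digit is classified correctly, but not which digits it is confused with. A confusion matrix captures both. This NumPy sketch is a hypothetical helper (scikit-learn's `sklearn.metrics.confusion_matrix` builds the same table, if it is installed); in this notebook you would call it as `confusion_matrix(test_labels, predicted_digits)`.

```python
import numpy as np

def confusion_matrix(true_labels, predicted_labels, num_classes=10):
    """cm[i, j] counts images whose true digit is i and predicted digit is j."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    # Add 1 at (true, predicted) for every image, handling repeated pairs correctly
    np.add.at(cm, (true_labels, predicted_labels), 1)
    return cm

# Toy labels for illustration (3 classes to keep it small)
true_labels = np.array([0, 0, 1, 1, 1, 2])
predicted = np.array([0, 1, 1, 1, 2, 2])
cm = confusion_matrix(true_labels, predicted, num_classes=3)
print(cm)

# Per-class accuracy = diagonal / row totals (the same quantity as the per-digit loop)
per_class = cm.diagonal() / cm.sum(axis=1)
print(per_class)  # class 0: 1/2, class 1: 2/3, class 2: 1/1
```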

In [None]:
# Get predictions on all test images
all_predictions = model.predict(test_images_flat, verbose=0)

# Convert to digit predictions
predicted_digits = np.argmax(all_predictions, axis=1)

# Calculate accuracy by digit
print("\nAccuracy by Digit:")
print("="*50)
for digit in range(10):
    # Find all test images of this digit
    digit_mask = (test_labels == digit)
    
    # Calculate accuracy for this digit
    digit_predictions = predicted_digits[digit_mask]
    digit_true_labels = test_labels[digit_mask]
    
    accuracy = np.sum(digit_predictions == digit_true_labels) / len(digit_true_labels)
    
    # Show result with bar chart
    bar_length = int(accuracy * 30)
    bar = '█' * bar_length
    print(f"Digit {digit}: {bar} {accuracy*100:6.2f}%")

### Find Easy and Hard Examples

In [None]:
# Find high-confidence correct predictions (easy cases)
correct_mask = (predicted_digits == test_labels)
confidences = np.max(all_predictions, axis=1)

# Find easiest examples
correct_indices = np.where(correct_mask)[0]
if len(correct_indices) > 0:
    easy_idx = correct_indices[np.argsort(-confidences[correct_indices])[:3]]
else:
    easy_idx = []

# Find hardest examples (low confidence wrong predictions)
wrong_mask = ~correct_mask
wrong_indices = np.where(wrong_mask)[0]
if len(wrong_indices) > 0:
    hard_idx = wrong_indices[np.argsort(confidences[wrong_indices])[:3]]
else:
    hard_idx = []

print("\n" + "="*60)
print("EASY EXAMPLES (High Confidence Correct)")
print("="*60)
if len(easy_idx) > 0:
    fig, axes = plt.subplots(1, 3, figsize=(12, 4))
    for i, idx in enumerate(easy_idx):
        image = test_images[idx]
        true_digit = test_labels[idx]
        pred_digit = predicted_digits[idx]
        conf = confidences[idx]
        
        axes[i].imshow(image, cmap='gray')
        axes[i].set_title(f"True: {true_digit}, Pred: {pred_digit}\nConf: {conf*100:.1f}%")
        axes[i].axis('off')
    plt.tight_layout()
    plt.show()
else:
    print("No correct predictions found.")

print("\n" + "="*60)
print("HARD EXAMPLES (Wrong Predictions)")
print("="*60)
if len(hard_idx) > 0:
    fig, axes = plt.subplots(1, 3, figsize=(12, 4))
    for i, idx in enumerate(hard_idx):
        image = test_images[idx]
        true_digit = test_labels[idx]
        pred_digit = predicted_digits[idx]
        conf = confidences[idx]
        
        axes[i].imshow(image, cmap='gray')
        axes[i].set_title(f"True: {true_digit}, Pred: {pred_digit}\nConf: {conf*100:.1f}%", 
                         color='red')
        axes[i].axis('off')
    plt.tight_layout()
    plt.show()
else:
    print("Perfect accuracy! No wrong predictions.")
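The selection above sorts only the subset of correct (or wrong) predictions by confidence. As a sketch, the same logic can be written as one hypothetical helper, `select_k`, which also makes it easy to try the opposite view: wrong predictions the model was most confident about, which can be more revealing than the low-confidence mistakes shown above. The confidences here are made up for illustration.

```python
import numpy as np

def select_k(indices, scores, k=3, highest=True):
    """Return up to k entries of `indices`, ordered by their `scores`."""
    order = np.argsort(scores[indices])  # ascending by score
    if highest:
        order = order[::-1]              # descending by score
    return indices[order[:k]]

confidences = np.array([0.99, 0.55, 0.97, 0.40, 0.80, 0.999])
correct = np.array([True, False, True, False, False, True])

easy = select_k(np.where(correct)[0], confidences, highest=True)
hard = select_k(np.where(~correct)[0], confidences, highest=False)
confidently_wrong = select_k(np.where(~correct)[0], confidences, highest=True)
print(easy, hard, confidently_wrong)
```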

---

## Summary

Congratulations! You've successfully built, trained, and evaluated a neural network!

In [None]:
print("\n" + "="*60)
print("FINAL SUMMARY")
print("="*60)
print("\nMODEL ARCHITECTURE:")
print(f"  - Input Layer: 784 neurons (28×28 pixels)")
print(f"  - Hidden Layer 1: 128 neurons (ReLU)")
print(f"  - Hidden Layer 2: 64 neurons (ReLU)")
print(f"  - Output Layer: 10 neurons (Softmax)")
print(f"\nTRAINING:")
print(f"  - Epochs: 10")
print(f"  - Batch Size: 32")
print(f"  - Loss Function: Categorical Cross-Entropy")
print(f"  - Optimizer: Adam")
print(f"\nPERFORMANCE:")
print(f"  - Training Accuracy: {history.history['accuracy'][-1]*100:.2f}%")
print(f"  - Validation Accuracy: {history.history['val_accuracy'][-1]*100:.2f}%")
print(f"  - Test Accuracy: {test_accuracy*100:.2f}%")
print(f"  - Test Loss: {test_loss:.4f}")
print(f"\nRESULTS:")
print(f"  - Correctly classified: {round(test_accuracy * len(test_labels))} / {len(test_labels)} images")
print(f"  - Error rate: {(1 - test_accuracy)*100:.2f}%")
print("\n" + "="*60)
print("You have successfully completed the assignment!")
print("="*60)