# Exercise 5: CNN for MNIST Digit Recognition

**Objective:** Build a Convolutional Neural Network (CNN) to classify handwritten digits.

**Dataset:** MNIST (70,000 grayscale images of digits 0-9)

**Target:** Achieve >95% accuracy on the test set

**Time:** 60 minutes

---

## What You'll Learn
- How convolution and pooling layers work
- Building CNN architecture from scratch
- Training and evaluating image classification models
- Visualizing learned filters and feature maps
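
As a first taste of the pooling idea in the list above, here is a minimal NumPy sketch of 2x2 max pooling on a single small feature map. The helper name `max_pool_2x2` and the toy input are illustrative assumptions, not part of the exercise. Keras' `MaxPooling2D` applies the same idea to batched, multi-channel tensors.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a 2D array with even height and width."""
    h, w = feature_map.shape
    # Split the map into non-overlapping 2x2 blocks, then keep each block's maximum.
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

# Toy 4x4 feature map (hypothetical values)
feature_map = np.array([
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
])

pooled = max_pool_2x2(feature_map)
print(pooled)
# [[4 2]
#  [2 8]]
```

Each output value is the largest entry of one 2x2 block, so the map shrinks from 4x4 to 2x2 while the strongest activations survive.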

## Step 1: Import Libraries

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.datasets import mnist

print(f"TensorFlow version: {tf.__version__}")
print(f"GPU available: {len(tf.config.list_physical_devices('GPU')) > 0}")

## Step 2: Load and Explore the Data

MNIST contains:
- 60,000 training images
- 10,000 test images
- Each image is 28x28 pixels, grayscale

In [None]:
# Load MNIST dataset
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# TODO: Print the shapes of training and test sets
# Expected output:
# Training data shape: (60000, 28, 28)
# Training labels shape: (60000,)
# Test data shape: (10000, 28, 28)
# Test labels shape: (10000,)


In [None]:
# Visualize some examples
plt.figure(figsize=(10, 4))
for i in range(10):
    plt.subplot(2, 5, i + 1)
    plt.imshow(X_train[i], cmap='gray')
    plt.title(f"Label: {y_train[i]}")
    plt.axis('off')
plt.tight_layout()
plt.show()

## Step 3: Preprocess the Data

**Key preprocessing steps:**
1. Reshape to add channel dimension (28, 28) → (28, 28, 1)
2. Normalize pixel values from [0, 255] → [0, 1]
3. Convert labels to categorical (one-hot encoding)
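
The three steps can be tried on a tiny stand-in array before touching the real data. This is only a sketch: the random `images` and hand-picked `labels` are made-up toy data, and the one-hot step uses a plain NumPy identity-matrix trick rather than the Keras helper the exercise asks for.

```python
import numpy as np

# Toy stand-in for MNIST: 4 fake 28x28 uint8 "images" and their labels
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)
labels = np.array([3, 0, 9, 3])

# 1. Add a trailing channel dimension: (4, 28, 28) -> (4, 28, 28, 1)
images = images.reshape(-1, 28, 28, 1)

# 2. Cast to float32 and scale pixel values from [0, 255] to [0, 1]
images = images.astype("float32") / 255.0

# 3. One-hot encode by picking rows of a 10x10 identity matrix
one_hot = np.eye(10, dtype="float32")[labels]

print(images.shape, images.dtype)   # (4, 28, 28, 1) float32
print(one_hot[0])                   # 1.0 at index 3, zeros elsewhere
```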

In [None]:
# TODO: Reshape data to add channel dimension
# Hint: Use .reshape() with -1 for automatic batch size
# X_train = ...
# X_test = ...

print(f"New training shape: {X_train.shape}")
print(f"New test shape: {X_test.shape}")

In [None]:
# TODO: Normalize pixel values to [0, 1]
# Hint: Cast to float32 with .astype('float32'), then divide by 255.0
# (dividing a uint8 array by 255.0 directly would give float64)
# X_train = ...
# X_test = ...

print(f"Min pixel value: {X_train.min()}")
print(f"Max pixel value: {X_train.max()}")

In [None]:
# TODO: Convert labels to categorical (one-hot encoding)
# Hint: Use keras.utils.to_categorical()
# y_train_cat = ...
# y_test_cat = ...

print(f"Original label: {y_train[0]}")
print(f"One-hot encoded: {y_train_cat[0]}")

## Step 4: Build the CNN Model

**Architecture to implement:**
1. Conv2D layer: 32 filters, 3x3 kernel, ReLU activation
2. MaxPooling2D: 2x2 pool size
3. Conv2D layer: 64 filters, 3x3 kernel, ReLU activation
4. MaxPooling2D: 2x2 pool size
5. Flatten layer
6. Dense layer: 128 units, ReLU activation
7. Dropout: 0.5
8. Dense output layer: 10 units, softmax activation
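
It helps to work out the expected shapes and parameter counts by hand, so that the output of `model.summary()` can be checked against them. The sketch below assumes the Keras defaults for `Conv2D` (stride 1, `padding='valid'`) and for `MaxPooling2D` (stride equal to the pool size). The helper names `conv_out` and `pool_out` are illustrative only.

```python
def conv_out(size, kernel):
    """Output size of a stride-1 convolution without padding."""
    return size - kernel + 1

def pool_out(size, pool):
    """Output size of non-overlapping pooling (any remainder is dropped)."""
    return size // pool

size = 28
size = conv_out(size, 3)   # 26 after the first Conv2D
size = pool_out(size, 2)   # 13 after the first MaxPooling2D
size = conv_out(size, 3)   # 11 after the second Conv2D
size = pool_out(size, 2)   # 5 after the second MaxPooling2D
flat_units = size * size * 64   # 5 * 5 * 64 = 1600 values after Flatten

# Parameters = weights + biases for each trainable layer
params = {
    "conv1": 3 * 3 * 1 * 32 + 32,        # 320
    "conv2": 3 * 3 * 32 * 64 + 64,       # 18496
    "dense1": flat_units * 128 + 128,    # 204928
    "dense2": 128 * 10 + 10,             # 1290
}
total_params = sum(params.values())
print(flat_units, total_params)  # 1600 225034
```

If the summary of your model reports different numbers, a padding, stride, or layer-order setting probably differs from the list above.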

In [None]:
# TODO: Build the CNN model using keras.Sequential
# model = keras.Sequential([
#     # First convolutional block
#     ...
#     
#     # Second convolutional block
#     ...
#     
#     # Fully connected layers
#     ...
# ])

# Print model summary
model.summary()

## Step 5: Compile the Model

**Compilation parameters:**
- Optimizer: Adam
- Loss function: Categorical crossentropy
- Metrics: Accuracy
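
To make the loss concrete, the following NumPy sketch computes categorical crossentropy for two hypothetical predictions. It illustrates the formula only and is not the Keras implementation. The function name, the `eps` clipping value, and the toy probabilities are assumptions.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Per-sample loss: -sum(y_true * log(y_pred)) over the class axis."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return -np.sum(y_true * np.log(y_pred), axis=-1)

# Two samples, three classes (one-hot targets and predicted probabilities)
y_true = np.array([[0, 0, 1],
                   [1, 0, 0]])
y_pred = np.array([[0.1, 0.2, 0.7],
                   [0.5, 0.3, 0.2]])

losses = categorical_crossentropy(y_true, y_pred)
mean_loss = losses.mean()
print(losses)     # approximately [0.357 0.693], i.e. -ln(0.7) and -ln(0.5)
print(mean_loss)
```

Only the probability assigned to the true class contributes, so a confident correct prediction gives a loss near zero.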

In [None]:
# TODO: Compile the model
# model.compile(
#     optimizer=...,
#     loss=...,
#     metrics=[...]
# )


## Step 6: Train the Model
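
Before training, it is worth knowing how many samples and batches the suggested settings imply. The arithmetic below is a sketch that assumes `validation_split=0.1` and `batch_size=128` on the 60,000 training images. Keras holds out the last fraction of the arrays passed to `fit` as validation data. The variable names are illustrative.

```python
import math

n_samples = 60000
validation_split = 0.1
batch_size = 128

# The last 10% of the samples are set aside for validation
split_at = int(n_samples * (1.0 - validation_split))
n_train = split_at
n_val = n_samples - split_at

# Number of gradient updates per epoch (the final batch may be smaller)
steps_per_epoch = math.ceil(n_train / batch_size)

print(n_train, n_val, steps_per_epoch)  # 54000 6000 422
```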

In [None]:
# TODO: Train the model on X_train and the one-hot labels y_train_cat
# Hint: Use validation_split=0.1 to monitor validation accuracy
# Use epochs=10 and batch_size=128
# history = model.fit(
#     ...
# )


## Step 7: Evaluate the Model

In [None]:
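# TODO: Evaluate on the test set and print accuracy and loss
# Hint: Use model.evaluate() with X_test and the one-hot labels y_test_cat;
# with metrics=['accuracy'] it returns [loss, accuracy]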


In [None]:
# Plot training history
plt.figure(figsize=(12, 4))

plt.subplot(1, 2, 1)
plt.plot(history.history['accuracy'], label='Train')
plt.plot(history.history['val_accuracy'], label='Validation')
plt.title('Model Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.grid(True)

plt.subplot(1, 2, 2)
plt.plot(history.history['loss'], label='Train')
plt.plot(history.history['val_loss'], label='Validation')
plt.title('Model Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.grid(True)

plt.tight_layout()
plt.show()

## Step 8: Make Predictions and Visualize Results

In [None]:
# Make predictions
predictions = model.predict(X_test[:20])
predicted_classes = np.argmax(predictions, axis=1)

# Visualize predictions
plt.figure(figsize=(15, 6))
for i in range(20):
    plt.subplot(4, 5, i + 1)
    plt.imshow(X_test[i].reshape(28, 28), cmap='gray')
    color = 'green' if predicted_classes[i] == y_test[i] else 'red'
    plt.title(f"Pred: {predicted_classes[i]}\nTrue: {y_test[i]}", color=color)
    plt.axis('off')
plt.tight_layout()
plt.show()

## Step 9: Analyze Misclassifications
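
A misclassified example is one whose predicted class (the argmax of the softmax output) differs from its true label. The toy sketch below shows the pattern on four made-up probability rows. The names `probs`, `true_labels`, `pred_labels`, and `wrong_idx` are hypothetical and deliberately differ from the variables used in the exercise cells.

```python
import numpy as np

# Hypothetical softmax outputs for 4 samples and 3 classes
probs = np.array([
    [0.9, 0.1, 0.0],
    [0.2, 0.7, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.2, 0.7],
])
true_labels = np.array([0, 1, 2, 2])

# Predicted class = index of the highest probability in each row
pred_labels = np.argmax(probs, axis=1)

# np.where returns a tuple of index arrays; [0] selects the sample indices
wrong_idx = np.where(pred_labels != true_labels)[0]
error_rate = len(wrong_idx) / len(true_labels)

print(pred_labels)   # [0 1 0 2]
print(wrong_idx)     # [2]
print(error_rate)    # 0.25
```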

In [None]:
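# TODO: Find all misclassified examples in the test set
# Hint: Predict on all of X_test and store the argmax in predicted_labels
# Use np.where(predicted_labels != y_test)[0] and store the indices in misclassified
# (the next cell relies on both variable names)
# Then print the total count and error rate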


In [None]:
# Visualize some misclassified examples
plt.figure(figsize=(15, 6))
for i, idx in enumerate(misclassified[:20]):
    plt.subplot(4, 5, i + 1)
    plt.imshow(X_test[idx].reshape(28, 28), cmap='gray')
    plt.title(f"Pred: {predicted_labels[idx]}\nTrue: {y_test[idx]}")
    plt.axis('off')
plt.suptitle('Misclassified Examples', fontsize=16)
plt.tight_layout()
plt.show()

## Step 10: Visualize Learned Filters (Optional)

Let's see what patterns the first convolutional layer learned!

In [None]:
# Get weights from first convolutional layer
filters, biases = model.layers[0].get_weights()
print(f"Filter shape: {filters.shape}")  # (3, 3, 1, 32)

# Normalize filters for visualization
f_min, f_max = filters.min(), filters.max()
filters_normalized = (filters - f_min) / (f_max - f_min)

# Plot first 32 filters
plt.figure(figsize=(12, 6))
for i in range(32):
    plt.subplot(4, 8, i + 1)
    plt.imshow(filters_normalized[:, :, 0, i], cmap='gray')
    plt.axis('off')
plt.suptitle('Learned Filters from First Conv Layer', fontsize=16)
plt.tight_layout()
plt.show()


## Fashion-MNIST Evaluation

To check how well the CNN design carries over to other data, the same architecture was also trained and evaluated on the Fashion-MNIST dataset, which contains 28x28 grayscale images of clothing items such as shirts, shoes, bags, and coats.

Unlike handwritten digits, Fashion-MNIST images are visually more complex, and several classes share similar shapes. This makes classification more challenging.

The CNN architecture used for MNIST is reused here without changes and trained on the new data, to see how well the design generalizes to a different type of image.

```python
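from tensorflow import keras
from tensorflow.keras.datasets import fashion_mnist

# Load Fashion-MNIST
(x_train_f, y_train_f), (x_test_f, y_test_f) = fashion_mnist.load_data()

# Add the channel dimension and normalize pixel values to [0, 1]
x_train_f = x_train_f.reshape(-1, 28, 28, 1).astype("float32") / 255.0
x_test_f = x_test_f.reshape(-1, 28, 28, 1).astype("float32") / 255.0

# One-hot encode the labels to match the categorical crossentropy loss
y_train_f_cat = keras.utils.to_categorical(y_train_f, 10)
y_test_f_cat = keras.utils.to_categorical(y_test_f, 10)

# Same architecture with freshly initialized weights, so training does not
# start from (or overwrite) the MNIST-trained model; compiled as in Step 5
model_f = keras.models.clone_model(model)
model_f.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

# Train on Fashion-MNIST
history_f = model_f.fit(x_train_f, y_train_f_cat, epochs=10, validation_split=0.1)

# Evaluate
loss_f, acc_f = model_f.evaluate(x_test_f, y_test_f_cat)
print(f"Fashion-MNIST test accuracy: {acc_f:.4f}")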
```

### Observation

The accuracy on Fashion-MNIST is lower than on MNIST because clothing items contain overlapping visual patterns (for example, shirts and coats look similar). This experiment shows that a small CNN that performs very well on digit recognition is not automatically sufficient for harder problems: real-world image classification often benefits from deeper architectures, data augmentation, and additional regularization.

This step highlights the importance of testing machine learning models on diverse datasets to ensure proper generalization.


## Reflection Questions

1. **Why do we use MaxPooling layers?**  
MaxPooling reduces the spatial dimensions of feature maps, which lowers computational cost and makes the model more tolerant of small translations in the input. It keeps the strongest activation in each pooling window and discards the rest.

2. **What happens if you remove the Dropout layer?**  
Removing Dropout increases the risk of overfitting. The model may perform very well on training data but generalize poorly on unseen test data.

3. **Why is the first Conv layer filter count (32) smaller than the second (64)?**  
Initial convolution layers learn simple patterns such as edges and curves. Deeper layers capture more complex structures, so a higher number of filters is used to represent richer features.

4. **How would you modify this for RGB images?**  
The input shape must be changed from `(28,28,1)` to `(height,width,3)` since RGB images have three channels. Example:
```python
input_shape = (32, 32, 3)
```

5. **What does each filter in the first Conv layer detect?**  
Each filter learns basic visual features such as horizontal edges, vertical edges, curves, and corners. These low‑level features are combined in deeper layers to recognize complete digits.
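
To illustrate the last answer, the sketch below applies a hand-crafted vertical-edge kernel to a tiny synthetic image. This is an assumption for demonstration: the filters your model learns will not match this kernel exactly. The helper `correlate2d_valid` is a hypothetical name for the sliding-window operation that a convolution layer performs without padding.

```python
import numpy as np

def correlate2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Synthetic 5x6 image: dark left half, bright right half
image = np.zeros((5, 6))
image[:, 3:] = 1.0

# Hand-crafted kernel that responds to dark-to-bright transitions from left to right
vertical_edge = np.array([
    [-1.0, 0.0, 1.0],
    [-1.0, 0.0, 1.0],
    [-1.0, 0.0, 1.0],
])

response = correlate2d_valid(image, vertical_edge)
print(response)
# [[0. 3. 3. 0.]
#  [0. 3. 3. 0.]
#  [0. 3. 3. 0.]]
```

The response is zero over the flat regions and peaks where the window straddles the edge, which is the sense in which a filter "detects" a feature.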
