# MNIST Handwritten Digit Recognition using a Convolutional Neural Network (CNN)

This notebook provides a step-by-step guide to building, training, and testing a CNN to classify handwritten digits from the famous MNIST dataset.

### Step 1: Import Necessary Libraries

First, we import all the required libraries. 
- **TensorFlow and Keras:** For building and training our neural network.
- **MNIST dataset:** A built-in dataset in Keras containing 70,000 grayscale images of handwritten digits (60,000 for training and 10,000 for testing), each 28x28 pixels.
- **Layers (Conv2D, MaxPooling2D, etc.):** The building blocks of our CNN.
- **to_categorical:** A utility to convert integer labels into a one-hot encoded format.
- **Matplotlib and NumPy:** For data visualization and numerical operations.

In [1]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from tensorflow.keras.utils import to_categorical
import matplotlib.pyplot as plt
import numpy as np

### Step 2: Load and Preprocess the Data

Before we can train our model, we need to prepare the data. This involves several key steps:

1.  **Load the dataset:** We load the MNIST dataset, which is conveniently split into training and testing sets.
2.  **Reshape the images:** A CNN expects a 4D tensor as input: `(batch_size, height, width, channels)`. Since the MNIST images are grayscale, the number of channels is 1. We reshape our `(60000, 28, 28)` images to `(60000, 28, 28, 1)`.
3.  **Normalize pixel values:** We scale the pixel values from their original range of `0-255` to a range of `0.0-1.0`. Small, consistently scaled inputs tend to make gradient-based training more stable and faster to converge.
4.  **One-hot encode labels:** We transform the integer labels (e.g., `5`) into a binary vector format (e.g., `[0,0,0,0,0,1,0,0,0,0]`). This is necessary for the `categorical_crossentropy` loss function we'll use.

In [2]:
# Load the dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Save original test labels for later comparison
y_test_original = y_test

# Preprocessing for the CNN
# 1. Reshape images to (28, 28, 1) - adding the '1' for grayscale channel
x_train = x_train.reshape(x_train.shape[0], 28, 28, 1)
x_test = x_test.reshape(x_test.shape[0], 28, 28, 1)
input_shape = (28, 28, 1)

# 2. Normalize pixel values from 0-255 to 0.0-1.0
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

# 3. One-hot encode the labels (e.g., 5 -> [0,0,0,0,0,1,0,0,0,0])
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

print(f"Training data shape: {x_train.shape}")
print(f"Test data shape: {x_test.shape}")

Training data shape: (60000, 28, 28, 1)
Test data shape: (10000, 28, 28, 1)
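To make the three preprocessing steps concrete, here is a minimal NumPy-only sketch on a tiny synthetic batch. The names `toy_images`, `toy_labels`, and `toy_one_hot` are hypothetical stand-ins rather than part of the notebook, and `np.eye(10)[labels]` is used only as an illustration of what one-hot encoding produces, not as a description of how `to_categorical` is implemented.

```python
import numpy as np

# Hypothetical stand-in for two MNIST images: shape (2, 28, 28), dtype uint8.
rng = np.random.default_rng(0)
toy_images = rng.integers(0, 256, size=(2, 28, 28), dtype=np.uint8)
toy_labels = np.array([5, 0])

# 1. Add a trailing channel axis: (2, 28, 28) -> (2, 28, 28, 1).
toy_images = toy_images.reshape(toy_images.shape[0], 28, 28, 1)

# 2. Scale pixel values into the range [0.0, 1.0].
toy_images = toy_images.astype("float32") / 255.0

# 3. One-hot encode by selecting rows of a 10x10 identity matrix.
toy_one_hot = np.eye(10, dtype="float32")[toy_labels]

print(toy_images.shape)  # (2, 28, 28, 1)
print(toy_one_hot[0])    # label 5 -> a 1.0 at index 5, zeros elsewhere
```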


### Step 3: Build the CNN Model

Now we define the architecture of our Convolutional Neural Network using the Keras `Sequential` API.

- **Conv2D:** These are the convolutional layers that apply filters to the image to learn features like edges, corners, and textures.
- **MaxPooling2D:** This layer downsamples the feature maps, reducing their dimensions and making the model more robust to variations in the position of features.
- **Flatten:** This layer converts the 2D feature maps from the convolutional layers into a 1D vector, preparing the data for the fully connected layers.
- **Dense:** These are standard fully connected neural network layers.
- **Dropout:** A regularization technique that randomly sets a fraction of input units to 0 during training to prevent overfitting.
- **Softmax (Output Layer):** The final layer has 10 neurons (one for each digit from 0 to 9) and uses the `softmax` activation function to output a probability distribution over the classes.
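Two of the operations in this list are small enough to sketch by hand. The NumPy functions below are illustrative re-implementations, not the Keras code: `max_pool_2x2` and `softmax` are hypothetical helper names, and the pooling sketch assumes a single 2D feature map with a 2x2 window and stride 2.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a 2D array (an odd trailing row/column is dropped)."""
    h, w = feature_map.shape
    h2, w2 = h // 2, w // 2
    trimmed = feature_map[: h2 * 2, : w2 * 2]
    # Group pixels into 2x2 blocks, then keep the maximum of each block.
    return trimmed.reshape(h2, 2, w2, 2).max(axis=(1, 3))

def softmax(logits):
    """Turn a vector of raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # subtracting the max avoids overflow in exp
    exps = np.exp(shifted)
    return exps / exps.sum()

fmap = np.array([[1, 2, 0, 1],
                 [3, 4, 1, 0],
                 [0, 1, 5, 6],
                 [1, 0, 7, 8]], dtype="float32")
pooled = max_pool_2x2(fmap)
print(pooled)  # rows: [4, 1] and [1, 8]

probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs, probs.sum())  # largest score gets the largest probability; sum is 1
```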

In [3]:
model = Sequential([
    # First convolutional layer
    Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape),
    MaxPooling2D(pool_size=(2, 2)),
    
    # Second convolutional layer
    Conv2D(64, kernel_size=(3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),
    
    # Flatten the 2D maps to 1D vector
    Flatten(),
    
    # Fully connected dense layer
    Dense(128, activation='relu'),
    Dropout(0.5), # Dropout for regularization
    
    # Output layer (10 classes, softmax for probabilities)
    Dense(10, activation='softmax')
])
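The spatial size of the feature maps can be traced through this architecture with simple arithmetic. The sketch below assumes the Keras defaults used here (stride 1 and no padding for `Conv2D`, non-overlapping windows for `MaxPooling2D`); `conv_out` and `pool_out` are hypothetical helper names.

```python
def conv_out(size, kernel):
    """Output size of a stride-1 convolution without padding."""
    return size - kernel + 1

def pool_out(size, pool):
    """Output size of non-overlapping pooling; leftover rows/columns are dropped."""
    return size // pool

size = 28
size = conv_out(size, 3)   # first Conv2D:       28 -> 26
size = pool_out(size, 2)   # first MaxPooling2D: 26 -> 13
size = conv_out(size, 3)   # second Conv2D:      13 -> 11
size = pool_out(size, 2)   # second MaxPooling2D: 11 -> 5
flattened = size * size * 64  # 64 feature maps of size 5x5

print(size, flattened)  # 5 1600
```

The final value is the length of the vector that `Flatten` hands to the first `Dense` layer.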

### Step 4: Compile and Train the Model

Before training, we need to configure the learning process using `model.compile()`.

- **optimizer='adam'**: Adam is a gradient-based optimizer that adapts the step size for each parameter using running estimates of the gradient's mean and variance.
- **loss='categorical_crossentropy'**: This loss function is suitable for multi-class classification problems where labels are one-hot encoded.
- **metrics=['accuracy']**: We monitor the classification accuracy during training and evaluation.

Then, we train the model using `model.fit()`, passing it the training data, batch size, number of epochs, and a validation split to monitor performance on a subset of the training data.
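To illustrate what `categorical_crossentropy` measures, here is a minimal NumPy sketch of the formula, the negative log of the probability assigned to the true class. It is an illustration only, not the Keras implementation; the function name and the `eps` clipping value are assumptions made for this example.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Per-sample cross-entropy between one-hot labels and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return -np.sum(y_true * np.log(y_pred), axis=-1)

# One sample whose true digit is 5.
y_true = np.eye(10)[[5]]

# A confident, correct prediction: 0.91 on class 5, 0.01 on each other class.
confident = np.full((1, 10), 0.01)
confident[0, 5] = 0.91

# A uniform guess: 0.1 on every class.
unsure = np.full((1, 10), 0.1)

loss_confident = categorical_crossentropy(y_true, confident)[0]
loss_unsure = categorical_crossentropy(y_true, unsure)[0]
print(loss_confident, loss_unsure)  # the confident prediction has the much smaller loss
```

A uniform guess over 10 classes gives a loss of `ln(10)`, roughly 2.3, which is a useful reference point for the loss at the start of training.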

In [4]:
model.compile(optimizer='adam', 
              loss='categorical_crossentropy', 
              metrics=['accuracy'])

print("\n--- Starting Model Training ---")
# Training itself (5 epochs, for a quick example) runs in the model.fit() cell further down


--- Starting Model Training ---


In [5]:
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d (Conv2D)             (None, 26, 26, 32)        320       
                                                                 
 max_pooling2d (MaxPooling2D  (None, 13, 13, 32)       0         
 )                                                               
                                                                 
 conv2d_1 (Conv2D)           (None, 11, 11, 64)        18496     
                                                                 
 max_pooling2d_1 (MaxPooling  (None, 5, 5, 64)         0         
 2D)                                                             
                                                                 
 flatten (Flatten)           (None, 1600)              0         
                                                                 
 dense (Dense)               (None, 128)               2
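The summary listing above is cut off partway through the `dense` row. As a cross-check that does not depend on the missing output, the parameter counts can be recomputed from the layer definitions in Step 3: a convolution has one weight per kernel position and input channel plus one bias, for each filter, and a dense layer has one weight per input plus one bias, for each unit. The helper names below are hypothetical.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus one bias per filter."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(inputs, units):
    """Weights plus one bias per unit."""
    return inputs * units + units

conv1 = conv2d_params(3, 3, 1, 32)    # 320, matches the conv2d row
conv2 = conv2d_params(3, 3, 32, 64)   # 18496, matches the conv2d_1 row
dense1 = dense_params(1600, 128)      # Dense(128) fed by the 1600-long flattened vector
dense2 = dense_params(128, 10)        # output layer

total = conv1 + conv2 + dense1 + dense2
print(conv1, conv2, dense1, dense2, total)
```

Pooling, flatten, and dropout layers have no trainable parameters, so they do not contribute to the total.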

In [6]:

model.fit(x_train, y_train, 
          batch_size=128, 
          epochs=5, 
          validation_split=0.1)
print("--- Model Training Complete ---")

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
--- Model Training Complete ---
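The `batch_size` and `validation_split` arguments determine how much data each epoch sees. The arithmetic below is a sketch that assumes Keras holds out the last 10% of the 60,000 training samples for validation and processes the remainder in batches of 128; the variable names are illustrative.

```python
import math

num_samples = 60000
validation_split = 0.1
batch_size = 128

num_val = round(num_samples * validation_split)    # samples held out for validation
num_train = num_samples - num_val                  # samples actually used for weight updates
steps_per_epoch = math.ceil(num_train / batch_size)
last_batch = num_train - (steps_per_epoch - 1) * batch_size  # the final batch is smaller

print(num_train, num_val, steps_per_epoch, last_batch)  # 54000 6000 422 112
```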


### Step 5: Evaluate the Model

After training is complete, we evaluate the model's performance on the unseen test data. This gives us a good indication of how well our model generalizes to new examples.

In [7]:
score = model.evaluate(x_test, y_test, verbose=0)
print(f'\nTest loss: {score[0]}')
print(f'Test accuracy: {score[1]}')


Test loss: 0.026710160076618195
Test accuracy: 0.9908999800682068
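The accuracy metric is the fraction of samples whose highest-probability class matches the true class. The sketch below recomputes it by hand on a small made-up example (the `accuracy` helper and the toy arrays are hypothetical), then converts the accuracy reported above into an approximate number of misclassified test images.

```python
import numpy as np

def accuracy(y_true_one_hot, y_prob):
    """Fraction of rows where the predicted class equals the true class."""
    true_classes = np.argmax(y_true_one_hot, axis=1)
    pred_classes = np.argmax(y_prob, axis=1)
    return float(np.mean(true_classes == pred_classes))

# Toy example: 4 samples, 3 classes; the third sample is misclassified.
y_true = np.eye(3)[[0, 1, 2, 1]]
y_prob = np.array([[0.8, 0.1, 0.1],
                   [0.2, 0.7, 0.1],
                   [0.3, 0.4, 0.3],
                   [0.1, 0.6, 0.3]])
toy_accuracy = accuracy(y_true, y_prob)
print(toy_accuracy)  # 0.75

# Accuracy reported above, on the 10,000 test images.
reported_accuracy = 0.9908999800682068
approx_errors = round(10000 * (1 - reported_accuracy))
print(approx_errors)  # 91
```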


### Step 6: Define a Visualization Function

To make our predictions more intuitive, we'll create a simple helper function to display a digit's image along with its true label.

In [9]:
def display_digit(image, true_label):
    """
    Displays a single MNIST digit and its true label.
    """
    # The image is (28, 28, 1), plt.imshow needs (28, 28)
    image = image.reshape(28, 28)
    
    plt.imshow(image, cmap='gray')
    plt.title(f'True Label: {true_label}')
    
    plt.show()

### Step 7: Test the Model on a Single Image

Finally, let's pick a single image from our test set and see what the model predicts.

1.  **Select an image:** We choose an image and its corresponding original label.
2.  **Visualize:** We use our `display_digit` function to see the image.
3.  **Prepare for Prediction:** The `model.predict()` method expects a batch of images. We need to add an extra dimension to our single image `(28, 28, 1)` to make it a batch of one: `(1, 28, 28, 1)`.
4.  **Make Prediction:** We call `model.predict()`.
5.  **Interpret Result:** The output has shape `(1, 10)`: one row of 10 probabilities for our single image. We use `np.argmax()` to find the index (the digit) with the highest probability and compare it to the true label.

In [None]:
# Let's pick an image from the test set, e.g., the 10th image (index 9)
instance_index = 9
test_image = x_test[instance_index]
true_label = y_test_original[instance_index] # Get the original number label

# Step 7a: Visualize the digit
print(f"\n--- Visualizing Test Image #{instance_index} ---")
display_digit(test_image, true_label)

# Step 7b: Prepare the image for prediction
# The model.predict() method expects a *batch* of images.
# Our image is (28, 28, 1), we need to make it (1, 28, 28, 1)
image_for_prediction = np.expand_dims(test_image, axis=0)
print(f"Original image shape: {test_image.shape}")
print(f"Shape for prediction: {image_for_prediction.shape}")

# Step 7c: Make the prediction
prediction = model.predict(image_for_prediction)

# 'prediction' has shape (1, 10): one row of 10 class probabilities
# We take the index with the highest probability in that row
predicted_class = np.argmax(prediction[0])

# Step 7d: Show the result
print("\n--- Prediction Result ---")
print(f"Model's Prediction: {predicted_class}")
print(f"True Label:           {true_label}")

if predicted_class == true_label:
    print("Result: Correct! ✅")
else:
    print("Result: Incorrect. ❌")
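The same pattern extends from one image to several. The sketch below uses a made-up probability array, `fake_probs`, as a stand-in for what `model.predict()` might return for three images, so it runs without a trained model. It shows why `axis=1` matters once the batch holds more than one row.

```python
import numpy as np

# A single image gains a leading batch axis: (28, 28, 1) -> (1, 28, 28, 1).
single = np.zeros((28, 28, 1), dtype="float32")
batch_of_one = np.expand_dims(single, axis=0)
print(batch_of_one.shape)  # (1, 28, 28, 1)

# Hypothetical stand-in for model.predict() output on 3 images: shape (3, 10).
fake_probs = np.full((3, 10), 0.02)
fake_probs[0, 7] = 0.82
fake_probs[1, 2] = 0.82
fake_probs[2, 1] = 0.82

# axis=1 picks the most likely digit separately for each row (image).
predicted_classes = np.argmax(fake_probs, axis=1)
confidences = fake_probs.max(axis=1)
print(predicted_classes)  # [7 2 1]
print(confidences)        # probability assigned to each predicted digit
```

Without `axis=1`, `np.argmax` would flatten the whole array and return a single index, which only coincides with the digit when the batch contains exactly one image.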