# Convolutional Neural Network Implementation

We build and train convolutional neural networks (CNNs) to classify MNIST images of handwritten digits.

We implement the same network in three ways:

- TensorFlow and Keras
  - Subclassing `tf.keras.models.Model`
  - Keras Functional API
- PyTorch


**Dataset: MNIST**

MNIST consists of grayscale images of handwritten digits (0–9):

- Image size: 28 × 28
- Channels: 1 (grayscale)
- Classes: 10

**Model Architecture**

The CNN architecture we use follows the standard conceptual pattern of CNNs:

1. Convolutional feature extraction

    - Small 3×3 filters detect local patterns (edges, corners)
    - Feature depth increases as representations become more abstract

2. Spatial downsampling

    - 2×2 max pooling reduces resolution
    - Adds a degree of translation invariance and shrinks the feature maps, which reduces the parameters needed by later layers

3. Dense classification head

    - Flattened feature maps are mapped to a low-dimensional representation
    - Dropout is used to reduce overfitting
    - Final layer outputs one score per class
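The spatial downsampling step can be illustrated with a minimal NumPy sketch of 2×2 max pooling with stride 2. The helper `max_pool_2x2` and the 4×4 input are hypothetical and exist only for illustration; the notebook itself relies on the framework pooling layers.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a (height, width, channels) array."""
    h, w, c = feature_map.shape
    assert h % 2 == 0 and w % 2 == 0, "this sketch assumes even height and width"
    # Split each spatial axis into (blocks, 2), then take the max over each 2x2 block.
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2, c)
    return blocks.max(axis=(1, 3))

# Hypothetical 4x4 single-channel feature map holding the values 0..15.
fmap = np.arange(16, dtype="float32").reshape(4, 4, 1)
pooled = max_pool_2x2(fmap)

print(pooled.shape)            # (2, 2, 1)
print(pooled[..., 0].tolist()) # [[5.0, 7.0], [13.0, 15.0]]
```

Each output value is the largest entry of one 2×2 block, so height and width are halved while the channel count is unchanged.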

**Summary of CNN Architecture**

- Input: 28 × 28 × 1
- Conv block 1: 32 filters → max pooling
- Conv block 2: 64 filters → max pooling
- Dense layer: 128 units + ReLU
- Dropout: 0.5
- Output: 10 class scores (logits)
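As a worked check of this summary, the sketch below traces the tensor shapes and counts the trainable parameters in plain Python. It assumes 3×3 kernels, stride-1 convolutions with "same" padding, and 2×2 pooling with stride 2, which is what the code later in this notebook uses; the helper names are hypothetical.

```python
def conv_same_pool(h, w, filters):
    """A 'same'-padded stride-1 conv keeps h and w; 2x2 pooling with stride 2 halves them."""
    return h // 2, w // 2, filters

def conv_params(kernel, in_channels, filters):
    # One kernel x kernel x in_channels weight tensor plus one bias per filter.
    return kernel * kernel * in_channels * filters + filters

def dense_params(inputs, units):
    # One weight per (input, unit) pair plus one bias per unit.
    return inputs * units + units

h, w, c = 28, 28, 1
params = {}

params["conv1"] = conv_params(3, c, 32)
h, w, c = conv_same_pool(h, w, 32)    # (14, 14, 32)

params["conv2"] = conv_params(3, c, 64)
h, w, c = conv_same_pool(h, w, 64)    # (7, 7, 64)

flat = h * w * c                      # size of the flattened feature vector
params["dense"] = dense_params(flat, 128)
params["output"] = dense_params(128, 10)

print(flat)                  # 3136
print(params)                # {'conv1': 320, 'conv2': 18496, 'dense': 401536, 'output': 1290}
print(sum(params.values()))  # 421642
```

Under these assumptions, almost all parameters sit in the first dense layer, which is why pooling before flattening matters for model size.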

---

## Implementation by Subclassing `tf.keras.models.Model`

Subclassing `Model` (instead of using `Sequential`) gives you:

- Full control over the forward pass (the `call` method)

- Explicit handling of training vs inference

- Easier extension to more complex architectures

Subclassing is a good fit when the forward pass goes beyond a plain stack of layers, for example when it needs custom control flow or behaves differently in training and inference.

In [1]:
import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

tf.random.set_seed(69)

(x_train, y_train), (x_test, y_test) = mnist.load_data()

# MNIST data comes as (N, 28, 28); we need to add the channel dimension for CNN to work
x_train = x_train[..., tf.newaxis]  # (N, 28, 28, 1)
x_test = x_test[..., tf.newaxis]  # (N, 28, 28, 1)

# normalization: scale pixel values from [0, 255] to [0, 1]
x_train = (x_train / 255.0).astype("float32")
x_test = (x_test / 255.0).astype("float32")


We load the dataset and add a channel dimension because Keras convolutional layers expect inputs of shape `(m, h, w, c)`: batch size, height, width, and channels. We also scale the pixel values to the range [0, 1] by dividing by `255.0`, the maximum possible pixel value (the minimum is `0`).
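The two preprocessing steps can be checked without downloading anything. This is a minimal sketch on a hypothetical random array that stands in for the raw MNIST images; it uses `np.newaxis`, which plays the same role as `tf.newaxis` (both are `None`).

```python
import numpy as np

# Hypothetical stand-in for the raw MNIST array: 5 images of 28x28 uint8 pixels.
rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Add a trailing channel axis: (N, 28, 28) -> (N, 28, 28, 1).
with_channel = raw[..., np.newaxis]

# Scale pixel values from [0, 255] to [0, 1] and store them as float32.
scaled = (with_channel / 255.0).astype("float32")

print(raw.shape, raw.dtype)        # (5, 28, 28) uint8
print(scaled.shape, scaled.dtype)  # (5, 28, 28, 1) float32
print(bool(scaled.min() >= 0.0), bool(scaled.max() <= 1.0))  # True True
```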

Next, we define the CNN architecture used in this example.

In [2]:
# ---------------------
#  model architecture
# ---------------------

class ConvNet(Model):
    def __init__(self):
        super().__init__()

        # first convolutional block - 3x3x32 filter + relu + 2x2 maxpool
        self.conv1 = Conv2D(
            filters=32,
            kernel_size=3,
            padding="same",
            activation="relu",
            name="Conv1"
        )
        self.pool1 = MaxPooling2D(pool_size=2, strides=2)

        # second convolutional block - 3x3x64 filter + relu + 2x2 maxpool
        self.conv2 = Conv2D(
            filters=64,
            kernel_size=3,
            padding="same",
            activation="relu",
            name="Conv2"
        )
        self.pool2 = MaxPooling2D(pool_size=2, strides=2)

        # fully connected layers - 128 neurons + relu + dropout + 10 neurons (logits)
        self.flatten = Flatten()
        self.fc1 = Dense(128, activation="relu", name="FC1")
        self.dropout = Dropout(0.5, name="Dropout") # dropout layer for regularization
        self.fc2 = Dense(10, name="Logits_Output")  # logits are the outputs
        

    def call(self, x, training=False):  # invoked during training as well as inference
        # 28x28x1 -> 14x14x32
        x = self.pool1(self.conv1(x))

        # 14x14x32 -> 7x7x64
        x = self.pool2(self.conv2(x))

        # 7x7x64 -> 3136
        x = self.flatten(x)

        # fully connected layers
        x = self.fc1(x)
        x = self.dropout(x, training=training)
        x = self.fc2(x)

        return x

Here's a breakdown of the various components: 

1. Convolution Layer `Conv2D()`:

   - `filters=32`
     - The layer learns 32 filters and outputs 32 feature maps
     - Each filter has shape (3, 3, in_channels)

   - `kernel_size=3`
     - Means a 3×3 spatial kernel
   - `padding="same"`
     - Output spatial size is preserved (with the default stride of 1)
     - For a 28×28 input, output remains 28×28
   - `activation="relu"`
     - Applies ReLU inside the layer
     - Equivalent to a `Conv2D(...)` layer without an activation followed by a separate `ReLU()` layer.
   - `name="Conv1"`
     - Optional, but useful for:
       - Model summaries
       - Debugging
       - Loading weights

2. `MaxPooling2D()` defines a max-pooling layer.

   - `pool_size=2`
     - Uses a 2×2 window
   - `strides=2`
     - Moves the window by 2 pixels

   The effect is that:
   - Spatial dimensions are halved: 28×28 → 14×14
   - Channels are unchanged.

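3. `Dropout(0.5)` randomly sets 50% of activations to zero during training.
   - The kept activations are scaled by 1 / (1 − 0.5) = 2, so the expected sum of the activations is unchanged.
   - Purpose: 
     - Regularization 
     - Reduces co-adaptation 
     - Helps prevent overfitting
   - Dropout is a no-op during inference: the input passes through unchanged.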
  
4. Forward pass (the `call` method) defines the forward computation.

   - `training` flag is crucial: 
     - `True` → training mode 
     - `False` → inference mode

   - Keras sets this automatically when calling: 
     - `model.fit()` → `training=True` (for the training batches)
     - `model.evaluate()` and `model.predict()` → `training=False`

5. Other points to note:
   - The final layer outputs logits (no softmax). The loss is built with `from_logits=True` and applies the softmax internally, which is more numerically stable.
   - Softmax is applied only when converting logits to probabilities at inference time.
   - The code is intended for learning and experimentation, not production deployment.
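To make the logits-versus-probabilities point concrete, here is a minimal NumPy sketch of a numerically stable softmax. The `softmax` helper and the logit values are hypothetical and are not output from the trained model.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the row max avoids overflow in exp."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# Hypothetical logits for 2 images over 10 classes.
logits = np.array([
    [2.0, 0.5, -1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    # Large values: a naive exp(logits) would overflow here.
    [1000.0, 999.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
])

probs = softmax(logits)
print(probs.sum(axis=-1))     # each row sums to 1 (up to rounding)
print(probs.argmax(axis=-1))  # [0 0]
```

Softmax preserves the ordering of the scores, so the predicted class is the same whether `argmax` is taken over the logits or over the probabilities. With the Keras model, something like `tf.nn.softmax(model(x, training=False))` should give the corresponding probabilities.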

In [3]:
# ---------------------------------
# setting up and training the model
# ---------------------------------

model = ConvNet()

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)

model.compile(
    optimizer=optimizer,
    loss=loss_fn,
    metrics=["accuracy"]
)

history = model.fit(
    x_train, y_train,
    epochs=5,
    batch_size=512,
    validation_split=0.2
)

training_loss, training_accuracy = model.evaluate(x_train, y_train)
test_loss, test_accuracy = model.evaluate(x_test, y_test)

Epoch 1/5
94/94 ━━━━━━━━━━━━━━━━━━━━ 7s 70ms/step - accuracy: 0.8220 - loss: 0.5868 - val_accuracy: 0.9578 - val_loss: 0.1362
Epoch 2/5
94/94 ━━━━━━━━━━━━━━━━━━━━ 7s 70ms/step - accuracy: 0.9517 - loss: 0.1614 - val_accuracy: 0.9770 - val_loss: 0.0801
Epoch 3/5
94/94 ━━━━━━━━━━━━━━━━━━━━ 7s 70ms/step - accuracy: 0.9676 - loss: 0.1075 - val_accuracy: 0.9818 - val_loss: 0.0624
Epoch 4/5
94/94 ━━━━━━━━━━━━━━━━━━━━ 7s 70ms/step - accuracy: 0.9727 - loss: 0.0882 - val_accuracy: 0.9847 - val_loss: 0.0527
Epoch 5/5
94/94 ━━━━━━━━━━━━━━━━━━━━ 7s 71ms/step - accuracy: 0.9788 - loss: 0.0715 - val_accuracy: 0.9859 - val_loss: 0.0461
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 5s 3ms/step - accuracy: 0.9884 - loss: 0.0374
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.
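The step counts in this log (`94/94`, `1875/1875`, `313/313`) follow from the dataset sizes and batch sizes. The short sketch below reproduces them, assuming the standard MNIST split of 60,000 training and 10,000 test images and the Keras default batch size of 32 for `evaluate()`.

```python
import math

train_size, test_size = 60_000, 10_000  # standard MNIST split sizes (assumed)
validation_split = 0.2
fit_batch_size = 512
eval_batch_size = 32                    # Keras default for evaluate()

val_samples = round(train_size * validation_split)  # 12000 held out for validation
fit_samples = train_size - val_samples              # 48000 used for gradient updates

print(math.ceil(fit_samples / fit_batch_size))  # 94   -> "94/94" per epoch
print(math.ceil(train_size / eval_batch_size))  # 1875 -> "1875/1875"
print(math.ceil(test_size / eval_batch_size))   # 313  -> "313/313"
```

Because `validation_split=0.2` holds out part of the training data, each epoch updates the weights on 48,000 images, while the later `evaluate()` call on `x_train` covers all 60,000.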

In [4]:
print(f"Training Loss: {training_loss}, Training Accuracy: {training_accuracy}")
print(f"Test Loss: {test_loss}, Test Accuracy: {test_accuracy}")

Training Loss: 0.03741063177585602, Training Accuracy: 0.9884333610534668
Test Loss: 0.035951100289821625, Test Accuracy: 0.9879000186920166


In [5]:
model.summary()

## Implementation using the Functional API

We can implement the same network with the Keras Functional API, which builds the model by calling layers on tensors, starting from an `Input`:

In [6]:
import tensorflow as tf
from tensorflow.keras.layers import (
    Input, Conv2D, MaxPooling2D,
    Flatten, Dense, Dropout
)
from tensorflow.keras.models import Model

inputs = Input(shape=(28, 28, 1), name="Input")

# First convolutional block
x = Conv2D(
    filters=32,
    kernel_size=3,
    padding="same",
    activation="relu",
    name="Conv1"
)(inputs)
x = MaxPooling2D(pool_size=2, strides=2, name="Pool1")(x)

# Second convolutional block
x = Conv2D(
    filters=64,
    kernel_size=3,
    padding="same",
    activation="relu",
    name="Conv2"
)(x)
x = MaxPooling2D(pool_size=2, strides=2, name="Pool2")(x)

# Fully connected layers
x = Flatten(name="Flatten")(x)
x = Dense(128, activation="relu", name="FC1")(x)
x = Dropout(0.5, name="Dropout")(x)

# Output layer — logits
outputs = Dense(10, name="Logits")(x)

model = Model(inputs=inputs, outputs=outputs, name="ConvNet")

model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"]
)

history = model.fit(
    x_train, y_train,
    epochs=5,
    batch_size=512,
    validation_split=0.2
)

training_loss, training_accuracy = model.evaluate(x_train, y_train)
test_loss, test_accuracy = model.evaluate(x_test, y_test)

print(f"Training Loss: {training_loss}, Training Accuracy: {training_accuracy}")
print(f"Test Loss: {test_loss}, Test Accuracy: {test_accuracy}")

Epoch 1/5
94/94 ━━━━━━━━━━━━━━━━━━━━ 7s 73ms/step - accuracy: 0.8146 - loss: 0.5975 - val_accuracy: 0.9582 - val_loss: 0.1416
Epoch 2/5
94/94 ━━━━━━━━━━━━━━━━━━━━ 7s 70ms/step - accuracy: 0.9486 - loss: 0.1735 - val_accuracy: 0.9771 - val_loss: 0.0785
Epoch 3/5
94/94 ━━━━━━━━━━━━━━━━━━━━ 7s 70ms/step - accuracy: 0.9650 - loss: 0.1169 - val_accuracy: 0.9822 - val_loss: 0.0607
Epoch 4/5
94/94 ━━━━━━━━━━━━━━━━━━━━ 7s 70ms/step - accuracy: 0.9733 - loss: 0.0886 - val_accuracy: 0.9847 - val_loss: 0.0516
Epoch 5/5
94/94 ━━━━━━━━━━━━━━━━━━━━ 7s 70ms/step - accuracy: 0.9782 - loss: 0.0745 - val_accuracy: 0.9862 - val_loss: 0.0461
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 5s 3ms/step - accuracy: 0.9883 - loss: 0.0377
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.

In [7]:
model.summary()

## Implementation using PyTorch

In [None]:
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# defining the CNN architecture
class ConvNet(nn.Module):
    def __init__(self):
        super(ConvNet, self).__init__()
        
        # convolutional block - 1
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=32, 
                               kernel_size=3, padding=1)
        self.relu1 = nn.ReLU()
        self.pool1 = nn.MaxPool2d(kernel_size=2, stride=2)
        
        # convolutional block - 2
        self.conv2 = nn.Conv2d(in_channels=32, out_channels=64, 
                               kernel_size=3, padding=1)
        self.relu2 = nn.ReLU()
        self.pool2 = nn.MaxPool2d(kernel_size=2, stride=2)
        
        # fully connected layers
        self.fc1 = nn.Linear(64 * 7 * 7, 128)
        self.relu3 = nn.ReLU()
        self.dropout = nn.Dropout(0.5)
        self.fc2 = nn.Linear(128, 10)
    
    def forward(self, x):
        # PyTorch uses channels-first tensors: (N, C, H, W)
        # convolutional block - 1: (N, 1, 28, 28) -> (N, 32, 14, 14)
        x = self.conv1(x)
        x = self.relu1(x)
        x = self.pool1(x)
        
        # convolutional block - 2: (N, 32, 14, 14) -> (N, 64, 7, 7)
        x = self.conv2(x)
        x = self.relu2(x)
        x = self.pool2(x)
        
        # flattening: (N, 64, 7, 7) -> (N, 3136)
        x = x.view(x.size(0), -1)
        
        # fully connected layers
        x = self.fc1(x)
        x = self.relu3(x)
        x = self.dropout(x)
        x = self.fc2(x)
        
        return x

# training setup
def train_model():
    # data preprocessing
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
    ])
    
    # load mnist dataset
    train_dataset = datasets.MNIST(root='./data', train=True, 
                                   download=True, transform=transform)
    train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
    
    # initialize model, loss, and optimizer
    model = ConvNet()
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=0.001)
    
    # training loop
    model.train()  # training mode, so dropout is active
    for epoch in range(5):
        running_loss = 0.0
        for batch_idx, (data, target) in enumerate(train_loader):
            optimizer.zero_grad()
            
            # forward pass
            output = model(data)
            loss = criterion(output, target)
            
            # backward pass and optimization
            loss.backward()
            optimizer.step()
            
            running_loss += loss.item()
            
            if batch_idx % 100 == 99:
                print(f'Epoch {epoch+1}, Batch {batch_idx+1}, '
                      f'Loss: {running_loss/100:.4f}')
                running_loss = 0.0
    
    return model

# run training
trained_model = train_model()


Epoch 1, Batch 100, Loss: 0.7825
Epoch 1, Batch 200, Loss: 0.2609
Epoch 1, Batch 300, Loss: 0.1933
Epoch 1, Batch 400, Loss: 0.1718
Epoch 1, Batch 500, Loss: 0.1380
Epoch 1, Batch 600, Loss: 0.1310
Epoch 1, Batch 700, Loss: 0.1265
Epoch 1, Batch 800, Loss: 0.1108
Epoch 1, Batch 900, Loss: 0.1093
Epoch 2, Batch 100, Loss: 0.0979
Epoch 2, Batch 200, Loss: 0.0902
Epoch 2, Batch 300, Loss: 0.0852
Epoch 2, Batch 400, Loss: 0.0807
Epoch 2, Batch 500, Loss: 0.0805
Epoch 2, Batch 600, Loss: 0.0816
Epoch 2, Batch 700, Loss: 0.0738
Epoch 2, Batch 800, Loss: 0.0658
Epoch 2, Batch 900, Loss: 0.0734
Epoch 3, Batch 100, Loss: 0.0613
Epoch 3, Batch 200, Loss: 0.0554
Epoch 3, Batch 300, Loss: 0.0638
Epoch 3, Batch 400, Loss: 0.0553
Epoch 3, Batch 500, Loss: 0.0637
Epoch 3, Batch 600, Loss: 0.0567
Epoch 3, Batch 700, Loss: 0.0652
Epoch 3, Batch 800, Loss: 0.0719
Epoch 3, Batch 900, Loss: 0.0664
Epoch 4, Batch 100, Loss: 0.0526
Epoch 4, Batch 200, Loss: 0.0524
Epoch 4, Batch 300, Loss: 0.0590
Epoch 4, B