# DATASCI 315, Group Work 8: Convolutional Neural Networks with CIFAR-10

## Importing Libraries

First, we'll import the necessary libraries: PyTorch (including its `nn`, `nn.functional`, and `optim` modules) and torchvision.

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms

## Load and Normalize CIFAR-10

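We'll use torchvision to download the CIFAR-10 dataset. The dataset consists of 60,000 32x32 color images in 10 classes, with 6,000 images per class, split into 50,000 training images and 10,000 test images. The transform converts each image to a tensor with values in $[0, 1]$ and then normalizes each channel with mean 0.5 and standard deviation 0.5.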
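To make the normalization step concrete, here is a minimal plain-Python sketch of the per-channel rule `(value - mean) / std` that `transforms.Normalize` applies after `ToTensor` has scaled pixels to $[0, 1]$. The helper name `normalize_value` is hypothetical and exists only for illustration.

```python
def normalize_value(value, mean=0.5, std=0.5):
    """Per-channel rule used by transforms.Normalize: (value - mean) / std."""
    return (value - mean) / std


# ToTensor yields values in [0, 1]; check the two endpoints and the midpoint.
for pixel in (0.0, 0.5, 1.0):
    print(pixel, "->", normalize_value(pixel))
```

With mean and standard deviation both equal to 0.5, the inputs to the network end up in the range $[-1, 1]$.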

In [None]:
transform = transforms.Compose(
    [transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]
)

batch_size = 4

trainset = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=transform
)
trainloader = torch.utils.data.DataLoader(
    trainset, batch_size=batch_size, shuffle=True, num_workers=2
)

testset = torchvision.datasets.CIFAR10(
    root="./data", train=False, download=True, transform=transform
)
testloader = torch.utils.data.DataLoader(
    testset, batch_size=batch_size, shuffle=False, num_workers=2
)

classes = (
    "plane",
    "car",
    "bird",
    "cat",
    "deer",
    "dog",
    "frog",
    "horse",
    "ship",
    "truck",
)

## Background: Convolutional and Pooling Layers

**Convolutional Layer**: For a `nn.Conv2d` layer, the input is expected to have shape $(N, C, H, W)$ where $N$ is the batch size, $C$ is the number of channels (e.g., 3 for color images, 1 for grayscale), and $H, W$ are the height and width of each sample image. The output from this layer has shape $(N, C', H', W')$.

The `Conv2d` constructor takes many parameters. The most important ones are:
1. `in_channels` - this is $C$
2. `out_channels` - this is the number of filters (this equals $C'$)
3. `kernel_size` - this is the size of the filter. It can be an integer $K$ (each filter then covers a $K \times K$ window across all $C$ input channels), or a tuple $(K_1, K_2)$ giving the height and width separately
4. `padding` - controls the amount of padding around the original image. It can be an integer, tuple, or string ("valid" for no padding, "same" to preserve dimensions when stride=1). Default is 0.
5. `stride` - stride of the filter horizontally and vertically. It can be an integer or tuple. Default is 1.

The output has shape $(N, C', H', W')$ where the new number of channels equals the number of filters (i.e., `out_channels`) and new height and width $H', W'$ are computed as:

$$H' = \Big\lfloor\frac{H + 2p - K}{s} + 1\Big\rfloor$$

where $p$ is padding along that axis, $K$ is the kernel size along that axis, and $s$ is the stride along that axis (similarly for width).
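The formula above can be sketched as a small plain-Python helper. The function name `conv_output_size` is hypothetical; it applies the formula to one axis at a time and assumes integer inputs.

```python
def conv_output_size(size, kernel_size, padding=0, stride=1):
    """Output length along one axis: floor((size + 2p - K) / s + 1)."""
    # Integer floor division implements the floor in the formula.
    return (size + 2 * padding - kernel_size) // stride + 1


print(conv_output_size(32, kernel_size=5))             # 5x5 kernel, no padding
print(conv_output_size(32, kernel_size=3, padding=1))  # padding keeps the size
print(conv_output_size(32, kernel_size=3, stride=2))   # stride 2 roughly halves it
```

The same helper works for pooling layers, provided `stride` is set to the pooling kernel size when the default is used.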

**Pooling Layer**: We use the `nn.MaxPool2d` layer with these parameters:
1. `kernel_size`
2. `stride` - defaults to `kernel_size` (ensuring no overlap)
3. `padding` - default is 0

The input is again of shape $(N, C, H, W)$ and the output is $(N, C, H', W')$: the number of channels stays the same, while the height and width change according to the equation above, with $K$ now the pooling kernel size.
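As a small illustration of what max pooling computes, the sketch below applies `nn.MaxPool2d(kernel_size=2)` to a single 4x4 image whose entries are 0 through 15. The input values are made up for this example; each output entry is the maximum of one non-overlapping 2x2 window.

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2)  # stride defaults to kernel_size

# One image, one channel, 4x4 pixels with values 0..15 in row-major order.
x = torch.arange(16, dtype=torch.float32).reshape(1, 1, 4, 4)
pooled = pool(x)

print(x[0, 0])
print(pooled[0, 0])
print(tuple(pooled.shape))
```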

In [None]:
example_model = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=6, kernel_size=5),
    nn.MaxPool2d(kernel_size=2),
)

# Show model architecture
print(example_model)

# Show output shape with a sample input
sample_input = torch.randn(1, 3, 32, 32)
sample_output = example_model(sample_input)
print(f"\nInput shape:  {tuple(sample_input.shape)}")
print(f"Output shape: {tuple(sample_output.shape)}")

For the convolutional layer, the input shape is $(-1, C=3, H=32, W=32)$, where $-1$ stands for an arbitrary batch size. The output has 6 channels (one per filter), and by the formula above the new height and width are:

$$H' = \lfloor(32 + 2 \times 0 - 5)/1 + 1\rfloor = 28$$

since the kernel size is 5 and the padding and stride take their default values of 0 and 1. The number of parameters in the convolutional layer is:

$$\underbrace{6}_{\text{\# filters}} \times \underbrace{5 \times 5 \times 3}_{\text{kernel size}} + \underbrace{6}_{\text{bias}} = 456$$
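The parameter count can be checked directly against PyTorch. This is a minimal sketch that builds the same `Conv2d` layer on its own and counts the entries of its weight and bias tensors.

```python
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=5)

# The weight tensor has shape (out_channels, in_channels, K, K).
n_weights = conv.weight.numel()
n_biases = conv.bias.numel()

print(tuple(conv.weight.shape))
print(n_weights, n_biases, n_weights + n_biases)
```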

For the pooling layer, the input shape is $(-1, C=6, H=28, W=28)$, with no padding and stride equal to the kernel size (2). Then:

$$H' = \lfloor (28 + 0 - 2)/2 + 1 \rfloor = 14$$

Hence, the output shape is $(-1, 6, 14, 14)$, which matches the output shape printed above for a batch of one image.

It is important to keep track of how the image shape evolves through your network, since at the end you need fully connected layers (`nn.Linear`) and you must specify the correct `in_features` when using them.
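One way to track the shape without doing the arithmetic by hand is to pass a dummy input through the convolutional part of a network and read off the flattened size. The sketch below uses a hypothetical layer stack (not the one required in the problems) purely to illustrate the idea.

```python
import torch
import torch.nn as nn

# Hypothetical feature extractor, chosen only for illustration.
features = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.MaxPool2d(2),
    nn.Conv2d(8, 16, kernel_size=3),
    nn.MaxPool2d(2),
)

x = torch.zeros(1, 3, 32, 32)  # dummy batch with one CIFAR-sized image
with torch.no_grad():
    for layer in features:
        x = layer(x)
        print(f"{layer.__class__.__name__:10s} -> {tuple(x.shape)}")

# Flatten everything except the batch dimension to get in_features.
in_features = x.flatten(start_dim=1).shape[1]
print("in_features for the first nn.Linear:", in_features)
```

Here the spatial size goes 32, 32, 16, 14, 7, so the first fully connected layer for this particular stack would need `in_features = 16 * 7 * 7`.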

## Problem 1: Define a Convolutional Neural Network

Implement a simple convolutional neural network for image classification. Your network should have the architecture shown in the following diagram:

![CNN Architecture](https://i0.wp.com/developersbreach.com/wp-content/uploads/2020/08/cnn_banner.png)

Documentation links:
- [Conv2d](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html)
- [MaxPool2d](https://pytorch.org/docs/stable/generated/torch.nn.MaxPool2d.html)

Your network should have:
- Two convolutional layers followed by max pooling
- Three fully connected layers
- Flatten the output after convolution to feed it to the fully connected layers

Use the following sizes for the convolutional layers:
- Conv 1: `(in_channels=3, out_channels=6, kernel_size=5)`
- Conv 2: `(in_channels=6, out_channels=16, kernel_size=5)`

**Note:** The `CrossEntropyLoss` function includes the softmax internally, so you don't need to add a softmax layer at the end of your network (contrary to what the figure shows).
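To see why no softmax layer is needed, the following sketch compares `nn.CrossEntropyLoss` applied to raw logits with an explicit log-softmax followed by the negative log-likelihood loss. The logits and targets are random placeholders, not CIFAR-10 data.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)            # raw network outputs for 4 samples
targets = torch.tensor([1, 0, 7, 3])   # placeholder class labels

# CrossEntropyLoss takes raw logits directly.
loss_from_logits = nn.CrossEntropyLoss()(logits, targets)

# Equivalent two-step computation: log-softmax, then negative log-likelihood.
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(loss_from_logits.item(), loss_manual.item())
```

The two values agree up to floating-point error, so adding a softmax layer before `CrossEntropyLoss` would apply the softmax twice.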

In [None]:
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # BEGIN SOLUTION
        # Two conv layers with max pooling, then three fully connected layers
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # After conv1 + pool: 32 -> 28 -> 14
        # After conv2 + pool: 14 -> 10 -> 5
        # Final size: 16 channels * 5 * 5 = 400
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)
        # END SOLUTION

    def forward(self, x):
        # BEGIN SOLUTION
        # Apply conv -> relu -> pool twice, then flatten and pass through FC layers
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)
        # END SOLUTION


net = Net()

In [None]:
# Test assertions
# Test that net is defined and is an nn.Module
assert isinstance(net, nn.Module), "net should be an nn.Module"
# Test forward pass with sample input
sample = torch.randn(1, 3, 32, 32)
with torch.no_grad():
    output = net(sample)
assert output.shape == (1, 10), f"Expected output shape (1, 10), got {output.shape}"
print("All tests passed!")

# BEGIN HIDDEN TESTS
# Verify the network has convolutional and linear layers
has_conv = any(isinstance(m, nn.Conv2d) for m in net.modules())
has_linear = any(isinstance(m, nn.Linear) for m in net.modules())
assert has_conv, "Network should have Conv2d layers"
assert has_linear, "Network should have Linear layers"
# END HIDDEN TESTS

## Problem 2: Select a Loss Function and Optimizer

We'll use cross-entropy as our loss function and stochastic gradient descent (SGD) as our optimization algorithm. Instantiate both a `criterion` object and an `optimizer` object.

Any learning rate is fine for now, but you may need to adjust it later to reach the target accuracy in Problem 4.
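If you want to inspect or adjust the learning rate after the optimizer has been created, one option is to go through the optimizer's `param_groups` list. The sketch below uses a hypothetical stand-in model (`tiny_model`) and arbitrary learning-rate values; re-creating the optimizer with a new `lr` works just as well.

```python
import torch.nn as nn
import torch.optim as optim

tiny_model = nn.Linear(4, 2)  # hypothetical stand-in for a real network
sgd = optim.SGD(tiny_model.parameters(), lr=0.01, momentum=0.9)

# Each parameter group stores its own hyperparameters.
print("initial lr:", sgd.param_groups[0]["lr"])

# Change the learning rate in place for every group.
for group in sgd.param_groups:
    group["lr"] = 0.001

print("updated lr:", sgd.param_groups[0]["lr"])
```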

In [None]:
criterion = nn.CrossEntropyLoss()  # SOLUTION
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)  # SOLUTION

In [None]:
# Test assertions
assert criterion is not None, "criterion should be defined"
assert optimizer is not None, "optimizer should be defined"
assert isinstance(criterion, nn.CrossEntropyLoss), "criterion should be CrossEntropyLoss"
print("All tests passed!")

# BEGIN HIDDEN TESTS
assert hasattr(optimizer, "param_groups"), "optimizer should have param_groups"
# END HIDDEN TESTS

## Problem 3: Train the Network

Write a training loop that trains the model for two epochs using the criterion and optimizer defined above. Print the training loss once every 2000 mini-batches.
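A quick sanity check for any training loop is that the parameters actually change after an optimizer step. The sketch below shows one way to test this on a toy linear model with random data; all names (`toy_model`, `toy_criterion`, `toy_optimizer`) and the data are hypothetical and unrelated to the CIFAR-10 network.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
toy_model = nn.Linear(8, 3)
toy_criterion = nn.CrossEntropyLoss()
toy_optimizer = optim.SGD(toy_model.parameters(), lr=0.1)

inputs = torch.randn(16, 8)
labels = torch.randint(0, 3, (16,))

# Snapshot the parameters before the update.
before = [p.detach().clone() for p in toy_model.parameters()]

# One training step: zero grads, forward, loss, backward, step.
toy_optimizer.zero_grad()
loss = toy_criterion(toy_model(inputs), labels)
loss.backward()
toy_optimizer.step()

# At least one parameter tensor should differ from its snapshot.
changed = any(
    not torch.equal(old, new.detach())
    for old, new in zip(before, toy_model.parameters())
)
print("parameters changed:", changed)
```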

In [None]:
for epoch in range(2):
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # BEGIN SOLUTION
        # Training loop: zero grads, forward, loss, backward, step
        inputs, labels = data
        optimizer.zero_grad()
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        # END SOLUTION

        running_loss += loss.item()
        if i % 2000 == 1999:
            print(f"[epoch {epoch + 1}, minibatch {i + 1:5d}] loss: {running_loss / 2000:.3f}")
            running_loss = 0.0

print("Finished Training")

In [None]:
# Test assertions
# Training should have completed
assert "running_loss" in dir(), "Training loop should have completed"
print("All tests passed!")

# BEGIN HIDDEN TESTS
# Verify the model has been trained (parameters should have changed)
# END HIDDEN TESTS

## Problem 4: Evaluate on the Entire Test Dataset

Calculate and print the accuracy of your network on the full test dataset. Apply the trained model to the whole test set without computing gradients, obtain the predicted labels, and compare them to the true labels to obtain the overall accuracy.

Your network should achieve at least 50% accuracy. If it does not, modify your solutions to the previous problems (without increasing the number of training epochs) to achieve the target accuracy.
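The step from network outputs to an accuracy figure can be illustrated on a tiny hand-made batch. The logits and labels below are made up for this sketch; `torch.max(outputs, 1)` returns both the maximum values and their indices along the class dimension, and the indices are the predicted labels.

```python
import torch

# Made-up logits for 3 samples and 2 classes, with their true labels.
toy_outputs = torch.tensor([[2.0, 0.1], [0.3, 1.5], [0.9, 0.2]])
toy_labels = torch.tensor([0, 1, 1])

# Index of the largest logit in each row is the predicted class.
_, toy_predicted = torch.max(toy_outputs, 1)

toy_correct = (toy_predicted == toy_labels).sum().item()
toy_total = toy_labels.size(0)

print(toy_predicted.tolist())
print(f"{100 * toy_correct / toy_total:.1f}%")
```

Two of the three predictions match the labels, so the accuracy on this toy batch is about 66.7%.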

In [None]:
correct = 0
total = 0

with torch.no_grad():
    for data in testloader:
        # BEGIN SOLUTION
        # Evaluate without gradients: get predictions and compare to labels
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
        # END SOLUTION

print(f"Accuracy of the network on the 10000 test images: {100 * correct / total:.1f}%")

In [None]:
# Test assertions
assert total > 0, "total should be greater than 0"
assert correct >= 0, "correct should be non-negative"
accuracy = 100 * correct / total
assert accuracy >= 50, f"Model should achieve at least 50% accuracy, got {accuracy:.1f}%"
print("All tests passed!")

# BEGIN HIDDEN TESTS
assert total == 10000, "Should evaluate on all 10000 test images"
# END HIDDEN TESTS

## Problem 5: Accuracy per Class

The previous problem evaluated the overall accuracy. For this problem, break the model's accuracy down by class: report the classification accuracy separately for each of the 10 classes in the dataset.
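The bookkeeping behind a per-class breakdown can be sketched in plain Python on a toy example. The class names, labels, and predictions below are made up; the idea is to keep one counter of correct predictions and one counter of total samples for each class, keyed by the true label.

```python
toy_classes = ("cat", "dog")
toy_labels = [0, 0, 1, 1, 1]        # true class indices
toy_predictions = [0, 1, 1, 1, 0]   # predicted class indices

correct_by_class = dict.fromkeys(toy_classes, 0)
total_by_class = dict.fromkeys(toy_classes, 0)

for label, prediction in zip(toy_labels, toy_predictions):
    name = toy_classes[label]
    total_by_class[name] += 1       # count every sample of this class
    if label == prediction:
        correct_by_class[name] += 1  # count only the correct ones

per_class_accuracy = {
    name: 100 * correct_by_class[name] / total_by_class[name]
    for name in toy_classes
}

for name, acc in per_class_accuracy.items():
    print(f"Accuracy for {name:5s}: {acc:.1f}%")
```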

In [None]:
correct_pred = dict.fromkeys(classes, 0)
total_pred = dict.fromkeys(classes, 0)

with torch.no_grad():
    for data in testloader:
        # BEGIN SOLUTION
        # Track correct predictions per class
        images, labels = data
        outputs = net(images)
        _, predictions = torch.max(outputs, 1)
        for label, prediction in zip(labels, predictions, strict=True):
            if label == prediction:
                correct_pred[classes[label]] += 1
            total_pred[classes[label]] += 1
        # END SOLUTION

for classname, correct_count in correct_pred.items():
    accuracy = 100 * float(correct_count) / total_pred[classname]
    print(f"Accuracy for {classname:5s}: {accuracy:.1f}%")

In [None]:
# Test assertions
assert len(correct_pred) == 10, "Should have predictions for all 10 classes"
assert len(total_pred) == 10, "Should have totals for all 10 classes"
assert all(v >= 0 for v in correct_pred.values()), "correct_pred values should be non-negative"
print("All tests passed!")

# BEGIN HIDDEN TESTS
assert sum(total_pred.values()) == 10000, "Total predictions should sum to 10000"
# END HIDDEN TESTS