# Assignment 3: Image Classification with Convolutional Neural Network

A convolutional network should be designed for the automatic classification of digits in street house numbers.
Specifically, we use the SVHN dataset, which comprises 10 classes, one for each digit, and is available in `torchvision.datasets`: https://pytorch.org/vision/stable/generated/torchvision.datasets.SVHN.html
The color images are of size $32\times32$ pixels, where the house number of interest is in the center of the image.

In addition to the network, you are supposed to implement at least one of the regularization techniques that you defined in Task (b).
Note that such techniques can be implemented in the network design as well as in the training loop.
Make sure that you adapt your code to accommodate these regularization techniques.
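
As a minimal sketch of the two places where regularization can live, the snippet below puts dropout into a small network (design side) and weight decay into the optimizer (training side). Which techniques Task (b) actually defined is not stated here, so both choices, the toy layer sizes, and the values `p=0.5` and `weight_decay=1e-4` are assumptions for illustration only.

```python
import torch

# Hypothetical toy network: dropout is a regularizer that lives in the network design.
toy_network = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(3 * 32 * 32, 64),
    torch.nn.ReLU(),
    torch.nn.Dropout(p=0.5),  # randomly zeroes activations, in training mode only
    torch.nn.Linear(64, 10),
)

# Weight decay (an L2 penalty) is a regularizer that lives in the training setup.
toy_optimizer = torch.optim.Adam(toy_network.parameters(), lr=0.001, weight_decay=1e-4)

x = torch.randn(4, 3, 32, 32)  # random stand-in for a batch of SVHN images

toy_network.train()            # dropout active
train_out = toy_network(x)

toy_network.eval()             # dropout disabled: repeated passes are identical
eval_out_1 = toy_network(x)
eval_out_2 = toy_network(x)
```

Because dropout behaves differently in the two modes, a training loop that uses it needs to call `.train()` before the training batches and `.eval()` before the evaluation.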

## Task (c): Dataset and Data Loaders

Instantiate the training and the test set of the SVHN dataset.
Set the original data to download automatically.
Apply appropriate transforms to the training and the test datasets.
Instantiate data loaders for the two datasets with reasonable parameters.
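
If the transforms include normalization, one option is to measure the per-channel statistics on the training images instead of reusing values from another dataset. The sketch below shows the arithmetic on random stand-in tensors, so no download is needed; `channel_statistics` is a hypothetical helper, not a torchvision function, and the resulting numbers say nothing about the real SVHN statistics.

```python
import torch

def channel_statistics(images: torch.Tensor):
    """Per-channel mean and standard deviation of images shaped (N, C, H, W)."""
    mean = images.mean(dim=(0, 2, 3))
    std = images.std(dim=(0, 2, 3))
    return mean, std

# Random stand-in for a stack of ToTensor()-converted images with values in [0, 1].
fake_images = torch.rand(256, 3, 32, 32)
mean, std = channel_statistics(fake_images)

# The same per-channel arithmetic that torchvision.transforms.Normalize applies.
normalized = (fake_images - mean[None, :, None, None]) / std[None, :, None, None]
```

After this step each channel has approximately zero mean and unit standard deviation; the measured `mean` and `std` would be the values passed to `Normalize`.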

In [None]:
import torchvision
import torch

# instantiate the training set
# SVHN images are already 32x32, so no resizing or cropping is needed
training_transform = torchvision.transforms.Compose([
    torchvision.transforms.ToTensor(),                                     # Convert the image into a float tensor with values in [0, 1]
])
training_dataset = torchvision.datasets.SVHN(root="data", split="train", transform=training_transform, download=True)
training_dataloader = torch.utils.data.DataLoader(training_dataset, batch_size=32, shuffle=True)

# instantiate the test set
testing_transform = torchvision.transforms.Compose([
    torchvision.transforms.ToTensor(),                                     # Convert the image into a float tensor with values in [0, 1]
])
testing_dataset = torchvision.datasets.SVHN(root="data", split="test", transform=testing_transform, download=True)
testing_dataloader = torch.utils.data.DataLoader(testing_dataset, batch_size=32, shuffle=False)

## Image Samples

We provide the source code to visualize some samples of the training set.
We also print the class labels for the data, which are arranged in the same grid as the images.
Feel free to run the code below.

In [None]:
from matplotlib import pyplot

fig, axes = pyplot.subplots(5,10,figsize=(10,5))

for i in range(5):
    for j in range(10):
        image, label = training_dataset[i*10+j]
        axes[i][j].imshow(image.permute(1,2,0))
        axes[i][j].axis("off")
        print(label, end=" ")
    print()

## Task (d): Convolutional Network Implementation

The convolutional network is designed with the following layers:

- A convolution layer with kernel size $5\times5$, 32 output channels, stride 1, padding 2 in vertical dimension only.
- A maximum pooling layer of size $2\times2$.
- A convolution layer with kernel size $5\times5$, 64 output channels, stride 1, padding 2 in vertical dimension only.
- A maximum pooling layer of size $2\times2$.
- A convolution layer with kernel size $5\times5$, 128 output channels, stride 1, padding 2 in vertical dimension only.
- A fully-connected layer with $K$ inputs and $O$ outputs.

Please add additional layers in your implementation whenever you see the need for them.
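
The number of inputs $K$ of the fully-connected layer follows from the layer list above and the $32\times32$ input size. The short calculation below traces the spatial size through the layers; it assumes that the pooling layers use stride 2 (the list only gives their size), and `conv_output_size` is a helper name introduced here for illustration.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Output length along one dimension of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

height, width = 32, 32  # SVHN image size

# (kernel, stride, vertical padding, horizontal padding) for each layer
layers = [
    (5, 1, 2, 0),  # convolution, 32 output channels
    (2, 2, 0, 0),  # max pooling (stride 2 assumed)
    (5, 1, 2, 0),  # convolution, 64 output channels
    (2, 2, 0, 0),  # max pooling (stride 2 assumed)
    (5, 1, 2, 0),  # convolution, 128 output channels
]
for kernel, stride, pad_v, pad_h in layers:
    height = conv_output_size(height, kernel, stride, pad_v)
    width = conv_output_size(width, kernel, stride, pad_h)

K = 128 * height * width  # channels * height * width after the last convolution
print(height, width, K)   # prints "8 1 1024"
```

Because only the vertical dimension is padded, the width shrinks at every convolution (32, 28, 14, 10, 5, 1) while the height is only halved by the pooling layers (32, 32, 16, 16, 8, 8).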

In [None]:
import torch

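# implement and instantiate network
# Conv2d takes padding as (height, width), so (2, 0) pads the vertical dimension only.
# Spatial size for 32x32 inputs: 32x32 -> 32x28 -> 16x14 -> 16x10 -> 8x5 -> 8x1
network = torch.nn.Sequential(
            torch.nn.Conv2d(3, 32, kernel_size=5, stride=1, padding=(2,0)),
            torch.nn.MaxPool2d(kernel_size=2, stride=2),
            torch.nn.ReLU(),
            torch.nn.Conv2d(32, 64, kernel_size=5, stride=1, padding=(2,0)),
            torch.nn.MaxPool2d(kernel_size=2, stride=2),
            torch.nn.ReLU(),
            torch.nn.Conv2d(64, 128, kernel_size=5, stride=1, padding=(2,0)),
            torch.nn.ReLU(),
            torch.nn.Flatten(),                 # (B, 128, 8, 1) -> (B, 1024)
            torch.nn.Linear(128*8*1, 10))       # K = 1024 inputs (for 32x32 images), O = 10 outputs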


## Task (e): Network Training

Instantiate an appropriate loss function and an optimizer, with reasonable parameters. 
Train the network for 10 epochs on the GPU device (during development, you can use the CPU device, but at the end, the code should be able to run on the GPU). 
Compute the average training set loss within an epoch. 
Compute the test set accuracy at the end of each epoch, and print it together with the training set loss.
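
The two quantities requested above can be sketched on a handful of made-up values: accuracy compares the arg-max of the network outputs with the targets, and the epoch loss weights each batch mean by its batch size so that a smaller last batch is counted correctly. The helper name `count_correct` and all numbers below are illustrative assumptions, not results from SVHN.

```python
import torch

def count_correct(logits: torch.Tensor, targets: torch.Tensor) -> int:
    """Number of samples whose highest-scoring class equals the target label."""
    return (logits.argmax(dim=1) == targets).sum().item()

# Made-up network outputs for 4 samples and 3 classes.
logits = torch.tensor([[2.0, 0.1, 0.3],
                       [0.2, 0.1, 1.5],
                       [0.9, 1.1, 0.0],
                       [0.3, 0.2, 0.1]])
targets = torch.tensor([0, 2, 0, 0])

correct = count_correct(logits, targets)  # predictions are 0, 2, 1, 0
accuracy = correct / len(targets)

# Made-up (batch mean loss, batch size) pairs; the last batch is smaller.
batch_losses = [(0.8, 32), (0.6, 32), (0.4, 16)]
total = sum(batch_loss * size for batch_loss, size in batch_losses)
average_loss = total / sum(size for _, size in batch_losses)
```

Here three of the four predictions match, so `accuracy` is 0.75, and the size-weighted `average_loss` is 0.64.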

In [None]:
# instantiate loss function
loss = torch.nn.CrossEntropyLoss()

# make sure to train everything on the GPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
network = network.to(device)

# instantiate optimizer
optimizer = torch.optim.Adam(network.parameters(), lr=0.001)

for epoch in range(10):
    # train the network on the training set
    network.train()
    train_loss = 0.  # reset once per epoch, not once per batch
    for x, t in training_dataloader:
        # train on the batch
        x, t = x.to(device), t.to(device)
        z = network(x)
        J = loss(z, t)
        optimizer.zero_grad()
        J.backward()
        optimizer.step()
        # aggregate training set loss, weighted by the batch size
        train_loss += J.item() * x.size(0)

    # evaluate on the test set
    network.eval()
    correct = 0
    with torch.no_grad():
        for x, t in testing_dataloader:
            x, t = x.to(device), t.to(device)
            # classify original samples
            z = network(x)
            # count samples whose predicted class matches the target
            correct += (z.argmax(dim=1) == t).sum().item()

    # print average training set loss and test set accuracy
    print(f"Epoch {epoch + 1}: training loss {train_loss / len(training_dataset):.4f}, test accuracy {correct / len(testing_dataset):.4f}")