[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/NNDesignDeepLearning/NNDesignDeepLearning/blob/master/10.PyTorchIntroChapter/Code/LabSolutions/PyTorchIntroLab1_Solution.ipynb)

# PyTorch Introduction Lab 1 -- Getting Started

The objective of this PyTorch lab is to help you become familiar with the basics of using PyTorch to load data, create convolutional networks, train them, and display the results. If you haven't already done so, run the cells in the `PyTorchFlowIntroChapter.ipynb` Jupyter Notebook to prepare for this lab.

Some of the cells in this notebook are prefilled with working code. In addition, there will be cells with missing code (labeled `# TODO`), which you will need to complete. If you need additional cells, you can use the `Insert` menu at the top of the page.

## Loading Modules

We begin by loading some useful modules.

In [None]:
%matplotlib inline 
import matplotlib.pyplot as plt
import numpy as np
import torch
from torch import nn
from torch.utils.data import Dataset
from torch.utils.data import DataLoader
import torch.nn.functional as F
from torchvision import datasets
from torchvision.transforms import ToTensor
import os

## Loading Data

For this lab we will use a famous data set -- MNIST. This is a large database of handwritten digits. It contains 60,000 training images and 10,000 testing images. Each image is a 28x28 array of pixels. The original website for the data, which describes the dataset in detail and records the accuracies achieved by various machine learning strategies, can be found [here](http://yann.lecun.com/exdb/mnist/). The data set can be accessed easily using `torchvision.datasets`, as illustrated in the next cell.

In [None]:
training_data = datasets.MNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor()
)

test_data = datasets.MNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor()
)

In the next cell, print the number of examples in the training and test sets, the shape of the first feature, and the value of the first label.

In [None]:
# Length of Training Data
# TODO
print()
# Length of Testing Data
# TODO
print()
# The shape of the first feature
# TODO
print()
# The first label
# TODO
print()

Now plot the first feature, to see if it matches the label.

In [None]:
plt.imshow(training_data[0][0][0], cmap='gray')

Put the training and testing data into DataLoaders. Use a batch size of 100 for both sets, and shuffle the training data, but not the test data.

In [None]:
# TODO
BATCH_SIZE =
train_loader =
test_loader =
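
As a minimal sketch of how a `DataLoader` splits a dataset into minibatches, the following uses a small synthetic `TensorDataset` rather than MNIST. The names `toy_data` and `toy_loader`, and the dataset size of 250, are illustrative assumptions and not part of the lab.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for MNIST: 250 random "images" with integer labels.
features = torch.rand(250, 1, 28, 28)
labels = torch.randint(0, 10, (250,))
toy_data = TensorDataset(features, labels)

# shuffle=True reorders the examples every epoch; shuffle=False keeps dataset order.
toy_loader = DataLoader(toy_data, batch_size=100, shuffle=True)

# len() of a DataLoader is the number of batches, not the number of examples.
num_batches = len(toy_loader)
batch_shapes = [tuple(x.shape) for x, y in toy_loader]
print(num_batches, batch_shapes)
```

Note that the last batch is smaller when the dataset size is not a multiple of the batch size.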

## Constructing the Model

Now that the data is loaded, the next step is to construct the model. Define a class that subclasses `nn.Module` and builds a network with two convolution layers and two fully connected layers. The class `nn.Conv2d` is used to create the convolution layers. Its important arguments are

1. `in_channels` -- number of input feature maps
2. `out_channels` -- number of output feature maps
3. `kernel_size`
4. `stride`
5. `padding`

The parameters `kernel_size`, `stride`, and `padding` can either be:
* a single int – in which case the same value is used for the height and width dimension
* a tuple of two ints – in which case, the first int is used for the height dimension, and the second int for the width dimension
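
The int and tuple forms can be compared with a short sketch. The layer sizes below (8 feature maps, a 3x5 kernel) are arbitrary illustrative choices, not the ones required by the lab, and the names `square` and `rect` are hypothetical.

```python
import torch
from torch import nn

x = torch.rand(1, 1, 28, 28)  # one single-channel 28x28 input

# An int applies to both the height and the width dimension.
square = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, stride=1, padding=0)
# A tuple sets (height, width) separately.
rect = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=(3, 5), stride=1, padding=(0, 2))

square_shape = tuple(square(x).shape)
rect_shape = tuple(rect(x).shape)
print(square_shape)
print(rect_shape)
```

With the default dilation of 1, each output dimension is `floor((in + 2*padding - kernel) / stride) + 1`, which is why the two layers produce different widths from the same input.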

The network should have the following components:
1. Convolution with 32 feature maps, 3x3 kernel, stride of 1 and no padding.
2. ReLU activation.
3. Convolution with 64 feature maps, 3x3 kernel, stride of 1 and no padding.
4. ReLU activation.
5. Max pooling layer using `F.max_pool2d(x, 2)`.
6. Dropout with a drop probability of 0.25 (`p=0.25` in `nn.Dropout`).
7. Convert to vector with `torch.flatten(x,1)`
8. Fully connected layer with 128 neurons, using `nn.Linear`.
9. ReLU activation.
10. Dropout with a drop probability of 0.5 (`p=0.5` in `nn.Dropout`).
11. Fully connected layer with 10 neurons, using `nn.Linear`.
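
To size the first fully connected layer, it helps to trace the spatial size of the feature maps through the components listed above. The following is a rough sketch in plain Python, assuming 28x28 inputs and the default dilation of 1; the helper name `conv_out` is hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length along one dimension for a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1

size = 28
size = conv_out(size, kernel=3)            # after the first 3x3 convolution
size = conv_out(size, kernel=3)            # after the second 3x3 convolution
size = conv_out(size, kernel=2, stride=2)  # after 2x2 max pooling (stride defaults to the kernel size)
flat_features = 64 * size * size           # 64 feature maps, flattened into a vector
print(size, flat_features)
```

The value of `flat_features` is the number of inputs that the first `nn.Linear` layer receives after `torch.flatten(x, 1)`.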

In [None]:
# Define the CNN model
# TODO
class cnn_model(nn.Module):
    def __init__(self):
        super(cnn_model, self).__init__()
        # TODO
        self.conv1 =
        self.conv2 =
        self.dropout1 =
        self.dropout2 =
        self.fc1 =
        self.fc2 =

    def forward(self, x):
        # TODO
        x =
        x =
        x =
        x =
        x =
        x =
        x =
        x =
        x =
        x =
        x =
        output = x
        return output

Use the class you just defined to construct a model.

In [None]:
model = cnn_model()

After constructing the model, print a summary.

In [None]:
print(model)

## Training the Network

The first step in training the network is to select the optimizer. Use `torch.optim.Adam` as the optimizer.

In [None]:
# TODO
optimizer =
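
As a rough illustration of what an optimizer does, the sketch below builds an Adam optimizer for a tiny stand-in model and applies a single update. The names `toy_model` and `toy_optimizer` and the squared-output loss are assumptions made only for this example.

```python
import torch
from torch import nn

toy_model = nn.Linear(4, 2)  # hypothetical stand-in for the lab's network

# lr=1e-3 is Adam's default learning rate, written out here for visibility.
toy_optimizer = torch.optim.Adam(toy_model.parameters(), lr=1e-3)

before = toy_model.weight.detach().clone()
loss = toy_model(torch.rand(8, 4)).pow(2).mean()
toy_optimizer.zero_grad()  # clear gradients left over from any previous step
loss.backward()            # compute gradients of the loss w.r.t. the parameters
toy_optimizer.step()       # update the parameters in place
changed = not torch.equal(before, toy_model.weight.detach())
print(changed)
```

The optimizer only updates the parameters it was given, so it should be created from the parameters of the model you intend to train.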

Assign the loss function as `nn.CrossEntropyLoss()`.

In [None]:
# TODO
loss_fn =
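
`nn.CrossEntropyLoss` expects raw network outputs (logits) and integer class labels, which is why the network described above ends with a plain linear layer rather than a softmax. The following sketch, using random stand-in values, shows that the loss is equivalent to a log-softmax followed by a negative log-likelihood loss.

```python
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(5, 10)           # raw scores for 5 examples and 10 classes
targets = torch.randint(0, 10, (5,))  # integer class labels, not one-hot vectors

loss_a = nn.CrossEntropyLoss()(logits, targets)
# The same value computed in two explicit steps.
loss_b = F.nll_loss(F.log_softmax(logits, dim=1), targets)
matches = torch.allclose(loss_a, loss_b)
print(loss_a.item(), matches)
```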

Write a training loop that runs on a GPU if one is available. Train for 10 epochs using the `train_loader` created above. Every 100 iterations, print the training loss for the current minibatch and save it for later plotting.

In [None]:
# Training loop
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model.to(device)

total_loss = []
ind = []
for epoch in range(10):
    for batch_idx, (data, target) in enumerate(train_loader):
        # TODO






        if batch_idx % 100 == 0:
            total_loss.append(loss.item())
            # len(train_loader) is already the number of batches per epoch
            ind.append(batch_idx + epoch*len(train_loader))
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))
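
The general pattern of a training step can be sketched on a toy problem. Everything below (the synthetic two-class data, the linear `toy_model`, the learning rate of 0.05, and 5 epochs) is an illustrative assumption and not the lab's network or settings.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Hypothetical toy problem: two classes separated by the sign of the first feature.
x = torch.randn(400, 4)
y = (x[:, 0] > 0).long()
toy_loader = DataLoader(TensorDataset(x, y), batch_size=50, shuffle=True)

toy_model = nn.Linear(4, 2).to(device)
toy_optimizer = torch.optim.Adam(toy_model.parameters(), lr=0.05)
toy_loss_fn = nn.CrossEntropyLoss()

toy_model.train()  # enables training behavior of layers such as dropout, when present
history = []
for epoch in range(5):
    for batch_idx, (data, target) in enumerate(toy_loader):
        data, target = data.to(device), target.to(device)  # move the batch to the model's device
        toy_optimizer.zero_grad()                          # reset accumulated gradients
        loss = toy_loss_fn(toy_model(data), target)        # forward pass and loss
        loss.backward()                                    # backward pass
        toy_optimizer.step()                               # parameter update
        history.append(loss.item())
print(len(history), history[0], history[-1])
```

Each batch has to be moved to the same device as the model before the forward pass, otherwise PyTorch raises a device mismatch error.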


Plot the loss that you saved in the training loop.

In [None]:
plt.plot(total_loss)
plt.title('Training Loss')
plt.xlabel('Iterations')
plt.show()

## Evaluate the Trained Model

Loop over the minibatches in `test_loader` and compute the overall accuracy of the network on the test set.

In [None]:
# Testing loop
correct = 0
with torch.no_grad():
    for data, target in test_loader:
        # TODO




print('\nTest set: Accuracy: {}/{} ({:.0f}%)\n'.format(
    correct, len(test_loader.dataset),
    100. * correct / len(test_loader.dataset)))
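
Counting correct predictions for one minibatch can be sketched as follows. The logits and targets are made-up values for 4 examples and 3 classes, and the names `num_correct` and `accuracy` are hypothetical.

```python
import torch

# Hypothetical network outputs (logits) for 4 examples and 3 classes.
logits = torch.tensor([[2.0, 0.1, 0.3],
                       [0.2, 1.5, 0.1],
                       [0.9, 0.2, 0.4],
                       [0.1, 0.3, 2.2]])
target = torch.tensor([0, 1, 2, 2])

pred = logits.argmax(dim=1)                  # predicted class is the index of the largest logit
num_correct = (pred == target).sum().item()  # Python int count of matching predictions
accuracy = 100.0 * num_correct / len(target)
print(pred.tolist(), num_correct, accuracy)
```

When a network contains dropout, calling `model.eval()` before evaluation switches the dropout layers off so that the predictions are deterministic.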

To get some insight into what the network has learned, plot the kernels of the 32 feature maps in the first layer of the network. You can get the weights of that layer using `model.conv1.weight.data.cpu().numpy()`.
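
The layout of a convolution layer's weight tensor can be checked on a stand-alone layer. The sketch below uses a freshly initialized, untrained `toy_conv` (a hypothetical name) with the same kernel size and number of feature maps as the first layer.

```python
import torch
from torch import nn

toy_conv = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3)  # hypothetical layer

# .data gives the raw tensor, .cpu() copies it to host memory if needed,
# and .numpy() converts it to an array that Matplotlib can display.
toy_weights = toy_conv.weight.data.cpu().numpy()
print(toy_weights.shape)        # layout: (out_channels, in_channels, kernel_h, kernel_w)
print(toy_weights[0, 0].shape)  # one 2-D kernel, suitable for plt.imshow
```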

In [None]:
# Access the first convolutional layer
# TODO
first_conv_layer =

# Get the weights of the layer
# TODO
weights =

# Visualize the kernels
fig, axes = plt.subplots(nrows=8, ncols=4)
for i in range(32):
    ax = axes[i // 4, i % 4]
    ax.imshow(weights[i, 0], cmap='gray')
    ax.axis('off')

plt.tight_layout()
plt.show()

Do these kernels give you any insight into how the network is identifying the different numerals?

Another way to understand the operation of the network is to look at the output of the feature maps for a specific input. In the next code block, select a single image from the test loader and apply it to the network. Then plot the outputs of the 32 feature maps in the first layer. You can access the outputs of the feature maps using `model.conv1(image).cpu()`, where `image` has first been moved to the same device as the model.

In [None]:
# Get the output of the first convolutional layer
# TODO
first_batch =
image =
output =

# Visualize feature maps
for i in range(output.shape[0]):
    plt.subplot(8, 4, i+1)  # Adjust grid size as needed
    plt.imshow(output[i].detach().numpy(), cmap='gray')
    plt.axis('off')

plt.show()

After the model is trained to your satisfaction, save the model so that it can be used in the second lab. It is possible to save the entire model, but it is recommended to save just the model’s learned parameters, which are stored in `model.state_dict()`. This can be done with `torch.save(model.state_dict(), PATH)`.

To load the model later, you need to first create an instance of the model, for example, by using `model = cnn_model()`. Then you can load the parameters using `model.load_state_dict(torch.load(PATH, weights_only=True))`.
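
The save and load round trip can be sketched with a small stand-in model. The names `toy_model`, `restored`, and the file name `toy_model.pt` are hypothetical, and a temporary directory is used so that nothing is left on disk.

```python
import os
import tempfile
import torch
from torch import nn

toy_model = nn.Linear(4, 2)  # hypothetical stand-in for cnn_model()

with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = os.path.join(tmp_dir, 'toy_model.pt')  # hypothetical file name
    torch.save(toy_model.state_dict(), toy_path)      # save only the learned parameters

    restored = nn.Linear(4, 2)  # must have the same architecture as the saved model
    restored.load_state_dict(torch.load(toy_path, weights_only=True))

same = all(torch.equal(a, b) for a, b in
           zip(toy_model.state_dict().values(), restored.state_dict().values()))
print(same)
```

If the parameters were saved from a GPU and are loaded on a machine without one, passing `map_location='cpu'` to `torch.load` is one way to handle it.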

In the next cell, save the model. We will load the model back in the second PyTorch lab.

In [None]:
path = os.getcwd()
model_path = '../Model/'
os.makedirs(model_path, exist_ok=True)
# TODO
torch.save()

## Explore Further

Experiment with different network architectures. Try to find the architecture that gives you the best accuracy. Investigate the following.

1. Increase the size of the convolution kernels, and display the kernels. Do the shapes of the kernels become more intuitive?
1. Increase the number of feature maps in the convolution layers. Does the testing accuracy increase?
1. What if you add another convolution layer? Do you get better results by increasing the number of neurons in each layer, or by increasing the number of layers (assuming the overall number of weights stays the same)?
1. Try using batch normalization and removing the dropout. How does the test accuracy change?
1. How small can you make the network and still achieve 98% test accuracy?
1. Train with and without the GPU. How much speedup, if any, does the GPU give you?
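
When comparing architectures by size, a parameter count is a convenient measure. The helper below is a minimal sketch; the name `count_parameters` is hypothetical and the two layers are only examples.

```python
import torch
from torch import nn

def count_parameters(module):
    """Total number of trainable parameters in a module."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)

# Hypothetical layers, used only to show the counting.
conv = nn.Conv2d(1, 32, kernel_size=3)  # 32*1*3*3 weights + 32 biases
fc = nn.Linear(128, 10)                 # 128*10 weights + 10 biases
conv_count = count_parameters(conv)
fc_count = count_parameters(fc)
print(conv_count, fc_count)
```

The same helper can be applied to a whole model, for example `count_parameters(model)`.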