# Convolutional Networks in PyTorch

In the last exercise, we explained what convolution and pooling are, how these operations work, and what they are used for in neural networks. The goal of this exercise is to train simple neural networks to classify images of handwritten digits. We will use *PyTorch*, a framework for building and training neural networks. If you haven't installed it yet, [follow the current installation guide](https://pytorch.org/get-started/locally/). Then download this notebook and work in it either locally or, for example, on [Google Colab](https://colab.research.google.com).

If you are running the notebook on Colab, run the following cell first: it installs PyTorch (if it is missing) and `torchbearer`, so that you can train on the graphics card. The dataset is simple, however, so even without a GPU you can train the network in a few minutes.


In [None]:
try:
    import torch
except ImportError:
    # PyTorch is missing: install the current release together with torchvision
    !pip install -q torch torchvision

try:
    import torchbearer
except ImportError:
    !pip install -q torchbearer

## Loading the Dataset

Before we prepare the structure of the neural network, we need to load the training and testing data. In this exercise, we will use the standard [MNIST dataset](https://en.wikipedia.org/wiki/MNIST_database), as training a network on it has become a sort of modern-day version of the *Hello, world!* program. The dataset contains handwritten digits from 0-9 in the form of grayscale images with dimensions of 28×28.

![MNIST dataset example](https://upload.wikimedia.org/wikipedia/commons/f/f7/MnistExamplesModified.png)


Before loading the data itself, let's first import the necessary libraries:


In [None]:
# automatically reload external modules if they change
%load_ext autoreload
%autoreload 2

import torch
import torch.nn.functional as F
import torchvision.transforms as transforms
import torchbearer
from torch import nn
from torch import optim
from torch.utils.data import DataLoader
from torchvision.datasets import MNIST
from torchbearer import Trial

The first step when loading images is to convert them into tensors. In PyTorch, each image is represented as a three-dimensional tensor with dimensions `[channels][height][width]`, where `channels` is the number of color channels (3 for RGB images, 1 for grayscale). Such conversions are defined as a sequence of simple operations (transforms) that are applied to each image as it is loaded.


In [None]:
# convert each image to tensor format
transform = transforms.Compose([
    transforms.ToTensor()  # convert to tensor
])

# load data
trainset = MNIST(".", train=True, download=True, transform=transform)
testset = MNIST(".", train=False, download=True, transform=transform)

# create data loaders
trainloader = DataLoader(trainset, batch_size=128, shuffle=True)
testloader = DataLoader(testset, batch_size=128, shuffle=True)

In addition to loading the dataset, the code above creates data loaders, which deliver the data in batches. Besides the batch size, we also define the order in which the samples are drawn; here we shuffle them, because neural networks train more effectively when consecutive samples are independent of one another.
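
To make the batch layout concrete, the following sketch iterates over a data loader once and prints the shapes it yields. It uses random tensors as a stand-in for MNIST (an assumption, so that nothing needs to be downloaded); the names `toy_images`, `toy_labels` and `toy_loader` are hypothetical.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for MNIST (assumption): 256 random 1x28x28 "images"
# with random labels from 0 to 9
toy_images = torch.rand(256, 1, 28, 28)
toy_labels = torch.randint(0, 10, (256,))
toy_loader = DataLoader(TensorDataset(toy_images, toy_labels),
                        batch_size=128, shuffle=True)

# Each iteration yields one batch of inputs and the matching labels
x, y = next(iter(toy_loader))
print(x.shape)          # torch.Size([128, 1, 28, 28])
print(y.shape)          # torch.Size([128])
print(len(toy_loader))  # 2 batches per epoch
```

The leading dimension of `x` is the batch size; the remaining three are the `[channels][height][width]` layout of a single image.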


## Definition of the Neural Network

The next step is to define the model of the neural network. To begin with, we will use only one convolutional layer followed by pooling, so you can see the basic structure of a convolutional block. For completeness, we will also add activation layers as well as `Dropout` to prevent overfitting. The structure of the network is as follows:

1. The first layer is a convolutional layer (`nn.Conv2d`); PyTorch does not need a separate input layer. The layer contains 32 feature maps with 5×5 kernels and is followed by a ReLU activation.
2. The next layer is a max pooling layer (`F.max_pool2d`). The filter size is 2×2.
3. Then we use a regularization layer, `Dropout`, which during training randomly zeroes out each activation with a probability of 20% to avoid overfitting.
4. Before reaching the classification part of the neural network, we need to adjust the output of the previous layers so that it can be processed by fully connected layers. This is done by the *flatten* operation, which transforms the tensor representation into a vector that can be processed by traditional layers.
5. We proceed with a layer of 128 neurons and another ReLU activation layer.
6. The output layer is given by the number of classes, so it contains 10 neurons.

To define the network, we will create a subclass of `nn.Module`:


In [None]:
class SimpleCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, (5, 5), padding=0)
        self.fc1 = nn.Linear(32 * 12**2, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        out = self.conv1(x)
        out = F.relu(out)
        out = F.max_pool2d(out, (2, 2))
        # pass self.training so that dropout is switched off in eval mode
        out = F.dropout(out, 0.2, training=self.training)
        # flatten everything except the batch dimension
        out = out.view(out.shape[0], -1)
        out = self.fc1(out)
        out = F.relu(out)
        out = self.fc2(out)
        return out

In the call to `view`, the first argument `out.shape[0]` keeps the batch size, and the second argument `-1` tells PyTorch to infer the remaining dimension automatically, so each sample is flattened into a single vector. The input to the `forward` method therefore has dimensions `[batch_size][channels][height][width]`, and the output has dimensions `[batch_size][num_classes=10]`.

**What are the dimensions of the inputs and outputs for each layer?**
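
One way to check your answer for the spatial dimensions is the usual output-size formula for convolution and pooling without dilation, `floor((size + 2 * padding - kernel) / stride) + 1`. The sketch below applies it to the layers of `SimpleCNN`; the helper name `conv2d_out` is hypothetical and not part of PyTorch.

```python
def conv2d_out(size, kernel, stride=1, padding=0):
    """Output height/width of a convolution or pooling layer (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1

input_size = 28                                                   # MNIST images are 28x28
after_conv = conv2d_out(input_size, kernel=5)                     # 5x5 convolution, no padding
after_pool = conv2d_out(after_conv, kernel=2, stride=2)           # 2x2 max pooling
flat_features = 32 * after_pool ** 2                              # 32 feature maps, flattened

print(after_conv, after_pool, flat_features)  # 24 12 4608
```

The last value matches the `32 * 12**2` input size of `fc1` in the class above.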


## Training the Network

After defining the model, we can train our network. We will use the cross-entropy loss function and the Adam optimizer, and train for 5 epochs with a batch size of 128. To keep the training and evaluation code short, we use `torchbearer`.


In [None]:
# build the model
model = SimpleCNN()

# define the loss function and the optimiser
loss_function = nn.CrossEntropyLoss()
optimiser = optim.Adam(model.parameters())

device = "cuda:0" if torch.cuda.is_available() else "cpu"
trial = Trial(model, optimiser, loss_function, metrics=['loss', 'accuracy']).to(device)
trial.with_generators(trainloader, test_generator=testloader)
trial.run(epochs=5)
results = trial.evaluate(data_key=torchbearer.TEST_DATA)
print(results)
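
For orientation, a `Trial` wraps a training loop of roughly the following shape. This is a minimal plain-PyTorch sketch, not torchbearer's actual implementation; the random data and the tiny `toy_model` are placeholders (assumptions) so that the block runs on its own in a few seconds.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Placeholder data (assumption): 64 random images with random labels
toy_data = TensorDataset(torch.rand(64, 1, 28, 28), torch.randint(0, 10, (64,)))
toy_loader = DataLoader(toy_data, batch_size=16, shuffle=True)

# Placeholder model; in the notebook this would be SimpleCNN()
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
toy_loss_function = nn.CrossEntropyLoss()
toy_optimiser = optim.Adam(toy_model.parameters())

toy_device = "cuda:0" if torch.cuda.is_available() else "cpu"
toy_model.to(toy_device)

epoch_losses = []
for epoch in range(2):
    toy_model.train()                       # enable training behaviour (e.g. dropout)
    running_loss = 0.0
    for x, y in toy_loader:
        x, y = x.to(toy_device), y.to(toy_device)
        toy_optimiser.zero_grad()           # clear gradients from the previous step
        loss = toy_loss_function(toy_model(x), y)
        loss.backward()                     # backpropagate
        toy_optimiser.step()                # update the weights
        running_loss += loss.item()
    epoch_losses.append(running_loss / len(toy_loader))
    print(f"epoch {epoch}: mean loss {epoch_losses[-1]:.4f}")
```

Writing the loop by hand gives full control over each step, while `torchbearer` additionally handles metrics and progress reporting.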

## Network Extension

Create an extended neural network structure with the following architecture:

1. Convolutional layer with 30 feature maps of size 5×5 and ReLU activation.
2. Max pooling layer with dimensions 2×2.
3. Convolutional layer with 15 feature maps of size 3×3 and ReLU activation.
4. Max pooling layer with dimensions 2×2.
5. Dropout layer with a probability of 20%.
6. Flatten layer.
7. Fully connected layer with 128 neurons and ReLU activation.
8. Fully connected layer with 50 neurons and ReLU activation.
9. Output layer.


In [None]:
import torch 
import torch.nn.functional as F
from torch import nn

# Model Definition
class BetterCNN(nn.Module):
    def __init__(self):
        super(BetterCNN, self).__init__()
        # TODO: define layers
    
    def forward(self, x):
        # TODO: define structure
        return

Use the following code for training:


In [None]:
#reset the data loaders
trainloader = DataLoader(trainset, batch_size=128, shuffle=True)
testloader = DataLoader(testset, batch_size=128, shuffle=True)

# build the model
model = BetterCNN()

# define the loss function and the optimiser
loss_function = nn.CrossEntropyLoss()
optimiser = optim.Adam(model.parameters())

device = "cuda:0" if torch.cuda.is_available() else "cpu"
trial = Trial(model, optimiser, loss_function, metrics=['loss', 'accuracy']).to(device)
trial.with_generators(trainloader, test_generator=testloader)
trial.run(epochs=5)
results = trial.evaluate(data_key=torchbearer.TEST_DATA)
print(results)

## Saving the Network

After training a network, you often need to save the weights and load them for future use. In PyTorch, the function `torch.save(state, filepath)` writes an object to a file; passing it the model's `state_dict()` stores the weight values so that they can be loaded later.


In [None]:
#save the trained model weights
torch.save(model.state_dict(), "./bettercnn.weights")
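
As a small illustration of what ends up in the file, the sketch below saves the `state_dict` of a placeholder layer to a temporary directory and restores it into a fresh instance. The layer `demo_net` and the file name `demo.weights` are hypothetical; the same calls apply to any `nn.Module`.

```python
import os
import tempfile

import torch
from torch import nn

# Placeholder network (assumption); any nn.Module is handled the same way
demo_net = nn.Linear(4, 2)

# The state dict maps parameter names to tensors
print(list(demo_net.state_dict().keys()))  # ['weight', 'bias']

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo.weights")
    torch.save(demo_net.state_dict(), path)

    # A fresh instance starts with its own random weights ...
    restored = nn.Linear(4, 2)
    # ... until the saved state is loaded into it
    restored.load_state_dict(torch.load(path, map_location="cpu"))

weights_match = all(
    torch.equal(a, b)
    for a, b in zip(demo_net.state_dict().values(), restored.state_dict().values())
)
print(weights_match)  # True
```

Because only the tensors are stored, the class that defines the architecture must be available wherever the weights are loaded.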

If you are working in Colab, you can download the model using the method:


In [None]:
from google.colab import files
files.download('bettercnn.weights')

## Using the Network

Once you have trained and saved the network, you probably want to use it for making predictions during the model's lifetime, for example, within an application. Ideally, the network should be able to process inputs that were not part of the original dataset but share the same fundamental characteristics. To verify the functionality of your solution from the previous step, download the [sample digit images](lab03/imgs.zip).

Next, load the saved weights. **Note:** By default, PyTorch only saves the weights, not the network structure. Therefore, if you're writing code across multiple files, you will often need to copy/import the class defining the network into the file where you are handling the weight loading.


In [None]:
import matplotlib.pyplot as plt

# build the model and load state
model = BetterCNN()
# map_location lets weights saved on a GPU be loaded on a CPU-only machine
model.load_state_dict(torch.load('bettercnn.weights', map_location='cpu'))

# put model in eval mode
model = model.eval()

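On the last line of the previous code block, we switch the network to *eval* mode, which prepares it for inference. This mode changes the behaviour of certain layers: `Dropout` is disabled, and `BatchNorm2d` uses its stored running statistics instead of the statistics of the current batch. Eval mode by itself does not stop gradients from being computed; for that, wrap the forward pass in `torch.no_grad()`.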
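
The effect of the mode switch is easiest to see on a dropout layer in isolation. The following sketch (with a hypothetical standalone layer `drop`, not part of the exercise network) applies the same layer to the same input in both modes.

```python
import torch
from torch import nn

torch.manual_seed(0)

drop = nn.Dropout(p=0.5)
ones = torch.ones(1, 1000)

drop.train()
train_out = drop(ones)  # roughly half the values are zeroed, the rest scaled by 1 / (1 - p) = 2

drop.eval()
eval_out = drop(ones)   # dropout is inactive: the input passes through unchanged

print(sorted(train_out.unique().tolist()))  # [0.0, 2.0]
print(torch.equal(eval_out, ones))          # True
```

The rescaling during training keeps the expected activation the same in both modes, which is why nothing needs to be adjusted at inference time.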

Next, we load and visualize a sample input (you can choose a different sample as well):


In [None]:
from PIL import Image
import torchvision

transform = torchvision.transforms.ToTensor()
im = transform(Image.open("imgs/1.png"))

plt.imshow(im[0], cmap=plt.get_cmap('gray'))

Now we can use the model for prediction. The network expects a batch as input, however, while we are providing just one image. That's why we call `unsqueeze(0)`, which adds a leading batch dimension of size 1. The network's output is then 10 values (logits), one score per class. The final prediction of the network is the class with the highest score.


In [None]:
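batch = im.unsqueeze(0)  # [1][channels][height][width]
with torch.no_grad():  # gradients are not needed for inference
    predictions = model(batch)

print("logits:", predictions)

# index of the highest score along the class dimension
_, predicted_class = predictions.max(1)

print("predicted class:", predicted_class.item())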
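
The two steps described above, adding the batch dimension and picking the highest score, can be tried without a trained network. In the sketch below, the logits are made-up values (an assumption, not real network output), and `softmax` is used only to show how the scores could be turned into probabilities.

```python
import torch
import torch.nn.functional as F

image = torch.rand(1, 28, 28)        # [channels][height][width]
batch_of_one = image.unsqueeze(0)    # [1][channels][height][width]
print(batch_of_one.shape)            # torch.Size([1, 1, 28, 28])

# Hypothetical logits for one image (made-up values)
logits = torch.tensor([[0.1, 2.5, -1.0, 0.3, 0.0, -0.5, 0.2, 1.0, -2.0, 0.4]])
probabilities = F.softmax(logits, dim=1)   # non-negative values that sum to 1

print(logits.argmax(dim=1).item())         # 1
print(probabilities.argmax(dim=1).item())  # 1, softmax does not change the ranking
```

Since softmax preserves the order of the scores, taking the maximum of the raw logits is enough when only the predicted class is needed.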

**Check the network's prediction on all the sample data.**


In [None]:
transform = torchvision.transforms.ToTensor()
for i in range(10):
    im = transform(Image.open("imgs/{}.png".format(i)))

    plt.imshow(im[0], cmap=plt.get_cmap('gray'))
    plt.show()  # display each image, not only the last one

    batch = im.unsqueeze(0)
    with torch.no_grad():  # gradients are not needed for inference
        predictions = model(batch)

    print("logits:", predictions)

    _, predicted_class = predictions.max(1)

    print("predicted class:", predicted_class.item())

## Visualization of Filters

To gain an intuitive understanding of how a neural network works, we can visualize various characteristics of individual layers and filters. Filters can be visualized because they can be considered as small images that describe the basic features of the inputs. The filter weights can be loaded directly from the trained network and visualized using `matplotlib`:


In [None]:
weights = model.conv1.weight.data.cpu()

# plot the first layer features
for i in range(0,30):
    plt.subplot(5,6,i+1)
    plt.imshow(weights[i, 0, :, :], cmap=plt.get_cmap('gray'))
plt.show()

The value `model.conv1.weight.data` is a tensor containing the weight values, and the `cpu()` method copies it from the GPU to main memory, which `matplotlib` requires for plotting. Since the influence of individual weights is aggregated as we move through the network, it often makes sense to visualize only the filters of the first layer.


## Visualization of Feature Maps

In a similar way to the filters themselves, we can visualize their effect on a selected input by passing the input through the network and obtaining the outputs at any layer. In PyTorch, this can be done with a *forward hook*: a function registered on a layer that is called with the layer's input and output each time the layer finishes its forward pass. For example, for the second convolutional layer, we can use:


In [None]:
transform = torchvision.transforms.ToTensor()
im = transform(Image.open("imgs/1.png")).unsqueeze(0)

# a forward hook receives the module, its inputs and its output
def hook_function(module, inputs, output):
    n_maps = output.shape[1]
    n_cols = (n_maps + 4) // 5  # number of columns needed for 5 rows
    for i in range(n_maps):
        conv_output = output.detach()[0, i]
        plt.subplot(5, n_cols, i + 1)
        plt.imshow(conv_output, cmap=plt.get_cmap('gray'))

hook = model.conv2.register_forward_hook(hook_function)  # register the hook
model(im)  # forward pass
hook.remove()  # tidy up
plt.show()
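
If you want to keep the intermediate outputs rather than plot them immediately, the hook can store them instead. The following is a minimal sketch on a small placeholder network (`demo_cnn` is an assumption, not the `BetterCNN` from the exercise); the names `captured` and `save_output` are hypothetical.

```python
import torch
from torch import nn

# Small placeholder network (assumption)
demo_cnn = nn.Sequential(
    nn.Conv2d(1, 4, kernel_size=5),
    nn.ReLU(),
    nn.MaxPool2d(2),
)

captured = {}

def save_output(module, inputs, output):
    # Forward hooks receive the module, its inputs (a tuple) and its output
    captured["conv"] = output.detach()

handle = demo_cnn[0].register_forward_hook(save_output)
with torch.no_grad():
    final = demo_cnn(torch.rand(1, 1, 28, 28))
handle.remove()  # later forward passes no longer call the hook

print(captured["conv"].shape)  # torch.Size([1, 4, 24, 24])
print(final.shape)             # torch.Size([1, 4, 12, 12])
```

The forward pass runs to completion as usual; the hook only observes the output of the layer it is attached to.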

**Try visualizing the outputs of the first convolutional layer.**


## Visualization of Maximum Activation

The last useful way to visualize what the filters have learned is to find an input image that causes the maximum activation of a filter. We can obtain such an image by starting from random noise and optimizing it with gradient ascent, so that the activation of the chosen filter grows with each step. The following code generates such an image:


In [None]:
def visualise_maximum_activation(model, target, num=10, alpha=1.0):
    for selected in range(num):
        input_img = torch.randn(1, 1, 28, 28, requires_grad=True)

        # we're interested in maximising one output of the target layer
        conv_output = None

        def hook_function(module, inputs, output):
            nonlocal conv_output
            # keep the output of the selected filter/feature (from the target layer)
            conv_output = output[0, selected]

        hook = target.register_forward_hook(hook_function)

        for i in range(30):
            model(input_img)
            loss = torch.mean(conv_output)
            loss.backward()

            # gradient ascent step along the normalised gradient
            with torch.no_grad():
                norm = input_img.grad.std() + 1e-5
                input_img += alpha * input_img.grad / norm
            # reset the gradient, otherwise it accumulates across iterations
            input_img.grad.zero_()

        hook.remove()

        input_img = input_img.detach()

        # subplot needs integer grid sizes
        plt.subplot(2, (num + 1) // 2, selected + 1)
        plt.imshow(input_img[0, 0], cmap=plt.get_cmap('gray'))

    plt.show()

# assumes the output layer of your BetterCNN is named fc3
visualise_maximum_activation(model, model.fc3)
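
To see the gradient ascent step in isolation, the sketch below optimizes a noise image for a single filter of an untrained convolution and records the activation at every step. The layer `demo_conv` and the other names are placeholders (assumptions); with a trained network the procedure is the same, only the resulting image becomes meaningful.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Placeholder layer (assumption): an untrained convolution
demo_conv = nn.Conv2d(1, 4, kernel_size=5)
noise_img = torch.randn(1, 1, 28, 28, requires_grad=True)
selected_filter = 0
step_size = 1.0

activations = []
for step in range(10):
    activation = demo_conv(noise_img)[0, selected_filter].mean()
    activations.append(activation.item())
    activation.backward()
    with torch.no_grad():
        # Step *along* the normalised gradient to increase the activation
        noise_img += step_size * noise_img.grad / (noise_img.grad.std() + 1e-5)
    noise_img.grad.zero_()  # start the next step from a clean gradient

print(activations[0] < activations[-1])  # True
```

Unlike ordinary training, the weights stay fixed here and only the input image is updated.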