# What is Convolution?

Suppose we have two images that are identical except that one is slightly shifted, as illustrated below. We can see what happens when we flatten both images into 12-dimensional vectors, where the highlighted grid locations mark the entries with high intensity values.

<img src="images/convolution.svg" width="70%"/>

As we can see, the high intensity values end up at completely different positions in the two vectors, even though both images show the same pattern. The relationship between neighboring pixels is more important than their absolute location, so we use convolution, which preserves this spatial relationship.
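The effect can be reproduced with a small sketch. It assumes a 3x4 grid (12 pixels) with a vertical bar, which may not match the figure exactly, and uses `torch.roll` as a stand-in for the one-pixel shift:

```python
import torch

# Assumed layout: a 3x4 image (12 pixels) with a bright vertical bar in column 1.
image = torch.zeros(3, 4)
image[:, 1] = 1

# The same image shifted one pixel to the right.
shifted = torch.roll(image, shifts=1, dims=1)

# Flatten both images into 12-dimensional vectors (row-major order).
v1 = image.flatten()
v2 = shifted.flatten()

# Indices of the bright pixels in each flattened vector.
idx1 = torch.nonzero(v1).flatten().tolist()
idx2 = torch.nonzero(v2).flatten().tolist()
print(idx1)  # [1, 5, 9]
print(idx2)  # [2, 6, 10]

# The vectors share no active position, so their dot product is zero.
print(torch.dot(v1, v2).item())  # 0.0
```

A model that treats each vector entry independently sees two unrelated inputs here, although the images differ only by a one-pixel shift.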

To apply a convolution we need two things: the image and a small set of weights called the kernel, which slides over the image. The result of this process is an activation map.

<img src="images/apply_convolution.svg" width="60%"/>

Convolution is analogous to a linear equation and can be performed on a vector or a matrix. The parameter $W$ is known as the kernel, the parameter $b$ is known as the bias, and $\ast$ denotes the convolution operator:

$$Z = W \ast X + b$$
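To make the equation concrete, here is a minimal sketch that computes $Z$ with explicit loops and compares it with `torch.nn.functional.conv2d`. The helper `conv2d_manual` and the example values of `X`, `W`, and `b` are illustrative, not part of the original material. Note that PyTorch slides the kernel without flipping it (a cross-correlation), and the sketch does the same:

```python
import torch
import torch.nn.functional as F

def conv2d_manual(X, W, b):
    """Slide kernel W over image X (no padding, stride 1) and add bias b."""
    kh, kw = W.shape
    out_h = X.shape[0] - kh + 1
    out_w = X.shape[1] - kw + 1
    Z = torch.zeros(out_h, out_w)
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise product of the kernel and the current window, summed.
            Z[i, j] = (W * X[i:i + kh, j:j + kw]).sum() + b
    return Z

# Illustrative values: a 4x4 image, a 2x2 kernel, and a scalar bias.
X = torch.arange(16, dtype=torch.float32).reshape(4, 4)
W = torch.tensor([[1.0, 0.0], [0.0, -1.0]])
b = 0.5

Z = conv2d_manual(X, W, b)
print(Z)  # a 3x3 activation map, every entry equal to -4.5

# Same computation with the functional API, which expects 4-D tensors.
Z_ref = F.conv2d(X[None, None], W[None, None], bias=torch.tensor([b]))
print(torch.allclose(Z, Z_ref[0, 0]))  # True
```

Each output entry is `X[i, j] - X[i+1, j+1] + b`, which equals `-5 + 0.5` for this particular image.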

In PyTorch we can create a convolution object as follows:

In [1]:
import torch
import torch.nn as nn
conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3)

In [2]:
# zeros(nb_images, nb_channels, height, width)
image = torch.zeros(1, 1, 5, 5)
image[0,0,:,2] = 1
image

tensor([[[[0., 0., 1., 0., 0.],
          [0., 0., 1., 0., 0.],
          [0., 0., 1., 0., 0.],
          [0., 0., 1., 0., 0.],
          [0., 0., 1., 0., 0.]]]])

In [3]:
z = conv(image)
z

tensor([[[[ 0.4304, -0.3784,  0.6896],
          [ 0.4304, -0.3784,  0.6896],
          [ 0.4304, -0.3784,  0.6896]]]], grad_fn=<ThnnConv2DBackward>)
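The numbers above come from the randomly initialized kernel, so they change from run to run. As a sketch of a reproducible variant, the weights can be overwritten by hand (an all-ones kernel and zero bias are chosen here purely for illustration). With a 5x5 input and a 3x3 kernel, the output is 3x3 because 5 - 3 + 1 = 3:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3)

# Overwrite the random initialization so the result is reproducible.
with torch.no_grad():
    conv.weight.fill_(1.0)
    conv.bias.zero_()

# Same image as above: a vertical bar in column 2.
image = torch.zeros(1, 1, 5, 5)
image[0, 0, :, 2] = 1

z = conv(image)
print(z.shape)  # torch.Size([1, 1, 3, 3])
print(z)        # every entry is 3.0
```

Every 3x3 window covers three pixels of the bar, so an all-ones kernel sums to 3 at each position.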

How a convolution works can be seen in the image below:
   
<a href="https://stats.stackexchange.com/questions/199702/1d-convolution-in-neural-networks">
    <img src="https://i.stack.imgur.com/SFST9.gif" width="40%"/>
</a>

# Simple Convolutional Neural Network

A Convolutional Neural Network (CNN) usually follows some variation of the architecture below:
    
<img src="https://cdn-images-1.medium.com/max/1600/0*-1Pad7loK_dFOUvS.png" width="60%"/>

Here, we use a simple architecture with only two convolutional layers, each followed by a ReLU activation function and max pooling, and a final fully connected layer. In PyTorch, the code is as follows:

In [None]:
class CNN(nn.Module):
    def __init__(self, out_1=2, out_2=1):
        super(CNN, self).__init__()
        self.cnn1 = nn.Conv2d(in_channels=1, out_channels=out_1, kernel_size=2, padding=0)
        self.relu1 = nn.ReLU()
        self.maxpool1 = nn.MaxPool2d(kernel_size=2, stride=1)

        self.cnn2 = nn.Conv2d(in_channels=out_1, out_channels=out_2, kernel_size=2, stride=1, padding=0)
        self.relu2 = nn.ReLU()
        self.maxpool2 = nn.MaxPool2d(kernel_size=2, stride=1)
        # the flattened feature map has out_2*7*7 entries; the layer outputs 2 class scores
        self.fc1 = nn.Linear(out_2*7*7, 2)


    def forward(self, x):
        out = self.cnn1(x)
        out = self.relu1(out)
        out = self.maxpool1(out)
        out = self.cnn2(out)
        out = self.relu2(out)
        out = self.maxpool2(out)
        out = out.view(out.size(0), -1)
        out = self.fc1(out)
        return out
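The `out_2*7*7` input size of `fc1` depends on the image size. The sketch below traces the tensor shape through the same layers, assuming single-channel 11x11 inputs. That size is inferred from the `7*7` factor and is not stated in the text. The ReLU layers are left out because they do not change the shape:

```python
import torch
import torch.nn as nn

# Assumed input: one single-channel 11x11 image (inferred from out_2*7*7).
x = torch.zeros(1, 1, 11, 11)

layers = [
    ("cnn1", nn.Conv2d(1, 2, kernel_size=2, padding=0)),
    ("maxpool1", nn.MaxPool2d(kernel_size=2, stride=1)),
    ("cnn2", nn.Conv2d(2, 1, kernel_size=2, stride=1, padding=0)),
    ("maxpool2", nn.MaxPool2d(kernel_size=2, stride=1)),
]

# Apply the layers one by one and record the output shape of each.
shapes = {}
out = x
for name, layer in layers:
    out = layer(out)
    shapes[name] = tuple(out.shape)
    print(name, shapes[name])

# Flatten everything except the batch dimension, as in forward().
flat = out.view(out.size(0), -1)
print(tuple(flat.shape))  # (1, 49)
```

Each layer uses a 2x2 window with stride 1 and no padding, so it trims one pixel from the height and the width: 11, 10, 9, 8, 7.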

To train and validate the model, the code keeps the same structure as in previous examples:

In [None]:
import torch.optim as optim

# create a train set and a validation set
# (Data is assumed to be the dataset class defined in the previous examples)
train_dataset = Data(N_images=1000)
validation_dataset = Data(N_images=1000, train=False)
N_test = len(validation_dataset)
model = CNN(2, 1)

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

n_epochs = 100
loss_list = []
accuracy_list = []
train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=100)
validation_loader = torch.utils.data.DataLoader(dataset=validation_dataset, batch_size=5000)

for epoch in range(n_epochs):
    # training pass: one optimizer step per batch
    for x, y in train_loader:
        optimizer.zero_grad()
        z = model(x)
        loss = criterion(z, y)
        loss.backward()
        optimizer.step()

    # validation pass: count correct predictions without tracking gradients
    correct = 0
    with torch.no_grad():
        for x_test, y_test in validation_loader:
            z = model(x_test)
            _, yhat = torch.max(z, 1)
            correct += (yhat == y_test).sum().item()
    accuracy = correct / N_test
    accuracy_list.append(accuracy)
    # store the loss of the last training batch as a plain Python float
    loss_list.append(loss.item())
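The accuracy computation in the validation loop can be checked in isolation. This sketch uses hypothetical logits and labels for four samples and two classes. `torch.max(z, 1)` returns the largest value in each row together with its index, and the index is the predicted class:

```python
import torch

# Hypothetical model outputs (logits) for 4 samples and 2 classes.
z = torch.tensor([[2.0, 0.5],
                  [0.1, 1.2],
                  [0.3, 0.9],
                  [1.5, 1.0]])
# Hypothetical true labels.
y = torch.tensor([0, 1, 0, 0])

# The index of the largest logit in each row is the predicted class.
_, yhat = torch.max(z, 1)
correct = (yhat == y).sum().item()
accuracy = correct / len(y)

print(yhat.tolist())  # [0, 1, 1, 0]
print(accuracy)       # 0.75
```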

# Pre-trained Models

We can use a number of [pre-trained models](https://pytorch.org/docs/stable/torchvision/models.html) available in PyTorch through torchvision. An example of how to use a pre-trained ResNet18 model is as follows:

In [None]:
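import torch
import torch.nn as nn
import torchvision.models as models
from torchvision import transforms

# load ResNet18 with ImageNet weights
# (newer torchvision releases replace pretrained=True with the weights argument)
model = models.resnet18(pretrained=True)

# ImageNet channel statistics expected by the pre-trained weights
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]

transforms_stuff = transforms.Compose(
                        [transforms.Resize(224),
                         transforms.ToTensor(),
                         transforms.Normalize(mean, std)]
)

# dataset is a placeholder for a dataset class that accepts these arguments;
# it is not defined in this section
train_data = dataset(root='./data', download=True, transform=transforms_stuff)
validation_data = dataset(root='./data', split='test', download=True, transform=transforms_stuff)

# freeze all pre-trained parameters
for param in model.parameters():
    param.requires_grad = False

# change the last fc layer: a new layer is trainable by default
model.fc = nn.Linear(512, 3)
criterion = nn.CrossEntropyLoss()
# optimize only the parameters that still require gradients
optimizer = torch.optim.Adam([p for p in model.parameters() if p.requires_grad],
                             lr=0.001)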