# Convolutional Neural Networks

The word "convolution" literally means a coil or twist.

The standard neural network from the previous lesson takes a vector as input, so a flattened image can be passed in and used successfully for classification problems. But this is not the best way to do it. The spatial relations between pixels are an important piece of information when determining what an image shows: if we scrambled the pixels of an image, it would be much harder to tell what is in it. Flattening the image and feeding it to a standard NN does something very similar, because the order in which the pixels appear in the input vector does not affect the performance of the NN, as long as the same order is used for every image. The network has no built-in knowledge of which pixels are neighbours, so the information in the spatial relations of the pixels is lost.
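
To make the point about pixel order concrete, here is a minimal sketch using random stand-in data rather than real images (the tensors and layer names are made up for illustration). It scrambles the pixels with one fixed permutation, reorders the columns of the weight matrix in the same way, and checks that a fully connected layer gives the same outputs either way.

```python
import torch

torch.manual_seed(0)

# Stand-in batch: 4 flattened 28x28 "images" (random values, for illustration only).
x = torch.rand(4, 28 * 28)

# One fully connected layer mapping 784 pixels to 10 outputs.
layer = torch.nn.Linear(28 * 28, 10)

# Scramble the pixels with one fixed permutation...
perm = torch.randperm(28 * 28)
x_scrambled = x[:, perm]

# ...and reorder the columns of the weight matrix in the same way.
scrambled_layer = torch.nn.Linear(28 * 28, 10)
with torch.no_grad():
    scrambled_layer.weight.copy_(layer.weight[:, perm])
    scrambled_layer.bias.copy_(layer.bias)

# Both layers produce the same outputs (up to floating point rounding):
# the dense layer has no notion of which pixels are neighbours.
same = torch.allclose(layer(x), scrambled_layer(x_scrambled), atol=1e-5)
print(same)
```

Since a scrambled network exists that performs identically, training on scrambled images should do about as well as training on the originals, which is another way of saying the layer ignores spatial structure.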

Convolutional neural networks (CNNs) solve this very problem. Rather than performing a matrix multiplication, each layer performs a convolution operation, which takes a 2D input and gives a 2D output and therefore keeps the information about the spatial relations of the pixels. This greatly improves performance on image and video processing tasks.

In the convolution process, a filter starts at the top left of the image and slides across the whole image, taking a dot product between the filter values and the pixel values underneath it at each position. Bear in mind that colour images have three channels, so the filter may be 3D, in which case the dot product is taken across a 3D volume. Each dot product gives a single activation value in a 2D matrix of neurons, and that matrix forms a single layer (channel) of the output.
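
The sliding dot product can be sketched in plain Python for a single-channel image. The function name `convolve2d` and the example values are made up for illustration; it assumes a stride of 1 and no padding. Like the convolution layers in deep learning libraries, it does not flip the filter, so strictly speaking it computes a cross-correlation.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (both lists of rows) with stride 1 and no padding."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    output = []
    for top in range(out_h):
        row = []
        for left in range(out_w):
            # Dot product between the kernel and the patch of the image under it.
            total = 0
            for i in range(k_h):
                for j in range(k_w):
                    total += image[top + i][left + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


image = [
    [1, 2, 3, 0],
    [0, 1, 2, 3],
    [3, 0, 1, 2],
    [2, 3, 0, 1],
]
kernel = [
    [1, 0],
    [0, 1],
]

result = convolve2d(image, kernel)
print(result)  # [[2, 4, 6], [0, 2, 4], [6, 0, 2]]
```

A 2x2 filter on a 4x4 image fits in 3 positions along each axis, which is why the output is 3x3.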

The animation below shows how a 1x3x3 filter is applied to a 1x5x5 image. Notice how the filter has high output values when there is an X shape in the input image. This is because the values of the filter are such that it is performing pattern matching for the X shape.

![](cnn.gif)
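
A similar pattern-matching effect can be reproduced with `torch.nn.functional.conv2d`. The image and filter values below are illustrative choices, not necessarily the ones used in the animation.

```python
import torch
import torch.nn.functional as F

# A 5x5 image containing an X shape (values chosen for illustration).
image = torch.tensor([
    [1., 0., 0., 0., 1.],
    [0., 1., 0., 1., 0.],
    [0., 0., 1., 0., 0.],
    [0., 1., 0., 1., 0.],
    [1., 0., 0., 0., 1.],
])

# A 3x3 filter whose weights also form an X.
x_filter = torch.tensor([
    [1., 0., 1.],
    [0., 1., 0.],
    [1., 0., 1.],
])

# conv2d expects inputs shaped (batch, channels, height, width) and
# filters shaped (out_channels, in_channels, height, width).
response = F.conv2d(image.view(1, 1, 5, 5), x_filter.view(1, 1, 3, 3))
response = response.view(3, 3)
print(response)
```

The largest response (5) is at the centre, where the filter lines up exactly with the X in the image. The corners respond with 3 because the filter overlaps part of a diagonal there.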

For a long time, operations like this were used in computer vision to find patterns in images, with engineers manually tuning the filter values to perform the required function. The difference now is that we apply an activation function such as ReLU or sigmoid at each layer and, after setting up the structure of the network, initialize the filter values randomly and use gradient descent to tune them automatically.
We can also apply pooling operations to subsample the output of each layer. This reduces the size of the activations passed to the next layer, and therefore the computation needed for later convolutions and the number of parameters in any fully connected layer that follows.
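
As a small illustration of pooling, the sketch below applies 2x2 max pooling to a made-up 4x4 activation map. The network built later in this lesson does not use pooling, so this is only to show the effect on the output size.

```python
import torch
import torch.nn.functional as F

# A single-channel 4x4 activation map, shaped (batch, channels, height, width).
activations = torch.arange(16, dtype=torch.float32).view(1, 1, 4, 4)

# 2x2 max pooling with stride 2 keeps the largest value in each 2x2 block.
pooled = F.max_pool2d(activations, kernel_size=2, stride=2)

print(tuple(activations.shape))  # (1, 1, 4, 4)
print(tuple(pooled.shape))       # (1, 1, 2, 2)
print(pooled.view(2, 2))
```

Each spatial dimension is halved, so the next layer sees a quarter as many values per channel.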

![](cnn.png)

Just like before, each layer in the network learns higher-level, more abstract features from its inputs. In CNNs the features are even more interpretable, because each channel of a layer's output is 2D and so can be viewed as an image.

## Implementation
We will implement a CNN that takes as input a 1x28x28 black-and-white image of a handwritten digit and classifies which digit the image depicts. We have a dataset of 60,000 images and their corresponding labels to train on, and 10,000 images to test on. This is called the MNIST dataset.

Let's import the necessary libraries.

In [0]:
%matplotlib notebook
import torch
from torch.autograd import Variable
import torchvision #provides classic datasets, including MNIST
from torchvision import transforms, datasets
import torch.nn.functional as F
import matplotlib.pyplot as plt

Import the MNIST training and test sets from the default PyTorch datasets

In [5]:
#root specifies where the dataset is saved; download=True fetches it if it is missing
training_data = datasets.MNIST(root='data/',
                               transform=transforms.ToTensor(),
                               train=True,
                               download=True)

test_data = datasets.MNIST(root='data/',
                           transform=transforms.ToTensor(),
                           train=False)

#show our first training data item
print(training_data[0])
plt.imshow(training_data[0][0][0], cmap='gray_r') #reversed grayscale colour map: white background with a dark digit
plt.show()

print('Number of training examples:', len(training_data))
print('Number of test examples:', len(test_data))

(
(0 ,.,.) = 

Columns 0 to 8 
   0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000
  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000
  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000
  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000
  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000
  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000
  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.1176
  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.1922  0.9333
  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0706  0.8588
  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.3137
  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000
  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000
  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000
  0.0000  0.0000  0

Create dataloaders from our datasets. A dataloader is a generator that yields batches of samples. We shuffle the training set so that each batch is a random sample, but not the test set, since the order of the test examples does not affect our results. This completes the data pre-processing.

In [0]:
batch_size = 128 #each forward pass, cost and gradient calculation uses a batch of 128 examples

# dataloader is a generator that can sample from the training set
training_samples = torch.utils.data.DataLoader(dataset=training_data,
                                              batch_size=batch_size,
                                              shuffle=True)

test_samples = torch.utils.data.DataLoader(dataset=test_data,
                                            batch_size=batch_size,
                                            shuffle=False) #testing the whole dataset, so do not need to shuffle
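
To see what a dataloader yields, the sketch below iterates over one built from random stand-in data with MNIST-like shapes (the `fake_*` names are hypothetical, used so the example runs without downloading anything).

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data: 300 random 1x28x28 "images" with labels from 0 to 9.
fake_images = torch.rand(300, 1, 28, 28)
fake_labels = torch.randint(0, 10, (300,))
fake_data = TensorDataset(fake_images, fake_labels)

loader = DataLoader(dataset=fake_data, batch_size=128, shuffle=True)

# Each iteration yields one batch of inputs and labels.
# The last batch is smaller when the dataset size is not a multiple of batch_size.
batch_sizes = []
for x, y in loader:
    batch_sizes.append(x.shape[0])
    print(tuple(x.shape), tuple(y.shape))

print(batch_sizes)  # [128, 128, 44]
```

By the same arithmetic, 60,000 training images with a batch size of 128 give 469 batches per epoch, the last one holding 96 images.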

Define the model class. It is a 2-layer convnet with a fully connected layer at the end that maps to 10 output neurons, each corresponding to the probability of the input image being a particular digit. The class defines the architecture: it takes an image and applies the model to it.

In [0]:
class convnet(torch.nn.Module):

    def __init__(self):
        super().__init__()
        # Conv2d(in_channels, out_channels, kernel_size, stride)
        # in_channels is the number of channels the layer takes in (e.g. the number of colour channels for the first layer)
        # out_channels is the number of different filters that we use
        # kernel_size is the height and width of each filter; its depth always equals in_channels
        # stride is how many pixels we shift the kernel by each time
        self.conv1 = torch.nn.Conv2d(1, 32, kernel_size=5, stride=1) #1 input channel (grayscale), 32 filters
        self.conv2 = torch.nn.Conv2d(32, 64, kernel_size=5, stride=1) #second layer: 32 channels in, 64 filters
        self.dense1 = torch.nn.Linear(25600, 10) #25600 = 64*20*20: each 5x5 kernel with stride 1 shrinks the image by 4 (28 -> 24 -> 20)

    def forward(self, x):
        x = F.relu(self.conv1(x)) #apply convolution and activation
        x = F.relu(self.conv2(x)).view(x.shape[0], -1) #flatten output ready for fully connected layer
        #fully connected layer with softmax activation
        #note: CrossEntropyLoss applies log-softmax internally, so it is normally fed the raw dense1 outputs;
        #with this extra softmax the cost cannot fall below about 1.46
        x = F.softmax(self.dense1(x), dim=1)

        return x
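
The 25600 input features of the fully connected layer follow from the usual output-size formula for a convolution. The helper `conv_output_size` below is a made-up name for illustration. It applies the formula to the two layers defined above, which use a 5x5 kernel, stride 1 and no padding.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Spatial size of a convolution's output along one dimension."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


size = 28                                            # MNIST images are 28x28
after_conv1 = conv_output_size(size, kernel_size=5)  # 24
after_conv2 = conv_output_size(after_conv1, kernel_size=5)  # 20

# conv2 has 64 output channels, each of size after_conv2 x after_conv2.
flattened = 64 * after_conv2 * after_conv2
print(after_conv1, after_conv2, flattened)  # 24 20 25600
```

If the kernel size, stride or number of filters changes, the first argument of `torch.nn.Linear` has to be recomputed in the same way.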

Define hyper-parameters, cost function, optimizer and instantiate model

In [0]:
lr = 0.0001 #learning rate
epochs = 1 #number of epochs

cnn = convnet() #instantiate model
criterion = torch.nn.CrossEntropyLoss() #cost function
#cross entropy over the 10 classes: the negative log of the probability assigned to the correct digit
optimizer = torch.optim.Adam(cnn.parameters(), lr=lr) #optimizer
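
One detail worth checking by hand: `torch.nn.CrossEntropyLoss` applies a log-softmax to whatever it is given, and the model above already ends in a softmax. The plain-Python sketch below (the function names are made up for illustration) computes the cost for a single example. It shows that when the inputs to the loss are already probabilities between 0 and 1, the cost has a floor of about 1.46 even for a perfect prediction, which would be consistent with a training cost that levels off around 1.5.

```python
import math


def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_from_scores(scores, target):
    """Cross entropy for one example: -log(softmax(scores)[target])."""
    return -math.log(softmax(scores)[target])


# Best case for a network that ends in softmax:
# probability 1 for the correct class and 0 for the other nine.
perfect_probs = [1.0] + [0.0] * 9

# The loss treats those probabilities as raw scores and applies softmax again.
floor = cross_entropy_from_scores(perfect_probs, target=0)
print(floor)

# With unbounded raw scores (logits), the cost can get arbitrarily close to 0.
confident_logits = [20.0] + [0.0] * 9
near_zero = cross_entropy_from_scores(confident_logits, target=0)
print(near_zero)
```

The floor equals log(1 + 9/e). Returning the raw `dense1` outputs from `forward` and leaving the softmax to the loss is the usual way to avoid it. The predicted digit is unchanged either way, because softmax does not change which output is largest.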

Define the training loop and train for the given number of epochs. Because the dataset is large and an epoch takes a while to complete, we plot the cost after every batch rather than after every epoch.

In [11]:
def train(epochs):
    #for plotting cost per batch
    costs = []
    plt.ion()
    fig = plt.figure()
    ax = fig.add_subplot(111)
    plt.show()
    ax.set_xlabel('Batch')
    ax.set_ylabel('Cost')

    for e in range(epochs): #each epoch is one pass through the whole dataset
        for i, (x, y) in enumerate(training_samples):
            h = cnn(x) #calculate hypothesis (calls forward)
            cost = criterion(h, y) #calculate cost between hypothesis and label

            optimizer.zero_grad() #zero gradients
            cost.backward() #calculate derivatives of the cost with respect to the filter values and other parameters
            optimizer.step() #update parameters

            costs.append(cost.item()) #.item() extracts the Python float from the 0-dim cost tensor
            ax.plot(costs, 'b')
            fig.canvas.draw()

            print('Epoch', e, '\tBatch', i, '\tCost', cost.item())

train(epochs)


Epoch 0 	Batch 0 	Cost 2.30241060256958
Epoch 0 	Batch 1 	Cost 2.3006432056427
Epoch 0 	Batch 2 	Cost 2.297680377960205
Epoch 0 	Batch 3 	Cost 2.2955784797668457
Epoch 0 	Batch 4 	Cost 2.293663263320923
Epoch 0 	Batch 5 	Cost 2.2896103858947754
Epoch 0 	Batch 6 	Cost 2.2851369380950928
Epoch 0 	Batch 7 	Cost 2.28212833404541
Epoch 0 	Batch 8 	Cost 2.2759087085723877
Epoch 0 	Batch 9 	Cost 2.270097017288208
Epoch 0 	Batch 10 	Cost 2.268277645111084
Epoch 0 	Batch 11 	Cost 2.259593963623047
Epoch 0 	Batch 12 	Cost 2.2429239749908447
Epoch 0 	Batch 13 	Cost 2.228670120239258
Epoch 0 	Batch 14 	Cost 2.228891134262085
Epoch 0 	Batch 15 	Cost 2.2145376205444336
Epoch 0 	Batch 16 	Cost 2.2169487476348877
Epoch 0 	Batch 17 	Cost 2.1613805294036865
Epoch 0 	Batch 18 	Cost 2.161062479019165
Epoch 0 	Batch 19 	Cost 2.13957142829895
Epoch 0 	Batch 20 	Cost 2.1286985874176025
Epoch 0 	Batch 21 	Cost 2.1278269290924072
Epoch 0 	Batch 22 	Cost 2.1148953437805176
Epoch 0 	Batch 23 	Cost 2.042966604232

Epoch 0 	Batch 68 	Cost 1.7045613527297974
Epoch 0 	Batch 69 	Cost 1.7894513607025146
Epoch 0 	Batch 70 	Cost 1.7618305683135986
Epoch 0 	Batch 71 	Cost 1.6706362962722778
Epoch 0 	Batch 72 	Cost 1.713289737701416
Epoch 0 	Batch 73 	Cost 1.7691351175308228
Epoch 0 	Batch 74 	Cost 1.701555609703064
Epoch 0 	Batch 75 	Cost 1.7556195259094238
Epoch 0 	Batch 76 	Cost 1.7267024517059326
Epoch 0 	Batch 77 	Cost 1.7703443765640259
Epoch 0 	Batch 78 	Cost 1.7408050298690796
Epoch 0 	Batch 79 	Cost 1.7683489322662354
Epoch 0 	Batch 80 	Cost 1.6693369150161743
Epoch 0 	Batch 81 	Cost 1.7589823007583618
Epoch 0 	Batch 82 	Cost 1.7729517221450806
Epoch 0 	Batch 83 	Cost 1.723032832145691
Epoch 0 	Batch 84 	Cost 1.7235840559005737
Epoch 0 	Batch 85 	Cost 1.6888537406921387
Epoch 0 	Batch 86 	Cost 1.7155338525772095
Epoch 0 	Batch 87 	Cost 1.7152698040008545
Epoch 0 	Batch 88 	Cost 1.7264167070388794
Epoch 0 	Batch 89 	Cost 1.6931562423706055
Epoch 0 	Batch 90 	Cost 1.7172017097473145
Epoch 0 	Batch

Epoch 0 	Batch 135 	Cost 1.6421936750411987
Epoch 0 	Batch 136 	Cost 1.721541166305542
Epoch 0 	Batch 137 	Cost 1.6693248748779297
Epoch 0 	Batch 138 	Cost 1.6922498941421509
Epoch 0 	Batch 139 	Cost 1.6200445890426636
Epoch 0 	Batch 140 	Cost 1.6040853261947632
Epoch 0 	Batch 141 	Cost 1.6280999183654785
Epoch 0 	Batch 142 	Cost 1.6293039321899414
Epoch 0 	Batch 143 	Cost 1.6494017839431763
Epoch 0 	Batch 144 	Cost 1.5979381799697876
Epoch 0 	Batch 145 	Cost 1.641823172569275
Epoch 0 	Batch 146 	Cost 1.6183675527572632
Epoch 0 	Batch 147 	Cost 1.6506105661392212
Epoch 0 	Batch 148 	Cost 1.625889778137207
Epoch 0 	Batch 149 	Cost 1.6671382188796997
Epoch 0 	Batch 150 	Cost 1.607689619064331
Epoch 0 	Batch 151 	Cost 1.639747977256775
Epoch 0 	Batch 152 	Cost 1.632187843322754
Epoch 0 	Batch 153 	Cost 1.6063508987426758
Epoch 0 	Batch 154 	Cost 1.58777916431427
Epoch 0 	Batch 155 	Cost 1.5968780517578125
Epoch 0 	Batch 156 	Cost 1.6492267847061157
Epoch 0 	Batch 157 	Cost 1.6168575286865

Epoch 0 	Batch 202 	Cost 1.5951389074325562
Epoch 0 	Batch 203 	Cost 1.5917036533355713
Epoch 0 	Batch 204 	Cost 1.5796973705291748
Epoch 0 	Batch 205 	Cost 1.5687202215194702
Epoch 0 	Batch 206 	Cost 1.5451173782348633
Epoch 0 	Batch 207 	Cost 1.5317161083221436
Epoch 0 	Batch 208 	Cost 1.6245362758636475
Epoch 0 	Batch 209 	Cost 1.5818933248519897
Epoch 0 	Batch 210 	Cost 1.5834095478057861
Epoch 0 	Batch 211 	Cost 1.6180145740509033
Epoch 0 	Batch 212 	Cost 1.5616987943649292
Epoch 0 	Batch 213 	Cost 1.5854510068893433
Epoch 0 	Batch 214 	Cost 1.5698968172073364
Epoch 0 	Batch 215 	Cost 1.5874286890029907
Epoch 0 	Batch 216 	Cost 1.5661461353302002
Epoch 0 	Batch 217 	Cost 1.594504714012146
Epoch 0 	Batch 218 	Cost 1.5658137798309326
Epoch 0 	Batch 219 	Cost 1.6062873601913452
Epoch 0 	Batch 220 	Cost 1.5750941038131714
Epoch 0 	Batch 221 	Cost 1.5630086660385132
Epoch 0 	Batch 222 	Cost 1.542922854423523
Epoch 0 	Batch 223 	Cost 1.5847787857055664
Epoch 0 	Batch 224 	Cost 1.5730112

Epoch 0 	Batch 269 	Cost 1.539233922958374
Epoch 0 	Batch 270 	Cost 1.5498570203781128
Epoch 0 	Batch 271 	Cost 1.550692081451416
Epoch 0 	Batch 272 	Cost 1.5745174884796143
Epoch 0 	Batch 273 	Cost 1.5471775531768799
Epoch 0 	Batch 274 	Cost 1.626419186592102
Epoch 0 	Batch 275 	Cost 1.5349018573760986
Epoch 0 	Batch 276 	Cost 1.5898035764694214
Epoch 0 	Batch 277 	Cost 1.6056435108184814
Epoch 0 	Batch 278 	Cost 1.5298731327056885
Epoch 0 	Batch 279 	Cost 1.6026743650436401
Epoch 0 	Batch 280 	Cost 1.5638691186904907
Epoch 0 	Batch 281 	Cost 1.567095398902893
Epoch 0 	Batch 282 	Cost 1.5381766557693481
Epoch 0 	Batch 283 	Cost 1.5686230659484863
Epoch 0 	Batch 284 	Cost 1.5905938148498535
Epoch 0 	Batch 285 	Cost 1.579026699066162
Epoch 0 	Batch 286 	Cost 1.608651876449585
Epoch 0 	Batch 287 	Cost 1.5588123798370361
Epoch 0 	Batch 288 	Cost 1.5782743692398071
Epoch 0 	Batch 289 	Cost 1.5332787036895752
Epoch 0 	Batch 290 	Cost 1.5787793397903442
Epoch 0 	Batch 291 	Cost 1.56924521923

Epoch 0 	Batch 336 	Cost 1.5500530004501343
Epoch 0 	Batch 337 	Cost 1.543817400932312
Epoch 0 	Batch 338 	Cost 1.6018399000167847
Epoch 0 	Batch 339 	Cost 1.5832782983779907
Epoch 0 	Batch 340 	Cost 1.5887298583984375
Epoch 0 	Batch 341 	Cost 1.5668541193008423
Epoch 0 	Batch 342 	Cost 1.531614899635315
Epoch 0 	Batch 343 	Cost 1.5900167226791382
Epoch 0 	Batch 344 	Cost 1.5610283613204956
Epoch 0 	Batch 345 	Cost 1.5789058208465576
Epoch 0 	Batch 346 	Cost 1.5671637058258057
Epoch 0 	Batch 347 	Cost 1.5632026195526123
Epoch 0 	Batch 348 	Cost 1.6021798849105835
Epoch 0 	Batch 349 	Cost 1.5652555227279663
Epoch 0 	Batch 350 	Cost 1.54506516456604
Epoch 0 	Batch 351 	Cost 1.5383975505828857
Epoch 0 	Batch 352 	Cost 1.579789400100708
Epoch 0 	Batch 353 	Cost 1.5152250528335571
Epoch 0 	Batch 354 	Cost 1.5603009462356567
Epoch 0 	Batch 355 	Cost 1.5275142192840576
Epoch 0 	Batch 356 	Cost 1.5474554300308228
Epoch 0 	Batch 357 	Cost 1.5431545972824097
Epoch 0 	Batch 358 	Cost 1.5573855638

Epoch 0 	Batch 403 	Cost 1.5938853025436401
Epoch 0 	Batch 404 	Cost 1.5888020992279053
Epoch 0 	Batch 405 	Cost 1.542858600616455
Epoch 0 	Batch 406 	Cost 1.5655113458633423
Epoch 0 	Batch 407 	Cost 1.5563280582427979
Epoch 0 	Batch 408 	Cost 1.5377039909362793
Epoch 0 	Batch 409 	Cost 1.5577260255813599
Epoch 0 	Batch 410 	Cost 1.5514883995056152
Epoch 0 	Batch 411 	Cost 1.5558668375015259
Epoch 0 	Batch 412 	Cost 1.5327022075653076
Epoch 0 	Batch 413 	Cost 1.6190662384033203
Epoch 0 	Batch 414 	Cost 1.5556529760360718
Epoch 0 	Batch 415 	Cost 1.555713415145874
Epoch 0 	Batch 416 	Cost 1.541774868965149
Epoch 0 	Batch 417 	Cost 1.5480871200561523
Epoch 0 	Batch 418 	Cost 1.5609493255615234
Epoch 0 	Batch 419 	Cost 1.574280858039856
Epoch 0 	Batch 420 	Cost 1.5379291772842407
Epoch 0 	Batch 421 	Cost 1.5587377548217773
Epoch 0 	Batch 422 	Cost 1.5441186428070068
Epoch 0 	Batch 423 	Cost 1.563842535018921
Epoch 0 	Batch 424 	Cost 1.530615210533142
Epoch 0 	Batch 425 	Cost 1.53708219528

Test our model on the test set of 10,000 examples. <br>
Kernel size: each kernel is a filter, and the kernel size is the size of the window that slides across the data. <br>
Without padding, a larger kernel gives a smaller output, so it downsamples more. <br> The output size also depends on the stride, which is how far the kernel window moves at each step.
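
The effect of kernel size and stride on the output size can be checked directly by passing a dummy input through a few `torch.nn.Conv2d` layers. The configurations below are illustrative choices, and only the second matches the layers used in this lesson.

```python
import torch

x = torch.zeros(1, 1, 28, 28)  # one blank 1x28x28 image, used only to inspect shapes

configs = [
    {"kernel_size": 3, "stride": 1},
    {"kernel_size": 5, "stride": 1},
    {"kernel_size": 5, "stride": 2},
]

shapes = []
for cfg in configs:
    conv = torch.nn.Conv2d(in_channels=1, out_channels=1, **cfg)
    out = conv(x)
    shapes.append(tuple(out.shape[-2:]))  # keep only (height, width)
    print(cfg, "->", tuple(out.shape[-2:]))
```

Going from a 3x3 to a 5x5 kernel shrinks the output from 26x26 to 24x24, while moving to a stride of 2 roughly halves it, to 12x12.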

In [12]:
def test():
    print('Started evaluation...')
    cnn.eval() #put model into evaluation mode; this matters for layers such as batch norm or dropout, which this model does not use

    #calculate the accuracy of our model over the whole test set in batches
    correct = 0 #running count of correct predictions
    with torch.no_grad(): #gradients are not needed for evaluation
        for x, y in test_samples: #iterate over the dataloader
            h = cnn(x)
            #max(1) returns (values, indices) along axis 1 for each row;
            #the indices [1] are the argmax, i.e. the predicted digit
            pred = h.max(1)[1]
            correct += pred.eq(y).sum().item() #add the number of correct predictions in this batch
    return correct/len(test_data)

acc = test()
print('Test accuracy: ', acc)

Started evaluation...
Test accuracy:  0.9265
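
The prediction and accuracy steps inside `test()` can be traced on a tiny made-up example. The outputs `h` and labels `y` below are hypothetical values for 4 images and 3 classes, not results from the trained model.

```python
import torch

# Hypothetical network outputs: one row per image, one column per class.
h = torch.tensor([
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.3, 0.3, 0.4],
    [0.2, 0.5, 0.3],
])
y = torch.tensor([1, 0, 2, 2])  # true labels

# max(1) returns (values, indices); the indices are the predicted classes.
values, pred = h.max(1)
same_as_argmax = torch.equal(pred, h.argmax(dim=1))

correct = pred.eq(y).sum().item()  # number of matching predictions as a Python int
accuracy = correct / len(y)
print(pred.tolist(), correct, accuracy)  # [1, 0, 2, 1] 3 0.75
```

`h.argmax(dim=1)` gives the same predictions as `h.max(1)[1]` and may read more clearly.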
