# Project 4 (Part 3): Kernels

"Civilian, what's your twenty on Operation Heroic Storm?" asks Colonel Trick.

"I thought it was called Operation Heroic Deliverance," you say.

"Escalation, civilian," says Colonel Trick. "Escalation."

"And also, 'what's your twenty' is a military thing? I thought it was more a police thing."

"Just testing you, civilian."

"Or a Call of Duty thing."

"What's your status, civilian?" asks Colonel Trick.

"Well," you say proudly. "We've gotten both the data and the basic neural network training infrastructure set up, and..."

"Neural networks?" asks Trick dubiously. "I'd heard from you lot that support vector machines were the way to go."

"Well, sir," you say. "That used to be the case, but people have been getting more success from neural networks these days."

"You scientists," mutters Colonel Trick. "Always vacillating."

Back at Dulles International Airport, waiting again for your flight home, CNN is playing on all the monitors. The top story is the emergence of a cult of humans that have joined the zebras in their international misadventures.

![anderson cooper](./img/cooper2.png)

This inspires you to start coding up your convolutional neural network models.

You consult your notes on Computing Convolutions. The first step to building a convolutional layer is to take the kernels and convert them into a **kernel-row matrix**. For instance, suppose you have two 2x2 RGB (three channel) kernels:

In [None]:
import torch

red_kernel1 = torch.tensor([[1., 2], [3, 4]])
green_kernel1 = torch.tensor([[5., 6], [7, 8]])
blue_kernel1 = torch.tensor([[9., 10], [11, 12]])
kernel1 = torch.stack([red_kernel1, green_kernel1, blue_kernel1])
print("THE FIRST KERNEL (with shape {}):".format(kernel1.shape))
print(kernel1)

red_kernel2 = torch.tensor([[13., 14], [15, 16]])
green_kernel2 = torch.tensor([[17., 18], [19, 20]])
blue_kernel2 = torch.tensor([[21., 22], [23, 24]])
kernel2 = torch.stack([red_kernel2, green_kernel2, blue_kernel2])
print("\nTHE SECOND KERNEL (with shape {}):".format(kernel2.shape))
print(kernel2)


kernels = torch.stack([kernel1, kernel2])
print("\nTHE KERNEL TENSOR (SHAPE {}):".format(kernels.shape))
print(kernels)

A kernel-row matrix converts each kernel into one row of a matrix, e.g.:

In [None]:
kernel_row_matrix = torch.tensor([[ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11., 12.],
                                  [13., 14., 15., 16., 17., 18., 19., 20., 21., 22., 23., 24.]])
print("\nTHE KERNEL-ROW MATRIX (SHAPE {}):".format(kernel_row_matrix.shape))
print(kernel_row_matrix)
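To make the layout explicit, here is a small sketch that fills the matrix one entry at a time. It is a deliberately slow illustration of where each kernel weight lands, not necessarily how ```create_kernel_row_matrix``` is meant to be written; the variable names (```rows```, ```col```) are made up for this example.

```python
import torch

# Rebuild the example kernel tensor from above: values 1..24, shape (2, 3, 2, 2).
kernels = torch.arange(1., 25.).reshape(2, 3, 2, 2)
num_kernels, channels, height, width = kernels.shape

# Fill the matrix entry by entry: one row per kernel, with the channels laid
# out one after another and each channel read left-to-right, top-to-bottom.
rows = torch.zeros(num_kernels, channels * height * width)
for k in range(num_kernels):
    for c in range(channels):
        for i in range(height):
            for j in range(width):
                col = c * height * width + i * width + j
                rows[k, col] = kernels[k, c, i, j]

print(rows.shape)  # torch.Size([2, 12])
print(rows)
```

In other words, the entry at row ```k```, column ```c * 4 + i * 2 + j``` is the weight at position ```(i, j)``` of channel ```c``` of kernel ```k```.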


### Question 8

Complete the implementation of ```create_kernel_row_matrix``` in ```cnn.py```.

We've provided a unit test in test.py, so that you can (a) understand the expected behavior  and (b) check your implementation is working properly. Run it from the command line as follows:

    python -m unittest test.CnnTests.test_kernel_row

The next task is to convert each window of each image in a batch into a column of a **window-column matrix**. For instance, suppose you have two 3x3 RGB (three channel) images:

In [None]:
red1 = torch.tensor([[1., 2, 3], [4,5,6], [7,8,9]])
green1 = torch.tensor([[10., 11, 12], [13,14,15], [16,17,18]])
blue1 = torch.tensor([[19.,20,21], [22,23,24], [25,26,27]])
rgb1 = torch.stack([red1, green1, blue1])
print("THE FIRST IMAGE (with shape {}):".format(rgb1.shape))
print(rgb1)


red2 = torch.tensor([[28., 29, 30], [31,32,33], [34,35,36]])
green2 = torch.tensor([[37., 38, 39], [40,41,42], [43,44,45]])
blue2 = torch.tensor([[46.,47,48], [49,50,51], [52,53,54]])
rgb2 = torch.stack([red2, green2, blue2])
print("THE SECOND IMAGE (with shape {}):".format(rgb2.shape))
print(rgb2)

images = torch.stack([rgb1, rgb2])
print("\nTHE IMAGE TENSOR (SHAPE {}):".format(images.shape))
print(images)

A window-column matrix converts each window of each image into a column of a matrix:

In [None]:
window_column_matrix = torch.tensor([   [ 1.,  2.,  4.,  5., 28., 29., 31., 32.],
                                        [ 2.,  3.,  5.,  6., 29., 30., 32., 33.],
                                        [ 4.,  5.,  7.,  8., 31., 32., 34., 35.],
                                        [ 5.,  6.,  8.,  9., 32., 33., 35., 36.],
                                        [10., 11., 13., 14., 37., 38., 40., 41.],
                                        [11., 12., 14., 15., 38., 39., 41., 42.],
                                        [13., 14., 16., 17., 40., 41., 43., 44.],
                                        [14., 15., 17., 18., 41., 42., 44., 45.],
                                        [19., 20., 22., 23., 46., 47., 49., 50.],
                                        [20., 21., 23., 24., 47., 48., 50., 51.],
                                        [22., 23., 25., 26., 49., 50., 52., 53.],
                                        [23., 24., 26., 27., 50., 51., 53., 54.]])
print("THE WINDOW-COLUMN MATRIX (SHAPE {}):".format(window_column_matrix.shape))
print(window_column_matrix)

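Each column in the above matrix corresponds to a 3x2x2 window of an image (the first 4 columns are the windows of image 1, and the last 4 columns are the windows of image 2). Each window is **3**x2x2, because there are 3 (RGB) channels. Within an image, the windows are ordered left-to-right, then top-to-bottom; within a column, the red values come first, then green, then blue.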
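If you want an independent reference to compare against, one option is PyTorch's built-in ```torch.nn.functional.unfold```, which extracts the same windows. The sketch below assumes a window size of 2 and a stride of 1, as in the example; the reshaping at the end is only there to match the column order shown above, and it is not meant as the intended solution to the question that follows.

```python
import torch
import torch.nn.functional as F

# Rebuild the example image tensor from above: values 1..54, shape (2, 3, 3, 3).
images = torch.arange(1., 55.).reshape(2, 3, 3, 3)

# unfold returns shape (batch, channels * kh * kw, windows_per_image) = (2, 12, 4).
patches = F.unfold(images, kernel_size=2, stride=1)

# Move the batch axis next to the window axis, then merge the two, so that
# image 1's four windows come first, followed by image 2's four windows.
reference = patches.permute(1, 0, 2).reshape(patches.shape[1], -1)

print(reference.shape)  # torch.Size([12, 8])
print(reference)
```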

### Question 9

Complete the implementation of ```create_window_column_matrix``` in ```cnn.py```.

We've provided two unit tests in test.py, so that you can (a) understand the expected behavior  and (b) check your implementation is working properly. Run them from the command line as follows:

    python -m unittest test.CnnTests.test_window_column1

and

    python -m unittest test.CnnTests.test_window_column2


A third thing you'll need is a function to add a padding of zeroes to each image in a batch. For instance, given the images defined above:

In [None]:
print("\nTHE IMAGE TENSOR (SHAPE {}):".format(images.shape))
print(images)

A padding of 2 would give each image a border of two zeros on every side, so that each 3x3 channel becomes 7x7:

In [None]:
from torch import tensor

t = tensor([[[[ 0.,  0.,  0.,  0.,  0.,  0.,  0.],
              [ 0.,  0.,  0.,  0.,  0.,  0.,  0.],
              [ 0.,  0.,  1.,  2.,  3.,  0.,  0.],
              [ 0.,  0.,  4.,  5.,  6.,  0.,  0.],
              [ 0.,  0.,  7.,  8.,  9.,  0.,  0.],
              [ 0.,  0.,  0.,  0.,  0.,  0.,  0.],
              [ 0.,  0.,  0.,  0.,  0.,  0.,  0.]],

             [[ 0.,  0.,  0.,  0.,  0.,  0.,  0.],
              [ 0.,  0.,  0.,  0.,  0.,  0.,  0.],
              [ 0.,  0., 10., 11., 12.,  0.,  0.],
              [ 0.,  0., 13., 14., 15.,  0.,  0.],
              [ 0.,  0., 16., 17., 18.,  0.,  0.],
              [ 0.,  0.,  0.,  0.,  0.,  0.,  0.],
              [ 0.,  0.,  0.,  0.,  0.,  0.,  0.]],

             [[ 0.,  0.,  0.,  0.,  0.,  0.,  0.],
              [ 0.,  0.,  0.,  0.,  0.,  0.,  0.],
              [ 0.,  0., 19., 20., 21.,  0.,  0.],
              [ 0.,  0., 22., 23., 24.,  0.,  0.],
              [ 0.,  0., 25., 26., 27.,  0.,  0.],
              [ 0.,  0.,  0.,  0.,  0.,  0.,  0.],
              [ 0.,  0.,  0.,  0.,  0.,  0.,  0.]]],


            [[[ 0.,  0.,  0.,  0.,  0.,  0.,  0.],
              [ 0.,  0.,  0.,  0.,  0.,  0.,  0.],
              [ 0.,  0., 28., 29., 30.,  0.,  0.],
              [ 0.,  0., 31., 32., 33.,  0.,  0.],
              [ 0.,  0., 34., 35., 36.,  0.,  0.],
              [ 0.,  0.,  0.,  0.,  0.,  0.,  0.],
              [ 0.,  0.,  0.,  0.,  0.,  0.,  0.]],

             [[ 0.,  0.,  0.,  0.,  0.,  0.,  0.],
              [ 0.,  0.,  0.,  0.,  0.,  0.,  0.],
              [ 0.,  0., 37., 38., 39.,  0.,  0.],
              [ 0.,  0., 40., 41., 42.,  0.,  0.],
              [ 0.,  0., 43., 44., 45.,  0.,  0.],
              [ 0.,  0.,  0.,  0.,  0.,  0.,  0.],
              [ 0.,  0.,  0.,  0.,  0.,  0.,  0.]],

             [[ 0.,  0.,  0.,  0.,  0.,  0.,  0.],
              [ 0.,  0.,  0.,  0.,  0.,  0.,  0.],
              [ 0.,  0., 46., 47., 48.,  0.,  0.],
              [ 0.,  0., 49., 50., 51.,  0.,  0.],
              [ 0.,  0., 52., 53., 54.,  0.,  0.],
              [ 0.,  0.,  0.,  0.,  0.,  0.,  0.],
              [ 0.,  0.,  0.,  0.,  0.,  0.,  0.]]]])

print("\nTHE PADDED IMAGE TENSOR (SHAPE {}):".format(t.shape))
print(t)
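For comparison, PyTorch ships a general-purpose padding function, ```torch.nn.functional.pad```. The sketch below uses it to reproduce the padded tensor above; it is offered as a way to cross-check your own results, under the assumption that the same amount of zero padding is wanted on all four sides.

```python
import torch
import torch.nn.functional as F

# Rebuild the example image tensor from above: values 1..54, shape (2, 3, 3, 3).
images = torch.arange(1., 55.).reshape(2, 3, 3, 3)
padding = 2

# The pad tuple is (left, right, top, bottom) and applies to the last two
# dimensions only, so the batch and channel dimensions are left untouched.
padded = F.pad(images, (padding, padding, padding, padding),
               mode="constant", value=0.0)

print(padded.shape)  # torch.Size([2, 3, 7, 7])
print(padded[0, 0])
```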

### Question 10

Complete the implementation of ```pad``` in ```cnn.py```.

We've provided a unit test in test.py, so that you can (a) understand the expected behavior  and (b) check your implementation is working properly. Run it from the command line as follows:

    python -m unittest test.CnnTests.test_pad

With these three functions working, you can go ahead and implement a function to convolve a set of kernels with a batch of images, using the techniques described in your Computing Convolutions notes. Convolving our two example kernels with our two 3x3 example images (with stride 1 and no padding) should yield the following:

In [None]:
# Should be: result = convolve(kernels, images, stride=1, padding=0)
result = tensor([[[   [1245., 1323.],
                      [1479., 1557.]],

                     [[2973., 3195.],
                      [3639., 3861.]]],


                    [[[3351., 3429.],
                      [3585., 3663.]],

                     [[8967., 9189.],
                      [9633., 9855.]]]])

print("\nTHE RESULT OF THE CONVOLUTION (SHAPE {}):".format(result.shape))
print(result)
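These numbers can be checked independently. The sketch below computes one output value by hand (the top-left window of the first image against the first kernel) and then compares the full result against PyTorch's built-in ```torch.nn.functional.conv2d```, assuming no bias term. The names ```by_hand``` and ```reference``` are made up for this example.

```python
import torch
import torch.nn.functional as F

# Rebuild the example kernels (values 1..24) and images (values 1..54).
kernels = torch.arange(1., 25.).reshape(2, 3, 2, 2)
images = torch.arange(1., 55.).reshape(2, 3, 3, 3)

# One output value is the sum of elementwise products between a kernel
# and the window it currently covers (here: all 3 channels, top-left 2x2).
by_hand = (images[0, :, 0:2, 0:2] * kernels[0]).sum()
print(by_hand)  # tensor(1245.)

# conv2d expects images of shape (batch, channels, height, width) and
# kernels of shape (num_kernels, channels, kernel_height, kernel_width).
reference = F.conv2d(images, kernels, bias=None, stride=1, padding=0)
print(reference.shape)  # torch.Size([2, 2, 2, 2])
print(reference)
```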

### Question 11

Complete the implementation of ```convolve``` in ```cnn.py```.

We've provided a unit test in test.py, so that you can (a) understand the expected behavior  and (b) check your implementation is working properly. Run it from the command line as follows:

    python -m unittest test.CnnTests.test_conv
    
Once ```convolve``` is operational, the ```torch.nn.Module``` called ```ConvLayer``` in ```cnn.py``` should work:

In [1]:
from cnn import ConvLayer
from test import construct_test_images2

conv = ConvLayer(input_channels=3, num_kernels=1, 
                 kernel_size=3,
                 stride=1, 
                 padding=1)

images = construct_test_images2()
# print(images)
print("INPUT IMAGE TENSOR (SHAPE {}):".format(images.shape))
convolved = conv.forward(images)



print("\nRESULT OF CONVOLUTION (SHAPE {}):".format(convolved.shape))
print(convolved)

INPUT IMAGE TENSOR (SHAPE torch.Size([2, 3, 4, 4])):

RESULT OF CONVOLUTION (SHAPE torch.Size([2, 1, 4, 4])):
tensor([[[[-1.0724,  0.2315, -1.4380,  1.0853],
          [-0.6878, -0.8480, -1.3513,  0.4508],
          [ 0.8623,  0.0679,  0.5354,  1.7838],
          [-0.1533, -0.5266, -0.6946,  0.3432]]],


        [[[-3.3885,  1.3807,  0.4243,  2.0232],
          [-2.1855, -0.5170, -0.6447,  0.9560],
          [-0.0981, -0.3739,  1.3703,  3.5306],
          [-0.7753, -1.3160, -0.3610,  0.7642]]]], grad_fn=<AddBackward0>)



You should understand the structure of these tensors. The input tensor ```images``` has shape ```(2,3,4,4)``` which means that the batch has two 4x4 images, each with 3 channels (so they're RGB images). The output tensor ```convolved``` has shape ```(2,1,4,4)```, because the 1 kernel has produced a 4x4 matrix for each of the two batch images.

We can now run the result through a ReLU layer (since we made ReLU layers last time).

In [2]:
from training import ReLU
relu = ReLU()
result = relu.forward(convolved)
print("\nRESULT OF RELU (SHAPE {}):".format(result.shape))
print(result)


RESULT OF RELU (SHAPE torch.Size([2, 1, 4, 4])):
tensor([[[[0.0000, 0.2315, 0.0000, 1.0853],
          [0.0000, 0.0000, 0.0000, 0.4508],
          [0.8623, 0.0679, 0.5354, 1.7838],
          [0.0000, 0.0000, 0.0000, 0.3432]]],


        [[[0.0000, 1.3807, 0.4243, 2.0232],
          [0.0000, 0.0000, 0.0000, 0.9560],
          [0.0000, 0.0000, 1.3703, 3.5306],
          [0.0000, 0.0000, 0.0000, 0.7642]]]], grad_fn=<ClampBackward>)


Note that the result of applying ReLU has the same shape, because ReLU simply applies the activation function elementwise.

If we want to feed these new evidence variables as input to a standard feedforward layer (i.e. ```Dense``` from last time), the problem is that the tensors have order 4, whereas ```Dense``` expects tensors of order 2, i.e. the shape should be ```(batch_size, num_evidence_vars)```.

Right now, the result of ReLU has shape ```(batch_size, num_kernels, image_width, image_width)```. So we need to flatten that tensor into the right shape. At this point, all the evidence variables for a given image can be treated identically, so we don't need the additional structure.
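As a point of reference for the shapes involved, PyTorch's ```torch.flatten``` with ```start_dim=1``` performs this kind of reshaping: it keeps the batch dimension and merges everything after it. The sketch below uses a stand-in tensor of the same shape as the ReLU output above; it is meant as something to compare your own module against, not as the required implementation.

```python
import torch

# A stand-in for the ReLU output above: 2 images, 1 kernel, 4x4 each.
activations = torch.arange(32.).reshape(2, 1, 4, 4)

# Keep dimension 0 (the batch) and merge all remaining dimensions.
flat = torch.flatten(activations, start_dim=1)

print(activations.shape)  # torch.Size([2, 1, 4, 4])
print(flat.shape)         # torch.Size([2, 16])
```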

### Question 12

Complete the implementation of the ```Flatten``` module in ```cnn.py```.

We've provided a unit test in test.py, so that you can (a) understand the expected behavior  and (b) check your implementation is working properly. Run it from the command line as follows:

    python -m unittest test.CnnTests.test_flatten


With this infrastructure in place, you proceed to create a convolutional neural network with two convolutional layers and two feedforward layers. Note that you've made it general enough to handle both grayscale images (with 1 input channel) and RGB images (with 3 input channels).

For the convolutional layers, you set the padding and stride so that the size of the input and output images are the same.
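The arithmetic behind that choice can be sketched with the standard output-size formula for a convolution. The helper ```conv_output_width``` below is a hypothetical name introduced only for this illustration.

```python
def conv_output_width(input_width, kernel_size, stride, padding):
    """Width of the output of a convolution over a square image."""
    return (input_width + 2 * padding - kernel_size) // stride + 1

# With stride 1 and padding = kernel_size // 2, odd kernel sizes
# leave the image width unchanged.
for kernel_size in (3, 7):
    padding = kernel_size // 2
    print(kernel_size, conv_output_width(64, kernel_size, stride=1, padding=padding))
# 3 64
# 7 64
```

Note that this only works out exactly for odd kernel sizes: with a kernel size of 4 and a padding of 2, a 64-pixel-wide input would produce a 65-pixel-wide output.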

Unfortunately, you ran out of time at the airport to implement a maxpool layer, but luckily the folks at Torch have an implementation (```torch.nn.MaxPool2d```) that you can leverage.

In [None]:
import torch
from training import ReLU, Dense
from torch.nn import MaxPool2d, Conv2d
from cnn import ConvLayer, Flatten
from cnn import Classifier

def create_cnn(num_kernels, kernel_size, 
               output_classes, dense_hidden_size,
               image_width, is_grayscale=True,
               use_torch_conv_layer = True,
               use_maxpool=True):
    
    if use_torch_conv_layer:
        Conv = Conv2d
    else:
        Conv = ConvLayer    
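    # For odd kernel sizes and stride 1, this padding keeps the image width unchanged.
    padding = kernel_size//2
    output_width = image_width
    if use_maxpool:
        # Two maxpool layers (kernel size 4, stride 4) each divide the width by 4.
        output_width = output_width // 16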
    model = torch.nn.Sequential()
    if is_grayscale:
        num_input_channels = 1
    else:
        num_input_channels = 3
    model.add_module("conv1", Conv(num_input_channels, num_kernels,
                                   kernel_size=kernel_size, 
                                   stride=1, padding=padding))
    model.add_module("relu1", ReLU())
    if use_maxpool:
        model.add_module("pool1", MaxPool2d(kernel_size=4, stride=4, padding=0))
    model.add_module("conv2", Conv(num_kernels, num_kernels,
                                              kernel_size=kernel_size, 
                                              stride=1, padding=padding))
    model.add_module("relu2", ReLU())
    if use_maxpool:
        model.add_module("pool2", MaxPool2d(kernel_size=4, stride=4, padding=0))
    model.add_module("flatten", Flatten())
    model.add_module("dense1", Dense(num_kernels * output_width**2, 
                                     dense_hidden_size, 
                                     init_bound = 0.1632993161855452))
    model.add_module("relu3", ReLU())
    model.add_module("dense2", Dense(dense_hidden_size, output_classes, 
                                     init_bound = 0.2992528008322899))
    return model

Everything is now in place to train your CNN! Before trying it on zebras, you first try it on a simpler data set where a "positive" example looks like this:

![positive](./img/positive.png)

and a "negative" example looks like this:

![negative](./img/negative.png)

In other words, positive examples contain a three pixel diagonal line going downward right, while negative examples contain a three pixel diagonal line going downward left.

In [None]:
from datamanager import DataPartition, DataManager
from training import nlog_softmax_loss, minibatch_training
from torch import optim

def run(data_config, n_epochs, num_kernels, 
        kernel_size, dense_hidden_size, 
        use_maxpool, use_torch_conv_layer):    
    """
    Trains a CNN on the data set described by data_config and returns
    the resulting Classifier along with the training monitor.
    
    """
    train_set = DataPartition(data_config, './data', 'train')
    test_set = DataPartition(data_config, './data', 'test')
    manager = DataManager(train_set, test_set)
    loss = nlog_softmax_loss
    learning_rate = .001
    image_width = 64
    net = create_cnn(num_kernels = num_kernels, kernel_size= kernel_size, 
                                 output_classes=2, image_width=image_width,
                                 dense_hidden_size=dense_hidden_size,
                                 use_maxpool = use_maxpool,
                                 use_torch_conv_layer = use_torch_conv_layer)
    optimizer = optim.Adam(net.parameters(), lr=learning_rate)  
    best_net, monitor = minibatch_training(net, manager, 
                                           batch_size=32, n_epochs=n_epochs, 
                                           optimizer=optimizer, loss=loss)
    classifier = Classifier(best_net, num_kernels, kernel_size, 
                            dense_hidden_size, manager.categories, image_width)
    return classifier, monitor




def experiment1():
    return run('stripes.data.json', 
               n_epochs = 20,
               num_kernels = 20, 
               kernel_size = 3, 
               dense_hidden_size = 64,
               use_maxpool = True,
               use_torch_conv_layer = False)

classifier, _ = experiment1()

You should see it reach 100% test accuracy, which means it's time to identify some zebras! The training may be a little bit slow, though.

Luckily, while you were at the airport, you sent your code to your UNCOOL team. During your flight, they optimized it and also got it accepted into the torch.nn package as torch.nn.Conv2d. You try out their implementation.

In [None]:
def experiment2():
    return run('stripes.data.json', 
               n_epochs = 20,
               num_kernels = 20, 
               kernel_size = 3, 
               dense_hidden_size = 64,
               use_maxpool = True,
               use_torch_conv_layer = True)

classifier, _ = experiment2()

Pretty fast! Fast enough for more in-depth experimentation. You figure it's worth evaluating the importance of the maxpool layers. Luckily, you've set up your code so that you can train without them. 

In [None]:
def experiment3():
    return run('stripes.data.json', 
               n_epochs = 20,
               num_kernels = 20, 
               kernel_size= 3, 
               dense_hidden_size=64, 
               use_maxpool=False,
               use_torch_conv_layer = True)

classifier, _ = experiment3()

It probably doesn't do exceptionally well, because it's harder for the network to generalize about the diagonal patterns without the maxpooling. It's also slower, because without the maxpooling, the input to the dense layers is much larger.
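To put a number on "much larger", the sketch below repeats the size calculation from ```create_cnn``` for the settings used in these experiments (20 kernels, 64x64 images, a hidden layer of size 64). The helper ```dense_input_size``` is a hypothetical name used only for this illustration.

```python
def dense_input_size(num_kernels, image_width, use_maxpool):
    """Number of inputs to the first dense layer, mirroring create_cnn."""
    # Two maxpool layers with stride 4 shrink the width by a factor of 16.
    output_width = image_width // 16 if use_maxpool else image_width
    return num_kernels * output_width ** 2

with_pool = dense_input_size(20, 64, use_maxpool=True)
without_pool = dense_input_size(20, 64, use_maxpool=False)

print(with_pool, without_pool)            # 320 81920
# Weights in the first dense layer (hidden size 64), ignoring any offsets:
print(with_pool * 64, without_pool * 64)  # 20480 5242880
```

Without maxpooling, the first dense layer has 256 times as many weights to learn from the same amount of training data.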

Lesson learned: maxpool is maxcool.

To identify zebras instead of random stripe patterns, all you need to do is swap out the stripes data for the zebra data. Also, given the complexity of the task, you increase the kernel size from 3 to 7. Each epoch should take around 20 seconds to train.

In [None]:
def experiment4():
    return run('zebra.data.json', 
               n_epochs = 8,
               num_kernels = 20, 
               kernel_size = 7, 
               dense_hidden_size = 64,
               use_maxpool = True,
               use_torch_conv_layer = True)

classifier, _ = experiment4()

Hopefully you're seeing test accuracy in the low to mid nineties (percent), which is a new state-of-the-art in zebra recognition. The generals will surely be pleased.

As a final step, your team has created an application called ZebraShop that allows you to run your zebra detector interactively. First, save your classifier from ```experiment4```:

In [None]:
classifier.save('zc')

This will create a file called ```zc.json``` that stores your classifier. You can then run the GUI by typing the following from the home directory of this project:

    python zebrashop.py zc.json

Try out different images to see where things go right and wrong. In particular, you may want to try out some giraffes. Why do you think they get classified incorrectly? What else gets classified incorrectly?

What could you do to improve the system?

Feel free to try and make improvements as part of this project submission. If you do something you think is cool, let me know by describing it in the README and/or the pull request message.

**Project 4 is now complete.**