# 4. Convolutional Neural Network 

## 1. About Convolutional Neural Networks

### Basic convolutional neural network
- additional **convolution** and **pooling** layers **before** feedforward neural network
- layer with **linear function and non-linearity** (the **hidden layer**) is also called  **fully connected layer**

### One convolutional layer (gray scale image): high level view
You select a filter (also called a kernel). This is a small matrix, usually 3x3 or 5x5 (typically an odd size), that is used to run the convolution. You can imagine the filter as a spotlight that inspects the entire input matrix, one region at a time.

Whenever you apply the filter to the input matrix at a given position, you multiply every element of the input that falls under the filter (**the receptive field**) by the corresponding filter value and sum all the products. Then you store this number in a new output matrix, called **the feature map**.

Once that is done, you move the filter to a new position in the input matrix. The number of elements you move from one position to the next is called the **stride**, and it is usually 1 or 2.

The same multiply-and-sum operation is repeated, and the result is stored in the output matrix next to the previous output value.

You can have kernels of multiple sizes. For every kernel you "inspect" the entire image and output a different feature map, so you get as many feature maps as kernels you apply.
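
The sliding multiply-and-sum described above can be sketched by hand and compared with PyTorch's built-in operation. This is only an illustration: `conv2d_naive` is a hypothetical helper name, it assumes a square kernel and no padding, and the 5x5 input and 3x3 kernel are made-up values.

```python
import torch
import torch.nn.functional as F

def conv2d_naive(image, kernel, stride=1):
    """Slide a square `kernel` over a 2D `image`; multiply element-wise and sum."""
    k = kernel.shape[0]
    out_h = (image.shape[0] - k) // stride + 1
    out_w = (image.shape[1] - k) // stride + 1
    feature_map = torch.zeros(out_h, out_w)
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            receptive_field = image[r:r + k, c:c + k]   # input region under the filter
            feature_map[i, j] = (receptive_field * kernel).sum()
    return feature_map

image = torch.arange(25, dtype=torch.float32).reshape(5, 5)
kernel = torch.tensor([[1., 0., -1.],
                       [1., 0., -1.],
                       [1., 0., -1.]])

manual = conv2d_naive(image, kernel)
# F.conv2d expects (batch, channels, height, width), hence the two extra dimensions
builtin = F.conv2d(image[None, None], kernel[None, None])[0, 0]

print(manual.shape)                     # feature map is 3x3
print(torch.allclose(manual, builtin))  # both approaches agree
```

The explicit loops are slow and only meant to make the receptive field visible; `nn.Conv2d` performs the same computation efficiently and adds a learnable bias.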

#### What happens when we apply the filter?
The filter usually describes a pattern: a shape, an edge, a line, a dot. Its values typically lie between 0 and 1 (or between -1 and 1). Because of that, the filter responds most strongly when the input region matches the pattern in the filter.

Where a filter element is 1, the corresponding input value is added to the output unchanged. Where it is 0, the input value does not affect the output. Where it is -1, the input value is subtracted, which penalizes the output.
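
As a small illustration of this response behaviour, the sketch below applies a hand-written vertical-edge filter to a toy image whose left half is bright and right half is dark. The image and filter values are assumptions chosen for the example, not taken from a trained network.

```python
import torch
import torch.nn.functional as F

# 6x6 image: left half is 1 (bright), right half is 0 (dark), so there is
# a vertical edge between columns 2 and 3
image = torch.zeros(6, 6)
image[:, :3] = 1.0

# +1 on the left column, 0 in the middle, -1 on the right column
edge_filter = torch.tensor([[1., 0., -1.],
                            [1., 0., -1.],
                            [1., 0., -1.]])

response = F.conv2d(image[None, None], edge_filter[None, None])[0, 0]
print(response)  # each row is [0, 3, 3, 0]: strongest where the window straddles the edge
```

In flat regions the +1 and -1 columns cancel out, so the response is 0; the filter only "lights up" where the bright-to-dark transition falls inside its receptive field.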

### One convolutional layer (color image, three channels)
The only difference is that your kernel must have the same depth as the input tensor (3 for an RGB image).
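
A quick way to see the kernel depth is to inspect the weight tensor of a convolution that takes a 3-channel input. The 8 output channels and the 32x32 image size below are arbitrary values picked for this sketch.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=5)

rgb_batch = torch.randn(1, 3, 32, 32)   # one random "color image"
feature_maps = conv(rgb_batch)

print(conv.weight.shape)    # (8, 3, 5, 5): 8 kernels, each 3 deep
print(feature_maps.shape)   # (1, 8, 28, 28): one feature map per kernel
```

Each kernel spans all three input channels but still produces a single 2D feature map.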

### Multiple convolutional layers
Successive convolutions gradually abstract the *what* from the *where*.
After passing through a series of convolutional layers, you connect the result to a fully connected layer that outputs the final decision about the input (softmax).

### Pooling and down sampling
The pooling layer follows the convolution layer. The goal in using a pooling layer is to down sample the convolutional layer (and reduce the size of its representation).

The most common types of pooling are
- max pooling
- average pooling

In pooling, as in convolution, you also have a kernel (usually 2x2) that inspects the whole feature map output by the previous convolution. At each position, you take all the elements that fall inside the pooling kernel and compute their maximum or their average, depending on the kind of pooling, and store the result in the corresponding element of the output matrix.
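
The two pooling types can be compared on a small hand-made feature map. The 4x4 values below are invented for illustration; the functional forms `F.max_pool2d` and `F.avg_pool2d` are used here instead of the layer classes only to keep the sketch short.

```python
import torch
import torch.nn.functional as F

feature_map = torch.tensor([[1., 2., 5., 6.],
                            [3., 4., 7., 8.],
                            [9., 10., 13., 14.],
                            [11., 12., 15., 16.]])

x = feature_map[None, None]   # add batch and channel dimensions

max_pooled = F.max_pool2d(x, kernel_size=2)[0, 0]
avg_pooled = F.avg_pool2d(x, kernel_size=2)[0, 0]

print(max_pooled)   # [[4, 8], [12, 16]]
print(avg_pooled)   # [[2.5, 6.5], [10.5, 14.5]]
```

With a 2x2 kernel (and the default stride equal to the kernel size) the 4x4 map is reduced to 2x2 in both cases.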

### Padding
Padding determines whether the output of a convolution has the same size as its input or is smaller.
The size of the output after a convolution is given by
$$O = \frac{W-K+2P}{S}+1$$
Where:
- O : output height/width
- W : input height/width
- K : filter size (kernel size)
- P : padding added to each border (0 for valid padding)
   - for same padding with stride 1: P = $\frac{K-1}{2}$, e.g. $\frac{5-1}{2} = 2$
- S : stride
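
The formula can be wrapped in a small helper to trace how the feature map size changes layer by layer. `conv_output_size` is a hypothetical name introduced for this sketch; the two traces assume a 28x28 input, 5x5 convolutions with stride 1, and 2x2 pooling with stride 2.

```python
def conv_output_size(w, k, p=0, s=1):
    """O = (W - K + 2P) / S + 1, using integer (floor) division."""
    return (w - k + 2 * p) // s + 1

# Same padding (P = 2): convolutions keep the size, each pooling halves it
size = 28
size = conv_output_size(size, k=5, p=2, s=1)   # conv 1 -> 28
size = conv_output_size(size, k=2, p=0, s=2)   # pool 1 -> 14
size = conv_output_size(size, k=5, p=2, s=1)   # conv 2 -> 14
same_final = conv_output_size(size, k=2, p=0, s=2)   # pool 2 -> 7

# Valid padding (P = 0): each convolution also shrinks the map by K - 1
size = 28
size = conv_output_size(size, k=5)             # conv 1 -> 24
size = conv_output_size(size, k=2, s=2)        # pool 1 -> 12
size = conv_output_size(size, k=5)             # conv 2 -> 8
valid_final = conv_output_size(size, k=2, s=2) # pool 2 -> 4

print(same_final, valid_final)   # 7 4
```

The same formula applies to pooling because a pooling layer also slides a kernel with a given stride.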

**Valid padding** means no padding at all (a padding of zero, P = 0). The output matrix is smaller than the input feature map by K - 1 along each dimension. Example: with a 7x7 input feature map and a 3x3 kernel, the output matrix is 7 - 2 = 5, i.e. 5x5.

**Same padding**. This is when you want the output matrix to have the same size as the input feature map. To do this you must add elements around the border of the matrix: on each side, add the kernel size minus 1, divided by 2 (this assumes stride 1 and an odd kernel size). Example: if your kernel is 7x7, then 7 - 1 = 6, divided by 2 is 3, so you add 3 elements on every side.
$$ Padding = \frac{Kernel - 1}{2} $$

The padding can consist of zeros, copies of the elements nearest the border, or a mirror image of the pixels near the border.
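
The three kinds of padding content can be tried out with `torch.nn.functional.pad`, which supports the modes `constant`, `replicate`, and `reflect`. The 3x3 input below is a made-up example, and a padding of 1 on every side is assumed.

```python
import torch
import torch.nn.functional as F

# Values 1..9 arranged as a batch of one single-channel 3x3 image
x = torch.arange(1., 10.).reshape(1, 1, 3, 3)

pad = (1, 1, 1, 1)   # (left, right, top, bottom)

zeros = F.pad(x, pad, mode="constant", value=0.0)   # border filled with zeros
replicate = F.pad(x, pad, mode="replicate")         # border copies the nearest element
reflect = F.pad(x, pad, mode="reflect")             # border mirrors the interior

print(zeros[0, 0])
print(replicate[0, 0])
print(reflect[0, 0])
```

All three results are 5x5; only the values placed in the new border differ.
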
## Building a convolutional neural network with pytorch
### Model A:
- 2 convolutional layers
   - Same padding (same output size)
- 2 Max pooling layers
- 1 fully connected layer

### Steps
1. Load Dataset
2. Make Dataset iterable
3. Make model class
4. Instantiate the model
5. Instantiate the loss class
6. Instantiate the optimizer
7. Train
8. Measure accuracy
9. Save the model
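
Step 9 can be sketched as follows. This is one common approach (saving the `state_dict`), not necessarily the one intended by these notes; the small `nn.Linear` stand-in model and the file name `cnn_model_sketch.pkl` are assumptions made so the example runs on its own.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Stand-in for a trained model; in practice this would be the CNN instance
model = nn.Linear(4, 2)

# Hypothetical file name, written to the system temp directory
path = os.path.join(tempfile.gettempdir(), "cnn_model_sketch.pkl")

# Save only the learned parameters
torch.save(model.state_dict(), path)

# To restore: build the same architecture, then load the parameters into it
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load(path))

print(torch.equal(model.weight, restored.weight))
```

Saving the `state_dict` rather than the whole model object keeps the file independent of the class definition's location in your code.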

In [60]:
## Building a convolutional neural network with pytorch
"""
0. Import libraries
"""
import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.datasets as dsets
from torch.autograd import Variable

In [87]:
"""
1. Load Dataset
"""

train_dataset = dsets.MNIST(root = './data',
                           train = True,
                           transform=transforms.ToTensor(),
                           download=True)

test_dataset = dsets.MNIST(root = './data',
                           train = False,
                           transform=transforms.ToTensor())

In [82]:
# .data / .targets replace the deprecated .train_data / .train_labels attributes
print(train_dataset.data.size())
print(train_dataset.targets.size())
print(test_dataset.data.size())
print(test_dataset.targets.size())

torch.Size([60000, 28, 28])
torch.Size([60000])
torch.Size([10000, 28, 28])
torch.Size([10000])


In [97]:
"""
2. Make the dataset iterable

"""
batch_size = 100
n_iters = 3000
num_epochs = n_iters / (len(train_dataset) / batch_size)
num_epochs = int(num_epochs)

train_loader =  torch.utils.data.DataLoader(dataset=train_dataset,
                                     batch_size = batch_size,
                                     shuffle = True)

test_loader =  torch.utils.data.DataLoader(dataset=test_dataset,
                                    batch_size=batch_size,
                                    shuffle = False)

In [98]:
num_epochs

5

In [99]:
"""
3. Create a Model
"""

class CNN_ModelA(nn.Module):
    def __init__(self):
        super(CNN_ModelA, self).__init__()
        # In a CNN the layer dimensions are harder to infer automatically,
        # so the input and output dimensions are set manually
        
        # 1st Convolution
        self.cnn1 = nn.Conv2d(in_channels = 1,   # input channels: 1 for grayscale, 3 for color
                              out_channels = 16, # number of filters, i.e. 16 output feature maps
                              kernel_size=5,     # size of the filter (5x5)
                              stride=1,          # step by which the filter moves
                              padding=2)         # same padding: (5-1)/2 = 2 keeps the feature map at 28x28
        
        
        self.relu1 = nn.ReLU()
        
        # 1st Pooling: Max pooling
        self.maxpool1 = nn.MaxPool2d(kernel_size=2)
        
        # 2nd Convolution
        self.cnn2 = nn.Conv2d(in_channels = 16, 
                              out_channels = 32, 
                              kernel_size=5, 
                              stride=1, 
                              padding=2)
        
        self.relu2 = nn.ReLU()
        
        # 2nd Pooling: Max pooling
        self.maxpool2 = nn.MaxPool2d(kernel_size=2)
        
        # Fully connected 1 (readout)
        self.fc1 = nn.Linear(32*7*7,10) # 7 is because the original image has 28 x 28 pixels and it passes by 2 max poolings
                                        # with kernels with size 2. So 28 / (2*2) = 7
                                        # 32 is the number of featuremaps as inputs
                                        # 10 is the number of possible classes

    def forward(self,x): 
        # Conv Layer 1
        out = self.cnn1(x)
        out = self.relu1(out)
        out = self.maxpool1(out)
        
        # Conv layer 2
        out = self.cnn2(out)
        out = self.relu2(out)
        out = self.maxpool2(out)
        
        # Resize
        # To fit into the linear function you need to flatten the 3D tensor of feature maps into a 1D tensor
        # Original size : (100,32,7,7)
        # out.size(0) = 100
        # new outsize = (100,32*7*7)
        out = out.view(out.size(0),-1)
        
        # Linear function(readout)
        out = self.fc1(out)
        
        return out

In [100]:
"""
4. Instantiate the model class
"""

model = CNN_ModelA()
if torch.cuda.is_available():
    model.cuda()
    

In [101]:
"""
5. Instantiate the Loss class
"""
# Convolution Neural Network: Cross Entropy Loss
    # Feedforward Neural Network : Cross entropy Loss
    # Logistic Regression : Cross entropy loss
    # Linear Regression: MSE
    
criterion = nn.CrossEntropyLoss()

## 6. Instantiate Optimizer Class
- simplified equation
    - $\theta = \theta - \alpha \cdot \nabla_\theta$
    - where
        - $\theta$ : parameters (our variables)
        - $\alpha$ : learning rate (a hyperparameter we choose)
        - $\nabla_\theta$ : gradient of the loss with respect to the parameters
    - even simpler equation
        - parameters = parameters - learning_rate * parameters_gradients
        - at every iteration, we update our model's parameters
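
The update rule can be checked against `torch.optim.SGD` on a toy problem. The two-element parameter vector, the quadratic loss, and the learning rate of 0.1 are assumptions chosen so the numbers are easy to follow.

```python
import torch

theta = torch.tensor([1.0, -2.0], requires_grad=True)
learning_rate = 0.1

loss = (theta ** 2).sum()   # toy loss; its gradient is 2 * theta
loss.backward()             # fills theta.grad with [2.0, -4.0]

# parameters = parameters - learning_rate * parameters_gradients, done by hand
with torch.no_grad():
    manual_update = theta - learning_rate * theta.grad

# The same step performed by the optimizer (updates theta in place)
optimizer = torch.optim.SGD([theta], lr=learning_rate)
optimizer.step()

print(manual_update)                         # [0.8, -1.6]
print(torch.allclose(theta, manual_update))  # optimizer matches the manual rule
```

Plain SGD without momentum or weight decay applies exactly this rule; other optimizers modify the step.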

In [102]:
"""
6. Instantiate the optimizer class
"""
learning_rate = 0.01
optimizer = torch.optim.SGD(model.parameters(), lr= learning_rate)

In [103]:
# What is inside the model
print(model.parameters())
print(len(list(model.parameters())))
print(list(model.parameters())[0].size()) # Conv1 weights: 16 kernels with depth 1
print(list(model.parameters())[1].size()) # Conv1 bias: one per kernel (16)

print(list(model.parameters())[2].size()) # Conv2 weights: 32 kernels with depth 16
print(list(model.parameters())[3].size()) # Conv2 bias: one per kernel (32)

print(list(model.parameters())[4].size()) # Fully connected layer 1 weights
print(list(model.parameters())[5].size()) # Fully connected layer 1 bias



<generator object Module.parameters at 0x000002226554C990>
6
torch.Size([16, 1, 5, 5])
torch.Size([16])
torch.Size([32, 16, 5, 5])
torch.Size([32])
torch.Size([10, 1568])
torch.Size([10])


## 7. Train the Model
- Process
    1. Convert inputs/labels to variables
        - CNN input: (1,28,28) - the only difference from the FFNN is that a CNN receives the image as a 2D grid (with a channel dimension) instead of a flattened vector
        - Feedforward NN input: (1,28*28)
    2. Clear gradient buffers
    3. Get output given inputs
    4. Get loss
    5. Get gradients with respect to parameters
    6. Update parameters using gradients
        -  parameters = parameters - learning_rate * parameters_gradients
    7. REPEAT

In [107]:
iter = 0
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        if torch.cuda.is_available():
            images = Variable(images.cuda())
            labels = Variable(labels.cuda())
        else:
            images = Variable(images)
            labels = Variable(labels)
        
        optimizer.zero_grad()
        
        outputs = model(images)
        
        loss = criterion(outputs,labels)
        
        loss.backward()
        
        optimizer.step()
        
        iter += 1
        # Accuracy test
        if iter % 500 == 0:
            correct = 0
            total = 0
            for images, labels in test_loader:
                if torch.cuda.is_available():
                    images = Variable(images.cuda())
                else:
                    images = Variable(images)
                outputs = model(images)
                _, predicted = torch.max(outputs.data, 1)
                
                
                total += labels.size(0)
                if torch.cuda.is_available():
                    correct += (predicted.cpu() == labels.cpu()).sum()
                else:
                    correct += (predicted == labels).sum()
                
            accuracy = 100 * correct / total
            print('Iteration: {}. Loss:{}. Accuracy: {}'.format(iter,loss.data, accuracy))

Iteration: 500. Loss:0.04247228056192398. Accuracy: 98
Iteration: 1000. Loss:0.12507490813732147. Accuracy: 98
Iteration: 1500. Loss:0.10888931155204773. Accuracy: 98
Iteration: 2000. Loss:0.025883041322231293. Accuracy: 98
Iteration: 2500. Loss:0.053134769201278687. Accuracy: 98
Iteration: 3000. Loss:0.021015744656324387. Accuracy: 98
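
The accuracy check inside the training loop could also be factored into a helper. The sketch below is a possible variant, not the code that produced the numbers above: `evaluate_accuracy` is a hypothetical name, it relies on PyTorch 0.4 or later (where tensors no longer need to be wrapped in `Variable`), and it is exercised on a tiny hand-built model and loader rather than on MNIST.

```python
import torch
import torch.nn as nn

def evaluate_accuracy(model, loader, device="cpu"):
    """Return the accuracy in percent of `model` over all batches in `loader`."""
    model.eval()                      # switch layers such as dropout to eval mode
    correct, total = 0, 0
    with torch.no_grad():             # no gradients are needed for evaluation
        for images, labels in loader:
            outputs = model(images.to(device))
            predicted = outputs.argmax(dim=1).cpu()
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    return 100.0 * correct / total

# Tiny stand-in model that always predicts class 0
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(4, 2))
with torch.no_grad():
    toy_model[1].weight.zero_()
    toy_model[1].bias.copy_(torch.tensor([1.0, 0.0]))

# One batch of four 1x2x2 "images"; three of the four labels are class 0
toy_loader = [(torch.randn(4, 1, 2, 2), torch.tensor([0, 0, 1, 0]))]

accuracy = evaluate_accuracy(toy_model, toy_loader)
print(accuracy)   # 75.0
```

Using `.item()` turns the count into a Python number, so the final division yields a float instead of a truncated integer tensor.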


#  CNN Model B
- using average pooling

In [109]:
## Building a convolutional neural network with pytorch
"""
0. Import libraries
"""
import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.datasets as dsets
from torch.autograd import Variable
"""
1. Load Dataset
"""

train_dataset = dsets.MNIST(root = './data',
                           train = True,
                           transform=transforms.ToTensor(),
                           download=True)

test_dataset = dsets.MNIST(root = './data',
                           train = False,
                           transform=transforms.ToTensor())

"""
2. Make the dataset iterable

"""
batch_size = 100
n_iters = 3000
num_epochs = n_iters / (len(train_dataset) / batch_size)
num_epochs = int(num_epochs)

train_loader =  torch.utils.data.DataLoader(dataset=train_dataset,
                                     batch_size = batch_size,
                                     shuffle = True)

test_loader =  torch.utils.data.DataLoader(dataset=test_dataset,
                                    batch_size=batch_size,
                                    shuffle = False)

"""
3. Create a Model
"""

class CNN_ModelB(nn.Module):
    def __init__(self):
        super(CNN_ModelB, self).__init__()
        # In a CNN the layer dimensions are harder to infer automatically,
        # so the input and output dimensions are set manually
        
        # 1st Convolution
        self.cnn1 = nn.Conv2d(in_channels = 1,   # input channels: 1 for grayscale, 3 for color
                              out_channels = 16, # number of filters, i.e. 16 output feature maps
                              kernel_size=5,     # size of the filter (5x5)
                              stride=1,          # step by which the filter moves
                              padding=2)         # same padding: (5-1)/2 = 2 keeps the feature map at 28x28
        
        
        self.relu1 = nn.ReLU()
        
        # 1st Pooling: Average pooling
        self.avgpool1 = nn.AvgPool2d(kernel_size=2)
        
        # 2nd Convolution
        self.cnn2 = nn.Conv2d(in_channels = 16, 
                              out_channels = 32, 
                              kernel_size=5, 
                              stride=1, 
                              padding=2)
        
        self.relu2 = nn.ReLU()
        
        # 2nd Pooling: Average pooling
        self.avgpool2 = nn.AvgPool2d(kernel_size=2)
        
        # Fully connected 1 (readout)
        self.fc1 = nn.Linear(32*7*7,10) # 7 is because the original image has 28 x 28 pixels and it passes through 2 average poolings
                                        # with kernels of size 2. So 28 / (2*2) = 7
                                        # 32 is the number of featuremaps as inputs
                                        # 10 is the number of possible classes

    def forward(self,x): 
        # Conv Layer 1
        out = self.cnn1(x)
        out = self.relu1(out)
        out = self.avgpool1(out)
        
        # Conv layer 2
        out = self.cnn2(out)
        out = self.relu2(out)
        out = self.avgpool2(out)
        
        # Resize
        # To fit into the linear function you need to flatten the 3D tensor of feature maps into a 1D tensor
        # Original size : (100,32,7,7)
        # out.size(0) = 100
        # new outsize = (100,32*7*7)
        out = out.view(out.size(0),-1)
        
        # Linear function(readout)
        out = self.fc1(out)
        
        return out
    
"""
4. Instantiate the model class
"""

model = CNN_ModelB()
if torch.cuda.is_available():
    model.cuda()

"""
5. Instantiate the Loss class
"""
# Convolution Neural Network: Cross Entropy Loss
    # Feedforward Neural Network : Cross entropy Loss
    # Logistic Regression : Cross entropy loss
    # Linear Regression: MSE
    
criterion = nn.CrossEntropyLoss()

"""
6. Instantiate the optimizer class
"""
learning_rate = 0.01
optimizer = torch.optim.SGD(model.parameters(), lr= learning_rate)


"""
7. Train the model
"""
iter = 0
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        if torch.cuda.is_available():
            images = Variable(images.cuda())
            labels = Variable(labels.cuda())
        else:
            images = Variable(images)
            labels = Variable(labels)
        
        optimizer.zero_grad()
        
        outputs = model(images)
        
        loss = criterion(outputs,labels)
        
        loss.backward()
        
        optimizer.step()
        
        iter += 1
        # Accuracy test
        if iter % 500 == 0:
            correct = 0
            total = 0
            for images, labels in test_loader:
                if torch.cuda.is_available():
                    images = Variable(images.cuda())
                else:
                    images = Variable(images)
                outputs = model(images)
                _, predicted = torch.max(outputs.data, 1)
                
                
                total += labels.size(0)
                if torch.cuda.is_available():
                    correct += (predicted.cpu() == labels.cpu()).sum()
                else:
                    correct += (predicted == labels).sum()
                
            accuracy = 100 * correct / total
            print('Iteration: {}. Loss:{}. Accuracy: {}'.format(iter,loss.data, accuracy))


Iteration: 500. Loss:0.6524097919464111. Accuracy: 86
Iteration: 1000. Loss:0.6030823588371277. Accuracy: 89
Iteration: 1500. Loss:0.2930809259414673. Accuracy: 89
Iteration: 2000. Loss:0.29460909962654114. Accuracy: 91
Iteration: 2500. Loss:0.28996703028678894. Accuracy: 92
Iteration: 3000. Loss:0.2284603714942932. Accuracy: 93


#  CNN Model C
- uses average pooling (`nn.AvgPool2d`)
- uses valid padding

In [114]:
## Building a convolutional neural network with pytorch
"""
0. Import libraries
"""
import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.datasets as dsets
from torch.autograd import Variable
"""
1. Load Dataset
"""

train_dataset = dsets.MNIST(root = './data',
                           train = True,
                           transform=transforms.ToTensor(),
                           download=True)

test_dataset = dsets.MNIST(root = './data',
                           train = False,
                           transform=transforms.ToTensor())

"""
2. Make the dataset iterable

"""
batch_size = 100
n_iters = 3000
num_epochs = n_iters / (len(train_dataset) / batch_size)
num_epochs = int(num_epochs)

train_loader =  torch.utils.data.DataLoader(dataset=train_dataset,
                                     batch_size = batch_size,
                                     shuffle = True)

test_loader =  torch.utils.data.DataLoader(dataset=test_dataset,
                                    batch_size=batch_size,
                                    shuffle = False)

"""
3. Create a Model
"""

class CNN_ModelC(nn.Module):
    def __init__(self):
        super(CNN_ModelC, self).__init__()
        # In a CNN the layer dimensions are harder to infer automatically,
        # so the input and output dimensions are set manually
        
        # 1st Convolution
        self.cnn1 = nn.Conv2d(in_channels = 1,   # input channels: 1 for grayscale, 3 for color
                              out_channels = 16, # number of filters, i.e. 16 output feature maps
                              kernel_size=5,     # size of the filter (5x5)
                              stride=1,          # step by which the filter moves
                              padding=0)         # valid padding: the feature map shrinks from 28x28 to 24x24
        
        
        self.relu1 = nn.ReLU()
        
        # 1st Pooling: Average pooling
        self.avgpool1 = nn.AvgPool2d(kernel_size=2)
        
        # 2nd Convolution
        self.cnn2 = nn.Conv2d(in_channels = 16, 
                              out_channels = 32, 
                              kernel_size=5, 
                              stride=1, 
                              padding=0)
        
        self.relu2 = nn.ReLU()
        
        # 2nd Pooling: Average pooling
        self.avgpool2 = nn.AvgPool2d(kernel_size=2)
        
        # Fully connected 1 (readout)
        self.fc1 = nn.Linear(32*4*4,10) # 4 is because each 5x5 valid convolution shrinks the map by 4
                                        # and each pooling halves it: 28 -> 24 -> 12 -> 8 -> 4
                                        # 32 is the number of featuremaps as inputs
                                        # 10 is the number of possible classes

    def forward(self,x): 
        # Conv Layer 1
        out = self.cnn1(x)
        out = self.relu1(out)
        out = self.avgpool1(out)
        
        # Conv layer 2
        out = self.cnn2(out)
        out = self.relu2(out)
        out = self.avgpool2(out)
        
        # Resize
        # To fit into the linear function you need to flatten the 3D tensor of feature maps into a 1D tensor
        # Original size : (100,32,4,4)
        # out.size(0) = 100
        # new outsize = (100,32*4*4)
        out = out.view(out.size(0),-1)
        
        # Linear function(readout)
        out = self.fc1(out)
        
        return out
    
"""
4. Instantiate the model class
"""

model = CNN_ModelC()
if torch.cuda.is_available():
    model.cuda()

"""
5. Instantiate the Loss class
"""
# Convolution Neural Network: Cross Entropy Loss
    # Feedforward Neural Network : Cross entropy Loss
    # Logistic Regression : Cross entropy loss
    # Linear Regression: MSE
    
criterion = nn.CrossEntropyLoss()

"""
6. Instantiate the optimizer class
"""
learning_rate = 0.05
optimizer = torch.optim.SGD(model.parameters(), lr= learning_rate)


"""
7. Train the model
"""
iter = 0
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        if torch.cuda.is_available():
            images = Variable(images.cuda())
            labels = Variable(labels.cuda())
        else:
            images = Variable(images)
            labels = Variable(labels)
        
        optimizer.zero_grad()
        
        outputs = model(images)
        
        loss = criterion(outputs,labels)
        
        loss.backward()
        
        optimizer.step()
        
        iter += 1
        # Accuracy test
        if iter % 500 == 0:
            correct = 0
            total = 0
            for images, labels in test_loader:
                if torch.cuda.is_available():
                    images = Variable(images.cuda())
                else:
                    images = Variable(images)
                outputs = model(images)
                _, predicted = torch.max(outputs.data, 1)
                
                
                total += labels.size(0)
                if torch.cuda.is_available():
                    correct += (predicted.cpu() == labels.cpu()).sum()
                else:
                    correct += (predicted == labels).sum()
                
            accuracy = 100 * correct / total
            print('Iteration: {}. Loss:{}. Accuracy: {}'.format(iter,loss.data, accuracy))


Iteration: 500. Loss:0.15751191973686218. Accuracy: 92
Iteration: 1000. Loss:0.11110392212867737. Accuracy: 95
Iteration: 1500. Loss:0.2289888560771942. Accuracy: 96
Iteration: 2000. Loss:0.1407839059829712. Accuracy: 96
Iteration: 2500. Loss:0.0230003260076046. Accuracy: 97
Iteration: 3000. Loss:0.03644980862736702. Accuracy: 97


# Expanding CNNs
There are 3 ways to expand a convolutional neural network
1. More convolutional layers
2. Less aggressive downsampling
    - smaller kernel size for pooling (gradually downsampling)
3. More fully connected layers

Cons:
1. It needs a larger dataset
    - Curse of dimensionality
2. Does not necessarily mean higher accuracy
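
As a rough illustration of these options, the sketch below combines an extra convolutional layer, pooling applied only after some of the convolutions, and a second fully connected layer. `ExpandedCNN` and all layer sizes are hypothetical choices for a 28x28 grayscale input; the model is not trained or evaluated here, so no accuracy claim is attached to it.

```python
import torch
import torch.nn as nn

class ExpandedCNN(nn.Module):   # hypothetical name, for illustration only
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(kernel_size=2),                  # 28 -> 14
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(kernel_size=2),                  # 14 -> 7
        )
        self.classifier = nn.Sequential(
            nn.Linear(64 * 7 * 7, 128), nn.ReLU(),        # extra fully connected layer
            nn.Linear(128, 10),                           # readout: 10 classes
        )

    def forward(self, x):
        out = self.features(x)
        out = out.view(out.size(0), -1)   # flatten feature maps per sample
        return self.classifier(out)

logits = ExpandedCNN()(torch.randn(2, 1, 28, 28))
print(logits.shape)   # (2, 10)
```

A larger model like this has many more parameters, which is where the need for more data mentioned in the cons comes from.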

# Summary
Transition from **feedforward neural network**
- addition of convolution + ReLU + pooling layers before the linear layers
- one convolutional layer basics
- One pooling layer basics
    - Max pooling
    - Average pooling
- Padding
- **Output Dimension** Calculations and Examples
    - $O = \frac{W-K+2P}{S}+1$
- Convolutional Neural Networks
    - Model A: 
        - 2 Conv + 2 Max pool +1 FC
        - Same Padding
    - Model B: 
        - 2 Conv + 2 Average pool +1 FC
        - Same Padding
    - Model C: 
        - 2 Conv + 2 Average pool +1 FC
        - Valid Padding
- Model variation in Code
    - Modify only Step 3 in the model
    - Special attention to the size of the input to the linear layer
- Ways to Expand Model Capacity
    - More convolutions
    - Gradual pooling
    - More fully connected layers
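
One way to avoid miscounting the input size of the linear layer is to pass a dummy image through the convolutional part and read off the flattened size. The sketch below assumes the valid-padding layout (5x5 convolutions with `padding=0`, 2x2 average pooling) and a 28x28 grayscale input; the `features` container is introduced only for this example.

```python
import torch
import torch.nn as nn

features = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=5, padding=0), nn.ReLU(), nn.AvgPool2d(2),
    nn.Conv2d(16, 32, kernel_size=5, padding=0), nn.ReLU(), nn.AvgPool2d(2),
)

with torch.no_grad():
    dummy = torch.zeros(1, 1, 28, 28)                # one fake 28x28 grayscale image
    n_flat = features(dummy).view(1, -1).size(1)     # flattened size per sample

print(n_flat)   # 512 = 32 * 4 * 4

fc1 = nn.Linear(n_flat, 10)   # readout layer sized from the measured value
```

If the padding, kernel size, or pooling changes, the measured value changes with it, so the linear layer never has to be re-derived by hand.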
