
# Convolutional Neural Networks (CNNs)

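Convolving means taking the dot product between a filter and the patch of the image under it. Sliding the filter over the image therefore yields a single number for each position in the image.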
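The dot-product view above can be sketched in plain Python. This is a minimal illustration, not the way PyTorch implements it: the helper name `conv2d_valid`, the 4x4 image, and the 2x2 kernel are all made up for the example, and it assumes no padding and a stride of 1. Like `Conv2d`, it does not flip the kernel (strictly speaking this is cross-correlation).

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Dot product between the kernel and the patch whose top-left corner is (i, j)
            value = sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(k_h)
                for dj in range(k_w)
            )
            row.append(value)
        output.append(row)
    return output


# Hypothetical toy data: a 4x4 "image" and a 2x2 filter
image = [
    [1, 2, 3, 0],
    [0, 1, 2, 3],
    [3, 0, 1, 2],
    [2, 3, 0, 1],
]
kernel = [
    [1, 0],
    [0, 1],
]

feature_map = conv2d_valid(image, kernel)
print(feature_map)  # [[2, 4, 6], [0, 2, 4], [6, 0, 2]]
```

Each entry of the 3x3 result is one dot product, so a 4x4 image and a 2x2 filter give one number for each of the 9 valid filter positions.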

In [2]:
'''
Create 10 images with shape (1, 28, 28).
Build 6 convolutional filters of size (3, 3) with stride set to 1 and padding set to 1.
Apply the filters to the images and print the shape of the feature map.

The arguments in Conv2d are in_channels, out_channels, kernel_size, stride, and padding.
The number of in_channels is 1, while the number of out_channels is 6.
'''

# Convolution operator - OOP way
import torch


# Create 10 random images of shape (1, 28, 28)
images = torch.rand(10, 1, 28, 28)

# Build 6 conv. filters
conv_filters = torch.nn.Conv2d(in_channels=1, out_channels=6, kernel_size=3, stride=1, padding=1)

# Convolve the image with the filters 
output_feature = conv_filters(images)
print(output_feature.shape)

torch.Size([10, 6, 28, 28])


In [3]:
# Convolution operator - Functional way
import torch.nn.functional as F

# Create 10 random images
image = torch.rand(10, 1, 28, 28)

# Create 6 filters
filters = torch.rand(6, 1, 3, 3)

# Convolve the image with the filters
output_feature = F.conv2d(image, filters, stride=1, padding=1)
print(output_feature.shape)

torch.Size([10, 6, 28, 28])
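The spatial size of 28 in both outputs follows from the kernel size, stride, and padding. A small helper makes the arithmetic explicit; the function name `conv_output_size` is an assumption for this sketch, and it assumes a dilation of 1 and the same setting along height and width.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Output length along one spatial dimension of a convolution (dilation assumed to be 1)."""
    return (size + 2 * padding - kernel_size) // stride + 1


# Settings used in the cells above: 28x28 input, 3x3 kernel, stride 1, padding 1
same_size = conv_output_size(28, kernel_size=3, stride=1, padding=1)

# Without padding, the feature map shrinks by kernel_size - 1
no_padding = conv_output_size(28, kernel_size=3, stride=1, padding=0)

print(same_size, no_padding)  # 28 26
```

With `padding=1` and a 3x3 kernel the output keeps the input resolution, which is why both cells print `torch.Size([10, 6, 28, 28])`.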


## Pooling operators
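- Convolution is used to extract features from an image, whereas pooling is a form of feature selection: it picks the most dominant features in the image or combines different features. It also lowers the resolution of the image, which makes computation more efficient.
- Max pooling: takes the maximum value in each region of the image.
- Average pooling: takes the average value of each region.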

In [None]:
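# Example input (an assumption, since `im` was not defined above): one random single-channel 6x6 image
im = torch.rand(1, 1, 6, 6)

# Build a pooling operator with size `2`.
max_pooling = torch.nn.MaxPool2d(2)

# Apply the pooling operator (OOP way)
output_feature = max_pooling(im)

# Apply the pooling operator to the image (functional way)
output_feature_F = F.max_pool2d(im, 2)

# Print the results of both cases; they should be identical
print(output_feature)
print(output_feature_F)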

In [None]:
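# Example input (an assumption, since `im` was not defined above): one random single-channel 6x6 image
im = torch.rand(1, 1, 6, 6)

# Build a pooling operator with size `2`.
avg_pooling = torch.nn.AvgPool2d(2)

# Apply the pooling operator (OOP way)
output_feature = avg_pooling(im)

# Apply the pooling operator to the image (functional way)
output_feature_F = F.avg_pool2d(im, 2)

# Print the results of both cases; they should be identical
print(output_feature)
print(output_feature_F)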

# Convolutional Neural Networks
- We first build a class called `AlexNet` that inherits from `nn.Module`. In the `__init__` method we pass the number of classes as an argument, in this case 1000, call the superclass constructor with `super()`, and then declare all the layers we want. There are 5 convolutional layers, from conv_1 to conv_5, each with a different number of filters; the filter counts are taken from the paper. There are 3 pooling layers, but they all share the same kernel_size and stride, so we define the pooling layer only once. Similarly, we define the ReLU nonlinearity once. Finally, there are three fully connected layers, the last of which has one output per class.
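The description above could look roughly like the following sketch. The notes do not list the filter counts or kernel sizes, so the numbers here are an assumption borrowed from the widely used torchvision variant of AlexNet; they may differ from the paper the notes refer to. The attribute names (`conv1` to `conv5`, `fc1` to `fc3`) are also illustrative, and dropout is left out for brevity.

```python
import torch
import torch.nn as nn


class AlexNet(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        # Five convolutional layers (layer sizes are assumed, following the torchvision variant)
        self.conv1 = nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2)
        self.conv2 = nn.Conv2d(64, 192, kernel_size=5, padding=2)
        self.conv3 = nn.Conv2d(192, 384, kernel_size=3, padding=1)
        self.conv4 = nn.Conv2d(384, 256, kernel_size=3, padding=1)
        self.conv5 = nn.Conv2d(256, 256, kernel_size=3, padding=1)

        # The three pooling layers share one configuration, so one module is reused
        self.pool = nn.MaxPool2d(kernel_size=3, stride=2)

        # ReLU has no parameters, so a single instance is reused as well
        self.relu = nn.ReLU()

        # Three fully connected layers; the last one has one output per class
        self.fc1 = nn.Linear(256 * 6 * 6, 4096)
        self.fc2 = nn.Linear(4096, 4096)
        self.fc3 = nn.Linear(4096, num_classes)

    def forward(self, x):
        x = self.pool(self.relu(self.conv1(x)))
        x = self.pool(self.relu(self.conv2(x)))
        x = self.relu(self.conv3(x))
        x = self.relu(self.conv4(x))
        x = self.pool(self.relu(self.conv5(x)))
        # Flatten everything except the batch dimension
        x = x.view(x.size(0), -1)
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        return self.fc3(x)


net = AlexNet(num_classes=1000)
dummy = torch.rand(1, 3, 224, 224)  # one random RGB image; the 224x224 size is assumed
logits = net(dummy)
print(logits.shape)
```

With a 224x224 input, the last pooling layer produces a 256x6x6 feature map, which is where the `256 * 6 * 6` input size of the first fully connected layer comes from.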

## Calculating the number of parameters
- Input Layer: All the input layer does is read the image, so there are no parameters to learn here.
- Convolutional Layer: Consider a convolutional layer which takes “l” feature maps as input and produces “k” feature maps as output, with a filter size of “n*m”.
For example, suppose the layer has l=32 feature maps as input, k=64 feature maps as output, and a filter size of n=3 and m=3. It is important to understand that we do not simply have a 3*3 filter; each filter is actually 3*3*32, because the input has 32 channels. The layer learns 64 different 3*3*32 filters, so the total number of weights is “n*m*l*k”. Each output feature map also has one bias term, so the total number of parameters is “(n*m*l+1)*k”.
- Pooling Layer: There are no parameters to learn in a pooling layer. This layer is only used to reduce the spatial size of the image.
- Fully-connected Layer: In this layer, every input unit has a separate weight to each output unit. For “n” inputs and “m” outputs, the number of weights is “n*m”. Additionally, this layer has a bias for each output node, which gives “(n+1)*m” parameters.
- Output Layer: This layer is a fully connected layer, so it has “(n+1)*m” parameters, where “n” is the number of inputs and “m” is the number of outputs.
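The formulas above are easy to check with a few lines of Python. The helper names `conv_params` and `fc_params` are made up for this sketch, and the fully connected example (128 inputs, 10 outputs) uses hypothetical sizes chosen only for illustration.

```python
def conv_params(n, m, l, k):
    """Parameters of a conv layer: k filters of size n*m*l, plus one bias per filter."""
    return (n * m * l + 1) * k


def fc_params(n, m):
    """Parameters of a fully connected layer: n*m weights, plus one bias per output."""
    return (n + 1) * m


# The convolutional example from the text: 3*3 filters, 32 input maps, 64 output maps
conv_total = conv_params(n=3, m=3, l=32, k=64)

# A hypothetical fully connected layer with 128 inputs and 10 outputs
fc_total = fc_params(n=128, m=10)

print(conv_total)  # 18496
print(fc_total)    # 1290
```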

## Practice
- Your first CNN: the `__init__` method
You are going to build your first convolutional neural network.
You will use the MNIST dataset, which consists of images of handwritten digits from 0 to 9.
- The convolutional neural network is going to have 2 convolutional layers, each followed by a ReLU nonlinearity and a max pooling layer, and then a fully connected layer.
We have already imported torch and torch.nn as nn.
Remember that each pooling layer halves both the height and the width of the image,
so after 2 pooling layers the height and width are 1/4 of the original sizes. MNIST images have shape (1, 28, 28).
- For the moment, you are going to implement the `__init__` method of the net. In the next exercise, you will implement the `.forward()` method.
- NB: We need 2 pooling layers, but we only need to instantiate the pooling layer once, because both pooling layers have the same configuration. We will simply call self.pool twice in the next exercise.
- [Ref](https://stackoverflow.com/questions/42786717/how-to-calculate-the-number-of-parameters-for-convolutional-neural-network)
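To see where the input size of the fully connected layer comes from, one option is to push a dummy MNIST-sized tensor through the layers and record the shapes. This is only a sketch: the layers are created as standalone variables rather than inside the exercise's class, and the `shapes` list is a helper introduced here.

```python
import torch
import torch.nn as nn

# Same layer configuration as in the exercise
conv1 = nn.Conv2d(in_channels=1, out_channels=5, kernel_size=3, padding=1)
conv2 = nn.Conv2d(in_channels=5, out_channels=10, kernel_size=3, padding=1)
relu = nn.ReLU()
pool = nn.MaxPool2d(2, 2)

# One random image with the MNIST shape (1, 28, 28)
x = torch.rand(1, 1, 28, 28)
shapes = [tuple(x.shape)]

# conv keeps 28x28 (padding=1), pooling halves it to 14x14
x = pool(relu(conv1(x)))
shapes.append(tuple(x.shape))

# conv keeps 14x14, pooling halves it to 7x7
x = pool(relu(conv2(x)))
shapes.append(tuple(x.shape))

# Flatten everything except the batch dimension
flattened_size = x.view(x.size(0), -1).shape[1]

print(shapes)          # [(1, 1, 28, 28), (1, 5, 14, 14), (1, 10, 7, 7)]
print(flattened_size)  # 490
```

The flattened size of 490 is the `7 * 7 * 10` passed as the first argument to `nn.Linear`.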

In [5]:
'''
1. Instantiate two convolutional layers: the first one should have 5 output channels, while the second one should have 10 output channels.
The kernel_size for both of them should be 3, and both should use padding=1. Pass the arguments by name (instead of using 1, use padding=1).
2. Instantiate a ReLU() nonlinearity.
3. Instantiate a max pooling layer which halves the size of the image in both directions.
4. Instantiate a fully connected layer which connects the units with the number of classes (we are using MNIST, so there are 10 classes).
'''
'''
Hint
Deduce the input size of the fully connected layer. Images start with shape (1, 28, 28) and two pooling operators (each halving the size of the image) are applied. What is the size of the feature map fed to the fully connected layer (height * width * number_of_channels)?
In the nn.Linear call, number_of_channels is the same as the number of output channels of self.conv2.
MNIST images are grayscale, so they contain one channel.
'''
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        
        # Instantiate two convolutional layers
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=5, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(in_channels=5, out_channels=10, kernel_size=3, padding=1)
        
        # Instantiate the ReLU nonlinearity
        self.relu = nn.ReLU()
        
        # Instantiate a max pooling layer
        self.pool = nn.MaxPool2d(2, 2)
        
        # Instantiate a fully connected layer
        self.fc = nn.Linear(7 * 7 * 10, 10)

In [None]:
# forward
'''
Apply the first convolutional layer, followed by the relu nonlinearity, then in the next line apply the max-pooling layer.
Apply the second convolutional layer, followed by the relu nonlinearity, then in the next line apply the max-pooling layer.
Transform the feature map from 4-dimensional to 2-dimensional space. The first dimension contains the batch size (-1); deduce the second dimension by multiplying the values for height, width and depth.
Apply the fully-connected layer and return the result.
'''
class Net(nn.Module):
    def __init__(self, num_classes=10):
        super(Net, self).__init__()

        # Instantiate the ReLU nonlinearity
        self.relu = nn.ReLU()

        # Instantiate two convolutional layers
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=5, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(in_channels=5, out_channels=10, kernel_size=3, padding=1)

        # Instantiate a max pooling layer
        self.pool = nn.MaxPool2d(2, 2)

        # Instantiate a fully connected layer (one output per class; 10 for MNIST)
        self.fc = nn.Linear(7 * 7 * 10, num_classes)

    def forward(self, x):

        # Apply conv followed by relu, then in next line pool
        x = self.relu(self.conv1(x))
        x = self.pool(x)

        # Apply conv followed by relu, then in next line pool
        x = self.relu(self.conv2(x))
        x = self.pool(x)

        # Prepare the image for the fully connected layer: (batch, 10, 7, 7) -> (batch, 490)
        x = x.view(-1, 7 * 7 * 10)

        # Apply the fully connected layer and return the result
        return self.fc(x)

## Training Convolutional Neural Networks


In [None]:
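# `net`, `criterion`, `optimizer` and `train_loader` are assumed to be defined beforehand
for i, data in enumerate(train_loader, 0):
    # Unpack the batch and reset the gradients left over from the previous step
    inputs, labels = data
    optimizer.zero_grad()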

    # Compute the forward pass
    outputs = net(inputs)
        
    # Compute the loss function
    loss = criterion(outputs, labels)
        
    # Compute the gradients
    loss.backward()
    
    # Update the weights
    optimizer.step()
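The loop above relies on `net`, `criterion`, `optimizer` and `train_loader`, none of which are defined in these notes. The following self-contained sketch shows one way they could be set up. Everything in it is an assumption made for illustration: random tensors stand in for MNIST, the model is a tiny hypothetical network, and the choice of Adam with a learning rate of 3e-4 is arbitrary.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in for MNIST: 64 random "images" with random labels (not real data)
images = torch.rand(64, 1, 28, 28)
labels = torch.randint(0, 10, (64,))
train_loader = DataLoader(TensorDataset(images, labels), batch_size=16, shuffle=True)

# Hypothetical tiny model, kept small so the sketch stays short
net = nn.Sequential(
    nn.Conv2d(1, 5, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2, 2),
    nn.Flatten(),
    nn.Linear(14 * 14 * 5, 10),
)

# Cross-entropy loss for classification and an optimizer over the model parameters
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=3e-4)

losses = []
for i, (inputs, targets) in enumerate(train_loader):
    optimizer.zero_grad()               # reset gradients from the previous step
    outputs = net(inputs)               # forward pass
    loss = criterion(outputs, targets)  # loss on this batch
    loss.backward()                     # compute the gradients
    optimizer.step()                    # update the weights
    losses.append(loss.item())

print(len(losses), losses[-1])
```

With 64 samples and a batch size of 16, the loop performs 4 weight updates per pass over the data.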

In [None]:
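# Iterate over the data in the test_loader
for i, data in enumerate(test_loader):

    # Get the images and labels from data
    images, labels = data

    # Make a forward pass in the net with your images (no gradients are needed for evaluation)
    with torch.no_grad():
        outputs = net(images)

    # Argmax the results of the net: index of the largest score for each image
    _, predicted = torch.max(outputs, 1)

    # Compare element by element so the loop also works for batch sizes larger than 1
    for pred, label in zip(predicted, labels):
        if pred == label:
            print("Yipes, your net made the right prediction " + str(pred.item()))
        else:
            print("Your net prediction was " + str(pred.item()) + ", but the correct label is: " + str(label.item()))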
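Printing one message per image becomes hard to read on a full test set, so a common alternative is to summarize the comparison as a single accuracy value. The sketch below is one way to do that; the `accuracy` helper is not part of the original notes, and the toy check uses `nn.Identity` with hand-written scores in place of a trained network.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def accuracy(net, loader):
    """Fraction of samples in `loader` whose argmax prediction matches the label."""
    net.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for inputs, labels in loader:
            predicted = net(inputs).argmax(dim=1)
            correct += (predicted == labels).sum().item()
            total += labels.size(0)
    return correct / total


# Toy check: nn.Identity passes the hand-written scores straight through
scores = torch.tensor([
    [2.0, 0.1, 0.0],  # argmax 0
    [0.0, 3.0, 0.1],  # argmax 1
    [0.2, 0.1, 1.5],  # argmax 2
    [1.0, 0.0, 0.5],  # argmax 0, but the label below is 2
])
labels = torch.tensor([0, 1, 2, 2])
loader = DataLoader(TensorDataset(scores, labels), batch_size=2)

acc = accuracy(nn.Identity(), loader)
print(acc)  # 0.75
```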