## Convolutional NN
- adds convolution and pooling layers in front of a feedforward NN (FNN)
![title](resource/cnn.png)

![title](resource/cnn_one.png)
- 28*28 gray scale image -> 1 Conv layer


![title](resource/kernel.png)

- the kernel slides (convolves) across the image; each patch takes 2 operations
     - element-wise multiplication + summation
- more kernels = more feature map channels
    - can capture more information about the input
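The two operations per patch can be written out by hand. The snippet below is a minimal sketch, assuming stride 1 and no padding; the helper name `conv2d_manual` and the 4x4 input and 3x3 kernel values are made up for illustration. It compares the hand-written loop with `torch.nn.functional.conv2d`.

```python
import torch
import torch.nn.functional as F

def conv2d_manual(image, kernel):
    """Slide `kernel` over a 2D `image` (stride 1, no padding)."""
    H, W = image.shape
    K = kernel.shape[0]
    out = torch.zeros(H - K + 1, W - K + 1)
    for i in range(H - K + 1):
        for j in range(W - K + 1):
            patch = image[i:i + K, j:j + K]
            # the two operations per patch: element-wise multiply, then sum
            out[i, j] = (patch * kernel).sum()
    return out

# illustrative values only
image = torch.arange(16, dtype=torch.float32).reshape(4, 4)
kernel = torch.tensor([[1., 0., -1.],
                       [1., 0., -1.],
                       [1., 0., -1.]])

manual = conv2d_manual(image, kernel)
# F.conv2d expects (batch, channels, H, W) inputs and (out_ch, in_ch, kH, kW) weights
builtin = F.conv2d(image[None, None], kernel[None, None])[0, 0]

print(manual)                           # 2 x 2 feature map
print(torch.allclose(manual, builtin))  # True
```

One kernel produces one feature map; stacking several kernels along the first weight dimension would give one output channel per kernel.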


- pooling layer: downsamples each feature map
    - max pooling: keep the largest value in each window
    - average pooling: keep the mean of each window
    
![title](resource/max_pooling.png)
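As a small illustration of the two pooling types, the sketch below applies `nn.MaxPool2d` and `nn.AvgPool2d` with a 2x2 window to the same 4x4 input. The input values are made up for the example.

```python
import torch
import torch.nn as nn

# one image, one channel, 4 x 4 (values are illustrative)
x = torch.tensor([[1., 3., 2., 4.],
                  [5., 7., 6., 8.],
                  [9., 2., 1., 0.],
                  [3., 4., 5., 6.]]).reshape(1, 1, 4, 4)

max_out = nn.MaxPool2d(kernel_size=2)(x)  # largest value per 2x2 window
avg_out = nn.AvgPool2d(kernel_size=2)(x)  # mean value per 2x2 window

print(max_out[0, 0])  # [[7, 8], [9, 6]]
print(avg_out[0, 0])  # [[4, 5], [4.5, 3]]
```

Both layers halve the height and width here, because the stride defaults to the kernel size.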

### padding
- valid padding = no padding ($P = 0$)
    - output size < input size
![title](resource/padding.png)

- same padding (zero-pad the border)
    - output size = input size (for stride 1)
![title](resource/padding2.png)
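The effect of the two padding modes on output size can be checked directly. This is a minimal sketch, assuming a 28x28 single-channel input and a 5x5 kernel with stride 1; the variable names are illustrative.

```python
import torch
import torch.nn as nn

x = torch.zeros(1, 1, 28, 28)  # (batch, channels, height, width)

# valid padding: no padding, so the output shrinks
valid_conv = nn.Conv2d(1, 1, kernel_size=5, stride=1, padding=0)
# same padding: pad 2 pixels of zeros on each side for a 5x5 kernel
same_conv = nn.Conv2d(1, 1, kernel_size=5, stride=1, padding=2)

valid_shape = tuple(valid_conv(x).shape)
same_shape = tuple(same_conv(x).shape)

print(valid_shape)  # (1, 1, 24, 24)
print(same_shape)   # (1, 1, 28, 28)
```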

### Dimension Calculations
- $ O = \frac {W - K + 2P}{S} + 1$
    - $O$: output height/length
    - $W$: input height/length
    - $K$: filter size (kernel size)
    - $P$: padding
        - for same padding with $S = 1$ and odd $K$: $ P = \frac{K - 1}{2} $
    - $S$: stride
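The formula translates into a couple of lines of code. The helper names `conv_output_size` and `same_padding` below are hypothetical; the floor division is an assumption that matches how PyTorch rounds when the stride does not divide evenly.

```python
def conv_output_size(w, k, p=0, s=1):
    """O = (W - K + 2P) / S + 1, rounded down when S does not divide evenly."""
    return (w - k + 2 * p) // s + 1

def same_padding(k):
    """P = (K - 1) / 2, valid for odd kernel sizes and stride 1."""
    if k % 2 == 0:
        raise ValueError("same padding via (K - 1) / 2 needs an odd kernel size")
    return (k - 1) // 2

print(conv_output_size(w=4, k=3, p=0, s=1))                # 2
print(conv_output_size(w=5, k=3, p=same_padding(3), s=1))  # 5
print(conv_output_size(w=28, k=5, p=same_padding(5), s=1)) # 28
```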

#### Example 1: Output Dimension Calculation for Valid Padding

- $W = 4$
- $K = 3$
- $P = 0$
- $S = 1$
- $O = \frac {4 - 3 + 2*0}{1} + 1 = \frac {1}{1} + 1 = 1 + 1 = 2 $

-> feature map 2 x 2

#### Example 2: Output Dimension Calculation for Same Padding
- $W = 5$
- $K = 3$
- $P = \frac{3 - 1}{2} = \frac{2}{2} = 1 $
- $S = 1 $
- $O = \frac {5 - 3 + 2*1}{1} + 1 = \frac {4}{1} + 1 = 5$

-> feature map 5 x 5

## Build a convolutional NN
- model A:
    - 2 convolutional layers (same padding)
    - 2 max pooling layers
    - 1 fully connected layer
![title](resource/cnn_1.png)

In [1]:
import datetime
import sys
import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.datasets as dsets
from torch.autograd import Variable

In [2]:
train_dataset = dsets.MNIST(root='./data',
                           train=True,
                           transform=transforms.ToTensor(),
                           download=False)
test_dataset = dsets.MNIST(root='./data',
                           train=False,
                           transform=transforms.ToTensor())

In [3]:
train_dataset.data.size()



torch.Size([60000, 28, 28])

In [4]:
test_dataset.targets.size()



torch.Size([10000])

In [5]:
# make dataset iterable
batch_size = 100
n_iters = 3000
num_epochs = int(n_iters / (len(train_dataset) / batch_size))

train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                          batch_size=batch_size,
                                          shuffle=True)
teset_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size)
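The `num_epochs` value follows from simple arithmetic. The sketch below spells it out, assuming the 60,000-image MNIST training set shown earlier; `iters_per_epoch` is an illustrative name.

```python
train_size = 60000  # number of MNIST training images
batch_size = 100
n_iters = 3000

iters_per_epoch = train_size // batch_size  # 600 batches per pass over the data
num_epochs = n_iters // iters_per_epoch     # 3000 / 600 = 5

print(iters_per_epoch, num_epochs)  # 600 5
```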

In [6]:
# output_formula = ((w - k + 2P)/ S) + 1
# kernel = 5

k = 5
# padding = (k - 1) / 2 = (5 - 1)/2 = 2
p = 2
S = 1
o = (28 - 5 + 2*p)/1 + 1
o

28.0

- pooling with kernel size k = 2 (stride defaults to k)
- o = w/k, e.g. 28/2 = 14
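To see how same-padded convolutions and 2x2 pooling combine, the sketch below pushes a dummy 28x28 input through two 5x5 convolutions (padding 2) and two max-pooling layers and records each shape. The channel counts 16 and 32 are those of model A; the `layers` list and `shapes` dict are illustrative.

```python
import torch
import torch.nn as nn

layers = [
    ("conv1", nn.Conv2d(1, 16, kernel_size=5, stride=1, padding=2)),
    ("pool1", nn.MaxPool2d(kernel_size=2)),
    ("conv2", nn.Conv2d(16, 32, kernel_size=5, stride=1, padding=2)),
    ("pool2", nn.MaxPool2d(kernel_size=2)),
]

x = torch.zeros(1, 1, 28, 28)  # one dummy grayscale image
shapes = {}
for name, layer in layers:
    x = layer(x)
    shapes[name] = tuple(x.shape)
    print(name, shapes[name])
# conv1 (1, 16, 28, 28)
# pool1 (1, 16, 14, 14)
# conv2 (1, 32, 14, 14)
# pool2 (1, 32, 7, 7)
```

The final 32 x 7 x 7 volume is what gets flattened into the 1568 inputs of the fully connected layer.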

![title](resource/cnn_11.png)

In [7]:
class CNNModel(nn.Module):
    def __init__(self):
        super(CNNModel, self).__init__()
        
        # convolution 1
        self.cnn1 = nn.Conv2d(in_channels=1, out_channels=16, # number of kernels = 16 = feature maps
                              kernel_size=5, stride=1, padding=2)
        self.relu1 = nn.ReLU()
        
        # max pool 1
        self.maxpool1 = nn.MaxPool2d(kernel_size=2)
        
        # convolution 2
        self.cnn2 = nn.Conv2d(in_channels=16, out_channels=32, # 16 input feature maps -> 32 output feature maps
                             kernel_size=5, stride=1, padding=2)
        self.relu2 = nn.ReLU()
        
        # max pool2
        self.maxpool2 = nn.MaxPool2d(kernel_size=2)
        
        # fully connected
        self.fc1 = nn.Linear(32 * 7 * 7, 10) 
        # input dim 32(feature map) * 7 * 7
        #output dim = 10
    
    def forward(self, x):
        # conv 1
        out = self.relu1(self.cnn1(x))
        
        #max pool 1
        out = self.maxpool1(out)
        
        # conv 2
        out = self.relu2(self.cnn2(out))
        
        # max pool2
        out = self.maxpool2(out)
        
        # flatten before the fully connected layer
        # original size: (100, 32, 7, 7)
        # out.size(0) = 100 (batch size)
        # new size: (100, 32*7*7)
        out = out.view(out.size(0), -1)
#         print(out.size())
        
        return self.fc1(out)

In [8]:
model = CNNModel()

In [9]:
criterion = nn.CrossEntropyLoss()

learning_rate = 0.01
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

In [10]:
len(list(model.parameters()))

6

In [11]:
# convolution 1, 16 kernels
print(list(model.parameters())[0].size())

# convolution 1 bias, 16 kernels
print(list(model.parameters())[1].size())

# convolution 2, 32 kernels
print(list(model.parameters())[2].size())

# convolution 2 bias, 32 kernels
print(list(model.parameters())[3].size())

# fully connected
print(list(model.parameters())[4].size())

# fully connected bias
print(list(model.parameters())[5].size())

torch.Size([16, 1, 5, 5])
torch.Size([16])
torch.Size([32, 16, 5, 5])
torch.Size([32])
torch.Size([10, 1568])
torch.Size([10])
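The shapes above also give the number of trainable parameters. As a rough sketch, the snippet below rebuilds the three parameterised layers with the same hyperparameters as model A and counts their weights and biases; the `counts` dict is illustrative.

```python
import torch.nn as nn

conv1 = nn.Conv2d(1, 16, kernel_size=5, stride=1, padding=2)
conv2 = nn.Conv2d(16, 32, kernel_size=5, stride=1, padding=2)
fc1 = nn.Linear(32 * 7 * 7, 10)

# numel() is the number of scalars in a parameter tensor
counts = {name: sum(p.numel() for p in layer.parameters())
          for name, layer in [("conv1", conv1), ("conv2", conv2), ("fc1", fc1)]}
total = sum(counts.values())

print(counts)  # conv1: 16*1*5*5 + 16, conv2: 32*16*5*5 + 32, fc1: 10*1568 + 10
print(total)   # 28938
```

Most of the parameters sit in the second convolution and the fully connected layer, not in the first convolution.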


In [15]:
# train
start_time = datetime.datetime.now()

iter = 0
for e in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        images= Variable(images)
        labels = Variable(labels)
        
        optimizer.zero_grad()
        
        outputs = model(images)
        loss = criterion(outputs, labels)
        
        loss.backward()
        
        optimizer.step()
        
        iter += 1
        
        if iter % 500 == 0:
            correct = 0
            total = 0
            
            for images, labels in teset_loader:
                images = Variable(images)
                outputs = model(images)
                
                _, predicted = torch.max(outputs.data, 1)
                total += labels.size(0)
                
                correct += (predicted == labels).sum()
                
            accuracy = 100 * correct/ total
            print('Iteration: {}. Loss: {}. Accuracy: {}'.format(iter, loss.item(), accuracy))

sys.stdout.write('Time '+ str(datetime.datetime.now() - start_time))

Iteration: 500. Loss: 0.0390019528567791. Accuracy: 98
Iteration: 1000. Loss: 0.0318228118121624. Accuracy: 98
Iteration: 1500. Loss: 0.009434083476662636. Accuracy: 97
Iteration: 2000. Loss: 0.007331760134547949. Accuracy: 98
Iteration: 2500. Loss: 0.004547544755041599. Accuracy: 98
Iteration: 3000. Loss: 0.07188664376735687. Accuracy: 98
Time 0:01:42.752246
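The accuracy check inside the loop above runs with gradient tracking still enabled. A possible alternative, sketched here under the assumption that evaluation is factored into a helper, wraps it in `model.eval()` and `torch.no_grad()`. The name `evaluate` and the toy model and data are hypothetical and exist only to make the snippet runnable.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate(model, loader, device="cpu"):
    """Return accuracy in percent over `loader`, without tracking gradients."""
    was_training = model.training
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            predicted = model(images).argmax(dim=1)
            correct += (predicted == labels).sum().item()
            total += labels.size(0)
    model.train(was_training)  # restore the previous mode
    return 100.0 * correct / total

# toy check: identity weights, so the prediction is the index of the larger input
toy_model = nn.Linear(2, 2)
with torch.no_grad():
    toy_model.weight.copy_(torch.eye(2))
    toy_model.bias.zero_()
toy_x = torch.tensor([[1., 0.], [0., 1.], [2., 1.], [0., 3.]])
toy_y = torch.tensor([0, 1, 0, 0])  # last label is deliberately wrong
toy_loader = DataLoader(TensorDataset(toy_x, toy_y), batch_size=2)

print(evaluate(toy_model, toy_loader))  # 75.0
```

Returning a Python float also avoids the integer rounding seen in the printed accuracies.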

## model B

- 2 convolutional layers
- 2 average pooling layers
- 1 fully connected layer

In [20]:
class CNNModel_wa(nn.Module):
    def __init__(self):
        super(CNNModel_wa, self).__init__()
        
        # convolution 1
        self.cnn1 = nn.Conv2d(in_channels=1, out_channels=16, # number of kernels = 16 = feature maps
                              kernel_size=5, stride=1, padding=2)
        self.relu1 = nn.ReLU()
        
        # avg pool 1
        self.avgpool1 = nn.AvgPool2d(kernel_size=2)
        
        # convolution 2
        self.cnn2 = nn.Conv2d(in_channels=16, out_channels=32, # 16 input feature maps -> 32 output feature maps
                             kernel_size=5, stride=1, padding=2)
        self.relu2 = nn.ReLU()
        
        # avg pool2
        self.avgpool2 = nn.AvgPool2d(kernel_size=2)
        
        # fully connected
        self.fc1 = nn.Linear(32 * 7 * 7, 10) 
        # input dim 32(feature map) * 7 * 7
        #output dim = 10
    
    def forward(self, x):
        # conv 1
        out = self.relu1(self.cnn1(x))
        
        # pool 1
        out = self.avgpool1(out)
        
        # conv 2
        out = self.relu2(self.cnn2(out))
        
        # pool2
        out = self.avgpool2(out)
        
        # flatten before the fully connected layer
        # original size: (100, 32, 7, 7)
        # out.size(0) = 100 (batch size)
        # new size: (100, 32*7*7)
        out = out.view(out.size(0), -1)
#         print(out.size())
        
        return self.fc1(out)

In [23]:
model = CNNModel_wa()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters())

# train
start_time = datetime.datetime.now()

iter = 0
for e in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        images= Variable(images)
        labels = Variable(labels)
        
        optimizer.zero_grad()
        
        outputs = model(images)
        loss = criterion(outputs, labels)
        
        loss.backward()
        
        optimizer.step()
        
        iter += 1
        
        if iter % 500 == 0:
            correct = 0
            total = 0
            
            for images, labels in teset_loader:
                images = Variable(images)
                outputs = model(images)
                
                _, predicted = torch.max(outputs.data, 1)
                total += labels.size(0)
                
                correct += (predicted == labels).sum()
                
            accuracy = 100 * correct/ total
            print('Iteration: {}. Loss: {}. Accuracy: {}'.format(iter, loss.item(), accuracy))

sys.stdout.write('Time '+ str(datetime.datetime.now() - start_time))

Iteration: 500. Loss: 0.357828289270401. Accuracy: 96
Iteration: 1000. Loss: 0.1261773407459259. Accuracy: 97
Iteration: 1500. Loss: 0.059222232550382614. Accuracy: 98
Iteration: 2000. Loss: 0.04335835948586464. Accuracy: 98
Iteration: 2500. Loss: 0.2807896137237549. Accuracy: 98
Iteration: 3000. Loss: 0.1934695839881897. Accuracy: 98
Time 0:01:42.567814

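- both models end at ~98% test accuracy in these runs
- max pooling (model A) reaches 98% by iteration 500; average pooling (model B) starts at 96%
- not a controlled comparison: model A used lr = 0.01, model B used Adam's default lr (0.001)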

## model C

- 2 convolutional layers (valid padding)
- 2 max pooling
- 1 fully connected

![title](resource/cnn_2.png)

In [26]:
# o = ((28 - 5 + 0) / 1) + 1 = 24

class CNNModel_wvp(nn.Module):
    def __init__(self):
        super(CNNModel_wvp, self).__init__()
        
        self.conv1 = nn.Conv2d(in_channels=1,
                              out_channels=16,
                              kernel_size=5,
                              stride=1,
                              padding=0)
        self.relu1 = nn.ReLU()
        self.maxpool1 = nn.MaxPool2d(kernel_size=2)
        
        self.conv2 = nn.Conv2d(in_channels=16,
                              out_channels=32,
                              kernel_size=5,
                              stride=1,
                              padding=0)
        self.relu2 = nn.ReLU()
        self.maxpool2 = nn.MaxPool2d(kernel_size=2)
        
        # after two valid 5x5 convs and two 2x2 pools: 28 -> 24 -> 12 -> 8 -> 4
        self.fc1 = nn.Linear(32*4*4, 10)
        
    def forward(self, x):
        out = self.relu1(self.conv1(x))
        out = self.maxpool1(out)
        
        out = self.relu2(self.conv2(out))
        out = self.maxpool2(out)
        
        out = out.view(out.size(0), -1)
        
        return self.fc1(out)
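The `32*4*4` input size of the fully connected layer comes from applying the output-size formula four times. The sketch below traces that arithmetic in plain Python; `conv_out` and `pool_out` are hypothetical helper names.

```python
def conv_out(w, k, p=0, s=1):
    """O = (W - K + 2P) / S + 1"""
    return (w - k + 2 * p) // s + 1

def pool_out(w, k):
    """Pooling with kernel size k and stride k."""
    return w // k

size = 28
trace = [size]
for _ in range(2):                    # two conv + pool stages
    size = conv_out(size, k=5, p=0)   # valid padding
    trace.append(size)
    size = pool_out(size, k=2)
    trace.append(size)

fc_inputs = 32 * size * size          # 32 feature maps of size x size

print(trace)      # [28, 24, 12, 8, 4]
print(fc_inputs)  # 512
```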

In [27]:
model = CNNModel_wvp()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters())

# train
start_time = datetime.datetime.now()

iter = 0
for e in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        images= Variable(images)
        labels = Variable(labels)
        
        optimizer.zero_grad()
        
        outputs = model(images)
        loss = criterion(outputs, labels)
        
        loss.backward()
        
        optimizer.step()
        
        iter += 1
        
        if iter % 500 == 0:
            correct = 0
            total = 0
            
            for images, labels in teset_loader:
                images = Variable(images)
                outputs = model(images)
                
                _, predicted = torch.max(outputs.data, 1)
                total += labels.size(0)
                
                correct += (predicted == labels).sum()
                
            accuracy = 100 * correct/ total
            print('Iteration: {}. Loss: {}. Accuracy: {}'.format(iter, loss.item(), accuracy))

sys.stdout.write('Time '+ str(datetime.datetime.now() - start_time))

Iteration: 500. Loss: 0.14758828282356262. Accuracy: 96
Iteration: 1000. Loss: 0.03974378481507301. Accuracy: 98
Iteration: 1500. Loss: 0.027911577373743057. Accuracy: 98
Iteration: 2000. Loss: 0.012288711965084076. Accuracy: 98
Iteration: 2500. Loss: 0.04452335461974144. Accuracy: 98
Iteration: 3000. Loss: 0.00675430940464139. Accuracy: 98
Time 0:01:27.655863

- ways to expand a convolutional NN
    - more convolutional layers
    - less aggressive downsampling (smaller pooling kernel size)
    - more fully connected layers
    
    - cons
        - needs a larger dataset
        - does not necessarily mean higher accuracy
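As one possible illustration of these options, the sketch below adds a third convolutional layer and a second fully connected layer to the model A layout. The class name `DeeperCNN`, the 64-channel 3x3 convolution, and the 128-unit hidden layer are all assumptions made for the example, not a tested or recommended architecture.

```python
import torch
import torch.nn as nn

class DeeperCNN(nn.Module):
    """Hypothetical variant: one extra conv layer and one extra FC layer."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool2d(kernel_size=2),                              # 28 -> 14
            nn.Conv2d(16, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool2d(kernel_size=2),                              # 14 -> 7
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),   # stays 7 (no pooling)
        )
        self.classifier = nn.Sequential(
            nn.Linear(64 * 7 * 7, 128), nn.ReLU(),
            nn.Linear(128, 10),
        )

    def forward(self, x):
        out = self.features(x)
        out = out.view(out.size(0), -1)  # flatten to (batch, 64*7*7)
        return self.classifier(out)

logits = DeeperCNN()(torch.zeros(4, 1, 28, 28))
print(logits.shape)  # torch.Size([4, 10])
```

Whether such a model actually improves on the 98% seen here would have to be measured; more capacity does not guarantee higher accuracy.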

In [31]:
model = CNNModel()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters())

# train
start_time = datetime.datetime.now()

iter = 0
for e in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        images= Variable(images).requires_grad_().to(device)
        labels = Variable(labels).to(device)
        
        optimizer.zero_grad()
        
        outputs = model(images)
        loss = criterion(outputs, labels)
        
        loss.backward()
        
        optimizer.step()
        
        iter += 1
        
        if iter % 500 == 0:
            correct = 0
            total = 0
            
            for images, labels in teset_loader:
                images = Variable(images).to(device)  # test batch must be on the same device as the model
                outputs = model(images)
                
                _, predicted = torch.max(outputs.data, 1)
                total += labels.size(0)
                
                correct += (predicted.cpu() == labels.cpu()).sum()
                
            accuracy = 100 * correct/ total
            print('Iteration: {}. Loss: {}. Accuracy: {}'.format(iter, loss.item(), accuracy))

sys.stdout.write('Time '+ str(datetime.datetime.now() - start_time))

Iteration: 500. Loss: 0.11380062997341156. Accuracy: 97
Iteration: 1000. Loss: 0.056805793195962906. Accuracy: 98
Iteration: 1500. Loss: 0.019140100106596947. Accuracy: 98
Iteration: 2000. Loss: 0.11289659142494202. Accuracy: 98
Iteration: 2500. Loss: 0.0036157798022031784. Accuracy: 98
Iteration: 3000. Loss: 0.026368223130702972. Accuracy: 98
Time 0:01:52.323294