# 9. Convolutional Neural Network with PyTorch

## 1. About Convolutional Neural Networks

### 1.1 Transition From Feedforward Neural Network

#### 1 Hidden Layer Feedforward Neural Network

<img src='./images/09-01.png'>

#### Basic Convolutional Neural Network

- Adds **convolution** and **pooling** layers **before the feedforward neural network**
- **Fully connected layer**: a layer with a **linear function and a non-linearity**

<img src='./images/09-02.png'>

### 1.2 One Convolutional Layer: High Level View

- Input Depth = 1 <==> Single-Channel (Grayscale) Image

<img src = "./images/09-03.png">

<img src = "./images/09-04.png">

<img src = "./images/09-05.png">

<img src = "./images/09-06.png">

<img src = "./images/09-07.png">

<img src = "./images/09-03.png">

- Input Depth = 3 <==> RGB, not gray scale

<img src = "./images/09-08.png">

- Using the same filter

<img src = "./images/09-09.png">

<img src = "./images/09-10.png">

<img src = "./images/09-11.png">

<img src = "./images/09-12.png" width=10%>

<img src = "./images/09-08.png">

### 1.2 One Convolutional Layer: High Level View Summary

<img src = "./images/09-03.png">

- As the **kernel is sliding/convolving** across the image $\rightarrow$ 2 operations done **per patch**
    1. Element-wise multiplication
    2. Summation
- More **kernels** = more **feature map channels**
    - Can capture **more information** about the input
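
As a rough illustration of these two per-patch operations, the sketch below slides a kernel across a single-channel image in plain Python, with stride 1 and no padding. The function name `convolve2d_valid` and the sample values are assumptions made for this example; PyTorch's `nn.Conv2d` performs the same sliding multiply-and-sum far more efficiently and also adds a bias term.

```python
def convolve2d_valid(image, kernel):
    """Slide a square `kernel` over `image` (lists of lists), stride 1, no padding."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for di in range(k):
                for dj in range(k):
                    # 1. element-wise multiplication, 2. summation
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Hypothetical 4 x 4 input and 2 x 2 kernel
image = [
    [1, 2, 3, 0],
    [0, 1, 2, 3],
    [3, 0, 1, 2],
    [2, 3, 0, 1],
]
kernel = [
    [1, 0],
    [0, 1],
]

feature_map = convolve2d_valid(image, kernel)
print(feature_map)  # [[2, 4, 6], [0, 2, 4], [6, 0, 2]]
```

One kernel produces one feature map (here 3 x 3); each additional kernel would add one more feature map channel.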

### 1.3 Multiple Convolutional Layers: High Level View

- As the network gets deeper $\uparrow$, the number of kernels per layer typically $\uparrow$ $\Rightarrow$ more information captured

<img src = "./images/09-13.png">

### 1.4 Pooling Layer: High Level View

- 2 Common Types
    - Max Pooling
    - Average Pooling
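
A minimal sketch of both pooling types, assuming a 2 x 2 window that moves in steps of 2 (non-overlapping windows). The helper `pool2d` and the sample values are hypothetical and only meant to show the arithmetic.

```python
def pool2d(feature_map, size=2, mode="max"):
    """Non-overlapping pooling: a `size` x `size` window with stride equal to `size`."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            window = [feature_map[i + di][j + dj]
                      for di in range(size) for dj in range(size)]
            if mode == "max":
                row.append(max(window))                  # max pooling
            else:
                row.append(sum(window) / len(window))    # average pooling
        pooled.append(row)
    return pooled


# Hypothetical 4 x 4 feature map
feature_map = [
    [1, 3, 2, 4],
    [5, 7, 6, 8],
    [4, 2, 0, 1],
    [3, 1, 2, 5],
]

max_pooled = pool2d(feature_map, mode="max")
avg_pooled = pool2d(feature_map, mode="avg")
print(max_pooled)  # [[7, 8], [4, 5]]
print(avg_pooled)  # [[4.0, 5.0], [2.5, 2.0]]
```

Both variants halve the height and width; they differ only in how each window is summarized.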

<img src = "./images/09-14.png">

<img src = "./images/09-15.png">

<img src = "./images/09-16.png">

<img src = "./images/09-17.png">

<img src = "./images/09-18.png">

<img src = "./images/09-19.png">

<img src = "./images/09-20.png">

<img src = "./images/09-21.png">

<img src = "./images/09-22.png">

### 1.5 Multiple Pooling Layers: High Level View

<img src = "./images/09-13.png">

### 1.6 Padding

<img src = "./images/09-23.png">

<img src = "./images/09-24.png">

<img src = "./images/09-25.png">

<img src = "./images/09-26.png">

<img src = "./images/09-27.png">

<img src = "./images/09-28.png">

<img src = "./images/09-29.png">

<img src = "./images/09-30.png">

<img src = "./images/09-31.png">

<img src = "./images/09-32.png">

<img src = "./images/09-33.png">

<img src = "./images/09-34.png">

### 1.7 Padding Summary

- **Valid** Padding (No Padding)
    - Output size < Input Size
- **Same** Padding (Zero Padding)
    - Output size = Input Size
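
To make the difference concrete, the sketch below adds a border of zeros around a small image, which is what same padding does before the kernel slides over it. The helper `zero_pad` and the sample image are assumptions for illustration only.

```python
def zero_pad(image, p):
    """Surround a 2D list with a border of `p` zeros on every side."""
    width = len(image[0]) + 2 * p
    padded = [[0] * width for _ in range(p)]            # top border
    for row in image:
        padded.append([0] * p + list(row) + [0] * p)    # left and right borders
    padded.extend([0] * width for _ in range(p))        # bottom border
    return padded


# Hypothetical 3 x 3 image
image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]

padded = zero_pad(image, 1)
for row in padded:
    print(row)
```

With a 3 x 3 kernel and stride 1, the padded 5 x 5 input yields a 3 x 3 output, the same size as the original image. Without the border, the output would shrink to 1 x 1.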

### 1.8 Dimension Calculations

- $O = \frac{W-K+2P}{S} + 1$
    - $O$: output height/length
    - $W$: input height/length
    - $K$: filter size (kernel size)
    - $P$: padding
        - For same padding with $S = 1$ and odd $K$: $P = \frac{K-1}{2}$
    - $S$: stride
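
The formula translates directly into a small helper. This is a sketch with hypothetical function names (`conv_output_size`, `same_padding`); it uses floor division, which matches how PyTorch rounds when the division is not exact.

```python
def conv_output_size(w, k, p=0, s=1):
    """O = (W - K + 2P) / S + 1, rounded down when the division is not exact."""
    return (w - k + 2 * p) // s + 1


def same_padding(k):
    """Padding that keeps the output size equal to the input size (stride 1, odd K)."""
    return (k - 1) // 2


print(conv_output_size(w=4, k=3, p=0, s=1))                 # valid padding -> 2
print(conv_output_size(w=5, k=3, p=same_padding(3), s=1))   # same padding  -> 5
```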

#### Example 1: Output Dimension Calculation for Valid Padding

<img src = "./images/09-35.png">

- $W = 4$
- $K = 3$
- $P = 0$
- $S = 1$
- $O = \frac{4-3+2*0}{1} + 1 = \frac{1}{1} + 1 = 1 + 1 = 2$

#### Example 2: Output Dimension Calculation for Same Padding

<img src = "./images/09-27.png">

- $W = 5$
- $K = 3$
- $P = \frac{3-1}{2} = \frac{2}{2} = 1$
- $S = 1$
- $O = \frac{5-3+2*1}{1} + 1 = \frac{4}{1} + 1 = 5$

## 2. Building a Convolutional Neural Network with PyTorch

### Model A:

- 2 Convolutional Layers
    - Same Padding (same output size)
- 2 Max Pooling Layers
- 1 Fully Connected Layer

<img src = "./images/09-36.png">

### Steps
- Step 1: Load Dataset
- Step 2: Make Dataset Iterable
- Step 3: Create Model Class
- Step 4: Instantiate Model Class
- Step 5: Instantiate Loss Class
- Step 6: Instantiate Optimizer Class
- Step 7: Train Model

### Step 1: Loading MNIST Train Dataset
#### Images of Digits 0 to 9

In [1]:
import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.datasets as dsets
from torch.autograd import Variable

In [2]:
train_dataset = dsets.MNIST(root = './data',
                            train = True,
                            transform = transforms.ToTensor(),
                            download = True)

test_dataset = dsets.MNIST(root = './data',
                           train = False,
                           transform = transforms.ToTensor())

In [3]:
print (train_dataset.train_data.size())

torch.Size([60000, 28, 28])


In [4]:
print (train_dataset.train_labels.size())

torch.Size([60000])


In [5]:
print (test_dataset.test_data.size())

torch.Size([10000, 28, 28])


In [6]:
print (test_dataset.test_labels.size())

torch.Size([10000])


### Step 2: Make Dataset Iterable

In [7]:
batch_size = 100
n_iters = 3000
num_epochs = n_iters / (len(train_dataset) / batch_size)
num_epochs = int(num_epochs)

train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size,
                                           shuffle=True)
test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)
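
The number of epochs follows from simple arithmetic on the values above. A plain-Python sketch, assuming the 60,000 training images reported in Step 1 (the variable `iters_per_epoch` is introduced here for illustration):

```python
train_size = 60000   # number of MNIST training images printed in Step 1
batch_size = 100
n_iters = 3000

# One epoch is one full pass over the training set
iters_per_epoch = train_size // batch_size
num_epochs = n_iters // iters_per_epoch

print(iters_per_epoch)  # 600
print(num_epochs)       # 5
```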

### Step 3: Create Model Class

<img src = "./images/09-36.png">

#### Output Formula for Convolution
- $O = \frac{W-K+2P}{S} + 1$
    - $O$: output height/length
    - $W$: input height/length
    - $K$: **filter size (kernel size) = 5**
    - $P$: **same padding (non-zero)**
        - $P = \frac{K-1}{2} = \frac{5-1}{2} = 2$
    - $S$: **stride = 1**

#### Output Formula for Pooling
- $O = \frac{W}{K}$
    - $W$: input height/width
    - $K$: **filter size = 2**
    - Assumes the stride equals the filter size, which is the default for PyTorch pooling layers

<img src = "./images/09-37.png">
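
The two formulas can be chained to trace how the spatial size changes through Model A. This is a plain-Python sketch under the settings stated above (kernel 5, padding 2, stride 1 for convolutions; 2 x 2 pooling); the helper names `conv_out` and `pool_out` are assumptions.

```python
def conv_out(w, k=5, p=2, s=1):
    """Convolution output size: (W - K + 2P) / S + 1."""
    return (w - k + 2 * p) // s + 1


def pool_out(w, k=2):
    """Pooling output size when the stride equals the filter size: W / K."""
    return w // k


size = 28          # MNIST images are 28 x 28
sizes = [size]
for layer in ("conv", "pool", "conv", "pool"):
    size = conv_out(size) if layer == "conv" else pool_out(size)
    sizes.append(size)

# 32 feature maps leave the second convolution
fc_inputs = 32 * size * size

print(sizes)      # [28, 28, 14, 14, 7]
print(fc_inputs)  # 1568
```

The final value is the number of inputs the fully connected layer must accept.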

In [8]:
class CNNModel(nn.Module):
    def __init__(self):
        super(CNNModel, self).__init__()
        
        # Convolution 1
        # in_channels=1 => MNIST images are grayscale (single channel)
        # out_channels = 16 feature maps
        self.cnn1 = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=5, stride=1, padding=2)
        self.relu1 = nn.ReLU()
        
        # Max pool 1
        self.maxpool1 = nn.MaxPool2d(kernel_size=2)
        
        # Convolution 2
        self.cnn2 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=5, stride=1, padding=2)
        self.relu2 = nn.ReLU()
        
        # Max pool 2
        self.maxpool2 = nn.MaxPool2d(kernel_size=2)
        
        # Fully connected 1 (readout)
        # 32 * 7 * 7 is calculated above
        self.fc1 = nn.Linear(32*7*7, 10)
        
    def forward(self, x):
        # Convolution 1
        out = self.cnn1(x)
        out = self.relu1(out)
        
        # Max pool 1
        out = self.maxpool1(out)
        
        # Convolution 2
        out = self.cnn2(out)
        out = self.relu2(out)
        
        # Max pool 2
        out = self.maxpool2(out)
        
        # Resize
        # Original size : (100, 32, 7, 7)
        # out.size(0): 100
        # New out size: (100, 32*7*7)
        out = out.view(out.size(0), -1)
        
        # Linear function (readout)
        out = self.fc1(out)
        
        return out        

### Step 4: Instantiate Model Class

In [9]:
model = CNNModel()

### Step 5: Instantiate Loss Class

- Convolutional Neural Network: **Cross Entropy Loss**
    - *Feedforward Neural Network*: **Cross Entropy Loss**
    - *Logistic Regression*: **Cross Entropy Loss**
    - *Linear Regression*: **MSE**
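
For intuition, cross entropy loss on raw scores (logits) amounts to a softmax followed by the negative log of the probability assigned to the true class, which is why the model's `forward` returns logits without a softmax layer. The sketch below computes it by hand for a single sample; the function `cross_entropy` and the sample logits are hypothetical.

```python
import math


def cross_entropy(logits, label):
    """Softmax followed by negative log-likelihood for one sample."""
    shift = max(logits)                           # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]             # softmax probabilities
    return -math.log(probs[label])


logits = [2.0, 1.0, 0.1]   # hypothetical scores for 3 classes

loss_correct = cross_entropy(logits, 0)   # true class has the highest score
loss_wrong = cross_entropy(logits, 2)     # true class has the lowest score

print(round(loss_correct, 3))  # 0.417
print(round(loss_wrong, 3))    # 2.317
```

The loss is small when the true class receives the largest score and grows as the model puts probability on other classes.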

In [10]:
criterion = nn.CrossEntropyLoss()

### Step 6: Instantiate Optimizer Class

- Simplified equation
    - $\theta = \theta - \eta \cdot \nabla_{\theta}$
        - $\theta$: parameters (our variables)
        - $\eta$: learning rate (how fast we want to learn)
        - $\nabla_{\theta}$: parameters' gradients
        
    - Even simpler equation
        - parameters = parameters - learning_rate * parameters_gradients
        - **At every iteration, we update our model's parameters**
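
The update rule can be tried on a toy problem in plain Python. This is a sketch, not how `torch.optim.SGD` is implemented internally; the function `sgd_step` and the objective $f(\theta) = \theta^2$ are assumptions chosen because the gradient, $2\theta$, is easy to write by hand.

```python
def sgd_step(params, grads, learning_rate):
    """parameters = parameters - learning_rate * parameters_gradients"""
    return [p - learning_rate * g for p, g in zip(params, grads)]


# Minimize f(theta) = theta^2, whose gradient is 2 * theta
theta = [1.0]
for _ in range(3):
    grads = [2 * t for t in theta]
    theta = sgd_step(theta, grads, learning_rate=0.1)
    print(theta)

# theta moves toward the minimum at 0: roughly 0.8, then 0.64, then 0.512
```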

In [11]:
learning_rate = 0.01

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

### Parameters In-Depth

In [12]:
print (model.parameters())

<generator object Module.parameters at 0x10a158d00>


In [13]:
print (len(list(model.parameters())))

6


In [14]:
# Convolution 1: 16 kernels
# Each kernel is 5 x 5 with depth 1,
# because MNIST images have a single input channel
print (list(model.parameters())[0].size())

torch.Size([16, 1, 5, 5])


In [15]:
# Convolution 1 bias: one value per kernel (16)
print (list(model.parameters())[1].size())

torch.Size([16])


In [16]:
# Convolution 2: 32 kernels with depth = 16
# Kernel depth matches the number of input channels:
# the input to this layer has shape (16, 14, 14)
print (list(model.parameters())[2].size())

torch.Size([32, 16, 5, 5])


In [17]:
# Convolution 2 bias: one value per kernel (32)
print (list(model.parameters())[3].size())

torch.Size([32])


In [18]:
# Fully Connected Layer 1
print (list(model.parameters())[4].size())

torch.Size([10, 1568])


In [19]:
# Fully Connected Layer Bias
print (list(model.parameters())[5].size())

torch.Size([10])
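
The six shapes printed above are enough to count the model's learnable parameters. A short plain-Python sketch (the shapes are copied from the outputs above; the layer labels are added here for readability):

```python
import math

# (label, shape) pairs copied from the printed parameter sizes
shapes = [
    ("conv1 weight", (16, 1, 5, 5)),
    ("conv1 bias", (16,)),
    ("conv2 weight", (32, 16, 5, 5)),
    ("conv2 bias", (32,)),
    ("fc1 weight", (10, 1568)),
    ("fc1 bias", (10,)),
]

# Number of values in each tensor is the product of its dimensions
counts = [math.prod(shape) for _, shape in shapes]
total_params = sum(counts)

for (label, _), count in zip(shapes, counts):
    print(f"{label}: {count}")
print(f"total: {total_params}")  # total: 28938
```

More than half of the parameters sit in the single fully connected layer, even though the convolutions do most of the feature extraction.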


### Step 7: Train Model

- Process
    1. **Convert inputs/labels to variable**
        - CNN Input: (1, 28, 28)
        - Feedforward NN Input: (1, 28\*28)
    2. Clear gradient buffers
    3. Get output given inputs
    4. Get loss
    5. Get gradients w.r.t. parameters
    6. Update parameters using gradients
        - parameters = parameters - learning_rate * parameters_gradients
    7. REPEAT

In [20]:
iter = 0
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        # Load images as Variable
        # No need to change dimensions, such as images.view(-1, 28*28)
        # images.shape = (100, 1, 28, 28)
        images = Variable(images)
        labels = Variable(labels)
        
        # Clear gradients w.r.t. parameters
        optimizer.zero_grad()
        
        # Forward pass to get output/logits
        outputs = model(images)
        
        # Calculate Loss: softmax --> cross entropy loss
        loss = criterion(outputs, labels)
        
        # Getting gradients w.r.t. parameters
        loss.backward()
        
        # Updating parameters
        optimizer.step()
        
        iter += 1
        
        if iter % 500 == 0:
            # Calculate Accuracy
            correct = 0
            total = 0
            # Iterate through test dataset
            for images, labels in test_loader:
                # Load images to a Torch Variable
                images = Variable(images)
                
                # Forward pass only to get logits/output
                outputs = model(images)
                
                # Get predictions from the maximum value
                _, predicted = torch.max(outputs.data, 1)
                
                # Total number of labels
                total += labels.size(0)
                
                # Total correct predictions
                correct += (predicted == labels).sum()
            
            accuracy = 100 * int(correct) / int(total)
            
            # Print Loss
            print (f'Iteration: {iter}, Loss: {loss.item()}, Accuracy: {accuracy}')      

Iteration: 500, Loss: 0.2834421694278717, Accuracy: 90.04
Iteration: 1000, Loss: 0.21440020203590393, Accuracy: 92.64
Iteration: 1500, Loss: 0.18138611316680908, Accuracy: 94.64
Iteration: 2000, Loss: 0.07699497044086456, Accuracy: 95.89
Iteration: 2500, Loss: 0.08896749466657639, Accuracy: 96.3
Iteration: 3000, Loss: 0.12304940074682236, Accuracy: 96.72
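
The accuracy printed every 500 iterations is the share of test images whose largest logit matches the label. Below is a plain-Python sketch of that computation with made-up logits and labels; the training loop itself uses `torch.max(outputs.data, 1)` for the same purpose, and the helper `predict` is hypothetical.

```python
def predict(logits_batch):
    """Return the index of the largest logit for each sample."""
    return [max(range(len(row)), key=lambda i: row[i]) for row in logits_batch]


# Hypothetical logits for 4 samples and 3 classes
logits_batch = [
    [0.1, 2.0, -1.0],
    [1.5, 0.2, 0.3],
    [0.0, 0.1, 3.0],
    [0.9, 0.8, 0.1],
]
labels = [1, 0, 2, 1]

predicted = predict(logits_batch)
correct = sum(p == y for p, y in zip(predicted, labels))
accuracy = 100 * correct / len(labels)

print(predicted)  # [1, 0, 2, 0]
print(accuracy)   # 75.0
```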


### Model B:

- 2 Convolutional Layers
    - Same Padding (same output size)
- 2 **Average Pooling** Layers
- 1 Fully Connected Layer

<img src="./images/09-38.png">

<img src="./images/09-39.png">

### Steps
- Step 1: Load Dataset
- Step 2: Make Dataset Iterable
- Step 3: Create Model Class
- Step 4: Instantiate Model Class
- Step 5: Instantiate Loss Class
- Step 6: Instantiate Optimizer Class
- Step 7: Train Model

In [21]:
import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.datasets as dsets
from torch.autograd import Variable

'''
STEP 1: LOADING DATASET
'''

train_dataset = dsets.MNIST(root='./data',
                            train=True,
                            transform=transforms.ToTensor(),
                            download=True)

test_dataset = dsets.MNIST(root='./data',
                           train=False,
                           transform=transforms.ToTensor())

'''
STEP 2: MAKING DATASET ITERABLE
'''

batch_size = 100
n_iters = 3000
num_epochs = n_iters / (len(train_dataset) / batch_size)
num_epochs = int(num_epochs)

train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size,
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)

'''
STEP 3: CREATE MODEL CLASS
'''
class CNNModel(nn.Module):
    
    def __init__(self):
        super(CNNModel, self).__init__()
        
        # Convolution 1
        self.cnn1 = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=5, stride=1, padding=2)
        self.relu1 = nn.ReLU()
        
        # Average pooling 1
        self.avgpool1 = nn.AvgPool2d(kernel_size=2)
        
        # Convolution 2
        self.cnn2 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=5, stride=1, padding=2)
        self.relu2 = nn.ReLU()
        
        # Average pooling 2
        self.avgpool2 = nn.AvgPool2d(kernel_size=2)
        
        # Fully connected 1 (readout)
        self.fc1 = nn.Linear(32 * 7 * 7, 10)

    def forward(self, x):
        # Convolution 1
        out = self.cnn1(x)
        out = self.relu1(out)
        
        # Average pool 1
        out = self.avgpool1(out)
        
        # Convolution 2
        out = self.cnn2(out)
        out = self.relu2(out)
        
        # Average pool 2
        out = self.avgpool2(out)
        
        # Resize
        # Original size: (100, 32, 7, 7)
        # out.size(0): 100
        # New out size: (100, 32*7*7)
        out = out.view(out.size(0), -1)
        
        # Linear function (readout)
        out = self.fc1(out)
        
        return out
    
'''
STEP 4: INSTANTIATE MODEL CLASS
'''

model = CNNModel()

'''
STEP 5: INSTANTIATE LOSS CLASS
'''
criterion = nn.CrossEntropyLoss()

'''
STEP 6: INSTANTIATE OPTIMIZER CLASS
'''
learning_rate = 0.01

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

'''
STEP 7: TRAIN THE MODEL
'''
iter = 0
for epoch in range(num_epochs):
    
    for i, (images, labels) in enumerate(train_loader):
        
        # Load images as Variable
        images = Variable(images)
        labels = Variable(labels)
        
        # Clear gradients w.r.t. parameters
        optimizer.zero_grad()
        
        # Forward pass to get output/logits
        outputs = model(images)
        
        # Calculate Loss: softmax --> cross entropy loss
        loss = criterion(outputs, labels)
        
        # Getting gradients w.r.t. parameters
        loss.backward()
        
        # Updating parameters
        optimizer.step()
        
        iter += 1
        
        if iter % 500 == 0:
            # Calculate Accuracy
            correct = 0
            total = 0
            # Iterate through test dataset
            for images, labels in test_loader:
                # Load images to a Torch Variable
                images = Variable(images)
                
                # Forward pass only to get logits/output
                outputs = model(images)
                
                # Get predictions from the maximum value
                _, predicted = torch.max(outputs.data, 1)
                
                # Total number of labels
                total += labels.size(0)
                
                # Total correct predictions
                correct += (predicted == labels).sum()
            
            accuracy = 100 * int(correct) / int(total)
            
            # Print Loss
            print (f'Iteration: {iter}, Loss: {loss.item()}, Accuracy: {accuracy}')                   

Iteration: 500, Loss: 0.5866946578025818, Accuracy: 85.28
Iteration: 1000, Loss: 0.3715739846229553, Accuracy: 89.62
Iteration: 1500, Loss: 0.28837332129478455, Accuracy: 90.27
Iteration: 2000, Loss: 0.21806347370147705, Accuracy: 91.62
Iteration: 2500, Loss: 0.20227570831775665, Accuracy: 92.53
Iteration: 3000, Loss: 0.2200961858034134, Accuracy: 93.69


### Average Pooling Test Accuracy < Max Pooling Test Accuracy (Generally)

### Model C:

- 2 Convolutional Layers
    - **Valid Padding** (smaller output size)
- 2 **Max Pooling** Layers
- 1 Fully Connected Layer

<img src="./images/09-40.png">

<img src="./images/09-41.png">
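
With valid padding each 5 x 5 convolution shrinks the feature map by 4 pixels per side length, so the fully connected layer receives fewer inputs than in Model A. A plain-Python sketch of the size trace, with hypothetical helper names `conv_out` and `pool_out`:

```python
def conv_out(w, k=5, p=0, s=1):
    """Convolution output size with valid padding (P = 0): (W - K + 2P) / S + 1."""
    return (w - k + 2 * p) // s + 1


def pool_out(w, k=2):
    """Pooling output size when the stride equals the filter size: W / K."""
    return w // k


size = 28          # MNIST images are 28 x 28
sizes = [size]
for layer in ("conv", "pool", "conv", "pool"):
    size = conv_out(size) if layer == "conv" else pool_out(size)
    sizes.append(size)

# 32 feature maps leave the second convolution
fc_inputs = 32 * size * size

print(sizes)      # [28, 24, 12, 8, 4]
print(fc_inputs)  # 512
```

This is why the readout layer in Model C takes `32 * 4 * 4` inputs instead of `32 * 7 * 7`.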

### Steps
- Step 1: Load Dataset
- Step 2: Make Dataset Iterable
- Step 3: Create Model Class
- Step 4: Instantiate Model Class
- Step 5: Instantiate Loss Class
- Step 6: Instantiate Optimizer Class
- Step 7: Train Model

In [22]:
import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.datasets as dsets
from torch.autograd import Variable

'''
STEP 1: LOADING DATASET
'''

train_dataset = dsets.MNIST(root='./data', 
                            train=True, 
                            transform=transforms.ToTensor(),
                            download=True)

test_dataset = dsets.MNIST(root='./data', 
                           train=False, 
                           transform=transforms.ToTensor())

'''
STEP 2: MAKING DATASET ITERABLE
'''

batch_size = 100
n_iters = 3000
num_epochs = n_iters / (len(train_dataset) / batch_size)
num_epochs = int(num_epochs)

train_loader = torch.utils.data.DataLoader(dataset=train_dataset, 
                                           batch_size=batch_size, 
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset, 
                                          batch_size=batch_size, 
                                          shuffle=False)

'''
STEP 3: CREATE MODEL CLASS
'''
class CNNModel(nn.Module):
    def __init__(self):
        super(CNNModel, self).__init__()
        
        # Convolution 1
        self.cnn1 = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=5, stride=1, padding=0)
        self.relu1 = nn.ReLU()
        
        # Max pool 1
        self.maxpool1 = nn.MaxPool2d(kernel_size=2)
     
        # Convolution 2
        self.cnn2 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=5, stride=1, padding=0)
        self.relu2 = nn.ReLU()
        
        # Max pool 2
        self.maxpool2 = nn.MaxPool2d(kernel_size=2)
        
        # Fully connected 1 (readout)
        self.fc1 = nn.Linear(32 * 4 * 4, 10) 
    
    def forward(self, x):
        # Convolution 1
        out = self.cnn1(x)
        out = self.relu1(out)
        
        # Max pool 1
        out = self.maxpool1(out)
        
        # Convolution 2 
        out = self.cnn2(out)
        out = self.relu2(out)
        
        # Max pool 2 
        out = self.maxpool2(out)
        
        # Resize
        # Original size: (100, 32, 4, 4)
        # out.size(0): 100
        # New out size: (100, 32*4*4)
        out = out.view(out.size(0), -1)

        # Linear function (readout)
        out = self.fc1(out)
        
        return out

'''
STEP 4: INSTANTIATE MODEL CLASS
'''

model = CNNModel()

'''
STEP 5: INSTANTIATE LOSS CLASS
'''
criterion = nn.CrossEntropyLoss()

'''
STEP 6: INSTANTIATE OPTIMIZER CLASS
'''
learning_rate = 0.01

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

'''
STEP 7: TRAIN THE MODEL
'''
iter = 0
for epoch in range(num_epochs):
    
    for i, (images, labels) in enumerate(train_loader):
        
        # Load images as Variable
        images = Variable(images)
        labels = Variable(labels)
        
        # Clear gradients w.r.t. parameters
        optimizer.zero_grad()
        
        # Forward pass to get output/logits
        outputs = model(images)
        
        # Calculate Loss: softmax --> cross entropy loss
        loss = criterion(outputs, labels)
        
        # Getting gradients w.r.t. parameters
        loss.backward()
        
        # Updating parameters
        optimizer.step()
        
        iter += 1
        
        if iter % 500 == 0:
            # Calculate Accuracy
            correct = 0
            total = 0
            # Iterate through test dataset
            for images, labels in test_loader:
                # Load images to a Torch Variable
                images = Variable(images)
                
                # Forward pass only to get logits/output
                outputs = model(images)
                
                # Get predictions from the maximum value
                _, predicted = torch.max(outputs.data, 1)
                
                # Total number of labels
                total += labels.size(0)
                
                # Total correct predictions
                correct += (predicted == labels).sum()
            
            accuracy = 100 * int(correct) / int(total)
            
            # Print Loss
            print (f'Iteration: {iter}, Loss: {loss.item()}, Accuracy: {accuracy}')                   

Iteration: 500, Loss: 0.44883838295936584, Accuracy: 89.61
Iteration: 1000, Loss: 0.16485387086868286, Accuracy: 92.93
Iteration: 1500, Loss: 0.25254279375076294, Accuracy: 94.6
Iteration: 2000, Loss: 0.17668139934539795, Accuracy: 95.68
Iteration: 2500, Loss: 0.13033664226531982, Accuracy: 96.37
Iteration: 3000, Loss: 0.14758671820163727, Accuracy: 96.61


### Summary of Results

| | Model A | Model B | Model C |
|------|------|------|------|
|Pooling|Max Pooling|Average Pooling|Max Pooling|
|Padding|Same Padding|Same Padding|Valid Padding|
|Test accuracy (3,000 iterations)|96.72%|93.69%|96.61%|

| All Models |
|-----------|
|INPUT $\rightarrow$ CONV $\rightarrow$ POOL $\rightarrow$ CONV $\rightarrow$ POOL $\rightarrow$ FC|
|Convolution Kernel Size = 5 $\times$ 5|
|Convolution Kernel Stride = 1|
|Pooling Kernel Size = 2 $\times$ 2|

### Deep Learning
- 3 ways to expand a convolutional neural network
    - More convolutional layers
    - Less aggressive downsampling
        - Smaller kernel size for pooling (gradually downsampling)
    - More fully connected layers
- Cons
    - Need a larger dataset
        - Curse of dimensionality
    - Does not necessarily mean higher accuracy

## 3. Building a Convolutional Neural Network with PyTorch (GPU)

### Model A

<img src = "./images/09-36.png">

<img src = "./images/09-37.png">

GPU: 2 things must be on GPU
- model
- variables
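
One common device-agnostic way to satisfy both requirements is sketched below, assuming a reasonably recent PyTorch version. It uses `torch.device` and `.to(device)` instead of the `.cuda()` calls in the code that follows, and a small `nn.Linear` layer stands in for `CNNModel` to keep the sketch short.

```python
import torch
import torch.nn as nn

# Pick the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# 1. The model must live on the device (nn.Linear is a stand-in for CNNModel)
model = nn.Linear(4, 2).to(device)

# 2. The inputs must live on the same device as the model
inputs = torch.zeros(3, 4).to(device)

outputs = model(inputs)
print(outputs.shape)        # torch.Size([3, 2])
print(outputs.device.type)  # "cuda" or "cpu", depending on the machine
```

If the model and its inputs are on different devices, the forward pass raises an error, so both moves are required.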

### Steps
- Step 1: Load Dataset
- Step 2: Make Dataset Iterable
- Step 3: Create Model Class
- **Step 4: Instantiate Model Class**
- Step 5: Instantiate Loss Class
- Step 6: Instantiate Optimizer Class
- **Step 7: Train Model**

In [25]:
import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.datasets as dsets
from torch.autograd import Variable

'''
STEP 1: LOADING DATASET
'''

train_dataset = dsets.MNIST(root='./data', 
                            train=True, 
                            transform=transforms.ToTensor(),
                            download=True)

test_dataset = dsets.MNIST(root='./data', 
                           train=False, 
                           transform=transforms.ToTensor())

'''
STEP 2: MAKING DATASET ITERABLE
'''

batch_size = 100
n_iters = 3000
num_epochs = n_iters / (len(train_dataset) / batch_size)
num_epochs = int(num_epochs)

train_loader = torch.utils.data.DataLoader(dataset=train_dataset, 
                                           batch_size=batch_size, 
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset, 
                                          batch_size=batch_size, 
                                          shuffle=False)

'''
STEP 3: CREATE MODEL CLASS
'''
class CNNModel(nn.Module):
    def __init__(self):
        super(CNNModel, self).__init__()
        
        # Convolution 1
        self.cnn1 = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=5, stride=1, padding=2)
        self.relu1 = nn.ReLU()
        
        # Max pool 1
        self.maxpool1 = nn.MaxPool2d(kernel_size=2)
     
        # Convolution 2
        self.cnn2 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=5, stride=1, padding=2)
        self.relu2 = nn.ReLU()
        
        # Max pool 2
        self.maxpool2 = nn.MaxPool2d(kernel_size=2)
        
        # Fully connected 1 (readout)
        self.fc1 = nn.Linear(32 * 7 * 7, 10) 
    
    def forward(self, x):
        # Convolution 1
        out = self.cnn1(x)
        out = self.relu1(out)
        
        # Max pool 1
        out = self.maxpool1(out)
        
        # Convolution 2 
        out = self.cnn2(out)
        out = self.relu2(out)
        
        # Max pool 2 
        out = self.maxpool2(out)
        
        # Resize
        # Original size: (100, 32, 7, 7)
        # out.size(0): 100
        # New out size: (100, 32*7*7)
        out = out.view(out.size(0), -1)

        # Linear function (readout)
        out = self.fc1(out)
        
        return out

'''
STEP 4: INSTANTIATE MODEL CLASS
'''

model = CNNModel()

#######################
#  USE GPU FOR MODEL  #
#######################

if torch.cuda.is_available():
    model.cuda()

'''
STEP 5: INSTANTIATE LOSS CLASS
'''
criterion = nn.CrossEntropyLoss()


'''
STEP 6: INSTANTIATE OPTIMIZER CLASS
'''
learning_rate = 0.01

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

'''
STEP 7: TRAIN THE MODEL
'''
iter = 0
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        
        #######################
        #  USE GPU FOR MODEL  #
        #######################
        if torch.cuda.is_available():
            images = Variable(images.cuda())
            labels = Variable(labels.cuda())
        else:
            images = Variable(images)
            labels = Variable(labels)
        
        # Clear gradients w.r.t. parameters
        optimizer.zero_grad()
        
        # Forward pass to get output/logits
        outputs = model(images)
        
        # Calculate Loss: softmax --> cross entropy loss
        loss = criterion(outputs, labels)
        
        # Getting gradients w.r.t. parameters
        loss.backward()
        
        # Updating parameters
        optimizer.step()
        
        iter += 1
        
        if iter % 500 == 0:
            # Calculate Accuracy         
            correct = 0
            total = 0
            # Iterate through test dataset
            for images, labels in test_loader:
                #######################
                #  USE GPU FOR MODEL  #
                #######################
                if torch.cuda.is_available():
                    images = Variable(images.cuda())
                else:
                    images = Variable(images)
                
                # Forward pass only to get logits/output
                outputs = model(images)
                
                # Get predictions from the maximum value
                _, predicted = torch.max(outputs.data, 1)
                
                # Total number of labels
                total += labels.size(0)
                
                #######################
                #  USE GPU FOR MODEL  #
                #######################
                # Total correct predictions
                if torch.cuda.is_available():
                    correct += (predicted.cpu() == labels.cpu()).sum()
                else:
                    correct += (predicted == labels).sum()
            
            accuracy = 100 * int(correct) / int(total)
            
            # Print Loss
            print (f'Iteration: {iter}, Loss: {loss.item()}, Accuracy: {accuracy}')

Iteration: 500, Loss: 0.47827842831611633, Accuracy: 90.03
Iteration: 1000, Loss: 0.28341493010520935, Accuracy: 93.32
Iteration: 1500, Loss: 0.2496165633201599, Accuracy: 94.41
Iteration: 2000, Loss: 0.12156059592962265, Accuracy: 95.82
Iteration: 2500, Loss: 0.09511017054319382, Accuracy: 96.47
Iteration: 3000, Loss: 0.0689583569765091, Accuracy: 96.82


### Summary

- Transition from **Feedforward Neural Network**
    - Addition of **Convolutional & Pooling** Layers before Linear Layers
- One **Convolutional** Layer Basics
- One **Pooling** Layer Basics
    - Max pooling
    - Average pooling
- **Padding**
- **Output Dimension** Calculations and Examples
    - $O = \frac{W-K+2P}{S}+1$
- Convolutional Neural Networks
    - **Model A** : 2 Conv + 2 Max pool + 1 FC
        - Same Padding
    - **Model B** : 2 Conv + 2 Average pool + 1 FC
        - Same Padding
    - **Model C** : 2 Conv + 2 Max pool + 1 FC
        - Valid Padding
    - Model Variation in **Code**
        - Modifying only step 3
    - Ways to Expand Model's **Capacity**
        - More convolutions
        - Gradual pooling
        - More fully connected layers
    - **GPU** Code
        - 2 things on GPU
            - model
            - variable
        - Modifying only **Step 4 & Step 7**
    - **7 Step** Model Building Recap
        - Step 1: Load Dataset
        - Step 2: Make Dataset Iterable
        - Step 3: Create Model Class
        - Step 4: Instantiate Model Class
        - Step 5: Instantiate Loss Class
        - Step 6: Instantiate Optimizer Class
        - Step 7: Train Model