
# **[2023-1] Image Processing & Vision (55397)**

* ### Hak Gu Kim
* ### Assistant Professor
* ### Graduate School of Advanced Imaging Science, Multimedia & Film (GSAIM)
* ### Chung-Ang University
* ### Webpage: www.irislab.cau.ac.kr


# **Homework V: Convolutional Neural Networks (CNNs)**

* ### **Deadline:** 21 June (Wed) at 11:59pm
* ### **Submission:** Upload the zip file to "Assignments & Evaluation" (과제 및 평가) on E-class
  * **Upload zip file:** ipv23_hw05-student number.zip
    * **Python code:** ipv23_hw05-student number.ipynb
    * **Report:** ipv23_hw05-student number.pdf  (page limit: 4 pages)
  

## **[Homework V-0]** Environment Setup

In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

from torch.utils.data import DataLoader

import torchvision
import torchvision.transforms as transforms

## **[Homework V-1]** Dataset

## 1-1) MNIST Dataset

The MNIST dataset consists of 70,000 28x28 grayscale images of handwritten digits in 10 classes: 60,000 images for training and 10,000 images for testing.

- http://yann.lecun.com/exdb/mnist/
- https://pytorch.org/vision/stable/generated/torchvision.datasets.MNIST.html#torchvision.datasets.MNIST

In [26]:
# MNIST Dataset
mnist_train = torchvision.datasets.MNIST(root='./', train=True, transform=transforms.ToTensor(), target_transform=None, download=True)
mnist_test  = torchvision.datasets.MNIST(root='./', train=False, transform=transforms.ToTensor(), target_transform=None, download=True)
# Hold out 10,000 of the 60,000 training images as a validation set
mnist_train, mnist_val = torch.utils.data.random_split(mnist_train, [50000, 10000])

# Data Loader for MNIST
mnist_train_loader = DataLoader(mnist_train, batch_size=128, shuffle=True)
mnist_val_loader   = DataLoader(mnist_val, batch_size=128, shuffle=False)
mnist_test_loader  = DataLoader(mnist_test, batch_size=128, shuffle=False)
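
`random_split` draws a new random partition every time the cell runs, so the validation set differs between runs. A minimal sketch of a reproducible split, assuming a fixed seed is acceptable for this homework. The helper `split_with_seed`, the seed value `0`, and the stand-in `toy_dataset` are illustrative names that are not part of the assignment:

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Stand-in for mnist_train: 600 fake 28x28 images with random labels (hypothetical data).
toy_dataset = TensorDataset(torch.randn(600, 1, 28, 28), torch.randint(0, 10, (600,)))

def split_with_seed(dataset, lengths, seed=0):
    """Split `dataset` into subsets of the given lengths using a seeded generator."""
    generator = torch.Generator().manual_seed(seed)
    return random_split(dataset, lengths, generator=generator)

train_a, val_a = split_with_seed(toy_dataset, [500, 100])
train_b, val_b = split_with_seed(toy_dataset, [500, 100])

print(len(train_a), len(val_a))
# The same seed gives the same validation indices on every call.
print(list(val_a.indices) == list(val_b.indices))
```

With the real data, the same helper could be called as `split_with_seed(mnist_train, [50000, 10000])`.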

## 1-2) CIFAR-10

The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.

- https://www.cs.toronto.edu/~kriz/cifar.html
- https://pytorch.org/vision/stable/generated/torchvision.datasets.CIFAR10.html#torchvision.datasets.CIFAR10

In [None]:
# Define the Transforms for Training Dataset
transforms_train = transforms.Compose([
  transforms.RandomCrop(32, padding=4),
  transforms.RandomHorizontalFlip(),
  transforms.ToTensor(),
  transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])

# Define the Transforms for Testing Dataset
transforms_test = transforms.Compose([
  transforms.ToTensor(),
  transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])

# CIFAR-10 Dataset
cifar_train = torchvision.datasets.CIFAR10(root='./', train=True, download=True, transform=transforms_train)
cifar_test = torchvision.datasets.CIFAR10(root='./', train=False, download=True, transform=transforms_test)

# Data Loader for CIFAR-10
# cifar_train_loader = DataLoader(cifar_train, batch_size=128, shuffle=True)
# cifar_test_loader = DataLoader(cifar_test, batch_size=128, shuffle=False)
cifar_train_loader = DataLoader(cifar_train, batch_size=128, shuffle=True, num_workers=2)
cifar_test_loader = DataLoader(cifar_test, batch_size=128, shuffle=False, num_workers=2)

## **[Homework V-2]** (Practice) Implement Each Component of CNNs

- Convolutional Layer
- Batch Normalization
- Dropout Layer

## 2-1) Convolutional Layer

`nn.Conv2d`: Applies a 2D convolution over an input signal composed of several input planes.

- https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html

**Parameters for** `nn.Conv2d`
- in_channels (int) – Number of channels in the input image

- out_channels (int) – Number of channels produced by the convolution

- kernel_size (int or tuple) – Size of the convolution filter (kernel)

- stride (int or tuple, optional) – Stride of the convolution (Default: `1`)

- padding (int, tuple or str, optional) – Padding added to boundaries of the input (Default: `0`)

- padding_mode (string, optional) – `zeros`, `reflect`, `replicate` or `circular` (Default: `zeros`)

- dilation (int or tuple, optional) – Spacing between kernel elements (Default: `1`)

**Examples**
- With square kernels and equal stride:

  `conv_layer = nn.Conv2d(16, 33, 3, stride=2)`

- With non-square kernels, unequal stride, and padding:

  `conv_layer = nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2))`

- With non-square kernels, unequal stride, padding, and dilation:

  `conv_layer = nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2), dilation=(3, 1))`

In [None]:
# Example of convolutional layer

# Input dimension: 1 x 3 x 32 x 32
# Convolutional layer: 32 5x5 filters with stride 2, padding 2

x = torch.randn(1, 3, 32, 32) # input: x

conv_layer = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=5, stride=2, padding=2)

print('Input size:\n', x.size())
print()
print('Output size:\n', conv_layer(x).size())
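
The printed output size follows from the formula given in the `nn.Conv2d` documentation, applied separately to height and width: $H_{out} = \lfloor (H_{in} + 2p - d(k - 1) - 1) / s \rfloor + 1$, with padding $p$, dilation $d$, kernel size $k$, and stride $s$. A minimal sketch of that formula in plain Python; the helper name `conv2d_output_size` and the 50x100 input used for the dilated example are illustrative assumptions:

```python
def conv2d_output_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Output length along one spatial dimension of nn.Conv2d."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

# Practice cell above: 32x32 input, 5x5 kernel, stride 2, padding 2.
h_out = conv2d_output_size(32, kernel_size=5, stride=2, padding=2)
print(h_out)  # 16, so conv_layer(x) has size 1 x 32 x 16 x 16

# Dilated example from the list above, on a hypothetical 50x100 input:
# kernel (3, 5), stride (2, 1), padding (4, 2), dilation (3, 1).
h_dil = conv2d_output_size(50, kernel_size=3, stride=2, padding=4, dilation=3)
w_dil = conv2d_output_size(100, kernel_size=5, stride=1, padding=2, dilation=1)
print(h_dil, w_dil)  # 26 100
```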

## 2-2) Batch Normalization

`nn.BatchNorm2d`: Applies Batch Normalization over a 4D input (a mini-batch of 2D inputs with additional channel dimension)

- https://pytorch.org/docs/stable/generated/torch.nn.BatchNorm2d.html
- S. Ioffe and C. Szegedy, Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, **ICML 2015** [[Link]](http://proceedings.mlr.press/v37/ioffe15.html)

**Parameters** for `nn.BatchNorm2d`
- num_features – $C$ from an expected input of size ($N, C, H, W$)

**Example**

- With learnable parameters

  `bn = nn.BatchNorm2d(100)`
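
In training mode, batch normalization standardizes each channel using the statistics of the current mini-batch; in evaluation mode it uses running averages collected during training instead. A short sketch of that difference, assuming a shifted and scaled random input so the effect is visible (the names `x_demo` and `bn_demo` are illustrative):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x_demo = torch.randn(8, 3, 32, 32) * 5 + 2  # per-channel mean about 2, std about 5
bn_demo = nn.BatchNorm2d(num_features=3)

# Training mode: each channel is normalized with the statistics of this batch.
bn_demo.train()
y_train = bn_demo(x_demo)
train_mean = y_train.mean(dim=(0, 2, 3))
train_var = y_train.var(dim=(0, 2, 3), unbiased=False)
print('train mode mean:', train_mean)  # close to 0 for every channel
print('train mode var: ', train_var)   # close to 1 for every channel

# Evaluation mode: the running statistics are used, which after a single
# training batch are still far from the true statistics of x_demo.
bn_demo.eval()
y_eval = bn_demo(x_demo)
eval_mean = y_eval.mean(dim=(0, 2, 3))
print('eval mode mean: ', eval_mean)   # not close to 0
```

This is why the test cells later in this notebook call `model.eval()` before measuring accuracy.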


In [None]:
# Batch Normalization

x = torch.randn(1, 3, 32, 32)

bn = nn.BatchNorm2d(num_features=3)

print('Input size:\n', x.size())
print()
print('Size of feature after BN:\n', bn(x).size())  # Check whether batch normalization changes the input size
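
## 2-3) Dropout Layer

`nn.Dropout`: During training, randomly zeroes elements of the input with probability `p` and scales the remaining elements by `1 / (1 - p)`; in evaluation mode it returns the input unchanged.

- https://pytorch.org/docs/stable/generated/torch.nn.Dropout.html

Dropout is listed among the components for this part, so the following is a minimal sketch in the style of the cells above. The value `p=0.5` and the all-ones input are illustrative choices, not requirements of the homework:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x_ones = torch.ones(1, 3, 4, 4)
dropout = nn.Dropout(p=0.5)

# Training mode: each element is either dropped (0) or scaled by 1 / (1 - p) = 2.
dropout.train()
y_drop = dropout(x_ones)
print('Size after dropout:\n', y_drop.size())
print('Values in training mode:\n', y_drop.unique())

# Evaluation mode: dropout is disabled and the input passes through unchanged.
dropout.eval()
y_keep = dropout(x_ones)
print('Unchanged in eval mode:\n', torch.equal(y_keep, x_ones))
```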

## **[Homework V-3]** (Practice) Build Simple Convolutional Neural Networks

- `nn.Sequential`: A sequential container. Modules will be added to it in the order they are passed in the constructor.
- https://pytorch.org/docs/stable/generated/torch.nn.Sequential.html

## 3-1) Define CNNs Architecture

- 2 conv layers with 7x7 kernel (Convolution + Batch normalization + ReLU)
- 1 fc layer for 10 classes

In [6]:
# Model: Simple Convolutional Neural Networks

class ConvNet(nn.Module):

    def __init__(self):
        super(ConvNet, self).__init__()
        # 1 input channel, 32 output channels, 7x7 square convolution, stride 1
        self.conv_layer1 = nn.Sequential(
            nn.Conv2d(1, 32, 7),
            nn.BatchNorm2d(32),
            nn.ReLU(),
        )
        # 32 input channels, 64 output channels, 7x7 square convolution, stride 1
        self.conv_layer2 = nn.Sequential(
            nn.Conv2d(32, 64, 7),
            nn.BatchNorm2d(64),
            nn.ReLU(),
        )

        # Two unpadded 7x7 convolutions shrink a 28x28 MNIST image to 22x22, then 16x16
        self.fc = nn.Linear(64*16*16, 10)

    def forward(self, x):
        out_conv1 = self.conv_layer1(x)
        out_conv2 = self.conv_layer2(out_conv1)
        feature_1d = torch.flatten(out_conv2, 1)
        out = self.fc(feature_1d)
        return out
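
The input size of the fully connected layer, `64*16*16`, depends on how the two convolutions shrink the image. A quick way to check such a number is to pass a dummy tensor through the same layers and print the intermediate sizes. A sketch under the assumption of a single 28x28 MNIST-sized input; the names `dummy`, `conv1`, and `conv2` are illustrative:

```python
import torch
import torch.nn as nn

dummy = torch.randn(1, 1, 28, 28)  # one dummy MNIST-sized image

conv1 = nn.Conv2d(1, 32, 7)   # same settings as conv_layer1 (without BN and ReLU)
conv2 = nn.Conv2d(32, 64, 7)  # same settings as conv_layer2 (without BN and ReLU)

out1 = conv1(dummy)
out2 = conv2(out1)
flat = torch.flatten(out2, 1)

print(out1.size())  # torch.Size([1, 32, 22, 22])
print(out2.size())  # torch.Size([1, 64, 16, 16])
print(flat.size())  # torch.Size([1, 16384]), and 16384 = 64*16*16
```

Batch normalization and ReLU are left out because they do not change the tensor size.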


In [None]:
# Using GPU

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)

model = ConvNet()
model = model.to(device)

## 3-2) Define Optimizer & Loss
- Optimization using stochastic gradient descent (SGD)
- Learning rate α=0.01
- Loss function: Cross Entropy Loss

In [8]:
# Optimizer: Stochastic Gradient Descent Method

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
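
Plain SGD updates every parameter as $w \leftarrow w - \alpha \nabla_w L$ with learning rate $\alpha$. A tiny sketch of one update step on a two-element parameter, assuming the toy loss $L = \sum_i w_i^2$; the names `w` and `toy_optimizer` are illustrative and kept separate from the `optimizer` defined above:

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)
toy_optimizer = torch.optim.SGD([w], lr=0.01)

loss = (w ** 2).sum()      # gradient of the loss is 2 * w = [2, 4]
toy_optimizer.zero_grad()  # clear gradients left over from earlier steps
loss.backward()            # compute the gradient
toy_optimizer.step()       # w <- w - 0.01 * [2, 4]

print(w.detach())  # tensor([0.9800, 1.9600])
```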

In [9]:
# Define Loss function (Cross Entropy Loss here)

loss_fn = nn.CrossEntropyLoss()
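
`nn.CrossEntropyLoss` expects raw, unnormalized scores (logits) and integer class indices; it applies log-softmax internally, which is why `ConvNet` ends with a linear layer and no softmax. A short sketch verifying this equivalence on random data (the tensors `logits` and `targets` are made-up examples):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)           # raw scores: 4 samples, 10 classes
targets = torch.tensor([3, 0, 9, 1])  # class indices, not one-hot vectors

loss_ce = nn.CrossEntropyLoss()(logits, targets)

# Equivalent two-step computation: log-softmax followed by negative log-likelihood.
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(torch.allclose(loss_ce, loss_manual))
```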

## 3-3) Train the Simple CNN Model
- Dataset: MNIST
- Epochs = 10

In [None]:
# Train the model
model.train()  # make sure BatchNorm uses batch statistics (the test cell switches to eval mode)
total_step = len(mnist_train_loader)
epochs = 10
for epoch in range(epochs):
    for i, (images, labels) in enumerate(mnist_train_loader):  # mini batch for loop
        
        # Move the batch to the selected device (GPU if available)
        images = images.to(device)
        labels = labels.to(device)
        
        # Forward pass
        outputs = model(images)
        loss = loss_fn(outputs, labels)
        
        # Backward pass & Optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if (i+1) % 100 == 0:
            print('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}'
                  .format(epoch+1, epochs, i+1, total_step, loss.item()))
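
The `mnist_val_loader` created in 1-1 is not used by the training loop. One possible use is to measure accuracy on the held-out images after each epoch. A minimal sketch of such a check, assuming a hypothetical helper `evaluate`; the toy model, toy data, and `toy_device` are stand-ins so that the block runs on its own:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate(model, loader, device):
    """Return the top-1 accuracy (in percent) of `model` on `loader`."""
    was_training = model.training
    model.eval()  # use running statistics in BatchNorm during evaluation
    correct, total = 0, 0
    with torch.no_grad():
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            predicted = model(images).argmax(dim=1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    model.train(was_training)  # restore the previous mode before training continues
    return 100.0 * correct / total

# Stand-in data and model (hypothetical); the homework would pass the CNN and mnist_val_loader.
toy_device = torch.device("cpu")
toy_images = torch.randn(64, 1, 28, 28)
toy_labels = torch.randint(0, 10, (64,))
toy_loader = DataLoader(TensorDataset(toy_images, toy_labels), batch_size=16)
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10)).to(toy_device)

val_acc = evaluate(toy_model, toy_loader, toy_device)
print('Validation accuracy: {:.2f} %'.format(val_acc))
```

In the loop above, a call such as `evaluate(model, mnist_val_loader, device)` at the end of each epoch would give a number for comparing settings without touching the test set.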

## 3-4) Test the Trained CNN Model

In [None]:
# Test the model
model.eval()
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in mnist_test_loader:
        images = images.to(device)
        labels = labels.to(device)
        
        outputs = model(images)
        
        _, predicted = torch.max(outputs, 1)  # top-1 prediction: index of the largest class score
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    print('Accuracy of Simple CNN on MNIST test set: {} %'.format(100 * correct / total))

## **[Homework V-4]** Design Your Own Convolutional Neural Networks
**References**

https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html

https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html#torch.nn.Conv2d


**Options**
- The number of convolutional layers
- Stride & padding & dilation
- Various activation functions
- Pooling layers (max pool, avg pool)
- The number of fully connected layers
- The dimension of hidden layers
- The size of kernels at each layer
- *etc*.
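
As an illustration of how some of these options combine, the sketch below uses 3x3 kernels with padding, max pooling, and two fully connected layers. It is only one possible starting point under assumed settings, not a reference solution: the class name `SketchConvNet`, the channel counts, and the hidden size `128` are all hypothetical choices.

```python
import torch
import torch.nn as nn

class SketchConvNet(nn.Module):
    """Hypothetical variant: padded 3x3 convolutions, max pooling, two FC layers."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),   # 32x32 -> 32x32
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 32x32 -> 16x16
            nn.Conv2d(32, 64, kernel_size=3, padding=1),  # 16x16 -> 16x16
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 16x16 -> 8x8
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, 128),
            nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

sketch_model = SketchConvNet()
dummy_batch = torch.randn(4, 3, 32, 32)  # four dummy CIFAR-10-sized images
sketch_out = sketch_model(dummy_batch)
print(sketch_out.size())  # torch.Size([4, 10])
```

Padding of 1 keeps the spatial size unchanged for a 3x3 kernel, so only the pooling layers reduce resolution, which makes the input size of the first linear layer easy to compute.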

In [21]:
# Modify the following CNN architecture

class myConvNet(nn.Module):

    def __init__(self):
        super(myConvNet, self).__init__()
        # 3 input channels, 32 output channels, 7x7 square convolution, stride 1
        self.conv_layer1 = nn.Sequential(
            nn.Conv2d(3, 32, 7),
            nn.BatchNorm2d(32),
            nn.ReLU(),
        )
        # 32 input channels, 64 output channels, 7x7 square convolution, stride 1
        self.conv_layer2 = nn.Sequential(
            nn.Conv2d(32, 64, 7),
            nn.BatchNorm2d(64),
            nn.ReLU(),
        )

        # Two unpadded 7x7 convolutions shrink a 32x32 CIFAR-10 image to 26x26, then 20x20
        self.fc = nn.Linear(64*20*20, 10)

    def forward(self, x):
        out_conv1 = self.conv_layer1(x)
        out_conv2 = self.conv_layer2(out_conv1)
        feature_1d = torch.flatten(out_conv2, 1)
        out = self.fc(feature_1d)
        return out


In [None]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)

model = myConvNet()
model = model.to(device)

## 4-1) Train Your CNN Model
You can change the number of epochs, learning rate, optimizer, *etc*.

In [23]:
# Optimizer: Stochastic Gradient Descent Method
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Define Loss function
loss_fn = nn.CrossEntropyLoss()
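
As an example of changing the optimizer and the learning rate, the sketch below pairs `optim.Adam` with a `StepLR` scheduler that halves the learning rate every 2 epochs. The specific values (`lr=1e-3`, `step_size=2`, `gamma=0.5`) and the toy linear model are assumptions for illustration, not recommended settings:

```python
import torch
import torch.nn as nn
import torch.optim as optim

toy_net = nn.Linear(4, 2)  # stand-in for myConvNet()
toy_adam = optim.Adam(toy_net.parameters(), lr=1e-3)
toy_scheduler = optim.lr_scheduler.StepLR(toy_adam, step_size=2, gamma=0.5)
toy_loss_fn = nn.CrossEntropyLoss()

lrs = []
for epoch in range(4):
    lrs.append(toy_adam.param_groups[0]['lr'])  # learning rate used in this epoch

    # One dummy mini-batch per epoch, in place of the loop over cifar_train_loader.
    inputs, targets = torch.randn(8, 4), torch.randint(0, 2, (8,))
    loss = toy_loss_fn(toy_net(inputs), targets)
    toy_adam.zero_grad()
    loss.backward()
    toy_adam.step()

    toy_scheduler.step()  # once per epoch, after the optimizer updates

print(lrs)  # the learning rate is halved after every 2 epochs
```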

In [None]:
# Train the model
model.train()  # make sure BatchNorm uses batch statistics (the test cell switches to eval mode)
total_step = len(cifar_train_loader)
epochs = 5
for epoch in range(epochs):
    for i, (images, labels) in enumerate(cifar_train_loader):  # mini batch for loop
        
        # Move the batch to the selected device (GPU if available)
        images = images.to(device)
        labels = labels.to(device)
        
        # Forward pass
        outputs = model(images)
        loss = loss_fn(outputs, labels)
        
        # Backward pass & Optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if (i+1) % 100 == 0:
            print('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}'
                  .format(epoch+1, epochs, i+1, total_step, loss.item()))

## 4-2) Test Your Trained CNN Model
Try to achieve the best performance!

In [None]:
# Test the model
model.eval()
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in cifar_test_loader:
        images = images.to(device)
        labels = labels.to(device)
        
        outputs = model(images)
        
        _, predicted = torch.max(outputs, 1)  # top-1 prediction: index of the largest class score
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    print('Accuracy of Your CNNs on CIFAR-10 test set: {} %'.format(100 * correct / total))