# 05. Object recognition of CIFAR10 dataset using ResNet


---
## Purpose

Carry out object recognition on the CIFAR10 dataset. The structure of this program is similar to that of the MNIST character recognition program, so refer to that tutorial for a basic explanation. This page describes only the differences from the MNIST character recognition program.

Perform the neural network computations on the GPU, and confirm the effect of data augmentation on learning. In this tutorial, we use a residual neural network (ResNet) as the convolutional neural network model.

## Preparations

### Confirm and change Google Colaboratory settings

In this tutorial, we use PyTorch to implement a neural network and carry out training and evaluation. **To run the computations on the GPU, go to the menu bar at the top of the screen and choose Runtime -> Change runtime type -> Hardware accelerator -> GPU.**

## Dataset

### CIFAR10 dataset

We use the CIFAR10 dataset for object recognition in this tutorial. The CIFAR10 dataset consists of 32x32 color images in 10 classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck.

![CIFAR10_sample.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/176458/b6b43478-c85f-9211-7bc6-227d9b387af5.png)

## Import modules


First, import the necessary modules.


### Confirm GPU settings

Confirm that computation on the GPU is enabled.

If `Use CUDA: True` is displayed, PyTorch can use the GPU for computation. If `Use CUDA: False` is displayed, follow the procedure in "Confirm and change Google Colaboratory settings" above to change the settings, then import the modules again.




In [None]:
# import modules
from time import time
import numpy as np
import torch
import torch.nn as nn

import torchvision
import torchvision.transforms as transforms

import torchsummary

# confirm GPU settings
use_cuda = torch.cuda.is_available()
print('Use CUDA:', use_cuda)
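
The same check can also be expressed with a `torch.device` object, which is convenient to pass around. The following is a minimal sketch and not part of the original program; the names `device` and `x` are illustrative only.

```python
import torch

# Pick the GPU when PyTorch can see one, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Tensors (and models) are moved with .to(device); this runs on either device.
x = torch.zeros(2, 3).to(device)
print("device:", device, "| tensor lives on:", x.device.type)
```

The rest of this tutorial keeps the `use_cuda` flag and `.cuda()` calls; the two styles do the same job here.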

## Read and confirm dataset
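Load the training and test data (CIFAR10 dataset). The training transform applies data augmentation, namely a random 32x32 crop after padding by 1 pixel and a random horizontal flip, before converting each image to a tensor. The test transform only converts each image to a tensor.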



In [None]:
transform_train = transforms.Compose([transforms.RandomCrop(32, padding=1),
                                      transforms.RandomHorizontalFlip(),
                                      transforms.ToTensor()])
transform_test = transforms.Compose([transforms.ToTensor()])

train_data = torchvision.datasets.CIFAR10(root="./", train=True, transform=transform_train, download=True)
test_data = torchvision.datasets.CIFAR10(root="./", train=False, transform=transform_test, download=True)

## Define neural network
Define the residual network (ResNet).

A ResNet is built from units called bottlenecks. First, we create the class `BottleNeck(nn.Module)`, which can define a bottleneck of arbitrary shape.
The `__init__` parameter `in_planes` specifies the number of input feature map channels, and `planes` specifies the number of feature map channels inside the bottleneck. The bottleneck outputs `expansion * planes` channels (here, `expansion = 4`).

`nn.Sequential()` is used in `__init__` to define the layers. It receives multiple layers as arguments and returns a single object that bundles them (a container of layers). In the class below, the convolution, batch normalization, and ReLU layers are passed to it as an unpacked list. When `self.convs`, defined by `nn.Sequential`, is called (that is, when `self.convs(x)` is used in `forward()`), the layers are applied in order and the final result is returned. The input `x` is also passed through `self.shortcut` and added to that result, which forms the residual connection.
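
As a small illustration of the container behavior, the sketch below checks that calling an `nn.Sequential` gives the same result as calling its layers one after another. The layer sizes are arbitrary and chosen only for this example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
conv = nn.Conv2d(3, 8, kernel_size=3, padding=1, bias=False)
bn = nn.BatchNorm2d(8)
relu = nn.ReLU()

# Bundle the three layers into one container.
block = nn.Sequential(conv, bn, relu)
block.eval()  # evaluation mode, so batch normalization uses fixed statistics

x = torch.randn(4, 3, 32, 32)
y_seq = block(x)               # layers applied in order by the container
y_manual = relu(bn(conv(x)))   # the same layers chained by hand
print(tuple(y_seq.shape), torch.equal(y_seq, y_manual))
```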

We define `ResNet` (here, ResNet50) using the bottleneck defined above. `self._make_layer()`, a method of the `ResNet` class, builds a residual block (a layer composed of multiple bottlenecks) of arbitrary shape. It takes the number of channels `planes`, the number of bottlenecks `num_blocks`, and the convolution stride `stride` as arguments.
The specified number of bottlenecks is created with these parameters and stored in a list; only the first bottleneck uses `stride`, and the rest use a stride of 1. Finally, the list is passed to `nn.Sequential`, as explained above, and the resulting residual block is returned.
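
The channel bookkeeping done by `_make_layer()` can be traced without PyTorch. The sketch below mirrors that logic in plain Python; `plan_stage` and `EXPANSION` are hypothetical names introduced for this illustration.

```python
EXPANSION = 4  # same value as BottleNeck.expansion


def plan_stage(in_planes, planes, num_blocks, stride):
    """Return the (in_channels, planes, stride) triple of every bottleneck
    in one residual block, plus the channel count leaving the block."""
    strides = [stride] + [1] * (num_blocks - 1)
    plan = []
    for s in strides:
        plan.append((in_planes, planes, s))
        in_planes = planes * EXPANSION  # the next bottleneck sees expanded channels
    return plan, in_planes


in_planes = 64  # channels after the first convolution
for planes, n, stride in [(64, 3, 1), (128, 4, 2), (256, 6, 2), (512, 3, 2)]:
    plan, in_planes = plan_stage(in_planes, planes, n, stride)
    print(plan)
print("channels entering the final linear layer:", in_planes)
```

The final count of 2048 channels is why the fully connected layer is defined as `nn.Linear(2048, n_class)`.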


Using `_make_layer()`, the entire ResNet is defined in `__init__`. Adaptive average pooling (`AdaptiveAvgPool2d()`) applies average pooling to a feature map of arbitrary size.
The argument `(1, 1)` specifies that an input feature map of any size is averaged down to a 1x1 feature map.
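
A short check of this behavior, using arbitrary input sizes and a channel count of 16 picked only for the example:

```python
import torch
import torch.nn as nn

pool = nn.AdaptiveAvgPool2d((1, 1))

for size in (4, 7):
    x = torch.randn(2, 16, size, size)
    y = pool(x)
    # With a 1x1 target, the result is the mean over the spatial dimensions.
    same = torch.allclose(y.flatten(1), x.mean(dim=(2, 3)), atol=1e-5)
    print(tuple(x.shape), "->", tuple(y.shape), "| equals spatial mean:", same)
```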





In [None]:
class BottleNeck(nn.Module):
    expansion = 4
    def __init__(self, in_planes, planes, stride=1):
        super().__init__()
        self.convs = nn.Sequential(*[nn.Conv2d(in_planes, planes, kernel_size=1, bias=False),
                                     nn.BatchNorm2d(planes),
                                     nn.ReLU(inplace=True),
                                     nn.Conv2d(planes, planes, kernel_size=3, stride=stride, padding=1, bias=False),
                                     nn.BatchNorm2d(planes),
                                     nn.ReLU(inplace=True),
                                     nn.Conv2d(planes, self.expansion * planes, kernel_size=1, bias=False),
                                     nn.BatchNorm2d(self.expansion * planes)])

        self.shortcut = nn.Sequential()
        if stride != 1 or in_planes != self.expansion*planes:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_planes, self.expansion*planes,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(self.expansion*planes)
            )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.convs(x)
        out += self.shortcut(x)
        out = self.relu(out)
        return out


class ResNet(nn.Module):
    def __init__(self, n_class=10, n_blocks=[3, 4, 6, 3]):
        super().__init__()
        self.in_planes = 64

        self.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=0, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU()
        
        self.res1 = self._make_layer(64, n_blocks[0], stride=1)
        self.res2 = self._make_layer(128, n_blocks[1], stride=2)
        self.res3 = self._make_layer(256, n_blocks[2], stride=2)
        self.res4 = self._make_layer(512, n_blocks[3], stride=2)
        
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(2048, n_class)

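    def _make_layer(self, planes, num_blocks, stride):
        # only the first bottleneck uses the given stride; the rest use stride 1
        strides = [stride] + [1] * (num_blocks-1)
        layers = []
        for stride in strides:
            layers.append(BottleNeck(self.in_planes, planes, stride))
            # the next bottleneck receives the expanded number of channels
            self.in_planes = planes * BottleNeck.expansion
        return nn.Sequential(*layers)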

    def forward(self, x):
        h = self.relu(self.bn1(self.conv1(x)))
        h = self.res1(h)
        h = self.res2(h)
        h = self.res3(h)
        h = self.res4(h)
        h = self.avgpool(h)
        h = torch.flatten(h, 1)
        h = self.fc(h)
        return h
        
        
class ResNet50(ResNet):
    def __init__(self, n_class=10):
        super(ResNet50, self).__init__(n_class, n_blocks=[3, 4, 6, 3])
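
To follow how the spatial size of the feature map changes through this network for a 32x32 input, the sketch below applies the standard convolution output-size formula (dilation 1) to the kernel sizes, strides, and paddings used in the code above. `conv_out` is a hypothetical helper written for this illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution with dilation 1."""
    return (size + 2 * padding - kernel) // stride + 1


# conv1: 3x3 kernel, stride 1, no padding
size = conv_out(32, kernel=3, stride=1, padding=0)
sizes = [size]

# res1..res4: within each residual block, only the 3x3 convolution of the
# first bottleneck uses the given stride (padding 1); the others keep the size.
for stride in (1, 2, 2, 2):
    size = conv_out(size, kernel=3, stride=stride, padding=1)
    sizes.append(size)

print(sizes)
```

The 1x1 shortcut convolution with the same stride gives matching sizes, so the addition in `forward()` is well defined; adaptive average pooling then reduces the last feature map to 1x1.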

## Create neural network

Create the neural network defined by the program above.

Instantiate the `ResNet50` class to create the neural network model. If the GPU is used (`use_cuda == True`), the model is placed in GPU memory, which makes it possible to perform operations on the GPU.

We use stochastic gradient descent with momentum (SGD with momentum) as the optimization method for training, with a learning rate of 0.01 and a momentum of 0.9.

Finally, `torchsummary.summary()` is used to display detailed information about the defined network. 

In [None]:
model = ResNet50(n_class=10)
if use_cuda:
    model.cuda()

optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# display detailed information about the defined network
torchsummary.summary(model, (3, 32, 32))
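
For intuition about what the momentum setting does, the sketch below compares `torch.optim.SGD` with a hand-written update on a single parameter. It assumes the default options (no dampening, no Nesterov momentum, no weight decay); the toy loss `w ** 2` and the names `w_manual` and `velocity` are illustrative only.

```python
import torch

lr, momentum = 0.01, 0.9

w = torch.tensor([1.0], requires_grad=True)
opt = torch.optim.SGD([w], lr=lr, momentum=momentum)

w_manual, velocity = 1.0, 0.0
for _ in range(3):
    # PyTorch update on the toy loss w^2 (gradient 2w)
    loss = (w ** 2).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()

    # Hand-written update: accumulate gradients, then step along the velocity
    grad = 2.0 * w_manual
    velocity = momentum * velocity + grad
    w_manual = w_manual - lr * velocity

print(w.item(), w_manual)
```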

## Training
We set the mini-batch size (the number of samples used to compute the error for one update) to 128 and the number of training epochs to 10. The number of updates per epoch is obtained from the size of the CIFAR10 training data. The model is given images and outputs a score `y` for each class. The error between `y` and the teacher labels is calculated with the softmax cross-entropy loss function, and the recognition accuracy is also calculated. The error is then backpropagated with `backward()`, and the optimizer updates the network parameters.
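
As a quick piece of arithmetic, the number of updates per epoch can be worked out as follows. The sketch assumes the 50,000 CIFAR10 training images and a data loader that keeps the final, smaller mini-batch (the `DataLoader` default, `drop_last=False`).

```python
import math

n_train = 50000   # number of CIFAR10 training images
batch_size = 128

# Updates per epoch, counting the last incomplete mini-batch
n_iter = math.ceil(n_train / batch_size)
last_batch = n_train - (n_iter - 1) * batch_size
print(n_iter, last_batch)
```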

In [None]:
# set mini-batch size and training epochs
batch_size = 128
epoch_num = 10
n_iter = int(np.ceil(len(train_data) / batch_size))  # number of updates (mini-batches) per epoch

# define data loader
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size, shuffle=True, num_workers=2)

# set error (loss) function
criterion = nn.CrossEntropyLoss()
if use_cuda:
    criterion.cuda()

# switch to training mode
model.train()

start = time()
for epoch in range(1, epoch_num+1):
    sum_loss = 0.0
    count = 0
    
    for image, label in train_loader:
        if use_cuda:
            image = image.cuda()
            label = label.cuda()

        y = model(image)
        loss = criterion(y, label)
        
        model.zero_grad()
        loss.backward()
        optimizer.step()
        
        sum_loss += loss.item()
        
        pred = torch.argmax(y, dim=1)
        count += torch.sum(pred == label)

    print("epoch: {}, mean loss: {}, mean accuracy: {}, elapsed_time :{}".format(epoch,
                                                                                 sum_loss / n_iter,
                                                                                 count.item() / len(train_data),
                                                                                 time() - start))
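
`nn.CrossEntropyLoss` takes the raw network outputs and applies the softmax internally, so the model does not need a softmax layer of its own. The sketch below checks this on hand-picked values; `logits` and `labels` are illustrative only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])   # raw scores for 2 samples, 3 classes
labels = torch.tensor([0, 2])              # correct class of each sample

loss = nn.CrossEntropyLoss()(logits, labels)

# Manual version: negative log-softmax of the correct class, averaged over samples
log_prob = F.log_softmax(logits, dim=1)
manual = -log_prob[torch.arange(2), labels].mean()

print(loss.item(), manual.item())
```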

## Testing

Evaluate the trained network model on the test data.

In [None]:
# define data loader
test_loader = torch.utils.data.DataLoader(test_data, batch_size=100, shuffle=False)

# switch to evaluation mode
model.eval()

# begin evaluation
count = 0
with torch.no_grad():
    for image, label in test_loader:

        if use_cuda:
            image = image.cuda()
            label = label.cuda()
            
        y = model(image)

        pred = torch.argmax(y, dim=1)
        count += torch.sum(pred == label)

print("test accuracy: {}".format(count.item() / 10000.))
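
The accuracy computation used in the loop can be followed on a toy example. The scores and labels below are made up for illustration.

```python
import torch

# Scores for 4 samples and 3 classes, with the correct label of each sample
y = torch.tensor([[0.1, 2.0, -1.0],
                  [1.5, 0.2, 0.3],
                  [0.0, 0.1, 0.9],
                  [2.2, 0.4, 0.1]])
label = torch.tensor([1, 0, 1, 0])

pred = torch.argmax(y, dim=1)              # index of the highest score per sample
correct = torch.sum(pred == label).item()  # number of matching predictions
accuracy = correct / len(label)            # divide by the number of samples
print(pred.tolist(), correct, accuracy)
```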

## Exercises


### 1. Change training settings and confirm the change in recognition accuracy.

**Hint: The following settings can be changed in the program.**
* Mini-batch size
* Number of training cycles (number of epochs)
* Learning rate
* Optimization method
  * Choices include `torch.optim.Adagrad()` and `torch.optim.Adam()`.
  * Optimization methods that can be used in PyTorch are summarized on [this page](https://pytorch.org/docs/stable/optim.html#algorithms).
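
One possible way to switch between optimization methods is sketched below. `make_optimizer` is a hypothetical helper, not part of the tutorial program, and `toy_model` is a small stand-in for the ResNet50 model; the learning rate is only an example value.

```python
import torch
import torch.nn as nn


def make_optimizer(name, params, lr):
    """Hypothetical helper: build an optimizer from a short name."""
    params = list(params)
    if name == "sgd":
        return torch.optim.SGD(params, lr=lr, momentum=0.9)
    if name == "adagrad":
        return torch.optim.Adagrad(params, lr=lr)
    if name == "adam":
        return torch.optim.Adam(params, lr=lr)
    raise ValueError("unknown optimizer: " + name)


toy_model = nn.Linear(4, 2)  # stand-in for the ResNet50 model
for name in ("sgd", "adagrad", "adam"):
    optimizer = make_optimizer(name, toy_model.parameters(), lr=0.01)
    print(name, "->", type(optimizer).__name__)
```

A suitable learning rate usually differs between optimization methods, so it is worth changing it together with the method.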


### 2. Add types of data augmentation and carry out training.

**Hint: You can change the data augmentation used for training in `transform_train`.**

```python
transform_train = transforms.Compose([
    # (add the augmentations you wish to use here)
    transforms.ToTensor()])
```

Data augmentations you can use in PyTorch (torchvision) are summarized on [this page](https://pytorch.org/docs/stable/torchvision/transforms.html).

