# 04. Object recognition of CIFAR10 dataset using data augmentation


---
## Purpose
Carry out object recognition of the CIFAR10 dataset. The structure of this program is similar to the MNIST character recognition program, so refer to that tutorial for a basic explanation. This page describes differences with the MNIST character recognition program.

Run the neural network computations on the GPU. Also, confirm the effect of data augmentation on training.


## Preparations

### Confirm and change Google Colaboratory settings

In this tutorial, we use PyTorch to implement a neural network and carry out training and evaluation. **To run computations on the GPU, go to the menu bar at the top of the screen and choose Runtime -> Change runtime type -> Hardware accelerator -> GPU.**

## Dataset

### CIFAR10 dataset

We use the CIFAR10 dataset for object recognition in this tutorial. The CIFAR10 dataset consists of 32x32 color images in 10 classes (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck), split into 50,000 training images and 10,000 test images.

![CIFAR10_sample.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/176458/b6b43478-c85f-9211-7bc6-227d9b387af5.png)

## Import modules


First, import the necessary modules.


### Confirm GPU settings

Confirm that GPU computation is enabled.

If `Use CUDA: True` is displayed, PyTorch can use the GPU for computation. If `Use CUDA: False` is displayed, follow the procedure in "Confirm and change Google Colaboratory settings" above to change the settings, then import the modules again.




```python
# import modules
from time import time
import numpy as np
import torch
import torch.nn as nn

import torchvision
import torchvision.transforms as transforms

import torchsummary

# confirm GPU settings
use_cuda = torch.cuda.is_available()
print('Use CUDA:', use_cuda)
```

## Read the dataset and apply data augmentation

Load the CIFAR10 training and test data.

Here we define `transform_train` and `transform_test`, which preprocess the training and test images. Both are built with `transforms.Compose()`, which takes a list of transforms as its argument and returns a single callable that applies them to an image in order.

### In case of no augmentation
First, we explain the definition used when no data augmentation is performed.
In `transforms.Compose([transforms.ToTensor()])`, a list containing only `transforms.ToTensor()` is passed as the argument. This transform converts an image to a tensor that PyTorch can handle, rearranging it from height x width x channel to channel x height x width. At the same time, pixel values in `[0, 255]` are scaled to `[0.0, 1.0]`.
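The effect of `ToTensor()` can be illustrated without PyTorch. The following is a minimal NumPy sketch, not torchvision's implementation; the helper name `to_tensor_like` and the random stand-in image are assumptions made for illustration.

```python
import numpy as np


def to_tensor_like(image_hwc):
    """Mimic the effect of ToTensor on a uint8 array of shape (H, W, C)."""
    # Move the channel axis to the front: (H, W, C) -> (C, H, W).
    chw = np.transpose(image_hwc, (2, 0, 1))
    # Scale integer pixel values in [0, 255] to floats in [0.0, 1.0].
    return chw.astype(np.float32) / 255.0


# A random stand-in for one 32x32 RGB CIFAR10 image (hypothetical data).
image = np.random.randint(0, 256, size=(32, 32, 3), dtype=np.uint8)
tensor = to_tensor_like(image)
print(tensor.shape, tensor.dtype)
```

The resulting array has the channel-first layout that the convolutional layers in this tutorial expect.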


### In case of augmentation

To apply augmentation, pass a list of the transforms you wish to apply as the argument of `transforms.Compose()`. In the example below
```
[transforms.RandomCrop(32, padding=1),
 transforms.RandomHorizontalFlip(),
 transforms.ToTensor()]
```
three transforms are given in the list, which is passed as the argument of `transforms.Compose()`. `RandomCrop(32, padding=1)` pads the 32x32 CIFAR10 image by 1 pixel on every side (giving 34x34) and then cuts out a 32x32 region at a random position, so the output keeps the original size but its content is shifted by up to one pixel. `RandomHorizontalFlip()` flips the image left/right with probability 0.5. `transforms.ToTensor()` converts the augmented image to a tensor and scales the pixel values.

Because augmentation is not applied to the test data, `transform_test` uses only `ToTensor()`.
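To make the two augmentations concrete, here is a rough NumPy sketch of padded random cropping and random horizontal flipping. It is an illustration under stated assumptions (zero padding, flip probability 0.5, hypothetical helper names `random_crop` and `random_horizontal_flip`), not the torchvision code.

```python
import numpy as np

rng = np.random.default_rng(0)


def random_crop(image_hwc, size=32, padding=1):
    """Zero-pad each side by `padding`, then cut a random size x size window."""
    padded = np.pad(image_hwc, ((padding, padding), (padding, padding), (0, 0)))
    top = rng.integers(0, padded.shape[0] - size + 1)
    left = rng.integers(0, padded.shape[1] - size + 1)
    return padded[top:top + size, left:left + size]


def random_horizontal_flip(image_hwc, p=0.5):
    """Flip the image left/right with probability p."""
    if rng.random() < p:
        return image_hwc[:, ::-1]
    return image_hwc


# A random stand-in for one 32x32 RGB image (hypothetical data).
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
augmented = random_horizontal_flip(random_crop(image))
print(augmented.shape)
```

Because the crop window has the same size as the original image, the augmented image keeps its 32x32 shape, and the network definition does not need to change.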


```python
# NO augmentation #####
transform_train = transforms.Compose([transforms.ToTensor()])
transform_test = transforms.Compose([transforms.ToTensor()])

# augmentation #####
# transform_train = transforms.Compose([transforms.RandomCrop(32, padding=1),
#                                       transforms.RandomHorizontalFlip(),
#                                       transforms.ToTensor()])
# transform_test = transforms.Compose([transforms.ToTensor()])

train_data = torchvision.datasets.CIFAR10(root="./", train=True, transform=transform_train, download=True)
test_data = torchvision.datasets.CIFAR10(root="./", train=False, transform=transform_test, download=True)

print(train_data)
print(test_data)
```

## Define the neural network

Define the convolutional neural network.

The network in this tutorial consists of two convolutional layers and three fully connected layers.

The first convolutional layer has 3 input channels (RGB), 16 output feature maps, and a 3x3 convolution filter. The second convolutional layer has 16 input channels, 32 output feature maps, and a convolution filter that is also 3x3. The first fully connected layer has 8 x 8 x 32 = 2048 input units (the flattened output of the second pooling layer) and 1024 output units. The next fully connected layer has 1024 input units and 1024 output units. The output layer has 1024 input units and 10 output units. We define the composition of each layer in the `__init__` function.


The `forward` function describes how the defined layers are connected. Its parameter `x` is the input data. `x` is first passed to `conv1`, defined in `__init__`. The output goes through the activation function `relu` and then through max pooling (`pool`), and the result is stored in `h`. `h` is then processed by `conv2`, `relu`, and `pool` in the same way. The resulting feature maps are flattened into a vector with `view` and passed through the fully connected layers `l1` and `l2`, each followed by `relu`. Finally, the output of the fully connected layer `l3` is returned.


```python
class CNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=1)
        self.relu = nn.ReLU()
        self.pool = nn.MaxPool2d(2, 2)
        self.l1 = nn.Linear(8 * 8 * 32, 1024)
        self.l2 = nn.Linear(1024, 1024)
        self.l3 = nn.Linear(1024, 10)

    def forward(self, x):
        h = self.pool(self.relu(self.conv1(x)))
        h = self.pool(self.relu(self.conv2(h)))
        # flatten the feature maps to (batch size, 8 * 8 * 32)
        h = h.view(h.size()[0], -1)
        h = self.relu(self.l1(h))
        h = self.relu(self.l2(h))
        h = self.l3(h)
        return h
```
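The input size of the first fully connected layer, `8 * 8 * 32`, follows from the spatial size of the feature maps after each layer. As a small sketch (the helper names `conv_out` and `pool_out` are made up for this illustration), the standard output-size formula can be traced in plain Python:

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    """Spatial output size of max pooling along one dimension."""
    return (size - kernel) // stride + 1


size = 32                                             # CIFAR10 images are 32x32
size = conv_out(size, kernel=3, stride=1, padding=1)  # conv1 keeps 32
size = pool_out(size)                                 # pool halves it to 16
size = conv_out(size, kernel=3, stride=1, padding=1)  # conv2 keeps 16
size = pool_out(size)                                 # pool halves it to 8

flat_features = size * size * 32                      # 32 feature maps from conv2
print(size, flat_features)  # 8 2048
```

If you change the kernel size, padding, or number of pooling steps, the first argument of `nn.Linear` for `l1` has to be recomputed in the same way.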

## Create neural network

Create the neural network defined by the program above.

Instantiate the `CNN` class to create the neural network model. If the GPU is used (`use_cuda == True`), the model is moved to GPU memory so that computations can run on the GPU.

We use stochastic gradient descent with momentum (SGD with momentum) as the optimization method for training. We set the learning rate (`lr`) to 0.01 and the momentum (`momentum`) to 0.9.
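As a rough illustration of what one momentum update does, the sketch below applies the rule "velocity = momentum * velocity + gradient, parameter = parameter - lr * velocity" to a single scalar. It is a simplified sketch without dampening, weight decay, or Nesterov momentum; the toy objective f(w) = (w - 3)^2 and the helper name `sgd_momentum_step` are assumptions for this example only.

```python
def sgd_momentum_step(w, grad, velocity, lr=0.01, momentum=0.9):
    """One SGD-with-momentum update for a single scalar parameter."""
    velocity = momentum * velocity + grad  # accumulate past gradients
    w = w - lr * velocity                  # move against the accumulated direction
    return w, velocity


# Minimize the toy objective f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w, velocity = 0.0, 0.0
for _ in range(300):
    grad = 2.0 * (w - 3.0)
    w, velocity = sgd_momentum_step(w, grad, velocity)

print(w)  # close to the minimizer 3.0
```

The velocity term lets consecutive gradients that point in the same direction reinforce each other, which is why a small learning rate such as 0.01 can still make steady progress.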

Finally, `torchsummary.summary()` is used to display detailed information about the defined network.


```python
model = CNN()
if use_cuda:
    model.cuda()

optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# display detailed information about the defined network
torchsummary.summary(model, (3, 32, 32))
```
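The parameter counts in such a summary can be cross-checked by hand. The following sketch recomputes them in plain Python from the layer sizes used in this tutorial; the helper names are hypothetical, and each layer is assumed to have a bias term, which is the default for `nn.Conv2d` and `nn.Linear`.

```python
def conv_params(in_ch, out_ch, kernel):
    """Weights (out_ch x in_ch x kernel x kernel) plus one bias per output channel."""
    return out_ch * in_ch * kernel * kernel + out_ch


def linear_params(in_features, out_features):
    """Weights (out_features x in_features) plus one bias per output unit."""
    return out_features * in_features + out_features


params = {
    "conv1": conv_params(3, 16, 3),
    "conv2": conv_params(16, 32, 3),
    "l1": linear_params(8 * 8 * 32, 1024),
    "l2": linear_params(1024, 1024),
    "l3": linear_params(1024, 10),
}
total = sum(params.values())

for name, count in params.items():
    print(name, count)
print("total", total)  # total 3163114
```

Almost all of the parameters sit in the fully connected layers `l1` and `l2`, which is useful to keep in mind when changing the network structure.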

## Training

Carry out training by using the loaded CIFAR10 dataset and the created neural network.

We set the number of samples used to compute the error for one update (the mini-batch size) to 64 and the number of training epochs to 10.

Next, we define the data loader. The data loader takes the training dataset (`train_data`) loaded above and creates an object that reads the data in mini-batches of the specified size. For training, we set `shuffle=True` so that the data is read in a different random order in every epoch.
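Conceptually, a shuffling data loader reorders the sample indices and then slices them into mini-batches. The following NumPy sketch shows only that index logic; it is not how `DataLoader` is implemented, and the generator name `iterate_minibatches` is an assumption for illustration.

```python
import numpy as np


def iterate_minibatches(num_samples, batch_size, shuffle=True, seed=0):
    """Yield arrays of sample indices, one array per mini-batch."""
    indices = np.arange(num_samples)
    if shuffle:
        np.random.default_rng(seed).shuffle(indices)
    for start in range(0, num_samples, batch_size):
        yield indices[start:start + batch_size]


# 50,000 training images read in mini-batches of 64.
batches = list(iterate_minibatches(50000, 64))
print(len(batches), len(batches[0]), len(batches[-1]))  # 782 64 16
```

Since 50,000 is not a multiple of 64, one epoch consists of 782 mini-batches and the last one holds only 16 samples.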

Next, we set the error (loss) function. Because we are dealing with a classification problem here, we define `criterion` as `nn.CrossEntropyLoss` to calculate the cross-entropy error.
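For intuition, the cross-entropy error on raw class scores can be written in a few lines of NumPy: apply a log-softmax to the scores, pick the log-probability of the correct class, and average the negated values over the mini-batch. This is a minimal sketch with a hypothetical helper name, not the PyTorch implementation.

```python
import numpy as np


def cross_entropy(scores, labels):
    """Mean cross-entropy of raw class scores (N, C) against integer labels (N,)."""
    # Subtract the row maximum for numerical stability before exponentiating.
    shifted = scores - scores.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of the correct class for every sample.
    return -log_probs[np.arange(len(labels)), labels].mean()


labels = np.array([0, 3, 5, 9])

# Identical scores for all 10 classes: the loss equals log(10).
uniform_loss = cross_entropy(np.zeros((4, 10)), labels)

# Large scores on the correct classes: the loss becomes small.
confident_scores = np.zeros((4, 10))
confident_scores[np.arange(4), labels] = 10.0
confident_loss = cross_entropy(confident_scores, labels)

print(uniform_loss, confident_loss)
```

A network that has learned nothing about 10 classes therefore starts with a loss of roughly log(10), about 2.3, which is a useful reference when reading the training log.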

Begin training.

For each update, the input data and the teacher data are named `image` and `label`, respectively. The model is given the images and outputs a score `y` for each class. `criterion` calculates the error between the class scores `y` and the teacher labels (`CrossEntropyLoss` applies the softmax internally, so the model output does not need to be normalized to probabilities). The recognition accuracy is also calculated. The error is then backpropagated with `backward()`, and `optimizer.step()` updates the parameters of the neural network.


```python
# set the mini-batch size and training epochs
batch_size = 64
epoch_num = 10

# define data loader
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size, shuffle=True)

# number of mini-batches (parameter updates) per epoch
n_iter = len(train_loader)

# set the error (loss) function
criterion = nn.CrossEntropyLoss()
if use_cuda:
    criterion.cuda()

# switch to training mode
model.train()

start = time()
for epoch in range(1, epoch_num+1):
    sum_loss = 0.0
    count = 0

    for image, label in train_loader:

        if use_cuda:
            image = image.cuda()
            label = label.cuda()

        y = model(image)

        loss = criterion(y, label)

        model.zero_grad()
        loss.backward()
        optimizer.step()

        sum_loss += loss.item()

        # count the correctly classified samples in this mini-batch
        pred = torch.argmax(y, dim=1)
        count += torch.sum(pred == label)

    # mean loss per mini-batch and accuracy per training sample
    print("epoch: {}, mean loss: {}, mean accuracy: {}, elapsed_time :{}".format(epoch,
                                                                                 sum_loss / n_iter,
                                                                                 count.item() / len(train_data),
                                                                                 time() - start))
```
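The accuracy bookkeeping can be checked in isolation. The sketch below uses NumPy and two tiny hand-made mini-batches (hypothetical scores and labels) to show the idea: take the arg max of the class scores, count the matches with the labels, and divide the total count by the total number of samples seen.

```python
import numpy as np


def count_correct(scores, labels):
    """Number of samples whose highest-scoring class equals the label."""
    pred = np.argmax(scores, axis=1)
    return int(np.sum(pred == labels))


# Two toy mini-batches of (class scores, labels) with 3 classes.
toy_batches = [
    (np.array([[2.0, 0.1, 0.3], [0.2, 1.5, 0.1]]), np.array([0, 1])),
    (np.array([[0.1, 0.2, 3.0], [1.0, 0.9, 0.8]]), np.array([2, 1])),
]

correct = sum(count_correct(scores, labels) for scores, labels in toy_batches)
num_samples = sum(len(labels) for _, labels in toy_batches)
accuracy = correct / num_samples
print(correct, num_samples, accuracy)  # 3 4 0.75
```

Dividing by the number of samples keeps the value between 0 and 1, whatever the mini-batch size is.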

## Testing
Evaluate the trained network model on the test data.


```python
# define data loader
test_loader = torch.utils.data.DataLoader(test_data, batch_size=100, shuffle=False)

# switch to evaluation mode
model.eval()

# begin evaluation (no gradients are needed)
count = 0
with torch.no_grad():
    for image, label in test_loader:

        if use_cuda:
            image = image.cuda()
            label = label.cuda()

        y = model(image)

        pred = torch.argmax(y, dim=1)
        count += torch.sum(pred == label)

# accuracy = correct predictions / number of test samples
print("test accuracy: {}".format(count.item() / len(test_data)))
```

## Problems

### 1. Change the neural network structure and confirm the change in recognition accuracy.

**Hint: The following items can change the neural network structure.**
* The number of units in the intermediate layers and the convolution kernel size
* Number of layers
* The activation function
  * For example, `nn.Tanh()` or `nn.ReLU()`, `nn.LeakyReLU()`, etc.
  * Other activation functions that can be used in PyTorch are summarized on [this page](https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity).
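
To see how the activation functions named in this hint differ, the following NumPy sketch evaluates them on a few sample values. It only illustrates the mathematical definitions; the slope 0.01 for negative inputs matches the default `negative_slope` of `nn.LeakyReLU()`, and the helper names are made up for this example.

```python
import numpy as np


def relu(x):
    """Zero for negative inputs, identity for positive inputs."""
    return np.maximum(x, 0.0)


def leaky_relu(x, negative_slope=0.01):
    """Like ReLU, but negative inputs keep a small slope instead of becoming zero."""
    return np.where(x >= 0, x, negative_slope * x)


def tanh(x):
    """Squashes inputs smoothly into the range (-1, 1)."""
    return np.tanh(x)


x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print("relu      ", relu(x))
print("leaky_relu", leaky_relu(x))
print("tanh      ", tanh(x))
```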


### 2. Change training settings and confirm the change in recognition accuracy.

**Hint: The following settings can be changed in the program.**
* Mini-batch size
* Number of training cycles (number of epochs)
* Learning rate
* Optimization method
  * Choices include `torch.optim.Adagrad()` and `torch.optim.Adam()`.
  * Optimization methods that can be used in PyTorch are summarized on [this page](https://pytorch.org/docs/stable/optim.html#algorithms).


### 3. Add types of data augmentation and carry out training.

**Hint: You can change the data augmentation used for training in transform_train.**

```python
transform_train = transforms.Compose([
    # (add the augmentations you wish to use here),
    transforms.ToTensor()])
```

Data augmentations you can use in PyTorch (torchvision) are summarized on [this page](https://pytorch.org/docs/stable/torchvision/transforms.html).

