# 01. Character recognition of MNIST dataset using MLP


---
## Purpose
Carry out character recognition of the MNIST dataset using a Multilayer Perceptron (MLP).
For evaluation, calculate the recognition rate of each class using a confusion matrix.

## Dataset


We use the [MNIST Dataset](http://yann.lecun.com/exdb/mnist/) to train and test character recognition in this tutorial. MNIST consists of grayscale images of handwritten digits from 0 to 9.

The sample images below are displayed in grayscale so that the digits are easy to make out.



![MNIST_sample.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/143078/559938dc-9a99-d426-010b-e000bca0aac6.png)

## Import modules
First, import the necessary modules.

For this tutorial, import `torch` (PyTorch).

In [None]:
from time import time
import numpy as np
import torch
import torch.nn as nn

import torchvision
import torchvision.transforms as transforms

import torchsummary

## Read and confirm dataset

Load the training and test data (MNIST dataset).

Confirm the size of the loaded data. The training set contains 60,000 images and the test set contains 10,000 images. Each image is 28x28 pixels, which gives 784 dimensions when flattened.

In [None]:
train_data = torchvision.datasets.MNIST(root="./", train=True, transform=transforms.ToTensor(), download=True)
test_data = torchvision.datasets.MNIST(root="./", train=False, transform=transforms.ToTensor(), download=True)

print(type(train_data.data), type(train_data.targets))
print(type(test_data.data), type(test_data.targets))
print(train_data.data.size(), train_data.targets.size())
print(test_data.data.size(), test_data.targets.size())
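
The relationship between the 28x28 image shape and the 784-dimensional input can be checked without downloading anything. The following is a minimal sketch that uses a random tensor as a stand-in for a mini-batch of MNIST images (the tensor `images` is an assumption for illustration, not data from the dataset).

```python
import torch

# Stand-in for a mini-batch loaded with transforms.ToTensor():
# 4 images, 1 channel, 28 x 28 pixels, values in [0, 1).
images = torch.rand(4, 1, 28, 28)

# Flatten each image into one vector; -1 lets PyTorch infer 1 * 28 * 28 = 784.
flat = images.view(images.size(0), -1)

print(images.shape)  # torch.Size([4, 1, 28, 28])
print(flat.shape)    # torch.Size([4, 784])
```

The same `view` call appears later in the training and testing loops.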

### Display MNIST dataset

Display the images in the MNIST dataset. Here we use a program that uses matplotlib to display multiple images.

In [None]:
import matplotlib.pyplot as plt

cols = 10

plt.clf()
fig = plt.figure(figsize=(14, 1.4))
for c in range(cols):
    ax = fig.add_subplot(1, cols, c + 1)
    ax.imshow(train_data[c][0].view(28, 28), cmap=plt.get_cmap('gray'))
    ax.set_axis_off()
plt.show()

## Define neural network model

Define the neural network. For this example, we create a three-layer neural network consisting of an input layer, an intermediate (hidden) layer, and an output layer.
The number of units in the input layer depends on the size of the input data. In this case, the image size is `28 x 28 = 784` pixels, so each image is flattened into a one-dimensional array of 784 pixel values before it is fed to the network.

The numbers of units in the intermediate layer and output layer are given by the parameters `n_hidden` and `n_out`, respectively. In PyTorch, the layers are created in the `__init__` function using these parameters. Each layer is a fully connected (linear) layer, `nn.Linear`. The activation function is specified by `self.act`. Here we use the sigmoid function as the activation function.

The `forward` function describes how the defined layers are connected. Its parameter `x` is the input data. The input is passed through the intermediate layer `l1` (created in `__init__`) and then through the activation function `act`. The result, `h1`, is passed to the output layer `l2`, whose output `h2` is returned.

In [None]:
class MLP(nn.Module):
    def __init__(self, n_hidden, n_out):
        super().__init__()
        self.l1 = nn.Linear(28*28, n_hidden)
        self.l2 = nn.Linear(n_hidden, n_out)
        self.act = nn.Sigmoid()
        
    def forward(self, x):
        h1 = self.act(self.l1(x))
        h2 = self.l2(h1)
        return h2
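
To trace how the tensor shapes change through the forward pass, the following sketch applies the same two linear layers and the sigmoid step by step to a dummy mini-batch. The hidden size of 16 and the batch of 4 random inputs are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Same layer types as in the MLP class: 784 inputs, 16 hidden units, 10 outputs.
l1 = nn.Linear(28 * 28, 16)
l2 = nn.Linear(16, 10)
act = nn.Sigmoid()

x = torch.rand(4, 28 * 28)  # dummy mini-batch of 4 flattened images
h1 = act(l1(x))             # hidden activations, squashed into (0, 1)
h2 = l2(h1)                 # one raw score per class

print(h1.shape)  # torch.Size([4, 16])
print(h2.shape)  # torch.Size([4, 10])
```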

## Create neural network

Create the neural network defined by the program above.

First, we define the numbers of units in the intermediate layer and the output layer. Here, we set `hidden_num`, the number of units in the intermediate layer, to 16. We set `out_num`, the number of units in the output layer, to 10. This number corresponds to the number of classes in the MNIST dataset.

The network model is created by passing the number of units in each layer as arguments to the `MLP` class defined above.

For training, we use stochastic gradient descent with momentum (SGD with momentum) as the optimization method. We set the learning rate as 0.01 and the momentum to 0.9 in the arguments. 
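
As a rough illustration of what the momentum term does, the sketch below compares `torch.optim.SGD` with a hand-written update on a toy problem. The scalar parameter and the loss `w^2` are assumptions made for the example. The hand-written rule (velocity = momentum × velocity + gradient, then parameter = parameter − lr × velocity) reflects the default settings of `torch.optim.SGD` (no dampening, no Nesterov momentum, no weight decay).

```python
import torch

lr, momentum = 0.01, 0.9

# One scalar parameter optimized by torch.optim.SGD ...
w = torch.tensor([1.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=lr, momentum=momentum)

# ... and a hand-written copy updated with the momentum rule.
w_manual = torch.tensor([1.0])
velocity = torch.zeros(1)

for step in range(3):
    # Toy loss L(w) = w^2, whose gradient is 2w.
    optimizer.zero_grad()
    loss = (w ** 2).sum()
    loss.backward()
    optimizer.step()

    grad = 2 * w_manual
    velocity = momentum * velocity + grad  # accumulate past gradients
    w_manual = w_manual - lr * velocity    # step along the velocity

    print(step, w.item(), w_manual.item())
```

Both columns should agree, which shows that past gradients keep contributing to each step.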

Finally, `torchsummary.summary()` is used to display detailed information about the defined network. The first argument specifies the model for which to display details. The second argument specifies the size of the data being input to the network. In this way, you can confirm the structure of the neural network.

In [None]:
# define the number of units
hidden_num = 16
out_num = 10

# create network model
model = MLP(n_hidden=hidden_num, n_out=out_num)

# setup optimization method
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# display detailed information about the defined network
torchsummary.summary(model, (1, 28*28), device='cpu')

## Training
Carry out training by using the loaded MNIST dataset and the created neural network.

We set the mini-batch size (the number of samples used to calculate the error for one parameter update) to 100 and the number of training epochs to 10.

Next, we define the data loader. The data loader takes the training dataset (`train_data`) that was loaded above and creates an object that reads the data in mini-batches of the specified size. For this training, we set `shuffle=True` so that the data is read in a random order in each epoch.

Next, we set the error function. Because we are dealing with a classification problem here, we define `criterion` to be `CrossEntropyLoss` to calculate cross entropy error.
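
To make clear what `CrossEntropyLoss` expects and computes, here is a small sketch with made-up scores. The values in `logits` and `labels` are hypothetical. The loss takes raw scores (not probabilities) and integer class indices (not one-hot vectors), and matches a manual log-softmax followed by averaging the negative log-probability of the correct class.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical raw outputs (logits) for 2 samples and 3 classes.
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
labels = torch.tensor([0, 2])  # class indices, not one-hot vectors

criterion = nn.CrossEntropyLoss()
loss = criterion(logits, labels)

# Manual equivalent: log-softmax, pick the correct class, average over the batch.
log_probs = F.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(2), labels].mean()

print(loss.item(), manual.item())
```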

Begin training.

For each update, the input data and the teacher (ground-truth) data are named `image` and `label`, respectively. The model is given a mini-batch of images and returns a score `y` for each class. These are raw scores (logits) rather than probabilities; `CrossEntropyLoss` applies softmax internally. The error between the class scores `y` and the teacher `label` is calculated by `criterion`. The recognition accuracy is also calculated by comparing the highest-scoring class with the label. The gradients of the error are then computed by the `backward` function, and `optimizer.step()` uses them to update the network parameters.
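
The update cycle described above can be tried in isolation on synthetic data. This is only a sketch: the random `images` and `labels` are stand-ins for MNIST, and the model is an `nn.Sequential` equivalent of the two-layer network, so the loss values themselves carry no meaning beyond showing that the loop runs and the loss goes down.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in for MNIST: 200 random "images" with random labels.
images = torch.rand(200, 28 * 28)
labels = torch.randint(0, 10, (200,))
loader = DataLoader(TensorDataset(images, labels), batch_size=100, shuffle=True)

model = nn.Sequential(nn.Linear(28 * 28, 16), nn.Sigmoid(), nn.Linear(16, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
criterion = nn.CrossEntropyLoss()

model.train()
losses = []
for epoch in range(50):
    sum_loss = 0.0
    for image, label in loader:
        y = model(image)            # forward pass: class scores
        loss = criterion(y, label)  # error against the labels
        optimizer.zero_grad()       # clear old gradients (same effect as model.zero_grad() here)
        loss.backward()             # compute new gradients
        optimizer.step()            # update the parameters
        sum_loss += loss.item()
    losses.append(sum_loss / len(loader))  # mean loss over the mini-batches

print(losses[0], losses[-1])
```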


In [None]:
# set mini-batch size and the number of training epochs
batch_size = 100
epoch_num = 10

# define data loader
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size, shuffle=True)

# set error (loss) function
criterion = nn.CrossEntropyLoss()

# switch network execution mode to training mode
model.train()

# begin training
for epoch in range(1, epoch_num+1):
    sum_loss = 0.0
    count = 0
    
    for image, label in train_loader:
        image = image.view(image.size()[0], -1)
        y = model(image)
        
        loss = criterion(y, label)
        model.zero_grad()
        loss.backward()
        optimizer.step()
        
        sum_loss += loss.item()
        
        pred = torch.argmax(y, dim=1)
        count += torch.sum(pred == label)

    print("epoch:{}, mean loss: {}, mean accuracy: {}".format(epoch, sum_loss/len(train_loader), count.item()/len(train_data)))

## Testing

Evaluate the trained network and confirm the recognition rate on the test data. Call `model.eval()` to switch the network to evaluation mode; some layers (e.g. dropout) behave differently in evaluation mode than in training mode. Wrap the evaluation loop in `torch.no_grad()` so that operations run without storing the gradient information that is needed only during training.
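
The effect of these two calls can be seen in a small standalone sketch. Note that the MLP in this tutorial contains no dropout layer; `nn.Dropout` is used here only as an assumed example of a layer whose behavior depends on the mode.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Dropout is not part of the tutorial's MLP; it only illustrates the mode switch.
layer = nn.Dropout(p=0.5)
x = torch.ones(1, 1000)

layer.train()
out_train = layer(x)  # roughly half the entries are zeroed, the rest scaled by 1 / (1 - p)

layer.eval()
out_eval = layer(x)   # dropout is disabled: the input passes through unchanged

print((out_train == 0).sum().item(), (out_eval == 0).sum().item())

# Inside torch.no_grad(), results are not tracked for backpropagation.
w = torch.ones(3, requires_grad=True)
with torch.no_grad():
    y = w * 2
print(y.requires_grad)  # False
```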


In [None]:
# define data loader
test_loader = torch.utils.data.DataLoader(test_data, batch_size=100, shuffle=False)

# switch network execution mode to evaluation mode
model.eval()

# begin evaluation
count = 0
with torch.no_grad():
    for image, label in test_loader:
        image = image.view(image.size()[0], -1)
        y = model(image)

        pred = torch.argmax(y, dim=1)
        count += torch.sum(pred == label)

print("test accuracy: {}".format(count.item() / len(test_data)))
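
The Purpose section mentions per-class recognition rates from a confusion matrix, which the loop above does not compute. One possible way to obtain them is sketched below. The helper `confusion_matrix` and the 3-class example data are hypothetical and written for this illustration; they are not part of PyTorch or of the original notebook.

```python
import torch

def confusion_matrix(preds, labels, num_classes):
    """Count samples per (true class, predicted class) pair.

    Rows are true classes, columns are predicted classes.
    """
    matrix = torch.zeros(num_classes, num_classes, dtype=torch.long)
    for t, p in zip(labels.tolist(), preds.tolist()):
        matrix[t, p] += 1
    return matrix

# Hypothetical predictions and labels for a 3-class problem.
labels = torch.tensor([0, 0, 1, 1, 2, 2, 2])
preds = torch.tensor([0, 1, 1, 1, 2, 0, 2])

cm = confusion_matrix(preds, labels, num_classes=3)

# Per-class recognition rate: correct predictions / number of samples of that class.
per_class = torch.diagonal(cm).float() / cm.sum(dim=1).float()

print(cm)
print(per_class)
```

For MNIST, the `pred` and `label` tensors from each test mini-batch could be collected with `torch.cat` and passed to the helper with `num_classes=10`.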

## Problem

### 1. Change the neural network structure and confirm the change in recognition accuracy.

**Hint: The following items can change the neural network structure.**
* Number of units in intermediate layers
* Number of layers
* The activation function
  * For example, `nn.Tanh()`, `nn.ReLU()`, `nn.LeakyReLU()`, etc.
  * Other activation functions that can be used in PyTorch are summarized on [this page](https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity).

\* After changing the neural network structure, use the function `torchsummary.summary()` to view changes in the number of parameters.
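
As a starting point for the exercise, one possible variant is sketched below. The class name `DeepMLP`, the second hidden layer, the choice of ReLU, and the hidden size of 64 are all assumptions made for illustration, not a recommended answer. The parameter count is computed directly from `model.parameters()` as a cross-check that does not depend on `torchsummary`.

```python
import torch
import torch.nn as nn

class DeepMLP(nn.Module):
    """Hypothetical variant of the tutorial's MLP: two hidden layers with ReLU."""
    def __init__(self, n_hidden, n_out):
        super().__init__()
        self.l1 = nn.Linear(28 * 28, n_hidden)
        self.l2 = nn.Linear(n_hidden, n_hidden)
        self.l3 = nn.Linear(n_hidden, n_out)
        self.act = nn.ReLU()

    def forward(self, x):
        h1 = self.act(self.l1(x))
        h2 = self.act(self.l2(h1))
        return self.l3(h2)

def count_parameters(model):
    """Total number of trainable weights and biases."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Baseline with the tutorial's sizes (784 -> 16 -> 10), written as nn.Sequential.
baseline = nn.Sequential(nn.Linear(28 * 28, 16), nn.Sigmoid(), nn.Linear(16, 10))
variant = DeepMLP(n_hidden=64, n_out=10)

print(count_parameters(baseline))  # 12730
print(count_parameters(variant))   # 55050
print(variant(torch.rand(4, 28 * 28)).shape)  # torch.Size([4, 10])
```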

  
