In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
torch.__version__

'1.0.0'

# 3.2 Handwritten Digit Recognition with the MNIST Data Set

## 3.2.1 Data set introduction
MNIST contains 60,000 training samples and 10,000 test samples, each a 28x28 grayscale image of a handwritten digit. Many tutorials start with it, and it has almost become a standard example: it is often called the Hello World of computer vision. We will use it here for a hands-on exercise.

When I introduced convolutional neural networks, I mentioned LeNet-5. LeNet-5 is notable because, in its day, it raised the recognition rate on MNIST to about 99%. Here we also build a convolutional neural network from scratch, and it likewise reaches about 99% accuracy.

## 3.2.2 Handwritten Digit Recognition
First, we define some hyperparameters:

In [2]:
BATCH_SIZE = 512  # Needs roughly 2 GB of GPU memory
EPOCHS = 20  # Total number of training epochs
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")  # Use the GPU when one is available; a GPU is recommended because training is much faster
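To get a feel for what these values imply, the short sketch below works out how many mini-batches one epoch contains. It assumes the data set sizes given in 3.2.1 (60,000 training and 10,000 test images) and a loader that keeps the final, smaller batch, which is the DataLoader default. The helper name batches_per_epoch is made up for this illustration.

```python
import math

BATCH_SIZE = 512
TRAIN_SAMPLES = 60_000  # MNIST training set size
TEST_SAMPLES = 10_000   # MNIST test set size


def batches_per_epoch(num_samples, batch_size):
    """Number of mini-batches when the last, smaller batch is kept."""
    return math.ceil(num_samples / batch_size)


train_batches = batches_per_epoch(TRAIN_SAMPLES, BATCH_SIZE)
test_batches = batches_per_epoch(TEST_SAMPLES, BATCH_SIZE)

# Size of the final training batch, which is smaller than BATCH_SIZE.
last_train_batch = TRAIN_SAMPLES - (train_batches - 1) * BATCH_SIZE

print(train_batches, test_batches, last_train_batch)
```

Because 60,000 is not a multiple of 512, the last training batch is smaller than the others, so code that depends on the batch size should read it from the tensor rather than assume BATCH_SIZE.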

torchvision ships with the MNIST data set, so we can use it directly here.
The first time the code runs, it creates a data folder and downloads the files, which takes some time.
If the data has already been downloaded, it is not downloaded again.

Because the data set is already implemented as a Dataset, we can read it directly with DataLoader.

In [3]:
train_loader = torch.utils.data.DataLoader(
        datasets.MNIST('data', train=True, download=True,
                       transform=transforms.Compose([
                           transforms.ToTensor(),
                           transforms.Normalize((0.1307,), (0.3081,))
                       ])),
        batch_size=BATCH_SIZE, shuffle=True)

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Processing...
Done!
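The Normalize((0.1307,), (0.3081,)) transform used above subtracts a mean and divides by a standard deviation for each channel. The two constants are the commonly quoted mean and standard deviation of the MNIST training pixels after ToTensor has scaled them to the range [0, 1]. The plain-Python sketch below only illustrates that arithmetic on single pixel values; the function normalize is a stand-in written for this example, not the torchvision implementation.

```python
MEAN, STD = 0.1307, 0.3081  # constants passed to transforms.Normalize above


def normalize(pixel):
    """Apply (x - mean) / std to one pixel already scaled to [0, 1]."""
    return (pixel - MEAN) / STD


black = normalize(0.0)  # background pixel
white = normalize(1.0)  # brightest stroke pixel
print(round(black, 4), round(white, 4))
```

After normalization the background pixels become slightly negative and the stroke pixels become clearly positive, so the inputs are roughly centered around zero.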


The test set is loaded the same way:

In [4]:
test_loader = torch.utils.data.DataLoader(
        datasets.MNIST('data', train=False, transform=transforms.Compose([
                           transforms.ToTensor(),
                           transforms.Normalize((0.1307,), (0.3081,))
                       ])),
        batch_size=BATCH_SIZE, shuffle=True)

Next we define a network with two convolutional layers, conv1 and conv2, followed by two linear layers. The final layer outputs 10 values, one for each digit from 0 to 9, and the largest value determines which digit is predicted.

It is a good idea to note the input and output dimensions of each layer in comments; this makes the code much easier to read later.

In [5]:
class ConvNet(nn.Module):
    def __init__(self):
        super().__init__()
        # batch*1*28*28 (samples arrive in batches; 1 input channel because the images are grayscale; resolution 28x28)
        # The first argument of Conv2d is the number of input channels, the second is the number of output channels, and the third is the kernel size
        self.conv1 = nn.Conv2d(1, 10, 5) # 1 input channel, 10 output channels, kernel size 5
        self.conv2 = nn.Conv2d(10, 20, 3) # 10 input channels, 20 output channels, kernel size 3
        # The first argument of the fully connected layer Linear is the number of input features, the second is the number of output features
        self.fc1 = nn.Linear(20*10*10, 500) # 2000 input features, 500 output features
        self.fc2 = nn.Linear(500, 10) # 500 input features, 10 output features (one per class)
    def forward(self, x):
        in_size = x.size(0) # Batch size: 512 here (BATCH_SIZE), except for the smaller last batch. A full input x is a 512*1*28*28 tensor.
        out = self.conv1(x) # batch*1*28*28 -> batch*10*24*24 (a 5x5 kernel without padding turns 28x28 into 24x24)
        out = F.relu(out) # batch*10*24*24 (ReLU does not change the shape)
        out = F.max_pool2d(out, 2, 2) # batch*10*24*24 -> batch*10*12*12 (2x2 pooling with stride 2 halves the height and width)
        out = self.conv2(out) # batch*10*12*12 -> batch*20*10*10 (second convolution, kernel size 3)
        out = F.relu(out) # batch*20*10*10
        out = out.view(in_size, -1) # batch*20*10*10 -> batch*2000 (-1 lets view infer the second dimension, here 20*10*10)
        out = self.fc1(out) # batch*2000 -> batch*500
        out = F.relu(out) # batch*500
        out = self.fc2(out) # batch*500 -> batch*10
        out = F.log_softmax(out, dim=1) # log(softmax(x)) over the 10 class scores
        return out
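The shape comments in ConvNet can be checked with a little arithmetic. The sketch below uses the standard output-size formula for a convolution or pooling window without dilation, floor((size + 2*padding - kernel) / stride) + 1, and traces the side length of the feature map through the layers. The helper conv_out is a hypothetical name used only here, and the code is plain Python so it runs without PyTorch.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a square convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


side = 28                                         # input image is 28x28
after_conv1 = conv_out(side, kernel=5)            # conv1: 5x5 kernel
after_pool = conv_out(after_conv1, kernel=2, stride=2)  # 2x2 max pooling
after_conv2 = conv_out(after_pool, kernel=3)      # conv2: 3x3 kernel

# conv2 has 20 output channels, so the flattened vector has this many values.
flattened = 20 * after_conv2 * after_conv2

print(after_conv1, after_pool, after_conv2, flattened)
```

The flattened size is the value that the first argument of fc1 has to match. If a kernel size or the pooling is changed, rerunning this trace shows what the new value should be.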

We instantiate the network and then call the .to method to move it to the selected device (the GPU, if one is available).

For the optimizer, we simply pick Adam with its default settings, a simple and effective choice.

In [6]:
model = ConvNet().to(DEVICE)
optimizer = optim.Adam(model.parameters())
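As a rough check on what model.parameters() hands to the optimizer, the sketch below counts the trainable values layer by layer from the shapes defined in ConvNet. It is plain arithmetic rather than a PyTorch call. The helper names conv_params and linear_params are invented for this illustration, and the count assumes every layer keeps its default bias term.

```python
def conv_params(in_ch, out_ch, kernel):
    """Weights plus biases of a square-kernel Conv2d layer."""
    return out_ch * in_ch * kernel * kernel + out_ch


def linear_params(in_features, out_features):
    """Weights plus biases of a Linear layer."""
    return in_features * out_features + out_features


counts = {
    "conv1": conv_params(1, 10, 5),
    "conv2": conv_params(10, 20, 3),
    "fc1": linear_params(20 * 10 * 10, 500),
    "fc2": linear_params(500, 10),
}
total = sum(counts.values())
print(counts, total)
```

Almost all of the parameters sit in fc1, the first fully connected layer.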

Next we define the training function, which encapsulates all the operations of one training epoch.

In [7]:
def train(model, device, train_loader, optimizer, epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        if (batch_idx + 1) % 30 == 0:  # Log progress every 30 batches
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, (batch_idx + 1) * len(data), len(train_loader.dataset),
                100. * (batch_idx + 1) / len(train_loader), loss.item()))
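The model ends with log_softmax and the training function uses F.nll_loss; taken together, the two compute the usual cross-entropy loss. The plain-Python sketch below reproduces that arithmetic for a single sample. The logits are made-up numbers and the two functions are simplified stand-ins for the PyTorch ones (F.nll_loss additionally averages over the batch by default).

```python
import math


def log_softmax(logits):
    """log(softmax(x)), computed stably by subtracting the max first."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_sum for v in logits]


def nll_loss(log_probs, target):
    """Negative log-likelihood of the target class for one sample."""
    return -log_probs[target]


logits = [2.0, 0.5, -1.0]  # hypothetical scores for 3 classes
log_probs = log_softmax(logits)
loss = nll_loss(log_probs, target=0)

# Cross-entropy computed directly from softmax probabilities, for comparison.
denominator = sum(math.exp(v) for v in logits)
probs = [math.exp(v) / denominator for v in logits]
direct = -math.log(probs[0])

print(loss, direct)
```

Both values agree, which is why a network that outputs log-probabilities is paired with the negative log-likelihood loss.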

The test procedure is likewise wrapped in a function.

In [8]:
def test(model, device, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item() # Sum the losses over the batch
            pred = output.max(1, keepdim=True)[1] # Index of the class with the highest log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))
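The test function sums the loss inside each batch (reduction='sum') and divides by the data set size only once at the end. A small sketch with made-up per-sample losses shows why this matters when the last batch is smaller than the others, and also how the predicted class is simply the index of the largest score. All numbers here are hypothetical.

```python
# Hypothetical per-sample losses split into two batches of different size.
batches = [[0.2, 0.4, 0.6, 0.8], [1.0, 2.0]]
total_samples = sum(len(b) for b in batches)

# What test() does: sum within each batch, divide once at the end.
summed = sum(sum(b) for b in batches) / total_samples

# Averaging the per-batch means instead over-weights the small batch.
mean_of_means = sum(sum(b) / len(b) for b in batches) / len(batches)

# Prediction: index of the highest log-probability, as output.max(1) returns.
scores = [-2.3, -0.1, -3.0]
pred = max(range(len(scores)), key=lambda i: scores[i])

print(summed, mean_of_means, pred)
```

Only the first quantity is the true per-sample average; the second drifts away from it whenever the batches differ in size.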

Now we can start training. This is where the encapsulation pays off: the loop body is just two calls.

In [9]:
for epoch in range(1, EPOCHS + 1):
    train(model, DEVICE, train_loader, optimizer, epoch)
    test(model, DEVICE, test_loader)


Test set: Average loss: 0.1018, Accuracy: 9695/10000 (97%)


Test set: Average loss: 0.0523, Accuracy: 9825/10000 (98%)


Test set: Average loss: 0.0408, Accuracy: 9866/10000 (99%)


Test set: Average loss: 0.0433, Accuracy: 9859/10000 (99%)


Test set: Average loss: 0.0339, Accuracy: 9885/10000 (99%)


Test set: Average loss: 0.0348, Accuracy: 9881/10000 (99%)


Test set: Average loss: 0.0346, Accuracy: 9895/10000 (99%)


Test set: Average loss: 0.0323, Accuracy: 9886/10000 (99%)


Test set: Average loss: 0.0294, Accuracy: 9909/10000 (99%)


Test set: Average loss: 0.0361, Accuracy: 9902/10000 (99%)


Test set: Average loss: 0.0309, Accuracy: 9905/10000 (99%)


Test set: Average loss: 0.0318, Accuracy: 9902/10000 (99%)


Test set: Average loss: 0.0358, Accuracy: 9906/10000 (99%)


Test set: Average loss: 0.0376, Accuracy: 9896/10000 (99%)


Test set: Average loss: 0.0346, Accuracy: 9909/10000 (99%)


Test set: Average loss: 0.0357, Accuracy: 9905/10000 (99%)


Test set: Average loss:

Let's look at the results: the accuracy reaches 99%, as expected.

If your model can't even handle MNIST, it has no value.

But even if your model does well on MNIST, it may still have no value.

MNIST is a very simple data set. Because of its limitations, it is useful mainly for research purposes and has very limited value in practical applications. Even so, this example walks through the complete workflow of a real project.

We find a data set, preprocess the data, define a model, set the hyperparameters, train and test, and then adjust the hyperparameters or the model based on the results.

This exercise also leaves us with a good template that future projects can start from.