In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
torch.__version__

'1.6.0'

# 3.2  Handwritten Digit Recognition on the MNIST Dataset

## 3.2.1  About the Dataset

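* MNIST contains 60,000 training images and 10,000 test images, each a 28x28 grayscale handwritten digit.
* Many tutorials start with it, and it has become something of a standard benchmark: the "Hello World" of computer vision.
* That is why we also use MNIST for this hands-on example.

* LeNet-5 became famous largely because, at the time, it pushed the recognition accuracy on MNIST to 99%.
* Here we build a convolutional neural network from scratch that also reaches 99% accuracy.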

## 3.2.2 Handwritten Digit Recognition
First, define a few hyperparameters.

In [2]:
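# Needs roughly 2 GB of GPU memory
BATCH_SIZE = 512

# Total number of training epochs (full passes over the training set)
EPOCHS = 20

# Use the GPU if one is available; a GPU is recommended, training is much faster
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")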
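With `BATCH_SIZE = 512`, one pass over the 60,000 training images is split into ⌈60000 / 512⌉ batches, and the last batch is smaller than the others. This is why `in_size` in the network's `forward` is read from the input rather than hard-coded. A quick arithmetic check in plain Python, using the MNIST sizes quoted above:

```python
import math

TRAIN_SIZE = 60000  # MNIST training images
BATCH_SIZE = 512
EPOCHS = 20

# number of batches per epoch, counting the final partial batch
batches_per_epoch = math.ceil(TRAIN_SIZE / BATCH_SIZE)
# size of that final partial batch
last_batch = TRAIN_SIZE - (batches_per_epoch - 1) * BATCH_SIZE
# total optimizer steps over the whole run
total_steps = batches_per_epoch * EPOCHS

print(batches_per_epoch, last_batch, total_steps)  # 118 96 2360
```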

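* The **MNIST dataset** ships with PyTorch's torchvision package and can be used directly.
* The first run creates a **data folder** and takes some time to download the files; if they were downloaded before, they are not downloaded again.
* torchvision already implements the Dataset, so the data can be read directly with a **DataLoader**.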
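To see what a `DataLoader` yields without downloading anything, here is a minimal sketch that swaps MNIST for a synthetic `TensorDataset` of random 1x28x28 "images" (the dataset size 1000 is an arbitrary choice for the illustration). Each iteration returns a `(data, target)` pair with the batch as the first dimension, and the final batch holds whatever is left over:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# synthetic stand-in for MNIST: 1000 random 1x28x28 images with labels 0-9
images = torch.randn(1000, 1, 28, 28)
labels = torch.randint(0, 10, (1000,))

loader = DataLoader(TensorDataset(images, labels), batch_size=512, shuffle=True)

batch_shapes = []
for data, target in loader:
    batch_shapes.append((tuple(data.shape), tuple(target.shape)))
    print(data.shape, target.shape)
# torch.Size([512, 1, 28, 28]) torch.Size([512])
# torch.Size([488, 1, 28, 28]) torch.Size([488])
```

`shuffle=True` only permutes which samples land in which batch; the batch sizes come out in the same order every epoch.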

In [3]:
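train_loader = torch.utils.data.DataLoader(
    datasets.MNIST(
        'data',
        train=True,
        download=True,
        transform=transforms.Compose([
            # convert images to float tensors with values in [0, 1]
            transforms.ToTensor(),
            # normalize with the mean and std of the MNIST training pixels
            transforms.Normalize((0.1307, ), (0.3081, ))
        ])),
    batch_size=BATCH_SIZE,
    shuffle=True)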

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to data/MNIST/raw/train-images-idx3-ubyte.gz
Extracting data/MNIST/raw/train-images-idx3-ubyte.gz to data/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to data/MNIST/raw/train-labels-idx1-ubyte.gz
Extracting data/MNIST/raw/train-labels-idx1-ubyte.gz to data/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to data/MNIST/raw/t10k-images-idx3-ubyte.gz
Extracting data/MNIST/raw/t10k-images-idx3-ubyte.gz to data/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to data/MNIST/raw/t10k-labels-idx1-ubyte.gz
Extracting data/MNIST/raw/t10k-labels-idx1-ubyte.gz to data/MNIST/raw
Processing...
Done!

The test set is loaded the same way:

In [5]:
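test_loader = torch.utils.data.DataLoader(
    datasets.MNIST(
        'data',
        train=False,  # the test files were already downloaded along with the training set
        transform=transforms.Compose([
            transforms.ToTensor(),
            # use the same normalization as for the training set
            transforms.Normalize((0.1307, ), (0.3081, ))
        ])),
    batch_size=BATCH_SIZE,
    shuffle=False)  # evaluation does not depend on sample order, so no shuffling is needed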
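The constants in `transforms.Normalize((0.1307, ), (0.3081, ))` are the mean and standard deviation of the MNIST training pixels after `ToTensor` has scaled them to [0, 1]; `Normalize` then maps each pixel to `(x - mean) / std`. A minimal sketch of how such statistics can be computed from a uint8 image tensor. The helpers `pixel_mean_std` and `normalize` are hypothetical names for this illustration, and the commented usage assumes the dataset has already been downloaded into `data`:

```python
import torch
from typing import Tuple

def pixel_mean_std(images: torch.Tensor) -> Tuple[float, float]:
    """Hypothetical helper: mean and std over all pixels of a uint8 tensor (N, H, W),
    after scaling to [0, 1] the way ToTensor does."""
    x = images.float() / 255.0
    return x.mean().item(), x.std().item()

def normalize(x: torch.Tensor, mean: float, std: float) -> torch.Tensor:
    # the same per-pixel operation Normalize applies to a single-channel image
    return (x - mean) / std

# Usage with the real data (assumes MNIST is already in 'data'):
# from torchvision import datasets
# raw = datasets.MNIST('data', train=True, download=True).data  # uint8, shape (60000, 28, 28)
# mean, std = pixel_mean_std(raw)  # approximately 0.1307 and 0.3081
```

After normalization the training pixels have roughly zero mean and unit variance, which tends to make optimization better behaved.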

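### Defining the network
- The network contains two convolutional layers, conv1 and conv2
- They are followed by two linear (fully connected) layers that produce the output
- The final output has 10 dimensions, one per digit 0-9; the largest one determines the recognized digit
- The input and output shapes of every layer are annotated, which makes the code much easier to read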
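The spatial sizes annotated in the code follow from the standard output-size formula for convolution and pooling layers, out = ⌊(in + 2·padding − kernel) / stride⌋ + 1 (ignoring dilation). A plain-Python check of the 28 → 24 → 12 → 10 chain and of the 2000 inputs to `fc1`:

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output height/width of a conv or pooling layer (no dilation, floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1

s = 28
s = conv_out(s, kernel=5)            # conv1, 5x5: 28 -> 24
s = conv_out(s, kernel=2, stride=2)  # max_pool2d(2, 2): 24 -> 12
s = conv_out(s, kernel=3)            # conv2, 3x3: 12 -> 10
fc1_in = 20 * s * s                  # 20 channels of 10x10 feature maps
print(s, fc1_in)  # 10 2000
```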

In [7]:
class ConvNet(nn.Module):
    def __init__(self):
        super().__init__()

        # Input: batch*1*28*28, i.e. a batch of samples, 1 input channel (grayscale), 28x28 resolution
        # Conv2d arguments: input channels, output channels, kernel size

        # 1 input channel, 10 output channels, 5x5 kernel
        self.conv1 = nn.Conv2d(1, 10, 5)

        # 10 input channels, 20 output channels, 3x3 kernel
        self.conv2 = nn.Conv2d(10, 20, 3)

        # Linear arguments: number of input features, number of output features

        # 2000 input features (20*10*10), 500 output features
        self.fc1 = nn.Linear(20 * 10 * 10, 500)

        # 500 input features, 10 output features, i.e. 10 classes
        self.fc2 = nn.Linear(500, 10)

    def forward(self, x):

        # in_size is the batch size: 512 (BATCH_SIZE) here, so x is a 512*1*28*28 tensor
        # (the last batch of an epoch can be smaller)
        in_size = x.size(0)

        # batch*1*28*28 -> batch*10*24*24
        # a 5x5 convolution without padding turns a 28x28 image into 24x24
        out = self.conv1(x)

        # batch*10*24*24
        # the ReLU activation does not change the shape
        out = F.relu(out)

        # batch*10*24*24 -> batch*10*12*12
        # 2x2 max pooling with stride 2 halves the height and width
        out = F.max_pool2d(out, 2, 2)

        # batch*10*12*12 -> batch*20*10*10
        # a second convolution with a 3x3 kernel
        out = self.conv2(out)

        # batch*20*10*10
        out = F.relu(out)

        # batch*20*10*10 -> batch*2000
        # -1 lets PyTorch infer the second dimension, here 20*10*10
        out = out.view(in_size, -1)

        # batch*2000 -> batch*500
        out = self.fc1(out)

        # batch*500
        out = F.relu(out)

        # batch*500 -> batch*10
        out = self.fc2(out)

        # compute log(softmax(x)) over the 10 classes
        out = F.log_softmax(out, dim=1)

        return out
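One way to check the shape comments is to push a dummy batch through the same layer sequence and print the shape after every step. The sketch below rewrites the architecture as an `nn.Sequential`, an illustrative equivalent rather than the `ConvNet` class itself, and uses a batch of 96, the size of the last partial training batch when `BATCH_SIZE = 512`:

```python
import torch
import torch.nn as nn

# illustrative nn.Sequential equivalent of ConvNet (same layers, same order)
net = nn.Sequential(
    nn.Conv2d(1, 10, 5), nn.ReLU(), nn.MaxPool2d(2, 2),
    nn.Conv2d(10, 20, 3), nn.ReLU(),
    nn.Flatten(),                        # batch*20*10*10 -> batch*2000
    nn.Linear(20 * 10 * 10, 500), nn.ReLU(),
    nn.Linear(500, 10),
    nn.LogSoftmax(dim=1),
)

x = torch.randn(96, 1, 28, 28)  # dummy batch of 96 grayscale 28x28 images
shapes = []
with torch.no_grad():
    for layer in net:
        x = layer(x)
        shapes.append(tuple(x.shape))
        print(type(layer).__name__, tuple(x.shape))

num_params = sum(p.numel() for p in net.parameters())
print(num_params)  # 1007590
```

Almost all of the 1,007,590 parameters sit in `fc1` (2000 × 500 weights plus 500 biases = 1,000,500), which is typical for small CNNs that flatten directly into a wide linear layer.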

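Instantiate the network, then move it to the GPU with the `.to` method.
For the optimizer we simply pick **Adam**, a no-fuss choice that usually works well with its default settings.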

In [9]:
model = ConvNet().to(DEVICE)
optimizer = optim.Adam(model.parameters())
optimizer

Adam (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.999)
    eps: 1e-08
    lr: 0.001
    weight_decay: 0
)
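The printout shows Adam's defaults (`lr` 0.001, `betas` (0.9, 0.999)). One property behind its reputation as a low-effort choice: thanks to bias correction, the very first update has a magnitude close to `lr` regardless of how large the gradient is. A small sketch on a single scalar parameter (the factor 3.0 is arbitrary):

```python
import torch

w = torch.tensor([1.0], requires_grad=True)
opt = torch.optim.Adam([w], lr=0.001)  # default betas=(0.9, 0.999), eps=1e-8

loss = (3.0 * w).sum()  # gradient d(loss)/dw = 3.0
loss.backward()
opt.step()

# first step: bias-corrected m = g and v = g**2, so update = lr * g / (|g| + eps), about lr
print(w.item())  # about 0.999
```

Replacing 3.0 with 300.0 moves `w` by the same amount on this first step, which is why Adam is forgiving about the scale of the loss.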

Define a training function that wraps all the training steps.

In [10]:
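def train(model, device, train_loader, optimizer, epoch):
    # switch to training mode (matters for layers such as dropout and batch norm)
    model.train()

    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)

        # clear the gradients accumulated in the previous step
        optimizer.zero_grad()

        # forward pass; the model outputs log-probabilities, so use the negative log-likelihood loss
        output = model(data)
        loss = F.nll_loss(output, target)

        # backpropagation: compute the gradients
        loss.backward()

        # update the parameters
        optimizer.step()

        # report progress every 30 batches
        if (batch_idx + 1) % 30 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, (batch_idx + 1) * len(data), len(train_loader.dataset),
                100. * (batch_idx + 1) / len(train_loader), loss.item()))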
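Because `forward` ends with `log_softmax`, the training loop uses `F.nll_loss`. The pair is mathematically the same as applying `F.cross_entropy` to the raw scores (logits), the more common pattern when a model returns logits. A quick numerical check on random data:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(8, 10)          # raw scores for 8 samples, 10 classes
target = torch.randint(0, 10, (8,))  # true labels

log_probs = F.log_softmax(logits, dim=1)
loss_nll = F.nll_loss(log_probs, target)   # what train() computes
loss_ce = F.cross_entropy(logits, target)  # fused equivalent on logits

print(torch.allclose(loss_nll, loss_ce))                          # True
print(torch.allclose(log_probs.exp().sum(dim=1), torch.ones(8)))  # True: each row is a distribution
```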

Evaluation is wrapped in a function as well.

In [11]:
def test(model, device, test_loader):
    # switch to evaluation mode
    model.eval()
    test_loss = 0
    correct = 0
    # no gradients are needed for evaluation, which saves memory and time
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)

            # sum up the loss over the batch
            test_loss += F.nll_loss(output, target, reduction='sum').item()

            # index of the largest log-probability, i.e. the predicted digit
            pred = output.max(1, keepdim=True)[1]
            correct += pred.eq(target.view_as(pred)).sum().item()

    # average loss per test sample
    test_loss /= len(test_loader.dataset)
    print(
        '\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
            test_loss, correct, len(test_loader.dataset),
            100. * correct / len(test_loader.dataset)))
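The accuracy count in `test` relies on `output.max(1, keepdim=True)[1]`, which returns the index of the largest log-probability in each row; since log is monotonic, this is also the most probable class. A tiny hand-made example (three samples and three classes instead of ten) shows the counting, and that `argmax` gives the same predictions:

```python
import torch

# log-probabilities for 3 samples over 3 classes
output = torch.tensor([[0.1, 0.7, 0.2],
                       [0.8, 0.1, 0.1],
                       [0.3, 0.3, 0.4]]).log()
target = torch.tensor([1, 0, 0])

pred = output.max(1, keepdim=True)[1]                 # shape (3, 1): [[1], [0], [2]]
correct = pred.eq(target.view_as(pred)).sum().item()  # samples 0 and 1 are right
print(pred.squeeze(1).tolist(), correct)              # [1, 0, 2] 2

# argmax is an equivalent, more direct spelling
print(torch.equal(output.argmax(dim=1), pred.squeeze(1)))  # True
```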

Start training. This is where wrapping the steps in functions pays off: the loop body is just two lines.

In [12]:
for epoch in range(1, EPOCHS + 1):
    train(model, DEVICE, train_loader, optimizer, epoch)
    test(model, DEVICE, test_loader)


Test set: Average loss: 0.1097, Accuracy: 9659/10000 (97%)


Test set: Average loss: 0.0578, Accuracy: 9816/10000 (98%)


Test set: Average loss: 0.0468, Accuracy: 9845/10000 (98%)


Test set: Average loss: 0.0392, Accuracy: 9861/10000 (99%)


Test set: Average loss: 0.0368, Accuracy: 9881/10000 (99%)


Test set: Average loss: 0.0349, Accuracy: 9894/10000 (99%)


Test set: Average loss: 0.0302, Accuracy: 9905/10000 (99%)


Test set: Average loss: 0.0294, Accuracy: 9900/10000 (99%)


Test set: Average loss: 0.0381, Accuracy: 9878/10000 (99%)


Test set: Average loss: 0.0330, Accuracy: 9898/10000 (99%)


Test set: Average loss: 0.0319, Accuracy: 9896/10000 (99%)


Test set: Average loss: 0.0313, Accuracy: 9901/10000 (99%)


Test set: Average loss: 0.0346, Accuracy: 9901/10000 (99%)


Test set: Average loss: 0.0298, Accuracy: 9911/10000 (99%)


Test set: Average loss: 0.0351, Accuracy: 9898/10000 (99%)


Test set: Average loss: 0.0332, Accuracy: 9911/10000 (99%)


Test set: Average loss:

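The accuracy reaches 99%, as expected. If a model cannot even handle MNIST, it is worthless; yet a model that does handle MNIST may still be of no practical value.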
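Note that `test` prints the percentage with `{:.0f}`, so the log rounds to whole percent: the epoch with 9861/10000 correct already shows as 99%. Printing with two decimals makes the small differences between epochs visible:

```python
correct, total = 9861, 10000  # counts from one epoch in the log above
print('{:.0f}%'.format(100. * correct / total))  # 99%
print('{:.2f}%'.format(100. * correct / total))  # 98.61%
```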

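MNIST is a simple dataset. Because of its limitations it is mainly useful for research and teaching, and its value in real applications is very limited. Still, this example walks through the workflow of a real project: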

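* Find a dataset and preprocess the data
* Define the model
* Set the hyperparameters
* Train and test
* Adjust the hyperparameters or the model based on the training results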

This hands-on example gives us a good template that later projects can follow.