## Deep Convolutional Neural Networks (AlexNet)

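* LeNet-5 (1998)  
The idea of general-purpose GPU computing began to take hold in 2001, and programming frameworks such as OpenCL and CUDA followed. As a result, the machine learning community started using GPUs ``around 2010``.  
* AlexNet (2012)  
Showed for the first time that learned features can outperform hand-designed features.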



With LeNet we saw that a neural network can classify images directly from their raw pixels. This ``end-to-end`` approach removes many intermediate steps.  

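In earlier research, hand-crafted features that researchers designed and produced through diligent effort were more popular. The main workflow of this kind of image classification research was:  
1. Obtain an image dataset;  
2. Generate image features with existing feature extraction functions;  
3. Classify the image features with a machine learning model.  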
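The three steps above can be sketched in a few lines of plain Python. This is a toy illustration only: the 4x4 synthetic images, the `edge_features` extractor, and the nearest-centroid classifier are hypothetical stand-ins chosen for brevity, not the features or models used in the historical work.

```python
# Toy version of the classic pipeline: dataset -> fixed features -> classifier.
# All names and data here are illustrative assumptions.

def make_image(kind):
    """Step 1: build a tiny 4x4 grayscale 'image' as a list of rows."""
    if kind == "vertical":
        # bright left half, dark right half -> one vertical edge
        return [[1.0, 1.0, 0.0, 0.0] for _ in range(4)]
    # bright top half, dark bottom half -> one horizontal edge
    return [[1.0] * 4 if r < 2 else [0.0] * 4 for r in range(4)]


def edge_features(img):
    """Step 2: hand-designed features (nothing here is learned)."""
    # total intensity change between horizontally adjacent pixels
    dx = sum(abs(row[c + 1] - row[c]) for row in img for c in range(len(row) - 1))
    # total intensity change between vertically adjacent pixels
    dy = sum(abs(img[r + 1][c] - img[r][c])
             for r in range(len(img) - 1) for c in range(len(img[0])))
    return (dx, dy)


def fit_centroids(features, labels):
    """Step 3a: 'train' a nearest-centroid classifier on the features."""
    sums, counts = {}, {}
    for (dx, dy), label in zip(features, labels):
        sx, sy = sums.get(label, (0.0, 0.0))
        sums[label] = (sx + dx, sy + dy)
        counts[label] = counts.get(label, 0) + 1
    return {k: (sx / counts[k], sy / counts[k]) for k, (sx, sy) in sums.items()}


def predict(centroids, feature):
    """Step 3b: assign the label of the closest centroid."""
    def sq_dist(c):
        return (c[0] - feature[0]) ** 2 + (c[1] - feature[1]) ** 2
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


labels = ["vertical", "horizontal"] * 2
images = [make_image(k) for k in labels]
centroids = fit_centroids([edge_features(img) for img in images], labels)
print(predict(centroids, edge_features(make_image("vertical"))))
```

The point of the sketch is that only step 3 involves any fitting; the feature extractor in step 2 is fixed by hand, which is exactly what an end-to-end network replaces.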


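Machine learning researchers: focused on using elegant theorems to prove properties of many ``classifiers``; to them the field of machine learning was thriving, rigorous, and extremely useful.  
Computer vision researchers: focused on ``data and features``. In their view, using clean and more effective features can affect image classification results even more than the choice of machine learning model.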

### AlexNet
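* Consists of ``8 layers of transformations``: 5 convolutional layers, 2 fully connected hidden layers, and 1 fully connected output layer.  
* Replaces the sigmoid activation function with the simpler ``ReLU function``.
* Uses ``dropout`` to control the model complexity of the fully connected layers.
* Introduces extensive ``image augmentation``, such as flipping, cropping, and color changes, to further ``enlarge the dataset and mitigate overfitting``.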
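One way to see why ReLU counts as simpler than sigmoid: it needs no exponential, and its gradient stays at 1 for any positive input, while the sigmoid gradient never exceeds 0.25 and shrinks toward 0 as the input moves away from 0. The sketch below compares the two derivatives at a few points using only the standard library; the helper names are illustrative.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # peaks at 0.25 when x == 0, vanishes as |x| grows

def relu(x):
    return max(0.0, x)

def relu_grad(x):
    return 1.0 if x > 0 else 0.0  # constant 1 on the positive side

for x in (-5.0, 0.5, 5.0):
    print(f"x={x:+.1f}  sigmoid'={sigmoid_grad(x):.4f}  relu'={relu_grad(x):.1f}")
```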

In [1]:
import time
import torch
from torch import nn, optim
import torchvision

import sys
sys.path.append("..")
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

class AlexNet(nn.Module):
    def __init__(self):
        super(AlexNet, self).__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=96, kernel_size=11, stride=4, padding=0),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),
            # Use a smaller convolution window and more output channels
            nn.Conv2d(96, 256, 5, 1, 2),
            nn.ReLU(),
            nn.MaxPool2d(3, 2),
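            # Three consecutive convolutional layers follow; the first two are not followed by a pooling layer, so the height and width are not reduced there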
            nn.Conv2d(256, 384, 3, 1, 1),
            nn.ReLU(),
            nn.Conv2d(384, 384, 3, 1, 1),
            nn.ReLU(),
            nn.Conv2d(384, 256, 3, 1, 1),
            nn.ReLU(),
            nn.MaxPool2d(3, 2)
        )

        self.fc = nn.Sequential(
            nn.Linear(256*5*5, 4096),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(),
            nn.Dropout(0.5),
            # Output layer
            nn.Linear(4096, 10) # raw logits, no activation function applied
        )
        
    def forward(self, img):
        feature = self.conv(img)
        output = self.fc(feature.view(img.shape[0], -1))
        return output
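The `256*5*5` input size of the first fully connected layer follows from the layer hyperparameters above when the input is 224x224. As a check, the sketch below applies the standard output-size formula floor((n + 2p - k) / s) + 1 to each convolution and pooling layer in `self.conv`. `out_size` is a helper introduced here for illustration, and the 224x224 input is taken from the data-loading step of this notebook.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Output height/width of a conv or pooling layer (floor mode)."""
    return (n + 2 * padding - kernel) // stride + 1

# (name, kernel_size, stride, padding) in the order used by self.conv
layers = [
    ("conv1", 11, 4, 0), ("pool1", 3, 2, 0),
    ("conv2", 5, 1, 2),  ("pool2", 3, 2, 0),
    ("conv3", 3, 1, 1),  ("conv4", 3, 1, 1),
    ("conv5", 3, 1, 1),  ("pool3", 3, 2, 0),
]

n = 224  # input height and width
sizes = []
for name, k, s, p in layers:
    n = out_size(n, k, s, p)
    sizes.append(n)
    print(f"{name}: {n}x{n}")

print("flattened features:", 256 * n * n)
```

The final spatial size is 5x5 with 256 channels, which matches `in_features=6400` in the printed model.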


In [2]:
net = AlexNet()
print(net)

AlexNet(
  (conv): Sequential(
    (0): Conv2d(1, 96, kernel_size=(11, 11), stride=(4, 4))
    (1): ReLU()
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(96, 256, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (4): ReLU()
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(256, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU()
    (8): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU()
    (10): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU()
    (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (fc): Sequential(
    (0): Linear(in_features=6400, out_features=4096, bias=True)
    (1): ReLU()
    (2): Dropout(p=0.5, inplace=False)
    (3): Linear(in_features=4096, out_features=4096, bias=True)
    (4): ReLU()
    (5): Dropout(p=0.5, inplace=False)
    (

### Loading the data

``Compose`` a ``transform`` that is applied to every input image and scales its height and width up to 224, the input size AlexNet uses.

In [3]:
resize = 224
trans = []
trans.append(torchvision.transforms.Resize(size=resize))
trans.append(torchvision.transforms.ToTensor())
transform = torchvision.transforms.Compose(trans) # chain the two transforms together

In [4]:
mnist_train = torchvision.datasets.FashionMNIST(root='~/Datasets/FashionMNIST', train=True, download=True, transform=transform)
mnist_test = torchvision.datasets.FashionMNIST(root='~/Datasets/FashionMNIST', train=False, download=True, transform=transform)

batch_size = 128
train_iter = torch.utils.data.DataLoader(mnist_train, batch_size=batch_size, shuffle=True)
test_iter = torch.utils.data.DataLoader(mnist_test, batch_size=batch_size, shuffle=False)
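For orientation, the number of batches each loader yields per epoch can be worked out by hand. The sketch below assumes the standard Fashion-MNIST split of 60,000 training and 10,000 test images and the `DataLoader` default `drop_last=False`, which keeps the final partial batch; `batch_stats` is a helper introduced here.

```python
import math

def batch_stats(num_samples, batch_size):
    """Number of batches per epoch and size of the final batch,
    assuming the last partial batch is kept (drop_last=False)."""
    num_batches = math.ceil(num_samples / batch_size)
    last = num_samples - (num_batches - 1) * batch_size
    return num_batches, last

print(batch_stats(60000, 128))  # training split
print(batch_stats(10000, 128))  # test split
```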

### Training

In [34]:
def train_ch5(net, train_iter, test_iter, batch_size, optimizer, device, num_epochs):
    net = net.to(device)
    print('train on', device)
    loss = torch.nn.CrossEntropyLoss()
    for epoch in range(num_epochs):
        train_l_sum, train_acc_sum, n_train, batch_count, start = 0.0, 0.0, 0, 0, time.time()
        for X, y in train_iter:
            X = X.to(device)
            y = y.to(device)
            # print("y.shape", y.shape) # [128]
            y_hat = net(X)
            l = loss(y_hat, y)
            optimizer.zero_grad()
            l.backward()
            optimizer.step()
            # train_l_sum += l.cpu().item() # would copy the loss to the CPU first
            train_l_sum += l.item()
            # train_acc_sum += (y_hat.argmax(dim=1) == y).sum().cpu().item()
            train_acc_sum += (y_hat.argmax(dim=1) == y).sum().item()
            n_train += y.shape[0]
            batch_count += 1
        # test_acc = evaluate_accuracy(test_iter, net)
        net.eval() # evaluation mode: disables dropout
        with torch.no_grad():
            test_acc_sum, n_test = 0.0, 0 # plain Python numbers, held in CPU memory
            for X_t, y_t in test_iter:
                test_acc_sum += (net(X_t.to(device)).argmax(dim=1) == y_t.to(device)).sum().item()  # .item() on a one-element Tensor returns a Python scalar
                n_test += y_t.shape[0]
            test_acc = test_acc_sum / n_test
        net.train() # back to training mode

        print('epoch %d, loss %.4f, train acc %.3f, test acc %.3f, time %.1f sec'
        % (epoch + 1, train_l_sum / batch_count, train_acc_sum / n_train, test_acc, time.time() - start))
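The commented-out call to `evaluate_accuracy` suggests the evaluation loop could live in its own function. A minimal sketch of such a helper follows. The name comes from that comment, but the signature with a `device` argument and the body are assumptions, not the author's implementation; the toy model and random data exist only to make the example runnable.

```python
import torch
from torch import nn

def evaluate_accuracy(data_iter, net, device):
    """Fraction of correctly classified samples over data_iter."""
    was_training = net.training
    net.eval()  # disable dropout while evaluating
    correct, n = 0, 0
    with torch.no_grad():
        for X, y in data_iter:
            X, y = X.to(device), y.to(device)
            correct += (net(X).argmax(dim=1) == y).sum().item()
            n += y.shape[0]
    net.train(was_training)  # restore the previous mode
    return correct / n

# Tiny usage example with a stand-in model and random data
toy_net = nn.Sequential(nn.Flatten(), nn.Linear(4, 3))
toy_data = [(torch.randn(8, 1, 2, 2), torch.randint(0, 3, (8,))) for _ in range(2)]
acc = evaluate_accuracy(toy_data, toy_net, torch.device("cpu"))
print(f"accuracy: {acc:.3f}")
```

Restoring the previous mode rather than always calling `net.train()` keeps the helper safe to call on a model that was already in evaluation mode.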

In [35]:
lr, num_epochs = 0.001, 5
optimizer = torch.optim.Adam(net.parameters(), lr=lr)
train_ch5(net, train_iter, test_iter, batch_size, optimizer, device, num_epochs)

train on cuda
epoch 1, loss 0.1484, train acc 0.948, test acc 0.938, time 60.5 sec
epoch 2, loss 0.1371, train acc 0.950, test acc 0.875, time 60.3 sec
epoch 3, loss 0.1377, train acc 0.951, test acc 1.000, time 60.4 sec
epoch 4, loss 0.1291, train acc 0.953, test acc 0.938, time 60.2 sec
epoch 5, loss 0.1216, train acc 0.956, test acc 0.875, time 60.1 sec
