Original paper: https://www.researchgate.net/publication/323419312_Twelve-layer_deep_convolutional_neural_network_with_stochastic_pooling_for_tea_category_classification_on_GPU_platform

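I am not an AI major, but I am very curious about the field. I recently came across a paper by Teacher Tang (唐老师) and made a simple reproduction of it with what I have learned.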

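However, my theoretical background and PyTorch experience are limited, so for now I cannot implement the stochastic pooling used in the paper and have built the model only from torch's built-in modules.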


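### Development environment
My undergraduate work was in big data, which often means running a cluster, and a full set of big-data services easily takes 20 GB+ of RAM, so I went straight to 32 GB. Deep learning experiments turned out to need far less system memory; what matters is the GPU. Mine is a 3060 Laptop (130 W, 6 GB of VRAM, a compute score of about 49), which is basically enough to run the model.
* OS: Windows 11
* CPU: i7-11800H
* GPU: 3060 Laptop (6 GB)
* RAM: 32 GB
* CUDA version: 11.6
* Python version: 3.6.9
* PyTorch version: 1.9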


In [1]:
# -*- coding:utf-8 -*-
import os
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import Dataset
from torchvision import datasets, transforms
from torchsummary import summary
from torch.utils.tensorboard import SummaryWriter
from data_augmentation import data_enhance_rotate, data_enhance_gamma
# Train on the GPU; on a 3060, 30 epochs take under half an hour in total
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")

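### Building the dataset
I don't have the tea dataset from the paper, so I searched Kaggle for something similar and used a flower classification dataset.

Kaggle link: https://www.kaggle.com/alxmamaev/flowers-recognition?select=flowers

It contains 4317 images across five flower classes; I used only 3 classes and selected 900 images in total.

Dataset directory tree:

    datasets
    +---test_data
    |   +---daisy
    |   +---dandelion
    |   \---rose
    \---train_data
        +---daisy
        +---dandelion
        \---rose

The training set has 300 images, 100 per class.

The test set has 600 images, 200 per class.

After data augmentation, the training set has 18300 images.

The augmentation functions live in data_augmentation.py.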

In [2]:
data_dir = './datasets'
BATCH_SIZE = 256  # 256 uses 4-5 GB of VRAM in practice

data_transforms = {
    'train_data': transforms.Compose([
        transforms.Resize((256, 256)),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])  # mean, std
        # transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])  # mean, std
    ]),
    'test_data': transforms.Compose([
        transforms.Resize((256, 256)),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x), data_transforms[x]) for x in
                  ['train_data', 'test_data']}
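# Rotate images from -15 to 15 degrees in steps of 1, skipping 0: 9000 images in total
image_datasets['train_data'] += data_enhance_rotate(data_dir)
# Gamma correction from 0.7 to 1.3 in steps of 0.02, 30 values in total: 9000 images in total
image_datasets['train_data'] += data_enhance_gamma(data_dir)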
train_loader = torch.utils.data.DataLoader(image_datasets.get("train_data"), batch_size=BATCH_SIZE, shuffle=True)
test_loader = torch.utils.data.DataLoader(image_datasets.get("test_data"), batch_size=BATCH_SIZE, shuffle=True)
# train_loader holds 18300 samples in total, test_loader holds 600
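
The dataset sizes quoted above can be checked with a few lines of arithmetic. This is only a sketch of the counting, not of `data_augmentation.py` itself. One detail is an assumption: a gamma range of 0.70 to 1.30 in steps of 0.02 has 31 values, so the identity value 1.00 is assumed to be the one skipped to arrive at 30.

```python
# Sizes taken from the text above; the skipped gamma value is an assumption.
n_train = 3 * 100  # 3 classes x 100 training images

# Rotation: -15..15 degrees in steps of 1, skipping 0 -> 30 angles
angles = [a for a in range(-15, 16) if a != 0]

# Gamma: 0.70..1.30 in steps of 0.02 is 31 values; dropping the identity
# value 1.00 (assumed) leaves the 30 variants mentioned in the comment.
gammas = [g / 100 for g in range(70, 131, 2) if g != 100]

n_rotated = n_train * len(angles)
n_gamma = n_train * len(gammas)
n_total = n_train + n_rotated + n_gamma
print(len(angles), len(gammas), n_rotated, n_gamma, n_total)
# 30 30 9000 9000 18300
```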

### Inspecting the data

In [11]:
writer = SummaryWriter("./logs")
# Look at the first batch: 256 images, shape (256, 3, 256, 256)
for imgs, labels in train_loader:
    writer.add_images("imgs", imgs)
    break
    
writer.close()

<img src="./images/images_show.png" alt="images_show" style="zoom: 80%;" />

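### Building the model
The model follows the paper, except that stochastic pooling is not implemented; torch's built-in max pooling is used in its place.

The model has 1,627,563 parameters in total.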
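
For reference, here is a minimal sketch of what a stochastic pooling layer could look like in PyTorch. It is not the paper's implementation and was not used for the results in this notebook. It follows the general description by Zeiler and Fergus: during training one activation per window is sampled with probability proportional to its value, and at evaluation time the window is replaced by its probability-weighted average. The class name `StochasticPool2d` is hypothetical (torch has no such module), the input is assumed to be non-negative (for example after ReLU), and `F.unfold` makes it memory hungry, so a batch size of 256 may not fit in 6 GB.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class StochasticPool2d(nn.Module):
    """Hypothetical stochastic pooling layer; expects non-negative input."""

    def __init__(self, kernel_size=3, stride=1, padding=1):
        super().__init__()
        self.kernel_size = kernel_size
        self.stride = stride
        self.padding = padding

    def forward(self, x):
        n, c, h, w = x.shape
        k, s, p = self.kernel_size, self.stride, self.padding
        out_h = (h + 2 * p - k) // s + 1
        out_w = (w + 2 * p - k) // s + 1
        # Extract every k*k window: (N, C*k*k, L) -> (N, C, L, k*k)
        windows = F.unfold(x, kernel_size=k, stride=s, padding=p)
        windows = windows.view(n, c, k * k, out_h * out_w).transpose(2, 3)
        # Probability of each activation = activation / sum of its window;
        # all-zero windows fall back to a uniform distribution.
        total = windows.sum(dim=-1, keepdim=True)
        uniform = torch.full_like(windows, 1.0 / (k * k))
        probs = torch.where(total > 0, windows / total.clamp_min(1e-12), uniform)
        if self.training:
            # Training: sample one position per window from the multinomial
            idx = torch.multinomial(probs.reshape(-1, k * k), 1)
            idx = idx.view(n, c, out_h * out_w, 1)
            out = windows.gather(-1, idx).squeeze(-1)
        else:
            # Evaluation: probability-weighted average of each window
            out = (probs * windows).sum(dim=-1)
        return out.reshape(n, c, out_h, out_w)
```

Under these assumptions, a layer such as `self.pool1 = StochasticPool2d(3, 1, 1)` could stand in for `F.max_pool2d(out, 3, 1, 1)` after each ReLU, which is where the commented-out `self.pool1(out)` calls sit in the model below.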

In [3]:

"""
    The network from the paper, but without stochastic pooling
"""
class CNN_SP(nn.Module):
    def __init__(self):
        super().__init__()
        # Conv2d arguments: input channels, output channels, kernel size
        self.conv1 = nn.Conv2d(3, 40, 3, stride=3, padding=1)
        self.conv2 = nn.Conv2d(40, 80, 5, stride=3, padding=0)
        self.conv3 = nn.Conv2d(80, 120, 3, stride=3, padding=1)
        self.conv4 = nn.Conv2d(120, 120, 3, stride=1, padding=1)
        self.conv5 = nn.Conv2d(120, 120, 3, stride=1, padding=1)
        # The paper uses a dropout rate of 0.1 to reduce overfitting
        self.dropout_layer = torch.nn.Dropout(0.1)
        # Linear arguments: input features, output features
        self.fc1 = nn.Linear(120 * 10 * 10, 100)
        self.fc2 = nn.Linear(100, 3)

    def forward(self, x):
        in_size = x.size(0)
        out = self.conv1(x)
        out = F.relu(out)
        out = F.max_pool2d(out, 3, 1, 1)
        # out = self.pool1(out)
        out = self.conv2(out)
        out = F.relu(out)
        out = F.max_pool2d(out, 3, 1, 1)
        # out = self.pool2(out)
        out = self.conv3(out)
        out = F.relu(out)
        out = F.max_pool2d(out, 3, 1, 1)
        # out = self.pool3(out)
        out = self.conv4(out)
        out = F.relu(out)
        out = F.max_pool2d(out, 3, 1, 1)
        # out = self.pool4(out)
        out = self.conv5(out)
        out = F.relu(out)
        out = F.max_pool2d(out, 3, 1, 1)
        # out = self.pool5(out)
        out = out.view(in_size, -1)
        out = self.fc1(out)
        out = F.relu(out)
        out = self.dropout_layer(out)
        out = self.fc2(out)
        out = F.log_softmax(out, dim=1)  # compute log(softmax(x))
        return out


# model = StochasticPooling().to(DEVICE)
model = CNN_SP().to(DEVICE)
# summary(model, (40, 86, 86))
summary(model, (3, 256, 256))



Layer (type:depth-idx)                   Output Shape              Param #
├─Conv2d: 1-1                            [-1, 40, 86, 86]          1,120
├─Conv2d: 1-2                            [-1, 80, 28, 28]          80,080
├─Conv2d: 1-3                            [-1, 120, 10, 10]         86,520
├─Conv2d: 1-4                            [-1, 120, 10, 10]         129,720
├─Conv2d: 1-5                            [-1, 120, 10, 10]         129,720
├─Linear: 1-6                            [-1, 100]                 1,200,100
├─Dropout: 1-7                           [-1, 100]                 --
├─Linear: 1-8                            [-1, 3]                   303
Total params: 1,627,563
Trainable params: 1,627,563
Non-trainable params: 0
Total mult-adds (M): 106.47
Input size (MB): 0.75
Forward/backward pass size (MB): 3.01
Params size (MB): 6.21
Estimated Total Size (MB): 9.97
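
The output shapes and the parameter total in the summary can be reproduced by hand. The sketch below applies the usual output-size formula for convolution and pooling (floor mode, no dilation) to the layer settings of `CNN_SP`; the helper names `conv_out` and `conv_params` are made up for this illustration.

```python
def conv_out(size, kernel, stride, padding):
    """Output width/height of a Conv2d or pooling layer (floor mode, no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_params(c_in, c_out, kernel):
    """Weights plus biases of a Conv2d with a square kernel."""
    return c_in * c_out * kernel * kernel + c_out


# (in_channels, out_channels, kernel, stride, padding) for conv1..conv5
convs = [(3, 40, 3, 3, 1), (40, 80, 5, 3, 0), (80, 120, 3, 3, 1),
         (120, 120, 3, 1, 1), (120, 120, 3, 1, 1)]

size, sizes, total = 256, [], 0
for c_in, c_out, k, s, p in convs:
    size = conv_out(size, k, s, p)   # convolution
    size = conv_out(size, 3, 1, 1)   # max_pool2d(out, 3, 1, 1) keeps the size
    sizes.append(size)
    total += conv_params(c_in, c_out, k)

flat = 120 * size * size             # input features of fc1
total += flat * 100 + 100            # fc1
total += 100 * 3 + 3                 # fc2
print(sizes, flat, total)
# [86, 28, 10, 10, 10] 12000 1627563
```

Almost three quarters of the parameters (1,200,100 of 1,627,563) sit in `fc1`.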



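### Adaptive learning rate, optimizer, and TensorBoard setup
* Learning rate: starts at 0.01 and is divided by 10 every 10 epochs. In my tests, decaying every 3 epochs gave about 5% higher accuracy than a fixed lr, but for reasons I don't understand, decaying every 10 epochs does not push the accuracy up.
* Optimizer: comparing SGDM and Adam, Adam fits faster, but the final results are about the same.
* TensorBoard: data is written to the logs folder.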

In [4]:
# TensorBoard: record loss and accuracy
writer = SummaryWriter("./logs")
start_lr = 0.01
optimizer = optim.SGD(model.parameters(), lr=start_lr, momentum=0.9)
# optimizer = optim.Adam(model.parameters(), lr=start_lr)

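'''
    Adaptive learning rate, reproducing the paper's schedule of dividing lr by 10 every 10 epochs.
    Epochs are 1-based in the training loop, so epoch // 10 lowers lr at epochs 10, 20 and 30.
'''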
def adjust_learning_rate(optimizer, epoch, start_lr):
    lr = start_lr * (0.1 ** (epoch // 10))
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr
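
To see which epochs actually get which learning rate, the rule in `adjust_learning_rate` can be evaluated on its own. The helper `lr_at` below is a stand-alone copy of that formula written for this illustration; it does not touch the optimizer.

```python
def lr_at(epoch, start_lr=0.01):
    """Same rule as adjust_learning_rate: divide by 10 every 10 epochs."""
    return start_lr * (0.1 ** (epoch // 10))


# Epochs are 1-based in the training loop: range(1, EPOCHS + 1)
drops = [e for e in range(2, 31) if lr_at(e) < lr_at(e - 1)]
print(drops)
# [10, 20, 30]

for e in (1, 9, 10, 19, 20, 29, 30):
    print(e, "{:.0E}".format(lr_at(e)))
```

Because the loop starts at epoch 1, the first stage lasts only 9 epochs and epoch 30 already runs at the fourth learning rate. If exactly ten epochs per stage were intended, `(epoch - 1) // 10` would give that; the results in this notebook were produced with `epoch // 10`.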

### Training and test functions

In [5]:
def train(model, device, train_loader, optimizer, epoch):
    model.train()
    adjust_learning_rate(optimizer, epoch, start_lr)
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
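        # The model already returns log-probabilities; cross_entropy applies
        # log_softmax again, which leaves them unchanged, so this equals nll_loss.
        loss = F.cross_entropy(output, target)
        # loss = F.nll_loss(output, target)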
        loss.backward()
        optimizer.step()
        if (batch_idx + 1) % 10 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f} \tLr:{:.2E}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                       100. * batch_idx / len(train_loader), loss.item(),
                optimizer.state_dict()['param_groups'][0]['lr']))
            writer.add_scalar('train_loss', loss.item(), (epoch - 1) * len(train_loader) + batch_idx)


def test(model, device, test_loader, epoch):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += F.cross_entropy(output, target, reduction='sum').item()  # sum the loss over the batch
            pred = output.max(1, keepdim=True)[1]  # index of the highest log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))
    writer.add_scalar('test_acc', 100. * correct / len(test_loader.dataset), epoch)
    writer.add_scalar('test_loss', test_loss, epoch)
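
One detail in the functions above: the model ends with `log_softmax`, and `F.cross_entropy` applies `log_softmax` once more before taking the negative log-likelihood. The small pure-Python sketch below (plain lists in place of tensors, helper names made up for this illustration) shows why that is harmless: the log-sum-exp of a vector of log-probabilities is 0, so the second `log_softmax` changes nothing and the result matches the commented-out `F.nll_loss` variant.

```python
import math


def log_softmax(logits):
    """Numerically stable log(softmax(x)) for a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - lse for v in logits]


def nll(log_probs, target):
    """Negative log-likelihood of the target class."""
    return -log_probs[target]


def cross_entropy(logits, target):
    """log_softmax followed by nll, which is what cross-entropy computes."""
    return nll(log_softmax(logits), target)


logits = [2.0, -1.0, 0.5]                 # arbitrary example scores
log_probs = log_softmax(logits)           # what CNN_SP.forward returns
loss_a = cross_entropy(log_probs, 0)      # what the notebook computes
loss_b = nll(log_probs, 0)                # the commented-out alternative
print(abs(loss_a - loss_b) < 1e-12)
# True
```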

### Start training

In [6]:
EPOCHS = 30
for epoch in range(1, EPOCHS + 1):
    train(model, DEVICE, train_loader, optimizer, epoch)
    test(model, DEVICE, test_loader, epoch)
writer.close()


Test set: Average loss: 0.8285, Accuracy: 392/600 (65%)
Test set: Average loss: 0.7292, Accuracy: 444/600 (74%)
Test set: Average loss: 0.8427, Accuracy: 429/600 (72%)
Test set: Average loss: 1.2516, Accuracy: 448/600 (75%)
Test set: Average loss: 1.1084, Accuracy: 393/600 (66%)
Test set: Average loss: 1.4732, Accuracy: 461/600 (77%)
Test set: Average loss: 1.4280, Accuracy: 439/600 (73%)
Test set: Average loss: 2.0407, Accuracy: 447/600 (74%)
Test set: Average loss: 1.9628, Accuracy: 461/600 (77%)
Test set: Average loss: 2.0243, Accuracy: 458/600 (76%)
Test set: Average loss: 2.0913, Accuracy: 457/600 (76%)
Test set: Average loss: 2.1379, Accuracy: 456/600 (76%)
Test set: Average loss: 2.1743, Accuracy: 456/600 (76%)
Test set: Average loss: 2.2068, Accuracy: 454/600 (76%)
Test set: Average loss: 2.2453, Accuracy: 452/600 (75%)
Test set: Average loss: 2.2781, Accuracy: 453/600 (76%)
Test set: Average loss: 2.2891, Accuracy: 450/600 (75%)
Test set: Average loss: 2.3235, Accuracy: 451/600 (75%)
Test set: Average loss: 2.3502, Accuracy: 452/600 (75%)
Test set: Average loss: 2.3517, Accuracy: 452/600 (75%)
Test set: Average loss: 2.3540, Accuracy: 450/600 (75%)
Test set: Average loss: 2.3559, Accuracy: 451/600 (75%)
Test set: Average loss: 2.3566, Accuracy: 450/600 (75%)
Test set: Average loss: 2.3591, Accuracy: 450/600 (75%)
Test set: Average loss: 2.3600, Accuracy: 449/600 (75%)
Test set: Average loss: 2.3633, Accuracy: 449/600 (75%)
Test set: Average loss: 2.3654, Accuracy: 451/600 (75%)
Test set: Average loss: 2.3669, Accuracy: 451/600 (75%)
Test set: Average loss: 2.3699, Accuracy: 449/600 (75%)
Test set: Average loss: 2.3700, Accuracy: 449/600 (75%)
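
The test log above can be summarized with a short script. The numbers are copied from the 30 output lines, assuming they correspond to epochs 1 to 30 in order. It locates the epoch with the lowest test loss and the first epoch with the highest accuracy, which is roughly what an early-stopping rule would look for.

```python
# Values copied from the test log; assumed to be epochs 1..30 in order.
test_loss = [0.8285, 0.7292, 0.8427, 1.2516, 1.1084, 1.4732, 1.4280, 2.0407,
             1.9628, 2.0243, 2.0913, 2.1379, 2.1743, 2.2068, 2.2453, 2.2781,
             2.2891, 2.3235, 2.3502, 2.3517, 2.3540, 2.3559, 2.3566, 2.3591,
             2.3600, 2.3633, 2.3654, 2.3669, 2.3699, 2.3700]
correct = [392, 444, 429, 448, 393, 461, 439, 447, 461, 458,
           457, 456, 456, 454, 452, 453, 450, 451, 452, 452,
           450, 451, 450, 450, 449, 449, 451, 451, 449, 449]

n_epochs = len(test_loss)
# Epoch with the lowest test loss (1-based)
best_loss_epoch = min(range(n_epochs), key=lambda i: test_loss[i]) + 1
# First epoch that reaches the highest number of correct predictions
best_acc_epoch = max(range(n_epochs), key=lambda i: correct[i]) + 1

print(best_loss_epoch, test_loss[best_loss_epoch - 1])
# 2 0.7292
print(best_acc_epoch, correct[best_acc_epoch - 1])
# 6 461
```

The test loss is lowest at epoch 2 and roughly triples by epoch 30, while accuracy peaks at 461/600 in epochs 6 and 9 and then settles around 450/600.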



### Visualizing training
From top to bottom and left to right, the three plots show the training loss, the test accuracy, and the test loss.
<img src="./images/res_show.png" />

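### Summary
#### 1. Test accuracy essentially starts to decline from epoch 10 (76%), at which point lr is 0.001
#### 2. The training loss keeps falling while the test loss keeps rising; my guess is that the dataset is too small or was not processed properly, i.e. the model has overfit
#### 3. The best accuracy I got was 81%, using a learning rate divided by 10 every 3 epochs, but that schedule needs a lower bound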
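
A step schedule with a lower bound, as point 3 calls for, could be sketched as follows. The function name, the 1-based `(epoch - 1) // step` indexing and the floor of `1e-5` are all assumptions made for this illustration; the notebook does not record what was used for the 81% run.

```python
def step_lr_with_floor(epoch, start_lr=0.01, step=3, factor=0.1, min_lr=1e-5):
    """Multiply lr by `factor` after every `step` epochs (epoch is 1-based),
    but never let it fall below `min_lr` (a hypothetical floor)."""
    lr = start_lr * factor ** ((epoch - 1) // step)
    return max(lr, min_lr)


for epoch in (1, 4, 7, 10, 30):
    print(epoch, "{:.1E}".format(step_lr_with_floor(epoch)))
```

Without the floor, dividing by 10 every 3 epochs would take lr from 0.01 down to about 1e-11 by epoch 30, at which point the weights effectively stop changing.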