## 1. 2D Convolutional Layers

### 1.1 2D Cross-Correlation

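Convolutional layers are named after the convolution operation, but in practice they usually use the more intuitive ``cross-correlation`` operation.  
Some terms:  
* 2D input array: the array X that the kernel slides over  
* 2D kernel (also called a convolution kernel or filter): the small array K  
* kernel window (also called the convolution window): the region of the input the kernel currently covers  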
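As a worked form of the operation implemented by `corr2d` below: for an input X of shape n_h × n_w and a kernel K of shape k_h × k_w, each output element is the sum of elementwise products between the kernel and the input window at that position. The second line checks the first output element for the X and K used in the example cell.

```latex
Y[i, j] = \sum_{a=0}^{k_h-1} \sum_{b=0}^{k_w-1} X[i+a,\, j+b]\, K[a, b],
\qquad Y \in \mathbb{R}^{(n_h - k_h + 1) \times (n_w - k_w + 1)}

Y[0, 0] = 0 \cdot 0 + 1 \cdot 1 + 3 \cdot 2 + 4 \cdot 3 = 19
```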


In [1]:
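# Input array X, kernel array K, output array Y.
import torch
from torch import nn
def corr2d(X, K):
    h, w = K.shape
    # Output shape is (n_h - k_h + 1, n_w - k_w + 1); allocate it on the same device as X
    Y = torch.zeros((X.shape[0]-h+1, X.shape[1]-w+1), device=X.device)

    # Slide the kernel window over X; each output element is the sum of the
    # elementwise product of the current window and the kernel
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i, j] = (X[i:i+h, j:j+w]*K).sum()
    return Y

In [2]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
X = torch.tensor([[0, 1, 2], [3, 4, 5], [6, 7, 8]], device=device)
K = torch.tensor([[0, 1], [2, 3]]).to(device)
Y = corr2d(X, K)  # Y is already on `device`; calling .cuda() here would fail on a CPU-only machine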
print(Y)
Y.device

tensor([[19., 25.],
        [37., 43.]], device='cuda:0')


device(type='cuda', index=0)

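### 1.2 2D Convolutional Layer

A 2D convolutional layer cross-correlates the input with the kernel and adds a ``scalar bias`` to produce the output.  
Its model parameters are the kernel and the scalar bias.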

In [3]:
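class Conv2D(nn.Module):
    def __init__(self, kernel_size):
        super(Conv2D, self).__init__()
        # Model parameters: a randomly initialized kernel and a scalar bias
        self.weight = nn.Parameter(torch.randn(kernel_size))
        self.bias = nn.Parameter(torch.randn(1))
    def forward(self, x):
        # Cross-correlate the input with the kernel, then add the bias
        return corr2d(x, self.weight) + self.bias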

In [4]:
X = torch.ones(6, 8)
X[:, 2:6] = 0
# print(X)
K = torch.tensor([[1, -1]])
Y = corr2d(X, K)
# Instantiate a 2D convolutional layer whose kernel array has shape (1, 2)
conv2d = Conv2D(kernel_size=(1, 2))

step = 20
lr = 0.01
for i in range(step):
    Y_hat = conv2d(X)
    l = ((Y_hat - Y) ** 2).sum()
    l.backward()

    # Gradient descent
    conv2d.weight.data -= lr * conv2d.weight.grad
    conv2d.bias.data -= lr * conv2d.bias.grad

    # Reset the gradients to zero
    conv2d.weight.grad.fill_(0)
    conv2d.bias.grad.fill_(0)

    if (i + 1) % 5 == 0:
        print('Step %d, loss %.3f' % (i+1, l.item()))


Step 5, loss 1.865
Step 10, loss 0.385
Step 15, loss 0.092
Step 20, loss 0.024


Inspect the kernel parameters learned after 20 iterations.

In [5]:
print("weight:", conv2d.weight.data)
print(conv2d.weight)
print("bias:", conv2d.bias.data)

weight: tensor([[ 0.9663, -0.9568]])
Parameter containing:
tensor([[ 0.9663, -0.9568]], requires_grad=True)
bias: tensor([-0.0053])
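The training target Y was produced with K = [[1, -1]], so the learned weight should approach [1, -1] and the bias should approach 0, which matches the output above. The sketch below (re-defining `corr2d` so it runs on its own) prints that target: with this kernel, Y[i, j] = X[i, j] - X[i, j+1], which is nonzero only where two adjacent columns differ, i.e. at the two vertical boundaries between the 1-region and the 0-region of X.

```python
import torch

def corr2d(X, K):
    # 2D cross-correlation, same as corr2d above
    h, w = K.shape
    Y = torch.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i, j] = (X[i:i + h, j:j + w] * K).sum()
    return Y

X = torch.ones(6, 8)
X[:, 2:6] = 0                    # every row is [1, 1, 0, 0, 0, 0, 1, 1]
K = torch.tensor([[1.0, -1.0]])  # Y[i, j] = X[i, j] - X[i, j + 1]
Y = corr2d(X, K)

print(Y.shape)      # torch.Size([6, 7])
print(Y[0].tolist())  # [0.0, 1.0, 0.0, 0.0, 0.0, -1.0, 0.0]
```

Every row of Y is identical, since every row of X is identical.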


### 1.3 Cross-Correlation vs. Convolution


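In fact, convolution is closely related to cross-correlation.  
``To obtain the output of a convolution, flip the kernel array left-right and up-down, then cross-correlate it with the input array.``  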
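The flip-then-cross-correlate rule can be checked directly. In this sketch `conv2d_true` is a hypothetical helper name; it computes the "valid" region of a true convolution by flipping K along both axes with `torch.flip` and reusing `corr2d`. The last lines also compare against `torch.nn.functional.conv2d`, which (like `nn.Conv2d`) computes cross-correlation, not flipped convolution.

```python
import torch
import torch.nn.functional as F

def corr2d(X, K):
    # 2D cross-correlation, same as corr2d above
    h, w = K.shape
    Y = torch.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i, j] = (X[i:i + h, j:j + w] * K).sum()
    return Y

def conv2d_true(X, K):
    # True 2D convolution (valid region): flip K up-down and left-right, then cross-correlate
    return corr2d(X, torch.flip(K, dims=[0, 1]))

X = torch.arange(9.0).reshape(3, 3)
K = torch.tensor([[0.0, 1.0], [2.0, 3.0]])

print(corr2d(X, K))       # cross-correlation, values [[19, 25], [37, 43]]
print(conv2d_true(X, K))  # convolution, values [[5, 11], [23, 29]]

# F.conv2d expects (batch, channels, height, width) and computes cross-correlation
Y_torch = F.conv2d(X.view(1, 1, 3, 3), K.view(1, 1, 2, 2)).view(2, 2)
print(torch.allclose(Y_torch, corr2d(X, K)))  # True
```

The two results differ for the same K, but a layer trained with either operation simply learns a correspondingly flipped kernel.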

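Why can convolutional layers use cross-correlation instead of convolution?  
Because the kernel arrays are learned: as long as the same operation is used for training and prediction, choosing cross-correlation or convolution does not affect the model's output at prediction time. The only difference is the form of the learned parameters: a layer using convolution learns the flipped version of the kernel that a layer using cross-correlation would learn.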

## 2. Padding and Stride

In [6]:
import torch
from torch import nn

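def comp_conv2d(conv2d, X):
    print(X.shape)
    # nn.Conv2d expects (batch, channels, height, width): prepend batch size 1 and 1 channel
    X = X.view((1, 1) +  X.shape)
    print(X.shape)
    Y = conv2d(X)
    # Drop the batch and channel dimensions again
    return Y.view(Y.shape[2:])

# 3x3 kernel with 1 row/column of padding on each side: the output keeps the 8x8 input size
conv2d = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, padding=1)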
X = torch.rand(8, 8)
comp_conv2d(conv2d, X).shape

torch.Size([8, 8])
torch.Size([1, 1, 8, 8])


torch.Size([8, 8])
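The output stays 8 × 8 because padding 1 on each side adds back the 2 rows and 2 columns a 3 × 3 kernel would otherwise remove. Per spatial dimension, the output size with input size n, kernel size k, padding p and stride s (dilation 1, as in the `nn.Conv2d` documentation's shape formula) is floor((n + 2p - k) / s) + 1. The sketch checks this rule against `nn.Conv2d` for a few settings; `out_size` is a hypothetical helper name used only for illustration.

```python
import torch
from torch import nn

def out_size(n, k, p, s):
    # floor((n + 2p - k) / s) + 1, assuming dilation = 1
    return (n + 2 * p - k) // s + 1

X = torch.rand(1, 1, 8, 8)  # (batch, channels, height, width)

configs = [
    dict(kernel_size=3, padding=1, stride=1),                 # same size: 8x8
    dict(kernel_size=(5, 3), padding=(2, 1), stride=1),       # same size: 8x8
    dict(kernel_size=3, padding=1, stride=2),                 # halved: 4x4
    dict(kernel_size=(3, 5), padding=(0, 1), stride=(3, 4)),  # 2x2
]

for cfg in configs:
    conv = nn.Conv2d(1, 1, **cfg)
    # nn.Conv2d stores kernel_size, padding and stride as (height, width) tuples
    k_h, k_w = conv.kernel_size
    p_h, p_w = conv.padding
    s_h, s_w = conv.stride
    expected = (out_size(8, k_h, p_h, s_h), out_size(8, k_w, p_w, s_w))
    actual = tuple(conv(X).shape[2:])
    print(cfg, actual, expected)
    assert actual == expected
```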

### Adding Dimensions to a Tensor  
X.shape => \[8, 8\]  
X.view((1, 1) + X.shape) => \[1, 1, 8, 8\]  
X.view((1,) + X.shape+ (1,)) => \[1, 8, 8, 1\]
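The same extra dimensions can also be added with `unsqueeze` or `None` indexing, which do not require spelling out the full shape. A short comparison:

```python
import torch

X = torch.rand(8, 8)

A = X.view((1, 1) + X.shape)       # [1, 1, 8, 8]
B = X.unsqueeze(0).unsqueeze(0)    # [1, 1, 8, 8]
C = X[None, None, :, :]            # [1, 1, 8, 8]
D = X.view((1,) + X.shape + (1,))  # [1, 8, 8, 1]
E = X.unsqueeze(0).unsqueeze(-1)   # [1, 8, 8, 1]

print(A.shape, B.shape, C.shape)
print(D.shape, E.shape)
print(torch.equal(A, B) and torch.equal(A, C) and torch.equal(D, E))  # True
```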

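## 3. Multiple Input and Output Channels

### 1x1 Convolutional Layer

Purpose: adjusting the number of channels.

A 1x1 convolution computes mainly along the channel dimension.  
Each output element is a weighted sum, across channels, of the input elements at the same height and width position. If we treat the channel dimension as the feature dimension and the elements along the height and width dimensions as data examples, ``a 1x1 convolutional layer is equivalent to a fully connected layer``.

* A 1x1 convolutional layer can be used as a fully connected layer that keeps the height and width unchanged. This makes it possible to control model complexity by ``adjusting the number of channels between layers``.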
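The equivalence can be checked numerically. This sketch compares a bias-free `nn.Conv2d` with `kernel_size=1` against a plain matrix multiplication that treats each of the h·w pixel positions as a sample and the channels as features. The sizes (3 input channels, 2 output channels, a 4 × 5 image) are arbitrary choices for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)
c_i, c_o, h, w = 3, 2, 4, 5
X = torch.randn(1, c_i, h, w)  # (batch, in_channels, height, width)
conv1x1 = nn.Conv2d(c_i, c_o, kernel_size=1, bias=False)

Y_conv = conv1x1(X)  # (1, c_o, h, w)

# Fully connected view: each pixel position is a sample with c_i features
W = conv1x1.weight.view(c_o, c_i)       # weight (c_o, c_i, 1, 1) -> (c_o, c_i)
X_flat = X.view(c_i, h * w)             # (c_i, h*w), one column per pixel
Y_fc = (W @ X_flat).view(1, c_o, h, w)  # same weights applied at every pixel

print(Y_conv.shape)                             # torch.Size([1, 2, 4, 5])
print(torch.allclose(Y_conv, Y_fc, atol=1e-5))  # True
```

Changing `c_o` changes only the number of output channels; height and width are untouched, which is why 1x1 layers are used to adjust channel counts between layers.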

## 4. Pooling Layer

Pooling layers were proposed to ``alleviate the convolutional layer's excessive sensitivity to position``.
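A minimal sketch of 2D pooling in the same style as `corr2d`: a window slides over the input just like a kernel, but instead of a weighted sum it takes the maximum or the average of the window, so it has no parameters. The name `pool2d` and its `mode` argument are illustrative; the result is compared with PyTorch's `F.max_pool2d` and `F.avg_pool2d` using stride 1.

```python
import torch
import torch.nn.functional as F

def pool2d(X, pool_size, mode='max'):
    # Slide a pool_size window over X with stride 1 and no padding
    p_h, p_w = pool_size
    Y = torch.zeros((X.shape[0] - p_h + 1, X.shape[1] - p_w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            window = X[i:i + p_h, j:j + p_w]
            Y[i, j] = window.max() if mode == 'max' else window.mean()
    return Y

X = torch.arange(9.0).reshape(3, 3)
print(pool2d(X, (2, 2)))         # max pooling, values [[4, 5], [7, 8]]
print(pool2d(X, (2, 2), 'avg'))  # average pooling, values [[2, 3], [5, 6]]

# PyTorch's pooling functions expect (batch, channels, height, width)
X4 = X.view(1, 1, 3, 3)
print(torch.equal(pool2d(X, (2, 2)), F.max_pool2d(X4, 2, stride=1).view(2, 2)))          # True
print(torch.allclose(pool2d(X, (2, 2), 'avg'), F.avg_pool2d(X4, 2, stride=1).view(2, 2)))  # True
```

With max pooling, a pattern that shifts by one pixel inside the window still yields the same maximum, which is the reduced position sensitivity the text refers to.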

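## 5. Convolutional Neural Networks

LeNet-5 (1998)

LeNet consists of two parts: a ``convolutional block`` and a ``fully connected block``.

* Convolutional block  
The basic unit is ``a convolutional layer followed by a max pooling layer``: the convolutional layer recognizes ``spatial patterns`` in the image, such as lines and parts of objects, and the subsequent ``max pooling layer`` ``reduces the convolutional layer's sensitivity to position``.  
LeNet's convolutional block stacks two of these basic units.

LeNet's first convolutional layer has 6 output channels, and the second increases this to 16. Because the input to the second convolutional layer has a smaller height and width than the input to the first, ``increasing the number of output channels keeps the parameter sizes of the two convolutional layers similar``.

### The LeNet Model
Below we implement the LeNet model using the Sequential class.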

In [7]:
import time
import torch
from torch import nn, optim
import torch.utils.data as Data
import torchvision
import torchvision.transforms as transforms

import sys
sys.path.append("..")
# import d2lzh_pytorch as d2l
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

class LeNet(nn.Module):
    def __init__(self):
        super(LeNet, self).__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5),
            nn.Sigmoid(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(6, 16, 5),
            nn.Sigmoid(),
            nn.MaxPool2d(2, 2)
        )
        self.fc = nn.Sequential(
            nn.Linear(16*4*4, 120),  # 16 channels x 4 x 4 feature map from the conv block (28x28 input)
            nn.Sigmoid(),
            nn.Linear(120, 84),
            nn.Sigmoid(),
            nn.Linear(84, 10)
        )

    def forward(self, img):
        feature = self.conv(img)
        output = self.fc(feature.view(img.shape[0], -1))
        return output

net = LeNet()
print(net)


LeNet(
  (conv): Sequential(
    (0): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
    (1): Sigmoid()
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
    (4): Sigmoid()
    (5): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (fc): Sequential(
    (0): Linear(in_features=256, out_features=120, bias=True)
    (1): Sigmoid()
    (2): Linear(in_features=120, out_features=84, bias=True)
    (3): Sigmoid()
    (4): Linear(in_features=84, out_features=10, bias=True)
  )
)
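The 16*4*4 = 256 input features of the first fully connected layer follow from a 28 × 28 input (the FashionMNIST image size used below): each 5 × 5 convolution without padding shrinks height and width by 4, and each 2 × 2 max pooling with stride 2 halves them, so 28 → 24 → 12 → 8 → 4. The sketch rebuilds the same convolutional block and feeds it a dummy batch to print each layer's output shape.

```python
import torch
from torch import nn

# Same layers as LeNet.conv above
conv = nn.Sequential(
    nn.Conv2d(1, 6, 5), nn.Sigmoid(), nn.MaxPool2d(2, 2),
    nn.Conv2d(6, 16, 5), nn.Sigmoid(), nn.MaxPool2d(2, 2),
)

X = torch.rand(1, 1, 28, 28)  # dummy batch: one grayscale 28x28 image
for layer in conv:
    X = layer(X)
    print(layer.__class__.__name__, tuple(X.shape))
# Conv2d (1, 6, 24, 24)
# Sigmoid (1, 6, 24, 24)
# MaxPool2d (1, 6, 12, 12)
# Conv2d (1, 16, 8, 8)
# Sigmoid (1, 16, 8, 8)
# MaxPool2d (1, 16, 4, 4)

print(X.view(X.shape[0], -1).shape)  # torch.Size([1, 256])
```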


In [8]:
isinstance(net, torch.nn.Module)

True

In [9]:
list(net.parameters())[0].device

device(type='cpu')

In [10]:
device

device(type='cuda')

In [11]:
net.to(device)

LeNet(
  (conv): Sequential(
    (0): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
    (1): Sigmoid()
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
    (4): Sigmoid()
    (5): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (fc): Sequential(
    (0): Linear(in_features=256, out_features=120, bias=True)
    (1): Sigmoid()
    (2): Linear(in_features=120, out_features=84, bias=True)
    (3): Sigmoid()
    (4): Linear(in_features=84, out_features=10, bias=True)
  )
)

#### Loading the Data and Training the Model

In [12]:
mnist_train = torchvision.datasets.FashionMNIST(root='~/Datasets/FashionMNIST', train=True, download=True, transform=transforms.ToTensor())
mnist_test = torchvision.datasets.FashionMNIST(root='~/Datasets/FashionMNIST', train=False, download=True, transform=transforms.ToTensor())

In [13]:
batch_size = 256
train_iter = torch.utils.data.DataLoader(mnist_train, batch_size=batch_size, shuffle=True)

test_iter = torch.utils.data.DataLoader(mnist_test, batch_size=batch_size, shuffle=False)

In [14]:
def evaluate_accuracy(data_iter, net, device=None):
    if device is None and isinstance(net, torch.nn.Module):
        # If no device is specified, use the device of net's parameters
        device = list(net.parameters())[0].device
    acc_sum, n = 0.0, 0
    with torch.no_grad():
        for X, y in data_iter:
            if isinstance(net, torch.nn.Module):
                net.eval()  # evaluation mode: disables dropout
                acc_sum += (net(X.to(device)).argmax(dim=1) == y.to(device)).float().sum().item()
                net.train()  # switch back to training mode
            # else:  # a custom model that is not an nn.Module
            #     if('is_training' in net.__code__.co_varnames):  # if net takes an is_training argument
            #         # set is_training to False for evaluation
            #         acc_sum += (net(X, is_training=False).argmax(dim=1) == y).float().sum().item()
            #     else:
            #         acc_sum += (net(X).argmax(dim=1) == y).float().sum().item()
            n += y.shape[0]
    return acc_sum / n

In [15]:
def train_ch5(net, train_iter, test_iter, batch_size, optimizer, device, num_epochs):
    net = net.to(device)
    print("training on", device)
    loss = torch.nn.CrossEntropyLoss()
    for epoch in range(num_epochs):
        train_l_sum, train_acc_sum, n, batch_count, start = 0.0, 0.0, 0, 0, time.time()
        for X, y in train_iter:
            X = X.to(device)
            y = y.to(device)
            y_hat = net(X)
            # print(y_hat.shape)
            # print(y.shape)
            l = loss(y_hat, y)
            optimizer.zero_grad()
            l.backward()
            optimizer.step()
            train_l_sum += l.cpu().item()
            train_acc_sum += (y_hat.argmax(dim=1) == y).sum().cpu().item()
            n += y.shape[0]
            batch_count += 1
        test_acc = evaluate_accuracy(test_iter, net)
        print('epoch %d, loss %.4f, train acc %.3f, test acc %.3f, time %.1f sec'
              % (epoch + 1, train_l_sum / batch_count, train_acc_sum / n, test_acc, time.time() - start))


In [16]:
lr, num_epochs = 0.001, 5
optimizer = torch.optim.Adam(net.parameters(), lr=lr)
train_ch5(net, train_iter, test_iter, batch_size, optimizer, device, num_epochs)

training on cuda
epoch 1, loss 1.8064, train acc 0.331, test acc 0.587, time 3.9 sec
epoch 2, loss 0.9569, train acc 0.625, test acc 0.664, time 3.7 sec
epoch 3, loss 0.7969, train acc 0.706, test acc 0.723, time 3.6 sec
epoch 4, loss 0.7031, train acc 0.736, test acc 0.743, time 3.7 sec
epoch 5, loss 0.6454, train acc 0.752, test acc 0.760, time 3.6 sec
