## Neural Networks

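In PyTorch, neural networks are built with the `torch.nn` package. `nn` relies on `autograd` to define models and differentiate them. An `nn.Module` contains layers and a `forward(input)` method that returns the output.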



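A typical training procedure for a neural network consists of the following steps:

- Define the neural network and its learnable parameters
- Iterate over the input data
- Pass the input through the network
- Compute the loss
- Propagate gradients back to the network's parameters
- Update the network's parameters, typically with a simple rule: $$weight = weight - learning\_rate * gradient$$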
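
The steps above can be sketched on a toy problem. This is a minimal illustration, not the network defined in this section: the linear model, the synthetic data, and the hyperparameters (`lr=0.1`, 50 iterations) are assumptions chosen only to keep the example small.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Step 1: define a network with learnable parameters (a toy linear model).
model = nn.Linear(3, 1)
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

# Synthetic data (assumption): y = x @ w_true, with w_true chosen arbitrarily.
x = torch.randn(64, 3)
w_true = torch.tensor([[1.0], [-2.0], [0.5]])
y = x @ w_true

losses = []
for step in range(50):            # Step 2: iterate over the data
    optimizer.zero_grad()         # clear gradients left from the previous step
    output = model(x)             # Step 3: pass the input through the network
    loss = criterion(output, y)   # Step 4: compute the loss
    loss.backward()               # Step 5: propagate gradients to the parameters
    optimizer.step()              # Step 6: weight = weight - learning_rate * gradient
    losses.append(loss.item())

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

On this convex problem the loss should shrink from the first iteration to the last; on a real network the same loop runs over mini-batches instead of one fixed batch.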


### Define the network

In [1]:
import torch
from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 5x5 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Conv-->ReLU-->maxpool
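        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))  # max pooling over a (2, 2) window (stride defaults to the window size)
        # If the pooling window is square, a single number is enough
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))  # view reshapes the tensor; here it flattens each sample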
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
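        '''Return the number of elements per sample: the product of all dimensions except the first (batch) one.'''
        size = x.size()[1:]  # all dimensions except the batch dimension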
        num_features = 1
        for s in size:
            num_features *= s
        return num_features

net = Net()
print(net)

Net(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)
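
The `16 * 5 * 5` input size of `fc1` follows from the 32x32 input this network expects: each 5x5 convolution without padding shrinks the side by 4, and each 2x2 max pooling halves it (32 → 28 → 14 → 10 → 5). The sketch below traces these shapes with standalone layers that mirror `conv1` and `conv2`; the variable names `shapes` and `flat_features` are illustrative only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Standalone copies of the two convolution layers used in Net.
conv1 = nn.Conv2d(1, 6, 5)
conv2 = nn.Conv2d(6, 16, 5)

x = torch.randn(1, 1, 32, 32)          # one 32x32 single-channel image
shapes = [("input", tuple(x.shape))]

x = F.relu(conv1(x))
shapes.append(("conv1 + relu", tuple(x.shape)))
x = F.max_pool2d(x, 2)
shapes.append(("max_pool2d", tuple(x.shape)))
x = F.relu(conv2(x))
shapes.append(("conv2 + relu", tuple(x.shape)))
x = F.max_pool2d(x, 2)
shapes.append(("max_pool2d", tuple(x.shape)))

for name, shape in shapes:
    print(f"{name:14s} {shape}")

# Elements per sample after the last pooling: 16 * 5 * 5
flat_features = x[0].numel()
print(flat_features)
```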


The learnable parameters of the model are returned by `net.parameters()`:

In [2]:
params = list(net.parameters())
print(len(params))
print(params[0].size())  # conv1's .weight

10
torch.Size([6, 1, 5, 5])
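
The count of 10 comes from five layers with one weight tensor and one bias tensor each. The sketch below lists them by name and counts the scalar parameters. The layers are rebuilt inside an `nn.Sequential` only so that the snippet runs on its own; that container is used here for counting, not for a forward pass.

```python
from collections import OrderedDict

import torch.nn as nn

# Same layer definitions as in Net, grouped only to enumerate parameters.
layers = nn.Sequential(OrderedDict([
    ("conv1", nn.Conv2d(1, 6, 5)),
    ("conv2", nn.Conv2d(6, 16, 5)),
    ("fc1", nn.Linear(16 * 5 * 5, 120)),
    ("fc2", nn.Linear(120, 84)),
    ("fc3", nn.Linear(84, 10)),
]))

# named_parameters() yields (name, tensor) pairs in registration order.
for name, p in layers.named_parameters():
    print(f"{name:14s} {tuple(p.shape)}  {p.numel()}")

num_tensors = len(list(layers.parameters()))
total = sum(p.numel() for p in layers.parameters())
print(num_tensors, total)
```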


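The inputs and outputs of the forward pass are `autograd.Variable`s (since PyTorch 0.4, `Variable` has been merged into `Tensor`, so plain tensors work as well). This network expects 32x32 inputs; to train it on MNIST, resize the images to 32x32.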


In [3]:
input = Variable(torch.randn(1, 1, 32, 32))
out = net(input)
print(out)

Variable containing:
1.00000e-02 *
 -2.5047  0.7743  0.4515  3.1252 -9.7592  1.3915  4.4318 -7.4003 -9.2446  2.8687
[torch.FloatTensor of size 1x10]



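Note that `torch.nn` only supports mini-batches, not single samples. For example, `nn.Conv2d` takes a 4D tensor of shape `(nSamples, nChannels, Height, Width)`. If you have a single sample, use `input.unsqueeze(0)` to add a fake batch dimension.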

In [4]:
inputz = Variable(torch.randn(3, 32, 32))
print(inputz.size())
inputz.unsqueeze_(0) # add a fake batch dimension, in place
print(inputz.size())

torch.Size([3, 32, 32])
torch.Size([1, 3, 32, 32])


### Loss Function
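A loss function takes an (output, target) pair as input and computes a value that estimates how far the output is from the target. The `nn` package provides several loss functions; for example, `nn.MSELoss` computes the mean squared error between the output and the target: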

In [5]:
output = net(input)
target = Variable(torch.arange(1, 11).float().view(1, -1))  # a dummy target with the same shape and dtype as output
criterion = nn.MSELoss()

loss = criterion(output, target)
print(loss)

Variable containing:
 38.7233
[torch.FloatTensor of size 1]
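
With its default settings, `nn.MSELoss` averages the squared differences over all elements. A small sketch with made-up numbers (the tensors below are illustrative, not the network's output) shows that it agrees with the formula written out by hand:

```python
import torch
import torch.nn as nn

# Made-up values, chosen so the result is easy to check by hand.
output = torch.tensor([[0.5, 1.5, 2.0]])
target = torch.tensor([[1.0, 1.0, 4.0]])

loss = nn.MSELoss()(output, target)

# Squared differences are 0.25, 0.25 and 4.0; their mean is 4.5 / 3.
manual = ((output - target) ** 2).mean()

print(loss.item(), manual.item())
```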



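Calling `loss.backward()` differentiates the whole graph with respect to the loss, and every variable in the graph that requires gradients accumulates its gradient into `.grad`. The graph can be followed backwards a few steps through `grad_fn`: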

In [6]:
print(loss.grad_fn)  # MSELoss
print(loss.grad_fn.next_functions[0][0])  # Linear
print(loss.grad_fn.next_functions[0][0].next_functions[0][0])  # first input of the Linear node (the expanded bias in the output below)

<MseLossBackward object at 0x000001F1D48FFC88>
<AddmmBackward object at 0x000001F1D48FFC88>
<ExpandBackward object at 0x000001F1D48C9DD8>
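
Following `next_functions` by hand gets tedious beyond a few steps. The helper below, `walk`, is a hypothetical utility (not part of PyTorch) that visits the graph depth-first and collects the class name of each node. The small `nn.Sequential` model is an assumption made to keep the snippet self-contained, and the exact node names depend on the PyTorch version.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A small stand-in model; the Net from this section would work the same way.
model = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))
loss = nn.MSELoss()(model(torch.randn(1, 4)), torch.zeros(1, 2))


def walk(fn, depth=0, names=None):
    """Depth-first walk over grad_fn nodes, collecting indented class names."""
    if names is None:
        names = []
    if fn is None:  # inputs that do not require gradients have no node
        return names
    names.append("  " * depth + type(fn).__name__)
    for next_fn, _ in fn.next_functions:
        walk(next_fn, depth + 1, names)
    return names


node_names = walk(loss.grad_fn)
print("\n".join(node_names))
```

Leaf parameters show up as `AccumulateGrad` nodes, which is where `backward()` writes into `.grad`.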


### BackProp
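To backpropagate the error, call `loss.backward()`. Clear the existing gradients first, otherwise the new gradients are accumulated into the existing ones.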

In [7]:
net.zero_grad()     # clear the gradients of all parameters

print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)

loss.backward() 

print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)

conv1.bias.grad before backward
None
conv1.bias.grad after backward
Variable containing:
 0.0744
 0.0178
-0.0417
-0.1284
 0.0986
-0.0333
[torch.FloatTensor of size 6]
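
The reason for clearing gradients first can be checked directly: `backward()` adds to `.grad` rather than overwriting it. The sketch below uses a toy `nn.Linear` layer and random data (both assumptions for illustration) and compares the gradient after one call, after two calls without clearing, and after clearing.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(3, 1)
x = torch.randn(5, 3)
y = torch.randn(5, 1)
criterion = nn.MSELoss()

# First backward pass: .grad holds the gradient of the loss.
criterion(layer(x), y).backward()
grad_once = layer.weight.grad.clone()

# Second backward pass without clearing: the new gradient is added on top.
criterion(layer(x), y).backward()
grad_twice = layer.weight.grad.clone()

# After zero_grad(), a backward pass yields the single gradient again.
layer.zero_grad()
criterion(layer(x), y).backward()
grad_after_reset = layer.weight.grad.clone()

print(torch.allclose(grad_twice, 2 * grad_once))
print(torch.allclose(grad_after_reset, grad_once))
```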



### Update the weights

The simplest update rule is Stochastic Gradient Descent (SGD): $$weight = weight - learning\_rate * gradient$$
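
This rule can be written by hand in a few lines. The sketch below applies it to a toy `nn.Linear` model (an assumption for illustration) and compares the result with one step of `torch.optim.SGD` without momentum, which should produce the same weights.

```python
import copy

import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
learning_rate = 0.01

model_manual = nn.Linear(3, 1)
model_optim = copy.deepcopy(model_manual)  # identical starting weights
x = torch.randn(8, 3)
y = torch.randn(8, 1)
criterion = nn.MSELoss()

# Manual update: weight = weight - learning_rate * gradient
criterion(model_manual(x), y).backward()
with torch.no_grad():  # the update itself must not be tracked by autograd
    for p in model_manual.parameters():
        p -= learning_rate * p.grad

# The same step taken by the built-in optimizer.
optimizer = optim.SGD(model_optim.parameters(), lr=learning_rate)
optimizer.zero_grad()
criterion(model_optim(x), y).backward()
optimizer.step()

same = all(
    torch.allclose(a, b)
    for a, b in zip(model_manual.parameters(), model_optim.parameters())
)
print(same)
```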

In practice you will often want other update rules, such as SGD, Nesterov-SGD, Adam, RMSProp, etc. PyTorch provides the `torch.optim` package, which implements these methods:

In [8]:
import torch.optim as optim

# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)

# in your training loop:
optimizer.zero_grad()   # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step()    # Does the update
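
Because every optimizer in `torch.optim` exposes the same `zero_grad()` / `step()` interface, switching the update rule only changes the line that constructs the optimizer. The sketch below runs the same loop with four of them on a toy regression problem. The model, the data, the learning rates, and the `optimizer_factories` dictionary are assumptions for illustration, not tuned recommendations.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
x = torch.randn(16, 4)
y = torch.randn(16, 1)
criterion = nn.MSELoss()

# Each entry builds one optimizer from an iterable of parameters.
optimizer_factories = {
    "SGD": lambda params: optim.SGD(params, lr=0.01),
    "Nesterov-SGD": lambda params: optim.SGD(params, lr=0.01, momentum=0.9, nesterov=True),
    "Adam": lambda params: optim.Adam(params, lr=0.001),
    "RMSprop": lambda params: optim.RMSprop(params, lr=0.001),
}

final_losses = {}
for name, factory in optimizer_factories.items():
    torch.manual_seed(1)          # same initial weights for every optimizer
    model = nn.Linear(4, 1)
    optimizer = factory(model.parameters())
    for _ in range(100):
        optimizer.zero_grad()     # zero the gradient buffers
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()          # apply this optimizer's update rule
    final_losses[name] = loss.item()
    print(f"{name:13s} loss after 100 steps: {final_losses[name]:.4f}")
```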