# A Gentle Introduction to torch.autograd

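torch.autograd is PyTorch's automatic differentiation engine, which powers neural network training. This section describes how autograd helps a neural network train (the main optimization algorithm used here is SGD).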

# Background

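A neural network (NN) is a collection of nested functions that are executed on some input data. These functions are defined by parameters (weights and biases), which in PyTorch are stored in tensors.
Training a neural network happens in two steps: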

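1. Forward propagation: the NN makes its best guess about the correct output. It runs the input data through each of its functions to make this guess.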

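2. Backward propagation: the NN adjusts its parameters in proportion to the error in its guess. It does this by traversing backwards from the output, collecting the derivatives of the error with respect to the parameters of the functions (the gradients), and optimizing the parameters using gradient descent.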
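
The two steps above can be sketched on a toy problem. This is only a minimal illustration, assuming a hypothetical one-parameter model `prediction = w * x` with a squared-error loss; the names `w`, `x`, `target`, and `lr` are chosen here for the example and are not part of the tutorial.

```python
import torch

# Hypothetical one-parameter "network": prediction = w * x
w = torch.tensor(1.0, requires_grad=True)
x = torch.tensor(2.0)
target = torch.tensor(6.0)
lr = 0.1  # learning rate, an arbitrary choice for this sketch

# Forward propagation: make a guess and measure the error
prediction = w * x
loss = (prediction - target) ** 2

# Backward propagation: d(loss)/dw = 2 * (w*x - target) * x
loss.backward()
grad_w = w.grad.item()  # 2 * (2 - 6) * 2 = -16

# Gradient descent: move the parameter against the gradient
with torch.no_grad():
    w -= lr * w.grad
new_w = w.item()  # roughly 1.0 - 0.1 * (-16) = 2.6
```

After one update the parameter has moved from 1.0 towards 3.0, the value that would reproduce the target exactly.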

# Usage in PyTorch

Let’s take a look at a single training step. For this example, we load a pretrained resnet18 model from torchvision. We create a random data tensor to represent a single image with 3 channels, and height & width of 64, and its corresponding label initialized to some random values.

In [2]:
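import torch, torchvision
# Load a pretrained ResNet-18 (newer torchvision releases deprecate `pretrained=` in favor of `weights=`)
model = torchvision.models.resnet18(pretrained=True)
data = torch.rand(1, 3, 64, 64)  # one random 3-channel 64x64 "image"
labels = torch.rand(1, 1000)     # random target values, one per model output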

Next, we run the input data through each of the model's layers to make a prediction. This is the forward pass.

In [3]:
prediction = model(data) # forward pass

We use the model’s prediction and the corresponding label to calculate the error (loss). The next step is to backpropagate this error through the network. Backward propagation is kicked off when we call .backward() on the error tensor. Autograd then calculates and stores the gradients for each model parameter in the parameter’s .grad attribute.


In [4]:
loss = (prediction - labels).sum()
loss.backward() # backward pass
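
To see the `.grad` behaviour described above without downloading a model, here is a minimal sketch that assumes a small `nn.Linear` layer as a hypothetical stand-in for resnet18. The layer sizes and variable names are illustrative only.

```python
import torch
from torch import nn

torch.manual_seed(0)
layer = nn.Linear(4, 2)      # small stand-in for the pretrained model
data = torch.rand(1, 4)
labels = torch.rand(1, 2)

# Before any backward pass, no gradients have been stored
grads_before = [p.grad for p in layer.parameters()]

loss = (layer(data) - labels).sum()
loss.backward()

# After backward(), every parameter holds a gradient of its own shape
grad_shapes = {name: tuple(p.grad.shape) for name, p in layer.named_parameters()}
print(grad_shapes)
```

Because the loss here is a plain sum, the gradient with respect to each bias entry is exactly 1, which is an easy sanity check.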

Next, we load an optimizer, in this case SGD with a learning rate of 0.01 and momentum of 0.9. We register all the parameters of the model in the optimizer.

In [5]:
optim = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)

Finally, we call .step() to initiate gradient descent. The optimizer adjusts each parameter by its gradient stored in .grad.

In [6]:
optim.step()
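
As a rough check of what `.step()` does, the sketch below repeats the training step on a small hypothetical `nn.Linear` layer (an assumption made to keep the example light) and compares the updated weights with a manual update. On the very first step the momentum buffer is just the gradient, so the update reduces to `weight - lr * grad`.

```python
import torch
from torch import nn

torch.manual_seed(0)
layer = nn.Linear(4, 2)  # hypothetical stand-in for the real model
optim = torch.optim.SGD(layer.parameters(), lr=1e-2, momentum=0.9)

loss = (layer(torch.rand(1, 4)) - torch.rand(1, 2)).sum()
loss.backward()

old_weight = layer.weight.detach().clone()
optim.step()

# First step with momentum: the buffer equals the gradient itself
expected = old_weight - 1e-2 * layer.weight.grad
step_matches = torch.allclose(layer.weight.detach(), expected)

# Gradients accumulate across backward() calls, so they are usually
# cleared before the next iteration
optim.zero_grad()
```

In a real training loop these calls would typically run once per batch, in the order forward pass, `backward()`, `step()`, `zero_grad()`.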

# Differentiation in Autograd

Let’s take a look at how autograd collects gradients. We create two tensors a and b with requires_grad=True. This signals to autograd that every operation on them should be tracked.

In [33]:
import torch

a = torch.tensor([2., 3.], requires_grad=True)
b = torch.tensor([6., 4.], requires_grad=True)

We create another tensor Q from a and b.

$$Q = 3a^3 - b^2$$

In [34]:
Q = 3*a**3 - b**2

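Let's assume a and b are the parameters of a neural network and Q is the error. In NN training, we want the gradients of the error with respect to the parameters, i.e. $\frac{\partial Q}{\partial a} = 9a^2$ and $\frac{\partial Q}{\partial b} = -2b$. When we call .backward() on Q, autograd calculates these gradients and stores them in the respective tensors' .grad attribute.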

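We need to explicitly pass a gradient argument to Q.backward() because Q is a vector. gradient is a tensor of the same shape as Q, and it represents the gradient of Q with respect to itself, i.e. $\frac{dQ}{dQ} = 1$.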



In [35]:
external_grad=torch.tensor([1.,1.])
Q.backward(gradient=external_grad)


In [36]:
# check that the collected gradients match the analytic ones
print(9*a**2 == a.grad)
print(-2*b == b.grad)

tensor([True, True])
tensor([True, True])
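
An alternative to passing an all-ones gradient is to reduce Q to a scalar first and call backward on that. The sketch below, which rebuilds a, b, and Q so it can run on its own, suggests that `Q.sum().backward()` yields the same gradients.

```python
import torch

a = torch.tensor([2., 3.], requires_grad=True)
b = torch.tensor([6., 4.], requires_grad=True)

Q = 3*a**3 - b**2

# Summing gives a scalar, so no explicit gradient argument is needed
Q.sum().backward()

print(a.grad)  # expected to match 9*a**2
print(b.grad)  # expected to match -2*b
```

Both forms weight every element of Q by 1, which is why they agree.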


# Computational Graph

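Note that although we registered all the parameters in the optimizer, the only parameters that compute gradients (and are therefore updated in gradient descent) are the weights and biases of the classifier.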
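
The note above appears to describe a fine-tuning setup in which every parameter except the classifier is frozen. Under that assumption, a minimal sketch with a small hypothetical `nn.Sequential` model (a stand-in, not the resnet18 used earlier) might look like this:

```python
import torch
from torch import nn

# Hypothetical stand-in for a pretrained network: features plus a classifier
model = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 3))

# Freeze every existing parameter so autograd stops tracking it
for param in model.parameters():
    param.requires_grad = False

# Replace the classifier; parameters of a new layer require grad by default
model[2] = nn.Linear(4, 3)

# All parameters are registered, but only the classifier will receive gradients
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)

loss = model(torch.rand(1, 8)).sum()
loss.backward()

trainable = [name for name, p in model.named_parameters() if p.grad is not None]

frozen_before = model[0].weight.clone()
optimizer.step()  # parameters without a gradient are skipped
frozen_unchanged = torch.equal(model[0].weight, frozen_before)
```

Here `trainable` lists only the classifier's weight and bias, and the frozen layer is left untouched by `optimizer.step()`.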