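If a tensor's ``.requires_grad`` attribute is set to ``True``, autograd records every operation performed on it.
Calling ``.backward()`` then computes the gradients automatically.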

The gradient values are accumulated in the tensor's ``.grad`` attribute.
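As a small illustration of how ``.grad`` behaves, the sketch below runs two backward passes on the same tensor. The tensor values and the variable names (``grad_before``, ``first``, ``second``) are assumptions chosen for this example only.

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)
grad_before = x.grad          # None: no backward pass has run yet

(x * x).sum().backward()      # d/dx sum(x^2) = 2x
first = x.grad.clone()

(x * x).sum().backward()      # a second backward pass adds to the stored gradient
second = x.grad.clone()

x.grad.zero_()                # reset in place before the next pass
print(grad_before, first, second, x.grad)
```

Because gradients add up across passes, training loops usually reset them between iterations.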

Call ``.detach()`` to stop recording operations on a tensor; it returns a tensor that is cut off from the computation history.
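A minimal sketch of what ``.detach()`` does, assuming a small all-ones tensor as input; the name ``y_detached`` is made up for this example.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
y = x * 2                 # tracked: y has a grad_fn
y_detached = y.detach()   # same values, but no longer part of the graph

print(y.requires_grad, y_detached.requires_grad)
print(y_detached.grad_fn)
```

The detached tensor holds the same values as ``y`` but carries no ``grad_fn``, so operations on it are not differentiated.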

To prevent tracking history (and using memory), you can also wrap the code block
in ``with torch.no_grad():``.
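The effect of the context manager can be seen by running the same operation inside and outside the block. This is only a sketch; the tensor shape is an arbitrary choice.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)

with torch.no_grad():
    y = x * 2             # computed without recording history

z = x * 2                 # recorded as usual
print(y.requires_grad, z.requires_grad)
```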

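If ``.backward()`` is called on a tensor that is not a scalar, a ``gradient`` argument with the same shape as the tensor must be supplied.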
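The sketch below shows what this looks like in practice, assuming a three-element tensor: the call without an argument raises a ``RuntimeError``, while passing a tensor of ones weights every output element equally. The flag ``raised`` is a helper introduced only for this example.

```python
import torch

x = torch.ones(3, requires_grad=True)
y = x * 2                         # non-scalar output

try:
    y.backward()                  # no gradient argument given
    raised = False
except RuntimeError as err:
    raised = True
    print("RuntimeError:", err)

y.backward(torch.ones_like(y))    # weight each element of y by 1
print(x.grad)
```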


In [1]:
import torch

x = torch.ones(2, 2, requires_grad=True)
print(x)

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)


Tensor operations



In [2]:
y = x + 2
print(y)

tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)


In [3]:
print(y.grad_fn)

<AddBackward0 object at 0x7fc8d8fe5c50>


In [4]:
z = y * y * 3
out = z.mean()

print(z, out)

tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward0>) tensor(27., grad_fn=<MeanBackward0>)


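Earlier operations are not tracked: ``.requires_grad_()`` switches tracking on in place, and only operations performed after the call are recorded.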

In [5]:
a = torch.randn(2, 2)
a = ((a * 3) / (a - 1))
print(a.requires_grad)
a.requires_grad_(True)
print(a.requires_grad)
b = (a * a).sum()
print(b.grad_fn)

False
True
<SumBackward0 object at 0x7fc8d8f13fd0>
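To make this concrete, the following sketch repeats the cell with fixed values (an assumption, so that the result is reproducible) and runs a backward pass. The gradient of ``b`` is taken with respect to ``a`` as it stands after the flag is switched on; the earlier arithmetic is not part of the graph.

```python
import torch

a = torch.tensor([[0.5, 2.0], [3.0, -1.0]])
a = (a * 3) / (a - 1)      # computed before tracking is on: not recorded
a.requires_grad_(True)     # in-place switch; a is now a leaf of the graph

b = (a * a).sum()
b.backward()               # d(b)/d(a) = 2a

print(a.is_leaf, a.grad_fn)
print(a.grad)
```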


Computing gradients

In [6]:
out.backward()

Print the gradient d(out)/dx




In [7]:
print(x.grad)

tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])


The math behind the gradient

$o = \frac{1}{4}\sum_i z_i$,

$z_i = 3(x_i+2)^2$ 

$z_i\bigr\rvert_{x_i=1} = 27$.


$\frac{\partial o}{\partial x_i} = \frac{1}{4}\cdot 6(x_i+2) = \frac{3}{2}(x_i+2)$, 


$\frac{\partial o}{\partial x_i}\bigr\rvert_{x_i=1} = \frac{9}{2} = 4.5$.
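The closed-form result can be checked against autograd. The sketch below rebuilds ``out`` in a single expression and compares ``x.grad`` with $\frac{3}{2}(x_i+2)$; the name ``analytic`` is introduced only for this comparison.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
out = (3 * (x + 2) ** 2).mean()      # o = (1/4) * sum_i 3 (x_i + 2)^2
out.backward()

analytic = 1.5 * (x.detach() + 2)    # closed form: (3/2)(x_i + 2)
print(x.grad)
print(torch.allclose(x.grad, analytic))
```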



For a vector-valued function $\vec{y}=f(\vec{x})$,
the gradient of $\vec{y}$ with respect to $\vec{x}$ is the Jacobian matrix:

\begin{align}J=\left(\begin{array}{ccc}
   \frac{\partial y_{1}}{\partial x_{1}} & \cdots & \frac{\partial y_{1}}{\partial x_{n}}\\
   \vdots & \ddots & \vdots\\
   \frac{\partial y_{m}}{\partial x_{1}} & \cdots & \frac{\partial y_{m}}{\partial x_{n}}
   \end{array}\right)\end{align}

``torch.autograd`` is an engine for computing vector-Jacobian products: given a vector
$v=\left(\begin{array}{cccc} v_{1} & v_{2} & \cdots & v_{m}\end{array}\right)^{T}$,
it computes $v^{T}\cdot J$. 

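If $v$ is the gradient of a scalar function $l=g\left(\vec{y}\right)$,
that is,
$v=\left(\begin{array}{ccc}\frac{\partial l}{\partial y_{1}} & \cdots & \frac{\partial l}{\partial y_{m}}\end{array}\right)^{T}$, then by the chain rule the vector-Jacobian product is the gradient of $l$ with respect to $\vec{x}$ (the row vector $v^{T}\cdot J$ is written here as the column vector $J^{T}\cdot v$):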

\begin{align}J^{T}\cdot v=\left(\begin{array}{ccc}
   \frac{\partial y_{1}}{\partial x_{1}} & \cdots & \frac{\partial y_{m}}{\partial x_{1}}\\
   \vdots & \ddots & \vdots\\
   \frac{\partial y_{1}}{\partial x_{n}} & \cdots & \frac{\partial y_{m}}{\partial x_{n}}
   \end{array}\right)\left(\begin{array}{c}
   \frac{\partial l}{\partial y_{1}}\\
   \vdots\\
   \frac{\partial l}{\partial y_{m}}
   \end{array}\right)=\left(\begin{array}{c}
   \frac{\partial l}{\partial x_{1}}\\
   \vdots\\
   \frac{\partial l}{\partial x_{n}}
   \end{array}\right)\end{align}
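The product $J^{T}\cdot v$ can be compared with an explicitly built Jacobian. The sketch below is an illustration under stated assumptions: ``f`` is a made-up function from $\mathbb{R}^3$ to $\mathbb{R}^2$, the values of ``x`` and ``v`` are arbitrary, and ``torch.autograd.functional.jacobian`` is used only to form $J$ for the check.

```python
import torch
from torch.autograd.functional import jacobian

def f(x):
    # hypothetical example function: y1 = x1 * x2, y2 = x2 + x3^2
    return torch.stack([x[0] * x[1], x[1] + x[2] ** 2])

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
v = torch.tensor([0.5, 2.0])

y = f(x)
y.backward(v)                    # autograd computes J^T v without forming J

J = jacobian(f, x.detach())      # explicit 2 x 3 Jacobian, for comparison only
print(J)
print(x.grad, J.T @ v)
```

Here $J$ has $m=2$ rows and $n=3$ columns, and ``x.grad`` has the same length as $\vec{x}$.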


For a non-scalar output, pass the vector $v$ to ``.backward()``


In [8]:
x = torch.randn(3, requires_grad=True)

y = x * 2
y = y * 2

print(y)

tensor([-3.3393, -4.9518,  2.7830], grad_fn=<MulBackward0>)


In [9]:
v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
y.backward(v)

print(x.grad)

tensor([4.0000e-01, 4.0000e+00, 4.0000e-04])
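The printed gradient does not depend on the random values of ``x``: since ``y`` equals ``4 * x``, the Jacobian is four times the identity matrix and the product is simply ``4 * v``. The sketch below checks this with ``torch.autograd.grad``, which returns the same product instead of storing it in ``x.grad``; using it here is a choice made for this example, not something the cells above do.

```python
import torch

x = torch.randn(3, requires_grad=True)
y = x * 2
y = y * 2                      # y = 4x, so J = 4 * I

v = torch.tensor([0.1, 1.0, 0.0001])
(grad,) = torch.autograd.grad(y, x, grad_outputs=v)   # returns J^T v as a tuple entry

print(grad)
print(torch.allclose(grad, 4 * v))
```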
