### [Official autograd tutorial (reference link)](https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html)

### AUTOGRAD: AUTOMATIC DIFFERENTIATION

**Key features of torch.Tensor:**
* Set its **.requires_grad** attribute to True and autograd starts tracking every operation on it.
* When you finish your computation, call **.backward()** and all gradients are computed automatically.
* The **.grad** attribute accumulates the gradients for the tensor.
* To stop a tensor from tracking history, call **.detach()** to detach it from the computation history and to prevent future computation from being tracked.
* **An important technique for model evaluation:** wrap the code block in **with torch.no_grad():**. This is particularly helpful when evaluating a model: its parameters are already trained, so no derivatives are needed and gradient tracking can be turned off.
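The features listed above can be tried out together in a few lines. The following is a minimal sketch, not part of the original tutorial; the tensor `w` and the function `sum(w^2)` are illustrative assumptions chosen so that the gradient `2w` is easy to check by hand.

```python
import torch

# Leaf tensor whose operations are tracked by autograd (illustrative values).
w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# First backward pass: d(sum(w^2))/dw = 2w.
(w ** 2).sum().backward()
first = w.grad.clone()

# Second backward pass without resetting: the new gradient is added to .grad.
(w ** 2).sum().backward()
accumulated = w.grad.clone()

# Reset the accumulated gradient in place before any further pass.
w.grad.zero_()

# detach() returns a tensor that is cut off from the computation history.
frozen = w.detach()

# Inside no_grad() operations are not recorded.
with torch.no_grad():
    untracked = w * 2

print(first.tolist())        # [2.0, 4.0, 6.0]
print(accumulated.tolist())  # [4.0, 8.0, 12.0]
print(frozen.requires_grad, untracked.requires_grad)  # False False
```

Because `.grad` accumulates, training loops usually reset gradients between iterations.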
                 
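**One more class is very important to the autograd implementation: Function.**

Tensor and Function are interconnected and build up an acyclic graph that encodes a complete history of computation.
Each Tensor has a .grad_fn attribute that references the Function that created it (except for Tensors created by the user, whose grad_fn is None).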
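As a small illustration of the statement above, the sketch below inspects `grad_fn` on a user-created tensor and on tensors produced by operations. The variable names are assumptions for this example only.

```python
import torch

a = torch.ones(2, 2, requires_grad=True)  # created by the user
b = a + 2                                 # created by an addition
c = b.mean()                              # created by a mean

print(a.grad_fn)              # None
print(a.is_leaf, b.is_leaf)   # True False
print(b.grad_fn.name())       # AddBackward0
print(c.grad_fn.name())       # MeanBackward0
```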

**Computing derivatives:**

   To compute the derivatives, call .backward() on a Tensor. If the Tensor is a scalar (i.e. it holds a single element), you don't need to pass any arguments to backward(). If it has more elements, you need to pass a gradient argument that is a tensor of matching shape.
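The difference between the scalar and the non-scalar case can be seen in a short experiment. This is a sketch under the assumption of a simple element-wise function `y = x * x`; the names are illustrative.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x * x   # non-scalar output with three elements

raised = False
try:
    y.backward()   # no gradient argument for a non-scalar tensor
except RuntimeError as err:
    raised = True
    print("RuntimeError:", err)

# A gradient argument of matching shape works; ones_like(y) weights every
# output element equally, which matches calling y.sum().backward().
y.backward(gradient=torch.ones_like(y))
print(x.grad.tolist())   # [2.0, 4.0, 6.0]
```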
   
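**Commonly used methods**
* **a.requires_grad_(True)**: turns on gradient tracking. .requires_grad_( ... ) changes an existing Tensor's requires_grad flag in-place. A Tensor's requires_grad flag is False unless it is set at creation, and the method's own argument defaults to True.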

**Gradients**

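* **out.backward()**: runs the gradient computation
* **x.grad**: holds the computed gradient values
* **Ways to turn off gradient tracking**: two, **torch.no_grad()** and **.detach()**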

**Mathematical explanation of automatic gradient computation**

Given a vector-valued function $\vec{y}=f(\vec{x})$, the gradient of $\vec{y}$ with respect to $\vec{x}$ is a Jacobian matrix:
$$
J=\left(\begin{array}{ccc}
\frac{\partial y_{1}}{\partial x_{1}} & \cdots & \frac{\partial y_{1}}{\partial x_{n}}\\
\vdots & \ddots & \vdots\\
\frac{\partial y_{m}}{\partial x_{1}} & \cdots & \frac{\partial y_{m}}{\partial x_{n}}
\end{array}\right)
$$

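Autograd is an engine for computing vector-Jacobian products. If $v$ is the gradient of a scalar function $l=g(\vec{y})$, that is $v=\left(\frac{\partial l}{\partial y_{1}} \cdots \frac{\partial l}{\partial y_{m}}\right)^{T}$, then by the chain rule $J^{T}\cdot v$ is the gradient of $l$ with respect to $\vec{x}$:

$$
J^{T}\cdot v=\left(\begin{array}{ccc}
\frac{\partial y_{1}}{\partial x_{1}} & \cdots & \frac{\partial y_{m}}{\partial x_{1}}\\
\vdots & \ddots & \vdots\\
\frac{\partial y_{1}}{\partial x_{n}} & \cdots & \frac{\partial y_{m}}{\partial x_{n}}
\end{array}\right)\left(\begin{array}{c}
\frac{\partial l}{\partial y_{1}}\\
\vdots\\
\frac{\partial l}{\partial y_{m}}
\end{array}\right)=\left(\begin{array}{c}
\frac{\partial l}{\partial x_{1}}\\
\vdots\\
\frac{\partial l}{\partial x_{n}}
\end{array}\right)
$$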
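To connect the formula with code, the sketch below compares the result of `backward(v)` with an explicitly built Jacobian. The function `f` (from R^3 to R^2) and the vector `v` are hypothetical choices for illustration; `torch.autograd.functional.jacobian` is used here only as a check.

```python
import torch
from torch.autograd.functional import jacobian

def f(x):
    # Hypothetical vector-valued function R^3 -> R^2, chosen for illustration.
    return torch.stack([x[0] * x[1], x[1] + x[2] ** 2])

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
v = torch.tensor([0.5, 2.0])   # plays the role of dl/dy

# Vector-Jacobian product through backward(); J itself is never formed here.
y = f(x)
y.backward(v)

# Explicit Jacobian of shape (2, 3), built only to verify the result.
J = jacobian(f, x.detach())

print(J.tolist())           # [[2.0, 1.0, 0.0], [0.0, 1.0, 6.0]]
print((J.T @ v).tolist())   # [1.0, 2.5, 12.0]
print(x.grad.tolist())      # [1.0, 2.5, 12.0]
```

Both routes give the same vector, which is the sense in which `backward(v)` computes $J^{T}\cdot v$.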

In [8]:
import torch

In [9]:
x = torch.ones(2, 2, requires_grad=True)   # track all operations on this tensor
print(x)

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)


In [10]:
y = x + 2   # y was created as a result of an operation, so it has a grad_fn.
print(y)    # grad_fn: y was produced by an addition on tensor x

tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)


In [11]:
z = y * y * 3
print(z)
out = z.mean()
print(z, out)    # z comes from the multiplications; out is then its mean

tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward0>)
tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward0>) tensor(27., grad_fn=<MeanBackward0>)


#### Running the gradient computation

In [12]:
print(out)   # out is a one-element tensor
out.backward()  # run the gradient computation

tensor(27., grad_fn=<MeanBackward0>)


**Print gradients d(out)/dx: the result of the gradient computation**
***

You should have got a matrix of ``4.5``. Let’s call the ``out``
*Tensor* “$o$”.


$$o = \frac{1}{4}\sum_i z_i, \quad z_i = 3(x_i+2)^2$$

$$\frac{\partial o}{\partial z_i} = \frac{1}{4}, \quad \frac{\partial z_i}{\partial x_i} = 6(x_i+2), \quad \frac{\partial o}{\partial x_i} = \frac{3}{2}(x_i+2)$$

$$\frac{\partial o}{\partial x_i}\bigr\rvert_{x_i=1} = \frac{9}{2} = 4.5$$

In [13]:
print(x.grad)    # derivative of the final result out with respect to x

tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])
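The value 4.5 can also be cross-checked numerically. The following sketch uses central finite differences in double precision; the helper `out_fn` and the step size `eps` are assumptions introduced for this check and are not part of the tutorial.

```python
import torch

def out_fn(x):
    # Same computation as in the cells above: out = mean(3 * (x + 2)^2).
    return (3 * (x + 2) ** 2).mean()

x0 = torch.ones(2, 2, dtype=torch.float64)
eps = 1e-6
numeric = torch.zeros_like(x0)

# Perturb one entry at a time and measure the change in the output.
for i in range(2):
    for j in range(2):
        step = torch.zeros_like(x0)
        step[i, j] = eps
        numeric[i, j] = (out_fn(x0 + step) - out_fn(x0 - step)) / (2 * eps)

print(numeric)   # every entry is close to 4.5
```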


In [14]:
a = torch.randn(2, 2)
a = ((a * 3) / (a - 1))
print(a.requires_grad)
a.requires_grad_(True)   # turn on gradient tracking in place
print(a.requires_grad)
b = (a * a).sum()
print(b.grad_fn)

False
True
<SumBackward0 object at 0x00000291CBE10AC8>


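[What norm() does](https://blog.csdn.net/devcy/article/details/89218480)
### I was not sure what this example means

The loop keeps doubling `y` until its L2 norm reaches 1000, so `y` is no longer a scalar. Calling `backward()` on it therefore needs a vector `v`, and `x.grad` then holds the vector-Jacobian product.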

In [15]:
x = torch.randn(3, requires_grad=True)

y = x * 2
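# y.data is the underlying tensor without autograd tracking, so the norm check is not recorded in the graph
while y.data.norm() < 1000:
    # Note: norm is not called here, so this prints the bound method (whose repr includes the tensor);
    # y.data.norm() would print the L2 norm instead
    print(y.data.norm)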
    y = y * 2

print(y)

<bound method Tensor.norm of tensor([ 3.2700, -2.3328,  0.4107])>
<bound method Tensor.norm of tensor([ 6.5400, -4.6656,  0.8213])>
<bound method Tensor.norm of tensor([13.0801, -9.3312,  1.6426])>
<bound method Tensor.norm of tensor([ 26.1602, -18.6625,   3.2853])>
<bound method Tensor.norm of tensor([ 52.3204, -37.3250,   6.5706])>
<bound method Tensor.norm of tensor([104.6407, -74.6500,  13.1411])>
<bound method Tensor.norm of tensor([ 209.2814, -149.2999,   26.2823])>
<bound method Tensor.norm of tensor([ 418.5628, -298.5999,   52.5645])>
tensor([ 837.1257, -597.1997,  105.1291], grad_fn=<MulBackward0>)


In [16]:
print(x)
v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
y.backward(v)

print(x.grad)

tensor([ 1.6350, -1.1664,  0.2053], requires_grad=True)
tensor([5.1200e+01, 5.1200e+02, 5.1200e-02])
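In the output above, `x.grad` equals 512 times `v`. That factor is the total scale applied to `x`: it was doubled nine times, so the Jacobian is 512 times the identity matrix. The sketch below reproduces this with fixed input values (an assumption made here so the run is deterministic, instead of `torch.randn`) and uses `detach()` in place of `.data` for the norm check.

```python
import torch

# Fixed values instead of torch.randn, so the number of doublings is known.
x = torch.tensor([1.0, -2.0, 0.5], requires_grad=True)

y = x * 2
doublings = 1
while y.detach().norm() < 1000:
    y = y * 2
    doublings += 1

v = torch.tensor([0.1, 1.0, 0.0001])
y.backward(v)

scale = 2.0 ** doublings
print(doublings, scale)                    # 9 512.0
print(torch.allclose(x.grad, scale * v))   # True
```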


#### Stop autograd from tracking history on Tensors with .requires_grad=True
* by wrapping the code block in with torch.no_grad():
* by using .detach() to get a new Tensor with the same content that does not require gradients

In [17]:
print(x.requires_grad)
print((x ** 2).requires_grad)

with torch.no_grad():
    print((x ** 2).requires_grad)

True
True
False
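The same context manager is what makes evaluation cheaper. A minimal sketch follows, assuming a small `torch.nn.Linear` layer as a stand-in for a trained model; the layer sizes and the random inputs are hypothetical.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(3, 1)   # stand-in for a trained model (hypothetical)
inputs = torch.randn(4, 3)      # hypothetical evaluation batch

model.eval()                    # put layers such as dropout into inference mode
with torch.no_grad():
    preds = model(inputs)       # forward pass without recording a graph

print(preds.shape)                   # torch.Size([4, 1])
print(preds.requires_grad)           # False
print(preds.grad_fn)                 # None
print(model.weight.requires_grad)    # True
```

The parameters keep `requires_grad=True`; only the operations inside the block go unrecorded.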


In [18]:
print(x.requires_grad)
y = x.detach()
print(y.requires_grad)
print(x.eq(y).all())

True
False
tensor(True)
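One detail worth keeping in mind is that the tensor returned by `.detach()` shares memory with the original, so an in-place change to one is visible in the other. The sketch below illustrates this with hypothetical values; `clone()` is used when an independent copy is wanted.

```python
import torch

x = torch.ones(3, requires_grad=True)
y = x.detach()

print(y.requires_grad)                # False
print(y.data_ptr() == x.data_ptr())   # True: same underlying memory

y[0] = 5.0                            # in-place change on the detached tensor
print(x.tolist())                     # [5.0, 1.0, 1.0]

z = x.detach().clone()                # independent copy
z[1] = 7.0
print(x.tolist())                     # [5.0, 1.0, 1.0]
```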
