In [1]:
#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <vector>

/*a workaround to solve cling issue*/
#include "../macos_cling_workaround.hpp"
/*set libtorch path, load libs*/
#include "../load_libtorch.hpp"
/*import custom defined macros*/
#include "../custom_def.hpp"
/*import libtorch header file*/
#include <torch/torch.h>

std::cout << std::boolalpha;

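> ***Automatic differentiation is one of PyTorch's signature features. In both machine learning and deep learning we frequently need the gradient of a function; in particular, backpropagation in deep neural networks cannot work without it.***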

> ***PyTorch's autograd package automatically builds a computation graph from the inputs and the forward pass, and then runs backpropagation on it.***

# 1. A simple example

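The autograd functionality in PyTorch (libtorch) is built around `torch::Tensor`. If a tensor's `requires_grad` property is set to `true`, autograd starts to track every operation performed on it, which is what makes gradient propagation via the chain rule possible. Once the computation is finished, calling `.backward()` computes all the gradients, and the gradient for this tensor is accumulated into its `.grad()` attribute. To stop tracking, call `.detach()`: it returns a tensor that is detached from the recorded history, so later computations on it are not tracked and gradients cannot flow back through it. Alternatively, declare `torch::NoGradGuard no_grad;` at the top of a scope to disable tracking for the code in that scope. This is common when evaluating a model, because evaluation does not need the gradients of the trainable parameters (those with `requires_grad=true`).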

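`Function` is another important class. Together, `Tensor` and `Function` build a directed acyclic graph (DAG) that records the entire computation. Every tensor has a `grad_fn` attribute that refers to the `Function` which created it. In other words, if the tensor is the result of some operation, `grad_fn` returns an object associated with that operation; otherwise it is empty (`None` in Python, a null pointer from `grad_fn()` in C++).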
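
To make the "tensor plus function forms a DAG" idea concrete, here is a minimal scalar sketch in plain C++. It is not libtorch's implementation: `Node`, `Var`, `leaf`, `add`, `mul`, `topo_sort` and `backward` are hypothetical names invented for this illustration, and only addition and multiplication of scalars are supported.

```cpp
#include <functional>
#include <memory>
#include <unordered_set>
#include <vector>

// Hypothetical toy types for illustration; libtorch's real classes are far richer.
struct Node;
using Var = std::shared_ptr<Node>;

struct Node {
  double value = 0.0;
  double grad = 0.0;                   // accumulated d(root)/d(this node)
  std::vector<Var> inputs;             // edges of the DAG
  std::function<void(Node&)> grad_fn;  // empty for leaves, like a null grad_fn()
};

Var leaf(double v) {
  auto n = std::make_shared<Node>();
  n->value = v;
  return n;
}

Var add(const Var& a, const Var& b) {
  auto out = leaf(a->value + b->value);
  out->inputs = {a, b};
  out->grad_fn = [](Node& self) {      // d(a+b)/da = d(a+b)/db = 1
    self.inputs[0]->grad += self.grad;
    self.inputs[1]->grad += self.grad;
  };
  return out;
}

Var mul(const Var& a, const Var& b) {
  auto out = leaf(a->value * b->value);
  out->inputs = {a, b};
  out->grad_fn = [](Node& self) {      // d(ab)/da = b, d(ab)/db = a
    self.inputs[0]->grad += self.grad * self.inputs[1]->value;
    self.inputs[1]->grad += self.grad * self.inputs[0]->value;
  };
  return out;
}

// Depth-first post-order traversal: every node is listed after all of its inputs.
void topo_sort(const Var& n, std::vector<Node*>& order,
               std::unordered_set<Node*>& seen) {
  if (!seen.insert(n.get()).second) return;
  for (const auto& in : n->inputs) topo_sort(in, order, seen);
  order.push_back(n.get());
}

// Seed d(root)/d(root) = 1, then run each grad_fn from the root back to the leaves.
void backward(const Var& root) {
  std::vector<Node*> order;
  std::unordered_set<Node*> seen;
  topo_sort(root, order, seen);
  root->grad = 1.0;
  for (auto it = order.rbegin(); it != order.rend(); ++it) {
    if ((*it)->grad_fn) (*it)->grad_fn(**it);
  }
}
```

For instance, with `Var x = leaf(1.0);` and `Var z = mul(mul(add(x, leaf(2.0)), add(x, leaf(2.0))), leaf(3.0));`, calling `backward(z)` leaves `x->grad` equal to `6 * (1 + 2) = 18`, while `x->grad_fn` stays empty because `x` is a leaf.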

The examples below show how autograd is used. (Note: rather than following the "Dive into DL PyTorch" tutorial exactly, they also draw on the [official tutorial](https://pytorch.org/tutorials/advanced/cpp_autograd.html).)

In [2]:
// Create a tensor and set requires_grad=true to track the computation performed on it:
torch::Tensor x = torch::ones({2,2}, torch::requires_grad(true));
printT(x);

x = x+2;
// For tensor addition, the gradient function here is AddBackward1
printT(x.grad_fn()->name());

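// If torch::requires_grad were set to false above, x.grad_fn() would be null
// and the call above would fail with:
// null passed to a callee that requires a non-null argument [-Wnonnull]

// Now look at the backward-function names of a few more operators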
auto y = x * x *3;
auto out = y.mean();

printT(y);
printT(y.grad_fn()->name());

printT(out);
printT(out.grad_fn()->name());

x = 
 1  1
 1  1
[ CPUFloatType{2,2} ]
<<--->>

x.grad_fn()->name() = 
AddBackward1
<<--->>

y = 
 27  27
 27  27
[ CPUFloatType{2,2} ]
<<--->>

y.grad_fn()->name() = 
MulBackward1
<<--->>

out = 
27
[ CPUFloatType{} ]
<<--->>

out.grad_fn()->name() = 
MeanBackward0
<<--->>



***Changing a tensor's autograd property after it has been created:***

In [3]:
/* ***********************************************
 * A newly created tensor does not track gradients by default,
 * so the following example fails at run time
 * (b.grad_fn() returns a null pointer):
 *
 * torch::Tensor a = torch::ones({2,2});
 * auto b = a*3;
 * std::cout << b.grad_fn()->name() << std::endl;
 * 
 * The built-in requires_grad_() function changes this in place:
 * a.requires_grad_(true);
 ************************************************* */

auto a = torch::randn({2, 2});
a = ((a * 3) / (a - 1));
printT(a.requires_grad());

a.requires_grad_(true);
printT(a.requires_grad());

auto b = (a * a).sum();
printT(b.grad_fn()->name());


a.requires_grad() = 
false
<<--->>

a.requires_grad() = 
true
<<--->>

b.grad_fn()->name() = 
SumBackward0
<<--->>



***Computing gradients with the tensor's built-in backward() function***

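Let's look at the gradient of `out` with respect to `x`, $\frac{d(out)}{dx}$.
Write `out` as $o$ and let $z_i=3(x_i+2)^2$. Because $$ o=\frac14\sum_{i=1}^4z_i=\frac14\sum_{i=1}^43(x_i+2)^2 $$ differentiating term by term gives $\frac{\partial{o}}{\partial{x_i}}=\frac{3}{2}(x_i+2)$, and therefore $$ \frac{\partial{o}}{\partial{x_i}}\bigr\rvert_{x_i=1}=\frac{9}{2}=4.5 $$ which matches the `x.grad()` printed by the `backward()` example below.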

Mathematically, for a vector-valued function of a vector argument, $\vec{y}=f(\vec{x})$, the gradient of $\vec{y}$ with respect to $\vec{x}$ is a Jacobian matrix: $$ J=\left(\begin{array}{ccc} \frac{\partial y_{1}}{\partial x_{1}} & \cdots & \frac{\partial y_{1}}{\partial x_{n}}\\ \vdots & \ddots & \vdots\\ \frac{\partial y_{m}}{\partial x_{1}} & \cdots & \frac{\partial y_{m}}{\partial x_{n}} \end{array}\right) $$ The autograd package (`torch.autograd` in Python, `torch::autograd` in C++) is essentially an engine for computing vector-Jacobian products. For example, if $v$ is the gradient of a scalar function $l=g\left(\vec{y}\right)$ with respect to $\vec{y}$: $$ v=\left(\begin{array}{ccc}\frac{\partial l}{\partial y_{1}} & \cdots & \frac{\partial l}{\partial y_{m}}\end{array}\right) $$ then by the chain rule the gradient of $l$ with respect to $\vec{x}$ (a $1\times n$ Jacobian) is: $$ v J=\left(\begin{array}{ccc}\frac{\partial l}{\partial y_{1}} & \cdots & \frac{\partial l}{\partial y_{m}}\end{array}\right) \left(\begin{array}{ccc} \frac{\partial y_{1}}{\partial x_{1}} & \cdots & \frac{\partial y_{1}}{\partial x_{n}}\\ \vdots & \ddots & \vdots\\ \frac{\partial y_{m}}{\partial x_{1}} & \cdots & \frac{\partial y_{m}}{\partial x_{n}} \end{array}\right)=\left(\begin{array}{ccc}\frac{\partial l}{\partial x_{1}} & \cdots & \frac{\partial l}{\partial x_{n}}\end{array}\right) $$

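Note: gradients are accumulated during backpropagation. Every call to `backward()` adds to whatever gradient is already stored, so gradients are normally zeroed before each backward pass.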

In [4]:
// Back to the first example: run backpropagation from out and inspect the gradient
torch::Tensor x = torch::ones({2,2}, torch::requires_grad(true));
auto y = x+2;
auto z = y * y *3;
auto out = z.mean();
// backward() without arguments can only be called on a scalar tensor
out.backward();

printT(x.grad());
// Gradients accumulate, so zero them once they have been read
x.grad().zero_();


x.grad() = 
 4.5000  4.5000
 4.5000  4.5000
[ CPUFloatType{2,2} ]
<<--->>



In [5]:
torch::Tensor x = torch::randn(3, torch::requires_grad());

torch::Tensor y = x * 2;
// while (y.norm().item<double>() < 1000) {
//   y = y * 2;
// }

printT(y);
printT(y.grad_fn()->name());

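// y is not a scalar, so backward() needs a vector v with the same shape as y;
// x.grad() then holds the vector-Jacobian product v * J
auto v = torch::tensor({0.1, 1.0, 0.0001}, torch::kFloat);
y.backward(v);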

printT(x.grad());

std::cout << x.requires_grad() << std::endl;
y = x.detach();
std::cout << y.requires_grad() << std::endl;
std::cout << x.eq(y).all().item<bool>() << std::endl;

y = 
 1.6286
-5.3786
 1.0283
[ CPUFloatType{3} ]
<<--->>

y.grad_fn()->name() = 
MulBackward1
<<--->>

x.grad() = 
 0.2000
 2.0000
 0.0002
[ CPUFloatType{3} ]
<<--->>

true
false
true


# 2. A more complex example: writing your own forward/backward functions

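The previous section demonstrated backpropagation using basic arithmetic operations. This section builds a linear-operation class to show how to write your own forward and backward functions.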

In [6]:
using namespace torch::autograd;

// Inherit from Function
class LinearFunction : public Function<LinearFunction> {
 public:
  // Note that both forward and backward are static functions

  // bias is an optional argument
  static torch::Tensor forward(
      AutogradContext *ctx, torch::Tensor input, torch::Tensor weight, torch::Tensor bias = torch::Tensor()) {
    ctx->save_for_backward({input, weight, bias});
    auto output = input.mm(weight.t());
    if (bias.defined()) {
      output += bias.unsqueeze(0).expand_as(output);
    }
    return output;
  }

  static tensor_list backward(AutogradContext *ctx, tensor_list grad_outputs) {
    auto saved = ctx->get_saved_variables();
    auto input = saved[0];
    auto weight = saved[1];
    auto bias = saved[2];

    auto grad_output = grad_outputs[0];
    auto grad_input = grad_output.mm(weight);
    auto grad_weight = grad_output.t().mm(input);
    auto grad_bias = torch::Tensor();
    if (bias.defined()) {
      grad_bias = grad_output.sum(0);
    }

    return {grad_input, grad_weight, grad_bias};
  }
};


////////////////////////////////////////////////////////
// Usage: apply LinearFunction (without bias) and backpropagate
////////////////////////////////////////////////////////


auto x = torch::randn({2, 3}).requires_grad_();
auto weight = torch::randn({4, 3}).requires_grad_();
auto y = LinearFunction::apply(x, weight);
y.sum().backward();

std::cout << x.grad() << std::endl;
std::cout << weight.grad() << std::endl;

std::cout << y.grad_fn()->name() << std::endl;

auto y_detached = y.detach();  // detach() returns a new tensor; y itself is not modified

 0.8849 -0.5697  3.1114
 0.8849 -0.5697  3.1114
[ CPUFloatType{2,3} ]
 2.7916  0.3543 -1.0179
 2.7916  0.3543 -1.0179
 2.7916  0.3543 -1.0179
 2.7916  0.3543 -1.0179
[ CPUFloatType{4,3} ]
torch::autograd::CppNode<LinearFunction>
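
Every row of the printed `x.grad()` is identical, and so is every row of `weight.grad()`. That follows from the backward formulas: because `y.sum()` was backpropagated, `grad_output` is a matrix of ones, so `grad_output.mm(weight)` repeats the column sums of `weight` in each row, and `grad_output.t().mm(input)` repeats the column sums of `input`. The plain C++ sketch below reproduces the two formulas without bias; `Matrix`, `matmul`, `transpose`, `LinearGrads` and `linear_backward` are hypothetical helpers written for this illustration, and the matrices are assumed to be non-empty with compatible shapes.

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // row-major dense matrix

// c = a * b; assumes a has as many columns as b has rows.
Matrix matmul(const Matrix& a, const Matrix& b) {
  const std::size_t rows = a.size();
  const std::size_t inner = b.size();
  const std::size_t cols = b.empty() ? 0 : b[0].size();
  Matrix c(rows, std::vector<double>(cols, 0.0));
  for (std::size_t i = 0; i < rows; ++i)
    for (std::size_t k = 0; k < inner; ++k)
      for (std::size_t j = 0; j < cols; ++j)
        c[i][j] += a[i][k] * b[k][j];
  return c;
}

Matrix transpose(const Matrix& a) {
  const std::size_t rows = a.size();
  const std::size_t cols = a.empty() ? 0 : a[0].size();
  Matrix t(cols, std::vector<double>(rows, 0.0));
  for (std::size_t i = 0; i < rows; ++i)
    for (std::size_t j = 0; j < cols; ++j)
      t[j][i] = a[i][j];
  return t;
}

struct LinearGrads {
  Matrix grad_input;
  Matrix grad_weight;
};

// Same formulas as LinearFunction::backward above, without the bias term:
// grad_input = grad_output * weight, grad_weight = grad_output^T * input.
LinearGrads linear_backward(const Matrix& input, const Matrix& weight,
                            const Matrix& grad_output) {
  return {matmul(grad_output, weight), matmul(transpose(grad_output), input)};
}
```

Calling `linear_backward` with a 2×3 `input`, a 4×3 `weight` and a 2×4 `grad_output` filled with ones returns a 2×3 `grad_input` and a 4×3 `grad_weight`, each with identical rows, which is the same pattern as the output above.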


### Python API vs. C++ API reference table


| Python                       | C++                                                          |
| :---------------------------- | :------------------------------------------------------------ |
| `torch.autograd.backward`    | `torch::autograd::backward` ([link](https://pytorch.org/cppdocs/api/function_namespacetorch_1_1autograd_1afa9b5d4329085df4b6b3d4b4be48914b.html)) |
| `torch.autograd.grad`        | `torch::autograd::grad` ([link](https://pytorch.org/cppdocs/api/function_namespacetorch_1_1autograd_1a1e03c42b14b40c306f9eb947ef842d9c.html)) |
| `torch.Tensor.detach`        | `torch::Tensor::detach` ([link](https://pytorch.org/cppdocs/api/classat_1_1_tensor.html#_CPPv4NK2at6Tensor6detachEv)) |
| `torch.Tensor.detach_`       | `torch::Tensor::detach_` ([link](https://pytorch.org/cppdocs/api/classat_1_1_tensor.html#_CPPv4NK2at6Tensor7detach_Ev)) |
| `torch.Tensor.backward`      | `torch::Tensor::backward` ([link](https://pytorch.org/cppdocs/api/classat_1_1_tensor.html#_CPPv4NK2at6Tensor8backwardERK6Tensorbb)) |
| `torch.Tensor.register_hook` | `torch::Tensor::register_hook` ([link](https://pytorch.org/cppdocs/api/classat_1_1_tensor.html#_CPPv4I0ENK2at6Tensor13register_hookE18hook_return_void_tI1TERR1T)) |
| `torch.Tensor.requires_grad` | `torch::Tensor::requires_grad_` ([link](https://pytorch.org/cppdocs/api/classat_1_1_tensor.html#_CPPv4NK2at6Tensor14requires_grad_Eb)) |
| `torch.Tensor.retain_grad`   | `torch::Tensor::retain_grad` ([link](https://pytorch.org/cppdocs/api/classat_1_1_tensor.html#_CPPv4NK2at6Tensor11retain_gradEv)) |
| `torch.Tensor.grad`          | `torch::Tensor::grad` ([link](https://pytorch.org/cppdocs/api/classat_1_1_tensor.html#_CPPv4NK2at6Tensor4gradEv)) |
| `torch.Tensor.grad_fn`       | `torch::Tensor::grad_fn` ([link](https://pytorch.org/cppdocs/api/classat_1_1_tensor.html#_CPPv4NK2at6Tensor7grad_fnEv)) |
| `torch.Tensor.set_data`      | `torch::Tensor::set_data` ([link](https://pytorch.org/cppdocs/api/classat_1_1_tensor.html#_CPPv4NK2at6Tensor8set_dataERK6Tensor)) |
| `torch.Tensor.data`          | `torch::Tensor::data` ([link](https://pytorch.org/cppdocs/api/classat_1_1_tensor.html#_CPPv4NK2at6Tensor4dataEv)) |
| `torch.Tensor.output_nr`     | `torch::Tensor::output_nr` ([link](https://pytorch.org/cppdocs/api/classat_1_1_tensor.html#_CPPv4NK2at6Tensor9output_nrEv)) |
| `torch.Tensor.is_leaf`       | `torch::Tensor::is_leaf` ([link](https://pytorch.org/cppdocs/api/classat_1_1_tensor.html#_CPPv4NK2at6Tensor7is_leafEv)) |



# 3. A look at the autograd implementation in the source code

Typically, a tensor that supports autograd is created as follows:

In [7]:
torch::Tensor t1 = torch::empty({2,2}, torch::requires_grad());

//or

torch::Tensor t2 = torch::empty({2,2}, torch::requires_grad(true));

So where does `requires_grad(...)` come from?  
As the snippet below shows, it is defined in TensorOptions.h:

```
/// Convenience function that returns a `TensorOptions` object with the
/// `requires_grad` set to the given one.
inline TensorOptions requires_grad(bool requires_grad = true) {
    return TensorOptions().requires_grad(requires_grad);
}

```

That is, `torch::requires_grad(...)` returns a `TensorOptions` object whose `requires_grad_` field is set to the given argument, which is `true` by default.

Following that code leads to the `requires_grad(...)` member function of `TensorOptions`:

``` 
/// Sets the `requires_grad` property of the `TensorOptions`.
C10_NODISCARD TensorOptions requires_grad(c10::optional<bool> requires_grad) const noexcept {
  TensorOptions r = *this;
  r.set_requires_grad(requires_grad);
  return r;
}

```

```
/// Mutably set the `requires_grad` property of `TensorOptions`.
void set_requires_grad(c10::optional<bool> requires_grad) & noexcept {
  if (requires_grad) {
    requires_grad_ = *requires_grad;
    has_requires_grad_ = true;
  } else {
    has_requires_grad_ = false;
  }   
}

```
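
The copy-then-set pattern in the snippets above can be tried in isolation. The sketch below is a simplified stand-in, not the real class: `MiniOptions` is a hypothetical name, `std::optional` (C++17) replaces `c10::optional`, and only the `requires_grad` state is kept. The two getters are assumptions added so the state can be inspected.

```cpp
#include <optional>

// Hypothetical stand-in for TensorOptions that keeps only the requires_grad state.
class MiniOptions {
 public:
  // Returns a modified copy and leaves *this untouched.
  MiniOptions requires_grad(std::optional<bool> requires_grad) const noexcept {
    MiniOptions r = *this;
    r.set_requires_grad(requires_grad);
    return r;
  }

  // Assumed getters for this sketch: whether a value was set, and its effective value.
  bool has_requires_grad() const noexcept { return has_requires_grad_; }
  bool requires_grad() const noexcept {
    return has_requires_grad_ ? requires_grad_ : false;
  }

 private:
  // An empty optional clears the "has value" flag instead of storing false.
  void set_requires_grad(std::optional<bool> requires_grad) noexcept {
    if (requires_grad) {
      requires_grad_ = *requires_grad;
      has_requires_grad_ = true;
    } else {
      has_requires_grad_ = false;
    }
  }

  bool requires_grad_ = false;
  bool has_requires_grad_ = false;
};

// Mirrors the convenience function: defaults to true and delegates to the member.
inline MiniOptions requires_grad(bool requires_grad = true) {
  return MiniOptions().requires_grad(requires_grad);
}
```

With this sketch, `requires_grad()` yields an object for which both `has_requires_grad()` and `requires_grad()` are `true`, while calling `.requires_grad(std::nullopt)` on it returns a copy with the flag cleared and leaves the original unchanged.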
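After all that digging, it turns out this merely assigns the `requires_grad_` field of the `TensorOptions` object :(
Clearly this is not what we were after. We want to see how autograd is implemented for the tensor object itself, so let's keep following the source: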