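# Error Backpropagation
There are two ways to compute gradients:
- Numerical differentiation, covered in the previous chapter
- Backpropagation, covered in this chapter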

There are two ways to understand backpropagation:
- Mathematical formulas
- Computational graphs

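Benefits of using a computational graph:
- Local computation: however complex the whole graph is, each node only needs to consider its own inputs and outputs
- It makes the backpropagation process easier to follow
- It allows gradients to be computed efficiently, because intermediate results from the forward pass are reused in the backward pass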

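The chain rule:
- Definition: if a function can be expressed as a composite function, the derivative of the composite function equals the product of the derivatives of the functions that compose it
- As a formula, for $f = f_n \circ \cdots \circ f_1$ with $x_0 = x$ and $x_i = f_i(x_{i-1})$: $$\frac{\partial f}{\partial x} = \prod \limits_{i=1}^n \frac{\partial x_i}{\partial x_{i-1}}$$
- Backpropagation through an addition node passes the upstream value downstream unchanged
- Backpropagation through a multiplication node multiplies the upstream value by the forward inputs swapped: the gradient for $x$ is scaled by $y$, and the gradient for $y$ is scaled by $x$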
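
As a small worked check of the chain rule, the sketch below differentiates $z = (x + y)^2$ by multiplying the derivative of the outer function by the derivative of the inner addition node, then compares the result with a central-difference estimate. The function names (`composite`, `chain_rule_grad`, `numerical_grad`) and the sample values are illustrative assumptions, not part of the original notes.

```python
def composite(x, y):
    t = x + y       # inner function: an addition node
    z = t ** 2      # outer function: squaring
    return z

def chain_rule_grad(x, y):
    t = x + y
    dz_dt = 2 * t   # derivative of the outer function
    dt_dx = 1.0     # an addition node passes the gradient through unchanged
    return dz_dt * dt_dx

def numerical_grad(f, x, y, h=1e-4):
    # Central difference with respect to x only (hypothetical helper)
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

x, y = 3.0, 2.0
analytic = chain_rule_grad(x, y)
numeric = numerical_grad(composite, x, y)
print(f'chain rule: {analytic}, numerical: {numeric}')
```

Both values should agree up to floating-point error, which is the property a gradient check relies on.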

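### Implementing the multiplication layer
- Multiply the upstream gradient by the swapped forward inputs and pass the results downstream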

In [3]:
# A simple implementation of the multiplication layer
class MulLayer:
    def __init__(self):
        # Forward inputs, cached because backward needs them
        self.x = None
        self.y = None

    def forward(self, x, y):
        self.x = x
        self.y = y
        out = x * y

        return out

    def backward(self, dout):
        dx = dout * self.y
        dy = dout * self.x

        return dx, dy

In [4]:
# Test the forward and backward passes of the multiplication layer
apple = 100
apple_num = 2
tax = 1.1

# layer
mul_apple_layer = MulLayer()
mul_tax_layer = MulLayer()

# forward
apple_price = mul_apple_layer.forward(apple, apple_num)
price = mul_tax_layer.forward(apple_price, tax)

print(f'apple_price: {apple_price}')
print(f'tax_price: {price}')

# backward
# The gradient of the final price with respect to itself is 1
dprice = 1
dapple_price, dtax = mul_tax_layer.backward(dprice)
dapple, dapple_num = mul_apple_layer.backward(dapple_price)
print(f'grad_apple_price:{dapple_price}')
print(f'grad_tax:{dtax}')
print(f'grad_apple:{dapple}')
print(f'grad_apple_num:{dapple_num}')

apple_price: 200
tax_price: 220.00000000000003
grad_apple_price:1.1
grad_tax:200
grad_apple:2.2
grad_apple_num:110.00000000000001
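
The gradients printed above can be cross-checked against the other method mentioned at the start of the chapter, numerical differentiation. The following is a minimal sketch under the assumption that the whole computation is written as a single function; `total_price` and `central_diff` are hypothetical helpers introduced only for this check.

```python
def total_price(apple, apple_num, tax):
    # Same computation as the two chained multiplication layers
    return apple * apple_num * tax

def central_diff(f, args, i, h=1e-4):
    # Hypothetical helper: central difference with respect to the i-th argument
    plus = list(args)
    minus = list(args)
    plus[i] += h
    minus[i] -= h
    return (f(*plus) - f(*minus)) / (2 * h)

args = (100.0, 2.0, 1.1)
num_dapple = central_diff(total_price, args, 0)
num_dapple_num = central_diff(total_price, args, 1)
num_dtax = central_diff(total_price, args, 2)

print(f'numerical grad_apple: {num_dapple}')
print(f'numerical grad_apple_num: {num_dapple_num}')
print(f'numerical grad_tax: {num_dtax}')
```

Up to floating-point error, these should match the backpropagated values of 2.2, 110 and 200. The numerical version needs two forward evaluations per input, whereas backpropagation obtains all gradients in one backward pass.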


### Implementing the addition layer
- Pass the upstream gradient downstream unchanged

In [5]:
class AddLayer:
    def __init__(self):
        pass
    def forward(self, x, y):
        out = x + y
        return out
    def backward(self, dout):
        dx = dout * 1
        dy = dout * 1
        return dx, dy

In [6]:
apple = 100
apple_num = 2
orange = 150
orange_num = 3
tax = 1.1

# layer
apple_layer = MulLayer()
orange_layer = MulLayer()
add_apple_orange_layer = AddLayer()
mul_tax_layer = MulLayer()

# forward
apple_price = apple_layer.forward(apple, apple_num)
orange_price = orange_layer.forward(orange, orange_num)
all_price = add_apple_orange_layer.forward(apple_price, orange_price)
price = mul_tax_layer.forward(all_price, tax)

# backward
dprice = 1
dall_price, dtax = mul_tax_layer.backward(dprice)
dapple_price, dorange_price = add_apple_orange_layer.backward(dall_price)
dorange, dorange_num = orange_layer.backward(dorange_price)
dapple, dapple_num = apple_layer.backward(dapple_price)

# Inspect the gradients from the backward pass
print(f'grad_apple:{dapple}')
print(f'grad_orange:{dorange}')
print(f'grad_tax:{dtax}')
print(f'grad_apple_num:{dapple_num}')
print(f'grad_orange_num:{dorange_num}')


grad_apple:2.2
grad_orange:3.3000000000000003
grad_tax:650
grad_apple_num:110.00000000000001
grad_orange_num:165.0


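### Implementing the ReLU layer
- When x > 0, the upstream signal is passed downstream unchanged
- When x <= 0, the signal stops here and 0 is passed downstream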

In [1]:
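class ReluLayer:
    def __init__(self):
        # Boolean mask of the positions where the forward input was <= 0
        self.mask = None
    def forward(self, x):
        # x is expected to be a NumPy array
        self.mask = (x <= 0)
        out = x.copy()
        out[self.mask] = 0
        return out
    def backward(self, dout):
        # Copy first so the caller's dout array is not modified in place
        dx = dout.copy()
        dx[self.mask] = 0
        return dx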
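
To make the masking concrete, here is a minimal sketch of the same forward and backward logic applied to a small array. It is written as standalone statements rather than through the class, and the sample input and the all-ones upstream gradient are assumptions chosen for illustration.

```python
import numpy as np

# Sample input with both positive and non-positive entries (illustrative values)
x = np.array([[1.0, -0.5],
              [-2.0, 3.0]])

# Forward: remember where x <= 0 and zero those positions
mask = (x <= 0)
out = x.copy()
out[mask] = 0

# Backward: assume an upstream gradient of all ones
dout = np.ones_like(x)
dx = dout.copy()
dx[mask] = 0   # the gradient is blocked wherever the forward input was <= 0

print(out)
print(dx)
```

The mask computed in the forward pass is the only state the backward pass needs.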

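### Implementing the Sigmoid layer
- The addition and multiplication layers above were exercised with concrete numbers, so the gradients passed downstream were concrete values
- The Sigmoid layer instead uses the analytic derivative of the function as its local gradient: the upstream gradient is multiplied by this derivative and passed downstream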
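
The derivative used by the Sigmoid layer can be derived in a few steps. Here $y = 1 / (1 + e^{-x})$ is the forward output and $L$ is the final quantity being differentiated; both symbols are notation assumed for this derivation rather than taken from the notes.

$$
\begin{aligned}
\frac{\partial y}{\partial x} &= \frac{e^{-x}}{(1 + e^{-x})^2} = \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}} = y\,(1 - y) \\
\frac{\partial L}{\partial x} &= \frac{\partial L}{\partial y}\, y\,(1 - y)
\end{aligned}
$$

Because the result depends only on $y$, the backward pass needs nothing beyond the cached forward output.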

In [None]:
import numpy as np
class Sigmoid:
    def __init__(self):
        self.out = None
    def forward(self, x):
        out = 1 / (1 + np.exp(-x))
        self.out = out
        return out
    # Uses the closed-form derivative of the sigmoid: dy/dx = y * (1 - y)
    def backward(self, dout):
        dx = dout * (1.0 - self.out) * self.out
        return dx
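
The backward formula `dout * (1.0 - out) * out` can be sanity-checked on a small batch. The sketch below recomputes it with a standalone `sigmoid` function and compares it with a central-difference estimate; the sample inputs, the all-ones upstream gradient, and the step size `h` are assumptions made for the check.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Illustrative inputs and an upstream gradient of all ones
x = np.array([-2.0, 0.0, 3.0])
dout = np.ones_like(x)

# Analytic gradient, using the same expression as the layer's backward pass
out = sigmoid(x)
analytic = dout * (1.0 - out) * out

# Numerical gradient by central difference, element by element
h = 1e-4
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)

print(analytic)
print(numeric)
print(np.allclose(analytic, numeric, atol=1e-6))
```

At `x = 0` the sigmoid output is 0.5, so the local gradient there is 0.25, the largest value it can take.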