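# Chapter 5: Backpropagation
During training we computed the gradient of the loss function with respect to the weight parameters by numerical differentiation; backpropagation is a method for computing that gradient more efficiently.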

* Computational graph
* Forward propagation
* Backward propagation

In [5]:
# Added so that images are displayed inline in the Jupyter notebook
%matplotlib inline

In [4]:
import numpy as np

In [1]:
# coding: utf-8
# p.137 multiplication layer
class MulLayer:
    def __init__(self):
        self.x = None # keeps the forward-pass inputs for use in backward, cf. p.135
        self.y = None

    def forward(self, x, y): # forward propagation
        self.x = x
        self.y = y
        out = x * y

        return out

    def backward(self, dout): # backward propagation; dout = upstream derivative
        dx = dout * self.y # multiplication node, so x and y are swapped
        dy = dout * self.x

        return dx, dy

In [4]:
# coding: utf-8
# p.138
from layer_naive import *


apple = 100
apple_num = 2
tax = 1.1

mul_apple_layer = MulLayer()
mul_tax_layer = MulLayer()

# forward
apple_price = mul_apple_layer.forward(apple, apple_num) # 200
price = mul_tax_layer.forward(apple_price, tax) # 220

print(int(price))

220


In [19]:
# p.138
# backward
dprice = 1
dapple_price, dtax = mul_tax_layer.backward(dprice) # 1.1 200
dapple, dapple_num = mul_apple_layer.backward(dapple_price)

"""
dapple, dapple_num = mul_apple_layer.backward(1.1):
        dapple     = 1.1 * self.y (= apple_num = 2) # multiplication node: the upstream derivative times the other input saved during the forward pass, cf. p.134
        dapple_num = 1.1 * self.x (= apple = 100)

        return dapple=2.2, dapple_num=110
"""
print(dapple,int(dapple_num), int(dtax))

2.2 110 200


In [6]:
print("dApple:", dapple)
print("dApple_num:", int(dapple_num))
print("dTax:", dtax)

dApple: 2.2
dApple_num: 110
dTax: 200
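As a sanity check, the backpropagated values above can be compared against numerical differentiation. The following is a minimal sketch, not part of the book's code: `numerical_diff` and `total_price` are hypothetical helper names, and the step size `h=1e-4` is an assumption. Each central difference should agree with the corresponding backprop value (2.2, 110, 200) up to rounding error.

```python
def numerical_diff(f, x, h=1e-4):
    """Central-difference approximation of df/dx at x (hypothetical helper)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def total_price(apple, apple_num, tax):
    """Same computation as the two chained MulLayer forward passes."""
    return apple * apple_num * tax


apple, apple_num, tax = 100, 2, 1.1

# Vary one input at a time while holding the other two fixed.
d_apple = numerical_diff(lambda a: total_price(a, apple_num, tax), apple)
d_apple_num = numerical_diff(lambda n: total_price(apple, n, tax), apple_num)
d_tax = numerical_diff(lambda t: total_price(apple, apple_num, t), tax)

print(d_apple, d_apple_num, d_tax)
```

Note that this takes two forward evaluations per input, whereas the backward pass yields all three derivatives in one sweep, which is the efficiency argument made at the top of the chapter.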


In [None]:
# p.139
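class AddLayer: # addition layer
    def __init__(self):
        pass # nothing to store: backward does not need the forward inputs

    def forward(self, x, y): # forward propagation
        out = x + y

        return out

    def backward(self, dout): # backward propagation
        dx = dout * 1 # addition node passes the upstream derivative through unchanged
        dy = dout * 1

        return dx, dy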

In [1]:
# coding: utf-8
# p.140
from layer_naive import *

apple = 100
apple_num = 2
orange = 150
orange_num = 3
tax = 1.1

# layer
mul_apple_layer = MulLayer() # (1)
mul_orange_layer = MulLayer() # (2)
add_apple_orange_layer = AddLayer() # (3)
mul_tax_layer = MulLayer() # (4)

# forward
apple_price = mul_apple_layer.forward(apple, apple_num)  # (1)
orange_price = mul_orange_layer.forward(orange, orange_num)  # (2)
all_price = add_apple_orange_layer.forward(apple_price, orange_price)  # (3)
price = mul_tax_layer.forward(all_price, tax)  # (4)

# backward
dprice = 1
dall_price, dtax = mul_tax_layer.backward(dprice)  # (4)
dapple_price, dorange_price = add_apple_orange_layer.backward(dall_price)  # (3)
dorange, dorange_num = mul_orange_layer.backward(dorange_price)  # (2)
dapple, dapple_num = mul_apple_layer.backward(dapple_price)  # (1)

print("price:", int(price))
print("dApple:", dapple)
print("dApple_num:", int(dapple_num))
print("dOrange:", dorange)
print("dOrange_num:", int(dorange_num))
print("dTax:", dtax)

price: 715
dApple: 2.2
dApple_num: 110
dOrange: 3.3000000000000003
dOrange_num: 165
dTax: 650


In [2]:
dall_price

1.1

In [6]:
# p.142
x = np.array([[1.0, -0.5], [-2.0, 3.0]])
print(x)

[[ 1.  -0.5]
 [-2.   3. ]]


In [7]:
mask = (x <= 0)
print(mask)

[[False  True]
 [ True False]]
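The mask above is the core of a ReLU layer: elements where `x <= 0` are set to 0 on the forward pass, and the same positions block the gradient on the backward pass. Below is a minimal sketch, assuming a `Relu` class with the same `forward`/`backward` interface as `MulLayer`; the class is not shown in these notes, so the names and details are assumptions.

```python
import numpy as np


class Relu:
    """Sketch of a ReLU layer that reuses the forward-pass mask in backward."""

    def __init__(self):
        self.mask = None  # boolean array, True where the input was <= 0

    def forward(self, x):
        self.mask = (x <= 0)
        out = x.copy()        # copy so the caller's array is not modified
        out[self.mask] = 0    # non-positive inputs become 0
        return out

    def backward(self, dout):
        dx = dout.copy()
        dx[self.mask] = 0     # no gradient flows where the input was <= 0
        return dx


x = np.array([[1.0, -0.5], [-2.0, 3.0]])
relu = Relu()
out = relu.forward(x)
dx = relu.backward(np.ones_like(x))
print(out)
print(dx)
```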


cf.p.143
$
\begin{eqnarray}
y
&=& 
\dfrac {1} { 1+\exp \left( -x\right)}
\tag{5.9}
\end{eqnarray}
$

Therefore, from Eq. $(5.9)$,
$$
y^{2}
=
\dfrac {1^{2}} {\left( 1+\exp \left( -x\right) \right) ^{2}}
$$

$
\begin{eqnarray}
\dfrac {\partial L} {\partial y}y^{2}\exp \left( -x\right)
&=& 
\dfrac {\partial L} {\partial y}\dfrac {1} {\left( 1+\exp \left( -x\right) \right) ^{2}}\exp \left( -x\right)\\
&=& 
\dfrac {\partial L} {\partial y}\dfrac {1} {\left( 1+\exp \left( -x\right) \right)}\dfrac{\exp \left( -x\right)}{\left( 1+\exp \left( -x\right) \right)}\\
&=& 
\dfrac {\partial L} {\partial y}y\dfrac{\exp \left( -x\right)}{ 1+\exp \left( -x\right)}\\
&=& 
\dfrac {\partial L} {\partial y}y\left(\dfrac {\exp \left( -x\right)} { 1+\exp \left( -x\right)}+\dfrac {1} { 1+\exp \left( -x\right)}-\dfrac {1} { 1+\exp \left( -x\right)}\right)\\
&=& 
\dfrac {\partial L} {\partial y}y\left(\dfrac {1+\exp \left( -x\right)} { 1+\exp \left( -x\right)}-\dfrac {1} { 1+\exp \left( -x\right)}\right)\\
&=& 
\dfrac {\partial L} {\partial y}y\left(1-\dfrac {1} { 1+\exp \left( -x\right)}\right)\\
&=& 
\dfrac {\partial L} {\partial y}y(1-y)
\end{eqnarray}
$
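The result of the derivation, that the sigmoid's backward pass needs only the forward output $y$, can be sketched as a layer. This is a minimal illustration, assuming a `Sigmoid` class with the same interface as the layers above (the class itself does not appear in these notes); the central-difference check at the end uses an assumed step size of `1e-4`.

```python
import numpy as np


class Sigmoid:
    """Sketch of a sigmoid layer whose backward pass uses dout * y * (1 - y)."""

    def __init__(self):
        self.out = None  # forward output y, reused in backward

    def forward(self, x):
        out = 1 / (1 + np.exp(-x))
        self.out = out
        return out

    def backward(self, dout):
        # Final line of the derivation: dL/dx = dL/dy * y * (1 - y)
        return dout * self.out * (1.0 - self.out)


x = np.array([-1.0, 0.0, 2.0])
layer = Sigmoid()
y = layer.forward(x)
dx = layer.backward(np.ones_like(x))

# Compare with a central-difference estimate of dy/dx.
h = 1e-4
numeric = (1 / (1 + np.exp(-(x + h))) - 1 / (1 + np.exp(-(x - h)))) / (2 * h)
print(np.allclose(dx, numeric))
```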

**Affine transformation**  
The matrix product performed in a neural network's forward propagation; in geometry it is called an affine transformation (p.147).

In [9]:
# p.151
X_dot_W = np.array([[0, 0, 0],[10, 10, 10]])
X_dot_W

array([[ 0,  0,  0],
       [10, 10, 10]])

In [10]:
B = np.array([1, 2, 3])
X_dot_W + B

array([[ 1,  2,  3],
       [11, 12, 13]])

In [11]:
dY = np.array([[1,2,3],[4,5,6]])
dY

array([[1, 2, 3],
       [4, 5, 6]])

In [12]:
dB = np.sum(dY, axis=0) # axis=0 sums along axis 0, i.e. over the batch rows, cf. p.80
dB

array([5, 7, 9])
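The broadcasting of `B` in the forward pass and the `axis=0` sum for `dB` fit together in a batched affine layer. The following is a minimal sketch, assuming an `Affine` class with attributes `W`, `b`, `dW`, `db`; these names and the sample values of `X` and `W` are illustrative assumptions, not taken from the notes above.

```python
import numpy as np


class Affine:
    """Sketch of a batched affine layer: Y = X . W + b."""

    def __init__(self, W, b):
        self.W = W
        self.b = b
        self.x = None   # forward input, needed for dW
        self.dW = None
        self.db = None

    def forward(self, x):
        self.x = x
        return np.dot(x, self.W) + self.b  # b is broadcast over the batch rows

    def backward(self, dout):
        dx = np.dot(dout, self.W.T)
        self.dW = np.dot(self.x.T, dout)
        self.db = np.sum(dout, axis=0)     # the bias gradient sums over the batch
        return dx


X = np.array([[1.0, 2.0], [3.0, 4.0]])              # batch of 2, 2 features each
W = np.array([[1.0, 0.0, -1.0], [0.5, 2.0, 1.0]])   # 2 inputs -> 3 outputs
b = np.array([1.0, 2.0, 3.0])

layer = Affine(W, b)
Y = layer.forward(X)
dY = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
dX = layer.backward(dY)
print(Y.shape, dX.shape, layer.dW.shape, layer.db)
```

Because the same bias is added to every row in the forward pass, each row of `dY` contributes to its gradient, which is why `db` is the column-wise sum.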

Errata
https://github.com/oreilly-japan/deep-learning-from-scratch/wiki/errata