## Deriving Backpropagation in a Neural Network with a Simple Example

In [85]:
from IPython.display import Image
Image(url=u"神经网络_反向传播.png", width=400, height=400)


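- Input: [0.05, 0.1] (i1, i2)
- Target output: [0.01, 0.99]
- Initial weights: [0.15, 0.2, 0.25, 0.3] (w1 to w4), [0.4, 0.45, 0.5, 0.55] (w5 to w8)
- Bias: [0.35, 0.6] (b1 for the hidden layer, b2 for the output layer)
- Every neuron uses the sigmoid activation function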
$$f(x)=\frac{1}{1+e^{-x}}$$
***
**Goal**: train the weights so that the network output is as close as possible to [0.01, 0.99].

### Forward pass: computing the output

***
#### Computing the hidden-layer inputs
$$h1\_input = i1 * w1 + i2 * w2 + b1 * 1$$

In [86]:
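# Initial weights: w1..w4 feed the hidden layer, w5..w8 feed the output layer
w1, w2, w3, w4, w5, w6, w7, w8 = [0.15, 0.2, 0.25, 0.3, 0.4, 0.45, 0.5, 0.55]
b1 = 0.35  # bias shared by the hidden layer
b2 = 0.6   # bias shared by the output layer
target_1 = 0.01
target_2 = 0.99

# Weighted input to hidden neuron h1 (i1 = 0.05, i2 = 0.1)
h1_input = 0.05 * w1 + 0.1 * w2 + b1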
h1_input

0.3775

$$h2\_input = i1 * w3 + i2 * w4 + b1 * 1$$

In [87]:
h2_input = 0.05 * w3 + 0.1 * w4 + b1
h2_input

0.39249999999999996

***
#### Computing the hidden-layer outputs
$$h1\_output = \frac{1}{1+e^{-h1\_input}}$$

In [88]:
from scipy.special import expit
h1_output = expit(h1_input)
h1_output

0.59326999210718723

$$h2\_output = \frac{1}{1+e^{-h2\_input}}$$

In [89]:
h2_output = expit(h2_input)
h2_output

0.59688437825976703

***
#### Computing the output-layer inputs
$$o1\_input = h1\_output * w5 + h2\_output * w6 + b2 * 1$$

In [90]:
o1_input = h1_output * w5 + h2_output * w6 + b2
o1_input

1.10590596705977

$$o2\_input = h1\_output * w7 + h2\_output * w8 + b2 * 1$$

In [91]:
o2_input = h1_output * w7 + h2_output * w8 + b2
o2_input

1.2249214040964653

***
#### Computing the output-layer outputs
$$o1\_output = \frac{1}{1+e^{-o1\_input}}$$
$$o2\_output = \frac{1}{1+e^{-o2\_input}}$$

In [92]:
o1_output = expit(o1_input)
print("o1 output is: %f" % o1_output)
o2_output = expit(o2_input)
print("o2 output is: %f" % o2_output)

o1 output is: 0.751365
o2 output is: 0.772928
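
The forward pass above can be collected into a single function. The following is a minimal sketch, not part of the original notebook: the names `sigmoid` and `forward` are hypothetical, and `math.exp` stands in for `scipy.special.expit` so that the block is self-contained. It assumes the wiring used in the cells above (w1, w2 feed h1; w3, w4 feed h2; w5, w6 feed o1; w7, w8 feed o2).

```python
import math


def sigmoid(x):
    """Logistic function f(x) = 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, weights, biases):
    """Run the 2-2-2 network once and return (hidden_outputs, final_outputs).

    weights is the flat list [w1, ..., w8]; biases is [b1, b2].
    """
    i1, i2 = inputs
    w1, w2, w3, w4, w5, w6, w7, w8 = weights
    b1, b2 = biases
    # Hidden layer: weighted sum of the inputs plus the shared bias b1.
    h1 = sigmoid(i1 * w1 + i2 * w2 + b1)
    h2 = sigmoid(i1 * w3 + i2 * w4 + b1)
    # Output layer: weighted sum of the hidden outputs plus the shared bias b2.
    o1 = sigmoid(h1 * w5 + h2 * w6 + b2)
    o2 = sigmoid(h1 * w7 + h2 * w8 + b2)
    return (h1, h2), (o1, o2)


hidden, outputs = forward(
    [0.05, 0.1],
    [0.15, 0.2, 0.25, 0.3, 0.4, 0.45, 0.5, 0.55],
    [0.35, 0.6],
)
print("o1 output is: %f" % outputs[0])
print("o2 output is: %f" % outputs[1])
```

Wrapping the steps in a function makes it easy to re-run the forward pass after the weights change, which is what each training iteration does.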


### Backpropagation

#### Computing the total error

$$E_{total} = \sum\limits_{i=1}^{2}\frac{1}{2}(o_i\_output - target_i)^2$$

In [93]:
E_total = ((o1_output - target_1)**2 + (o2_output - target_2)**2) / 2
E_total

0.29837110876000272
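
To see where the total error comes from, it can help to split it into the contribution of each output neuron. The sketch below is an illustration rather than part of the original notebook: the `sigmoid` helper and the `errors` list are hypothetical names, and the forward pass is repeated with `math.exp` so that the block runs on its own.

```python
import math


def sigmoid(x):
    """Logistic function f(x) = 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))


# Forward pass with the initial parameters from this example.
i1, i2 = 0.05, 0.1
w1, w2, w3, w4, w5, w6, w7, w8 = [0.15, 0.2, 0.25, 0.3, 0.4, 0.45, 0.5, 0.55]
b1, b2 = 0.35, 0.6
targets = [0.01, 0.99]

h1 = sigmoid(i1 * w1 + i2 * w2 + b1)
h2 = sigmoid(i1 * w3 + i2 * w4 + b1)
outputs = [
    sigmoid(h1 * w5 + h2 * w6 + b2),
    sigmoid(h1 * w7 + h2 * w8 + b2),
]

# Squared-error contribution of each output neuron: 1/2 * (output - target)^2.
errors = [0.5 * (o - t) ** 2 for o, t in zip(outputs, targets)]
E_total = sum(errors)

for k, e in enumerate(errors, start=1):
    print("E_o%d = %f (%.1f%% of total)" % (k, e, 100 * e / E_total))
print("E_total = %f" % E_total)
```

With the initial weights, o1 accounts for most of the error, because its output is much further from its target than o2's output is.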

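#### Backward pass
Take the edge w5 as an example. To reduce the total error, **w5 should be moved in the direction opposite to the partial derivative of the total error with respect to w5** (gradient descent).
First, compute that partial derivative with the chain rule: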
\begin{equation}
\begin{split}
\frac{\partial E_{total}}{\partial w5} &= \frac{\partial E_{total}}{\partial o1\_output} \frac{\partial o1\_output}{\partial o1\_input} \frac{\partial o1\_input}{\partial w5} \\
&= (o1\_output - target_1) (o1\_output)(1 - o1\_output)h1\_output
\end{split}
\end{equation}
Note that the derivative of the sigmoid function
$y = \frac{1}{1+e^{-x}}$ is $y(1-y)$.
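
The identity for the sigmoid derivative can be checked numerically. The sketch below is not from the original notebook; `sigmoid_derivative` and `central_difference` are hypothetical helper names, and the sample points are arbitrary (0.3775 is the value of `h1_input` computed earlier).

```python
import math


def sigmoid(x):
    """Logistic function y = 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_derivative(x):
    """Analytic derivative, written in terms of the output: y * (1 - y)."""
    y = sigmoid(x)
    return y * (1.0 - y)


def central_difference(f, x, eps=1e-6):
    """Numerical derivative of f at x using a symmetric finite difference."""
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)


sample_points = [-2.0, 0.0, 0.3775, 1.5]
for x in sample_points:
    analytic = sigmoid_derivative(x)
    numeric = central_difference(sigmoid, x)
    print("x = %7.4f  analytic = %.8f  numeric = %.8f" % (x, analytic, numeric))
```

Because the derivative depends only on the output y, backpropagation can reuse the activations already computed in the forward pass instead of evaluating the exponential again.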

In [94]:
w5_partial = o1_output * h1_output * (o1_output - target_1) * (1 - o1_output)
w5_partial

0.08216704056423077
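
A common way to sanity-check an analytic gradient is to compare it with a finite-difference estimate. The following sketch is an addition, not part of the original notebook: `total_error` is a hypothetical helper that reruns the forward pass for a given weight list, and the step size `eps` is an arbitrary choice.

```python
import math


def sigmoid(x):
    """Logistic function f(x) = 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))


def total_error(weights, inputs=(0.05, 0.1), biases=(0.35, 0.6),
                targets=(0.01, 0.99)):
    """Forward pass followed by the summed squared error E_total."""
    i1, i2 = inputs
    w1, w2, w3, w4, w5, w6, w7, w8 = weights
    b1, b2 = biases
    h1 = sigmoid(i1 * w1 + i2 * w2 + b1)
    h2 = sigmoid(i1 * w3 + i2 * w4 + b1)
    o1 = sigmoid(h1 * w5 + h2 * w6 + b2)
    o2 = sigmoid(h1 * w7 + h2 * w8 + b2)
    return 0.5 * (o1 - targets[0]) ** 2 + 0.5 * (o2 - targets[1]) ** 2


weights = [0.15, 0.2, 0.25, 0.3, 0.4, 0.45, 0.5, 0.55]
eps = 1e-6
index = 4  # w5 is the fifth weight, so its 0-based index is 4

# Nudge only w5 up and down, keeping every other weight fixed.
plus = list(weights)
plus[index] += eps
minus = list(weights)
minus[index] -= eps

numeric_w5_partial = (total_error(plus) - total_error(minus)) / (2 * eps)
print("numeric dE/dw5 = %f" % numeric_w5_partial)
```

The estimate should agree with the analytic `w5_partial` above to several decimal places; a large mismatch would point to a mistake in the chain-rule derivation.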

We set the learning rate $\eta$ to 0.5 and obtain the new w5:
$$w5^\prime = w5 - \eta\frac{\partial E_{total}}{\partial w5}$$

In [95]:
w5_n = w5 - 0.5 * w5_partial
w5_n

0.35891647971788465

In the same way, we compute the new w6, w7, and w8:
$$\frac{\partial E_{total}}{\partial w6} = (o1\_output - target_1) (o1\_output)(1 - o1\_output)h2\_output$$
$$\frac{\partial E_{total}}{\partial w7} = (o2\_output - target_2) (o2\_output)(1 - o2\_output)h1\_output$$
$$\frac{\partial E_{total}}{\partial w8} = (o2\_output - target_2) (o2\_output)(1 - o2\_output)h2\_output$$

In [96]:
w6_partial = (o1_output - target_1) * o1_output * (1 - o1_output) * h2_output
w6_n = w6 - 0.5 * w6_partial
w7_partial = (o2_output - target_2) * o2_output * (1 - o2_output) * h1_output
w7_n = w7 - 0.5 * w7_partial
w8_partial = (o2_output - target_2) * o2_output * (1 - o2_output) * h2_output
w8_n = w8 - 0.5 * w8_partial

print("w6 new = %f, w7 new = %f, w8 new = %f" % (w6_n, w7_n, w8_n))

w6 new = 0.408666, w7 new = 0.511301, w8 new = 0.561370
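
The four output-layer gradients share two common factors, one per output neuron. The sketch below, which is an illustration and not the notebook's own code, computes those factors once (called `deltas` here, a hypothetical name) and reuses them for w5 to w8. `math.exp` replaces `expit` to keep the block self-contained.

```python
import math


def sigmoid(x):
    """Logistic function f(x) = 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))


i1, i2 = 0.05, 0.1
w1, w2, w3, w4, w5, w6, w7, w8 = [0.15, 0.2, 0.25, 0.3, 0.4, 0.45, 0.5, 0.55]
b1, b2 = 0.35, 0.6
targets = [0.01, 0.99]
eta = 0.5  # learning rate used in this example

# Forward pass.
h = [sigmoid(i1 * w1 + i2 * w2 + b1), sigmoid(i1 * w3 + i2 * w4 + b1)]
o = [sigmoid(h[0] * w5 + h[1] * w6 + b2), sigmoid(h[0] * w7 + h[1] * w8 + b2)]

# delta_k = dE/d(o_k input) = (o_k - target_k) * o_k * (1 - o_k)
deltas = [(ok - tk) * ok * (1 - ok) for ok, tk in zip(o, targets)]

# Row k holds the weights feeding output neuron k: [w5, w6] and [w7, w8].
output_weights = [[w5, w6], [w7, w8]]

# Gradient for each weight is delta_k * h_j; take one gradient-descent step.
new_output_weights = [
    [w - eta * delta * h_j for w, h_j in zip(row, h)]
    for row, delta in zip(output_weights, deltas)
]

(w5_n, w6_n), (w7_n, w8_n) = new_output_weights
print("w5 new = %f, w6 new = %f" % (w5_n, w6_n))
print("w7 new = %f, w8 new = %f" % (w7_n, w8_n))
```

Computing each delta once also pays off in the hidden-layer update, where the same two factors appear again.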


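The hidden-layer weights w1, w2, w3, and w4 must also be updated before one iteration is complete. Backpropagating the error to the hidden layer is slightly more involved, as shown next.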
***
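#### Updating the hidden-layer weights
Take w1 as an example. First compute the partial derivative of the total error with respect to w1. Because h1_output feeds both output neurons, the error reaches it along two paths, and their contributions are summed: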
\begin{equation}
\begin{split}
\frac{\partial E_{total}}{\partial w1} &= \frac{\partial E_{total}}{\partial h1\_output}\frac{\partial h1\_output}{\partial h1\_input}\frac{\partial h1\_input}{\partial w1}\\
&=  (\frac{\partial E_{total}}{\partial o1\_output}\frac{\partial o1\_output}{\partial o1\_input}\frac{\partial o1\_input}{\partial h1\_output} + \frac{\partial E_{total}}{\partial o2\_output}\frac{\partial o2\_output}{\partial o2\_input}\frac{\partial o2\_input}{\partial h1\_output})\frac{\partial h1\_output}{\partial h1\_input}\frac{\partial h1\_input}{\partial w1}\\
&= ((o1\_output - target_1) (o1\_output)(1 - o1\_output)w5 + (o2\_output - target_2) (o2\_output)(1 - o2\_output)w7)h1\_output(1-h1\_output)i1
\end{split}
\end{equation}

In [98]:
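# Error signal at each output neuron; h1 feeds both, so sum the two paths (original w5, w7)
delta_o1 = (o1_output - target_1) * o1_output * (1 - o1_output)
delta_o2 = (o2_output - target_2) * o2_output * (1 - o2_output)
w1_partial = (delta_o1 * w5 + delta_o2 * w7) * h1_output * (1 - h1_output) * 0.05
w1_n = w1 - 0.5 * w1_partial
print("w1 partial is %f, and w1 new is: %f" % (w1_partial, w1_n))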


w1 partial is 0.000439, and w1 new is: 0.149781
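
The two paths through which w1 affects the total error can be inspected separately. The following sketch is an addition for illustration: the names `via_o1`, `via_o2`, and `local` are hypothetical, and the forward pass is recomputed with `math.exp` so that the block runs by itself.

```python
import math


def sigmoid(x):
    """Logistic function f(x) = 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))


i1, i2 = 0.05, 0.1
w1, w2, w3, w4, w5, w6, w7, w8 = [0.15, 0.2, 0.25, 0.3, 0.4, 0.45, 0.5, 0.55]
b1, b2 = 0.35, 0.6
target_1, target_2 = 0.01, 0.99

# Forward pass.
h1_output = sigmoid(i1 * w1 + i2 * w2 + b1)
h2_output = sigmoid(i1 * w3 + i2 * w4 + b1)
o1_output = sigmoid(h1_output * w5 + h2_output * w6 + b2)
o2_output = sigmoid(h1_output * w7 + h2_output * w8 + b2)

# dE/d(h1_output) split by path: through o1 (weight w5) and through o2 (weight w7).
via_o1 = (o1_output - target_1) * o1_output * (1 - o1_output) * w5
via_o2 = (o2_output - target_2) * o2_output * (1 - o2_output) * w7

# Remaining chain-rule factors: d(h1_output)/d(h1_input) * d(h1_input)/d(w1).
local = h1_output * (1 - h1_output) * i1

w1_partial = (via_o1 + via_o2) * local
print("path via o1: %f, path via o2: %f" % (via_o1, via_o2))
print("w1 partial is %f" % w1_partial)
```

The two contributions have opposite signs, since o1 is above its target while o2 is below its target, so they partly cancel. This is one reason the hidden-layer gradient is much smaller than the output-layer gradients.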


In the same way, we obtain the new values of w2, w3, and w4, which completes one iteration.
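
Putting all the pieces together, one full iteration might look like the sketch below. This is an assumption-laden illustration, not the notebook's own code: `forward`, `total_error`, and `train_step` are hypothetical names, the biases are left unchanged (as in the walkthrough above), and the hidden-layer gradients use the original w5 to w8 rather than their updated values.

```python
import math


def sigmoid(x):
    """Logistic function f(x) = 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, w, b):
    """Return (hidden outputs, final outputs) for weights w = [w1..w8]."""
    h = [sigmoid(inputs[0] * w[0] + inputs[1] * w[1] + b[0]),
         sigmoid(inputs[0] * w[2] + inputs[1] * w[3] + b[0])]
    o = [sigmoid(h[0] * w[4] + h[1] * w[5] + b[1]),
         sigmoid(h[0] * w[6] + h[1] * w[7] + b[1])]
    return h, o


def total_error(o, targets):
    """Summed squared error: sum of 1/2 * (output - target)^2."""
    return sum(0.5 * (ok - tk) ** 2 for ok, tk in zip(o, targets))


def train_step(inputs, targets, w, b, eta=0.5):
    """One gradient-descent update of all eight weights (biases stay fixed)."""
    h, o = forward(inputs, w, b)
    # Output-layer error signals.
    d_o = [(ok - tk) * ok * (1 - ok) for ok, tk in zip(o, targets)]
    # Hidden-layer error signals, computed with the weights BEFORE the update.
    d_h = [(d_o[0] * w[4] + d_o[1] * w[6]) * h[0] * (1 - h[0]),
           (d_o[0] * w[5] + d_o[1] * w[7]) * h[1] * (1 - h[1])]
    grads = [d_h[0] * inputs[0], d_h[0] * inputs[1],   # w1, w2
             d_h[1] * inputs[0], d_h[1] * inputs[1],   # w3, w4
             d_o[0] * h[0], d_o[0] * h[1],             # w5, w6
             d_o[1] * h[0], d_o[1] * h[1]]             # w7, w8
    return [wi - eta * g for wi, g in zip(w, grads)]


inputs, targets = [0.05, 0.1], [0.01, 0.99]
weights = [0.15, 0.2, 0.25, 0.3, 0.4, 0.45, 0.5, 0.55]
biases = [0.35, 0.6]

error_before = total_error(forward(inputs, weights, biases)[1], targets)
new_weights = train_step(inputs, targets, weights, biases)
error_after = total_error(forward(inputs, new_weights, biases)[1], targets)

for k, wk in enumerate(new_weights, start=1):
    print("w%d new = %f" % (k, wk))
print("error before = %f, error after = %f" % (error_before, error_after))
```

Calling `train_step` repeatedly, each time on the weights returned by the previous call, would continue the training process described in this example.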