# [Jacobian matrix](https://zh.wikipedia.org/wiki/%E9%9B%85%E5%8F%AF%E6%AF%94%E7%9F%A9%E9%98%B5)
$$
\text{Given the vectors }
\mathbf{x} = 
\begin{bmatrix}
 x_{1} \\ x_{2} \\ \vdots \\ x_{n}
\end{bmatrix}
,\quad
\mathbf{y} = 
\begin{bmatrix}
 y_{1} \\ y_{2} \\ \vdots \\ y_{m}
\end{bmatrix}
$$
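Let $f$ be a function that maps $n$-dimensional Euclidean space to $m$-dimensional Euclidean space. It consists of $m$ real-valued component functions: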
$$
y{_1}(x{_1},\cdots,x{_n}),\cdots,y{_m}(x{_1},\cdots,x{_n})
$$
The partial derivatives of these functions (where they exist) form an $m \times n$ matrix, which is the Jacobian matrix:
$$
\begin{bmatrix}
\frac{\partial{y{_1}}}{\partial{x{_1}}} & \cdots & \frac{\partial{y{_1}}}{\partial{x{_n}}} \\
\vdots & \ddots & \vdots \\
\frac{\partial{y{_m}}}{\partial{x{_1}}} & \cdots & \frac{\partial{y{_m}}}{\partial{x{_n}}} \\
\end{bmatrix}
$$
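The definition can be checked numerically. The sketch below approximates the Jacobian with central differences; the helper name `numerical_jacobian`, the step size `eps`, and the example map `f` (from $R^3$ to $R^2$) are assumptions made for illustration only.

```python
import numpy as np


def numerical_jacobian(f, x, eps=1e-6):
    """Approximate the m x n Jacobian of f at x with central differences."""
    x = np.asarray(x, dtype=float)
    m = np.asarray(f(x)).size
    jac = np.zeros((m, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        # Column j holds the partial derivatives of every y_i with respect to x_j.
        jac[:, j] = (np.asarray(f(x + step)) - np.asarray(f(x - step))) / (2 * eps)
    return jac


def f(x):
    # Hypothetical example map with n = 3 inputs and m = 2 outputs.
    return np.array([x[0] * x[1], np.sin(x[2]) + x[0] ** 2])


x0 = np.array([1.0, 2.0, 0.5])
J = numerical_jacobian(f, x0)

# Analytic Jacobian of the same map, row i = gradient of y_i.
J_exact = np.array([
    [x0[1], x0[0], 0.0],
    [2 * x0[0], 0.0, np.cos(x0[2])],
])
print(J.shape)
print(np.max(np.abs(J - J_exact)))
```

Row $i$ of the result collects the partial derivatives of $y_i$, and column $j$ corresponds to $x_j$, matching the $m \times n$ shape above.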

# [Layout conventions for matrix derivatives](https://en.wikipedia.org/wiki/Matrix_calculus#Layout_conventions)

| matrix | vector | scalar |
| ------------- | ------------- | ------------- |
| $\bf{X},\bf{Y}$ | $\bf{x},\bf{y}$ | $x,y$ |

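The underlying issue: when a vector is differentiated with respect to a vector, i.e. $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$, the result can be written in two conflicting layouts. If $\mathbf{y}$ is an $m$-dimensional column vector and $\mathbf{x}$ is an $n$-dimensional column vector, the result can be either an $m \times n$ matrix or an $n \times m$ matrix.

- Numerator layout: the result is laid out according to $\mathbf{y}$ and $\mathbf{x}^{T}$; this is the Jacobian formulation. $\frac{\partial \mathbf{y}}{\partial x}$ (vector by scalar) is laid out as a column vector, and $\frac{\partial y}{\partial \mathbf{x}}$ (scalar by vector) as a row vector.

- Denominator layout: also called the Hessian formulation; it is the transpose of the Jacobian formulation.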
<table class="wikitable">
<caption>Result of differentiating various kinds of aggregates with other kinds of aggregates</caption>
<tbody>
<tr>
<th colspan="2" rowspan="2"></th>
<th colspan="2">Scalar <i>y</i></th>
<th colspan="2">Column vector <b>y</b> (size <i>m</i>×<i>1</i>)</th>
<th colspan="2">Matrix <b>Y</b> (size <i>m</i>×<i>n</i>)</th>
</tr>
<tr>
<th>Notation</th>
<th>Type</th>
<th>Notation</th>
<th>Type</th>
<th>Notation</th>
<th>Type</th>
</tr>
<tr>
<th rowspan="2">Scalar <i>x</i></th>
<th>Numerator</th>
<td rowspan="2" style="text-align:center;">$\frac{\partial y}{\partial x}$</td>
<td rowspan="2">Scalar</td>
<td rowspan="2" style="text-align:center;">$\frac{\partial \mathbf{y}}{\partial x}$</td>
<td>Size-<i>m</i> column vector</td>
<td rowspan="2" style="text-align:center;">$\frac{\partial \mathbf{Y}}{\partial x}$</td>
<td><i>m</i>×<i>n</i> matrix</td>
</tr>
<tr>
<th>Denominator</th>
<td>Size-<i>m</i> row vector</td>
<td></td>
</tr>
<tr>
<th rowspan="2">Column vector <b>x</b><br>(size <i>n</i>×<i>1</i>)</th>
<th>Numerator</th>
<td rowspan="2" style="text-align:center;">$\frac{\partial y}{\partial \mathbf{x}}$</td>
<td>Size-<i>n</i> row vector</td>
<td rowspan="2" style="text-align:center;">$\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$</td>
<td><i>m</i>×<i>n</i> matrix</td>
<td rowspan="2" style="text-align:center;">$\frac{\partial \mathbf{Y}}{\partial \mathbf{x}}$</td>
<td rowspan="2"></td>
</tr>
<tr>
<th>Denominator</th>
<td>Size-<i>n</i> column vector</td>
<td><i>n</i>×<i>m</i> matrix</td>
</tr>
<tr>
<th rowspan="2">Matrix <b>X</b><br>(size <i>p</i>×<i>q</i>)</th>
<th>Numerator</th>
<td rowspan="2" style="text-align:center;">$\frac{\partial y}{\partial \mathbf{X}}$</td>
<td><i>q</i>×<i>p</i> matrix</td>
<td rowspan="2" style="text-align:center;">$\frac{\partial \mathbf{y}}{\partial \mathbf{X}}$</td>
<td rowspan="2"></td>
<td rowspan="2" style="text-align:center;">$\frac{\partial \mathbf{Y}}{\partial \mathbf{X}}$</td>
<td rowspan="2"></td>
</tr>
<tr>
<th>Denominator</th>
<td><i>p</i>×<i>q</i> matrix</td>
</tr>
</tbody></table>
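
To make the two layouts concrete, here is a minimal sketch using linear maps, whose derivatives are known exactly. The matrices `A` and `B` and the sizes `m`, `n`, `k` are made up for illustration. It shows that the two layouts are transposes of each other and that the chain rule multiplies in opposite orders.

```python
import numpy as np

m, n, k = 2, 3, 4
A = np.arange(m * n, dtype=float).reshape(m, n)  # y = A x maps R^n -> R^m
B = np.arange(k * m, dtype=float).reshape(k, m)  # z = B y maps R^m -> R^k

# Vector-by-vector derivative dy/dx of y = A x.
dy_dx_numerator = A        # numerator (Jacobian) layout: m x n
dy_dx_denominator = A.T    # denominator layout: n x m

# Chain rule for z = B (A x).
dz_dx_numerator = B @ A        # numerator layout: (dz/dy)(dy/dx), k x n
dz_dx_denominator = A.T @ B.T  # denominator layout: (dy/dx)(dz/dy), n x k

print(dy_dx_numerator.shape, dy_dx_denominator.shape)
print(dz_dx_numerator.shape, dz_dx_denominator.shape)
```

In numerator layout the factors appear in the same order as the composition; in denominator layout the order is reversed, which is an easy source of shape errors when the two conventions are mixed.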


# neural network

## [activation functions](https://ml-cheatsheet.readthedocs.io/en/latest/activation_functions.html)

$$
sigmoid = ...
$$

$$
\tanh(z) = \frac{e^{z}-e^{-z}}{e^{z}+e^{-z}}
$$

$$
\text{rectified linear unit:   }\text{relu}(z) = 
\begin{cases} 
0, & \text{if $z \le 0$} \\ 
z, & \text{if $z>0$} 
\end{cases} 
$$

$$
\text{leaky relu}(z) = 
\begin{cases} 
0.01z, & \text{if $z<0$} \\ 
z, & \text{if $z \ge 0$} 
\end{cases} 
$$

$$
\text{softmax: }  \hat{y_{i}} =p_{i} =  
\frac{e^{z_{i}}}{\sum_{k=1}^{m}e^{z_{k}}}
$$
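As a quick sanity check of the element-wise formulas above, the following sketch evaluates tanh, ReLU and leaky ReLU on a small array. The function names and the `slope` parameter are illustrative choices, not taken from any particular library.

```python
import numpy as np


def tanh(z):
    # Direct transcription of the formula; np.tanh is the numerically safer built-in.
    return (np.exp(z) - np.exp(-z)) / (np.exp(z) + np.exp(-z))


def relu(z):
    return np.maximum(z, 0.0)


def leaky_relu(z, slope=0.01):
    # Negative inputs are scaled by a small slope instead of being zeroed.
    return np.where(z < 0, slope * z, z)


z = np.array([-2.0, 0.0, 3.0])
print(tanh(z))
print(relu(z))
print(leaky_relu(z))
```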

## [loss function](https://ml-cheatsheet.readthedocs.io/en/latest/loss_functions.html)

$$
\text{Cross Entropy:  }
E = 
\begin{cases}
-(y\log{(p)} + (1-y)\log{(1-p)}) & 
\text{binary classification}\\
-\sum_{i=1}^{m}y_{i}\log{(p_{i})} & 
\text{multiclass classification,  m is the number of classes }
\end{cases}
$$
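A small worked example may help here. The sketch below, with made-up logits `z` and a one-hot label `y`, computes the multiclass cross entropy of a softmax output and checks that for $m = 2$ it reduces to the binary formula.

```python
import numpy as np


def softmax(z):
    # Subtracting the maximum avoids overflow in exp without changing the result.
    e = np.exp(z - np.max(z))
    return e / np.sum(e)


def cross_entropy(p, y):
    return -np.sum(y * np.log(p))


# Multiclass case with m = 3 classes; the true class is index 0.
z = np.array([2.0, 1.0, 0.1])
y = np.array([1.0, 0.0, 0.0])
p = softmax(z)
loss = cross_entropy(p, y)
print(p, loss)

# Binary case: the two-class sum equals -(y log p + (1 - y) log(1 - p)).
p_bin, y_bin = 0.8, 1.0
binary = -(y_bin * np.log(p_bin) + (1 - y_bin) * np.log(1 - p_bin))
multi = cross_entropy(np.array([p_bin, 1 - p_bin]), np.array([y_bin, 1 - y_bin]))
print(binary, multi)
```

With a one-hot label the sum collapses to a single term, so the loss is just the negative log-probability assigned to the true class.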

## Derivative of Cross Entropy and Softmax

Let $\mathbf{y}\in\mathbf{R}^{m\times1}$ and $\mathbf{p}\in\mathbf{R}^{m\times1}$.
Differentiating the loss function with respect to $\mathbf{p}$:

$$
\frac{\partial{E}}{\partial{p_{j}}} = 
\frac{\partial\left(-\sum_{i=1}^{m}y_{i}\log{(p_{i})}\right)}{\partial{p_{j}}} = 
-\frac{y_{j}}{p_{j}}
$$

$$
\frac{\partial{E}}{\partial{\mathbf{p}}} = 
-\begin{bmatrix}
\frac{y_{1}}{p_{1}} & \cdots & \frac{y_{m}}{p_{m}}
\end{bmatrix}=
-\left(\frac{\mathbf{y}}{\mathbf{p}}\right)^{T}
\in{\mathbf{R}^{1\times m}}
\quad\text{(element-wise division)}
$$

Differentiating the softmax activation function:

$$
\frac{\partial{p_{i}}}{\partial{z_{j}}} = \frac{\partial}{\partial{z_{j}}}\left(\frac{e^{z_{i}}}{\sum_{k=1}^{m}e^{z_{k}}}\right)
= 
\begin{cases}
p_{i}(1-p_{i})& j=i \\
-p_{i}p_{j}& j\neq{i}
\end{cases}
$$

$$
\frac{\partial{\mathbf{p}}}{\partial{\mathbf{z}}} \in \mathbf{R}^{m \times m}
$$

Combining the above, and using $\sum_{i=1}^{m}y_{i}=1$:

$$
-\begin{bmatrix}
\frac{y_{1}}{p_{1}} & \cdots & \frac{y_{m}}{p_{m}}
\end{bmatrix}
\begin{bmatrix}
p_{1}(1-p_{1}) & -p_{1}p_{2} & \cdots & -p_{1}p_{m} \\
-p_{2}p_{1} & p_{2}(1-p_{2}) &  \cdots & -p_{2}p_{m} \\
\vdots & \vdots & \ddots & \vdots\\
-p_{m}p_{1} & -p_{m}p_{2} &  \cdots & p_{m}(1-p_{m}) \\ 
\end{bmatrix}=
\begin{bmatrix}
p_{1}-y_{1} &
p_{2}-y_{2} &
\cdots &
p_{m}-y_{m}
\end{bmatrix}
$$

$$
\frac{\partial{E}}{\partial{\mathbf{z}}} = \frac{\partial{E}}{\partial{\mathbf{p}}} \cdot
\frac{\partial{\mathbf{p}}}{\partial{\mathbf{z}}}
= (\mathbf{p}-\mathbf{y})^{T}
\in{\mathbf{R}^{1 \times m}}
$$

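When computing this element by element, note in particular that every $p_{i}$ depends on $z_{j}$, so the chain rule sums over all $i$: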

$$
\frac{\partial{E}}{\partial{z_{j}}} = \sum_{i = 1}^{m}
\frac{\partial{E}}{\partial{p_{i}}} 
\frac{\partial{p_{i}}}{\partial{z_{j}}}
$$
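The result $\frac{\partial E}{\partial z_j} = p_j - y_j$ can be verified numerically. The sketch below uses made-up logits and a one-hot label, builds the softmax Jacobian in its matrix form $\mathrm{diag}(\mathbf{p}) - \mathbf{p}\mathbf{p}^{T}$ (diagonal entries $p_i(1-p_i)$, off-diagonal entries $-p_i p_j$), and compares the chain-rule product with a finite-difference gradient.

```python
import numpy as np


def softmax(z):
    e = np.exp(z - np.max(z))
    return e / np.sum(e)


def loss(z, y):
    return -np.sum(y * np.log(softmax(z)))


z = np.array([0.3, -1.2, 2.0, 0.5])
y = np.array([0.0, 0.0, 1.0, 0.0])
p = softmax(z)

dE_dp = -(y / p).reshape(1, -1)       # 1 x m row vector
dp_dz = np.diag(p) - np.outer(p, p)   # m x m softmax Jacobian
dE_dz = dE_dp @ dp_dz                 # 1 x m, the full sum over i for each j

# Central finite differences as an independent check.
eps = 1e-6
basis = np.eye(z.size)
numeric = np.array([
    (loss(z + eps * basis[j], y) - loss(z - eps * basis[j], y)) / (2 * eps)
    for j in range(z.size)
])

print(dE_dz.ravel())
print(p - y)
print(numeric)
```

All three printed vectors should agree up to finite-difference error.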

## For a single-hidden-layer neural network
Input layer $\mathbf{x}\in{R^{a \times 1}}$,
hidden layer $\mathbf{h}\in{R^{b \times 1}}$,
output layer $\mathbf{o}\in{R^{c \times 1}}$:

$$
\begin{aligned}
& \mathbf{z}^{[1]} = \mathbf{W}^{[1]}\mathbf{x} + \mathbf{b}^{[1]} \\
& \mathbf{h} = \text{sigmoid}(\mathbf{z}^{[1]}) \\
& \mathbf{z}^{[2]} = \mathbf{W}^{[2]}\mathbf{h} + \mathbf{b}^{[2]} \\
& \mathbf{p} = \text{softmax}(\mathbf{z}^{[2]}) \\
& E = -\mathbf{y}^{T}\log{(\mathbf{p})}\\
\end{aligned}
$$

We need to compute
$\frac{\partial{E}}{\partial{\mathbf{W}}^{[1]}}$,
$\frac{\partial{E}}{\partial{\mathbf{b}}^{[1]}}$,
$\frac{\partial{E}}{\partial{\mathbf{h}}}$,
$\frac{\partial{E}}{\partial{\mathbf{W}}^{[2]}}$,
$\frac{\partial{E}}{\partial{\mathbf{b}}^{[2]}}$

$$
\begin{aligned}
& \frac{\partial{E}}{\partial{\mathbf{W}^{[2]}}} = \frac{\partial{E}}{\partial{\mathbf{p}}}
\frac{\partial{\mathbf{p}}}{\partial{\mathbf{z}^{[2]}}} \frac{\partial{\mathbf{z}^{[2]}}}{\partial{\mathbf{W}^{[2]}}} \\
& \frac{\partial{\mathbf{z}^{[2]}}}{\partial{\mathbf{W}^{[2]}}} = 
\begin{bmatrix}
\frac{\partial{z^{[2]}_{1}}}{\partial{\mathbf{W}^{[2]}}} \\
\frac{\partial{z^{[2]}_{2}}}{\partial{\mathbf{W}^{[2]}}} \\
\vdots \\
\frac{\partial{z^{[2]}_{c}}}{\partial{\mathbf{W}^{[2]}}}
\end{bmatrix} \quad \text{($c$ blocks, each in $R^{c \times b}$)}\\
&
\frac{\partial{z^{[2]}_{i}}}{\partial{W_{xy}^{[2]}}}=
\frac{\partial{\sum_{k=1}^{b}W^{[2]}_{ik}h_{k}}}{\partial{W_{xy}^{[2]}}}\\
& \text{so } \frac{\partial{z^{[2]}_{i}}}{\partial{W_{xy}^{[2]}}} = h_{y} \text{ when } x=i, \text{ and } 0 \text{ otherwise} \\
& \frac{\partial{z^{[2]}_{i}}}{\partial{\mathbf{W}^{[2]}}} = 
\begin{bmatrix}
0&0&\cdots&0\\
\vdots&\vdots& \ddots &\vdots\\
h_{1} &h_{2} &\cdots & h_{b}\\
\vdots&\vdots& \ddots &\vdots\\
0&0&\cdots&0\\
\end{bmatrix} \in{R^{c \times b}} \quad \text{(only row $i$ is nonzero)}\\
& \text{hence } \frac{\partial{E}}{\partial{\mathbf{W}^{[2]}}} = 
\sum_{i=1}^{c}
\frac{\partial{z^{[2]}_{i}}}{\partial{\mathbf{W}^{[2]}}}
(p_{i}-y_{i}) = 
\begin{bmatrix}
(p_{1}-y_{1})h_{1} & (p_{1}-y_{1})h_{2}& \cdots &(p_{1}-y_{1})h_{b} \\
(p_{2}-y_{2})h_{1} & (p_{2}-y_{2})h_{2}& \cdots &(p_{2}-y_{2})h_{b} \\
\vdots & \vdots& \ddots &\vdots \\
(p_{c}-y_{c})h_{1} & (p_{c}-y_{c})h_{2}& \cdots &(p_{c}-y_{c})h_{b} \\
\end{bmatrix} \in{R^{c \times b}}\\
& \text{which, by inspection, equals } (\mathbf{p}-\mathbf{y})\mathbf{h}^{T}\\
\\
& \frac{\partial{E}}{\partial{\mathbf{h}}} = (\mathbf{p}-\mathbf{y})^{T}
\frac{\partial{\mathbf{z}^{[2]}}}{\partial{\mathbf{h}}}
=  (\mathbf{p}-\mathbf{y})^{T}\mathbf{W}^{[2]} \\
& \frac{\partial{E}}{\partial{\mathbf{b}^{[1]}}} = \frac{\partial{E}}{\partial{\mathbf{h}}} 
\frac{\partial{\mathbf{h}}}{\partial{\mathbf{z}^{[1]}}} \frac{\partial{\mathbf{z}^{[1]}}}{\partial{\mathbf{b}^{[1]}}}=
[(\mathbf{p}-\mathbf{y})^{T}\mathbf{W}^{[2]}\circ \text{sigmoid}'(\mathbf{z}^{[1]})^{T}]^{T}\\
& \frac{\partial{E}}{\partial{\mathbf{W}^{[1]}}} = \frac{\partial{E}}{\partial{\mathbf{h}}} 
\frac{\partial{\mathbf{h}}}{\partial{\mathbf{z}^{[1]}}} \frac{\partial{\mathbf{z}^{[1]}}}{\partial{\mathbf{W}^{[1]}}}=
[(\mathbf{p}-\mathbf{y})^{T}\mathbf{W}^{[2]} \circ \text{sigmoid}'(\mathbf{z}^{[1]})^{T}]^{T}\mathbf{x}^{T}
\end{aligned}
$$
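Before trusting these expressions in a training loop, it can be worth comparing them against finite differences. The sketch below is one such gradient check on a tiny randomly initialised network; the sizes `a, b, c`, the seed, and the helper names `forward` and `numeric_grad` are assumptions for illustration. Gradients are stored in the same shape as the parameter they belong to.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, c = 4, 5, 3  # input, hidden and output sizes, kept tiny for the check


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    e = np.exp(z - np.max(z))
    return e / np.sum(e)


def forward(W1, b1, W2, b2, x, y):
    z1 = W1 @ x + b1
    h = sigmoid(z1)
    z2 = W2 @ h + b2
    p = softmax(z2)
    E = -np.sum(y * np.log(p))
    return h, p, E


W1 = rng.normal(size=(b, a))
b1 = rng.normal(size=(b, 1))
W2 = rng.normal(size=(c, b))
b2 = rng.normal(size=(c, 1))
x = rng.normal(size=(a, 1))
y = np.zeros((c, 1))
y[1] = 1.0

h, p, E = forward(W1, b1, W2, b2, x, y)

# Analytic gradients following the derivation above.
dz2 = p - y               # c x 1
dW2 = dz2 @ h.T           # c x b, (p - y) h^T
db2 = dz2                 # c x 1
dh = dz2.T @ W2           # 1 x b, (p - y)^T W2
dz1 = dh.T * h * (1 - h)  # b x 1, sigmoid'(z1) = h (1 - h)
db1 = dz1                 # b x 1
dW1 = dz1 @ x.T           # b x a


def numeric_grad(param, eps=1e-6):
    """Central-difference gradient of E; perturbs `param` in place and restores it."""
    grad = np.zeros_like(param)
    for idx in np.ndindex(param.shape):
        old = param[idx]
        param[idx] = old + eps
        e_plus = forward(W1, b1, W2, b2, x, y)[2]
        param[idx] = old - eps
        e_minus = forward(W1, b1, W2, b2, x, y)[2]
        param[idx] = old
        grad[idx] = (e_plus - e_minus) / (2 * eps)
    return grad


for name, analytic, param in [("W2", dW2, W2), ("b2", db2, b2), ("W1", dW1, W1), ("b1", db1, b1)]:
    print(name, np.max(np.abs(analytic - numeric_grad(param))))
```

Each printed difference should be close to zero; a large value usually points to a missing transpose or a sign error in the derivation.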

import numpy as np
import time
import os
from functools import partial
print(os.getcwd())


def get_data():
    with open('src/data/mnist_train.csv', 'r') as f:
        data = [x.strip().split(',') for x in f]
        data = np.asarray(data, dtype='float').T
        # print(data[0][:10])
        y = np.zeros((10, len(data[0])))
        for i, x in enumerate(data[0]):
            y[:, i][int(x)] = 1
        x = np.delete(data, 0, 0)
        x = (x / 255.0 * 0.99) + 0.01
        return x, y


def get_test_data():
    with open('src/data/mnist_test.csv', 'r') as f:
        data = [x.strip().split(',') for x in f]
        data = np.asarray(data, dtype='float').T
        # print(data[0][:10])
        y = data[0]
        x = np.delete(data, 0, 0)
        x = (x / 255.0 * 0.99) + 0.01
        return x, y


# Activation functions
def sigmoid(x, derivative=False):
    res = 1. / (1. + np.exp(-x))
    if derivative:
        return res * (1-res)
    return res


# def softmax(x):
#     # RuntimeWarning: overflow encountered in exp
#     x = np.exp(x)
#     return x / np.sum(x)
def softmax(x):
    if x.shape[1] == 1:
        exp = np.exp(x - np.max(x))
        return exp / np.sum(exp)
    else:
        exp = np.exp(x-np.max(x, axis=0, keepdims=True))
        return exp / np.sum(exp, axis=0, keepdims=True)


def relu(x, derivative=False):
    if derivative:
        # Return a new array so the caller's input is not overwritten in place
        return (x > 0).astype(x.dtype)
    return np.maximum(x, 0)


def cross_entropy(predictions, targets, epsilon=1e-12):
    predictions = np.clip(predictions, epsilon, 1. - epsilon)
    N = predictions.shape[1]
    ce = -np.sum(targets*np.log(predictions))/N
    print(f'loss:{ce}')
    return ce


class SimpleNetwork:
    def __init__(self, learning_rate, input_node, hidden_node, output_node, hidden_activation_function):
        self.learning_rate = learning_rate
        self.input_node = input_node
        self.w1 = np.random.randn(hidden_node, input_node)*0.01
        self.b1 = np.zeros((hidden_node, 1))
        self.w2 = np.random.randn(output_node, hidden_node)*0.01
        self.b2 = np.zeros((output_node, 1))
        self.hidden_activation_function = hidden_activation_function
        self.derivative_of_hidden_activation_function = partial(hidden_activation_function, derivative=True)
        self.output_activation_function = softmax

    def train_vectorization(self, samples, marks):
        size_of_samples = samples.shape[1]

        z1 = np.dot(self.w1, samples) + self.b1
        h = self.hidden_activation_function(z1)
        z2 = np.dot(self.w2, h) + self.b2
        p = self.output_activation_function(z2)
        loss = cross_entropy(p, marks)

        dz2 = (p-marks).T / size_of_samples
        dw2 = np.dot(dz2.T, h.T)
        db2 = np.sum(dz2.T, axis=1, keepdims=True)

        dz1 = np.dot(dz2, self.w2).T * self.derivative_of_hidden_activation_function(z1)
        db1 = np.sum(dz1, axis=1, keepdims=True)
        # Use the samples passed in, not the global variable x
        dw1 = np.dot(dz1, samples.T)

        # check dimensionality
        # print('dw2', dw2.shape, self.w2.shape)
        # print('db2', db2.shape, self.b2.shape)
        # print('db1', db1.shape, self.b1.shape)
        # print('dw1',dw1.shape,self.w1.shape)

        self.w1 = self.w1 - self.learning_rate*dw1
        self.b1 = self.b1 - self.learning_rate*db1
        self.w2 = self.w2 - self.learning_rate*dw2
        self.b2 = self.b2 - self.learning_rate*db2

    def train_single(self, samples, marks):
        for data in zip(samples.T, marks.T):
            x = data[0].reshape((data[0].shape[0],1))
            y = data[1].reshape((data[1].shape[0],1))

            z1 = np.dot(self.w1, x) + self.b1
            h = self.hidden_activation_function(z1)
            z2 = np.dot(self.w2, h) + self.b2
            p = self.output_activation_function(z2)

            dw2 = np.dot((p - y), h.T)
            db2 = p - y
            dz1 = np.dot((p - y).T, self.w2).T * self.derivative_of_hidden_activation_function(z1)
            db1 = dz1
            dw1 = np.dot(dz1, x.T)

            self.w1 = self.w1 - self.learning_rate * dw1
            self.b1 = self.b1 - self.learning_rate * db1
            self.w2 = self.w2 - self.learning_rate * dw2
            self.b2 = self.b2 - self.learning_rate * db2
        # loss = cross_entropy(p, y)

    def predict(self, features):
        features = features.reshape((self.input_node,1))

        z1 = np.dot(self.w1, features) + self.b1
        h = self.hidden_activation_function(z1)
        z2 = np.dot(self.w2, h) + self.b2
        p = self.output_activation_function(z2)

        # Index of the most probable class
        return int(np.argmax(p))

    def test(self, samples, marks):
        false = 0
        for i in zip(samples.T, marks):
            predict_number = self.predict(i[0])
            if predict_number != i[1]:
                false += 1
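        # samples has shape (features, n_samples), so count columns rather than rows
        n_samples = samples.shape[1]
        print('ratio', 100.0 * (n_samples - false) / n_samples)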


if __name__ == '__main__':
    x, y = get_data()
    x_t, y_t = get_test_data()
    # The output activation is fixed to softmax inside SimpleNetwork.__init__
    nw = SimpleNetwork(0.2, 784, 300, 10, relu)
    s = time.time()
    print('start')
    # Full-batch training: each step uses all samples at once
    for i in range(1000):
        # print(i)
        nw.train_vectorization(x, y)
    
    # Per-sample training: one parameter update per sample
    # for i in range(100):
    #     nw.train_single(x, y)
    #     nw.test(x_t, y_t)
    
    print('time',time.time()-s)
    nw.test(x_t,y_t)



C:\Users\10444\Desktop\learning log


start


loss:2.302892957408862


loss:2.2981502685024644


loss:2.293378590936806


loss:2.28837803902809


loss:2.2829596967501953


loss:2.276942071745195


loss:2.2701483440936276


loss:2.2623948707264994


loss:2.2534947068432962


loss:2.2432552071453293


loss:2.231478090407141


loss:2.2179579576311648


loss:2.2024889209773018


loss:2.184860297824268


loss:2.1648517086212467


loss:2.1422471140625867


loss:2.1168416057774


loss:2.088449950158824


loss:2.056922344834348


loss:2.0221585336226573


loss:1.9841198661143216


loss:1.9428460393011893


loss:1.8984678718694334


loss:1.85121305646855


loss:1.8014117998484587


loss:1.7494960629755196


loss:1.6959860866634273


loss:1.6414717317426277


loss:1.5865784180313858


loss:1.531934941450699


loss:1.4781319243732904


loss:1.4256885869442655


loss:1.375029622108639


loss:1.3264696267878866


loss:1.280215174998943


loss:1.2363763460411554


loss:1.1949830821879255


loss:1.156003111268336


loss:1.1193601275856306


loss:1.0849503135397836


loss:1.0526525920666538


loss:1.022339304643515


loss:0.9938815774403659


loss:0.9671527566400661


loss:0.9420327866411375


loss:0.918406984073871


loss:0.8961684365755088


loss:0.8752178691959072


loss:0.8554626803686398


loss:0.8368173988021034


loss:0.8192029847224233


loss:0.8025462908260523


loss:0.7867799521875285


loss:0.7718421199240657


loss:0.7576752302794677


loss:0.7442263397144981


loss:0.731446435061877


loss:0.7192904592076949


loss:0.7077170714530074


loss:0.696687836752112


loss:0.6861674359873892


loss:0.6761230874061721


loss:0.6665245676057143


loss:0.6573440726358187


loss:0.6485558301023973


loss:0.6401358321673214


loss:0.6320618971721614


loss:0.624313605936086


loss:0.6168721447254752


loss:0.609719974554049


loss:0.6028407751784212


loss:0.5962194698125779


loss:0.5898419920440532


loss:0.5836952830510793


loss:0.5777670048253354


loss:0.5720457607477342


loss:0.5665210491057663


loss:0.5611831961615927


loss:0.5560229620386113


loss:0.551031679031346


loss:0.5462012651137987


loss:0.5415240973421765


loss:0.53699311785943


loss:0.5326016099428981


loss:0.5283433800834464


loss:0.5242124889130158


loss:0.5202034391069795


loss:0.5163110528771669


loss:0.5125304370320535


loss:0.5088569266628846


loss:0.5052862121094684


loss:0.5018140477000842


loss:0.4984365632765256


loss:0.4951500043550225


loss:0.49195083159235503


loss:0.48883583725952207


loss:0.4858019361566421


loss:0.48284606988693946


loss:0.4799652726612955


loss:0.4771568636539976


loss:0.4744182722951218


loss:0.4717469738890764


loss:0.4691404819641983


loss:0.46659664451265814


loss:0.4641134267853221


loss:0.46168879161819093


loss:0.45932068547516414


loss:0.4570072485058773


loss:0.4547466260935726


loss:0.45253702379313254


loss:0.4503766438853921


loss:0.4482639183212939


loss:0.4461973176957883


loss:0.4441754288722459


loss:0.44219675522093504


loss:0.44026007314244786


loss:0.43836406476869433


loss:0.43650746622865416


loss:0.43468903286930966


loss:0.4329076223754619


loss:0.4311621960158911


loss:0.4294515661779018


loss:0.42777476062303627


loss:0.4261307454967295


loss:0.4245184895708167


loss:0.42293715425415224


loss:0.42138585870722645


loss:0.4198636552061458


loss:0.41836980249360106


loss:0.41690325831699543


loss:0.4154631453365485


loss:0.4140488236534341


loss:0.4126596889045114


loss:0.41129503850956955


loss:0.40995417360370817


loss:0.4086363347039365


loss:0.4073410564402011


loss:0.4060676720530106


loss:0.4048155481846357


loss:0.403584203244202


loss:0.40237307171717185


loss:0.4011817409738711


loss:0.4000093768510029


loss:0.3988555574779429


loss:0.3977197429403122


loss:0.3966016533959978


loss:0.3955006924950004


loss:0.39441657033820837


loss:0.39334884332550113


loss:0.39229712796438615


loss:0.39126105159055


loss:0.39024020090341754


loss:0.38923417942449895


loss:0.38824265547640235


loss:0.3872652278832972


loss:0.3863014684044238


loss:0.38535118852664335


loss:0.38441391502017036


loss:0.38348926434461283


loss:0.3825769465485808


loss:0.38167672208150266


loss:0.38078837783237146


loss:0.3799115415736071


loss:0.37904610105904024


loss:0.3781918135190419


loss:0.3773483304175218


loss:0.3765155623222887


loss:0.37569301333786176


loss:0.3748806341531963


loss:0.37407825573582365


loss:0.3732856947001495


loss:0.3725026941280384


loss:0.3717290857613579


loss:0.370964532276131


loss:0.37020877927958773


loss:0.36946165217150556


loss:0.36872294480981477


loss:0.3679926247144561


loss:0.36727049683494545


loss:0.36655622447551295


loss:0.3658499440863905


loss:0.3651513298978176


loss:0.36446023606301287


loss:0.36377643409000265


loss:0.3630999008818766


loss:0.36243049702945906


loss:0.3617680510365278


loss:0.3611125753215931


loss:0.3604637058324398


loss:0.3598215512188074


loss:0.35918591614446715


loss:0.35855672887739515


loss:0.3579339073722057


loss:0.3573171391698053


loss:0.35670613641458826


loss:0.35610108278671027


loss:0.355501952205311


loss:0.3549087290401705


loss:0.3543211028241756


loss:0.35373907009080047


loss:0.353162173789094


loss:0.3525904260588837


loss:0.35202407948037184


loss:0.35146278796424857


loss:0.3509063461810606


loss:0.35035495316717485


loss:0.34980837010232935


loss:0.34926641828728494


loss:0.3487289346128439


loss:0.3481961026253567


loss:0.34766756889267536


loss:0.3471434739289903


loss:0.3466233777587896


loss:0.3461072961916911


loss:0.34559542452577047


loss:0.3450877675706357


loss:0.344584076123234


loss:0.3440841859022285


loss:0.34358821534787504


loss:0.34309589862583223


loss:0.3426073928676141


loss:0.3421227180958049


loss:0.3416416314068599


loss:0.3411641134373091


loss:0.34069018535489365


loss:0.34021990828236043


loss:0.33975317194899207


loss:0.3392898179488553


loss:0.33882972777513903


loss:0.338372919854528


loss:0.33791942899532923


loss:0.3374693754510524


loss:0.3370226947880667


loss:0.33657910005467795


loss:0.3361385604202481


loss:0.33570108625808165


loss:0.33526642831216075


loss:0.3348347838339835


loss:0.3344061421820487


loss:0.33398036229808276


loss:0.33355747613338166


loss:0.3331371770755377


KeyboardInterrupt: 