# Backpropagation Algorithm

![backpropagation](imgs/backpropagation.png)

Mean squared error:

$$
\begin{align*}
    MSE &= \frac{1}{2} \sum^{1}_{n=0} (t_i - y_i)^2 \\
    &= \frac{1}{2} (t_0 - y_0)^2 + \frac{1}{2} (t_1 - y_1)^2
\end{align*}
$$

In general:
$$
\begin{align*}
    \frac{\partial MSE}{\partial y_{i}} &= -(t_i - y_i) \\
    &= y_i - t_i 
\end{align*}
$$

In specific case:
$$
\begin{align*}
    \frac{\partial MSE}{\partial y_{0}} &= -(t_0 - y_0) \\
    &= y_0 - t_0 
\end{align*}
$$

$$
\begin{align*}
    \frac{\partial MSE}{\partial y_{1}} &= -(t_1 - y_1) \\
    &= y_1 - t_1 
\end{align*}
$$

Linear activation (before final output):
$$
\begin{align*}
    y_i &= z_i^{(L)}
    \hspace{35pt}
    where
    \hspace{10pt}
    \{0, 1, ..., l, ..., L-1, L\} \,\, layers
\end{align*}
$$

In general:
$$
\begin{align*}
    \frac{\partial y_i}{\partial z^{(L)}_{i}} &= \frac{\partial a^{(L)}_i}{\partial z^{(L)}_{i}} = 1 \\
\end{align*}
$$

In specific case:
$$
\begin{align*}
    \frac{\partial y_0}{\partial z^{(L)}_{0}} &= \frac{\partial a^{(L)}_0}{\partial z^{(L)}_{i}} = 1 \\
\end{align*}
$$

$$
\begin{align*}
    \frac{\partial y_1}{\partial z^{(L)}_{1}} &= \frac{\partial a^{(L)}_1}{\partial z^{(L)}_{i}} = 1 \\
\end{align*}
$$

Before activation function:

- In matrix form:
$$
\begin{align*}
    z^{(l)} &= W^{(l)}_{*} \, a^{(l-1)}_{*} + b^{l} \\
    &= W^{(l)} \, a^{l-1}
\end{align*}
$$

Note that with a trick we can write:
$$
\begin{align*}
    z^{(l)} &= W^{(l)}_{*} \, a^{l-1}_{*} + b^{l}
    \hspace{35pt}
    where
    \hspace{10pt}
    W^{(l)} \in \mathbb{R}^{M \, x \, N}, a^{(l-1)} \in \mathbb{R}^{N}, b^{(l)} \in \mathbb{R}^{M} \\
    %
    &=
    \begin{bmatrix}
        w^{(l)}_{00} & w^{(l)}_{01} & \dots & w^{(l)}_{0N} \\
        w^{(l)}_{10} & w^{(l)}_{11} & \dots & w^{(l)}_{1N} \\
        \vdots & \vdots & & \vdots \\
        w^{(l)}_{M0} & w^{(l)}_{M1} & \dots & w^{(l)}_{MN}
    \end{bmatrix}
    %
    \begin{bmatrix}
        a^{(l-1)}_{0} \\
        a^{(l-1)}_{1} \\
        \vdots \\
        a^{(l-1)}_{N}
    \end{bmatrix}
    %
    +
    \begin{bmatrix}
        b^{(l)}_{0} \\
        b^{(l)}_{1} \\
        \vdots \\
        b^{(l)}_{M}
    \end{bmatrix} \\
    % 
    &=
    \begin{bmatrix}
        w^{(l)}_{00} a^{(l-1)}_{0} + w^{(l)}_{01} a^{(l-1)}_{1} + \dots + w^{(l)}_{0N} a^{(l-1)}_{N} \\
        w^{(l)}_{10} a^{(l-1)}_{0} + w^{(l)}_{11} a^{(l-1)}_{1} + \dots + w^{(l)}_{1N} a^{(l-1)}_{N} \\
        \vdots \\
        w^{(l)}_{M0} a^{(l-1)}_{0} + w^{(l)}_{M1} a^{(l-1)}_{1} + \dots + w^{(l)}_{MN} a^{(l-1)}_{N}
    \end{bmatrix}
    %
    +
    \begin{bmatrix}
        b^{(l)}_{0} \\
        b^{(l)}_{1} \\
        \vdots \\
        b^{(l)}_{M}
    \end{bmatrix} \\
    %
    &=
    \begin{bmatrix}
        w^{(l)}_{00} a^{(l-1)}_{0} + w^{(l)}_{01} a^{(l-1)}_{1} + \dots + w^{(l)}_{0N} a^{(l-1)}_{N} + 1b^{(l)}_{0} \\
        w^{(l)}_{10} a^{(l-1)}_{0} + w^{(l)}_{11} a^{(l-1)}_{1} + \dots + w^{(l)}_{1N} a^{(l-1)}_{N} + 1b^{(l)}_{1} \\
        \vdots \\
        w^{(l)}_{M0} a^{(l-1)}_{0} + w^{(l)}_{M1} a^{(l-1)}_{1} + \dots + w^{(l)}_{MN} a^{(l-1)}_{N} + 1b^{(l)}_{M}
    \end{bmatrix} \\
    %
    &=
    \begin{bmatrix}
        w^{(l)}_{00} & w^{(l)}_{01} & \dots & w^{(l)}_{0N} & b^{(l)}_{0} \\
        w^{(l)}_{10} & w^{(l)}_{11} & \dots & w^{(l)}_{1N} & b^{(l)}_{1} \\
        \vdots & \vdots & & \vdots \\
        w^{(l)}_{M0} & w^{(l)}_{M1} & \dots & w^{(l)}_{MN} & b^{(l)}_{M}
    \end{bmatrix}
    %
    \begin{bmatrix}
        a^{(l-1)}_{0} \\
        a^{(l-1)}_{1} \\
        \vdots \\
        a^{(l-1)}_{N} \\
        1
    \end{bmatrix} \\
    %
    &=
    \begin{bmatrix}
        w^{(l)}_{00} & w^{(l)}_{01} & \dots & w^{(l)}_{0N} & w^{(l)}_{0(N+1)} \\
        w^{(l)}_{10} & w^{(l)}_{11} & \dots & w^{(l)}_{1N} & w^{(l)}_{1(N+1)} \\
        \vdots & \vdots & & \vdots \\
        w^{(l)}_{M0} & w^{(l)}_{M1} & \dots & w^{(l)}_{MN} & w^{(l)}_{M(N+1)}
    \end{bmatrix}
    %
    \begin{bmatrix}
        a^{(l-1)}_{0} \\
        a^{(l-1)}_{1} \\
        \vdots \\
        a^{(l-1)}_{N} \\
        a^{(l-1)}_{N+1}
    \end{bmatrix} \\
    %
    &= 
    W^{(l)} \, a^{l-1}
\end{align*}
$$

- For instante, we fix $l=2$ (that is last layer $L$), therefore we want compute between last layer and middle layer:

$$
\begin{align*}
    z^{(l)} &= W^{(l)} \, a^{l-1} \\
    &= W^{(2)} \, a^{1} \\
    &=
    \begin{bmatrix}
        w^{(2)}_{00} & w^{(2)}_{01} & w^{(2)}_{02} & b^{(2)}_{0} \\
        w^{(2)}_{10} & w^{(2)}_{11} & w^{(2)}_{12} & b^{(2)}_{1}
    \end{bmatrix}
    %
    \begin{bmatrix}
        a^{(1)}_{0} \\
        a^{(1)}_{1} \\
        a^{(1)}_{2} \\
        1
    \end{bmatrix} \\
    % 
    &=
    \begin{bmatrix}
        w^{(2)}_{00} & w^{(2)}_{01} & w^{(2)}_{02} & w^{(2)}_{03} \\
        w^{(2)}_{10} & w^{(2)}_{11} & w^{(2)}_{12} & w^{(2)}_{13}
    \end{bmatrix}
    %
    \begin{bmatrix}
        a^{(1)}_{0} \\
        a^{(1)}_{1} \\
        a^{(1)}_{2} \\
        a^{(1)}_{3}
    \end{bmatrix} \\
    % 
    &=
    \begin{bmatrix}
        w^{(2)}_{00} a^{(1)}_{0} + w^{(2)}_{01} a^{(1)}_{1} + w^{(2)}_{02} a^{(1)}_{2} + w^{(2)}_{03} a^{(1)}_{3} \\
        w^{(2)}_{10} a^{(1)}_{0} + w^{(2)}_{11} a^{(1)}_{1} + w^{(2)}_{12} a^{(1)}_{2} + w^{(2)}_{13} a^{(1)}_{3}
    \end{bmatrix} \\
    %
    &=
    \begin{bmatrix}
        z^{(2)}_{0} \\
        z^{(2)}_{1}
    \end{bmatrix}
\end{align*}
$$

- For instante, we fix $l=1$, therefore we want compute between middle layer and input:

$$
\begin{align*}
    z^{(l)} &= W^{(l)} \, a^{l-1} \\
    &= W^{(1)} \, a^{0} \\
    &= W^{(1)} \, x \\  
    &=
    \begin{bmatrix}
        w^{(1)}_{00} & w^{(1)}_{01} & b^{(1)}_{0} \\
        w^{(1)}_{10} & w^{(1)}_{11} & b^{(1)}_{1} \\
        w^{(1)}_{20} & w^{(1)}_{21} & b^{(1)}_{2}
    \end{bmatrix}
    %
    \begin{bmatrix}
        x_{0} \\
        x_{1} \\
        1
    \end{bmatrix} \\
    %
    &=
    \begin{bmatrix}
        w^{(1)}_{00} & w^{(1)}_{01} & w^{(1)}_{02} \\
        w^{(1)}_{10} & w^{(1)}_{11} & w^{(1)}_{12} \\
        w^{(1)}_{20} & w^{(1)}_{21} & w^{(1)}_{22}
    \end{bmatrix}
    %
    \begin{bmatrix}
        x_{0} \\
        x_{1} \\
        x_{2}
    \end{bmatrix} \\
    %
    &=
    \begin{bmatrix}
        w^{(1)}_{00} x_{0} + w^{(1)}_{01} x_{1} + w^{(1)}_{02} x_{2} \\
        w^{(1)}_{10} x_{0} + w^{(1)}_{11} x_{1} + w^{(1)}_{12} x_{2} \\
        w^{(1)}_{20} x_{0} + w^{(1)}_{21} x_{1} + w^{(1)}_{22} x_{2}
    \end{bmatrix} \\
    %
    &=
    \begin{bmatrix}
        z^{(1)}_{0} \\
        z^{(1)}_{1} \\
        z^{(1)}_{2}
    \end{bmatrix}
\end{align*}
$$

Therefore, we understood that $b$ is just a weight that has always $1$ as input.

- In scalar form:
$$
\begin{align*}
    z^{(l)}_{i} &= \sum^{len(l)}_{i=0} \sum^{len(l-1)}_{j=0} w^{(l)}_{ij} \, a^{(l-1)}_{j} \\
\end{align*}
$$

$$
\begin{align*}
    z^{(2)}_{0} &= w^{(2)}_{00} a^{(1)}_{0} + w^{(2)}_{01} a^{(1)}_{1} + w^{(2)}_{02} a^{(1)}_{2} + w^{(2)}_{03} a^{(1)}_{3} \\
    z^{(2)}_{1} &= w^{(2)}_{10} a^{(1)}_{0} + w^{(2)}_{11} a^{(1)}_{1} + w^{(2)}_{12} a^{(1)}_{2} + w^{(2)}_{13} a^{(1)}_{3} \\
    \\
    z^{(1)}_{0} &= w^{(1)}_{00} x_{0} + w^{(1)}_{01} x_{1} + w^{(1)}_{02} x_{2} \\
    z^{(1)}_{1} &= w^{(1)}_{10} x_{0} + w^{(1)}_{11} x_{1} + w^{(1)}_{12} x_{2} \\
    z^{(1)}_{2} &= w^{(1)}_{20} x_{0} + w^{(1)}_{21} x_{1} + w^{(1)}_{22} x_{2}
\end{align*}
$$

In general:

$$
\begin{align*}
    \frac{\partial z^{(l)}_{i}}{\partial a^{(l-1)}_{j}} = w_{ij}^{(l)}
    \hspace{35pt}
    where
    \hspace{10pt}
    a^{(0)}_{j} = x_j
\end{align*}
$$

In specific case:

$l=2$

$$
\begin{align*}
    \frac{\partial z^{(2)}_{0}}{\partial a^{(1)}_{0}} = w_{00}^{(2)} &&&&
    \frac{\partial z^{(2)}_{0}}{\partial a^{(1)}_{1}} = w_{01}^{(2)} &&&&
    \frac{\partial z^{(2)}_{0}}{\partial a^{(1)}_{2}} = w_{02}^{(2)} &&&&
    \frac{\partial z^{(2)}_{0}}{\partial a^{(1)}_{3}} = w_{03}^{(2)} \\
    \\
    \frac{\partial z^{(2)}_{1}}{\partial a^{(1)}_{0}} = w_{10}^{(2)} &&&&
    \frac{\partial z^{(2)}_{1}}{\partial a^{(1)}_{1}} = w_{11}^{(2)} &&&&
    \frac{\partial z^{(2)}_{1}}{\partial a^{(1)}_{2}} = w_{12}^{(2)} &&&&
    \frac{\partial z^{(2)}_{1}}{\partial a^{(1)}_{3}} = w_{13}^{(2)}
\end{align*}
$$

$l=1$

$$
\begin{align*}
    \frac{\partial z^{(1)}_{0}}{\partial a^{(1)}_{0}} = \frac{\partial z^{(1)}_{0}}{\partial x_{0}} = w_{00}^{(1)} &&&&
    \frac{\partial z^{(1)}_{0}}{\partial a^{(1)}_{1}} = \frac{\partial z^{(1)}_{0}}{\partial x_{1}} = w_{01}^{(1)} &&&&
    \frac{\partial z^{(1)}_{0}}{\partial a^{(1)}_{2}} = \frac{\partial z^{(1)}_{0}}{\partial x_{2}} = w_{02}^{(1)} \\
    \\
    \frac{\partial z^{(1)}_{1}}{\partial a^{(1)}_{0}} = \frac{\partial z^{(1)}_{1}}{\partial x_{0}} = w_{10}^{(1)} &&&&
    \frac{\partial z^{(1)}_{1}}{\partial a^{(1)}_{1}} = \frac{\partial z^{(1)}_{1}}{\partial x_{1}} = w_{11}^{(1)} &&&&
    \frac{\partial z^{(1)}_{1}}{\partial a^{(1)}_{2}} = \frac{\partial z^{(1)}_{1}}{\partial x_{2}} = w_{12}^{(1)} \\
    \\
    \frac{\partial z^{(1)}_{2}}{\partial a^{(1)}_{0}} = \frac{\partial z^{(1)}_{2}}{\partial x_{0}} = w_{20}^{(1)} &&&&
    \frac{\partial z^{(1)}_{2}}{\partial a^{(1)}_{1}} = \frac{\partial z^{(1)}_{2}}{\partial x_{1}} = w_{21}^{(1)} &&&&
    \frac{\partial z^{(1)}_{2}}{\partial a^{(1)}_{2}} = \frac{\partial z^{(1)}_{2}}{\partial x_{2}} = w_{22}^{(1)}
\end{align*}
$$

In general:

$$
\begin{align*}
    \frac{\partial z^{(l)}_{i}}{\partial w^{(l)}_{ij}} = a^{(l-1)}_{j}
    \hspace{35pt}
    where
    \hspace{10pt}
    a^{(0)}_{j} = x_j
\end{align*}
$$

In specific case:

$l=2$

$$
\begin{align*}
    \frac{\partial z^{(2)}_{0}}{\partial w_{00}^{(2)}} = a^{(1)}_{0} &&&&
    \frac{\partial z^{(2)}_{0}}{\partial w_{01}^{(2)}} = a^{(1)}_{1} &&&&
    \frac{\partial z^{(2)}_{0}}{\partial w_{02}^{(2)}} = a^{(1)}_{2} &&&&
    \frac{\partial z^{(2)}_{0}}{\partial w_{03}^{(2)}} = a^{(1)}_{3} 
    \\
    \frac{\partial z^{(2)}_{1}}{\partial w_{10}^{(2)}} = a^{(1)}_{0} &&&&
    \frac{\partial z^{(2)}_{1}}{\partial w_{11}^{(2)}} = a^{(1)}_{1} &&&&
    \frac{\partial z^{(2)}_{1}}{\partial w_{12}^{(2)}} = a^{(1)}_{2} &&&&
    \frac{\partial z^{(2)}_{1}}{\partial w_{13}^{(2)}} = a^{(1)}_{3}
\end{align*}
$$

$l=2$

$$
\begin{align*}
    \frac{\partial z^{(1)}_{0}}{\partial w_{00}^{(1)}} = a^{(1)}_{0} = x_{0} &&&&
    \frac{\partial z^{(1)}_{0}}{\partial w_{01}^{(1)}} = a^{(1)}_{1} = x_{1} &&&&
    \frac{\partial z^{(1)}_{0}}{\partial w_{02}^{(1)}} = a^{(1)}_{2} = x_{2} \\
    \\
    \frac{\partial z^{(1)}_{1}}{\partial w_{10}^{(1)}} = a^{(1)}_{0} = x_{0} &&&&
    \frac{\partial z^{(1)}_{1}}{\partial w_{11}^{(1)}} = a^{(1)}_{1} = x_{1} &&&&
    \frac{\partial z^{(1)}_{1}}{\partial w_{12}^{(1)}} = a^{(1)}_{2} = x_{2} \\
    \\
    \frac{\partial z^{(1)}_{2}}{\partial w_{20}^{(1)}} = a^{(1)}_{0} = x_{0} &&&&
    \frac{\partial z^{(1)}_{2}}{\partial w_{21}^{(1)}} = a^{(1)}_{1} = x_{1} &&&&
    \frac{\partial z^{(1)}_{2}}{\partial w_{22}^{(2)}} = a^{(1)}_{2} = x_{2}
\end{align*}
$$

Activation function:
$$
\begin{align*}
    a^{(l)}_{i} = \frac{1}{1 + e^{-z^{(l)}_{i}}} \\
\end{align*}
$$

In general:

$$
\begin{align*}
    \frac{\partial a^{(l)}_{i}}{\partial z^{(l)}_{i}} = a^{(l)}_{i} (1 - a^{(l)}_{i})
    \hspace{35pt}
    where
    \hspace{10pt}
    1 \le l < L
\end{align*}
$$

Note that for $L=2$ we have linear activation that we compute in a different way, as we have already seen before. But for the hidden layer we have the no linear activation functions.

Update weights for $l=2$:
$$
\begin{align*}
    \frac{\partial MSE}{\partial w^{(2)}_{00}} = 
    \frac{\partial MSE}{\partial y^{(2)}_{0}} \,
    \frac{\partial y^{(2)}_{0}}{\partial z^{(2)}_{0}} \,
    \frac{\partial z^{(2)}_{0}}{\partial w^{(2)}_{00}} &&&&
    \frac{\partial MSE}{\partial w^{(2)}_{01}} = 
    \frac{\partial MSE}{\partial y^{(2)}_{0}} \,
    \frac{\partial y^{(2)}_{0}}{\partial z^{(2)}_{0}} \,
    \frac{\partial z^{(2)}_{0}}{\partial w^{(2)}_{01}} &&&&
    \frac{\partial MSE}{\partial w^{(2)}_{02}} = 
    \frac{\partial MSE}{\partial y^{(2)}_{0}} \,
    \frac{\partial y^{(2)}_{0}}{\partial z^{(2)}_{0}} \,
    \frac{\partial z^{(2)}_{0}}{\partial w^{(2)}_{02}} &&&&
    \frac{\partial MSE}{\partial w^{(2)}_{03}} = 
    \frac{\partial MSE}{\partial y^{(2)}_{0}} \,
    \frac{\partial y^{(2)}_{0}}{\partial z^{(2)}_{0}} \,
    \frac{\partial z^{(2)}_{0}}{\partial w^{(2)}_{03}}
\end{align*}
$$

$$
\begin{align*}
    \frac{\partial MSE}{\partial w^{(2)}_{10}} = 
    \frac{\partial MSE}{\partial y^{(2)}_{1}} \,
    \frac{\partial y^{(2)}_{1}}{\partial z^{(2)}_{1}} \,
    \frac{\partial z^{(2)}_{1}}{\partial w^{(2)}_{10}} &&&&
    \frac{\partial MSE}{\partial w^{(2)}_{11}} = 
    \frac{\partial MSE}{\partial y^{(2)}_{1}} \,
    \frac{\partial y^{(2)}_{1}}{\partial z^{(2)}_{1}} \,
    \frac{\partial z^{(2)}_{1}}{\partial w^{(2)}_{11}} &&&&
    \frac{\partial MSE}{\partial w^{(2)}_{12}} = 
    \frac{\partial MSE}{\partial y^{(2)}_{1}} \,
    \frac{\partial y^{(2)}_{1}}{\partial z^{(2)}_{1}} \,
    \frac{\partial z^{(2)}_{1}}{\partial w^{(2)}_{12}} &&&&
    \frac{\partial MSE}{\partial w^{(2)}_{13}} = 
    \frac{\partial MSE}{\partial y^{(2)}_{1}} \,
    \frac{\partial y^{(2)}_{1}}{\partial z^{(2)}_{1}} \,
    \frac{\partial z^{(2)}_{1}}{\partial w^{(2)}_{13}}
\end{align*}
$$

Update weights for $l=1$:
$$
\begin{align*}
    \frac{\partial MSE}{\partial w^{(1)}_{00}} &= 
    \frac{\partial MSE}{\partial y^{(2)}_{0}} \,
    \frac{\partial y^{(2)}_{0}}{\partial z^{(2)}_{0}} \, 
    \frac{\partial z^{(2)}_{0}}{\partial a^{(1)}_{0}} \,
    \frac{\partial a^{(1)}_{0}}{\partial z^{(1)}_{0}} \,
    \frac{\partial z^{(1)}_{0}}{\partial w^{(1)}_{00}} + &&&&&
    %
    \frac{\partial MSE}{\partial w^{(1)}_{01}} &= 
    \frac{\partial MSE}{\partial y^{(2)}_{0}} \,
    \frac{\partial y^{(2)}_{0}}{\partial z^{(2)}_{0}} \, 
    \frac{\partial z^{(2)}_{0}}{\partial a^{(1)}_{0}} \,
    \frac{\partial a^{(1)}_{0}}{\partial z^{(1)}_{0}} \,
    \frac{\partial z^{(1)}_{0}}{\partial w^{(1)}_{01}} + &&&&
    %
    \frac{\partial MSE}{\partial w^{(1)}_{02}} &= 
    \frac{\partial MSE}{\partial y^{(2)}_{0}} \,
    \frac{\partial y^{(2)}_{0}}{\partial z^{(2)}_{0}} \, 
    \frac{\partial z^{(2)}_{0}}{\partial a^{(1)}_{0}} \,
    \frac{\partial a^{(1)}_{0}}{\partial z^{(1)}_{0}} \,
    \frac{\partial z^{(1)}_{0}}{\partial w^{(1)}_{02}} + \\
    %
    &
    \frac{\partial MSE}{\partial y^{(2)}_{1}} \,
    \frac{\partial y^{(2)}_{1}}{\partial z^{(2)}_{1}} \, 
    \frac{\partial z^{(2)}_{1}}{\partial a^{(1)}_{0}} \,
    \frac{\partial a^{(1)}_{0}}{\partial z^{(1)}_{0}} \,
    \frac{\partial z^{(1)}_{0}}{\partial w^{(1)}_{00}} &&&&&
    %
    &
    \frac{\partial MSE}{\partial y^{(2)}_{1}} \,
    \frac{\partial y^{(2)}_{1}}{\partial z^{(2)}_{1}} \, 
    \frac{\partial z^{(2)}_{1}}{\partial a^{(1)}_{0}} \,
    \frac{\partial a^{(1)}_{0}}{\partial z^{(1)}_{0}} \,
    \frac{\partial z^{(1)}_{0}}{\partial w^{(1)}_{01}} &&&&
    %
    &
    \frac{\partial MSE}{\partial y^{(2)}_{1}} \,
    \frac{\partial y^{(2)}_{1}}{\partial z^{(2)}_{1}} \, 
    \frac{\partial z^{(2)}_{1}}{\partial a^{(1)}_{0}} \,
    \frac{\partial a^{(1)}_{0}}{\partial z^{(1)}_{0}} \,
    \frac{\partial z^{(1)}_{0}}{\partial w^{(1)}_{02}}
\end{align*}
$$


$$
\begin{align*}
    \frac{\partial MSE}{\partial w^{(1)}_{10}} &= 
    \frac{\partial MSE}{\partial y^{(2)}_{0}} \,
    \frac{\partial y^{(2)}_{0}}{\partial z^{(2)}_{0}} \, 
    \frac{\partial z^{(2)}_{0}}{\partial a^{(1)}_{1}} \,
    \frac{\partial a^{(1)}_{1}}{\partial z^{(1)}_{1}} \,
    \frac{\partial z^{(1)}_{1}}{\partial w^{(1)}_{10}} + &&&&&
    %
    \frac{\partial MSE}{\partial w^{(1)}_{11}} &= 
    \frac{\partial MSE}{\partial y^{(2)}_{0}} \,
    \frac{\partial y^{(2)}_{0}}{\partial z^{(2)}_{0}} \, 
    \frac{\partial z^{(2)}_{0}}{\partial a^{(1)}_{1}} \,
    \frac{\partial a^{(1)}_{1}}{\partial z^{(1)}_{1}} \,
    \frac{\partial z^{(1)}_{1}}{\partial w^{(1)}_{11}} + &&&&
    %
    \frac{\partial MSE}{\partial w^{(1)}_{12}} &= 
    \frac{\partial MSE}{\partial y^{(2)}_{0}} \,
    \frac{\partial y^{(2)}_{0}}{\partial z^{(2)}_{0}} \, 
    \frac{\partial z^{(2)}_{0}}{\partial a^{(1)}_{1}} \,
    \frac{\partial a^{(1)}_{1}}{\partial z^{(1)}_{1}} \,
    \frac{\partial z^{(1)}_{1}}{\partial w^{(1)}_{12}} + \\
    %
    &
    \frac{\partial MSE}{\partial y^{(2)}_{1}} \,
    \frac{\partial y^{(2)}_{1}}{\partial z^{(2)}_{1}} \, 
    \frac{\partial z^{(2)}_{1}}{\partial a^{(1)}_{1}} \,
    \frac{\partial a^{(1)}_{1}}{\partial z^{(1)}_{1}} \,
    \frac{\partial z^{(1)}_{1}}{\partial w^{(1)}_{10}} &&&&&
    %
    &
    \frac{\partial MSE}{\partial y^{(2)}_{1}} \,
    \frac{\partial y^{(2)}_{1}}{\partial z^{(2)}_{1}} \, 
    \frac{\partial z^{(2)}_{1}}{\partial a^{(1)}_{1}} \,
    \frac{\partial a^{(1)}_{1}}{\partial z^{(1)}_{1}} \,
    \frac{\partial z^{(1)}_{1}}{\partial w^{(1)}_{11}} &&&&
    %
    &
    \frac{\partial MSE}{\partial y^{(2)}_{1}} \,
    \frac{\partial y^{(2)}_{1}}{\partial z^{(2)}_{1}} \, 
    \frac{\partial z^{(2)}_{1}}{\partial a^{(1)}_{1}} \,
    \frac{\partial a^{(1)}_{1}}{\partial z^{(1)}_{1}} \,
    \frac{\partial z^{(1)}_{1}}{\partial w^{(1)}_{12}}
\end{align*}
$$


$$
\begin{align*}
    \frac{\partial MSE}{\partial w^{(1)}_{20}} &= 
    \frac{\partial MSE}{\partial y^{(2)}_{0}} \,
    \frac{\partial y^{(2)}_{0}}{\partial z^{(2)}_{0}} \, 
    \frac{\partial z^{(2)}_{0}}{\partial a^{(1)}_{2}} \,
    \frac{\partial a^{(1)}_{2}}{\partial z^{(1)}_{2}} \,
    \frac{\partial z^{(1)}_{2}}{\partial w^{(1)}_{10}} + &&&&&
    %
    \frac{\partial MSE}{\partial w^{(1)}_{21}} &= 
    \frac{\partial MSE}{\partial y^{(2)}_{0}} \,
    \frac{\partial y^{(2)}_{0}}{\partial z^{(2)}_{0}} \, 
    \frac{\partial z^{(2)}_{0}}{\partial a^{(1)}_{2}} \,
    \frac{\partial a^{(1)}_{2}}{\partial z^{(1)}_{2}} \,
    \frac{\partial z^{(1)}_{2}}{\partial w^{(1)}_{11}} + &&&&
    %
    \frac{\partial MSE}{\partial w^{(1)}_{22}} &= 
    \frac{\partial MSE}{\partial y^{(2)}_{0}} \,
    \frac{\partial y^{(2)}_{0}}{\partial z^{(2)}_{0}} \, 
    \frac{\partial z^{(2)}_{0}}{\partial a^{(1)}_{2}} \,
    \frac{\partial a^{(1)}_{2}}{\partial z^{(1)}_{2}} \,
    \frac{\partial z^{(1)}_{2}}{\partial w^{(1)}_{12}} + \\
    %
    &
    \frac{\partial MSE}{\partial y^{(2)}_{1}} \,
    \frac{\partial y^{(2)}_{1}}{\partial z^{(2)}_{1}} \, 
    \frac{\partial z^{(2)}_{1}}{\partial a^{(1)}_{2}} \,
    \frac{\partial a^{(1)}_{2}}{\partial z^{(1)}_{2}} \,
    \frac{\partial z^{(1)}_{2}}{\partial w^{(1)}_{20}} &&&&&
    % 
    &
    \frac{\partial MSE}{\partial y^{(2)}_{1}} \,
    \frac{\partial y^{(2)}_{1}}{\partial z^{(2)}_{1}} \, 
    \frac{\partial z^{(2)}_{1}}{\partial a^{(1)}_{2}} \,
    \frac{\partial a^{(1)}_{2}}{\partial z^{(1)}_{2}} \,
    \frac{\partial z^{(1)}_{2}}{\partial w^{(1)}_{21}} &&&&
    %
    &
    \frac{\partial MSE}{\partial y^{(2)}_{1}} \,
    \frac{\partial y^{(2)}_{1}}{\partial z^{(2)}_{1}} \, 
    \frac{\partial z^{(2)}_{1}}{\partial a^{(1)}_{2}} \,
    \frac{\partial a^{(1)}_{2}}{\partial z^{(1)}_{2}} \,
    \frac{\partial z^{(1)}_{2}}{\partial w^{(1)}_{22}}
\end{align*}
$$

Decompose

$$
\begin{align*}
    \delta^{(L)} &= \frac{\partial MSE}{\partial y} \,
    \frac{\partial y}{\partial z^{(L)}} \\
    &= 
    \frac{\partial MSE}{\partial a^{(L)}} \,
    \frac{\partial a^{(L)}}{\partial z^{(L)}} \\
\end{align*}
$$

$$
\begin{align*}
    \delta^{(L-1)} &= \delta^{(L)} \,
    \frac{\partial z^{(L)}}{\partial a^{(L-1)}} \,
    \frac{\partial a^{(L-1)}}{\partial z^{(L-1)}} \\
    %
    \delta^{(L-2)} &= \delta^{(L-1)} \,
    \frac{\partial z^{(L-1)}}{\partial a^{(L-2)}} \,
    \frac{\partial a^{(L-2)}}{\partial z^{(L-2)}} \\
    & \vdots \\
    \delta^{(l-1)} &= \delta^{(l)} \,
    \frac{\partial z^{(l)}}{\partial a^{(l-1)}} \,
    \frac{\partial a^{(l-1)}}{\partial z^{(l-1)}}
\end{align*}
$$

Therefore

$$
\begin{align*}
    \frac{\partial MSE}{\partial w^{(l)}_{ij}} = \delta^{(l)} \, \frac{\partial z^{(l)}}{\partial w^{(l)}_{ij}}
\end{align*}
$$

Scalar mode:

$$
\begin{align*}
    \delta^{(L)}_{i} &= 
    \frac{\partial MSE}{\partial a^{(L)}_{i}} \,
    \frac{\partial a^{(L)}_{i}}{\partial z^{(L)}_{i}}
    \hspace{75pt}
    where
    \hspace{10pt}
    0 \le i < len(L) \\
    %
    \delta^{(L-1)}_{k} &= 
    \sum^{len(L)}_{i=0} \delta^{(L)}_{i} \,
    \frac{\partial z^{(L)}_{i}}{\partial a^{(L-1)}_{k}} \,
    \frac{\partial a^{(L-1)}_{k}}{\partial z^{(L-1)}_{k}}
    \hspace{35pt}
    where
    \hspace{10pt}
    0 \le k < len(L-1) \\
    %
    &\vdots\\
    \delta^{(l-1)}_{k} &= 
    \sum^{len(l)}_{i=0} \delta^{(l)}_{i} \,
    \frac{\partial z^{(l)}_{i}}{\partial a^{(l-1)}_{k}} \,
    \frac{\partial a^{(l-1)}_{k}}{\partial z^{(l-1)}_{k}}
    \hspace{45pt}
    where
    \hspace{10pt}
    0 \le k < len(l-1)
\end{align*}
$$

Therefore:

$$
\begin{align*}
    \frac{\partial MSE}{\partial w^{(L)}_{ij}} &= \delta^{(L)}_{i} \, 
    \frac{\partial z^{(L)}_{i}}{\partial w^{(L)}_{ij}}
    \hspace{50pt}
    where
    \hspace{10pt}
    0 \le i < len(L) &
    0 \le j < len(L-1) \\
    %
    \frac{\partial MSE}{\partial w^{(L-1)}_{ij}} &= \delta^{(L-1)}_{i} \, 
    \frac{\partial z^{(L-1)}_{i}}{\partial w^{(L-1)}_{ij}}
    \hspace{35pt}
    where
    \hspace{10pt}
    0 \le i < len(L-1) &
    0 \le j < len(L-2) \\
    %
    & \vdots \\
    \frac{\partial MSE}{\partial w^{(l-1)}_{ij}} &= \delta^{(l-1)}_{i} \, 
    \frac{\partial z^{(l-1)}_{i}}{\partial w^{(l-1)}_{ij}}
    \hspace{40pt}
    where
    \hspace{10pt}
    0 \le i < len(l-1) &
    0 \le j < len(l-2)
\end{align*}
$$

## Implementation

In [166]:
import numpy as np

np.random.seed(0)

### Forward pass

In [167]:
print("Input x:")
x = np.array([[1, 3]]).T
print(x, "\n")

print("Add 1 to input x:")
x = np.array([[1, 3, 1]]).T
print(x, "\n")

print("Layer 1:")

W_l1 = np.random.rand(3, x.shape[0])
print("W_l1", "\n", W_l1, "\n")

Input x:
[[1]
 [3]] 

Add 1 to input x:
[[1]
 [3]
 [1]] 

Layer 1:
W_l1 
 [[0.5488135  0.71518937 0.60276338]
 [0.54488318 0.4236548  0.64589411]
 [0.43758721 0.891773   0.96366276]] 



In [168]:
x.shape[0]

3

In [169]:
l1_size = 3
l2_size = 2

print("Input x:")
x = np.array([[1, 3]]).T
print(x, "\n")

print("Add 1 to input x:")
x = np.array([[1, 3, 1]]).T
print(x, "\n")

print("Layer 1:")
W_l1 = np.random.rand(l1_size, x.shape[0])
z_l1 = W_l1 @ x
a_l1 = 1 / (1 + np.exp(-z_l1))
a_l1 = np.vstack([a_l1, 1]) # add 1 input
print("W_l1", "\n", W_l1, "\n")
print("z_l1", "\n", z_l1, "\n")
print("a_l1", "\n", a_l1, "\n")

print("Layer 2:")
W_l2 = np.random.rand(l2_size, a_l1.shape[0])
z_l2 = W_l2 @ a_l1
print("W_l2", "\n", W_l2, "\n")
print("z_l2", "\n", z_l2, "\n")

y = z_l2
print("y", "\n", y, "\n")

t = np.random.randint(5, size=(2,1))
print("t" "\n", t, "\n")
MSE = 0.5 * ( (t[0] - y[0])** 2 + (t[1] - y[1])** 2 )
print("MSE", "\n", MSE, "\n")

Input x:
[[1]
 [3]] 

Add 1 to input x:
[[1]
 [3]
 [1]] 

Layer 1:
W_l1 
 [[0.38344152 0.79172504 0.52889492]
 [0.56804456 0.92559664 0.07103606]
 [0.0871293  0.0202184  0.83261985]] 

z_l1 
 [[3.28751155]
 [3.41587053]
 [0.98040434]] 

a_l1 
 [[0.96399789]
 [0.96819686]
 [0.72718844]
 [1.        ]] 

Layer 2:
W_l2 
 [[0.77815675 0.87001215 0.97861834 0.79915856]
 [0.46147936 0.78052918 0.11827443 0.63992102]] 

z_l2 
 [[3.10328301]
 [1.92649985]] 

y 
 [[3.10328301]
 [1.92649985]] 

t
 [[1]
 [1]] 

MSE 
 [2.64110069] 



## Backward pass

$$
\begin{align*}
    \delta^{(L)}_{i} &= 
    \frac{\partial MSE}{\partial a^{(L)}_{i}} \,
    \frac{\partial a^{(L)}_{i}}{\partial z^{(L)}_{i}}
    \hspace{35pt}
    where
    \hspace{10pt}
    0 \le i < len(L) \\
    %
    &= (y_i - t_i) \, 1 \\
    &= (y_i - t_i)
\end{align*}
$$

In [170]:
L2_size = y.shape[0]

delta_L2 = np.zeros(L2_size)
for i in range(L2_size):
    delta_L2[i] = y[i] - t[i]

print(f"Dimension of L2-layer: {L2_size}")
print(f"delta_L2: {delta_L2}")

Dimension of L2-layer: 2
delta_L2: [2.10328301 0.92649985]


$$
\begin{align*}    
    \frac{\partial MSE}{\partial w^{(L)}_{ij}} &= \delta^{(L)}_{i} \, \frac{\partial z^{(L)}_{i}}{\partial w^{(L)}_{ij}}
    \hspace{35pt}
    where
    \hspace{10pt}
    0 \le i < len(L) &
    0 \le j < len(L-1) \\
    %
    &= \delta^{(L)}_{i} \, a^{(L-1)}_{j}
\end{align*}
$$

In [171]:
l1_size = a_l1.shape[0]

W_l2_grad = np.zeros_like(W_l2)
for i in range(L2_size):
    for j in range(l1_size):
        W_l2_grad[i,j] = delta_L2[i] * a_l1[j]

print("W_grad of layers 2:")
print(W_l2_grad)

W_grad of layers 2:
[[2.02756038 2.03639201 1.52948309 2.10328301]
 [0.8931439  0.89703425 0.67373998 0.92649985]]


 $$
\begin{align*}   
    \delta^{(L-1)}_{k} &= 
    \sum^{len(L)}_{i=0} \delta^{(L)}_{i} \,
    \frac{\partial z^{(L)}_{i}}{\partial a^{(L-1)}_{k}} \,
    \frac{\partial a^{(L-1)}_{k}}{\partial z^{(L-1)}_{k}}
    \hspace{35pt}
    where
    \hspace{10pt}
    0 \le k < len(L-1) \\
    %
    &= \sum^{len(L)}_{i=0} \delta^{(L)}_{i} \, w^{(L)}_{ik} \, a^{(L-1)}_{k} \, (1 - a^{(L-1)}_{k})
\end{align*}
$$

In [172]:
l1_size = z_l1.shape[0]

delta_l1 = np.zeros(l1_size)
for k in range(l1_size):
    acc = 0
    for i in range(L2_size):
        acc += delta_L2[i] * W_l2[i,k] * a_l1[k] * (1 - a_l1[k])
    delta_l1[k] = acc

print(f"Dimension of L1-layer: {l1_size}")
print(f"delta_L1: {delta_l1}")

Dimension of L1-layer: 3
delta_L1: [0.07164158 0.07861249 0.43007826]


$$
\begin{align*}
    \frac{\partial MSE}{\partial w^{(L-1)}_{ij}} &= \delta^{(L-1)}_{i} \,
    \frac{\partial z^{(L-1)}_{i}}{\partial w^{(L-1)}_{ij}}
    \hspace{35pt}
    where
    \hspace{10pt}
    0 \le i < len(L-1) &
    0 \le j < len(L-2) \\
    %
    &= \delta^{(L-1)}_{i} \, a^{(L-2)}_{j} \\
    &= \delta^{(L-1)}_{i} \, x_{j} \\
\end{align*}
$$

In [173]:
l0_size = x.shape[0]

W_l1_grad = np.zeros_like(W_l1)
for i in range(l1_size):
    for j in range(l0_size):
        W_l1_grad[i,j] = delta_l1[i] * x[j]

print("W_grad of layers 1:")
print(W_l1_grad)

W_grad of layers 1:
[[0.07164158 0.21492474 0.07164158]
 [0.07861249 0.23583748 0.07861249]
 [0.43007826 1.29023479 0.43007826]]


## Gradient descent

In [174]:
mu = 0.01

In [175]:
print("w_l1 before:")
print(W_l1)
W_l1 = W_l1 - mu * W_l1_grad

print("\n")

print("w_l1 new:")
print(W_l1)

w_l1 before:
[[0.38344152 0.79172504 0.52889492]
 [0.56804456 0.92559664 0.07103606]
 [0.0871293  0.0202184  0.83261985]]


w_l1 new:
[[0.3827251  0.78957579 0.5281785 ]
 [0.56725844 0.92323826 0.07024993]
 [0.08282852 0.00731605 0.82831906]]


In [176]:
print("w_l2 before:")
print(W_l2)
W_l2 = W_l2 - mu * W_l2_grad

print("\n")

print("w_l2 new:")
print(W_l2)

w_l2 before:
[[0.77815675 0.87001215 0.97861834 0.79915856]
 [0.46147936 0.78052918 0.11827443 0.63992102]]


w_l2 new:
[[0.75788115 0.84964823 0.96332351 0.77812573]
 [0.45254792 0.77155883 0.11153703 0.63065602]]


In [177]:
print("BEFORE")
print("MSE", "\n", MSE, "\n")

x = np.array([[1,3]]).T
x = np.vstack([x, [1]])
z_l1 = W_l1 @ x
a_l1 = 1 / (1 + np.exp(-z_l1))
a_l1 = np.vstack([a_l1, 1]) # add 1 input
z_l2 = W_l2 @ a_l1
y = z_l2
MSE = 0.5 * ( (t[0] - y[0])** 2 + (t[1] - y[1])** 2 )

print("AFTER")
print("MSE", "\n", MSE, "\n")

BEFORE
MSE 
 [2.64110069] 

AFTER
MSE 
 [2.44414192] 



## Problem

In [233]:
X = np.random.rand(1000, 2)
#y = 50 * np.exp(np.sin(X)) # too difficult?
y = 15 * np.sin(X)

In [234]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

In [235]:
X_train.shape, y_train.shape

((670, 2), (670, 2))

In [236]:
in_size = 2
hid_1_size = 3
out_size = 2

sizes = [in_size, hid_1_size, out_size]

In [237]:
W = []

W.append(np.random.rand(sizes[1], sizes[0]))
print("W_l1 | Layer 1, with index 0")
print(W[0])

print("\n")

W.append(np.random.rand(sizes[2], sizes[1]))
print("W_l2 | Layer 2, with index 1")
print(W[1])

W_l1 | Layer 1, with index 0
[[0.74669869 0.96823925]
 [0.31578774 0.45369754]
 [0.47384458 0.28591303]]


W_l2 | Layer 2, with index 1
[[0.28946358 0.33217074 0.73033831]
 [0.7637502  0.85113377 0.67101466]]


In [238]:
z = []
a = []

# Layer 1, with index 0
z.append(np.zeros(sizes[1]))
a.append(np.zeros(sizes[1]))
print("z_l1 | Layer 1, with index 0:")
print(z[0])
print("a_l1 | Layer 1, with index 0:")
print(a[0])

print("\n")

# Layer 2, with index 1
a.append(np.zeros(sizes[2]))
z.append(np.zeros(sizes[2]))
print("z_l2 | Layer 2, with index 1:")
print(z[1])
print("a_l2 = y | Layer 2, with index 1:")
print(a[1])

z_l1 | Layer 1, with index 0:
[0. 0. 0.]
a_l1 | Layer 1, with index 0:
[0. 0. 0.]


z_l2 | Layer 2, with index 1:
[0. 0.]
a_l2 = y | Layer 2, with index 1:
[0. 0.]


In [239]:
def forward(xi, log=False):

    # Input
    if log: print("Input x:")
    x = np.array([xi]).T
    if log: print(x, "\n")

    # Layer 1
    if log: print("Layer 1:")
    z[0] = W[0] @ x
    a[0] = 1 / (1 + np.exp(-z[0]))
    if log: print("W_l1", "\n", W[0], "\n")
    if log: print("z_l1", "\n", z[0], "\n")
    if log: print("a_l1", "\n", a[0], "\n")

    # Layer 2
    if log: print("Layer 2:")
    z[1] = W[1] @ a[0]
    a[1] = z[1]
    if log: print("W_l2", "\n", W[1], "\n")
    if log: print("z_l2", "\n", z[1], "\n")
    if log: print("y", "\n", a[1], "\n")

    return a[1]


def loss_MSE(x, t, log=False):
    # MSE
    if log: print("t" "\n", t, "\n")

    try:
        MSE = 0.5 * ( (t[0] - x[0])** 2 + (t[1] - x[1])** 2 )
        return MSE
    except Exception as e:
        print(e)


def backward(t):

    # Initialization
    W_grad = []
    for w in W:
        W_grad.append(np.zeros_like(w))

    delta = []
    for i in range(1, len(sizes)):
        delta.append(np.zeros(sizes[i]))

    # -------------------------------
    # LAYER 2
    # compute delta
    for i in range(sizes[2]):
        delta[1][i] = a[1][i] - t[i]

    # compute W_grad
    for i in range(sizes[2]):
        for j in range(sizes[1]):
            W_grad[1][i,j] = delta[1][i] * a[0][j]

    # -------------------------------
    # LAYER 1
    # compute delta
    for k in range(sizes[1]):
        acc = 0
        for i in range(sizes[2]):
            acc += delta[1][i] * W[1][i,k] * a[0][k] * (1 - a[0][k])
        delta[0][k] = acc

    # compute W_grad
    for i in range(sizes[1]):
        for j in range(sizes[0]):
            W_grad[0][i,j] = delta[0][i] * x[j]

    # UPDATE WEIGHTS
    mu = 0.00001
    for idx, w_grad in enumerate(W_grad):
        W[idx] = W[idx] - mu * w_grad

In [240]:
EPOCHS = 300
for ep in range(EPOCHS):
    for i in range(X_train.shape[0]):
        target = y_train[i]
        out = forward(X_train[i])
        loss = loss_MSE(out, target)
        backward(target)

    if ep%10==0:
        loss = 0
        for j in range(X_train.shape[0]):
            target = y_train[j]
            out = forward(X_train[j])
            loss += loss_MSE(out, target)
        l_train = loss / X_train.shape[0]

        loss = 0
        for j in range(X_test.shape[0]):
            target = y_test[j]
            out = forward(X_test[j])
            loss += loss_MSE(out, target)
        l_test = loss / X_test.shape[0]

        print(f"Epoch: {ep}, \t Test loss: {l_test}, \t Train loss: {l_train}")

Epoch: 0, 	 Test loss: [43.95948972], 	 Train loss: [45.0018069]
Epoch: 10, 	 Test loss: [37.46226462], 	 Train loss: [38.31239628]
Epoch: 20, 	 Test loss: [31.34839281], 	 Train loss: [32.00964916]
Epoch: 30, 	 Test loss: [26.13928967], 	 Train loss: [26.6355257]
Epoch: 40, 	 Test loss: [22.01773909], 	 Train loss: [22.37745752]
Epoch: 50, 	 Test loss: [18.90758545], 	 Train loss: [19.15606628]
Epoch: 60, 	 Test loss: [16.62757953], 	 Train loss: [16.78534274]
Epoch: 70, 	 Test loss: [14.98516174], 	 Train loss: [15.06855663]
Epoch: 80, 	 Test loss: [13.81419841], 	 Train loss: [13.83629569]
Epoch: 90, 	 Test loss: [12.98386605], 	 Train loss: [12.95519922]
Epoch: 100, 	 Test loss: [12.3960044], 	 Train loss: [12.32512389]
Epoch: 110, 	 Test loss: [11.97904215], 	 Train loss: [11.87292684]
Epoch: 120, 	 Test loss: [11.68175715], 	 Train loss: [11.54612643]
Epoch: 130, 	 Test loss: [11.46796921], 	 Train loss: [11.30752554]
Epoch: 140, 	 Test loss: [11.31237909], 	 Train loss: [11.1310

In [241]:
# Prediction
for i in range(5):
    out = forward(X_test[i])
    loss = loss_MSE(out, y_test[i])
    print(f"input \t{X_test[i][0]} {X_test[i][1]}")
    print(f"target \t{y_test[i][0]} {y_test[i][1]}")
    print(f"out \t{out[0][0]} {out[1][0]}")
    print(f"loss \t{loss} \n")

input 	0.6054662992640721 0.6300245595074443
target 	8.53718319800151 8.83746903783294
out 	7.0828813315987516 7.26092870559477
loss 	[2.30023667] 

input 	0.6525266404212424 0.05851174583322949
target 	9.107938340591033 0.8771754676180992
out 	5.656536016550342 5.794287243201136
loss 	[18.04508311] 

input 	0.9950727226478373 0.24058989751789528
target 	12.58197842465046 3.5741336234480428
out 	6.676524804947052 6.839117833788919
loss 	[22.76725217] 

input 	0.41620098844153264 0.2524901502675777
target 	6.064330252047687 3.7472389084557305
out 	5.8695629016157325 6.016078581608252
loss 	[2.59278389] 

input 	0.45722022198423384 0.9436332676510186
target 	6.621833518658247 12.145434349640754
out 	7.497821335317667 7.688697472024757
loss 	[10.31492913] 

