## Gradients for vectorized code
https://www.youtube.com/watch?v=d14TUNcbn1k&list=PL3FW7Lu3i5JvHM8ljYj-zLfQRF3EO8sYv&index=5&t=2253s

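Locally, a node takes inputs $x, y$ and produces an output $z$.  
Local gradients: $\frac{\partial z}{\partial x}$, $\frac{\partial z}{\partial y}$  
Upstream gradient: $\frac{\partial L}{\partial z}$  
The chain rule then gives $$\frac{\partial L}{\partial x}=\frac{\partial L}{\partial z}\frac{\partial z}{\partial x}, \qquad \frac{\partial L}{\partial y}=\frac{\partial L}{\partial z}\frac{\partial z}{\partial y}$$  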

A vectorized example:
\begin{align}
f(x, W)=\Vert W\cdot x\Vert ^2 = \sum _{i=1} ^n(W\cdot x)_i^2
\end{align}
where $x\in \mathbb{R}^n, W\in \mathbb{R}^{n\times n}$.  
Then
\begin{align}
q = W\cdot x &= 
\begin{pmatrix}
W_{1, 1}x_1+\cdots +W_{1, n}x_n\\
\vdots \\
W_{n, 1}x_1+\cdots +W_{n, n}x_n\\
\end{pmatrix}\\
f(q) = \Vert q \Vert^2 &= q_1^2+\cdots + q_n^2\\
\frac{\partial f}{\partial q_i} &= 2q_i\\
\nabla_q f &= 2q\\
\frac{\partial q_k}{\partial W_{i, j}} &= \mathbf{1}_{k=i}\,x_j\\
\frac{\partial f}{\partial W_{i,j}} 
&= \sum_k \frac{\partial f}{\partial q_k} \frac{\partial q_k}{\partial W_{i,j}}
= \sum_k (2q_k)(\mathbf{1}_{k=i}\,x_j) = 2 q_i x_j\\
\nabla_W f &= 2q\,x^T
\end{align}
* Always check: the gradient with respect to a variable should have the same shape as the variable
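The shape check can be paired with a numerical check. The following is a minimal sketch, assuming NumPy and a small $n$; the helper names `f`, `analytic_grad_W` and `numerical_grad_W` are illustrative and not from the lecture. It compares the analytic gradient $2q\,x^T$ with centered finite differences.

```python
import numpy as np

def f(W, x):
    """f(x, W) = ||W x||^2."""
    q = W.dot(x)
    return np.sum(q ** 2)

def analytic_grad_W(W, x):
    """Gradient from the derivation: df/dW_ij = 2 q_i x_j, i.e. 2 q x^T."""
    q = W.dot(x)
    return 2.0 * np.outer(q, x)

def numerical_grad_W(W, x, eps=1e-6):
    """Centered finite differences, one entry of W at a time (slow, for checking only)."""
    grad = np.zeros_like(W)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            W_plus, W_minus = W.copy(), W.copy()
            W_plus[i, j] += eps
            W_minus[i, j] -= eps
            grad[i, j] = (f(W_plus, x) - f(W_minus, x)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
n = 4                                  # arbitrary small size
W = rng.standard_normal((n, n))
x = rng.standard_normal(n)

dW = analytic_grad_W(W, x)
assert dW.shape == W.shape             # gradient has the same shape as the variable
max_err = np.max(np.abs(dW - numerical_grad_W(W, x)))
print("max abs difference:", max_err)
```

The finite-difference loop costs two function evaluations per entry of $W$, so it is only practical as a test on small inputs.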

In [None]:
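class ComputationalGraph(object):
    def __init__(self, graph):
        # graph is assumed to provide nodes_topologically_sorted(), returning a list of gates
        self.graph = graph

    def forward(self, inputs):
        # 1. [pass inputs to input gates...]
        # 2. forward the computational graph:
        loss = None
        for gate in self.graph.nodes_topologically_sorted():
            loss = gate.forward()
        return loss  # the final gate in the graph outputs the loss

    def backward(self):
        inputs_gradients = None
        for gate in reversed(self.graph.nodes_topologically_sorted()):
            inputs_gradients = gate.backward()  # little piece of backprop (chain rule applied)
        return inputs_gradients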

In [None]:
class MultiplyGate(object):
    def forward(self, x, y):
        z = x*y
        # cache the inputs; the backward pass needs them
        self.x = x
        self.y = y
        return z
    def backward(self, dz):
        # dz is the upstream gradient dL/dz
        dx = self.y * dz # [dz/dx * dL/dz]
        dy = self.x * dz # [dz/dy * dL/dz]
        return [dx, dy]
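To see the gate API in use, here is a minimal sketch that chains an add gate and a multiply gate to compute $f = (x + y)\,z$. `AddGate` is a hypothetical companion gate, the `MultiplyGate` here is a local copy so the block runs on its own, and the input values are arbitrary.

```python
class MultiplyGate(object):
    def forward(self, x, y):
        self.x, self.y = x, y    # cache inputs for the backward pass
        return x * y

    def backward(self, dz):
        # local gradients are dz/dx = y and dz/dy = x
        return [self.y * dz, self.x * dz]

class AddGate(object):
    """Hypothetical companion gate: z = x + y."""
    def forward(self, x, y):
        return x + y

    def backward(self, dz):
        # both local gradients are 1, so the upstream gradient passes through
        return [dz, dz]

x, y, z = -2.0, 5.0, -4.0
add, mul = AddGate(), MultiplyGate()

# forward pass in topological order
q = add.forward(x, y)        # q = x + y = 3
f = mul.forward(q, z)        # f = q * z = -12

# backward pass in reverse order, starting from df/df = 1
dq, dz = mul.backward(1.0)   # dq = z = -4, dz = q = 3
dx, dy = add.backward(dq)    # dx = dy = -4

print(f, dx, dy, dz)         # -12.0 -4.0 -4.0 3.0
```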

In [None]:
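# E.g. for the SVM (skeleton only; each ... is a step left to fill in, so this does not run as-is):
# receive W, X
scores = ...     # [f = W*x]
margins = ...    # [max(0, s_j - s_{y_i} + 1)]
data_loss = ...
reg_loss = ...
loss = data_loss + reg_loss
dmargins = ...
dscores = ...
dW = ...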

\begin{align}
&\text{score:}\ &f = Wx\\
&\text{margins:}\ &\text{max}(0, s_j-s_{y_i}+1)
\end{align}
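One possible way to carry out the SVM steps (scores, margins, loss, then the gradients in reverse order) is sketched below. The conventions are assumptions that the notes do not fix: `X` has shape `(N, D)` with one example per row, `W` has shape `(D, C)`, the data loss is averaged over the `N` examples, and `reg` is an L2 regularization strength.

```python
import numpy as np

def svm_loss(W, X, y, reg=1e-3):
    """Multiclass SVM loss and its gradient with respect to W.

    Assumed shapes (not fixed by the notes): X is (N, D), W is (D, C),
    y holds N integer class labels.
    """
    N = X.shape[0]
    rows = np.arange(N)

    # forward pass
    scores = X.dot(W)                                  # (N, C)
    correct = scores[rows, y][:, None]                 # (N, 1), the scores s_{y_i}
    margins = np.maximum(0.0, scores - correct + 1.0)  # max(0, s_j - s_{y_i} + 1)
    margins[rows, y] = 0.0                             # skip the j == y_i term
    data_loss = margins.sum() / N
    reg_loss = reg * np.sum(W * W)
    loss = data_loss + reg_loss

    # backward pass
    dmargins = np.ones_like(margins) / N               # d(data_loss)/d(margins)
    dscores = dmargins * (margins > 0)                 # gradient flows only through active margins
    dscores[rows, y] = -dscores.sum(axis=1)            # each active margin also pulls on s_{y_i}
    dW = X.T.dot(dscores) + 2.0 * reg * W              # same shape as W
    return loss, dW

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 4))        # 5 examples, 4 features (arbitrary)
y = rng.integers(0, 3, size=5)         # 3 classes (arbitrary)
W = 0.1 * rng.standard_normal((4, 3))

loss, dW = svm_loss(W, X, y)
print(loss, dW.shape)
```

The hinge has kinks where a margin is exactly zero, so a finite-difference check of `dW` can disagree at those points even when the code is correct.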
## Summary
* neural nets will be very large: it is impractical to write down the gradient formula by hand for all parameters
* **backpropagation** = recursive application of the chain rule along a computational graph to compute the gradients of all inputs/parameters/intermediates
* implementations maintain a graph structure, where the nodes implement the **forward()/backward()** API
* **forward:** compute the result of an operation and save in memory any intermediates needed for the gradient computation
* **backward:** apply the chain rule to compute the gradient of the loss function with respect to the inputs

Linear score function: $f=Wx$  
2-layer neural network: $f=W_2\max(0, W_1x)$  
Or 3-layer NN: $f=W_3\max(0, W_2\max(0, W_1x))$
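The 2-layer form can be written directly in NumPy. This is a minimal sketch for a single input vector; the layer sizes, the random weights and the function name `two_layer_forward` are assumptions chosen for illustration.

```python
import numpy as np

def two_layer_forward(W1, W2, x):
    """f = W2 max(0, W1 x) for a single input vector x."""
    a = W1.dot(x)             # pre-activation, shape (H,)
    h = np.maximum(0.0, a)    # elementwise max(0, .)
    f = W2.dot(h)             # output scores, shape (C,)
    return f, h

rng = np.random.default_rng(0)
D, H, C = 5, 4, 3             # input, hidden and output sizes (arbitrary)
W1 = rng.standard_normal((H, D))
W2 = rng.standard_normal((C, H))
x = rng.standard_normal(D)

f, h = two_layer_forward(W1, W2, x)
print(f.shape, h.shape)       # (3,) (4,)
```

A 3-layer version would apply `np.maximum(0.0, W2.dot(h))` once more before a final `W3.dot(...)`.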

In [1]:
# 2-layer NN
import numpy as np
from numpy.random import randn
import matplotlib.pyplot as plt

In [18]:
N, D_in, H, D_out = 64, 1000, 100, 10
x, y = randn(N, D_in), randn(N, D_out)
w1, w2 = randn(D_in, H), randn(H, D_out)

In [19]:
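for t in range(200):
    # forward pass: sigmoid hidden layer, linear output, squared-error loss
    h = 1/(1+np.exp(-x.dot(w1)))
    y_pred = h.dot(w2)
    loss = np.square(y_pred-y).sum()
    # print(loss)
    
    # backward pass: chain rule applied in reverse order
    grad_y_pred = 2.0 * (y_pred-y)
    grad_w2 = h.T.dot(grad_y_pred)
    grad_h = grad_y_pred.dot(w2.T)
    grad_w1 = x.T.dot(grad_h*h*(1-h))  # h*(1-h) is the derivative of the sigmoid
    
    # gradient descent step with learning rate 1e-4
    w1 -= 1e-4*grad_w1
    w2 -= 1e-4*grad_w2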
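The backward pass in the loop can be checked against a finite difference. The sketch below wraps the same forward and backward formulas in a hypothetical helper `loss_and_grads` and uses much smaller, arbitrary sizes (with `_s` suffixes so the arrays above are not overwritten); it checks one entry of the `w1` gradient.

```python
import numpy as np

def loss_and_grads(x, y, w1, w2):
    """Same forward and backward formulas as the training loop, for one step."""
    h = 1.0 / (1.0 + np.exp(-x.dot(w1)))
    y_pred = h.dot(w2)
    loss = np.square(y_pred - y).sum()
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h.T.dot(grad_y_pred)
    grad_h = grad_y_pred.dot(w2.T)
    grad_w1 = x.T.dot(grad_h * h * (1 - h))
    return loss, grad_w1, grad_w2

rng = np.random.default_rng(0)
x_s, y_s = rng.standard_normal((6, 5)), rng.standard_normal((6, 2))
w1_s, w2_s = rng.standard_normal((5, 4)), rng.standard_normal((4, 2))

loss_s, grad_w1_s, grad_w2_s = loss_and_grads(x_s, y_s, w1_s, w2_s)

# centered finite difference for the single entry w1[0, 0]
eps = 1e-5
w1_plus, w1_minus = w1_s.copy(), w1_s.copy()
w1_plus[0, 0] += eps
w1_minus[0, 0] -= eps
numeric = (loss_and_grads(x_s, y_s, w1_plus, w2_s)[0]
           - loss_and_grads(x_s, y_s, w1_minus, w2_s)[0]) / (2 * eps)

print("analytic:", grad_w1_s[0, 0], "numeric:", numeric)
```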

A single neuron, loosely modeled on a biological neuron in the brain:

In [None]:
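import math

class Neuron:
    def __init__(self, weights, bias):
        self.weights = weights  # one weight per input (1-D array)
        self.bias = bias        # scalar bias

    def neuron_tick(self, inputs):
        cell_body_sum = np.sum(inputs*self.weights) + self.bias  # add the bias once, outside the sum
        firing_rate = 1.0/(1.0+math.exp(-cell_body_sum))  # sigmoid activation
        return firing_rate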

In [None]:
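# forward pass of a 3-layer neural network
f = lambda x: 1.0/(1.0+np.exp(-x))  # sigmoid activation
x = np.random.randn(3, 1)           # random input vector (3x1)
# Assumed sizes: two hidden layers of 4 units and one output; the notes do not define the weights.
w1, b1 = np.random.randn(4, 3), np.random.randn(4, 1)
w2, b2 = np.random.randn(4, 4), np.random.randn(4, 1)
w3, b3 = np.random.randn(1, 4), np.random.randn(1, 1)
h1 = f(np.dot(w1, x)+b1)   # first hidden layer activations (4x1)
h2 = f(np.dot(w2, h1)+b2)  # second hidden layer activations (4x1)
out = np.dot(w3, h2)+b3    # output (1x1)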

## Summary2
* We arrange neurons into fully-connected layers
* The layer abstraction has the nice property that it allows us to use efficient vectorized code (e.g. matrix multiplies)
* Neural networks are not really neural
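To illustrate the point about vectorized code, this small sketch computes one fully-connected layer twice: one neuron at a time, and as a single matrix multiply. The sizes and names are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))    # 4 neurons, each with 3 input weights
b = rng.standard_normal(4)         # one bias per neuron
x = rng.standard_normal(3)         # a single input vector

sigmoid = lambda s: 1.0 / (1.0 + np.exp(-s))

# one neuron at a time
loop_out = np.array([sigmoid(np.sum(W[i] * x) + b[i]) for i in range(4)])

# the whole layer at once
layer_out = sigmoid(W.dot(x) + b)

print(np.allclose(loop_out, layer_out))   # True
```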