# Gradient Descent

### Loss graph

$ \text{loss}(w) = { 1 \over N} \sum^{N}_{n=1} (\hat{y}_n-y_n)^2 = { 1 \over N} \sum^{N}_{n=1} (w*x_n - y_n)^2 $
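To make the formula concrete, here is a minimal sketch that evaluates the mean squared error for a few values of `w`. It uses the toy data from the training example in this notebook (`x = 1, 2, 3`, `y = 2, 4, 6`); the helper name `mse_loss` is an illustrative choice, not something defined elsewhere in the notebook.

```python
def mse_loss(w, xs, ys):
    """Mean squared error of the model y_hat = w * x over the whole dataset."""
    n = len(xs)
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / n


# Toy data from this notebook: y is exactly 2 * x
x_data = [1, 2, 3]
y_data = [2, 4, 6]

# Evaluate the loss at a few candidate weights
for w in [0.0, 1.0, 2.0, 3.0]:
    print(f"w = {w:.1f}  loss = {mse_loss(w, x_data, y_data):.4f}")
```

The loss is zero at `w = 2` and grows symmetrically on either side, which is the bowl shape a loss graph would show.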

### What is learning? Finding the w that minimizes the loss

$ {\min}_w  \text{loss}(w) $
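Before turning to gradient descent, the minimization can be illustrated with a brute-force sweep over candidate weights. This is only a sketch: the grid `0.0, 0.1, ..., 4.0` and the helper `mse_loss` are assumptions made for illustration, and the data is the toy set used in this notebook.

```python
def mse_loss(w, xs, ys):
    """Mean squared error of the model y_hat = w * x over the whole dataset."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


x_data = [1, 2, 3]
y_data = [2, 4, 6]

# Hypothetical search grid: 0.0, 0.1, ..., 4.0
candidates = [i / 10 for i in range(41)]

# Pick the candidate weight with the smallest loss
best_w = min(candidates, key=lambda w: mse_loss(w, x_data, y_data))
print("best w on the grid:", best_w)
```

A sweep like this only works for a single parameter on a small range. Gradient descent reaches the same minimum without evaluating every candidate.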

### Gradient descent algorithm


$$ \text{Gradient} \Rightarrow { \partial \text{loss} \over \partial w } $$

$$ w = w - \alpha {\partial \text{loss} \over \partial w} $$ 

For a single training sample $(x, y)$:

$$ \text{loss} = (\hat{y} - y) ^2 = (w*x -y)^2  $$

$$ {\partial \text{loss} \over \partial w} = 2x(w*x - y)  $$

$$ w = w - \alpha * 2 * x (w*x - y)  $$ 
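The update rule above can be traced by hand for one step. The sketch below assumes a starting weight `w = 1.0`, a learning rate `alpha = 0.01`, and the sample `(x, y) = (1, 2)`; these match the values used in the training code in this notebook, but here `w` is passed explicitly instead of being read from a global.

```python
def gradient(w, x, y):
    """Derivative of the per-sample loss (w * x - y) ** 2 with respect to w."""
    return 2 * x * (w * x - y)


w = 1.0        # initial guess
alpha = 0.01   # learning rate
x, y = 1, 2    # one training sample

grad = gradient(w, x, y)   # 2 * 1 * (1.0 * 1 - 2) = -2.0
w_new = w - alpha * grad   # 1.0 - 0.01 * (-2.0) = 1.02

print("grad:", grad, "updated w:", w_new)
```

The gradient is negative because the prediction is too small, so subtracting `alpha * grad` increases `w` toward the true value of 2.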

import numpy as np
import matplotlib.pyplot as plt

x_data = [1, 2, 3]
y_data = [2, 4, 6]

w = 1.0  # initial guess for the weight

# A linear model for the forward pass
def forward(x):
    return x * w

# Loss function
def loss(x, y):
    y_pred = forward(x)
    return (y_pred - y) * (y_pred - y)

def gradient(x, y):
    return 2 * x * (x * w - y)

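for epoch in range(50):
    # Stochastic updates: adjust w after every single sample
    for x, y in zip(x_data, y_data):
        grad = gradient(x, y)
        w = w - 0.01 * grad  # learning rate alpha = 0.01
        print('\t grad:', x, y, round(grad, 6))
        sample_loss = loss(x, y)  # loss on this sample after the update
    # Report progress every 10 epochs, using the last sample's loss
    if epoch % 10 == 0:
        print("progress:", epoch, "w =", round(w, 2), "loss =", round(sample_loss, 6))
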
print("predict (after training)", "4 hours", forward(4))

	 grad: 1 2 -2.0
	 grad: 2 4 -7.84
	 grad: 3 6 -16.2288
progress: 0 w = 1.26 loss = 4.91924
	 grad: 1 2 -1.478624
	 grad: 2 4 -5.796206
	 grad: 3 6 -11.998147
	 grad: 1 2 -1.093164
	 grad: 2 4 -4.285205
	 grad: 3 6 -8.870374
	 grad: 1 2 -0.80819
	 grad: 2 4 -3.168103
	 grad: 3 6 -6.557974
	 grad: 1 2 -0.597504
	 grad: 2 4 -2.342217
	 grad: 3 6 -4.848389
	 grad: 1 2 -0.441742
	 grad: 2 4 -1.731629
	 grad: 3 6 -3.584472
	 grad: 1 2 -0.326585
	 grad: 2 4 -1.280214
	 grad: 3 6 -2.650043
	 grad: 1 2 -0.241448
	 grad: 2 4 -0.946478
	 grad: 3 6 -1.959209
	 grad: 1 2 -0.178506
	 grad: 2 4 -0.699742
	 grad: 3 6 -1.448466
	 grad: 1 2 -0.131971
	 grad: 2 4 -0.517328
	 grad: 3 6 -1.070869
	 grad: 1 2 -0.097568
	 grad: 2 4 -0.382467
	 grad: 3 6 -0.791706
progress: 10 w = 1.96 loss = 0.011707
	 grad: 1 2 -0.072133
	 grad: 2 4 -0.282762
	 grad: 3 6 -0.585318
	 grad: 1 2 -0.053329
	 grad: 2 4 -0.209049
	 grad: 3 6 -0.432732
	 grad: 1 2 -0.039427
	 grad: 2 4 -0.154553
	 grad: 3 6 -0.319924
	 grad: 1 2 

### Exercise 3.1: Compute the gradient


$$ \hat{y} = x^2w_2 + xw_1 + b $$

$$ \text{loss} = (\hat{y} - y)^2 = (x^2w_2 + xw_1 + b - y)^2 $$

$$ {\partial \text{loss} \over \partial w_1 } =  2x(x^2w_2 + xw_1 + b - y)$$

$$ {\partial \text{loss} \over \partial w_2 } = ? $$
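One way to check a hand-derived gradient is to compare it with a central finite difference. The sketch below does this for the $\partial \text{loss} / \partial w_1$ expression given above; the same pattern can be reused to check an answer for $w_2$. The function names and the test point (`w1 = 0.5`, `w2 = -0.3`, `b = 0.1`, `x = 2.0`, `y = 3.0`) are hypothetical values chosen for illustration.

```python
def loss(w1, w2, b, x, y):
    """Squared error of the quadratic model y_hat = x^2 * w2 + x * w1 + b."""
    y_pred = x ** 2 * w2 + x * w1 + b
    return (y_pred - y) ** 2


def grad_w1(w1, w2, b, x, y):
    """Analytic gradient with respect to w1, as given in the exercise."""
    return 2 * x * (x ** 2 * w2 + x * w1 + b - y)


def numeric_grad_w1(w1, w2, b, x, y, eps=1e-6):
    """Central finite-difference estimate of d(loss)/d(w1)."""
    upper = loss(w1 + eps, w2, b, x, y)
    lower = loss(w1 - eps, w2, b, x, y)
    return (upper - lower) / (2 * eps)


# Hypothetical test point
w1, w2, b, x, y = 0.5, -0.3, 0.1, 2.0, 3.0

analytic = grad_w1(w1, w2, b, x, y)
numeric = numeric_grad_w1(w1, w2, b, x, y)
print("analytic:", analytic, "numeric:", numeric)
```

If the two values agree to several decimal places, the derivation is very likely correct. A large gap usually points to a missing factor or a sign error.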

### Exercise 3.2: Implement
