### Gradient descent algorithm

1. Initialize the variable $x$ to an arbitrary starting value: $x_0 = \alpha$

2. Pick a value for the learning rate $\eta$.
    - If $\eta$ is very small, convergence takes many iterations.
    - If $\eta$ is too large, the updates may overshoot the minimum and fail to converge.
    

3. Repeat until convergence: $x_{t+1} = x_t - \eta \, f'(x_t)$
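
The effect of the learning rate in step 2 can be seen with a minimal sketch. It assumes a simpler test function, $f(x) = x^2$ with $f'(x) = 2x$, which is not the function used in the rest of this notebook; the names `f_prime` and `run_gd` are hypothetical helpers introduced only for this illustration.

```python
# Illustrative only: f(x) = x**2 is an assumed test function,
# chosen because its minimum (x = 0) is known exactly.
def f_prime(x):
    return 2 * x

def run_gd(x0, eta, steps):
    """Apply x_{t+1} = x_t - eta * f'(x_t) for a fixed number of steps."""
    x = x0
    for _ in range(steps):
        x = x - eta * f_prime(x)
    return x

x_small = run_gd(1.0, 0.01, 20)  # small eta: moves toward 0, but slowly
x_mid = run_gd(1.0, 0.4, 20)     # moderate eta: ends very close to 0
x_large = run_gd(1.0, 1.1, 20)   # eta too large: |x| grows at every step

print(x_small, x_mid, x_large)
```

For this particular function each update multiplies $x$ by $(1 - 2\eta)$, so the iterates shrink toward 0 only when $|1 - 2\eta| < 1$. Other functions have different thresholds.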

### Simple Example
- Simple function : $f(x) = x^2 + 5\sin(x)$
- Derivative function : $f'(x) = 2x + 5\cos(x)$
- Gradient descent formula : $x_{t+1} = x_t - \eta \, (2x_t + 5\cos(x_t))$
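
One way to sanity-check the derivative above is to compare it with a central finite difference. This is a sketch under stated assumptions: the helper `central_difference`, the step size `h`, and the sample points are illustrative choices, not part of the original notebook.

```python
import math

def f(x):
    # f(x) = x^2 + 5 sin(x)
    return x**2 + 5 * math.sin(x)

def f_prime(x):
    # Analytic derivative: f'(x) = 2x + 5 cos(x)
    return 2 * x + 5 * math.cos(x)

def central_difference(func, x, h=1e-5):
    # Hypothetical helper: numerical slope estimate (func(x+h) - func(x-h)) / (2h)
    return (func(x + h) - func(x - h)) / (2 * h)

# Arbitrary sample points chosen for illustration
sample_points = (-5.0, -1.0, 0.0, 2.5)
for x in sample_points:
    analytic = f_prime(x)
    numeric = central_difference(f, x)
    print(f"x = {x:5.1f}  analytic = {analytic:.6f}  numeric = {numeric:.6f}")
```

If the two columns agree to several decimal places, the analytic derivative is very likely correct.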

In [1]:
import numpy as np

# Cost function: f(x) = x^2 + 5 sin(x)
def cost(x):
    return x**2 + 5*np.sin(x)

# Derivative of the cost function: f'(x) = 2x + 5 cos(x)
def grad(x):
    return 2*x + 5*np.cos(x)

In [2]:
import matplotlib.pyplot as plt
import numpy as np

# cost function
X = np.linspace(-5, 5, 100)
Y = [cost(x) for x in X]
plt.plot(X, Y, c='blue')

# Optimal point, solved with
# https://www.wolframalpha.com
x = -2221/2000; y = cost(x)
plt.scatter(x, y, c='red')

plt.show(); print((x, y))

<Figure size 640x480 with 1 Axes>

(-1.1105, -3.246394272334107)


In [3]:
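# Gradient descent algorithm
# x0: starting point, eta: learning rate, loop: maximum number of iterations
def simpleGD(x0, eta, loop):
    x = [x0]  # history of iterates
    for it in range(loop):
        # Update rule: x_{t+1} = x_t - eta * f'(x_t)
        x_new = x[-1] - eta*(grad(x[-1]))
        # Stop once the gradient at the new point is close to zero
        # (the new point itself is not appended in that case)
        if abs(grad(x_new)) < 1e-6: break
        x.append(x_new)
    # Returns the iterates and the index of the last iteration
    return (x, it)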

In [4]:
(x, it) = simpleGD(-5.0, .1, 100)
print('optimal point : %f, cost : %f, after %d iterations'%(x[-1], cost(x[-1]), it))

optimal point : -1.110511, cost : -3.246394, after 17 iterations
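
Step 1 of the algorithm allows any starting value, so it can be useful to try several. The sketch below is an assumption-laden illustration rather than part of the original notebook: `gd` is a hypothetical variant of `simpleGD` that keeps only the latest iterate, and the starting points, `max_iter`, and `tol` are arbitrary choices.

```python
import math

def grad(x):
    # f'(x) = 2x + 5 cos(x), as defined earlier
    return 2 * x + 5 * math.cos(x)

def gd(x0, eta=0.1, max_iter=1000, tol=1e-6):
    """Hypothetical helper: iterate until |f'(x)| < tol or max_iter is reached."""
    x = x0
    n_steps = 0
    while n_steps < max_iter and abs(grad(x)) >= tol:
        x = x - eta * grad(x)
        n_steps += 1
    return x, n_steps

# Run from three arbitrary starting points with the same learning rate
results = {x0: gd(x0) for x0 in (-5.0, 0.0, 5.0)}
for x0, (x_final, n_steps) in results.items():
    print(f"start {x0:5.1f} -> x = {x_final:.6f} after {n_steps} steps")
```

For this function, $f'(x) = 0$ appears to have a single real solution, so with $\eta = 0.1$ the runs should end near the same point; the number of steps depends on the start. A function with several local minima would not behave this way.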
