# Introduction to Gradient Descent
## Gradient

In [None]:
import math
import numpy as np
import matplotlib.pyplot as plt

Let's start by defining a function.

In [None]:
def f(x):
    return 3*x**2 - 4*x + 5

We can, of course, evaluate the function

In [None]:
f(3.0)

Let's plot it.

In [None]:
xs = np.arange(-5, 5, 0.25)
ys = f(xs)
plt.plot(xs, ys)
plt.show()

What does the derivative measure?

$$\frac{df}{dx} = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$$

In [None]:
h = 0.01
x = 3.0
(f(x + h) - f(x)) / h

In [None]:
h = 0.001
(f(x + h) - f(x)) / h

In [None]:
h = 0.0001
(f(x + h) - f(x)) / h

Solving analytically, we find df/dx = 6*x - 4.

In [None]:
6 * x - 4

The derivative at a point is the slope, that is, the instantaneous rate of change of the function as its argument increases:
- since the derivative is positive, the function is increasing at that point
- for a small increment in x, the function increases by about 14 times that increment
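
To make the "14 times" statement concrete, the following sketch compares the actual change in f with the linear estimate `slope * dx` for a few small increments. The names `slope`, `dx`, `actual`, and `estimate` are chosen here for illustration and are not part of the original notebook.

```python
def f(x):
    return 3*x**2 - 4*x + 5

x = 3.0
slope = 6*x - 4  # analytical derivative at x = 3, equal to 14

for dx in (0.1, 0.01, 0.001):
    actual = f(x + dx) - f(x)   # true change in the function
    estimate = slope * dx       # change predicted by the derivative
    print(f"dx={dx}: actual={actual:.6f}, estimate={estimate:.6f}")
```

The two columns should agree more closely as `dx` shrinks, which is what the limit in the definition of the derivative expresses.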

Let's try a different value.

In [None]:
x = 1
h = 0.0001
(f(x + h) - f(x)) / h

The function is also increasing here, but more slowly.

Let's check another value.

In [None]:
x = -2
h = 0.0001
(f(x + h) - f(x)) / h

Note that at x = -2 the function is decreasing, and it changes faster than at the previous points (the slope is about -16).

Let's look at another point.

In [None]:
x = 2/3
h = 0.0001
(f(x + h) - f(x)) / h

The result is very close to 0, so the function is neither increasing nor decreasing at that point: x = 2/3 is the minimum of the parabola.
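
The four points examined above can be summarized in a single loop. This is a minimal sketch: the helper `numerical_derivative` is a hypothetical name introduced here, and it simply wraps the forward-difference expression used in the cells above.

```python
def f(x):
    return 3*x**2 - 4*x + 5

def numerical_derivative(func, x, h=0.0001):
    """Forward-difference estimate of the derivative of func at x."""
    return (func(x + h) - func(x)) / h

for x in (3.0, 1.0, -2.0, 2/3):
    approx = numerical_derivative(f, x)
    exact = 6*x - 4  # analytical derivative
    print(f"x={x:.4f}: numerical={approx:.4f}, analytical={exact:.4f}")
```

The sign of each value tells whether the function is increasing or decreasing at that point, and the magnitude tells how fast.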

A more complex example

In [None]:
a = 2.0
b = -3.0
c = 10.0
f = a*b+c
f

Let's see how the function changes with respect to each parameter.

In [None]:
h = 0.01
df_da = (((a+h)*b + c) - (a*b + c)) / h
df_da

The result, -3, means that the function value decreases by 3 times the increment in 'a'.

Let's compute the increments with respect to the other parameters.

In [None]:
df_db = ((a*(b+h) + c) - (a*b + c)) / h
df_dc = ((a*b+c+h) - (a*b+c)) / h

df_da, df_db, df_dc

You can check that these values are very close to the analytical partial derivatives: df/da = b, df/db = a, and df/dc = 1.

How can we modify the parameters a, b, and c if we want to increase the value of f?

In [None]:
h = 0.01
a*b+c, (a-h)*(b+h)+(c+h)

In [None]:
h = -0.01
a*b+c, (a-h)*(b+h)+(c+h)

Gradient: the vector formed by the partial derivatives of a function at a point.
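
As an illustration of this definition, the sketch below collects the three finite-difference estimates into one NumPy vector and then takes a small step along it and against it. The helper `numerical_gradient` and the variables `params` and `step` are hypothetical names introduced here; the function is written as `f(a, b, c)` only so that the example is self-contained.

```python
import numpy as np

def f(a, b, c):
    return a*b + c

def numerical_gradient(func, params, h=1e-4):
    """Forward-difference estimate of the gradient of func at params."""
    params = np.asarray(params, dtype=float)
    base = func(*params)
    grad = np.zeros_like(params)
    for i in range(len(params)):
        shifted = params.copy()
        shifted[i] += h  # nudge only the i-th parameter
        grad[i] = (func(*shifted) - base) / h
    return grad

params = np.array([2.0, -3.0, 10.0])
grad = numerical_gradient(f, params)
print(grad)  # close to the analytical gradient (b, a, 1)

# A small step along the gradient increases f; a step against it decreases f.
step = 0.01
print(f(*params), f(*(params + step * grad)), f(*(params - step * grad)))
```

Moving against the gradient to decrease a function is the idea behind the gradient descent procedure in the next section.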

## Gradient Descent

![Gradient Descent](images/image.png)


In [None]:
def f(x):  # Objective function
    return x ** 2

def f_grad(x):  # Gradient (derivative) of the objective function
    return 2 * x

f(5), f_grad(5)

Next, we use x = 20 as the initial value and assume a learning rate eta = 0.2. Iterating gradient descent 10 times, we can see that the value of x eventually approaches the optimal solution, x = 0.

In [None]:
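def gd(eta, f_grad):
    """Run 10 gradient descent steps from x = 20 and return every visited x."""
    x = 20.0
    results = [x]
    for i in range(10):
        grad = f_grad(x)
        x = x - eta * grad  # step against the gradient, scaled by the learning rate
        results.append(float(x))
    print(f'epoch 10, x: {x:f}')
    return results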

learning_rate = 0.2
results = gd(learning_rate, f_grad)
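
For this particular objective, each update multiplies x by a constant factor, since x - eta * 2x = (1 - 2 * eta) * x. The sketch below repeats the same ten iterations in a self-contained form and compares the result with that closed-form expression. The names `x0`, `steps`, `factor`, and `closed_form` are illustrative and do not appear in the original notebook.

```python
eta = 0.2
x0 = 20.0
steps = 10

x = x0
for _ in range(steps):
    x = x - eta * (2 * x)  # the gradient of x**2 is 2*x

factor = 1 - 2 * eta  # each step multiplies x by this factor
closed_form = x0 * factor**steps

print(x, closed_form)
```

Both numbers should agree up to floating-point rounding, and both are much closer to 0 than the starting value of 20.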

In [None]:
xs = np.arange(-3, 22, 0.25)
ys = f(xs)
plt.plot(xs, ys)
plt.plot(results, [f(r) for r in results], c='red', marker='o')
plt.show()

### Learning rate

In [None]:
learning_rate = 0.02
results = gd(learning_rate, f_grad)

xs = np.arange(-3, 22, 0.25)
ys = f(xs)
plt.plot(xs, ys)
plt.plot(results, [f(r) for r in results], c='red', marker='o')
plt.show()

In [None]:
learning_rate = 1.1
results = gd(learning_rate, f_grad)

xs = np.arange(min(results), max(results), 0.25)
ys = f(xs)
plt.plot(xs, ys)
plt.plot(results, [f(r) for r in results], c='red', marker='o')
plt.show()
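
One way to read the three plots, assuming the same objective f(x) = x**2 and starting point x = 20, is through the per-step multiplier (1 - 2 * eta): the iterates shrink when its absolute value is below 1 and grow when it is above 1. The following sketch evaluates this for the three learning rates used above; the names `factor`, `x_final`, and `behaviour` are introduced here for illustration.

```python
x0 = 20.0
steps = 10

for eta in (0.2, 0.02, 1.1):
    factor = 1 - 2 * eta          # per-step multiplier for f(x) = x**2
    x_final = x0 * factor**steps  # value of x after `steps` updates
    behaviour = "converges" if abs(factor) < 1 else "diverges"
    print(f"eta={eta}: factor={factor:+.2f}, "
          f"x after {steps} steps={x_final:.4f} ({behaviour})")
```

With eta = 0.02 the factor is close to 1, so progress is slow; with eta = 1.1 the factor is negative and larger than 1 in magnitude, so x alternates in sign and moves away from the minimum.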