# Gradient Descent

A [first order](overview.ipynb#First-Order-Methods) method. Assume `f` is a continuous and twice differentiable function, and we want to solve: $\min_{x} f(x)$

An intuitive approach is to start at some initial point and iteratively move in a direction that decreases `f`.<br>
A natural choice for the direction is the negative [gradient](../calculus/gradients.ipynb): <br>
$x^{(k+1)} = x^{(k)} - t_k \nabla f(x^{(k)})$ where $t_k$ is a step size


The algorithm is as follows:<br>

1: guess $x^{(0)}$, set k = 0<br>
2: while $\lVert \nabla f(x^{(k)}) \rVert \geq \epsilon$ do <br>
&nbsp;&nbsp;&nbsp;&nbsp;3: $x^{(k+1)} = x^{(k)} - t_k \nabla f(x^{(k)})$<br>
&nbsp;&nbsp;&nbsp;&nbsp;4: k += 1<br>
5: end while<br>
6: return $x^{(k)}$<br>

Let's look at this algorithm with a simple example: $f(x) = x^2$

We know from the start that $\nabla f(x) = 2x$
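A quick way to double-check a hand-derived gradient is to compare it with a finite-difference estimate. The sketch below is only an illustration: `numerical_gradient` and the step `h` are names chosen here and are not part of the notebook.

```python
def numerical_gradient(f, x, h=1e-6):
    # Central difference estimate of f'(x): (f(x + h) - f(x - h)) / (2h)
    return (f(x + h) - f(x - h)) / (2 * h)

f = lambda x: x ** 2
gf = lambda x: 2 * x  # the analytic gradient used in this notebook

# Print the analytic and the numerical gradient side by side
for x in (-3.0, 0.5, 10.0):
    print(f"x={x}: analytic={gf(x):.6f}, numerical={numerical_gradient(f, x):.6f}")
```

If the two columns disagree by more than rounding noise, the analytic gradient is probably wrong.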

In [1]:
import numpy as np

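def gradient_descent(f, gf, tk):
    """Minimize f with a fixed step size tk, given its gradient gf."""
    x = np.random.randint(0, 1000)  # random starting point x^(0)
    print(f"random initialization: {x}")
    eps = 1.0  # stop once |gf(x)| < eps
    steps = 0

    while np.abs(gf(x)) >= eps:
        x_new = x - tk * gf(x)  # step along the negative gradient
        print(f"{x:.4f} => {x_new:.4f}")
        x = x_new
        steps += 1
    print(f"finished in {steps} steps")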

tk = 0.25
gradient_descent(lambda x: x ** 2, lambda x: 2 * x, tk)

random initialization: 973
973.0000 => 486.5000
486.5000 => 243.2500
243.2500 => 121.6250
121.6250 => 60.8125
60.8125 => 30.4062
30.4062 => 15.2031
15.2031 => 7.6016
7.6016 => 3.8008
3.8008 => 1.9004
1.9004 => 0.9502
0.9502 => 0.4751
finished in 11 steps


How quickly we find the solution depends on the step size. We can alter our current approach to adaptively adjust the step size.
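To see the effect of the step size, note that for $f(x) = x^2$ the update reduces to $x^{(k+1)} = (1 - 2t_k)\,x^{(k)}$. The sketch below counts iterations for a few fixed step sizes from the same starting point; `count_steps` and the `max_steps` safety cap are hypothetical helpers added for illustration.

```python
def count_steps(x0, tk, eps=1.0, max_steps=1000):
    """Fixed-step gradient descent on f(x) = x**2.

    Returns (steps, x). steps is None when |f'(x)| < eps was not
    reached within max_steps iterations.
    """
    gf = lambda x: 2 * x
    x = float(x0)
    steps = 0
    while abs(gf(x)) >= eps:
        if steps == max_steps:
            return None, x
        x = x - tk * gf(x)
        steps += 1
    return steps, x

for tk in (0.05, 0.25, 0.5, 1.0):
    steps, x = count_steps(973, tk)
    print(f"tk={tk}: steps={steps}, final x={x:.4f}")
```

A small step converges slowly, `tk = 0.5` lands on the minimizer in one step, and `tk = 1.0` only flips the sign of `x` on every iteration, so it never converges.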

<strong>Exact line search:</strong><br>
In each iteration choose the step that minimizes $f(x^{(k+1)})$

$t_k = \arg\min_{t\geq0} f(x^{(k)} - t\nabla f(x^{(k)}))$
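For the quadratic example this minimization can be done by hand, which gives a sanity check for any numerical search. Writing $\phi(t)$ (a name introduced here) for the objective along the search direction, and assuming $f(x) = x^2$ with $x \neq 0$:

$$
\phi(t) = f\big(x - t\,\nabla f(x)\big) = (x - 2tx)^2 = x^2 (1 - 2t)^2, \qquad
\phi'(t) = -4x^2 (1 - 2t) = 0 \;\Rightarrow\; t = \tfrac{1}{2}
$$

With $t = \tfrac{1}{2}$ the update gives $x - 2 \cdot \tfrac{1}{2} \cdot x = 0$, the exact minimizer, in a single step. Other functions generally do not admit such a closed form and need a numerical search.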

In [3]:
f = lambda x: x ** 2
gf = lambda x: 2 * x
x = np.random.randint(0, 1000)
res = list(map(lambda t: f(x - t * gf(x)), np.arange(0, 1, 0.1)))
print(res)
np.argmin(res)

[1600.0, 1024.0, 576.0, 255.9999999999999, 64.0, 0.0, 64.00000000000011, 256.0000000000002, 576.0, 1024.0]


5

In [4]:
def gradient_descent_line_search(f, gf):
    x = np.random.randint(0, 1000) 
    print(f"random initialization: {x}")
    eps = 1.0
    steps = 0
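    # candidate step sizes 0.0, 0.1, ..., 0.9; a coarse grid, so the
    # line search is only exact when the true minimizer lies on it
    tk_range = np.arange(0, 1.0, 0.1)

    while np.abs(gf(x)) >= eps:
        # evaluate f at every candidate step and keep the best one
        tk_i = np.argmin(list(map(lambda t: f(x - t * gf(x)), tk_range)))
        tk = tk_range[tk_i]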
        print(f"chosen step: {tk}")
        x_new = x - tk * gf(x)
        print(f"{x:.4f} => {x_new:.4f}")
        x = x_new
        steps += 1
    print(f"finished in {steps} steps")

gradient_descent_line_search(lambda x: x ** 2, lambda x: 2 * x)

f_ex = lambda x1, x2: 4*x1**2 + 2*x2**2 - 4*x1*x2
gf_ex = np.array([lambda x1, x2: 8*x1 - 4*x2, 
                  lambda x1, x2: 4*x2 - 4*x1])

random initialization: 543
chosen step: 0.5
543.0000 => 0.0000
finished in 1 steps
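The two-variable quadratic `f_ex` defined above can also be minimized with an exact step, without a grid. The sketch assumes the standard result that for $f(x) = \frac{1}{2} x^T A x$ the exact step along $-g$ is $t = \frac{g^T g}{g^T A g}$, here with $A = \begin{pmatrix} 8 & -4 \\ -4 & 4 \end{pmatrix}$. The helper names `grad_f_ex`, `exact_step` and `descend`, the tolerance and the starting point are all hypothetical.

```python
def f_ex(x1, x2):
    return 4 * x1**2 + 2 * x2**2 - 4 * x1 * x2

def grad_f_ex(x1, x2):
    # Partial derivatives of f_ex with respect to x1 and x2
    return (8 * x1 - 4 * x2, 4 * x2 - 4 * x1)

def exact_step(g1, g2):
    # t = (g.g) / (g.A g) with A = [[8, -4], [-4, 4]]
    gAg = 8 * g1**2 - 8 * g1 * g2 + 4 * g2**2
    return (g1**2 + g2**2) / gAg

def descend(x1, x2, eps=1e-6, max_steps=10_000):
    steps = 0
    while True:
        g1, g2 = grad_f_ex(x1, x2)
        # Stop when the gradient norm is small (or the safety cap is hit)
        if (g1**2 + g2**2) ** 0.5 < eps or steps == max_steps:
            return x1, x2, steps
        t = exact_step(g1, g2)
        x1, x2 = x1 - t * g1, x2 - t * g2
        steps += 1

x1, x2, steps = descend(10.0, -7.0)
print(f"x = ({x1:.2e}, {x2:.2e}), f = {f_ex(x1, x2):.2e}, steps = {steps}")
```

Unlike the one-dimensional case, a single exact step is generally not enough here, because the negative gradient does not point straight at the minimizer.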


<strong>Backtracking line search:</strong><br>
Start with an initial step $t$ and shrink it until the step decreases $f$: in iteration $k$ of the search, try $\frac{t}{2^{k-1}}$, or in general $t \cdot C^{k-1}$ where $C\in(0,1)$

In [5]:
def find_step(f, gf, x, c=0.9):
    # start from a random step in [0, 1)
    t = np.random.rand()

    # shrink t while the descent step x - t * gf(x) fails to decrease f
    while f(x - t * gf(x)) >= f(x):
        t = t * c
    return t
    

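# note: despite "exact_line" in its name, this variant picks each step with find_step (backtracking)
def gradient_descent_exact_line(f, gf):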
    x = np.random.randint(0, 1000) 
    print(f"random initialization: {x}")
    eps = 1.0
    steps = 0
    tk = 1
    
    while np.abs(gf(x)) >= eps:
        tk = find_step(f, gf, x)
        print(f"chosen step: {tk:0.4f}")
        x_new = x - tk * gf(x)
        print(f"{x:.4f} => {x_new:.4f}")
        x = x_new
        steps += 1
    print(f"finished in {steps} steps")
    
gradient_descent_exact_line(lambda x: x ** 2, lambda x: 2 * x)

random initialization: 846
chosen step: 0.1044
846.0000 => 669.3692
chosen step: 0.7397
669.3692 => -320.9215
chosen step: 0.6443
-320.9215 => 92.6214
chosen step: 0.6007
92.6214 => -18.6573
chosen step: 0.4067
-18.6573 => -3.4821
chosen step: 0.8668
-3.4821 => 2.5547
chosen step: 0.5018
2.5547 => -0.0091
finished in 7 steps
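On $x^2$ any step in $(0, 1)$ already decreases $f$, so the run above never needs to shrink its random step. Below is a minimal sketch where shrinking does happen. It starts from $t_0 = 1$ and uses the Armijo sufficient-decrease test, which is a common variant and not what the cell above uses. The names `backtracking_step`, `gradient_descent_backtracking`, `t0` and `alpha` are illustrative.

```python
def backtracking_step(f, gf, x, t0=1.0, c=0.5, alpha=0.3):
    g = gf(x)
    t = t0
    # Shrink t until f decreases by at least alpha * t * g**2 (Armijo test)
    while f(x - t * g) > f(x) - alpha * t * g * g:
        t *= c
    return t

def gradient_descent_backtracking(f, gf, x0, eps=1.0, max_steps=1000):
    x = float(x0)
    steps = 0
    while abs(gf(x)) >= eps and steps < max_steps:
        t = backtracking_step(f, gf, x)
        x = x - t * gf(x)
        steps += 1
    return x, steps

f = lambda x: x ** 2
gf = lambda x: 2 * x
print(backtracking_step(f, gf, 846.0))
print(gradient_descent_backtracking(f, gf, 846))
```

For this quadratic the initial step $t_0 = 1$ only maps $x$ to $-x$ and gives no decrease, so it is rejected and halved once. Starting from a fixed $t_0$ also makes the run reproducible, unlike a random initial step.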
