# Gradient Descent

<a rel="license" href="http://creativecommons.org/licenses/by/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by/4.0/88x31.png" /></a><br />This work by <span xmlns:cc="http://creativecommons.org/ns#" property="cc:attributionName">吳安容</span> is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</a>.

##### Overview
Let $f:\mathbb{R}^2\rightarrow\mathbb{R}$ be a differentiable function. 
Our main goal is to find a local minimum of $f$ using the method of **gradient descent**. 

For example, when $f(x,y)=x^2+y^2$, the algorithm returns an approximation of the location of the minimum, $(0,0)$.

##### Algorithm
1. Let $f$ be a function from $\mathbb{R}^2$ to $\mathbb{R}$ and choose an initial point $\mathbf{x}_0=(x_0,y_0)$. The implementation below draws it at random.
2. Compute $\nabla f(x_i,y_i)$, the gradient of $f$ at $\mathbf{x}_i$, and set

> $\mathbf{x}_{i+1}=\mathbf{x}_i-\alpha\cdot\nabla f(x_i,y_i)$  

where $\alpha>0$ is a step size small enough that the sequence $(\mathbf{x}_n)$ converges to a local minimum.

3. Repeat Step 2 until successive points are close enough or a maximum number of iterations is reached.
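
The steps above can be sketched in plain Python with NumPy rather than SageMath. The helper name `descend`, its default values, and the starting point `(1.0, -2.0)` are assumptions made for illustration only. The sketch also stops on the norm of the whole step, which is one possible reading of "close enough".

```python
import numpy as np

def descend(gradf, x0, alpha=0.1, steps=100, epsilon=1e-5):
    """Hypothetical helper: iterate x_{i+1} = x_i - alpha * grad f(x_i)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        # Step 2: move against the gradient
        x_next = x - alpha * np.asarray(gradf(x), dtype=float)
        # Stop once two successive points are close enough
        if np.linalg.norm(x_next - x) <= epsilon:
            return x_next
        x = x_next
    return x  # iteration budget used up

# f(x, y) = x^2 + y^2 has gradient (2x, 2y)
gradf = lambda v: (2 * v[0], 2 * v[1])
x_min = descend(gradf, x0=(1.0, -2.0))
print(x_min)
```

With these assumed settings the printed point should lie close to $(0,0)$.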

##### Explanation
The goal is to find a local minimum of a given function.

The implementation is restricted to functions of two variables, and the gradient of the function has to be computed by hand and passed in.
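
Because the gradient is derived by hand, it may be worth comparing it against a numerical approximation before running the descent. The following is a minimal sketch in plain Python with NumPy, not SageMath. The helper `numerical_gradient`, the step `h`, and the test point are assumptions introduced here. The function and gradient are the ones used in Example 2.

```python
import numpy as np

def numerical_gradient(f, v, h=1e-6):
    """Central-difference approximation of the gradient of f at v."""
    v = np.asarray(v, dtype=float)
    grad = np.zeros_like(v)
    for k in range(v.size):
        e = np.zeros_like(v)
        e[k] = h  # perturb one coordinate at a time
        grad[k] = (f(v + e) - f(v - e)) / (2 * h)
    return grad

# Function and hand-computed gradient from Example 2
f = lambda v: v[0]**2 - v[1]**2 + v[0]*v[1] - 4
gradf = lambda v: np.array([2*v[0] + v[1], -2*v[1] + v[0]])

point = np.array([1.5, -0.5])  # arbitrary test point
error = np.max(np.abs(numerical_gradient(f, point) - gradf(point)))
print(error)
```

A large `error` would suggest a mistake in the hand-computed gradient.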

We set $\alpha = 0.5$ by default and stop as soon as the change in the $x$-coordinate between two successive iterates is at most $\epsilon = 10^{-5}$.

The sequence might converge slowly or not converge at all, so we also set a maximum number of iterations.
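
To illustrate why an iteration cap is needed, the sketch below counts how many updates are performed for $f(x,y)=x^2+y^2$ with several step sizes. It is plain Python with NumPy. The helper `count_iterations`, the starting point `(1.0, 1.0)`, and the list of `alpha` values are assumptions for illustration.

```python
import numpy as np

def count_iterations(alpha, x0=(1.0, 1.0), epsilon=1e-5, max_steps=100):
    """Return the number of updates performed before the step length
    drops to epsilon, or max_steps if the budget runs out first."""
    x = np.array(x0, dtype=float)
    for i in range(1, max_steps + 1):
        x_next = x - alpha * 2 * x  # gradient of x^2 + y^2 is (2x, 2y)
        if np.linalg.norm(x_next - x) <= epsilon:
            return i
        x = x_next
    return max_steps

for alpha in (0.5, 0.05, 0.005, 1.0):
    print(alpha, count_iterations(alpha))
```

A very small `alpha` uses up the budget because progress is slow, while `alpha = 1.0` uses it up because the iterates never settle.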

##### Implementation

In [1]:
import numpy as np

def gradient_descent(f, gradf, steps=100, alpha=0.5):
    """
    Input:
        f: a function of a vector (x, y) whose minimum we want to locate
        gradf: the gradient of f, returning the pair (df/dx, df/dy)
        steps: the maximum number of iterations
        alpha: step size multiplier
    Output:
        nothing is returned; the final point and the value of f there are printed

    The tolerance is read from the global variable epsilon,
    which must be defined before this function is called.
    """
    v1 = vector(np.random.randn(2))  ### next vector, starting from a random point
    for i in range(steps):
        v2 = v1  ### current vector
        v1 = v2 - alpha * vector(gradf(v2))
        h = v1[0] - v2[0]  ### change in the x-coordinate only
        if abs(h) <= epsilon:
            break
    print("Minimum at",v1," and minimum is",f(v1))

##### Example 1 
$f(x,y)=x^2+y^2$

In [2]:
epsilon = 0.00001

f = lambda v1: v1[0]^2 + v1[1]^2

gradf = lambda v2: (2*v2[0], 2*v2[1])

gradient_descent(f,gradf, steps=100, alpha=0.5)

('Minimum at', (0.0, 0.0), ' and minimum is', 0.0)


In [3]:
### With a larger alpha the iteration may fail to converge: for alpha=1 each step maps v to -v, so the iterates oscillate.

gradient_descent(f,gradf, steps=100, alpha=1)

('Minimum at', (1.236371488782072, -0.781210485847116), ' and minimum is', 2.1389042814706842)
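
The two outputs above can be explained with a short calculation, given here as a sketch. Since $\nabla f(x,y)=(2x,2y)=2\mathbf{x}$ for $f(x,y)=x^2+y^2$, the update rule becomes

> $\mathbf{x}_{i+1}=\mathbf{x}_i-\alpha\cdot 2\mathbf{x}_i=(1-2\alpha)\,\mathbf{x}_i$, so $\mathbf{x}_n=(1-2\alpha)^n\,\mathbf{x}_0$.

The sequence converges to $(0,0)$ exactly when $|1-2\alpha|<1$, that is, when $0<\alpha<1$. For $\alpha=0.5$ the factor is $0$, so the minimum is reached after a single step. For $\alpha=1$ the factor is $-1$, so the iterates alternate between $\mathbf{x}_0$ and $-\mathbf{x}_0$, and after an even number of steps the method is back at its random starting point.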


##### Example 2 
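$f(x,y)=x^2-y^2+xy-4$

This function has no local minimum: its only critical point $(0,0)$ is a saddle point and $f$ is unbounded below (for instance along the $y$-axis), so the iterates diverge instead of converging.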

In [4]:
epsilon = 0.00001

f = lambda v1: v1[0]^2 - v1[1]^2 + v1[0]*v1[1] - 4

gradf = lambda v2: (2*v2[0] + v2[1], -2*v2[1]+v2[0])

gradient_descent(f,gradf, steps=100)

('Minimum at', (2.792893696625991e+31, -1.1830887552838372e+32), ' and minimum is', -1.652120563590659e+64)
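
The huge values in this output can be traced to the second derivatives of $f$. The sketch below, in plain Python with NumPy rather than SageMath, inspects the Hessian of $f(x,y)=x^2-y^2+xy-4$ and the matrix that one gradient step applies to the current point. The variable names `H`, `eigenvalues`, and `growth` are assumptions introduced here.

```python
import numpy as np

# Hessian of f(x, y) = x^2 - y^2 + x*y - 4 (constant, since f is quadratic)
H = np.array([[2.0, 1.0],
              [1.0, -2.0]])

# Real eigenvalues of the symmetric matrix H, in ascending order
eigenvalues = np.linalg.eigvalsh(H)
print(eigenvalues)

# The gradient of f is H @ v, so one step with alpha = 0.5
# multiplies the current point by (I - 0.5 * H)
step_matrix = np.eye(2) - 0.5 * H
growth = np.max(np.abs(np.linalg.eigvals(step_matrix)))
print(growth)
```

One eigenvalue of `H` is negative and one is positive, which indicates a saddle point at $(0,0)$. A `growth` factor above 1 means each step stretches the iterate along one direction, which is consistent with the values of order $10^{31}$ printed after 100 steps.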


In [5]:
### example of vectors and functions in SageMath

### tuples are nice but their addition is not what we want
print "(0,0) + (1,0) =", (0,0) + (1,0)

### Use vector instead
v = vector([0,0])
u = vector([1,0])
print "u+v =", u+v

### get vector entries
print "u[0],u[1] =", u[0], u[1]

### a function that takes a vector as input
### syntax:
### lambda input: output
### this is called the lambda method to define a function
f = lambda v: v[0]^2 + v[1]^2
print "f(v) =", f(v)
print "f(u) =", f(u)

### alternatively, use a classical function definition
def g(v):
    return v[0]^2 + v[1]^2
### the two methods are almost the same, except that with the lambda method you don't have to give the function a name
print "g(v) =", g(v)
print "g(u) =", g(u)

### Therefore, gradf can be a lambda that returns both partial derivatives
gradf = lambda v: (2*v[0], 2*v[1])

(0,0) + (1,0) = (0, 0, 1, 0)
u+v = (1, 0)
u[0],u[1] = 1 0
f(v) = 0
f(u) = 1
g(v) = 0
g(u) = 1
