# Gradient Descent

- The gradient gives the input direction in which the function increases most quickly.
- The gradient is the vector of partial derivatives.
- One approach to maximizing a function is to start at a random point, compute the gradient, take a small step in the direction of the gradient, and repeat from the new point. To minimize instead, step in the opposite direction.
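
In symbols, for a function $f$ of $n$ inputs, the gradient and the maximizing step described above can be written as follows. The step size $\eta$ and the iteration index $t$ are notation assumed here purely for illustration.

$$
\nabla f(v) = \left( \frac{\partial f}{\partial v_1}, \ldots, \frac{\partial f}{\partial v_n} \right),
\qquad
v^{(t+1)} = v^{(t)} + \eta \, \nabla f\!\left(v^{(t)}\right)
$$

Replacing $+\eta$ with $-\eta$ gives the corresponding step for minimizing.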

In [1]:
# Suppose we have some function that takes input as a vector of real numbers and returns a single real number.
def sum_of_squares(v):
    """Computes the sum of squared elements in v """
    return sum(v_i**2 for v_i in v)
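
As a quick sanity check, the function can be called on an input small enough to verify by hand. The vector below is an arbitrary choice for illustration, and the function is repeated so that the snippet runs on its own.

```python
def sum_of_squares(v):
    """Computes the sum of squared elements in v"""
    return sum(v_i**2 for v_i in v)

example = [1, 2, 3]                 # arbitrary illustrative input
result = sum_of_squares(example)    # 1 + 4 + 9
print(result)                       # 14
```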

## Estimating the gradient

- If f is a function of one variable, its derivative at a point x measures how f(x) changes when we make a very small change to x. It is defined as the limit of difference quotients. 

In [3]:
def difference_quotient(f, x, h):
    return (f(x+h) - f(x)) / h     # approaches the derivative as h approaches zero

- The derivative is the slope of the tangent line at (x, f(x)), while the difference quotient is the slope of the not-quite-tangent line that runs through (x, f(x)) and (x+h, f(x+h)). As h gets smaller and smaller, the not-quite-tangent line gets closer and closer to the tangent line.
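
To see the not-quite-tangent slope approach the tangent slope, one option is to compare the difference quotient with a derivative that is known exactly. The sketch below assumes f(x) = x**2, whose derivative is 2x; the helper names `square` and `square_derivative` and the chosen values of x and h are illustrative only.

```python
def difference_quotient(f, x, h):
    return (f(x + h) - f(x)) / h

def square(x):
    return x * x

def square_derivative(x):
    return 2 * x                     # exact derivative of x**2

x = 3.0
step_sizes = (1.0, 0.1, 0.001)       # progressively smaller h
errors = []
for h in step_sizes:
    estimate = difference_quotient(square, x, h)
    error = abs(estimate - square_derivative(x))
    errors.append(error)
    print(h, estimate, error)        # the error shrinks along with h
```

For this particular function the quotient works out to 2x + h, so the error is roughly h itself, up to floating-point rounding.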

- When f is a function of many variables, it has multiple partial derivatives, each indicating how f changes when we make small changes in just one of the input variables.
- We calculate the ith partial derivative by treating f as a function of just its ith variable, holding the other variables fixed.

In [4]:
def partial_difference_quotient(f, v, i, h):
    """Compute the ith partial difference quotient of f at v"""
    w = [v_j + (h if j==i else 0)   # add h to just the ith element of v
        for j, v_j in enumerate(v)]
    
    return (f(w) - f(v)) / h

In [5]:
def estimate_gradient(f, v, h=0.00001):
    """Estimate the gradient of f at v, one partial difference quotient per coordinate"""
    return [partial_difference_quotient(f, v, i, h)
           for i, _ in enumerate(v)]
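
One way to check the estimate is to compare it with a gradient that is known in closed form. The sketch below assumes the sum-of-squares function, whose exact gradient is 2 * v_i in each coordinate; the test point is arbitrary, and the functions are repeated so that the snippet runs on its own.

```python
def sum_of_squares(v):
    return sum(v_i**2 for v_i in v)

def partial_difference_quotient(f, v, i, h):
    # add h to just the ith element of v
    w = [v_j + (h if j == i else 0) for j, v_j in enumerate(v)]
    return (f(w) - f(v)) / h

def estimate_gradient(f, v, h=0.00001):
    return [partial_difference_quotient(f, v, i, h)
            for i, _ in enumerate(v)]

v = [1.0, -2.0, 3.0]                          # arbitrary illustrative point
estimated = estimate_gradient(sum_of_squares, v)
exact = [2 * v_i for v_i in v]                # known gradient of the sum of squares

print(estimated)   # close to, but not exactly, the exact gradient
print(exact)
```

The estimate costs one extra evaluation of f per coordinate, which is why an exact gradient is preferable when one is available.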

## Using the gradient

- Let's use the gradient to find the input that minimizes sum_of_squares among all three-dimensional vectors.
- We'll pick a random starting point and then take tiny steps in the direction opposite to the gradient until we reach a point where the gradient is very small.

In [6]:
def step(v, direction, step_size):
    """Move step_size in the given direction, starting from v"""
    return [v_i + step_size*direction_i 
            for v_i, direction_i in zip(v, direction)]
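
A small worked call may make the role of the sign of `step_size` clearer. The vectors below are arbitrary values chosen so that the arithmetic is easy to follow, and `step` is repeated so that the snippet runs on its own.

```python
def step(v, direction, step_size):
    """Move step_size in the given direction, starting from v"""
    return [v_i + step_size * direction_i
            for v_i, direction_i in zip(v, direction)]

v = [1.0, 2.0, 3.0]
direction = [1.0, 0.0, -1.0]

forward = step(v, direction, 0.5)     # move along the direction
backward = step(v, direction, -0.5)   # a negative step_size moves the opposite way

print(forward)    # [1.5, 2.0, 2.5]
print(backward)   # [0.5, 2.0, 3.5]
```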

In [7]:
def sum_of_squares_gradient(v):
    return [2 * v_i for v_i in v]

In [13]:
# Pick a random starting point.
import random
v = [random.randint(-10, 10) for i in range(3)]

tolerance = 0.0000001

In [39]:
import math

def distance(v, w):
    """Euclidean distance between v and w"""
    return math.sqrt(sum((v_i - w_i)**2 for v_i, w_i in zip(v, w)))

In [38]:
while True:
    gradient = sum_of_squares_gradient(v) # compute the gradient at v
    next_v = step(v, gradient, -0.01)     # take a negative gradient step
    if distance(next_v, v) < tolerance:   # stop if converging
        break
    v = next_v
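
The descent loop can also be packaged as a function with a cap on the number of iterations, so that a poorly chosen step size cannot loop forever. This is a minimal sketch rather than a definitive implementation: the name `descend`, the `max_steps` parameter, and the fixed random seed are assumptions introduced here for illustration.

```python
import math
import random

def step(v, direction, step_size):
    return [v_i + step_size * direction_i
            for v_i, direction_i in zip(v, direction)]

def sum_of_squares_gradient(v):
    return [2 * v_i for v_i in v]

def distance(v, w):
    # Euclidean distance between two equal-length vectors
    return math.sqrt(sum((v_i - w_i) ** 2 for v_i, w_i in zip(v, w)))

def descend(v, step_size=0.01, tolerance=0.0000001, max_steps=100_000):
    """Hypothetical helper: take negative-gradient steps until two
    successive points are closer than tolerance or max_steps is reached."""
    for _ in range(max_steps):
        gradient = sum_of_squares_gradient(v)
        next_v = step(v, gradient, -step_size)   # move against the gradient
        if distance(next_v, v) < tolerance:      # successive points barely differ
            return next_v
        v = next_v
    return v

random.seed(0)                                   # assumed seed, for repeatability only
start = [random.randint(-10, 10) for _ in range(3)]
minimum = descend(start)
print(minimum)                                   # each coordinate ends up very close to 0
```

With a step size of 0.01, each iteration scales every coordinate by 0.98, so the point shrinks steadily toward the origin, which is where the sum of squares is smallest.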