# Gradient Descent

Optimizing functions algorithmically

In [4]:
from linalg import Vector, dot
from typing import Callable

In [3]:
def sum_of_squares(xs: Vector) -> float:
    """
    Return the sum of the squares of the elements in xs
    """
    # equivalent to the dot product of xs with itself
    return dot(xs, xs)

Consider a function (e.g. a loss function) that reduces a vector to a single meaningful float. The main idea of gradient descent is to algorithmically find the input vector that minimizes this function, by repeatedly stepping in the direction in which the function decreases fastest.

### Terms:
- **Gradient**: The vector of partial derivatives of a function with respect to each element of its input vector. E.g. if `y = sum_of_squares(xs)`, the gradient is `dy/dxs`, or `[dy/dx_0, dy/dx_1, ..., dy/dx_n]`
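
To make the definition concrete, here is a minimal sketch of the exact gradient of `sum_of_squares`. The `Vector` alias is an assumed stand-in for `linalg.Vector`, `sum_of_squares_gradient` is a hypothetical helper name, and `sum_of_squares` is re-implemented with plain Python so the snippet runs on its own.

```python
from typing import List

Vector = List[float]  # assumed stand-in for linalg.Vector


def sum_of_squares(xs: Vector) -> float:
    """Return the sum of the squares of the elements in xs."""
    return sum(x * x for x in xs)


def sum_of_squares_gradient(xs: Vector) -> Vector:
    """Exact gradient: d/dx_i (x_0**2 + ... + x_n**2) = 2 * x_i."""
    return [2 * x for x in xs]


point = [3.0, 4.0]
print(sum_of_squares(point))           # 25.0
print(sum_of_squares_gradient(point))  # [6.0, 8.0]
```

Each entry of the gradient says how fast `y` changes when only that one input element changes.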

### Estimating the Gradient

In [5]:
def difference_quotient(f: Callable[[float], float], x: float, h: float) -> float:
    return (f(x + h) - f(x)) / h
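
As a quick sanity check of `difference_quotient`, the sketch below applies it to a hypothetical test function `square`, whose exact derivative is `2 * x`, and prints the estimate for a few step sizes. The test function and the chosen values of `h` are assumptions made for illustration.

```python
from typing import Callable


def difference_quotient(f: Callable[[float], float], x: float, h: float) -> float:
    return (f(x + h) - f(x)) / h


def square(x: float) -> float:
    """Hypothetical test function; its exact derivative is 2 * x."""
    return x * x


x = 3.0
exact = 2 * x
for h in (1.0, 0.1, 0.001):
    estimate = difference_quotient(square, x, h)
    print(f"h={h}: estimate={estimate:.6f}, error={abs(estimate - exact):.6f}")
```

For this particular function the error is roughly `h`, so the estimate approaches the exact derivative as `h` shrinks.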


`difference_quotient` is the slope of f between x and x + h. The derivative of a single-variable function f(x) is the limit of this quotient as h approaches 0, so we can estimate the derivative by choosing a very small h (e.g. 10**-6). We can do the same for partial derivatives in a vector calculus setting for f(xs):

In [6]:
def partial_diff_quotient(f: Callable[[Vector], float], xs: Vector, i: int, h: float) -> float:
    # add h to just the ith element of xs, leaving the other elements unchanged
    w = [x_j + (h if i == j else 0) for j, x_j in enumerate(xs)]
    return (f(w) - f(xs)) / h  # reflects only the change we made to the ith variable

In [8]:
def estimate_gradient(f: Callable[[Vector], float], xs: Vector, h: float = 10**-4) -> Vector:
    """
    Estimate the gradient of f with respect to xs by computing partial diff quotients element-wise
    """
    # note: this is expensive (two evaluations of f per element of xs), which is why
    # auto-grad libraries compute most derivatives with the chain rule instead
    return [partial_diff_quotient(f, xs, i, h) for i in range(len(xs))]
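
To see how close the estimate gets, the following sketch compares `estimate_gradient` with the exact gradient of `sum_of_squares`, which is `2 * x_i` for each element. The `Vector` alias is an assumed stand-in for `linalg.Vector`, and the sample point is arbitrary.

```python
from typing import Callable, List

Vector = List[float]  # assumed stand-in for linalg.Vector


def partial_diff_quotient(f: Callable[[Vector], float], xs: Vector, i: int, h: float) -> float:
    # add h to just the ith element of xs
    w = [x_j + (h if i == j else 0) for j, x_j in enumerate(xs)]
    return (f(w) - f(xs)) / h


def estimate_gradient(f: Callable[[Vector], float], xs: Vector, h: float = 10**-4) -> Vector:
    return [partial_diff_quotient(f, xs, i, h) for i in range(len(xs))]


def sum_of_squares(xs: Vector) -> float:
    return sum(x * x for x in xs)


point = [3.0, -1.0, 0.5]  # arbitrary sample point
estimated = estimate_gradient(sum_of_squares, point)
exact = [2 * x for x in point]  # known exact gradient of sum_of_squares

for est, ex in zip(estimated, exact):
    print(f"estimated={est:.5f}  exact={ex:.5f}")
```

With the default `h`, each estimated entry should differ from the exact one by about `h`.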

### Using the Gradient

TBD
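
One common way to use the gradient, sketched here under assumptions: starting from some point, repeatedly move a small step against the gradient, since the gradient points in the direction of steepest increase. The names `gradient_step` and `learning_rate`, the starting point, the step size, and the iteration count are all hypothetical choices for illustration. The exact gradient of `sum_of_squares` is used to keep the example short.

```python
from typing import List

Vector = List[float]  # assumed stand-in for linalg.Vector


def sum_of_squares_gradient(xs: Vector) -> Vector:
    """Exact gradient of sum_of_squares: 2 * x_i for each element."""
    return [2 * x for x in xs]


def gradient_step(xs: Vector, gradient: Vector, learning_rate: float) -> Vector:
    """Move xs a small step in the direction opposite to the gradient."""
    return [x - learning_rate * g for x, g in zip(xs, gradient)]


xs = [3.0, -4.0, 5.0]  # arbitrary starting point
for _ in range(1000):
    grad = sum_of_squares_gradient(xs)
    xs = gradient_step(xs, grad, learning_rate=0.01)

print(xs)  # every element ends up very close to 0, the minimizer of sum_of_squares
```

Each step here scales every element by `1 - 2 * learning_rate`, so the point shrinks steadily toward the zero vector. A step size that is too large could overshoot instead.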