# Demo:  Computing Gradients

Most numerical optimization methods require the gradient of the loss function being minimized. In this demo, we show how to compute gradients efficiently in Python for a few simple examples. Wherever possible, we avoid explicit `for` loops and use vectorized NumPy operations, which are much faster.
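For instance, here is a minimal sketch, using made-up data, of the same residual-weighted sum written once as a Python loop and once as a single vectorized NumPy expression. Both give the same value. The vectorized form runs in compiled code without per-element interpreter overhead, so it is typically far faster on large arrays. The array size and the constant `2.0` are arbitrary choices for illustration.

```python
import numpy as np

# Hypothetical data: 10,000 samples (x[i], y[i])
rng = np.random.default_rng(0)
x = rng.random(10_000)
y = rng.standard_normal(10_000)

# Loop version: one Python-level iteration per sample (slow for large arrays)
total_loop = 0.0
for i in range(len(x)):
    total_loop += y[i] * np.exp(-2.0 * x[i])

# Vectorized version: the whole sum as one NumPy expression
total_vec = np.sum(y * np.exp(-2.0 * x))

print(np.isclose(total_loop, total_vec))  # True
```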

```python
import numpy as np
```

## Example 1:  A Simple Vector-Input Function

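Suppose $f(w) = w_0^2 + 2w_0w_1^3$. Its partial derivatives are $\partial f/\partial w_0 = 2w_0 + 2w_1^3$ and $\partial f/\partial w_1 = 6w_0w_1^2$, so the function value and gradient at $w = (2, 4)$ can be computed as: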

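```python
# Point at which to evaluate f and its gradient
w = np.array([2.0, 4.0])

# Function value: f = 4 + 2*2*64 = 260
f = w[0]**2 + 2*w[0]*(w[1]**3)

# Gradient: [df/dw0, df/dw1] = [132, 192]
df0 = 2*w[0] + 2*(w[1]**3)
df1 = 6*w[0]*(w[1]**2)
fgrad = np.array([df0, df1])
```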
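Hand-derived gradients are easy to get wrong, so it is worth checking them numerically. The sketch below compares the analytic gradient from Example 1 with a central finite-difference approximation. The helper `numerical_grad` and the step size `eps=1e-6` are illustrative choices, not part of the original demo. The loop runs over the parameters, not over data points, so it stays cheap for a two-element `w`.

```python
import numpy as np

def f(w):
    """f(w) = w_0^2 + 2 w_0 w_1^3, as in Example 1."""
    return w[0]**2 + 2*w[0]*w[1]**3

def fgrad_analytic(w):
    """Hand-derived gradient of f."""
    return np.array([2*w[0] + 2*w[1]**3, 6*w[0]*w[1]**2])

def numerical_grad(fun, w, eps=1e-6):
    """Hypothetical helper: central finite-difference gradient of fun at w."""
    w = np.asarray(w, dtype=float)
    g = np.zeros_like(w)
    for k in range(w.size):
        e = np.zeros_like(w)
        e[k] = eps
        g[k] = (fun(w + e) - fun(w - e)) / (2*eps)
    return g

w = np.array([2.0, 4.0])
print(fgrad_analytic(w))  # [132. 192.]
print(np.allclose(fgrad_analytic(w), numerical_grad(f, w), rtol=1e-5))  # True
```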

## Example 2:  Non-Linear Least Squares for an Exponential Model

Consider an exponential model

$$\hat{y} = a e^{-b x}$$

with parameters $w = [a, b]$. Given training data $(x_i, y_i)$, a natural loss function is the sum of squared errors

$$J(w) = \sum_i (y_i - \hat{y}_i)^2, \qquad \hat{y}_i = a e^{-b x_i}.$$
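For reference, a short derivation of the gradient by the chain rule. With $\hat{y}_i = a e^{-b x_i}$, we have $\partial \hat{y}_i/\partial a = e^{-b x_i}$ and $\partial \hat{y}_i/\partial b = -a x_i e^{-b x_i}$, so differentiating each squared residual gives

$$
\frac{\partial J}{\partial a} = -2 \sum_i (y_i - \hat{y}_i)\, e^{-b x_i},
\qquad
\frac{\partial J}{\partial b} = 2 \sum_i (y_i - \hat{y}_i)\, a x_i e^{-b x_i}.
$$

The factor of 2 comes from differentiating the square.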
    
The following code computes the loss function $J(w)$ and its gradient $\partial J/\partial w$.

```python
# Generate some random data
ny = 100
y = np.random.randn(ny)
x = np.random.rand(ny)

# Arbitrary parameter values at which to evaluate the gradient
a = 1
b = 2

# Residuals and loss
e = np.exp(-b*x)
yerr = y - a*e
J = np.sum(yerr**2)

# Gradient (the factor of 2 comes from differentiating the square)
dJ_da = -2*np.sum(yerr*e)
dJ_db = 2*np.sum(yerr*a*x*e)
Jgrad = np.array([dJ_da, dJ_db])
```
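The two partial derivatives can also be packaged as one matrix-vector product, which scales to models with more parameters. If $A$ is the $n \times 2$ Jacobian of $\hat{y}$ with respect to $w = [a, b]$ and $r = y - \hat{y}$ is the residual vector, then $\nabla J = -2 A^T r$. The sketch below builds $A$ with `np.column_stack` and cross-checks the result against central finite differences. It assumes a seeded generator (`np.random.default_rng(0)`) in place of the unseeded `np.random` calls so the run is reproducible; the step size `eps` is likewise an illustrative choice.

```python
import numpy as np

# Hypothetical reproducible data; parameters w = [a, b] as above
rng = np.random.default_rng(0)
ny = 100
x = rng.random(ny)
y = rng.standard_normal(ny)
a, b = 1.0, 2.0

def loss(w):
    """J(w) = sum_i (y_i - w[0] exp(-w[1] x_i))^2."""
    return np.sum((y - w[0]*np.exp(-w[1]*x))**2)

# Residuals and the (ny x 2) Jacobian of yhat with respect to [a, b]
e = np.exp(-b*x)
r = y - a*e
A = np.column_stack([e, -a*x*e])  # column 0: d yhat/da, column 1: d yhat/db

# Gradient as a single matrix-vector product: dJ/dw = -2 A^T r
Jgrad_mat = -2 * A.T @ r

# Cross-check with central finite differences (eps is an assumed step size)
eps = 1e-6
Jgrad_fd = np.array([
    (loss([a + eps, b]) - loss([a - eps, b])) / (2*eps),
    (loss([a, b + eps]) - loss([a, b - eps])) / (2*eps),
])
print(np.allclose(Jgrad_mat, Jgrad_fd, rtol=1e-4, atol=1e-6))  # True
```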

## Example 3:  A Function of a Matrix

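Suppose $f(W) = a^T W b$, where $W$ is an $m \times n$ matrix, $a \in \mathbb{R}^m$ and $b \in \mathbb{R}^n$. Then the gradient with respect to $W$ is the outer product $\nabla_W f = a b^T$, an $m \times n$ matrix with the same shape as $W$.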
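To see why, expand the product entrywise:

$$
f(W) = \sum_{i=1}^{m} \sum_{j=1}^{n} a_i W_{ij} b_j
\quad\Longrightarrow\quad
\frac{\partial f}{\partial W_{ij}} = a_i b_j,
$$

so collecting the partial derivatives into a matrix gives $[a_i b_j] = a b^T$.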

```python
# Some random data
m = 4
n = 3
W = np.random.randn(m, n)
a = np.random.randn(m)
b = np.random.randn(n)

# Function value f = a^T W b
f = a.dot(W.dot(b))

# Gradient a b^T via NumPy broadcasting: (m,1) * (1,n) -> (m,n)
# (equivalent to np.outer(a, b))
fgrad = a[:, None]*b[None, :]
```
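Two quick sanity checks, sketched under the assumption of a seeded generator for reproducibility. First, the broadcasting expression matches NumPy's `np.outer`. Second, because $f$ is linear in $W$, the change $f(W + E) - f(W)$ equals $\sum_{ij} (\nabla_W f)_{ij} E_{ij}$ exactly (up to floating-point rounding) for any perturbation $E$, not only a small one. The random perturbation `E` is a hypothetical choice for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 4, 3
W = rng.standard_normal((m, n))
a = rng.standard_normal(m)
b = rng.standard_normal(n)

def f(W):
    """f(W) = a^T W b."""
    return a @ W @ b

# Broadcasting form of the gradient and NumPy's built-in outer product
fgrad = a[:, None] * b[None, :]
print(np.allclose(fgrad, np.outer(a, b)))  # True

# f is linear in W, so the first-order change is exact for any perturbation E
E = rng.standard_normal((m, n))
print(np.isclose(f(W + E) - f(W), np.sum(fgrad * E)))  # True
```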
