## Sample Code and Test For Optimization 1 Section

http://cs231n.github.io/optimization-1

SVM Loss Function

$$ L = \frac{1}{N} \sum_i \sum_{j\neq y_i} \left[ \max(0, f(x_i; W)_j - f(x_i; W)_{y_i} + 1) \right] + \alpha R(W) $$
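As a rough illustration of the formula above, the sketch below evaluates the loss for a linear score function $f(x_i; W) = W x_i$ with an L2 regularizer $R(W) = \sum W^2$. The helper name `svm_loss`, the array layout (`W` is classes x dimensions, `X` is dimensions x N), and the choice of L2 regularization are assumptions made for this example; the formula itself does not fix them.

```python
import numpy as np

def svm_loss(W, X, y, alpha=0.0):
    """Multiclass SVM loss (hypothetical helper, assumes linear scores W.dot(X)).

    W: (C, D) weights, X: (D, N) data with one example per column,
    y: (N,) integer class labels, alpha: regularization strength.
    """
    n = X.shape[1]
    cols = np.arange(n)
    scores = W.dot(X)                                # (C, N) class scores
    correct = scores[y, cols]                        # (N,) score of the true class
    margins = np.maximum(0, scores - correct + 1.0)  # hinge with margin 1
    margins[y, cols] = 0                             # the sum skips j == y_i
    data_loss = margins.sum() / n
    reg_loss = alpha * np.sum(W * W)                 # assumed L2 penalty for R(W)
    return data_loss + reg_loss

# With all-zero weights every score is 0, so each of the C - 1 wrong
# classes contributes a margin of exactly 1: the loss is C - 1 = 2.0.
W = np.zeros((3, 4))
X = np.ones((4, 5))
y = np.array([0, 1, 2, 0, 1])
loss = svm_loss(W, X, y)
```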



In [4]:
import numpy as np

def eval_numerical_gradient(f, x):
    """ 
      a naive implementation of numerical gradient of f at x 
      - f should be a function that takes a single argument
      - x is the point (numpy array) to evaluate the gradient at
    """ 

    fx = f(x) # evaluate function value at original point
    grad = np.zeros(x.shape)
    h = 0.00001

    # iterate over all indexes in x
    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:

        # evaluate function at x+h
        ix = it.multi_index
        old_value = x[ix]
        x[ix] = old_value + h # increment by h
        fxh = f(x) # evaluate f(x + h)
        x[ix] = old_value # restore to previous value (very important!)

        # compute the partial derivative
        grad[ix] = (fxh - fx) / h # the slope
        it.iternext() # step to next dimension

    return grad
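
The function above uses a one-sided difference. A common alternative is the centered difference $(f(x+h) - f(x-h)) / 2h$, which is usually more accurate for the same `h`. The sketch below is one possible variant, not part of the original code; the name `eval_numerical_gradient_centered` and the test function `sum(x**2)` are assumptions chosen so the result can be compared with the known gradient `2 * x`.

```python
import numpy as np

def eval_numerical_gradient_centered(f, x, h=1e-5):
    """Centered-difference gradient of f at x (hypothetical variant).

    f takes a single numpy array argument; x must be a float array.
    """
    grad = np.zeros(x.shape)
    for ix in np.ndindex(x.shape):       # visit every index of x
        old_value = x[ix]
        x[ix] = old_value + h
        fxph = f(x)                      # f(x + h)
        x[ix] = old_value - h
        fxmh = f(x)                      # f(x - h)
        x[ix] = old_value                # restore the original value
        grad[ix] = (fxph - fxmh) / (2 * h)
    return grad

# Compare against the analytic gradient of sum(x**2), which is 2 * x.
x = np.array([[1.0, -2.0], [0.5, 3.0]])
f = lambda v: np.sum(v ** 2)
numeric = eval_numerical_gradient_centered(f, x)
analytic = 2 * x
max_error = np.max(np.abs(numeric - analytic))
```
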

In [5]:
np.arange(10)

## Chain Rule and Backpropagation Intuition and Compound Expressions with the Chain Rule

Keep in mind what a derivative tells you: it is the rate of change of a function with respect to a variable in an infinitesimally small region around a particular point:

$$ \frac{df(x)}{dx} = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} $$


### Example 1

$f(x,y,z)=(x+y)z$

$q = x + y$

$f = qz$

$\frac{\partial f}{\partial z} = q$

$\frac{\partial f}{\partial q} = z$

$\frac{\partial q}{\partial x} = 1$

$\frac{\partial q}{\partial y} = 1$

In [3]:
# set some inputs
x = -2; y = 5; z = -4

# perform the forward pass
q = x + y # q becomes 3
f = q * z # f becomes -12

# perform the backward pass (backpropagation) in reverse order:
# first backprop through f = q * z
dfdz = q # df/dz = q, so gradient on z becomes 3
dfdq = z # df/dq = z, so gradient on q becomes -4
# now backprop through q = x + y
dfdx = 1.0 * dfdq # dq/dx = 1. And the multiplication here is the chain rule!
dfdy = 1.0 * dfdq # dq/dy = 1
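
A quick way to sanity-check the backward pass is to compare it with finite differences. The sketch below does this for the same inputs; the step size `h` and the `num_*` variable names are choices made for this example. Because $f$ is linear in each variable taken separately, the approximations should agree with the analytic values (-4, -4, 3) up to rounding error.

```python
def f(x, y, z):
    return (x + y) * z

x, y, z = -2.0, 5.0, -4.0
h = 1e-5  # small step for the finite difference

# nudge one input at a time and measure how much f changes
num_dfdx = (f(x + h, y, z) - f(x, y, z)) / h  # close to z = -4
num_dfdy = (f(x, y + h, z) - f(x, y, z)) / h  # close to z = -4
num_dfdz = (f(x, y, z + h) - f(x, y, z)) / h  # close to x + y = 3
```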


## Modularity: Sigmoid example

In [6]:
import math
w = [2,-3,-3] # assume some random weights and data
x = [-1, -2]

# forward pass
dot = w[0]*x[0] + w[1]*x[1] + w[2]
f = 1.0 / (1 + math.exp(-dot)) # sigmoid function

# backward pass through the neuron (backpropagation)
ddot = (1 - f) * f # gradient on dot variable, using the sigmoid gradient derivation
dx = [w[0] * ddot, w[1] * ddot] # backprop into x
dw = [x[0] * ddot, x[1] * ddot, 1.0 * ddot] # backprop into w
# we're done! we have the gradients on the inputs to the circuit

In [7]:
dx

[0.3932238664829637, -0.5898357997244456]

In [8]:
dw

[-0.19661193324148185, -0.3932238664829637, 0.19661193324148185]
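
The `(1 - f) * f` shortcut for the sigmoid gradient can be checked numerically. The sketch below wraps the forward pass in a helper named `neuron` (a name introduced here for illustration) and compares the analytic gradient on `w` with a centered finite difference.

```python
import math

def neuron(w, x):
    """Forward pass of the 2-input sigmoid neuron used above."""
    dot = w[0] * x[0] + w[1] * x[1] + w[2]
    return 1.0 / (1 + math.exp(-dot))

w = [2.0, -3.0, -3.0]
x = [-1.0, -2.0]
h = 1e-5

# analytic gradient, same formulas as the backward pass above
f = neuron(w, x)
ddot = (1 - f) * f
analytic_dw = [x[0] * ddot, x[1] * ddot, 1.0 * ddot]

# numerical gradient: perturb one weight at a time
numeric_dw = []
for i in range(len(w)):
    w_plus = list(w)
    w_plus[i] += h
    w_minus = list(w)
    w_minus[i] -= h
    numeric_dw.append((neuron(w_plus, x) - neuron(w_minus, x)) / (2 * h))

max_diff = max(abs(a - n) for a, n in zip(analytic_dw, numeric_dw))
```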

## Backprop in practice: Staged computation

$f(x,y) = \frac{x + \sigma(y)}{\sigma(x) + (x+y)^2}$

This function is not useful in itself; it only serves as an example of backpropagation in practice.

In [9]:
x = 3 # example values
y = -4

# forward pass
sigy = 1.0 / (1 + math.exp(-y)) # sigmoid in numerator   #(1)
num = x + sigy # numerator                               #(2)
sigx = 1.0 / (1 + math.exp(-x)) # sigmoid in denominator #(3)
xpy = x + y                                              #(4)
xpysqr = xpy**2                                          #(5)
den = sigx + xpysqr # denominator                        #(6)
invden = 1.0 / den                                       #(7)
f = num * invden # done!                                 #(8)

$f(x,y) = \frac{x + \sigma(y)}{\sigma(x) + (x+y)^2}$


In [10]:
# backprop f = num * invden 
dnum = invden # gradient on numerator                             #(8)
dinvden = num                                                     #(8)

# backprop invden = 1.0 / den 
dden = (-1.0 / (den**2)) * dinvden                                #(7)


# backprop den = sigx + xpysqr
dsigx = (1) * dden                                                #(6)
dxpysqr = (1) * dden                                              #(6)

# backprop xpysqr = xpy**2
dxpy = (2 * xpy) * dxpysqr                                        #(5)


# backprop xpy = x + y
dx = (1) * dxpy                                                   #(4)
dy = (1) * dxpy                                                   #(4)


# backprop sigx = 1.0 / (1 + math.exp(-x))
dx += ((1 - sigx) * sigx) * dsigx # +=: gradients on x accumulate #(3)


# backprop num = x + sigy
dx += (1) * dnum                                                  #(2)
dsigy = (1) * dnum                                                #(2)


# backprop sigy = 1.0 / (1 + math.exp(-y))
dy += ((1 - sigy) * sigy) * dsigy                                 #(1)
# done! phew
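
The `+=` in the backward pass matters because `x` and `y` each feed more than one branch of the expression, so by the multivariable chain rule their gradients from every branch add up. The sketch below is one way to check the staged result: it repeats the same backward pass in condensed form inside a helper and compares it with a centered finite difference. The names `sigmoid`, `forward`, and `backward` are introduced here for illustration.

```python
import math

def sigmoid(t):
    return 1.0 / (1 + math.exp(-t))

def forward(x, y):
    return (x + sigmoid(y)) / (sigmoid(x) + (x + y) ** 2)

def backward(x, y):
    """Analytic gradients (dx, dy), following the staged backward pass above."""
    # forward pass, keeping the intermediates
    sigy = sigmoid(y)
    num = x + sigy
    sigx = sigmoid(x)
    xpy = x + y
    den = sigx + xpy ** 2
    invden = 1.0 / den
    # backward pass
    dnum = invden
    dden = (-1.0 / den ** 2) * num
    dxpy = (2 * xpy) * dden
    # x reaches f through three branches: xpy, sigx, and num
    dx = dxpy + (1 - sigx) * sigx * dden + dnum
    # y reaches f through two branches: xpy and sigy
    dy = dxpy + (1 - sigy) * sigy * dnum
    return dx, dy

x, y, h = 3.0, -4.0, 1e-5
dx, dy = backward(x, y)
num_dx = (forward(x + h, y) - forward(x - h, y)) / (2 * h)
num_dy = (forward(x, y + h) - forward(x, y - h)) / (2 * h)
```

If any of the three contributions to `dx` is dropped, or assigned with `=` instead of accumulated, the analytic and numerical values no longer match.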



