# Lecture 7 notebook
## CS152 September 26, 2018  Neil Rhodes

Given $f(x_1, x_2, w_1, w_2, b) = \frac{1}{1 + e^{-(w_1x_1 + w_2x_2 + b)}}$, we want to find $\frac{\partial f}{\partial w_1}$.
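
As a sketch of where the answer should land, write $z = w_1x_1 + w_2x_2 + b$ and $\sigma(z) = \frac{1}{1 + e^{-z}}$, so that $f = \sigma(z)$ (the names $z$ and $\sigma$ are shorthand introduced here, not part of the original definition). The chain rule then gives:

$$
\frac{\partial f}{\partial w_1}
  = \frac{d\sigma}{dz} \cdot \frac{\partial z}{\partial w_1}
  = \sigma(z)\,\bigl(1 - \sigma(z)\bigr)\,x_1
$$

The same pattern gives $\frac{\partial f}{\partial w_2} = \sigma(z)\,(1 - \sigma(z))\,x_2$ and $\frac{\partial f}{\partial b} = \sigma(z)\,(1 - \sigma(z))$.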

Let's define this as a one-layer neural network with a sigmoid activation function. We can use NumPy to do the matrix calculations.

In [3]:
import numpy as np

def sigmoid_np(a):
    return 1/(1+np.exp(-a))

def sigmoid_np_prime(a):
    return sigmoid_np(a)*(1-sigmoid_np(a))

def relu_np(z):
    return np.maximum(0, z)

def relu_np_prime(z):
    # anything in z <= 0 maps to 0. Anything > 0 maps to 1
    return np.where(z <= 0, 0, 1)

def weightedsum_np(A, W, b):
    return np.matmul(W, A) + b
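
One way to gain confidence in the derivative helpers is to compare them against a finite-difference slope. The sketch below repeats the helpers so it runs on its own; `central_slope` and the test points in `z` are hypothetical names and values chosen for illustration, and the points deliberately avoid 0, where ReLU has a kink.

```python
import numpy as np

def sigmoid_np(a):
    return 1 / (1 + np.exp(-a))

def sigmoid_np_prime(a):
    return sigmoid_np(a) * (1 - sigmoid_np(a))

def relu_np(z):
    return np.maximum(0, z)

def relu_np_prime(z):
    return np.where(z <= 0, 0, 1)

def central_slope(fn, z, h=1e-6):
    # Hypothetical helper: symmetric finite-difference estimate of fn'(z)
    return (fn(z + h) - fn(z - h)) / (2 * h)

# Test points chosen away from 0, where ReLU is not differentiable
z = np.array([-2.0, -0.5, 0.5, 1.0, 3.0])

sigmoid_ok = np.allclose(sigmoid_np_prime(z), central_slope(sigmoid_np, z), atol=1e-8)
relu_ok = np.allclose(relu_np_prime(z), central_slope(relu_np, z), atol=1e-8)
print(sigmoid_ok, relu_ok)
```

Both flags should come out true if the analytic derivatives agree with the measured slopes.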

## Numerical Differentiation
Here, we calculate the value of $\frac{\partial f}{\partial w_1}$ via numerical differentiation.

In [13]:
params = {
    'x': np.array([[3.0], [-2.0]]),
    'W1': np.array([[1.0, 2.0]]),
    'b1': np.array([2.0]),
}

def f(params):
    v = {}
    v['layer1z'] = weightedsum_np(params['x'], params['W1'], params['b1'])
    v['layer1a'] = sigmoid_np(v['layer1z'])
    return v['layer1a'], v                      

f_out, values = f(params)
print(f'f: {f_out}')

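# Clone params; copy each array so the originals stay untouched
params_epsilon = {name: value.copy() for name, value in params.items()}
epsilon = .000001
# Perturb only w_1, the first entry of W1
params_epsilon['W1'][0, 0] += epsilon
fepsilon, _ = f(params_epsilon)

# Forward-difference approximation of df/dw1
df_dw1 = (fepsilon-f_out)/epsilon
print(f'df/dw1: {df_dw1}')

f: [[0.73105858]]
df/dw1: [[0.58983539]]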
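
The cell above uses a one-sided (forward) difference. A central difference is a common alternative with smaller truncation error, and looping over the entries of `W1` gives the derivative with respect to each weight in turn. The following is a minimal sketch under those assumptions; `forward` and `numeric_grad` are hypothetical helper names, not part of the lecture code.

```python
import numpy as np

def sigmoid_np(a):
    return 1 / (1 + np.exp(-a))

def forward(x, W, b):
    # Same one-layer network as above: sigmoid(W x + b)
    return sigmoid_np(np.matmul(W, x) + b)

def numeric_grad(x, W, b, epsilon=1e-6):
    # Hypothetical helper: central-difference estimate of df/dW, one entry at a time
    grad = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        W_plus, W_minus = W.copy(), W.copy()  # copies, so W itself is never modified
        W_plus[idx] += epsilon
        W_minus[idx] -= epsilon
        diff = forward(x, W_plus, b) - forward(x, W_minus, b)
        grad[idx] = diff.item() / (2 * epsilon)
    return grad

x = np.array([[3.0], [-2.0]])
W1 = np.array([[1.0, 2.0]])
b1 = np.array([2.0])

df_dW1 = numeric_grad(x, W1, b1)
print(f'df/dW1: {df_dW1}')
```

Because each perturbation is applied to a copy, `W1` keeps its original values after the call, and the result has the same shape as `W1`.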


## Backpropagation
The forward pass above stored the output value of each node in `values`. Now we'll backpropagate through those nodes with the chain rule:

In [21]:
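dloss_dlayer1a = np.array([[1.0]]) # since f = layer1a, df/dlayer1a = 1
# the sigmoid is applied elementwise, so its derivative multiplies elementwise
dloss_dlayer1z = dloss_dlayer1a * sigmoid_np_prime(values['layer1z'])
# note: this is laid out as a (2, 1) column, the transpose of W1's (1, 2) shape
dloss_dW1 = np.matmul(params['x'], dloss_dlayer1z.transpose())
dloss_db1 = dloss_dlayer1z.sum(axis=1) # same shape as b1

print(f'dloss/dW1: {dloss_dW1}')
print(f'dloss/db1: {dloss_db1}')

print(f'dloss/dw_1: {dloss_dW1[0]}')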

dloss/dW1: [[ 0.5898358 ]
 [-0.39322387]]
dloss/db1: [0.19661193]
dloss/dw_1: [0.5898358]
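
The steps in the cell above can be summarized in matrix form. Here $z = W_1 x + b_1$ and $a = \sigma(z)$ correspond to `layer1z` and `layer1a`, and $\delta$ is shorthand introduced here for the gradient with respect to $z$:

$$
\delta = \frac{\partial f}{\partial z} = \frac{\partial f}{\partial a} \odot \sigma'(z),
\qquad
\frac{\partial f}{\partial W_1} = \delta\, x^{\top},
\qquad
\frac{\partial f}{\partial b_1} = \delta
$$

The code computes $x\,\delta^{\top}$, the transpose of $\delta\,x^{\top}$, which is why its result is a $2 \times 1$ column while $W_1$ is $1 \times 2$. With $x = (3, -2)^{\top}$, $W_1 = (1, 2)$ and $b_1 = 2$ we get $z = 1$ and $\sigma'(1) \approx 0.1966$, so the entries are $0.1966 \cdot 3 \approx 0.5898$ and $0.1966 \cdot (-2) \approx -0.3932$, matching the printed output.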


## Automatic Differentiation using PyTorch
Let's look at how to compute the same derivatives with PyTorch's automatic differentiation:

In [11]:
import torch

from torch.autograd import Variable
from torch import Tensor

def f(x1, x2, w1, w2, b):
    weightedSum = x1*w1 + x2*w2 + b
    return 1/(1+torch.exp(-weightedSum))

x1 = Variable(Tensor([3.0])) # in PyTorch ≥ 0.4: x1 = torch.tensor([3.0])
w1 = Variable(Tensor([1.0]), requires_grad=True) # in PyTorch ≥ 0.4: w1 = torch.tensor([1.0], requires_grad=True)
x2 = Variable(Tensor([-2.0]))
w2 = Variable(Tensor([2.0]), requires_grad=True)
b = Variable(Tensor([2.0]), requires_grad=True)

result = f(x1, x2, w1, w2, b)
print("f(...) = ", result)

result.backward()
print(f'df/dw1: {w1.grad}')
print(f'df/dw2: {w2.grad}')
print(f'df/db: {b.grad}')

f(...) =  Variable containing:
 0.7311
[torch.FloatTensor of size 1]

df/dw1: Variable containing:
 0.5898
[torch.FloatTensor of size 1]

df/dw2: Variable containing:
-0.3932
[torch.FloatTensor of size 1]

df/db: Variable containing:
 0.1966
[torch.FloatTensor of size 1]
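
Since PyTorch 0.4, `Variable` has been merged into the tensor type, so the same computation can be written without the wrapper. The sketch below assumes a recent PyTorch release; it also swaps the hand-written formula for the built-in `torch.sigmoid`, which computes the same function.

```python
import torch

def f(x1, x2, w1, w2, b):
    weighted_sum = x1 * w1 + x2 * w2 + b
    return torch.sigmoid(weighted_sum)  # built-in equivalent of 1/(1+exp(-z))

# Inputs are plain tensors; parameters opt in to gradient tracking
x1 = torch.tensor([3.0])
x2 = torch.tensor([-2.0])
w1 = torch.tensor([1.0], requires_grad=True)
w2 = torch.tensor([2.0], requires_grad=True)
b = torch.tensor([2.0], requires_grad=True)

result = f(x1, x2, w1, w2, b)
result.backward()  # result holds a single element, so no gradient argument is needed

print(f'f(...) = {result.item():.4f}')
print(f'df/dw1: {w1.grad.item():.4f}')
print(f'df/dw2: {w2.grad.item():.4f}')
print(f'df/db: {b.grad.item():.4f}')
```

The printed values should agree with the output above to four decimal places; only the display format differs.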



Or, using matrix multiplication:

In [46]:
import torch

from torch.autograd import Variable
from torch import Tensor

def sigmoid_layer(w, x, b):
    return sigmoid(torch.mm(w.t(), x)+ b)

def sigmoid(x):
    return 1/(1+torch.exp(-x))

def relu_layer(w, x, b):
    return relu(torch.mm(w.t(), x)+ b)

def relu(x):
    return x.clamp(min=0)


x = Variable(Tensor([[3.0], [-2.0]])) # shape: (2, 1)
w = Variable(Tensor([[1.0], [2.0]]), requires_grad=True) # shape: (2, 1)
b = Variable(Tensor([2.0]), requires_grad=True)

a1 = sigmoid_layer(w, x, b)
print("f(...) = ", a1)

a1.backward()
print(f'df/dw: {w.grad}')
print(f'df/db: {b.grad}')

f(...) =  Variable containing:
 0.7311
[torch.FloatTensor of size 1x1]

df/dw: Variable containing:
 0.5898
-0.3932
[torch.FloatTensor of size 2x1]

df/db: Variable containing:
 0.1966
[torch.FloatTensor of size 1]
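
The `relu_layer` helper defined in the cell above is not exercised there. As a rough sketch of how the two layer functions could be stacked, the example below feeds a ReLU hidden layer into the sigmoid layer. The hidden size of 3 and every weight value are made up for illustration, and the code assumes PyTorch 0.4 or later (plain tensors instead of `Variable`).

```python
import torch

def sigmoid(x):
    return 1 / (1 + torch.exp(-x))

def relu(x):
    return x.clamp(min=0)

def sigmoid_layer(w, x, b):
    return sigmoid(torch.mm(w.t(), x) + b)

def relu_layer(w, x, b):
    return relu(torch.mm(w.t(), x) + b)

x = torch.tensor([[3.0], [-2.0]])                                 # shape: (2, 1)

# Hypothetical hidden layer with 3 units; these values are made up for illustration
w_hidden = torch.tensor([[0.5, -1.0, 0.25],
                         [1.0, 0.5, -0.5]], requires_grad=True)   # shape: (2, 3)
b_hidden = torch.zeros(3, 1, requires_grad=True)                  # shape: (3, 1)

# Hypothetical output layer mapping the 3 hidden units to one output
w_out = torch.tensor([[1.0], [-1.0], [0.5]], requires_grad=True)  # shape: (3, 1)
b_out = torch.tensor([0.0], requires_grad=True)

hidden = relu_layer(w_hidden, x, b_hidden)  # shape: (3, 1)
a2 = sigmoid_layer(w_out, hidden, b_out)    # shape: (1, 1)
a2.backward()

print(w_hidden.grad.shape, w_out.grad.shape)
print(w_hidden.grad)
```

With these particular values the first two hidden units have negative pre-activations, so ReLU zeroes them and the corresponding columns of `w_hidden.grad` are zero; only the third unit passes gradient back to the input weights.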

