# Backpropagation


In [65]:
import numpy as np

# Inputs and weights
x1 = 0.1
x2 = 0.3
w1 = 0.4
w2 = -0.2

# Hidden weight
hw = 0.1

#target value
y = 1

for i in range(1):
    print("x1[{}] x2[{}] w1[{}] w2[{}] hw[{}] y[{}]".format(x1,x2,w1,w2,hw,y))

    # Assume we're trying to fit some binary data and the target is y=1. 
    # We'll start with the forward pass, first calculating the input to the hidden unit
    # h = Σ_i w_i x_i = 0.1×0.4 − 0.2×0.3 = −0.02
    h = w1*x1 + w2*x2
    print("and the output of the hidden unit h")
    print(h)

    # Defining the sigmoid function for activations
    def sigmoid(x):
        """
        Calculate sigmoid
        """
        return 1/(1+np.exp(-x))

    # and the output of the hidden unit
    # a = f(h) = sigmoid(−0.02) = 0.495.
    a = sigmoid(h)
    print("the output of the hidden unit a")
    print(a)

    # Using this as the input to the output unit, the output of the network is
    # ŷ = f(W·a) = sigmoid(0.1×0.495) = 0.512.
    y1 = sigmoid(hw*a)
    print("the output of the network is y1")
    print(y1)


    # With the network output, we can start the backwards pass to calculate the weight updates for both layers. 
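    # Using the fact that for the sigmoid function
    # f′(W·a) = f(W·a)(1 − f(W·a)), the error for the output unit is
    # δ_o = (y − ŷ) f′(W·a) = (1 − 0.512)×0.512×(1 − 0.512) = 0.122.
    # The derivative factor is (1 - y1), not (y - y1); the two coincide here only because y = 1.
    b = (y - y1) * y1 * (1 - y1)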
    print("the error for the output unit is b")
    print(b)


    # Now we need to calculate the error for the hidden unit with backpropagation. 
    # Here we'll scale the error from the output unit by the weight W connecting it to the hidden unit. 
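    # For the hidden unit error, δ_h_j = Σ_k W_jk δ_o_k f′(h_j),
    # but since we have one hidden unit and one output unit, this is much simpler.
    # δ_h = W δ_o f′(h) = 0.1×0.122×0.495×(1 − 0.495) = 0.003
    # The sigmoid derivative factor is (1 - a), not (y - a); the two coincide here only because y = 1.
    bh = hw * b * a * (1 - a)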
    print("the error for the hidden unit error with backpropagation is bh")
    print(bh)

    # Now that we have the errors, we can calculate the gradient descent steps. 
    # The hidden to output weight step is the learning rate, times the output unit error, 
    # times the hidden unit activation value.
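    # ΔW = η δ_o a = 0.5×0.122×0.495 = 0.0302
    # Note: the line below multiplies by y1 (about 0.512) where the formula calls for the
    # learning rate η = 0.5, so the printed step (0.0309) comes out slightly larger than
    # the hand-calculated 0.0302.
    gdw = y1*b*a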
    print("the gradient descent steps is gdw")
    print(gdw)

    # Then, for the input to hidden weights wi, it's the learning rate times the hidden unit error, 
    # times the input values.
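    # Δw_i = η δ_h x_i = (0.5×0.003×0.1, 0.5×0.003×0.3) = (0.00015, 0.00045)
    # Note: y1 (about 0.512) stands in for the learning rate η = 0.5 here, and the steps
    # overwrite w1 and w2 instead of being added to them. That is harmless only because
    # the loop runs once.
    w1,w2 = y1*bh*x1,y1*bh*x2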
    print("input to hidden weights wi is w1,w2")
    print(w1,w2)

x1[0.1] x2[0.3] w1[0.4] w2[-0.2] hw[0.1] y[1]
and the output of the hidden unit h
-0.01999999999999999
the output of the hidden unit a
0.49500016666
the output of the network is y1
0.512372477963
the error for the output unit is b
0.121832235361
the error for the hidden unit error with backpropagation is bh
0.00304550132373
the gradient descent steps is gdw
0.0308996351456
input to hidden weights wi is w1,w2
0.000156043105988 0.000468129317964
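
For comparison, here is a minimal sketch of the same worked example with the learning rate written out explicitly. It assumes η = 0.5, the value used in the hand calculations in the comments; the names `learnrate`, `delta_o`, `delta_h`, `step_hw` and `step_w` are introduced here for illustration and do not appear in the cell above.

```python
import numpy as np


def sigmoid(x):
    """Calculate sigmoid."""
    return 1 / (1 + np.exp(-x))


# Same inputs, weights and target as the worked example
x = np.array([0.1, 0.3])
w = np.array([0.4, -0.2])
hw = 0.1          # hidden-to-output weight
y = 1             # target value
learnrate = 0.5   # assumption: η taken from the hand calculation

# Forward pass
h = np.dot(w, x)            # input to the hidden unit
a = sigmoid(h)              # output of the hidden unit
y_hat = sigmoid(hw * a)     # output of the network

# Backward pass: error terms use the sigmoid derivative f(z) * (1 - f(z))
delta_o = (y - y_hat) * y_hat * (1 - y_hat)
delta_h = hw * delta_o * a * (1 - a)

# Gradient descent steps: learning rate × error term × input to that weight
step_hw = learnrate * delta_o * a
step_w = learnrate * delta_h * x

print("hidden-to-output step:", step_hw)
print("input-to-hidden steps:", step_w)
```

With η = 0.5 the hidden-to-output step comes out at about 0.0302, in line with the hand calculation, and the steps are kept in their own variables so the original weights stay available for the update.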


From this example, you can see one of the effects of using the sigmoid function for the activations. The maximum derivative of the sigmoid function is 0.25, so the errors in the output layer get reduced by at least 75%, and errors in the hidden layer are scaled down by at least 93.75%! You can see that if you have a lot of layers, using a sigmoid activation function will quickly reduce the weight steps to tiny values in layers near the input. This is known as the vanishing gradient problem. Later in the course you'll learn about other activation functions that perform better in this regard and are more commonly used in modern network architectures.
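
The 0.25 bound and the percentages quoted above can be checked numerically. The sketch below is only an illustration: `sigmoid_prime` and `bounds` are names introduced here, and the per-layer bound counts only the sigmoid derivative factors. The weights also scale the error, which this ignores.

```python
import numpy as np


def sigmoid(x):
    """Calculate sigmoid."""
    return 1 / (1 + np.exp(-x))


def sigmoid_prime(x):
    """Derivative of the sigmoid: f(x) * (1 - f(x))."""
    s = sigmoid(x)
    return s * (1 - s)


# Scan a range of inputs; the derivative peaks at x = 0
xs = np.linspace(-10, 10, 2001)
max_slope = sigmoid_prime(xs).max()
print("maximum sigmoid derivative:", max_slope)

# Each sigmoid layer multiplies the error by at most 0.25,
# so after n layers the factor is at most 0.25 ** n.
bounds = {n: 0.25 ** n for n in (1, 2, 5, 10)}
for n, bound in bounds.items():
    print("{} layer(s): error scaled by at most {}".format(n, bound))
```

One layer gives a factor of at most 0.25 (a 75% reduction) and two layers give at most 0.0625 (a 93.75% reduction), matching the figures in the paragraph above.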

# Backpropagation exercise
Below, you'll implement the code to calculate one backpropagation update step for two sets of weights. The forward pass is already written; your goal is to code the backward pass.

Things to do:

- Calculate the network error.
- Calculate the output layer error gradient.
- Use backpropagation to calculate the hidden layer error.
- Calculate the weight update steps.

In [75]:
import numpy as np


def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1 / (1 + np.exp(-x))


x = np.array([0.5, 0.1, -0.2])
target = 0.6
learnrate = 0.5

weights_input_hidden = np.array([[0.5, -0.6],
                                 [0.1, -0.2],
                                 [0.1, 0.7]])

weights_hidden_output = np.array([0.1, -0.3])

## Forward pass
hidden_layer_input = np.dot(x, weights_input_hidden)
hidden_layer_output = sigmoid(hidden_layer_input)

output_layer_in = np.dot(hidden_layer_output, weights_hidden_output)
output = sigmoid(output_layer_in)

## Backwards pass
## TODO: Calculate error
error = (target-output)*output*(target-output)
print("error")
print(error)

# TODO: Calculate error gradient for output layer
del_err_output = weights_hidden_output*error*output*(target-output)

# TODO: Calculate error gradient for hidden layer
del_err_hidden = weights_input_hidden*del_err_output*hidden_layer_output*(target-hidden_layer_output)

gdw = output*error*hidden_layer_output
print("the gradient descent steps is gdw")
print(gdw)

# TODO: Calculate change in weights for hidden layer to output layer
delta_w_h_o = output*del_err_output*hidden_layer_output

# TODO: Calculate change in weights for input layer to hidden layer
delta_w_i_h = output*del_err_hidden*x[:, None]

print('Change in weights for hidden layer to output layer:')
print(delta_w_h_o)
print('Change in weights for input layer to hidden layer:')
print(delta_w_i_h)

error
0.00641673759167
the gradient descent steps is gdw
[ 0.0017418   0.00120428]
Change in weights for hidden layer to output layer:
[  9.71659803e-06  -2.01541576e-05]
Change in weights for input layer to hidden layer:
[[  9.78615691e-08   1.28793638e-06]
 [  3.91446276e-09   8.58624255e-08]
 [ -7.82892553e-09   6.01036978e-07]]


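The error above is wrong. The network error is simply target - output, which should be 0.11502656915. The attempt above folded extra factors into it, used (target - value) where the sigmoid derivative needs (1 - value), and multiplied the weight steps by output instead of learnrate.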

In [76]:
import numpy as np


def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1 / (1 + np.exp(-x))


x = np.array([0.5, 0.1, -0.2])
target = 0.6
learnrate = 0.5

weights_input_hidden = np.array([[0.5, -0.6],
                                 [0.1, -0.2],
                                 [0.1, 0.7]])

weights_hidden_output = np.array([0.1, -0.3])

## Forward pass
hidden_layer_input = np.dot(x, weights_input_hidden)
hidden_layer_output = sigmoid(hidden_layer_input)

output_layer_in = np.dot(hidden_layer_output, weights_hidden_output)
output = sigmoid(output_layer_in)

## Backwards pass
## TODO: Calculate error
error = target - output
print("error")
print(error)

# TODO: Calculate error gradient for output layer
del_err_output = error * output * (1 - output)

# TODO: Calculate error gradient for hidden layer
del_err_hidden = np.dot(del_err_output, weights_hidden_output) * \
                 hidden_layer_output * (1 - hidden_layer_output)

# TODO: Calculate change in weights for hidden layer to output layer
delta_w_h_o = learnrate * del_err_output * hidden_layer_output

# TODO: Calculate change in weights for input layer to hidden layer
delta_w_i_h = learnrate * del_err_hidden * x[:, None]

print('Change in weights for hidden layer to output layer:')
print(delta_w_h_o)
print('Change in weights for input layer to hidden layer:')
print(delta_w_i_h)

error
0.11502656915
Change in weights for hidden layer to output layer:
[ 0.00804047  0.00555918]
Change in weights for input layer to hidden layer:
[[  1.77005547e-04  -5.11178506e-04]
 [  3.54011093e-05  -1.02235701e-04]
 [ -7.08022187e-05   2.04471402e-04]]
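
One way to gain confidence in a backward pass like this is a finite-difference gradient check. The sketch below assumes the network is minimizing the squared error E = ½(target − output)², which is the loss the update rule above corresponds to but which the exercise does not state explicitly. The helpers `loss` and `numerical_grad` and the shortened names `w_ih` and `w_ho` are introduced here for illustration.

```python
import numpy as np


def sigmoid(x):
    """Calculate sigmoid."""
    return 1 / (1 + np.exp(-x))


def loss(w_ih, w_ho, x, target):
    """Assumed loss: half the squared error of the network output."""
    hidden = sigmoid(np.dot(x, w_ih))
    output = sigmoid(np.dot(hidden, w_ho))
    return 0.5 * (target - output) ** 2


def numerical_grad(f, w, eps=1e-6):
    """Central-difference estimate of df/dw, one weight at a time."""
    grad = np.zeros_like(w)
    for idx in np.ndindex(w.shape):
        w_plus = w.copy()
        w_minus = w.copy()
        w_plus[idx] += eps
        w_minus[idx] -= eps
        grad[idx] = (f(w_plus) - f(w_minus)) / (2 * eps)
    return grad


x = np.array([0.5, 0.1, -0.2])
target = 0.6
learnrate = 0.5
w_ih = np.array([[0.5, -0.6],
                 [0.1, -0.2],
                 [0.1, 0.7]])
w_ho = np.array([0.1, -0.3])

# Analytic weight steps, computed as in the corrected cell
hidden = sigmoid(np.dot(x, w_ih))
output = sigmoid(np.dot(hidden, w_ho))
del_err_output = (target - output) * output * (1 - output)
del_err_hidden = del_err_output * w_ho * hidden * (1 - hidden)
delta_w_h_o = learnrate * del_err_output * hidden
delta_w_i_h = learnrate * del_err_hidden * x[:, None]

# Gradient descent steps are -learnrate times the gradient of the loss
num_h_o = -learnrate * numerical_grad(lambda w: loss(w_ih, w, x, target), w_ho)
num_i_h = -learnrate * numerical_grad(lambda w: loss(w, w_ho, x, target), w_ih)

print("hidden-to-output steps match:", np.allclose(delta_w_h_o, num_h_o, atol=1e-8))
print("input-to-hidden steps match:", np.allclose(delta_w_i_h, num_i_h, atol=1e-8))
```

If the two sets of steps disagree, the usual culprits are a wrong derivative factor or a missing learning rate.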
