# 4. Neural Networks

## Introduction to Neural Networks

### Softmax

To calculate the probability that a prediction belongs to class $i$ out of $n$ classes, given the scores $Z_1, \dots, Z_n$:

$$P(\text{class } i) = \frac{e^{Z_i}}{e^{Z_1} + \dots + e^{Z_n}}$$
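
The softmax calculation can be sketched in NumPy as follows. The function name `softmax` and the example scores are illustrative assumptions, and subtracting the maximum score is a common numerical-stability trick that is not part of the formula itself (it leaves the result unchanged).

```python
import numpy as np

def softmax(scores):
    """Return softmax probabilities for a 1-D sequence of scores."""
    z = np.asarray(scores, dtype=float)
    # Subtracting the maximum does not change the result,
    # but it keeps np.exp from overflowing on large scores.
    exp_z = np.exp(z - np.max(z))
    return exp_z / exp_z.sum()

# Hypothetical scores for three classes
probs = softmax([2.0, 1.0, 0.1])
print(probs)        # the largest score gets the largest probability
print(probs.sum())  # the probabilities sum to 1 (up to rounding)
```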

### Cross Entropy

```python
import numpy as np

# Takes two lists, Y (labels, each 0 or 1) and P (predicted probabilities),
# and returns the float corresponding to their cross-entropy.
def cross_entropy(Y, P):
    result = 0.0
    for y, p in zip(Y, P):
        if y == 1:
            result += np.log(p)
        if y == 0:
            result += np.log(1 - p)

    return -result

print(cross_entropy([1, 0, 1, 1], [0.4, 0.6, 0.1, 0.5]))
```

```
4.828313737302301
```
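
The same quantity can also be computed without an explicit loop. The sketch below is one possible vectorized variant, not the version used in these notes: the name `cross_entropy_vectorized` and the `eps` clipping parameter are assumptions added here to avoid `log(0)`.

```python
import numpy as np

def cross_entropy_vectorized(Y, P, eps=1e-12):
    """Binary cross-entropy of labels Y (0 or 1) and probabilities P."""
    Y = np.asarray(Y, dtype=float)
    # Clip probabilities away from 0 and 1 so the logarithms stay finite
    # (eps is an assumption, not part of the original function).
    P = np.clip(np.asarray(P, dtype=float), eps, 1 - eps)
    return float(-np.sum(Y * np.log(P) + (1 - Y) * np.log(1 - P)))

value = cross_entropy_vectorized([1, 0, 1, 1], [0.4, 0.6, 0.1, 0.5])
print(value)  # agrees with the loop-based result for the same inputs
```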


### Gradient Descent

* Error function: $E = -y\ln(\hat y) - (1 - y)\ln(1 - \hat y)$
* Gradient descent step: $w_i' \leftarrow w_i + \alpha (y - \hat y) x_i$ and $b' \leftarrow b + \alpha (y - \hat y)$

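Pseudocode for the gradient descent step:

1. Initialize $W$ and $b$ with random values.
2. For every point $(x_1, \dots, x_n)$ with label $y$:
    1. For $i = 1, \dots, n$: update $w_i' \leftarrow w_i + \alpha (y - \hat y) x_i$.
    2. Update $b' \leftarrow b + \alpha (y - \hat y)$.
3. Repeat until the error is small. Each full pass over the points is one epoch.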
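
The steps above can be sketched as a small training loop. This is a minimal illustration under stated assumptions: the function name `train_logistic_unit`, the fixed epoch count (used in place of an "error is small" check), the random seed, and the toy AND dataset are all hypothetical choices made for the example.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def train_logistic_unit(X, y, alpha=0.5, epochs=2000, seed=0):
    """Per-point gradient descent for a single sigmoid unit."""
    rng = np.random.default_rng(seed)
    # Step 1: initialize W and b with random values
    W = rng.normal(scale=0.1, size=X.shape[1])
    b = rng.normal(scale=0.1)
    # Step 3: repeat for a fixed number of epochs (an assumption)
    for _ in range(epochs):
        # Step 2: visit every point and apply the update rule
        for x_i, y_i in zip(X, y):
            y_hat = sigmoid(np.dot(x_i, W) + b)
            W += alpha * (y_i - y_hat) * x_i
            b += alpha * (y_i - y_hat)
    return W, b

# Toy, linearly separable data: the logical AND of two inputs
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0, 0, 0, 1])

W, b = train_logistic_unit(X, y)
preds = (sigmoid(X @ W + b) > 0.5).astype(int)
print(preds)
```

The whole weight vector is updated in one NumPy operation, which is equivalent to looping over $i = 1, \dots, n$.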



```python
# Defining the sigmoid function for activations
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Derivative of the sigmoid function
def sigmoid_prime(x):
    return sigmoid(x) * (1 - sigmoid(x))

# Input data
x = np.array([0.1, 0.3])
# Target
y = 0.2
# Input to output weights
weights = np.array([-0.8, 0.5])

# The learning rate, eta in the weight step equation
learnrate = 0.5

# The linear combination performed by the node (h in f(h) and f'(h))
h = x[0] * weights[0] + x[1] * weights[1]
# or h = np.dot(x, weights)

# The neural network output (y-hat)
nn_output = sigmoid(h)

# Output error (y - y-hat)
error = y - nn_output

# Output gradient (f'(h))
output_grad = sigmoid_prime(h)

# Error term (lowercase delta)
error_term = error * output_grad

# Gradient descent step
del_w = [learnrate * error_term * x[0],
         learnrate * error_term * x[1]]
# or del_w = learnrate * error_term * x
```



Set the weight step to zero: $\Delta w_i = 0$

For each record in the training data:

* Make a forward pass through the network, calculating the output $\hat y = f(\sum_i w_i x_i)$
* Calculate the error term for the output unit, $\delta = (y - \hat y) \cdot f'(\sum_i w_i x_i)$
* Update the weight step $\Delta w_i = \Delta w_i + \delta x_i$

Update the weights $w_i = w_i + \eta \Delta w_i / m$, where $\eta$ is the learning rate and $m$ is the number of records.

Here we're averaging the weight steps to help reduce any large variations in the training data.

Repeat for e epochs.
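
A minimal sketch of this batch procedure is shown below, assuming a sigmoid activation for $f$ and no bias term, as in the formulas above. The function name `train_batch`, the weight initialization, the per-epoch loss tracking, and the toy data are assumptions made for illustration only.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def train_batch(features, targets, learnrate=0.5, epochs=1000, seed=0):
    """Gradient descent with weight steps averaged over all records."""
    rng = np.random.default_rng(seed)
    n_records, n_features = features.shape
    weights = rng.normal(scale=1 / n_features**0.5, size=n_features)
    losses = []  # mean squared error per epoch, for inspection only

    for _ in range(epochs):
        # Set the weight step to zero
        del_w = np.zeros(n_features)
        squared_errors = []
        for x, y in zip(features, targets):
            # Forward pass
            output = sigmoid(np.dot(x, weights))
            # Error term: (y - y_hat) * f'(h), with f'(h) = f(h) * (1 - f(h))
            error_term = (y - output) * output * (1 - output)
            # Accumulate the weight step
            del_w += error_term * x
            squared_errors.append((y - output) ** 2)
        # Update the weights with the averaged step
        weights += learnrate * del_w / n_records
        losses.append(np.mean(squared_errors))

    return weights, losses

# Hypothetical toy data that can be separated without a bias term
features = np.array([[1.0, 0.5], [0.2, 1.0], [-1.0, -0.3], [-0.5, -1.0]])
targets = np.array([1.0, 1.0, 0.0, 0.0])

weights, losses = train_batch(features, targets)
print(losses[0], losses[-1])  # the error should shrink over the epochs
```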

### Multilayer Perceptrons

```python
import numpy as np


def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1 / (1 + np.exp(-x))


x = np.array([0.5, 0.1, -0.2])
target = 0.6
learnrate = 0.5

weights_input_hidden = np.array([[0.5, -0.6],
                                 [0.1, -0.2],
                                 [0.1, 0.7]])

weights_hidden_output = np.array([0.1, -0.3])

# Forward pass
hidden_layer_input = np.dot(x, weights_input_hidden)
hidden_layer_output = sigmoid(hidden_layer_input)

output_layer_in = np.dot(hidden_layer_output, weights_hidden_output)
output = sigmoid(output_layer_in)

# Backward pass
# Calculate output error
error = target - output

# Calculate error term for output layer
output_error_term = error * output * (1 - output)

# Calculate error term for hidden layer
hidden_error_term = weights_hidden_output * output_error_term * hidden_layer_output * (1 - hidden_layer_output)

# Calculate change in weights for hidden layer to output layer
delta_w_h_o = learnrate * output_error_term * hidden_layer_output

# Calculate change in weights for input layer to hidden layer
# (x[:, None] turns x into a column so the product has shape (3, 2))
delta_w_i_h = learnrate * hidden_error_term * x[:, None]

print('Change in weights for hidden layer to output layer:')
print(delta_w_h_o)
print('Change in weights for input layer to hidden layer:')
print(delta_w_i_h)
```

```
Change in weights for hidden layer to output layer:
[0.00804047 0.00555918]
Change in weights for input layer to hidden layer:
[[ 1.77005547e-04 -5.11178506e-04]
 [ 3.54011093e-05 -1.02235701e-04]
 [-7.08022187e-05  2.04471402e-04]]
```
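
As a quick sanity check, the weight changes can be applied and the forward pass repeated to confirm that the output moves toward the target. This is a sketch using the same inputs and weights as the example; the helper `forward` and the shortened variable names are introduced here only for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def forward(x, w_i_h, w_h_o):
    """Hypothetical helper: return hidden activations and network output."""
    hidden = sigmoid(np.dot(x, w_i_h))
    return hidden, sigmoid(np.dot(hidden, w_h_o))

x = np.array([0.5, 0.1, -0.2])
target = 0.6
learnrate = 0.5
w_i_h = np.array([[0.5, -0.6], [0.1, -0.2], [0.1, 0.7]])
w_h_o = np.array([0.1, -0.3])

# Forward pass and error before the update
hidden, output = forward(x, w_i_h, w_h_o)
error_before = target - output

# Same error terms as in the backward pass
output_error_term = error_before * output * (1 - output)
hidden_error_term = w_h_o * output_error_term * hidden * (1 - hidden)

# Apply one gradient descent step to both weight sets
w_h_o_new = w_h_o + learnrate * output_error_term * hidden
w_i_h_new = w_i_h + learnrate * hidden_error_term * x[:, None]

# Forward pass again with the updated weights
_, output_after = forward(x, w_i_h_new, w_h_o_new)
error_after = target - output_after
print(abs(error_before), abs(error_after))  # the second value should be smaller
```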
