# Perceptrons

## Gradient descent

### Calculate the step

Prediction:
$$
ŷ = f(\sum(w*x))
$$

Squared error:
$$
E  = \sum(y-ŷ)^2
$$

Function to minimize (the factor $\frac{1}{2}$ only simplifies the derivative). The free variable is $w$:
$$
E \equiv \frac{1}{2}\sum(y-f(\sum(w*x)))^2
$$

Applying the chain rule:
$$
\begin{align}
\frac{\partial E}{\partial w} &= \frac{\partial \frac{\sum(y-ŷ)^2}{2}}{\partial ŷ} * \frac{\partial f(\sum(w*x))}{\partial \sum(w*x)} * \frac{\partial \sum(w*x)}{\partial w} \\
\\
 &= -(y-ŷ)*f'(\sum(w*x))*x \\
\end{align}
$$

Sigmoid derivative:
$$
f(h)=\frac{1}{1+e^{-h}} \rightarrow f'(h)=f(h)(1-f(h))
$$
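
The identity above can be checked numerically. The following is a minimal sketch, assuming a central finite difference with a step of `1e-6`; the helper name `sigmoid_prime` and the sample points are illustrative and not part of the original notebook.

```python
import numpy as np

def sigmoid(h):
    """Sigmoid activation f(h) = 1 / (1 + e^-h)."""
    return 1 / (1 + np.exp(-h))

def sigmoid_prime(h):
    """Closed-form derivative f'(h) = f(h) * (1 - f(h))."""
    s = sigmoid(h)
    return s * (1 - s)

# Compare the closed form against a central finite difference
h = np.linspace(-5, 5, 11)
eps = 1e-6
numeric = (sigmoid(h + eps) - sigmoid(h - eps)) / (2 * eps)
max_diff = np.max(np.abs(numeric - sigmoid_prime(h)))

print('Largest difference between numeric and closed-form derivative:')
print(max_diff)
```

The difference should be tiny compared with the derivative itself, which peaks at $f'(0) = 0.25$.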

Updating weights:
$$
\Delta w = -\eta \frac{\partial E}{\partial w}
$$

*Note: the derivative gives the direction in which the error increases fastest. Since we want to reduce the error, we step in the opposite direction, hence the '$-$' sign.*
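
To see why the sign matters, the sketch below takes one step against the gradient and one step along it, and compares the resulting errors. It reuses the sample, target, weights and learning rate from the first exercise of this notebook; the helper `error_fn` is an illustrative name, not part of the original.

```python
import numpy as np

def sigmoid(h):
    return 1 / (1 + np.exp(-h))

def error_fn(w, x, y):
    """Half squared error of a single sigmoid unit (illustrative helper)."""
    return 0.5 * (y - sigmoid(np.dot(w, x))) ** 2

x = np.array([1.0, 2.0])
y = 0.5
w = np.array([0.5, -0.5])
eta = 0.5

# Gradient of the error with respect to the weights
y_hat = sigmoid(np.dot(w, x))
grad = -(y - y_hat) * y_hat * (1 - y_hat) * x

e_before = error_fn(w, x, y)
e_downhill = error_fn(w - eta * grad, x, y)  # step against the gradient
e_uphill = error_fn(w + eta * grad, x, y)    # step along the gradient

print('Error before the step:', e_before)
print('Error after stepping against the gradient:', e_downhill)
print('Error after stepping along the gradient:', e_uphill)
```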

Weights increment:
$$
\Delta w = \eta*(y-ŷ)*ŷ*(1-ŷ)*x \\
\Delta w = \eta \delta x
$$
* $\eta$: learning rate
* $\delta$: $(y-ŷ)*ŷ*(1-ŷ)$, the error term (it depends on the prediction error)
* $x$: input sample
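
A quick way to gain confidence in the derivation is a finite-difference gradient check. This is a minimal sketch under the assumption of a single sample and a sigmoid activation; it reuses the values from the exercise in this section, and the names `error_fn`, `analytic_grad` and `numeric_grad` are illustrative.

```python
import numpy as np

def sigmoid(h):
    return 1 / (1 + np.exp(-h))

def error_fn(w, x, y):
    """Half squared error E = 1/2 * (y - f(w . x))^2 for one sample."""
    return 0.5 * (y - sigmoid(np.dot(w, x))) ** 2

x = np.array([1.0, 2.0])
y = 0.5
w = np.array([0.5, -0.5])

# Analytic gradient: dE/dw = -delta * x
y_hat = sigmoid(np.dot(w, x))
delta = (y - y_hat) * y_hat * (1 - y_hat)
analytic_grad = -delta * x

# Numeric gradient: perturb one weight at a time
eps = 1e-6
numeric_grad = np.zeros_like(w)
for i in range(len(w)):
    step = np.zeros_like(w)
    step[i] = eps
    numeric_grad[i] = (error_fn(w + step, x, y) - error_fn(w - step, x, y)) / (2 * eps)

print('Analytic gradient:', analytic_grad)
print('Numeric gradient: ', numeric_grad)
```

If the two gradients agree, then $\Delta w = \eta \delta x$ is indeed a step of size $\eta$ against the gradient.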

In [None]:
import numpy as np

def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1/(1+np.exp(-x))

learnrate = 0.5
x = np.array([1, 2])
y = np.array(0.5)

# Initial weights
w = np.array([0.5, -0.5])

In [None]:
# Calculate one gradient descent step for each weight
# TODO: Calculate output of neural network
nn_output = sigmoid(w.dot(x.reshape(2,1)))

# TODO: Calculate error of neural network
error = y-nn_output

# TODO: Calculate change in weights
del_w = learnrate * error * nn_output * (1 - nn_output) * x

print('Neural Network output:')
print(nn_output)
print('Amount of Error:')
print(error)
print('Change in Weights:')
print(del_w)

### Gradient Loop implementation

In [1]:
# Data prep

import numpy as np
import pandas as pd

admissions = pd.read_csv('data/binary.csv')

# Make dummy variables for rank
data = pd.concat([admissions, pd.get_dummies(admissions['rank'], prefix='rank')], axis=1)
data = data.drop('rank', axis=1)

# Standardize features
for field in ['gre', 'gpa']:
    mean, std = data[field].mean(), data[field].std()
    data.loc[:,field] = (data[field]-mean)/std
    
# Split off random 10% of the data for testing
np.random.seed(42)
sample = np.random.choice(data.index, size=int(len(data)*0.9), replace=False)
# .ix was removed in pandas 1.0; .loc selects rows by index label
data, test_data = data.loc[sample], data.drop(sample)

# Split into features and targets
features, targets = data.drop('admit', axis=1), data['admit']
features_test, targets_test = test_data.drop('admit', axis=1), test_data['admit']

In [2]:
#import numpy as np
#from data_prep import features, targets, features_test, targets_test

def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1 / (1 + np.exp(-x))

# Use the same seed to make debugging easier
np.random.seed(42)

n_records, n_features = features.shape
last_loss = None

# Initialize weights
weights = np.random.normal(scale=1 / n_features**.5, size=n_features)

# Neural Network hyperparameters
epochs = 1000
learnrate = 0.5

In [7]:
# as_matrix() was removed in pandas 1.0; to_numpy() is the replacement
X_train = features.to_numpy(dtype=float)

for e in range(epochs):
    # TODO: Calculate the output
    output = sigmoid(weights.dot(X_train.T))
    # TODO: Calculate the error
    error = targets-output
    # TODO: Update weights
    weights += (error * output * (1 - output)).dot(X_train)/n_records*learnrate

    # Printing out the mean square error on the training set
    if e % (epochs / 10) == 0:
        out = sigmoid(np.dot(features, weights))
        loss = np.mean((out - targets) ** 2)
        if last_loss and last_loss < loss:
            print("Train loss: ", loss, "  WARNING - Loss Increasing")
        else:
            print("Train loss: ", loss)
        last_loss = loss

# Calculate accuracy on test data
test_out = sigmoid(np.dot(features_test, weights))
predictions = test_out > 0.5
accuracy = np.mean(predictions == targets_test)
print("Prediction accuracy: {:.3f}".format(accuracy))

('Train loss: ', 0.19696267698081696)
('Train loss: ', 0.19696238351582482)
('Train loss: ', 0.19696218155383541)
('Train loss: ', 0.19696204235674888)
('Train loss: ', 0.19696194630046757)
('Train loss: ', 0.19696187994685732)
('Train loss: ', 0.19696183407241416)
('Train loss: ', 0.19696180233423277)
('Train loss: ', 0.19696178036339629)
('Train loss: ', 0.19696176514665412)
Prediction accuracy: 0.725


## Multilayer Perceptrons

Below, you'll implement a forward pass through a 4x3x2 network, with sigmoid activation functions for both layers.

Things to do:

* Calculate the input to the hidden layer.
* Calculate the hidden layer output.
* Calculate the input to the output layer.
* Calculate the output of the network.


In [None]:
import numpy as np

def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1/(1+np.exp(-x))

# Network size
N_input = 4
N_hidden = 3
N_output = 2

np.random.seed(42)
# Make some fake data
X = np.random.randn(4)

weights_in_hidden = np.random.normal(0, scale=0.1, size=(N_input, N_hidden))
weights_hidden_out = np.random.normal(0, scale=0.1, size=(N_hidden, N_output))

In [None]:
# TODO: Make a forward pass through the network

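# Input to the hidden layer is the weighted sum of the network inputs
hidden_layer_in = np.dot(X, weights_in_hidden)
hidden_layer_out = sigmoid(hidden_layer_in)

print('Hidden-layer Output:')
print(hidden_layer_out)

# Input to the output layer is the weighted sum of the hidden outputs
output_layer_in = np.dot(hidden_layer_out, weights_hidden_out)
output_layer_out = sigmoid(output_layer_in)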

print('Output-layer Output:')
print(output_layer_out)

## Backpropagation

Error at the output layer in the neuron $k$
$$\delta^0_k$$

Error at the hidden layer $h$ in the neuron $j$
$$\delta^h_j = \sum_k W_{jk} \delta^0_k f'(h_j)$$

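Gradient descent step for the weight $w_{ij}$, where $\delta_{output}$ is the error of the unit the weight feeds into and $V_{in}$ is the value entering the weight
$$\Delta w_{ij} = \eta \delta_{output} V_{in}$$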
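
The three formulas above can be sketched in NumPy for a network with several output units. This is only an illustration: the layer sizes, the random values and all variable names are assumptions, and the output error is taken to be $(y_k-ŷ_k) f'$ with a sigmoid activation, as in the example that follows.

```python
import numpy as np

def sigmoid(h):
    return 1 / (1 + np.exp(-h))

# Illustrative 3x2x2 network with random values (not from the notebook)
rng = np.random.default_rng(0)
x = rng.normal(size=3)                      # network input
w_in = rng.normal(scale=0.1, size=(3, 2))   # input-to-hidden weights w_ij
W = rng.normal(scale=0.1, size=(2, 2))      # hidden-to-output weights W_jk
y = np.array([1.0, 0.0])                    # targets
eta = 0.5

# Forward pass
a = sigmoid(x @ w_in)     # hidden outputs
y_hat = sigmoid(a @ W)    # network outputs

# Error at each output unit k
delta_o = (y - y_hat) * y_hat * (1 - y_hat)

# Error at each hidden unit j: sum over k of W_jk * delta_k, times f'(h_j)
delta_h = (W @ delta_o) * a * (1 - a)

# Gradient descent steps: eta * (error of receiving unit) * (value entering the weight)
dW = eta * np.outer(a, delta_o)
dw_in = eta * np.outer(x, delta_h)

print('Step for hidden-to-output weights:')
print(dW)
print('Step for input-to-hidden weights:')
print(dw_in)
```

Each step matrix has the same shape as the weight matrix it updates, so it can be added to it directly.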

### Example
* Real value: $y=1$
* Learning rate: $\eta = 0.5$
* Output of the hidden unit: $a$

![Network](data/backprop-network.png)

**Prediction (forwards)**

Hidden unit 
$$a = sigmoid(0.4*0.1 + (-0.2)*0.3) = 0.495$$

Output unit
$$ŷ = sigmoid( 0.1* a) = 0.512$$

**Calculating gradient errors (backwards)**

Output unit
$$\delta^0_0 = (y-ŷ)*f'(W · a)=(1-0.512)*0.512*(1-0.512)=0.122$$

Hidden unit
$$\delta^1_0 = W \delta^0_0f'(h) = 0.1 * 0.122 *0.495 *(1-0.495)=0.003$$


**Updating weights (backwards)**

Output unit
$$\Delta^0_0 = \eta \delta^0_0a = 0.5*0.122*0.495 = 0.0302$$

Hidden unit
$$\Delta^1_0 = \eta \delta^1_0 X = 0.5*0.003* [0.1, 0.3] = [0.00015, 0.00045]$$
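
The worked example can be reproduced in a few lines of NumPy. This sketch assumes the values read off the formulas above (inputs $[0.1, 0.3]$, input-to-hidden weights $[0.4, -0.2]$, hidden-to-output weight $0.1$); the variable names are illustrative. The results may differ from the text in the last digit because the text rounds intermediate values.

```python
import numpy as np

def sigmoid(h):
    return 1 / (1 + np.exp(-h))

x = np.array([0.1, 0.3])      # inputs
w_in = np.array([0.4, -0.2])  # input-to-hidden weights
W = 0.1                       # hidden-to-output weight
y = 1.0                       # real value
eta = 0.5                     # learning rate

# Prediction (forwards)
a = sigmoid(np.dot(w_in, x))
y_hat = sigmoid(W * a)

# Error terms (backwards)
delta_o = (y - y_hat) * y_hat * (1 - y_hat)
delta_h = W * delta_o * a * (1 - a)

# Weight steps
dW = eta * delta_o * a
dw_in = eta * delta_h * x

print('Hidden output and prediction:', a, y_hat)
print('Output and hidden error terms:', delta_o, delta_h)
print('Weight steps:', dW, dw_in)
```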

### Exercise

Below, you'll implement the code to calculate one backpropagation update step for two sets of weights. The forward pass is already written; your goal is to code the backward pass.

Things to do:
* Calculate the network error.
* Calculate the output layer error gradient.
* Use backpropagation to calculate the hidden layer error.
* Calculate the weight update steps.

In [2]:
import numpy as np


def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1 / (1 + np.exp(-x))


x = np.array([0.5, 0.1, -0.2])
target = 0.6
learnrate = 0.5

weights_input_hidden = np.array([[0.5, -0.6],
                                 [0.1, -0.2],
                                 [0.1, 0.7]])

weights_hidden_output = np.array([0.1, -0.3])

## Forward pass
hidden_layer_input = np.dot(x, weights_input_hidden)
hidden_layer_output = sigmoid(hidden_layer_input)

output_layer_in = np.dot(hidden_layer_output, weights_hidden_output)
output = sigmoid(output_layer_in)

In [17]:
## Backwards pass
## TODO: Calculate error
error = target - output

# TODO: Calculate error gradient for output layer
del_err_output = error * output * (1-output)

# TODO: Calculate error gradient for hidden layer
del_err_hidden = weights_hidden_output * del_err_output * hidden_layer_output * (1-hidden_layer_output)

# TODO: Calculate change in weights for hidden layer to output layer
delta_w_h_o = learnrate * del_err_output * hidden_layer_output

# TODO: Calculate change in weights for input layer to hidden layer
delta_w_i_h = np.outer(x, learnrate * del_err_hidden)

print('Change in weights for hidden layer to output layer:')
print(delta_w_h_o)
print('Change in weights for input layer to hidden layer:')
print(delta_w_i_h)

Change in weights for hidden layer to output layer:
[ 0.00804047  0.00555918]
Change in weights for input layer to hidden layer:
[[  1.77005547e-04  -5.11178506e-04]
 [  3.54011093e-05  -1.02235701e-04]
 [ -7.08022187e-05   2.04471402e-04]]


### Implementation

Now you're going to implement the backprop algorithm for a network trained on the graduate school admission data. You should have everything you need from the previous exercises to complete this one.

Your goals here:
* Implement the forward pass.
* Implement the backpropagation algorithm.
* Update the weights.

**Algorithm (vectorized)**

Calculate the prediction 
$$Ŷ$$

Calculate the error gradient
$$\delta^0 = (Y-Ŷ)*f'(W*A)$$
Where $A$ holds the hidden-layer activations, that is, the input to the output unit.

Propagate the errors to the hidden layer and calculate gradient
$$\delta^h_j = \delta^0 * W_j * f'(h_j)$$

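Update weights
$$ w_{ij} = w_{ij} + \eta \frac{\sum \delta^h_j x_i}{m} \qquad W_j = W_j + \eta \frac{\sum \delta^0 a_j}{m} $$

Where $w_{ij}$ are the input-to-hidden weights, $x_i$ the input features, $a_j$ the hidden activations, the sums run over the samples, and $m$ is the number of samples.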

Repeat for $e$ epochs.
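
The steps above can be sketched as a single function that performs one epoch on a whole batch. This is a minimal sketch on synthetic data, assuming one sigmoid output unit; the helper names `train_epoch` and `mse`, the data and the hyperparameters are hypothetical and are not the admissions setup used below.

```python
import numpy as np

def sigmoid(h):
    return 1 / (1 + np.exp(-h))

def train_epoch(X, Y, w_in, W, eta):
    """One full-batch update of both weight sets (hypothetical helper)."""
    m = X.shape[0]
    A = sigmoid(X @ w_in)                         # hidden activations, (m, n_hidden)
    Y_hat = sigmoid(A @ W)                        # predictions, (m,)
    delta_o = (Y - Y_hat) * Y_hat * (1 - Y_hat)   # output error term, (m,)
    delta_h = np.outer(delta_o, W) * A * (1 - A)  # hidden error terms, (m, n_hidden)
    w_in = w_in + eta * X.T @ delta_h / m         # average over the m samples
    W = W + eta * A.T @ delta_o / m
    return w_in, W

def mse(X, Y, w_in, W):
    """Mean squared error of the network on (X, Y)."""
    return np.mean((Y - sigmoid(sigmoid(X @ w_in) @ W)) ** 2)

# Synthetic data: the target depends only on the sign of the first feature
rng = np.random.default_rng(42)
X = rng.normal(size=(20, 4))
Y = (X[:, 0] > 0).astype(float)
w_in = rng.normal(scale=0.5, size=(4, 3))
W = rng.normal(scale=0.5, size=3)

loss_before = mse(X, Y, w_in, W)
for _ in range(200):
    w_in, W = train_epoch(X, Y, w_in, W, eta=0.5)
loss_after = mse(X, Y, w_in, W)

print('Loss before training:', loss_before)
print('Loss after training: ', loss_after)
```

Using `np.outer` for the hidden error keeps the code independent of the number of records and hidden units.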

In [164]:
import numpy as np
import pandas as pd

admissions = pd.read_csv('data/binary.csv')

# Make dummy variables for rank
data = pd.concat([admissions, pd.get_dummies(admissions['rank'], prefix='rank')], axis=1)
data = data.drop('rank', axis=1)

# Standardize features
for field in ['gre', 'gpa']:
    mean, std = data[field].mean(), data[field].std()
    data.loc[:,field] = (data[field]-mean)/std
    
# Split off random 10% of the data for testing
np.random.seed(42)
sample = np.random.choice(data.index, size=int(len(data)*0.9), replace=False)
# .ix was removed in pandas 1.0; .loc selects rows by index label
data, test_data = data.loc[sample], data.drop(sample)

# Split into features and targets
features, targets = data.drop('admit', axis=1), data['admit']
features_test, targets_test = test_data.drop('admit', axis=1), test_data['admit']

In [165]:
import numpy as np

np.random.seed(42)

def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1 / (1 + np.exp(-x))


# Hyperparameters
n_hidden = 3  # number of hidden units
epochs = 500
learnrate = 0.05

n_records, n_features = features.shape
last_loss = None
# Initialize weights
weights_input_hidden = np.random.normal(scale=1 / n_features ** .5,
                                        size=(n_features, n_hidden))
weights_hidden_output = np.random.normal(scale=1 / n_features ** .5,
                                         size=n_hidden)

In [167]:
# as_matrix() was removed in pandas 1.0; to_numpy() is the replacement
X_train = features.to_numpy(dtype=float)
y_train = targets.to_numpy(dtype=float)

for e in range(epochs):
    ## Forward pass ##
    # TODO: Calculate the output
    hidden_input = X_train.dot(weights_input_hidden)
    hidden_activations = sigmoid(hidden_input)
    output = sigmoid(hidden_activations.dot(weights_hidden_output))
    
    ## Backward pass ##
    # TODO: Calculate the error
    error = y_train-output

    # TODO: Calculate error gradient in output unit
    output_error = error*output*(1-output)
    
    # TODO: propagate errors to hidden layer
    # (the outer product avoids hard-coding the number of records and hidden units)
    hidden_error = np.outer(output_error, weights_hidden_output)*hidden_activations*(1-hidden_activations)

    # TODO: Update weights
    weights_input_hidden += learnrate*X_train.T.dot(hidden_error)/n_records
    weights_hidden_output += learnrate*hidden_activations.T.dot(output_error)/n_records

    # Printing out the mean square error on the training set
    if e % (epochs / 10) == 0:
        hidden_activations = sigmoid(np.dot(features, weights_input_hidden))
        out = sigmoid(np.dot(hidden_activations,
                             weights_hidden_output))
        loss = np.mean((out - targets) ** 2)

        if last_loss and last_loss < loss:
            print("Train loss: ", loss, "  WARNING - Loss Increasing")
        else:
            print("Train loss: ", loss)
        last_loss = loss

# Calculate accuracy on test data
hidden = sigmoid(np.dot(features_test, weights_input_hidden))
out = sigmoid(np.dot(hidden, weights_hidden_output))
predictions = out > 0.5
accuracy = np.mean(predictions == targets_test)
print("Prediction accuracy: {:.3f}".format(accuracy))

('Train loss: ', 0.24086081610559487)
('Train loss: ', 0.23855446529706201)
('Train loss: ', 0.23660964549959923)
('Train loss: ', 0.23495534274939658)
('Train loss: ', 0.23353501302739446)
('Train loss: ', 0.23230369096005926)
('Train loss: ', 0.23122561412251344)
('Train loss: ', 0.23027230822025849)
('Train loss: ', 0.22942106207352045)
('Train loss: ', 0.22865372168644621)
Prediction accuracy: 0.750


## Further Reading

* [Karpathy - Yes you should understand backprop (Text)](https://medium.com/@karpathy/yes-you-should-understand-backprop-e2f06eab496b)
* [Karpathy - CS231n Winter 2016: Lecture 5: Neural Networks Part 2 (Video)](https://www.youtube.com/watch?v=gYpoJMlgyXA)