# Neural Networks Coding Challenge

Objectives:
  * Write a simple three-layer network
  * Compute forward propagation for a new sample in the three-layer network
  * Compute backward propagation in the same network
  * Use MLPClassifier to train and test on a dataset

### Background

Other than the MLPClassifier objective, you will be working with this neural net during this coding challenge:

![Simple Neural Net](https://www.lucidchart.com/publicSegments/view/a5b0773e-7165-450d-99fc-7089891e099a/image.png)

### 1. Write a simple three-layer network

Create variables to store weights and biases for the above network. Initialize each with $0.5$.

In [0]:
#imports

import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt

In [0]:
# create weights
W1, W2, W3 = 0.5, 0.5, 0.5

# create biases
b1, b2, b3 = 0.5, 0.5, 0.5

# sigmoid function
def sigma(x):
    return 1 / (1 + np.exp(-x))
  
# def NN(x1):
#     h1 = sigma(x1*w1 + b1*1)
#     h2 = sigma(x1*w2 + b2*1)
#     z = sigma(h1*w3 + h2*w4 + b3*1)
#     return z
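
The commented-out `NN` function above hints at a layout with one input, two hidden units, and one output. A minimal runnable sketch of that idea follows, assuming four weights `w1` to `w4` and three biases that are all set to 0.5. These names and the 1-2-1 layout are assumptions read off the comment, not something the figure confirms.

```python
import numpy as np

def sigma(x):
    """Sigmoid activation."""
    return 1 / (1 + np.exp(-x))

# Assumed parameters: one weight per connection, one bias per unit, all 0.5
w1, w2, w3, w4 = 0.5, 0.5, 0.5, 0.5
b1, b2, b3 = 0.5, 0.5, 0.5

def NN(x1):
    # Two hidden units, each fed by the single input
    h1 = sigma(x1 * w1 + b1)
    h2 = sigma(x1 * w2 + b2)
    # Output unit combines both hidden units
    z = sigma(h1 * w3 + h2 * w4 + b3)
    return z

out = NN(4)
print(out)
```

Because the output passes through a sigmoid, `out` always lies strictly between 0 and 1.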

### 2. Compute forward propagation for a new sample in the three-layer network

Write a function `feed_forward` that takes a new sample $x$ and calculates $\hat{y}$ via forward propagation.

In [0]:
x1 = 4
x2 = 0.5
x3 = 0.125
y1 = 0
y2 = 1
y3 = 1

In [0]:
# This is the forward propagation function
def feed_forward(a0):

    # First linear step
    z1 = np.dot(a0, W1) + b1

    # First activation function (tanh)
    a1 = np.tanh(z1)

    # Second linear step
    z2 = np.dot(a1, W2) + b2

    # Second activation function (tanh)
    a2 = np.tanh(z2)

    # Third linear step
    z3 = np.dot(a2, W3) + b3

    # Output activation: the sigmoid maps z3 into (0, 1)
    a3 = sigma(z3)

    # Return the prediction followed by the activations of every layer
    yhat = a3
    return yhat, a0, a1, a2, a3
    
  
# TEST
y_hat1 = feed_forward(x1)
y_hat2 = feed_forward(x2)
y_hat3 = feed_forward(x3)

print("===y_hat1===")
print(y_hat1[0])
print("\n")

print("===y_hat2===")
print(y_hat2[0])
print("\n")

print("===y_hat3===")
print(y_hat3[0])


===y_hat1===
0.7066946552192437


===y_hat2===
0.6978063882895045


===y_hat3===
0.6940316729708891
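
Because `feed_forward` only uses NumPy operations, the same computation can be applied to all three samples at once. The following is a minimal self-contained sketch, assuming the same scalar weights and biases of 0.5; the name `feed_forward_batch` is hypothetical.

```python
import numpy as np

# Same starting values as the challenge: every weight and bias is 0.5
W1, W2, W3 = 0.5, 0.5, 0.5
b1, b2, b3 = 0.5, 0.5, 0.5

def sigma(x):
    """Sigmoid activation."""
    return 1 / (1 + np.exp(-x))

def feed_forward_batch(x):
    """Forward pass for an array of scalar samples (hypothetical helper)."""
    a1 = np.tanh(x * W1 + b1)    # first hidden activation
    a2 = np.tanh(a1 * W2 + b2)   # second hidden activation
    return sigma(a2 * W3 + b3)   # sigmoid output

samples = np.array([4, 0.5, 0.125])
y_hat = feed_forward_batch(samples)
print(y_hat)
```

The three entries of `y_hat` should agree with the per-sample values printed above.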


### 3. Compute backward propagation for the same network

The backprop algorithm is derived from the goal of minimizing the error (or loss) function $\epsilon = (y - \hat{y})^2$.

Substituting $\hat{y} = \sigma(h_2+b_2)$ gives

$\epsilon = (y - \sigma(h_2+b_2))^2$

Via the chain rule, the derivative of the above with respect to $h_2$ is

$\frac{\partial \epsilon}{\partial h_2} = -2(y-\hat{y})\sigma'(h_2+b_2)$

Let $\alpha = 0.1$. Update the weights for $h_2$ and $h_1$ via back propagation so that $h_2 = h_2 - \alpha \frac{\partial \epsilon}{\partial h_2}$ and $h_1 = h_1 - \alpha \frac{\partial \epsilon}{\partial h_1}$.

Also, let $\sigma(x) = ReLU(x)$. As such, $\sigma'(x) = 0$ when $x \le 0$ and $\sigma'(x) = 1$ when $x \gt 0$.

See Case 1 of [Brian Dolhansky's post](http://briandolhansky.com/blog/2013/9/27/artificial-neural-networks-backpropagation-part-4) for a more detailed explanation of the values used in back propagation.
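
As a quick sanity check on the chain rule, the sketch below compares the analytic derivative of $(y - ReLU(h_2 + b_2))^2$ with respect to $h_2$ against a central finite difference, then takes one gradient-descent step with $\alpha = 0.1$. The function names and the sample values for `h2`, `b2`, and `y` are illustrative assumptions, not part of the challenge.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def relu_prime(x):
    # 0 when x <= 0, 1 when x > 0
    return np.where(x > 0, 1.0, 0.0)

def loss(h2, b2, y):
    # Squared error with y_hat = ReLU(h2 + b2)
    return (y - relu(h2 + b2)) ** 2

def dloss_dh2(h2, b2, y):
    # Chain rule: d/dh2 (y - y_hat)^2 = -2 (y - y_hat) * ReLU'(h2 + b2)
    y_hat = relu(h2 + b2)
    return -2.0 * (y - y_hat) * relu_prime(h2 + b2)

# Hypothetical sample values
h2, b2, y = 0.8, 0.5, 1.0

# Central finite difference as an independent estimate of the derivative
eps = 1e-6
numeric = (loss(h2 + eps, b2, y) - loss(h2 - eps, b2, y)) / (2 * eps)
analytic = dloss_dh2(h2, b2, y)

# One gradient-descent step moves h2 against the gradient
alpha = 0.1
h2_new = h2 - alpha * analytic
print(analytic, numeric, loss(h2, b2, y), loss(h2_new, b2, y))
```

The two derivative estimates should agree closely, and the loss at `h2_new` should be lower than at `h2`.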


In [0]:
# Forward pass, backward pass, then a second forward pass
def feed_forward_and_back_propagate(x, y):

    # Derivative of tanh, written in terms of the activation a = tanh(z)
    def dtanh(a):
        return 1 - a*a

    # Run forward propagation once and unpack the activations
    _, a0, a1, a2, a3 = feed_forward(x)

    # Number of samples
    m = 1

    # Loss derivative with respect to z3 (output pre-activation);
    # a3 - y is the gradient for a sigmoid output with cross-entropy loss
    dz3 = a3 - y

    # Loss derivative with respect to third layer weights
    dW3 = 1/m*np.dot(a2.T, dz3)

    # Loss derivative with respect to third layer bias
    db3 = 1/m*np.sum(dz3, axis=0)

    # Loss derivative with respect to z2
    dz2 = np.multiply(np.dot(dz3, W3), dtanh(a2))

    # Loss derivative with respect to second layer weights
    dW2 = 1/m*np.dot(a1.T, dz2)

    # Loss derivative with respect to second layer bias
    db2 = 1/m*np.sum(dz2, axis=0)

    # Loss derivative with respect to z1
    dz1 = np.multiply(np.dot(dz2, W2), dtanh(a1))

    # Loss derivative with respect to first layer weights
    dW1 = 1/m*np.dot(a0, dz1)

    # Loss derivative with respect to first layer bias
    db1 = 1/m*np.sum(dz1, axis=0)

    # Second forward pass.
    # Note: this pass uses the gradients themselves as weights and biases;
    # a gradient-descent step would use W - alpha*dW and b - alpha*db instead.

    # First linear step
    z1 = np.dot(x, dW1) + db1

    # First activation function (tanh)
    a1 = np.tanh(z1)

    # Second linear step
    z2 = np.dot(a1, dW2) + db2

    # Second activation function (tanh)
    a2 = np.tanh(z2)

    # Third linear step
    z3 = np.dot(a2, dW3) + db3

    # Output activation (sigmoid)
    a3 = sigma(z3)

    # Store gradients
    grads = {'dW3': dW3, 'db3': db3, 'dW2': dW2, 'db2': db2, 'dW1': dW1, 'db1': db1}
    return a3, grads



# CODE
y_hat4 = feed_forward_and_back_propagate(x1,y1)
y_hat5 = feed_forward_and_back_propagate(x2,y2)
y_hat6 = feed_forward_and_back_propagate(x3,y3)

print(y_hat4)
print(y_hat5)
print(y_hat6)

(0.6876401335843211, {'dW3': 0.5362179769516296, 'db3': 0.7066946552192437, 'dW2': 0.1479078346638543, 'db2': 0.14991454608045088, 'dW1': 0.007973123184948928, 'db1': 0.001993280796237232})
(0.42904252993945663, {'dW3': -0.20360220447179556, 'db3': -0.3021936117104955, 'dW2': -0.052405227344898685, 'db2': -0.08250856298814166, 'dW1': -0.012305859435102911, 'db1': -0.024611718870205822})
(0.42834303709452126, {'dW3': -0.19522982701336145, 'db3': -0.3059683270291109, 'dW2': -0.04624097647812385, 'db2': -0.0906988189402449, 'dW1': -0.0041952364448069, 'db1': -0.0335618915584552})
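
For comparison, a conventional gradient-descent step subtracts $\alpha$ times each gradient from the matching parameter before running the forward pass again. The sketch below is a self-contained illustration of that idea, assuming the same starting values of 0.5, a learning rate of 0.1, and a cross-entropy loss (which is what `dz3 = a3 - y` corresponds to for a sigmoid output). The helper names `forward`, `gradients`, `sgd_step`, and `cross_entropy` are hypothetical.

```python
import numpy as np

def sigma(x):
    return 1 / (1 + np.exp(-x))

def forward(x, p):
    # tanh hidden activations, sigmoid output
    a1 = np.tanh(x * p['W1'] + p['b1'])
    a2 = np.tanh(a1 * p['W2'] + p['b2'])
    a3 = sigma(a2 * p['W3'] + p['b3'])
    return a1, a2, a3

def gradients(x, y, p):
    a1, a2, a3 = forward(x, p)
    dz3 = a3 - y                          # sigmoid + cross-entropy
    dz2 = dz3 * p['W3'] * (1 - a2 * a2)   # back through the second tanh
    dz1 = dz2 * p['W2'] * (1 - a1 * a1)   # back through the first tanh
    return {'W3': a2 * dz3, 'b3': dz3,
            'W2': a1 * dz2, 'b2': dz2,
            'W1': x * dz1, 'b1': dz1}

def sgd_step(x, y, p, alpha=0.1):
    # Move every parameter against its gradient
    g = gradients(x, y, p)
    return {k: p[k] - alpha * g[k] for k in p}

def cross_entropy(y, y_hat):
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

params = {k: 0.5 for k in ('W1', 'b1', 'W2', 'b2', 'W3', 'b3')}
x, y = 4, 0

loss_before = cross_entropy(y, forward(x, params)[2])
params_new = sgd_step(x, y, params)
loss_after = cross_entropy(y, forward(x, params_new)[2])
print(loss_before, loss_after)
```

With a small learning rate, `loss_after` should come out lower than `loss_before`, which is the behavior a weight update is meant to produce.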


### 4. Use MLPClassifier to train on a dataset

`X` is now a small dataset. Create an MLPClassifier from sklearn and train it on the `X` dataset, with `y` as the targets.

In [0]:
import numpy as np
# np.vstack turns the scalars into (3, 1) column arrays
X = np.vstack([x1, x2, x3])
y = np.vstack([y1, y2, y3])

In [0]:
print(X)
print(y)

[[4.   ]
 [0.5  ]
 [0.125]]
[[0]
 [1]
 [1]]


In [0]:


# CODE
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import classification_report, confusion_matrix

mlp = MLPClassifier(
    hidden_layer_sizes=(3, 3),
    activation='tanh',
    solver='sgd',
    alpha=1e-5,
    batch_size=100,
    learning_rate='adaptive',
    learning_rate_init=0.001,
    max_iter=200,
    shuffle=True,
    random_state=42,
    tol=1e-4)

# fit expects a 1-D target array, so flatten the (3, 1) column vector
mlp.fit(X, y.ravel())

predictions = mlp.predict(X)

print(confusion_matrix(y, predictions))
print(classification_report(y, predictions))

[[1 0]
 [0 2]]
             precision    recall  f1-score   support

          0       1.00      1.00      1.00         1
          1       1.00      1.00      1.00         2

avg / total       1.00      1.00      1.00         3
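
Once an `MLPClassifier` is fitted, its learned weight matrices and class probabilities can be inspected directly. The following is a minimal sketch, assuming the same three samples and a reduced set of constructor arguments (everything else is left at the scikit-learn defaults). It passes a 1-D target array, which is the shape `fit` expects.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Same three samples as above, with a 1-D target array
X = np.array([[4.0], [0.5], [0.125]])
y = np.array([0, 1, 1])

mlp = MLPClassifier(hidden_layer_sizes=(3, 3), activation='tanh',
                    solver='sgd', max_iter=200, random_state=42)
mlp.fit(X, y)

# One weight matrix per layer transition: input->3, 3->3, 3->output
shapes = [w.shape for w in mlp.coefs_]
print(shapes)

# Class probabilities for each sample; each row sums to 1
proba = mlp.predict_proba(X)
print(proba)
```

On a dataset this small the solver may report that it has not converged; that is a warning, not an error.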





In [0]:
# neural net training with keras sequential

import keras
from keras.models import Sequential
from keras.layers import Dense

# initialize the ANN
classifier = Sequential()

# add input layer and first hidden layer
classifier.add(Dense(units=3, kernel_initializer='uniform', activation='relu', input_dim=X.shape[1]))

# add 2nd hidden layer
classifier.add(Dense(units=3, kernel_initializer='uniform', activation='relu'))

# add output layer
classifier.add(Dense(units=1, kernel_initializer='uniform', activation='sigmoid'))

# compile ANN
classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# fit ANN to training set
classifier.fit(X, y, batch_size=10, epochs=100)

# predictions
y_pred = classifier.predict(X)

# convert predicted probabilities into binary labels
y_pred = (y_pred > 0.5) * 1



Epoch 1/100
...
Epoch 100/100


In [0]:
print(y_pred)
print(confusion_matrix(y,y_pred))
print(classification_report(y,y_pred))

[[1]
 [1]
 [1]]
[[0 1]
 [0 2]]
             precision    recall  f1-score   support

          0       0.00      0.00      0.00         1
          1       0.67      1.00      0.80         2

avg / total       0.44      0.67      0.53         3



