<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>
<br></br>

## *Data Science Unit 4 Sprint 2*

# Sprint Challenge - Neural Network Foundations

Table of Problems

1. [Defining Neural Networks](#Q1)
2. [Perceptron on XOR Gates](#Q2)
3. [Multilayer Perceptron](#Q3)
4. [Keras MMP](#Q4)

<a id="Q1"></a>
## 1. Define the following terms:

- **Neuron:** 

An artificial neuron in a neural network is a function that takes the weighted sum of it's inputs and a bias and uses an activation function to produce an output

- **Input Layer:**

The input layer recieves the data. It is classified as 'visible' because it is the only layer that interacts directly with the input data. The input layer interacts with a subsequent hidden layer (if one exists). Input layers are typically represented with one cell (neuron) for each feature/class of the input data, and weights applied to this layer directly effect only that feature.

- **Hidden Layer:**

A hidden layer is accessed through the input layer. They are 'inside' the network, and do not directly interact with the data. They are also composed of neurons that take input from the result of the input layer

- **Output Layer:**

The output layer returns values or whatever other information is needed for the application of the network. Generally, the output will be in the format of a vector or list of values. There will usually be a 'cell' or 'node' for each output class. If the network is used for a regression or classification problem with only one class, there will be one output node.

- **Activation:**

Activation is a function applied to the weighted sum of inputs and the bias to 'normalize' the output of a neuron, or convert it to a useful scale for the input of the next layer. The activation function controlls the way the neuron responds to inputs. For instance, if using a sigmoid function, unless the sum is very near the center of possible outcomes, it will push the output very close to an extreme (0 or 1; or -1 or 1)

- **Backpropagation:**

Backpropogation is how a neural network learns. Backpropogation compares the 'ground truth' and adjusts weights and biases (using the gradient of the error for instance) and adjusts the weights based on the scale of the error and in which direction the error was made from the ground truth. Through repetitions, this will optimize the network to at least a local minimum error. The learning rate can be adjusted to balance the speed and accuracy of this process


## 2. Perceptron on XOR Gates <a id="Q3=2"></a>

Create a perceptron class that can model the behavior of an AND gate. You can use the following table as your training data:

|x1	|x2|x3|	y|
|---|---|---|---|
1|	1|	1|	1|
1|	0|	1|	0|
0|	1|	1|	0|
0|	0|	1|	0|

In [31]:
import numpy as np

In [32]:
gate_table = np.array([
    [1,1,1],
    [1,0,1],
    [0,1,1],
    [0,0,1]
])
y = [[1],[0],[0],[0]]

In [33]:
def sigmoid(x):
    return(1/(1 + np.exp(-x)))

def sigmoid_ddx(x):
    sx = sigmoid(x)
    return(sx * (1-sx))

In [34]:
weights = (2 * np.random.random((3,1)) - 1)


for i in range(10000):
    weighted_sum = np.dot(gate_table, weights) - 1.5
    
    activated_output = sigmoid(weighted_sum)
    
    error = y - activated_output
    
    adjustments = error * sigmoid_ddx(activated_output)
    
    weights += np.dot(gate_table.T, adjustments)
    
print("Weights after training")
print(weights)

print("\nRounded output after training :: Actual")
for a,b in zip(activated_output, y):
    print(round(a[0]),'::', b[0])
print('Actual outputs:')
print(activated_output)

Weights after training
[[ 11.84179679]
 [ 11.84179679]
 [-16.55100444]]

Rounded output after training :: Actual
1.0 :: 1
0.0 :: 0
0.0 :: 0
0.0 :: 0
Actual outputs:
[[9.96433113e-01]
 [2.00699577e-03]
 [2.00699577e-03]
 [1.44770181e-08]]


## 3. Multilayer Perceptron <a id="Q3"></a>

Implement a Neural Network Multilayer Perceptron class that uses backpropagation to update the network's weights.
Your network must have one hidden layer.
You do not have to update weights via gradient descent. You can use something like the derivative of the sigmoid function to update weights.
Train your model on the Heart Disease dataset from UCI:



In [35]:
import pandas as pd

hd_url = 'https://raw.githubusercontent.com/ryanleeallred/datasets/master/heart.csv'

In [36]:
heartd = pd.read_csv(hd_url)

In [37]:
heartd.describe()

Unnamed: 0,age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
count,303.0,303.0,303.0,303.0,303.0,303.0,303.0,303.0,303.0,303.0,303.0,303.0,303.0,303.0
mean,54.366337,0.683168,0.966997,131.623762,246.264026,0.148515,0.528053,149.646865,0.326733,1.039604,1.39934,0.729373,2.313531,0.544554
std,9.082101,0.466011,1.032052,17.538143,51.830751,0.356198,0.52586,22.905161,0.469794,1.161075,0.616226,1.022606,0.612277,0.498835
min,29.0,0.0,0.0,94.0,126.0,0.0,0.0,71.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,47.5,0.0,0.0,120.0,211.0,0.0,0.0,133.5,0.0,0.0,1.0,0.0,2.0,0.0
50%,55.0,1.0,1.0,130.0,240.0,0.0,1.0,153.0,0.0,0.8,1.0,0.0,2.0,1.0
75%,61.0,1.0,2.0,140.0,274.5,0.0,1.0,166.0,1.0,1.6,2.0,1.0,3.0,1.0
max,77.0,1.0,3.0,200.0,564.0,1.0,2.0,202.0,1.0,6.2,2.0,4.0,3.0,1.0


In [38]:
# Thanks Ryan!
heartd.isnull().sum()

age         0
sex         0
cp          0
trestbps    0
chol        0
fbs         0
restecg     0
thalach     0
exang       0
oldpeak     0
slope       0
ca          0
thal        0
target      0
dtype: int64

In [39]:
heartd.dtypes

age           int64
sex           int64
cp            int64
trestbps      int64
chol          int64
fbs           int64
restecg       int64
thalach       int64
exang         int64
oldpeak     float64
slope         int64
ca            int64
thal          int64
target        int64
dtype: object

In [40]:
for col in heartd.columns:
    if np.amax(heartd[col].values) > 1:
        print(col)

age
cp
trestbps
chol
restecg
thalach
oldpeak
slope
ca
thal


In [41]:
hd_norm = pd.DataFrame()

for col in heartd.columns:
    if np.amax(heartd[col].values) > 1:
        hd_norm[col] = heartd[col] / np.amax(heartd[col])
    else:
        hd_norm[col] = heartd[col]

In [42]:
hd_norm.describe()

Unnamed: 0,age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
count,303.0,303.0,303.0,303.0,303.0,303.0,303.0,303.0,303.0,303.0,303.0,303.0,303.0,303.0
mean,0.706056,0.683168,0.322332,0.658119,0.436638,0.148515,0.264026,0.740826,0.326733,0.167678,0.69967,0.182343,0.771177,0.544554
std,0.117949,0.466011,0.344017,0.087691,0.091898,0.356198,0.26293,0.113392,0.469794,0.18727,0.308113,0.255652,0.204092,0.498835
min,0.376623,0.0,0.0,0.47,0.223404,0.0,0.0,0.351485,0.0,0.0,0.0,0.0,0.0,0.0
25%,0.616883,0.0,0.0,0.6,0.374113,0.0,0.0,0.660891,0.0,0.0,0.5,0.0,0.666667,0.0
50%,0.714286,1.0,0.333333,0.65,0.425532,0.0,0.5,0.757426,0.0,0.129032,0.5,0.0,0.666667,1.0
75%,0.792208,1.0,0.666667,0.7,0.486702,0.0,0.5,0.821782,1.0,0.258065,1.0,0.25,1.0,1.0
max,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0


In [54]:
class MLPerceptron:
    def __init__(self, 
                 activator='sigmoid', 
                 n_input=3, 
                 n_hidden=4, 
               #  n_hidden2=2, 
                 n_output=1):
        self.input = n_input
        self.hidden_nodes = n_hidden
#         self.hidden_nodes2 = n_hidden2
        self.output_nodes = n_output
        self.activator = activator
                
        np.random.seed(4242)
        
        self.weights1 = np.random.randn(
            self.input,
            self.hidden_nodes
        )
        self.weights2 = np.random.randn(
            self.hidden_nodes,
            self.output_nodes
        )
        
#         self.weights3 = np.random.randn(
#             self.hidden_nodes,
#             self.hidden_nodes2
#         )
        
#         self.weights4 = np.random.randn(
#             self.hidden_nodes2,
#             self.output_nodes
#         )

    def sigmoid(self, x):
        return(1 / (1 + np.exp(-x)))

    def sigmoidPrime(self, x):
        return(x * (1 - x))
    # Set up for multiple activation functions
    def activate(self, x):
        if self.activator == 'sigmoid':
            return(self.sigmoid(x))
    def grad_a(self, x):
        if self.activator == 'sigmoid':
            return(self.sigmoidPrime(x))

    def feed_forward(self, X):
        self.hidden_sum = np.dot(X, self.weights1)
        self.activated_hidden = self.activate(self.hidden_sum)
        self.output_sum = np.dot(self.activated_hidden, self.weights2)
        self.activated_output = self.activate(self.output_sum)
        return self.activated_output

    def backward(self, X, y, o):
        self.o_error = y - o
        self.o_delta = self.o_error * self.grad_a(o)

        self.z2_error = self.o_delta.dot(self.weights2.T)
        self.z2_delta = self.z2_error * self.grad_a(self.activated_hidden)

        self.weights1 += X.T.dot(self.z2_delta)
        self.weights2 += self.activated_hidden.T.dot(self.o_delta)
    
#     def feed_forward2(self, X):
#         self.hidden_sum = np.dot(X, self.weights3)
#         self.activated_hidden = self.activate(self.hidden_sum)
#         self.output_sum = np.dot(self.activated_hidden, self.weights4)
#         self.activated_output = self.activate(self.output_sum)
#         return self.activated_output

#     def backward2(self, X, y, o):
#         self.o_error = y - o
#         self.o_delta = self.o_error * self.grad_a(o)

#         self.z2_error = self.o_delta.dot(self.weights4.T)
#         self.z2_delta = self.z2_error * self.grad_a(self.activated_hidden)

#         self.weights3 += X.T.dot(self.z2_delta)
#         self.weights4 += self.activated_hidden.T.dot(self.o_delta)
        
    def train(self, X, y,):
        o = self.feed_forward(X)
        self.backward(X, y, o)


In [55]:
mlp_heart = MLPerceptron(activator='sigmoid', n_input=13, n_hidden=4, n_output=1)


In [56]:
X = hd_norm[['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg', 'thalach', 'exang', 'oldpeak', 'slope', 'ca', 'thal']].values

y = hd_norm[['target']].values

In [57]:
hd_norm.head(10)

Unnamed: 0,age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
0,0.818182,1,1.0,0.725,0.413121,1,0.0,0.742574,0,0.370968,0.0,0.0,0.333333,1
1,0.480519,1,0.666667,0.65,0.443262,0,0.5,0.925743,0,0.564516,0.0,0.0,0.666667,1
2,0.532468,0,0.333333,0.65,0.361702,0,0.0,0.851485,0,0.225806,1.0,0.0,0.666667,1
3,0.727273,1,0.333333,0.6,0.41844,0,0.5,0.881188,0,0.129032,1.0,0.0,0.666667,1
4,0.74026,0,0.0,0.6,0.62766,0,0.5,0.806931,1,0.096774,1.0,0.0,0.666667,1
5,0.74026,1,0.0,0.7,0.340426,0,0.5,0.732673,0,0.064516,0.5,0.0,0.333333,1
6,0.727273,0,0.333333,0.7,0.521277,0,0.0,0.757426,0,0.209677,0.5,0.0,0.666667,1
7,0.571429,1,0.333333,0.6,0.466312,0,0.5,0.856436,0,0.0,1.0,0.0,1.0,1
8,0.675325,1,0.666667,0.86,0.352837,1,0.5,0.80198,0,0.080645,1.0,0.0,1.0,1
9,0.74026,1,0.666667,0.75,0.297872,0,0.5,0.861386,0,0.258065,1.0,0.0,0.666667,1


In [58]:
epochs = 100000
for ep in range(epochs):
    if (ep+1 in [1,5,10]) or ((ep+1) % int(epochs / 10) == 0):
        
        print('+' + '---' * 3 + f'EPOCH {ep+1}' + '---' * 3 + '+')
#        print('Input: \n', X[:20])
#        print('Predicted Output:   Actual Output: \n') 
#         for pr, ta in zip(mNN.feed_forward(X), y):
#             print(convert_back(pr), "       ", convert_back(ta))
        if ((ep+1 == epochs / 4) or (ep+1 == epochs)):
            for pr, ta in zip(mlp_heart.feed_forward(X), y):
                print(pr[:10], ta[:10])
                print(pr[100::110], ta[100::110])
#                 print(convert_back(pr), "       ", convert_back(ta))
        print('Loss: \n', str(np.mean(np.square(y - mlp_heart.feed_forward(X)))))
    mlp_heart.train(X,y)

+---------EPOCH 1---------+
Loss: 
 0.2947568670381197
+---------EPOCH 5---------+
Loss: 
 0.5445518436292864
+---------EPOCH 10---------+
Loss: 
 0.5445517925987076
+---------EPOCH 10000---------+
Loss: 
 0.18116323699562817
+---------EPOCH 20000---------+
Loss: 
 0.18091533289621178
+---------EPOCH 30000---------+
Loss: 
 0.18083675623691584
+---------EPOCH 40000---------+
Loss: 
 0.18079888631786334
+---------EPOCH 50000---------+
Loss: 
 0.18077670637398618
+---------EPOCH 60000---------+
Loss: 
 0.18076216818951404
+---------EPOCH 70000---------+
Loss: 
 0.1807519133839718
+---------EPOCH 80000---------+
Loss: 
 0.18074429650834892
+---------EPOCH 90000---------+
Loss: 
 0.18073841785589445
+---------EPOCH 100000---------+
[0.49999977] [1]
[] []
[0.49998068] [1]
[] []
[0.49999996] [1]
[] []
[0.5] [1]
[] []
[0.5] [1]
[] []
[0.49999993] [1]
[] []
[0.49999989] [1]
[] []
[0.5] [1]
[] []
[0.5] [1]
[] []
[0.5] [1]
[] []
[0.5] [1]
[] []
[0.49999999] [1]
[] []
[0.5] [1]
[] []
[0.5] [1]
[]

## 4. Keras MMP <a id="Q4"></a>

Implement a Multilayer Perceptron architecture of your choosing using the Keras library. Train your model and report its baseline accuracy. Then hyperparameter tune at least two parameters and report your model's accuracy.
Use the Heart Disease Dataset (binary classification)
Use an appropriate loss function for a binary classification task
Use an appropriate activation function on the final layer of your network.
Train your model using verbose output for ease of grading.
Use GridSearchCV to hyperparameter tune your model. (for at least two hyperparameters)
When hyperparameter tuning, show you work by adding code cells for each new experiment.
Report the accuracy for each combination of hyperparameters as you test them so that we can easily see which resulted in the highest accuracy.
You must hyperparameter tune at least 5 parameters in order to get a 3 on this section.

In [59]:
import keras
from keras.models import Sequential
from keras.layers import Dense

from sklearn.model_selection import GridSearchCV
from keras.wrappers.scikit_learn import KerasClassifier
np.random.seed(4200)

In [60]:
import matplotlib.pyplot as plt

In [61]:
X_gscv = hd_norm[['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg', 'thalach', 'exang', 'oldpeak', 'slope', 'ca', 'thal']].values

y_gscv = hd_norm[['target']].values

X_ker = hd_norm[['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg', 'thalach', 'exang', 'oldpeak', 'slope', 'ca', 'thal']].values

y_ker = hd_norm[['target']].values

In [62]:
X_ker[0]

array([0.81818182, 1.        , 1.        , 0.725     , 0.41312057,
       1.        , 0.        , 0.74257426, 0.        , 0.37096774,
       0.        , 0.        , 0.33333333])

In [63]:
inputs_ker = X_ker.shape[1]
epochs_ker = 100
batch_ker = 10

mod_ker = Sequential()
mod_ker.add(Dense(64, activation='relu', input_shape=(inputs_ker,)))
mod_ker.add(Dense(64, activation='relu'))
mod_ker.add(Dense(1))

# Compile 
mod_ker.compile(optimizer='adam', loss='mse', metrics=['mse'])

# Fit 
mod_ker.fit(X_ker, y_ker, validation_split=0.2, epochs=epochs_ker, batch_size=batch_ker)

Train on 242 samples, validate on 61 samples
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100
Epoch 65/100
Epoch 66/100
Epoch 67/100
Epoch 68/100
Epoch 69/100
Epoch 70/100
Epoch 71/100
Epoch 72/100
Epoch 73/100
Epoch 74/100
Ep

<keras.callbacks.History at 0x7ff68c4252b0>

In [None]:
type(X_ker)

In [None]:
type(y_ker[0][0])

In [25]:
inputs_gscv = X_gscv.shape[1]
epochs_gscv = 1000

In [28]:
def mk_keras_model():
    mod_gscv = Sequential()
    mod_gscv.add(Dense(64, input_dim=13, activation='relu'))
    mod_gscv.add(Dense(32, activation='sigmoid'))
    mod_gscv.add(Dense(1, activation='tanh'))
    mod_gscv.compile(loss='binary_crossentropy', optimizer='adam', metrics=['mse', 'accuracy'])
    return mod_gscv
    
gscv_model = KerasClassifier(build_fn=mk_keras_model, verbose=5)


In [29]:
gscv_pgrid = {'batch_size': [100, 120, 160, 200],
              'epochs': [epochs_gscv]}

gscv = GridSearchCV(estimator=gscv_model, param_grid=gscv_pgrid, n_jobs=4)
gscv_results = gscv.fit(X_gscv, y_gscv)

 



Epoch 1/1000
Epoch 2/1000
Epoch 3/1000
Epoch 4/1000
Epoch 5/1000
Epoch 6/1000
Epoch 7/1000
Epoch 8/1000
Epoch 9/1000
Epoch 10/1000
Epoch 11/1000
Epoch 12/1000
Epoch 13/1000
Epoch 14/1000
Epoch 15/1000
Epoch 16/1000
Epoch 17/1000
Epoch 18/1000
Epoch 19/1000
Epoch 20/1000
Epoch 21/1000
Epoch 22/1000
Epoch 23/1000
Epoch 24/1000
Epoch 25/1000
Epoch 26/1000
Epoch 27/1000
Epoch 28/1000
Epoch 29/1000
Epoch 30/1000
Epoch 31/1000
Epoch 32/1000
Epoch 33/1000
Epoch 34/1000
Epoch 35/1000
Epoch 36/1000
Epoch 37/1000
Epoch 38/1000
Epoch 39/1000
Epoch 40/1000
Epoch 41/1000
Epoch 42/1000
Epoch 43/1000
Epoch 44/1000
Epoch 45/1000
Epoch 46/1000
Epoch 47/1000
Epoch 48/1000
Epoch 49/1000
Epoch 50/1000
Epoch 51/1000
Epoch 52/1000
Epoch 53/1000
Epoch 54/1000
Epoch 55/1000
Epoch 56/1000
Epoch 57/1000
Epoch 58/1000
Epoch 59/1000
Epoch 60/1000
Epoch 61/1000
Epoch 62/1000
Epoch 63/1000
Epoch 64/1000
Epoch 65/1000
Epoch 66/1000
Epoch 67/1000
Epoch 68/1000
Epoch 69/1000
Epoch 70/1000
Epoch 71/1000
Epoch 72/1000
E

In [30]:
#print best score
print("Best score: {}".format(gscv_result.best_score_))


gscv_std_dev = gscv_results.cv_results_['std_test_score']
gscv_mean_tscore = gscv_results.cv_results_['mean_test_score']
gscv_parameters = gscv_results.cv_results_['params']
for mean, stdev, param in zip(gscv_std_dev, gscv_mean_tscore, gscv_parameters):
    print(f"Means: {gscv_mean_tscore}, Stdev: {gscv_std_dev} with: {gscv_parameters}") 

Best score: 0.7062706215546863
Means: [0.57755776 0.21122113 0.52475246 0.61056105], Stdev: [0.42272577 0.2384931  0.22417963 0.14149143] with: [{'batch_size': 100, 'epochs': 1000}, {'batch_size': 120, 'epochs': 1000}, {'batch_size': 160, 'epochs': 1000}, {'batch_size': 200, 'epochs': 1000}]
Means: [0.57755776 0.21122113 0.52475246 0.61056105], Stdev: [0.42272577 0.2384931  0.22417963 0.14149143] with: [{'batch_size': 100, 'epochs': 1000}, {'batch_size': 120, 'epochs': 1000}, {'batch_size': 160, 'epochs': 1000}, {'batch_size': 200, 'epochs': 1000}]
Means: [0.57755776 0.21122113 0.52475246 0.61056105], Stdev: [0.42272577 0.2384931  0.22417963 0.14149143] with: [{'batch_size': 100, 'epochs': 1000}, {'batch_size': 120, 'epochs': 1000}, {'batch_size': 160, 'epochs': 1000}, {'batch_size': 200, 'epochs': 1000}]
Means: [0.57755776 0.21122113 0.52475246 0.61056105], Stdev: [0.42272577 0.2384931  0.22417963 0.14149143] with: [{'batch_size': 100, 'epochs': 1000}, {'batch_size': 120, 'epochs': 10