<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>
<br></br>

## *Data Science Unit 4 Sprint 2*

# Sprint Challenge - Neural Network Foundations

Table of Problems

1. [Defining Neural Networks](#Q1)
2. [Perceptron on AND Gates](#Q2)
3. [Multilayer Perceptron](#Q3)
4. [Keras MLP](#Q4)

<a id="Q1"></a>
## 1. Define the following terms:

- **Neuron:** A neuron (also called a node or perceptron) takes any number of inputs and produces a single output. It multiplies each input value by a weight, sums the products (usually together with a bias term), and passes the sum through an activation function, which produces the final value. A hidden layer of a neural network is typically made up of many neurons.
- **Input Layer:** The entry point for data into the neural network. Typically input layers have one node for each of the features (columns) of the dataset.
- **Hidden Layer:** Layers between the input and output layers are referred to as hidden layers. These layers are made up of neurons which accept inputs and pass outputs onto the next layers.
- **Output Layer:** The final layer in a neural network. Typically the output layer outputs a vector of values in a format suitable for the type of problem to be addressed. The output value is typically modified by an 'activation function' to transform it into a format that makes sense. 
- **Activation:** Activation refers to how 'on' or 'off' a neuron is. It is determined by an activation function, which controls how much signal is passed on to the next layer. An activation function takes the weighted sum of inputs (plus a bias, if present) from the previous layer and outputs an activation value.
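- **Backpropagation:** The algorithm used to work out how much each weight contributed to the loss. After a forward pass, the error at the output layer is propagated backward through the network, layer by layer, using the chain rule to compute the gradient of the loss with respect to every weight. An optimizer such as gradient descent then adjusts each weight in the direction that reduces the loss; when training on a batch, the per-observation gradients are averaged before the update is applied.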
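The neuron, activation, and backpropagation definitions above can be made concrete with a single sigmoid neuron. This is an illustrative sketch only: the input values, weights, bias, learning rate, and the squared-error loss are arbitrary assumptions, not values from this notebook.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical inputs, weights, and bias for one neuron with three inputs
x = np.array([1.0, 0.5, -1.0])
w = np.array([0.4, -0.2, 0.1])
b = 0.05

# Forward pass: weighted sum plus bias, then activation
z = np.dot(w, x) + b
a = sigmoid(z)

# One backpropagation step for the squared-error loss L = 0.5 * (y - a)^2.
# Chain rule: dL/dw = dL/da * da/dz * dz/dw = -(y - a) * a * (1 - a) * x
y = 1.0
lr = 0.5
grad_w = -(y - a) * a * (1 - a) * x
grad_b = -(y - a) * a * (1 - a)

# Gradient descent: move each parameter against its gradient
w_new = w - lr * grad_w
b_new = b - lr * grad_b

a_new = sigmoid(np.dot(w_new, x) + b_new)
print(f"activation before: {a:.4f}, after one update: {a_new:.4f}")
```

Because the target is 1, a single gradient step should push the activation upward, toward the target.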


## 2. Perceptron on AND Gates <a id="Q2"></a>

Create a perceptron class that can model the behavior of an AND gate. You can use the following table as your training data:

| x1 | x2 | x3 | y |
|---|---|---|---|
| 1 | 1 | 1 | 1 |
| 1 | 0 | 1 | 0 |
| 0 | 1 | 1 | 0 |
| 0 | 0 | 1 | 0 |
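The perceptron learning rule used by the `Perceptron` class in this section can be summarized as follows, with η the learning rate (`rate`, 0.01 by default) and w_0 the bias (`self.weight[0]`). Because x3 is 1 in every row, it effectively acts as a second bias input. The update is nonzero only when a row is misclassified, which is why counting nonzero updates gives the number of misclassifications per iteration.

```latex
\hat{y} =
\begin{cases}
1 & \text{if } w_0 + \sum_{j=1}^{3} w_j x_j \ge 0 \\
0 & \text{otherwise}
\end{cases}
\qquad
\Delta = \eta\,(y - \hat{y}), \qquad
w_j \leftarrow w_j + \Delta\, x_j, \qquad
w_0 \leftarrow w_0 + \Delta
```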

In [1]:
import numpy as np

In [2]:
X = np.array([
    [1,1,1],
    [1,0,1],
    [0,1,1],
    [0,0,1]
])

y = [[1], [0], [0], [0]]

In [3]:
np.zeros(1 + X.shape[1])

array([0., 0., 0., 0.])

In [4]:
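class Perceptron(object):
    def __init__(self, rate=0.01, niter=1000):
        self.rate = rate   # learning rate
        self.iter = niter  # number of passes (epochs) over the training data
        
    def fit(self, X, y):
        # initialize weights as zero; weight[0] is the bias
        self.weight = np.zeros(1 + X.shape[1])
        
        # list storing the number of misclassifications in each epoch
        self.errors = []
        
        for _ in range(self.iter):
            err = 0
            # ravel turns [[1], [0], ...] into [1, 0, ...] so each target is a scalar
            for xi, target in zip(X, np.ravel(y)):
                # perceptron rule: the update is nonzero only when the prediction is wrong
                delta_w = self.rate * (target - self.predict(xi))
                
                # update weights and bias
                self.weight[1:] += delta_w * xi
                self.weight[0] += delta_w
                
                # count misclassifications
                err += int(delta_w != 0.0)
            self.errors.append(err)
        
        return self
    
    def net_input(self, X):
        # weighted sum of inputs plus bias
        return np.dot(X, self.weight[1:]) + self.weight[0]
    
    def predict(self, X):
        # step activation: 1 if the net input is non-negative, else 0
        return np.where(self.net_input(X) >= 0.0, 1, 0)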

In [5]:
p = Perceptron()
p.fit(X, y)

<__main__.Perceptron at 0x107cd77f0>

In [6]:
import matplotlib.pyplot as plt

plt.plot(range(1, len(p.errors) +1), p.errors)
plt.xlabel('Iters')
plt.ylabel('Number of misclassifications')
plt.show()

(Plot: number of misclassifications per iteration)

In [7]:
p.predict([0,1,1])

array(0)

In [8]:
p.predict([1,1,1])

array(1)

In [9]:
p.predict([4,4,4])

array(1)

## 3. Multilayer Perceptron <a id="Q3"></a>

- Implement a Neural Network Multilayer Perceptron class that uses backpropagation to update the network's weights.
- Your network must have one hidden layer.
- You do not have to update weights via gradient descent. You can use something like the derivative of the sigmoid function to update weights.
- Train your model on the Heart Disease dataset from UCI:
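The identity behind "the derivative of the sigmoid function" is σ'(z) = σ(z)(1 − σ(z)). Because of this, a helper written as `s * (1 - s)` must be given the already-activated value, not the raw weighted sum; the `backward` method of the `NeuralNetwork` class in this section follows that convention by passing `output` and `self.activated_hidden`. A quick numerical check, with illustrative helper names that are not from the notebook:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime_from_activation(a):
    """Derivative of the sigmoid written in terms of its OUTPUT a = sigmoid(z)."""
    return a * (1.0 - a)

# Compare against a central finite-difference estimate of d(sigmoid)/dz
z = np.linspace(-4.0, 4.0, 9)
h = 1e-5
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)
analytic = sigmoid_prime_from_activation(sigmoid(z))

print(np.max(np.abs(numeric - analytic)))
```

Passing the raw sum `z` instead of `sigmoid(z)` would give the wrong slope, which is an easy bug to introduce when refactoring.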



In [55]:
import pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/ryanleeallred/datasets/master/diabetes.csv')

In [56]:
df.head()

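| | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | 1 |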


In [57]:
y = df['Outcome'].values
X = df.drop(columns='Outcome')

In [58]:
X = X.values

In [59]:
X.shape

(768, 8)

In [60]:
y = y.reshape(-1, 1)  # column vector, shape (768, 1)

In [63]:
class NeuralNetwork():
    def __init__(self, inputs=8, hidden_nodes=12, output_nodes=1):
        self.inputs = inputs
        self.hidden_nodes = hidden_nodes
        self.output_nodes = output_nodes
        
        # Initialize weights randomly (this network has no bias terms)
        self.weights1 = np.random.randn(self.inputs, self.hidden_nodes)
        self.weights2 = np.random.randn(self.hidden_nodes, self.output_nodes)
        
    def sigmoid(self, s):
        return 1 / (1 + np.exp(-s))
    
    def sigmoidPrime(self, s):
        # Derivative of the sigmoid; expects s to already be a sigmoid OUTPUT
        return s * (1 - s)
    
    def feed_forward(self, X):
        # Weighted sum of inputs for the hidden layer
        self.hidden_sum = np.dot(X, self.weights1)
        
        # Pass sum through the hidden activation function
        self.activated_hidden = self.sigmoid(self.hidden_sum)
        
        # Weighted sum of hidden activations for the output layer
        self.output_sum = np.dot(self.activated_hidden, self.weights2)
        
        # Pass sum through the output activation function
        self.activated_output = self.sigmoid(self.output_sum)
        
        return self.activated_output
    
    def backward(self, X, y, output):
        # Output-layer error and delta
        self.o_error = y - output
        self.o_delta = self.o_error * self.sigmoidPrime(output)

        # Propagate the output delta back to the hidden layer
        self.z2_error = self.o_delta.dot(self.weights2.T)
        self.z2_delta = self.z2_error * self.sigmoidPrime(self.activated_hidden)

        # Adjust input-to-hidden weights (implicit learning rate of 1)
        self.weights1 += X.T.dot(self.z2_delta)
        # Adjust hidden-to-output weights
        self.weights2 += self.activated_hidden.T.dot(self.o_delta)
        
    def train(self, X, y):
        output = self.feed_forward(X)
        self.backward(X, y, output)

In [64]:
nn = NeuralNetwork()

In [65]:
for _ in range(10000):
    nn.train(X, y)

print('Results after 10000 epochs: \n')
print('Loss: \n', str(np.mean(np.square(y - nn.feed_forward(X)))))

Results after 10000 epochs: 

Loss: 
 0.3489583333332676
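The recorded loss of 0.3489583… equals 268/768 to within rounding, which is exactly the mean squared error of predicting 0 for every row if 268 of the 768 labels are 1. That pattern suggests the sigmoid units saturated on the unscaled features (Glucose and Insulin values run into the hundreds), so the network may have stopped learning. Below is a minimal sketch, not the notebook's method, of the same one-hidden-layer architecture with three changes: bias vectors, an explicit learning rate with batch-averaged updates, and standardized inputs. The class name `BiasedMLP`, the hyperparameters, and the synthetic data are all assumptions for illustration; the synthetic data stands in for the diabetes CSV so the block runs offline.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class BiasedMLP:
    """One-hidden-layer network with bias terms and a learning rate (illustrative)."""

    def __init__(self, n_inputs, n_hidden, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        # Small random weights keep the sigmoids out of their flat regions at the start
        self.W1 = rng.normal(0.0, 0.5, size=(n_inputs, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.5, size=(n_hidden, 1))
        self.b2 = np.zeros(1)
        self.lr = lr

    def forward(self, X):
        self.h = sigmoid(X @ self.W1 + self.b1)
        self.out = sigmoid(self.h @ self.W2 + self.b2)
        return self.out

    def train_step(self, X, y):
        out = self.forward(X)
        n = X.shape[0]
        # Deltas for the squared-error loss, using sigmoid'(z) = a * (1 - a)
        o_delta = (out - y) * out * (1 - out)
        h_delta = (o_delta @ self.W2.T) * self.h * (1 - self.h)
        # Gradient descent step, averaged over the rows in the batch
        self.W2 -= self.lr * self.h.T @ o_delta / n
        self.b2 -= self.lr * o_delta.mean(axis=0)
        self.W1 -= self.lr * X.T @ h_delta / n
        self.b1 -= self.lr * h_delta.mean(axis=0)
        # Loss measured before this step's update
        return np.mean((y - out) ** 2)

# Synthetic stand-in data (NOT the diabetes dataset): 8 features on very different scales
rng = np.random.default_rng(42)
X_raw = rng.normal(size=(768, 8)) * np.array([3, 30, 20, 15, 100, 7, 0.3, 12])
y_syn = (X_raw[:, 1] / 30 + X_raw[:, 5] / 7 > 0).astype(float).reshape(-1, 1)

# Standardize each column to zero mean and unit variance before training
X_std = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

net = BiasedMLP(n_inputs=8, n_hidden=12, lr=1.0)
losses = [net.train_step(X_std, y_syn) for _ in range(3000)]
print(f"loss: first {losses[0]:.3f}, last {losses[-1]:.3f}")
```

Writing the update as `W -= lr * gradient` with `(out - y)` moves in the same direction as the notebook's `+=` with `(y - output)`; the practical differences are the bias terms, the step size, and the averaging over rows.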


## 4. Keras MLP <a id="Q4"></a>

Implement a Multilayer Perceptron architecture of your choosing using the Keras library. Train your model and report its baseline accuracy. Then hyperparameter tune at least two parameters and report your model's accuracy.

- Use the Heart Disease Dataset (binary classification).
- Use an appropriate loss function for a binary classification task.
- Use an appropriate activation function on the final layer of your network.
- Train your model using verbose output for ease of grading.
- Use GridSearchCV to hyperparameter tune your model (for at least two hyperparameters).
- When hyperparameter tuning, show your work by adding code cells for each new experiment.
- Report the accuracy for each combination of hyperparameters as you test them so that we can easily see which resulted in the highest accuracy.
- You must hyperparameter tune at least 5 parameters in order to get a 3 on this section.

In [23]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X = scaler.fit_transform(X)
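`StandardScaler` rescales each column independently to z = (x − μ)/σ, where μ and σ are that column's mean and population (ddof=0) standard deviation learned during `fit`. A NumPy equivalent on a toy matrix with illustrative values:

```python
import numpy as np

# Toy matrix: two features on very different scales (illustrative values)
X_toy = np.array([[1.0, 100.0],
                  [2.0, 150.0],
                  [3.0, 200.0],
                  [4.0, 250.0]])

# z = (x - mean) / std, computed per column with the population std (ddof=0)
mu = X_toy.mean(axis=0)
sigma = X_toy.std(axis=0)
Z = (X_toy - mu) / sigma

print(Z)
```

Because the second column here is a linear function of the first, both columns end up with identical z-scores. Note that fitting the scaler on all rows before cross-validation lets each validation fold influence the scaling statistics; wrapping the scaler and model in a scikit-learn `Pipeline` avoids that.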

In [24]:
import keras
from keras.models import Sequential
from keras.layers import Dense

In [25]:
model = Sequential()

model.add(Dense(16, input_dim=8, activation='sigmoid'))
model.add(Dense(16, activation='sigmoid'))
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_4 (Dense)              (None, 16)                144       
_________________________________________________________________
dense_5 (Dense)              (None, 16)                272       
_________________________________________________________________
dense_6 (Dense)              (None, 1)                 17        
Total params: 433
Trainable params: 433
Non-trainable params: 0
_________________________________________________________________
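The `Param #` column follows from a simple rule: a `Dense` layer has one weight per input-unit pair plus one bias per unit. A quick arithmetic check against the summary, with a hypothetical helper name:

```python
def dense_params(n_in, n_units):
    """Parameters in a fully connected layer: weights (n_in * n_units) plus biases (n_units)."""
    return n_in * n_units + n_units

# Layer sizes from the model summary: 8 inputs -> 16 -> 16 -> 1
layers = [(8, 16), (16, 16), (16, 1)]
counts = [dense_params(n_in, n_out) for n_in, n_out in layers]
print(counts, sum(counts))  # [144, 272, 17] 433
```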


In [27]:
model.fit(X, y, 
          batch_size=20, 
          epochs=20, 
          verbose=1)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<keras.callbacks.History at 0x1a3ab41eb8>

## Tuning

In [49]:
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV
from keras.constraints import maxnorm
from keras.layers import Dropout

In [50]:
def create_model(nodes=16, dropout_rate=0.0, weight_constraint=10, optimizer='adam', activation='sigmoid'):
    model = Sequential()
    # First hidden layer, with a max-norm constraint on its kernel
    model.add(Dense(nodes, input_dim=8, activation=activation, kernel_constraint=maxnorm(weight_constraint)))
    model.add(Dropout(dropout_rate))
    model.add(Dense(nodes, activation=activation))
    # Output layer always uses sigmoid, which suits binary classification
    model.add(Dense(1, activation='sigmoid'))

    model.compile(loss='binary_crossentropy',
                  optimizer=optimizer,
                  metrics=['accuracy'])
    return model

In [51]:
model = KerasClassifier(build_fn=create_model, verbose=1)

In [52]:
param_grid = {
    'batch_size': [16, 32],
    'epochs': [25, 50],
    'nodes': [16, 32],
    'optimizer': ['adam', 'SGD'],
    'dropout_rate': [0.0, 0.2],
    # maxnorm(0) rescales the first layer's kernel to zero after every update
    'weight_constraint': [0, 3]
}
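`GridSearchCV` trains one model for every combination in the grid (the Cartesian product of the value lists) on each cross-validation fold, then, with the default `refit=True`, refits the best combination on all the data. The sketch below counts the work implied by this grid. The fold count is an assumption: the default `cv` was 3 before scikit-learn 0.22 and 5 from 0.22 onward.

```python
from itertools import product

# Copy of the grid defined above
param_grid = {
    'batch_size': [16, 32],
    'epochs': [25, 50],
    'nodes': [16, 32],
    'optimizer': ['adam', 'SGD'],
    'dropout_rate': [0.0, 0.2],
    'weight_constraint': [0, 3],
}

# Every combination of one value per hyperparameter (the Cartesian product)
combos = [dict(zip(param_grid, values)) for values in product(*param_grid.values())]

# Assumption: 3 folds (older scikit-learn default); use 5 for scikit-learn >= 0.22
n_folds = 3
n_fits = len(combos) * n_folds + 1  # +1 for the final refit on all data
print(len(combos), n_fits)
```

Six hyperparameters with two values each give 2^6 = 64 combinations, so even a small grid multiplies training time quickly; that is why the verbose log below is so long.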

In [53]:
grid = GridSearchCV(estimator=model, param_grid=param_grid)
grid_result = grid.fit(X, y)

Epoch 1/25
Epoch 2/25
Epoch 3/25
Epoch 4/25
Epoch 5/25
Epoch 6/25
Epoch 7/25
Epoch 8/25
Epoch 9/25
Epoch 10/25
Epoch 11/25
Epoch 12/25
Epoch 13/25
Epoch 14/25
Epoch 15/25
Epoch 16/25
Epoch 17/25
Epoch 18/25
Epoch 19/25
Epoch 20/25
Epoch 21/25
Epoch 22/25
Epoch 23/25
Epoch 24/25
Epoch 25/25
Epoch 1/25
Epoch 2/25
Epoch 3/25
Epoch 4/25
Epoch 5/25
Epoch 6/25
Epoch 7/25
Epoch 8/25
Epoch 9/25
Epoch 10/25
Epoch 11/25
Epoch 12/25
Epoch 13/25
Epoch 14/25
Epoch 15/25
Epoch 16/25
Epoch 17/25
Epoch 18/25
Epoch 19/25
Epoch 20/25
Epoch 21/25
Epoch 22/25
Epoch 23/25
Epoch 24/25
Epoch 25/25
Epoch 1/25
Epoch 2/25
Epoch 3/25
Epoch 4/25
Epoch 5/25
Epoch 6/25
Epoch 7/25
Epoch 8/25
Epoch 9/25
Epoch 10/25
Epoch 11/25
Epoch 12/25
Epoch 13/25
Epoch 14/25
Epoch 15/25
Epoch 16/25
Epoch 17/25
Epoch 18/25
Epoch 19/25
Epoch 20/25
Epoch 21/25
Epoch 22/25
Epoch 23/25
Epoch 24/25
Epoch 25/25
Epoch 1/25
Epoch 2/25
Epoch 3/25
Epoch 4/25
Epoch 5/25
Epoch 6/25
Epoch 7/25
Epoch 8/25
Epoch 9/25
Epoch 10/25
Epoch 11/25
...

In [54]:
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}") 

Best: 0.7708333333333334 using {'batch_size': 16, 'dropout_rate': 0.0, 'epochs': 25, 'nodes': 32, 'optimizer': 'adam', 'weight_constraint': 3}
Means: 0.6510416666666666, Stdev: 0.024773824987552668 with: {'batch_size': 16, 'dropout_rate': 0.0, 'epochs': 25, 'nodes': 16, 'optimizer': 'adam', 'weight_constraint': 0}
Means: 0.75390625, Stdev: 0.04384754750125713 with: {'batch_size': 16, 'dropout_rate': 0.0, 'epochs': 25, 'nodes': 16, 'optimizer': 'adam', 'weight_constraint': 3}
Means: 0.6510416666666666, Stdev: 0.024773824987552668 with: {'batch_size': 16, 'dropout_rate': 0.0, 'epochs': 25, 'nodes': 16, 'optimizer': 'SGD', 'weight_constraint': 0}
Means: 0.6510416666666666, Stdev: 0.024773824987552668 with: {'batch_size': 16, 'dropout_rate': 0.0, 'epochs': 25, 'nodes': 16, 'optimizer': 'SGD', 'weight_constraint': 3}
Means: 0.6510416666666666, Stdev: 0.024773824987552668 with: {'batch_size': 16, 'dropout_rate': 0.0, 'epochs': 25, 'nodes': 32, 'optimizer': 'adam', 'weight_constraint': 0}
...
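Several configurations above score exactly 0.6510416…, which equals 500/768. That is the accuracy a model gets by predicting the majority class for every row if 500 of the 768 rows are negative, so it is a useful floor to compare each grid score against. A minimal sketch; the helper name and the 500/268 class split are assumptions for illustration, not read from the dataset:

```python
import numpy as np

def majority_baseline(y):
    """Accuracy of always predicting the most frequent class in a 0/1 label vector."""
    p = np.mean(y)
    return max(p, 1 - p)

# Illustrative labels with the class balance the scores suggest (assumed, not loaded)
y_demo = np.array([0] * 500 + [1] * 268)
print(majority_baseline(y_demo))
```

A configuration that only matches this baseline, such as the ones with `weight_constraint: 0`, has not learned anything from the features.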