<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>
<br></br>

## *Data Science Unit 4 Sprint 2*

# Sprint Challenge - Neural Network Foundations

Table of Problems

1. [Defining Neural Networks](#Q1)
2. [Chocolate Gummy Bears](#Q2)
    - Perceptron
    - Multilayer Perceptron
3. [Keras MLP](#Q3)

<a id="Q1"></a>
## 1. Define the following terms:

- **Neuron:** A node in a neural network that computes a weighted sum of its inputs (plus an optional bias) and passes the result through an activation function.
- **Input Layer:** The first layer of a network, where feature values enter.
- **Hidden Layer:** A layer between the input and output layers, where the network learns intermediate representations of its inputs.
- **Output Layer:** The final layer of a network, from which results can be interpreted.
- **Activation:** The function applied to a neuron's weighted sum to calculate its output value.
- **Backpropagation:** Short for "backward propagation of errors". A specific (rather calculus-intensive) algorithm that applies the chain rule backward from the output layer to work out how much each weight contributed to the error, so that the weights can be updated after each training batch (or each epoch, when the whole dataset is treated as one batch).
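
To make the neuron and activation definitions concrete, here is a minimal sketch of a single neuron's forward pass. The input values, weights, and bias are hypothetical numbers chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical values, assumed for illustration only.
inputs = np.array([1.0, 0.0])     # values arriving from the previous layer
weights = np.array([0.5, -0.5])   # one weight per incoming connection
bias = 0.0                        # offset added before the activation

weighted_sum = np.dot(inputs, weights) + bias   # 0.5
activation = sigmoid(weighted_sum)              # roughly 0.62
print(weighted_sum, activation)
```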


## 2. Chocolate Gummy Bears <a id="Q2"></a>

Right now, you're probably thinking, "yuck, who the hell would eat that?". Great question. Your candy company wants to know too. And you thought I was kidding about the [Chocolate Gummy Bears](https://nuts.com/chocolatessweets/gummies/gummy-bears/milk-gummy-bears.html?utm_source=google&utm_medium=cpc&adpos=1o1&gclid=Cj0KCQjwrfvsBRD7ARIsAKuDvMOZrysDku3jGuWaDqf9TrV3x5JLXt1eqnVhN0KM6fMcbA1nod3h8AwaAvWwEALw_wcB). 

Let's assume that a candy company has gone out and collected information on the types of Halloween candy kids ate. Our candy company wants to predict the eating behavior of witches, warlocks, and ghosts -- aka costumed kids. They shared a sample dataset with us. Each row represents a piece of candy that a costumed child was presented with during "trick" or "treat". We know if the candy was `chocolate` (or not chocolate) or `gummy` (or not gummy). Your goal is to predict if the costumed kid `ate` the piece of candy. 

If both chocolate and gummy equal one, you've got a chocolate gummy bear on your hands!?!?!
![Chocolate Gummy Bear](https://ed910ae2d60f0d25bcb8-80550f96b5feb12604f4f720bfefb46d.ssl.cf1.rackcdn.com/3fb630c04435b7b5-2leZuM7_-zoom.jpg)

In [1]:
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
candy = pd.read_csv('chocolate_gummy_bears.csv')

In [2]:
candy.head()

|   | chocolate | gummy | ate |
|---|---|---|---|
| 0 | 0 | 1 | 1 |
| 1 | 1 | 0 | 1 |
| 2 | 0 | 1 | 1 |
| 3 | 0 | 0 | 0 |
| 4 | 1 | 1 | 0 |


In [3]:
candy.shape

(10000, 3)

### Perceptron

To make predictions on the `candy` dataframe, build and train a Perceptron using NumPy. Your target column is `ate` and your features are `chocolate` and `gummy`. Do not do any feature engineering. :P

Once you've trained your model, report your accuracy. You will not be able to achieve more than ~50% with the simple perceptron. Explain why the *simple perceptron* architecture cannot do better, even though ~95% accuracy is achievable on this dataset. Provide your answer in markdown (and *optional* data analysis code) after your perceptron implementation.

In [4]:
class Perceptron(object):
    
    def __init__(self, niter = 1000):
        self.niter = niter
        np.random.seed(0)
        
    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))
    
    def sigmoid_derivative(self, x):
        sx = self.sigmoid(x)
        return sx * (1-sx)

    def fit(self, X, y):
        inputs = X
        correct_outputs = y.reshape(-1,1)
        weights = 2 * np.random.random((X.shape[1],1)) - 1
        
        for i in range(self.niter):
            weighted_sum = np.dot(inputs, weights)
            
            activated_output = self.sigmoid(weighted_sum)
            
            error = correct_outputs - activated_output
            
            # sigmoid_derivative applies the sigmoid itself, so pass the pre-activation sum
            adjustments = error * self.sigmoid_derivative(weighted_sum)
            
            weights += np.dot(inputs.T, adjustments)
            
        self.weights = weights

    def predict(self, X):
        weighted_sum = np.dot(X, self.weights)
        predictions = np.round(self.sigmoid(weighted_sum))
        return predictions
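
The `Perceptron` class above trains without a bias term or a learning rate. For comparison, here is a minimal sketch of the classic perceptron learning rule with a bias input, run on logical AND. AND is a linearly separable toy problem chosen for illustration; it is not the candy data, and the names `X_and`, `y_and`, and `X_bias` are hypothetical.

```python
import numpy as np

# Logical AND is linearly separable, so the perceptron rule converges on it.
X_and = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_and = np.array([0, 0, 0, 1])

# Prepend a constant 1 so the first weight acts as the bias.
X_bias = np.hstack([np.ones((X_and.shape[0], 1)), X_and])
w = np.zeros(X_bias.shape[1])

for epoch in range(20):
    errors = 0
    for x_i, y_i in zip(X_bias, y_and):
        pred = int(np.dot(w, x_i) > 0)   # step activation
        update = y_i - pred              # -1, 0, or +1
        w += update * x_i                # only changes w on a mistake
        errors += int(update != 0)
    if errors == 0:                      # a full pass with no mistakes
        break

and_preds = (X_bias @ w > 0).astype(int)
print(w, and_preds)
```

Without the constant column, the decision boundary is forced through the origin, which rules out many otherwise learnable splits.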

In [5]:
X = candy[['chocolate', 'gummy']].values
y = candy['ate'].values

per = Perceptron()

per.fit(X, y)

accuracy_score(y, per.predict(X))

0.5
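
One way to reason about the ceiling on perceptron accuracy: the five sample rows shown earlier are consistent with `ate` being the XOR of `chocolate` and `gummy`. Assuming that pattern holds for most of the dataset (not verified here), no single linear boundary can classify all four input combinations correctly. The sketch below brute-forces a coarse grid of hypothetical weights and biases on the four XOR points and keeps the best count of correct predictions.

```python
import itertools
import numpy as np

# The four possible (chocolate, gummy) inputs and an assumed XOR target.
X_xor = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_xor = np.array([0, 1, 1, 0])

# Try every (w1, w2, b) on a coarse grid; the grid itself is an arbitrary choice.
grid = np.linspace(-2, 2, 17)
best_correct = 0
for w1, w2, b in itertools.product(grid, repeat=3):
    preds = (X_xor @ np.array([w1, w2]) + b > 0).astype(int)
    best_correct = max(best_correct, int((preds == y_xor).sum()))

print(best_correct)   # 3 of 4: no linear boundary gets all four right
```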

### Multilayer Perceptron

Using the sample candy dataset, implement a Multilayer Perceptron class that uses backpropagation to update the network's weights. Your Multilayer Perceptron should be implemented in NumPy.
Your network must have one hidden layer.

Once you've trained your model, report your accuracy. Explain why your MLP's performance is considerably better than your simple perceptron's on the candy dataset. 

In [6]:
class NeuralNetwork:
    def __init__(self, inputs, hiddenNodes, outputNodes):
        self.inputs = inputs
        self.hiddenNodes = hiddenNodes
        self.outputNodes = outputNodes
        
        np.random.seed(0)
        
        self.weights1 = np.random.randn(self.inputs, self.hiddenNodes)
        self.weights2 = np.random.randn(self.hiddenNodes, self.outputNodes)
        
    def sigmoid(self, s):
        return 1 / (1 + np.exp(-s))
    
    def sigmoidPrime(self, s):
        return s * (1 - s)
    
    def feed_forward(self, X):
        
        self.hidden_sum = np.dot(X, self.weights1)
        
        self.activated_hidden = self.sigmoid(self.hidden_sum)
        
        self.output_sum = np.dot(self.activated_hidden, self.weights2)
        
        self.activated_output = self.sigmoid(self.output_sum)
        
        return self.activated_output
    
    def backward(self, X, y, o):
        self.o_error = y - o
        
        self.o_delta = self.o_error * self.sigmoidPrime(o)
        
        self.z2_error = self.o_delta.dot(self.weights2.T)
        self.z2_delta = self.z2_error * self.sigmoidPrime(self.activated_hidden)
        
        self.weights2 += self.activated_hidden.T.dot(self.o_delta)
        
        self.weights1 += X.T.dot(self.z2_delta)
        
    def train(self, X, y):
        o = self.feed_forward(X)
        self.backward(X, y, o)
        
    def predict(self, X):
        # Round the sigmoid output at 0.5 to get a 0/1 class label
        return np.round(self.feed_forward(X))
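
A common way to sanity-check a backpropagation implementation is to compare its gradients against finite differences. The sketch below does this for a one-hidden-layer sigmoid network with a half squared-error loss. The function names (`loss`, `backprop_grads`, `numeric_grad`) and the tiny dataset are hypothetical and separate from the `NeuralNetwork` class; the `+=` updates in `backward` correspond to stepping against this same gradient with a step size of 1.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(W1, W2, X, y):
    """Half the summed squared error of a one-hidden-layer sigmoid network."""
    out = sigmoid(sigmoid(X @ W1) @ W2)
    return 0.5 * np.sum((y - out) ** 2)

def backprop_grads(W1, W2, X, y):
    """Gradients of `loss` with respect to W1 and W2 via the chain rule."""
    hidden = sigmoid(X @ W1)
    out = sigmoid(hidden @ W2)
    out_delta = (out - y) * out * (1 - out)
    hidden_delta = (out_delta @ W2.T) * hidden * (1 - hidden)
    return X.T @ hidden_delta, hidden.T @ out_delta

def numeric_grad(f, W, eps=1e-6):
    """Central finite differences, perturbing one weight at a time in place."""
    grad = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        original = W[idx]
        W[idx] = original + eps
        plus = f()
        W[idx] = original - eps
        minus = f()
        W[idx] = original            # restore the weight
        grad[idx] = (plus - minus) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
X_small = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y_small = np.array([[0.0], [1.0], [1.0], [0.0]])
W1 = rng.normal(size=(2, 3))
W2 = rng.normal(size=(3, 1))

g1, g2 = backprop_grads(W1, W2, X_small, y_small)
n1 = numeric_grad(lambda: loss(W1, W2, X_small, y_small), W1)
n2 = numeric_grad(lambda: loss(W1, W2, X_small, y_small), W2)
max_diff = max(np.abs(g1 - n1).max(), np.abs(g2 - n2).max())
print(max_diff)   # should be tiny if the chain-rule gradients are right
```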

In [7]:
nn = NeuralNetwork(2, 3, 1)

for i in range(1000):
    nn.train(X, y.reshape(-1, 1))

In [8]:
accuracy_score(y, nn.predict(X))

0.5
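
A hidden layer is what makes an XOR-like target representable at all. The sketch below uses hand-picked weights (assumed for illustration, not trained) and bias terms, which the `NeuralNetwork` class above does not have. One hidden unit approximates OR, the other approximates AND, and the output unit fires when OR is on and AND is off.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X_xor = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_xor = np.array([0, 1, 1, 0])

# Hand-picked, untrained weights (hypothetical values for illustration).
# Column 0 feeds the OR-like hidden unit, column 1 the AND-like hidden unit.
W_hidden = np.array([[20.0, 20.0],
                     [20.0, 20.0]])
b_hidden = np.array([-10.0, -30.0])
# Output is on when OR is on and AND is off, which is XOR.
W_out = np.array([[20.0], [-20.0]])
b_out = np.array([-10.0])

hidden = sigmoid(X_xor @ W_hidden + b_hidden)
output = sigmoid(hidden @ W_out + b_out)
xor_preds = np.round(output).ravel().astype(int)
print(xor_preds)   # [0 1 1 0]
```

Whether gradient descent actually finds weights like these depends on details such as the bias terms, the step size, and the initialization.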

P.S. Don't try chocolate gummy bears. They're disgusting.

## 3. Keras MLP <a id="Q3"></a>

Implement a Multilayer Perceptron architecture of your choosing using the Keras library. Train your model and report its baseline accuracy. Then hyperparameter tune at least two parameters and report your model's accuracy.

- Use the Heart Disease Dataset (binary classification).
- Use an appropriate loss function for a binary classification task.
- Use an appropriate activation function on the final layer of your network.
- Train your model using verbose output for ease of grading.
- Use `GridSearchCV` or `RandomizedSearchCV` to tune at least two hyperparameters.
- When hyperparameter tuning, show your work by adding code cells for each new experiment.
- Report the accuracy for each combination of hyperparameters as you test them so that we can easily see which resulted in the highest accuracy.
- You must tune at least three hyperparameters in order to get a 3 on this section.

In [10]:
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.read_csv('https://raw.githubusercontent.com/ryanleeallred/datasets/master/heart.csv')
df = df.sample(frac=1)
print(df.shape)
df.head()

(303, 14)


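|   | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 250 | 51 | 1 | 0 | 140 | 298 | 0 | 1 | 122 | 1 | 4.2 | 1 | 3 | 3 | 0 |
| 256 | 58 | 1 | 0 | 128 | 259 | 0 | 0 | 130 | 1 | 3.0 | 1 | 2 | 3 | 0 |
| 15 | 50 | 0 | 2 | 120 | 219 | 0 | 1 | 158 | 0 | 1.6 | 1 | 0 | 2 | 1 |
| 65 | 35 | 0 | 0 | 138 | 183 | 0 | 1 | 182 | 0 | 1.4 | 2 | 0 | 2 | 1 |
| 213 | 61 | 0 | 0 | 145 | 307 | 0 | 0 | 146 | 1 | 1.0 | 1 | 0 | 3 | 0 |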


In [19]:
from sklearn.model_selection import GridSearchCV
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier
from tensorflow.keras.optimizers import Adam
import random

In [15]:
X = df.drop(columns = 'target').to_numpy()
y = df['target'].to_numpy().reshape(-1, 1)

scaler = MinMaxScaler()
X = scaler.fit_transform(X)

def create_model(learning_rate=0.001, dropout_rate=0.2, layer1_size = 30, activation1='relu'):
    adam = Adam(learning_rate=learning_rate)
    
    model = Sequential()
    model.add(Dense(layer1_size, input_dim=13, activation=activation1))
    model.add(Dropout(dropout_rate))
    model.add(Dense(1, activation='sigmoid'))
    
    model.compile(loss='binary_crossentropy', optimizer=adam, metrics=['accuracy'])
    return model

model = KerasClassifier(build_fn=create_model, verbose=0)

In [16]:
baseline = create_model()
baseline.fit(X, y, epochs=100, validation_split=0.2, verbose=0)


In [17]:
# accuracy_score expects (y_true, y_pred)
print(f"Accuracy: {accuracy_score(y, np.round(baseline.predict(X)))}")

Accuracy: 0.8415841584158416
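
The accuracy above is computed on the same rows the model saw during training. A minimal sketch of a held-out evaluation follows, under loud assumptions: the data is a synthetic stand-in shaped like the heart dataset, and `LogisticRegression` stands in for the Keras model so the example runs without TensorFlow. The names `X_demo`, `y_demo`, and `clf` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in with the same shape as the heart data (303 rows, 13 features).
X_demo, y_demo = make_classification(n_samples=303, n_features=13, random_state=0)

X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=0
)

# Fit the scaler on the training rows only, then reuse it on the test rows.
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Stand-in estimator; a Keras model could be substituted here,
# with its predicted probabilities rounded to 0/1 labels.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train_scaled, y_train)

train_acc = accuracy_score(y_train, clf.predict(X_train_scaled))
test_acc = accuracy_score(y_test, clf.predict(X_test_scaled))
print(f"train accuracy: {train_acc:.3f}, held-out accuracy: {test_acc:.3f}")
```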


In [None]:
param_grid = {
    'layer1_size': [10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80],
    'activation1': ['relu', 'softmax', 'selu', 'tanh', 'sigmoid', 'hard_sigmoid', 'exponential', 'linear', 'elu'],
    'batch_size': [10, 20, 40, 60, 80, 100],
    'learning_rate': [0.000001, 0.00001, 0.0001, 0.001, 0.01, 0.1, 1, 10, 100],
}

grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=4, cv=5)
grid_result = grid.fit(X, y)

print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
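
To report the accuracy for every combination, not just the best one, the fitted search object exposes `cv_results_`. The sketch below shows the pattern on a deliberately small, hypothetical grid, with a synthetic dataset and `LogisticRegression` standing in for the Keras wrapper so that it runs quickly without TensorFlow.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumed, not the heart dataset).
X_demo, y_demo = make_classification(n_samples=200, n_features=13, random_state=0)

# Small hypothetical grid: 3 x 2 = 6 combinations.
demo_grid = {"C": [0.1, 1.0, 10.0], "fit_intercept": [True, False]}

search = GridSearchCV(
    estimator=LogisticRegression(max_iter=1000),
    param_grid=demo_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X_demo, y_demo)

# cv_results_ holds one entry per combination, in matching order.
results = list(zip(search.cv_results_["params"],
                   search.cv_results_["mean_test_score"]))
for params, score in results:
    print(f"{score:.3f}  {params}")
print(f"Best: {search.best_score_:.3f} using {search.best_params_}")
```

For scale, the grid in the cell above spans 15 × 9 × 6 × 9 = 7,290 combinations, which means 36,450 model fits with `cv=5`. A smaller grid, or `RandomizedSearchCV` with a fixed `n_iter`, may be more practical.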