<a href="https://colab.research.google.com/github/mkirby1995/DS-Unit-4-Sprint-2-Neural-Networks/blob/master/LS_DS_Unit_4_Sprint_Challenge_2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


## *Data Science Unit 4 Sprint 2*

# Sprint Challenge - Neural Network Foundations

Table of Problems

1. [Defining Neural Networks](#Q1)
2. [Perceptron on XOR Gates](#Q2)
3. [Multilayer Perceptron](#Q3)
4. [Keras MLP](#Q4)

<a id="Q1"></a>
## 1. Define the following terms:

- **Neuron:** Also called a node, a neuron is the basic building block of a neural network and represents one computational step in the network's processing of input data. A neuron multiplies each input value by a weight, sums the products, adds a bias term, and passes the result through an "activation function". The activated output is the neuron's final value.
- **Input Layer:** The input layer of a neural network interacts directly with the data. We feed data in the form of a feature matrix into the input layer, which in turn feeds it to the next layer in the network. Typically there is a one-to-one relationship between the number of features in the data and the number of nodes in the input layer.
- **Hidden Layer:** Layers between the input layer and the output layer are called hidden layers because we never interact with them directly: they receive values only from the previous layer and pass values only to the next one. The simplest possible hidden layer contains a single neuron.
- **Output Layer:** The final layer is called the output layer. Its purpose is to output a vector of values in a format suited to the problem we are trying to solve. Typically the output is passed through an "activation function" that transforms it into a format that makes sense for our context, for example a probability for binary classification.
- **Activation:** Each node in a neural network has an activation function, and the nodes in a given layer typically share the same one. Activation functions are the part of neural networks most directly inspired by biology, where they decide whether a cell "fires" (is "activated") or not. In artificial neural networks they decide how much signal to pass on to the next layer, which is why they are sometimes called transfer functions.
- **Backpropagation:** Short for backward propagation of errors. It is a calculus-intensive algorithm that uses the chain rule to work out how much each weight contributed to the error, moving from the output layer back toward the input layer, so that the weights can be updated after each forward pass (once per epoch when the whole dataset is fed through as a single batch).
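
The neuron definition above can be sketched in a few lines of plain Python. The input values, weights, and bias below are hypothetical and chosen only so the arithmetic is easy to follow; this is an illustration, not code from the challenge.

```python
import math

def sigmoid(z):
    """Logistic activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias, activation=sigmoid):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(weighted_sum)

# Hypothetical values: 1.0*0.5 + 0.0*(-0.25) + 1.0*0.25 - 0.75 = 0.0
output = neuron(inputs=[1.0, 0.0, 1.0], weights=[0.5, -0.25, 0.25], bias=-0.75)
print(output)  # sigmoid(0.0) = 0.5
```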


In [1]:
import pandas as pd
import numpy as np
np.random.seed(42)
import matplotlib.pyplot as plt
import seaborn as sns

import tensorflow
from tensorflow import keras 
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

from sklearn.preprocessing import normalize, StandardScaler
from sklearn.model_selection import GridSearchCV
from keras.wrappers.scikit_learn import KerasClassifier

Using TensorFlow backend.


## 2. Perceptron on XOR Gates <a id="Q2"></a>

Create a perceptron class that can model the behavior of an AND gate. You can use the following table as your training data:

| x1 | x2 | x3 | y |
|---|---|---|---|
| 1 | 1 | 1 | 1 |
| 1 | 0 | 1 | 0 |
| 0 | 1 | 1 | 0 |
| 0 | 0 | 1 | 0 |

In [2]:
x = np.array([[1, 1, 1],
              [1, 0, 1],
              [0, 1, 1],
              [0, 0, 1]])

y = np.array([1,0,0,0])

x, y

(array([[1, 1, 1],
        [1, 0, 1],
        [0, 1, 1],
        [0, 0, 1]]), array([1, 0, 0, 0]))

In [0]:
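class Perceptron(object):
  def __init__(self, rate = .01, n_iter = 10):
    self.rate = rate      # learning rate
    self.n_iter = n_iter  # number of passes over the training data
    
  def fit(self, X, y):
    # Initialize weights to zero; weight[0] is the bias
    self.weight = np.zeros(1 + X.shape[1])
    self.errors = []
    
    for i in range(self.n_iter):
      error = 0
      for xi, target in zip(X, y):
        # Perceptron rule: the update is nonzero only for a wrong prediction
        delta_w = self.rate * (target - self.predict(xi))
        self.weight[1:] += delta_w * xi
        self.weight[0] += delta_w
        error += int(delta_w != 0.0)
      # Number of misclassified samples in this pass
      self.errors.append(error)
    return self
  
  def net_input(self, X):
    # Weighted sum of the inputs plus the bias
    return np.dot(X, self.weight[1:]) + self.weight[0]
  
  def predict(self, X):
    # The targets are 0/1, so the step function must return 0/1 as well
    return np.where(self.net_input(X) >= 0.0, 1, 0)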

In [4]:
model = Perceptron(rate = .01, n_iter = 10000)

model = model.fit(x, y)

model.predict(x)

array([1, 1, 1, 1])
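
An all-ones prediction like the one recorded above is what a step function that returns -1 for the negative class produces against 0/1 targets: a target of 0 then never equals the prediction, so every 0-target sample triggers an update on every pass and the weights never settle. The following is a minimal pure-Python sketch of the same perceptron rule with a 0/1 step. The function names are hypothetical, and the rate of 1.0 is an assumption made to keep the arithmetic exact (with zero initial weights the rate only rescales the weights).

```python
def step(z):
    """Threshold activation that matches 0/1 targets."""
    return 1 if z >= 0.0 else 0

def predict_one(weights, bias, xi):
    """Step function applied to the weighted sum plus bias."""
    return step(sum(w * x for w, x in zip(weights, xi)) + bias)

def train_perceptron(samples, targets, rate=1.0, n_iter=20):
    """Perceptron learning rule starting from zero weights and bias."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(n_iter):
        for xi, target in zip(samples, targets):
            update = rate * (target - predict_one(weights, bias, xi))
            weights = [w + update * x for w, x in zip(weights, xi)]
            bias += update
    return weights, bias

# AND-gate training table from this section (x3 is a constant 1)
samples = [[1, 1, 1], [1, 0, 1], [0, 1, 1], [0, 0, 1]]
targets = [1, 0, 0, 0]

weights, bias = train_perceptron(samples, targets)
predictions = [predict_one(weights, bias, xi) for xi in samples]
print(predictions)  # [1, 0, 0, 0]
```

Because the AND gate is linearly separable, the perceptron rule stops updating after a finite number of mistakes; in this sketch that happens within the first eight passes.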

## 3. Multilayer Perceptron <a id="Q3"></a>

Implement a Neural Network Multilayer Perceptron class that uses backpropagation to update the network's weights.

- Your network must have one hidden layer.
- You do not have to update weights via gradient descent. You can use something like the derivative of the sigmoid function to update weights.
- Train your model on the Heart Disease dataset from UCI:
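
The hint about the derivative of the sigmoid rests on the identity σ'(z) = σ(z)(1 - σ(z)), which means the derivative can be computed from the activated value alone. The short check below compares that formula against a finite-difference estimate. The value of `z` and the step size `h` are arbitrary choices for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime_from_activation(a):
    """Derivative of the sigmoid, given a = sigmoid(z) rather than z itself."""
    return a * (1.0 - a)

z = 0.3   # arbitrary test point
h = 1e-6  # finite-difference step

analytic = sigmoid_prime_from_activation(sigmoid(z))
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)

print(analytic, numeric)  # the two values agree to many decimal places
```

Passing the raw weighted sum `z` into `a * (1 - a)` instead of the activation would give a different, incorrect value, so it matters which quantity is handed to the derivative helper.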



In [5]:
csv = 'https://raw.githubusercontent.com/ryanleeallred/datasets/master/heart.csv'

df = pd.read_csv(csv)

df.head()

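| | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 63 | 1 | 3 | 145 | 233 | 1 | 0 | 150 | 0 | 2.3 | 0 | 0 | 1 | 1 |
| 1 | 37 | 1 | 2 | 130 | 250 | 0 | 1 | 187 | 0 | 3.5 | 0 | 0 | 2 | 1 |
| 2 | 41 | 0 | 1 | 130 | 204 | 0 | 0 | 172 | 0 | 1.4 | 2 | 0 | 2 | 1 |
| 3 | 56 | 1 | 1 | 120 | 236 | 0 | 1 | 178 | 0 | 0.8 | 2 | 0 | 2 | 1 |
| 4 | 57 | 0 | 0 | 120 | 354 | 0 | 1 | 163 | 1 | 0.6 | 2 | 0 | 2 | 1 |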


In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 303 entries, 0 to 302
Data columns (total 14 columns):
age         303 non-null int64
sex         303 non-null int64
cp          303 non-null int64
trestbps    303 non-null int64
chol        303 non-null int64
fbs         303 non-null int64
restecg     303 non-null int64
thalach     303 non-null int64
exang       303 non-null int64
oldpeak     303 non-null float64
slope       303 non-null int64
ca          303 non-null int64
thal        303 non-null int64
target      303 non-null int64
dtypes: float64(1), int64(13)
memory usage: 33.2 KB


In [7]:
df['target'].value_counts(normalize = True)

1    0.544554
0    0.455446
Name: target, dtype: float64

In [8]:
x = df.drop(columns = ['target']).values

x[0]

array([ 63. ,   1. ,   3. , 145. , 233. ,   1. ,   0. , 150. ,   0. ,
         2.3,   0. ,   0. ,   1. ])

In [9]:
y = np.array(df['target']).reshape(-1,1)

y[:10]

array([[1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1]])

In [0]:
class NeuralNetwork: 
    def __init__(self, inputs, hidden_nodes, output_nodes):
        # Set up architecture
        self.inputs = inputs
        self.hiddenNodes = hidden_nodes
        self.outputNodes = output_nodes
        
        # Initial weights
        self.weights1 = np.random.randn(self.inputs, self.hiddenNodes)
        self.weights2 = np.random.rand(self.hiddenNodes, self.outputNodes)
    
    def sigmoid(self, s):
        return 1 / (1+np.exp(-s))
    
    def sigmoidPrime(self, s):
        # Derivative of the sigmoid, where s is an already-activated value sigmoid(z)
        return s * (1 - s)
    
    def feed_forward(self, X):
        # Weighted sum of inputs and hidden layer
        self.hidden_sum = np.dot(X, self.weights1)
        
        # Activations of weighted sum
        self.activated_hidden = self.sigmoid(self.hidden_sum)
        
        # Weighted sum between hidden and output
        self.output_sum = np.dot(self.activated_hidden, self.weights2)
        
        # Final activation of output
        self.activated_output = self.sigmoid(self.output_sum)
        
        return self.activated_output
    
    def backward(self, X, y, o):
        # Error and delta at the output layer
        self.o_error = y - o
        self.o_delta = self.o_error * self.sigmoidPrime(o)
        
        # Propagate the output delta back through weights2 to the hidden layer
        self.z2_error = self.o_delta.dot(self.weights2.T) 
        self.z2_delta = self.z2_error*self.sigmoidPrime(self.activated_hidden)
        
        # Update both weight matrices (implicit learning rate of 1)
        self.weights1 += X.T.dot(self.z2_delta) 
        self.weights2 += self.activated_hidden.T.dot(self.o_delta) 
        
    def train(self, X, y, epochs = 10000):
      for i in range(epochs):
        o = self.feed_forward(X)
        self.backward(X, y, o)
      # Mean squared error on the training data after the final epoch
      print("That shizz is trained. \n Loss =", str(np.mean(np.square(y - self.feed_forward(X)))))

In [11]:
model = NeuralNetwork(inputs = 13, hidden_nodes = 50, output_nodes = 1)

model.train(x, y)


That shizz is trained. 
 Loss = 0.5445544554455446
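
The loss printed above, 0.5445544554455446, matches the share of positive targets reported by `value_counts` (0.544554). That is the value mean squared error takes when every prediction is 0, which suggests the sigmoid units saturated. Raw features in the hundreds (such as `chol` and `trestbps`) combined with weight updates that have no learning rate are a plausible cause, though this was not verified here. The sketch below reproduces the arithmetic, shows how small the sigmoid gradient becomes once a unit saturates, and standardizes the five `chol` values from `df.head()`. The count of 165 positives is inferred from the 303 rows rather than read from the data, and standardization is offered as one possible remedy, not a confirmed fix.

```python
import math
import statistics

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Class counts implied by value_counts(normalize=True) on 303 rows (assumption)
n_rows = 303
n_positive = round(0.544554 * n_rows)
targets = [1] * n_positive + [0] * (n_rows - n_positive)

# MSE when the network predicts 0 for every row
all_zero_mse = sum((t - 0.0) ** 2 for t in targets) / n_rows
print(n_positive, all_zero_mse)

# A saturated unit has an almost-zero gradient, so learning stalls
a = sigmoid(-40.0)
saturated_gradient = a * (1.0 - a)
print(saturated_gradient)

# z-score standardization of the first five chol values
chol = [233, 250, 204, 236, 354]
mean = statistics.mean(chol)
std = statistics.pstdev(chol)
scaled = [(v - mean) / std for v in chol]
print(scaled)
```

With `sklearn` already imported in this notebook, `StandardScaler().fit_transform(x)` would apply the same transformation to every column before training.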


## 4. Keras MLP <a id="Q4"></a>

Implement a Multilayer Perceptron architecture of your choosing using the Keras library. Train your model and report its baseline accuracy. Then hyperparameter tune at least two parameters and report your model's accuracy.

- Use the Heart Disease Dataset (binary classification).
- Use an appropriate loss function for a binary classification task.
- Use an appropriate activation function on the final layer of your network.
- Train your model using verbose output for ease of grading.
- Use GridSearchCV to hyperparameter tune your model (for at least two hyperparameters).
- When hyperparameter tuning, show your work by adding code cells for each new experiment.
- Report the accuracy for each combination of hyperparameters as you test them so that we can easily see which resulted in the highest accuracy.
- You must hyperparameter tune at least 5 parameters in order to get a 3 on this section.
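
A grid search evaluates every combination of the listed values, and each combination is fitted once per cross-validation fold. The sketch below counts that work for grids shaped like the ones used in this section. `expand_grid` is a hypothetical helper written for illustration, not part of scikit-learn, and the joint grid is included only for comparison.

```python
from itertools import product

def expand_grid(param_grid):
    """List every combination of values in a {name: [values]} grid."""
    names = sorted(param_grid)
    value_lists = [param_grid[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]

cv_folds = 3

# One parameter varies while the other is held fixed
batch_grid = {'batch_size': [10, 20, 40, 60, 80, 100], 'epochs': [10]}
candidates = expand_grid(batch_grid)
fits = len(candidates) * cv_folds  # one fit per candidate per fold

# Searching both parameters jointly multiplies the number of candidates
joint_grid = {'batch_size': [10, 20, 40, 60, 80, 100],
              'epochs': [500, 600, 700, 800, 900, 1000, 1100]}
joint_candidates = expand_grid(joint_grid)

print(len(candidates), fits)   # 6 18
print(len(joint_candidates))   # 42
```

Tuning one parameter at a time keeps the number of fits small, at the cost of missing interactions between parameters: the best batch size at 10 epochs is not guaranteed to be the best at 600.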

In [0]:
no_of_features = len(x[0])

In [0]:
def create_model():
  model = Sequential()
  
  model.add(Dense(128, input_dim = no_of_features, activation = 'relu'))
  model.add(Dense(128, activation = 'relu'))
  model.add(Dense(1, activation = 'sigmoid'))
  
  model.compile(loss = 'binary_crossentropy', optimizer = 'adam', metrics = ['accuracy'])
  return model

model = KerasClassifier(build_fn = create_model, verbose = 1)
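
For a sense of scale, the number of trainable parameters in this architecture can be worked out by hand: each fully connected layer holds one weight per input-unit pair plus one bias per unit. The sketch below does the arithmetic for the 13 input features and the 128, 128, and 1 unit layers defined above. `dense_params` is a hypothetical helper, not a Keras function.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units

# 13 input features, two hidden layers of 128 units, one output unit
layer_sizes = [13, 128, 128, 1]
per_layer = [dense_params(n_in, n_out)
             for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

print(per_layer, total)  # [1792, 16512, 129] 18433
```

That is roughly 18,000 parameters fitted to 303 rows, so cross-validated scores on this dataset can be expected to vary noticeably from run to run.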


In [14]:
param_grid = {'batch_size': [10, 20, 40, 60, 80, 100],
              'epochs': [10]}

grid = GridSearchCV(estimator = model, param_grid = param_grid, n_jobs = 2, cv = 3)
grid_result = grid.fit(x, y)

results = pd.DataFrame(grid_result.cv_results_)

results = results.sort_values(by = ['mean_test_score'], ascending = False)


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [15]:
results[['param_batch_size', 'mean_test_score']].head(3)

| | param_batch_size | mean_test_score |
|---|---|---|
| 1 | 20 | 0.762376 |
| 4 | 80 | 0.617162 |
| 2 | 40 | 0.514851 |


In [0]:
best_batch_size = results['param_batch_size'].tolist()[0]

In [17]:
param_grid = {'batch_size': [best_batch_size],
              'epochs': [500, 600, 700, 800, 900, 1000, 1100]}

grid = GridSearchCV(estimator = model, param_grid = param_grid, n_jobs = 2, cv = 3)
grid_result = grid.fit(x, y)

results = pd.DataFrame(grid_result.cv_results_).sort_values(by = ['mean_test_score'], ascending = False)




In [18]:
results[['param_epochs', 'mean_test_score']].head(3)

| | param_epochs | mean_test_score |
|---|---|---|
| 1 | 600 | 0.69967 |
| 3 | 800 | 0.636964 |
| 5 | 1000 | 0.627063 |
