<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>
<br></br>

## *Data Science Unit 4 Sprint 2*

# Sprint Challenge - Neural Network Foundations

Table of Problems

1. [Defining Neural Networks](#Q1)
2. [Chocolate Gummy Bears](#Q2)
    - Perceptron
    - Multilayer Perceptron
4. [Keras MMP](#Q3)

<a id="Q1"></a>
## 1. Define the following terms:

- **Neuron:  A node in network consisting of a set of weights to be applied to inputs and optionally a transform (activation function) dictating output.
- **Input Layer:  The processed data layer - often an unweighted ingest of normalized/scaled data to the first hidden layer.
- **Hidden Layer:  A layer that lies between the input and output layers.
- **Output Layer:  A final set of nodes in the shape of desired output.  May use alternate activation functions to get explicity output ranges.
- **Activation:  Output transformation at the neuron level - f(x) where x = sum(weights*inputs). 
- **Backpropagation:  Propogation of errors from the output layer backward through the network calculating the partial dependence on weights at each layer/node.  The backbone of neural networks. 


## 2. Chocolate Gummy Bears <a id="Q2"></a>

Right now, you're probably thinking, "yuck, who the hell would eat that?". Great question. Your candy company wants to know too. And you thought I was kidding about the [Chocolate Gummy Bears](https://nuts.com/chocolatessweets/gummies/gummy-bears/milk-gummy-bears.html?utm_source=google&utm_medium=cpc&adpos=1o1&gclid=Cj0KCQjwrfvsBRD7ARIsAKuDvMOZrysDku3jGuWaDqf9TrV3x5JLXt1eqnVhN0KM6fMcbA1nod3h8AwaAvWwEALw_wcB). 

Let's assume that a candy company has gone out and collected information on the types of Halloween candy kids ate. Our candy company wants to predict the eating behavior of witches, warlocks, and ghosts -- aka costumed kids. They shared a sample dataset with us. Each row represents a piece of candy that a costumed child was presented with during "trick" or "treat". We know if the candy was `chocolate` (or not chocolate) or `gummy` (or not gummy). Your goal is to predict if the costumed kid `ate` the piece of candy. 

If both chocolate and gummy equal one, you've got a chocolate gummy bear on your hands!?!?!
![Chocolate Gummy Bear](https://ed910ae2d60f0d25bcb8-80550f96b5feb12604f4f720bfefb46d.ssl.cf1.rackcdn.com/3fb630c04435b7b5-2leZuM7_-zoom.jpg)

In [1]:
import pandas as pd
candy = pd.read_csv('chocolate_gummy_bears.csv')

In [2]:
candy.head()

Unnamed: 0,chocolate,gummy,ate
0,0,1,1
1,1,0,1
2,0,1,1
3,0,0,0
4,1,1,0


In [43]:
## Majority Class

candy.ate.mean(), candy.describe()

(0.5,           chocolate         gummy           ate
 count  10000.000000  10000.000000  10000.000000
 mean       0.499100      0.499300      0.500000
 std        0.500024      0.500025      0.500025
 min        0.000000      0.000000      0.000000
 25%        0.000000      0.000000      0.000000
 50%        0.000000      0.000000      0.500000
 75%        1.000000      1.000000      1.000000
 max        1.000000      1.000000      1.000000)

### Perceptron

To make predictions on the `candy` dataframe. Build and train a Perceptron using numpy. Your target column is `ate` and your features: `chocolate` and `gummy`. Do not do any feature engineering. :P

Once you've trained your model, report your accuracy. Explain why you could not achieve a higher accuracy with a *simple perceptron*. It's possible to achieve ~95% accuracy on this dataset.

In [31]:
# Start your candy perceptron here

X = candy[['chocolate', 'gummy']].values

y = candy['ate'].values.reshape(-1,1)

In [32]:
type(X), X.shape, y.shape

(numpy.ndarray, (10000, 2), (10000, 1))

In [90]:
# Custom implementation of flexible dense network with numpy
import pandas as pd
import numpy as np

def gen_random_matrix(shape):
    np.random.seed()
    return np.random.rand(shape[0], shape[1])

# A lot of thought needs to go setting up the nodes.  Creating their size programatically
#  makes a lot of assumptions about hidden layer size (could be random, arbitrary, etc.).
#  One option would be to add a 'auto' feature that created hidden layers of X.shape + C
#  weights. Another would be to create pass a distribution and have the layer be a generator 
#  of sorts whose parameters can be optimized.

example_network_description = (
    ('input', X),  # row 0 must be input
    ('hidden_1', (X.shape[1], 3), 'simple random'),  # hidden vectors must match input vec
    ('hidden_2', (3, 7), 'simple random'),
    ('output', (7,1), 'simple random'),  # final active row must be output vector. match last hidden vec
    ('target', y) # last row in description must be the target vector
)


class LayerFactory():
    def __init__(self):
        return
    
    def initialize_weights(self, shape, algorithm):
        """
        Lookup available weight formulas and generate pseudo-random numbers for initial weights
        of specified shape.
        
        """
        available_generators = {
            'simple random': gen_random_matrix,
        }
        
        return available_generators[algorithm](shape)
    
    def generate_layers(self, description):
        """
        Generate layers based on network description.
        
        ====Parameters====
        description: tuple or list object of layer descriptions ('name', shape)
        """
        layers = {}
        layers[0] = Layer()
        layers[0].activated_values = description[0][1]
        for count, row in enumerate(description):
            if row[0] == 'target':
                layers[count] = Layer()
                layers[count].activated_values = description[len(description)-1][1]
                
            elif row[0] != 'input':
                layers[count] = Layer()
                layers[count].weights = self.initialize_weights(shape=row[1], algorithm=row[2])
                layers[count].weighted_sum = 0
                layers[count].activated_values = 0
            
        
        return layers
    
    
class Layer():
    pass
    

class NeuralNetwork(LayerFactory):
    def __init__(self, description):
        # Set up Architecture of Neural Network
        self.description = description
        self.layers = self.generate_layers(description)
        
    def sigmoid(self, weighted_sum):
        return 1 / (1+np.exp(-weighted_sum))
    
    def sigmoidPrime(self, weighted_sum):
        return weighted_sum * (1 - weighted_sum)
    
    def feed_forward(self, X):
        """
        Calculate the NN inference using feed forward.
        aka "predict"
        """
        for i in range(1, len(self.layers)-1):
            # Weighted sum of inputs
            #  Check if first layer (required to use feed_forward method as Predict)
            if i == 1:
                self.layers[i].weighted_sum = np.dot(X, self.layers[i].weights)
                # Activated values (local outputs)
                self.layers[i].activated_values = self.sigmoid(self.layers[i].weighted_sum)
            else:
                self.layers[i].weighted_sum = np.dot(self.layers[i-1].activated_values, self.layers[i].weights)
                # Activated values (local outputs)
                self.layers[i].activated_values = self.sigmoid(self.layers[i].weighted_sum)

        return self.layers[len(self.layers)-2].activated_values
        
    def backward(self, X, y, net_output, learning_rate):
        """
        Backward propagate through the network
        """
        # Step 1: Calculate errors and delta shifts for each layer (backward)
        back_prop_pos = 0
        for i in range(len(self.layers)-2, 0, -1):
            # Error in local output
            #   Check if first backprop
            if back_prop_pos == 0:
                self.layers[i].error = y - net_output
                # Apply Derivative of Sigmoid to error
                self.layers[i].delta = self.layers[i].error * self.sigmoidPrime(net_output) * learning_rate
            else:
                self.layers[i].error = self.layers[i+1].delta.dot(self.layers[i+1].weights.T)
                # Apply Derivative of Sigmoid to error
                self.layers[i].delta = self.layers[i].error * self.sigmoidPrime(
                    self.layers[i].activated_values)
                
            back_prop_pos += 1
            
        # Step 2: Calculate adjustments and apply to each layer (forward)
        for i in range(1, len(self.layers)-1):
            self.layers[i].weights += self.layers[i-1].activated_values.T.dot(self.layers[i].delta)
        
    def train(self, X, y, learning_rate):
        net_output = self.feed_forward(X)
        self.backward(X, y, net_output, learning_rate)

In [49]:
# Describe single perceptron network
single_layer =(
    ('input', X),  # row 0 must be input
    ('output', (X.shape[1], 1), 'simple random'),  # shape of first layer of (input_dim, #nodes)
    ('target', y) # last row in description must be the target vector
)

nn_singlet = NeuralNetwork(single_layer)

# Number of Epochs / Iterations
for i in range(10000):
    if (i+1 in [1, 5]) or ((i+1) % 1000 ==0):
        print('+' + '---' * 3 + f'EPOCH {i+1}' + '---'*3 + '+')
        print('Weights: \n', nn_singlet.layers[1].weights)
        print("Loss: \n", str(np.mean(np.square(y - nn_singlet.feed_forward(X)))))
    nn_singlet.train(X, y, 0.1)

+---------EPOCH 1---------+
Weights: 
 [[0.47432446]
 [0.16792762]]
Loss: 
 0.2590575177989399
+---------EPOCH 5---------+
Weights: 
 [[-13.89552077]
 [ -8.49463178]]
Loss: 
 0.5487530406363103
+---------EPOCH 1000---------+
Weights: 
 [[-19.54241453]
 [  2.8912209 ]]
Loss: 
 0.32526107670787996
+---------EPOCH 2000---------+
Weights: 
 [[-19.54082341]
 [  2.89122091]]
Loss: 
 0.32526107670281657
+---------EPOCH 3000---------+
Weights: 
 [[-19.53922975]
 [  2.89122091]]
Loss: 
 0.3252610766977371
+---------EPOCH 4000---------+
Weights: 
 [[-19.53763355]
 [  2.89122091]]
Loss: 
 0.32526107669264137
+---------EPOCH 5000---------+
Weights: 
 [[-19.5360348 ]
 [  2.89122091]]
Loss: 
 0.3252610766875294
+---------EPOCH 6000---------+
Weights: 
 [[-19.53443349]
 [  2.89122091]]
Loss: 
 0.325261076682401
+---------EPOCH 7000---------+
Weights: 
 [[-19.53282961]
 [  2.89122092]]
Loss: 
 0.3252610766772561
+---------EPOCH 8000---------+
Weights: 
 [[-19.53122315]
 [  2.89122092]]
Loss: 
 0.32526

In [52]:
# Check actual accuracy
from sklearn.metrics import accuracy_score

accuracy_score(nn_singlet.feed_forward(X).round(), y)


0.7229

### No Better Than Majority

**A single Perceptron** in with only two weights and a sigmoid activation function is highly limited.  Translating these values back to binary via simple round can vary error significantly.  Simply adding more nodes to allow for cooperative interpretation of weights might be helpful, but another layer is required to transform the output into single probability.  

### Multilayer Perceptron <a id="Q3"></a>

Using the sample candy dataset, implement a Neural Network Multilayer Perceptron class that uses backpropagation to update the network's weights. Your Multilayer Perceptron should be implemented in Numpy. 
Your network must have one hidden layer.

Once you've trained your model, report your accuracy. Explain why your MLP's performance is considerably better than your simple perceptron's on the candy dataset. 

In [95]:
# Build a simple multi layer network with one node/input in the first hidden layer and an output layer to convert back to single probability of ate/not ate
multi_layer =(
    ('input', X),  # row 0 must be input
    ('hidden_1', (X.shape[1], 2), 'simple random'),  # shape of first layer of (input_dim, #nodes)
    ('hidden_2', (2, 2), 'simple random'),
    ('output', (2, 1), 'simple random'),
    ('target', y) # last row in description must be the target vector
)

nn_multi = NeuralNetwork(multi_layer)

# Number of Epochs / Iterations
for i in range(10000):
    if (i+1 in [1, 5]) or ((i+1) % 1000 ==0):
        print('+' + '---' * 3 + f'EPOCH {i+1}' + '---'*3 + '+')
#         print('Weights: \n', nn_multi.layers[1].weights)
        print("Loss: \n", str(np.mean(np.square(y - nn_multi.feed_forward(X)))))
    nn_multi.train(X, y, 0.01)

+---------EPOCH 1---------+
Loss: 
 0.262119387525705
+---------EPOCH 5---------+
Loss: 
 0.2537678148345916
+---------EPOCH 1000---------+
Loss: 
 0.11284325264785669
+---------EPOCH 2000---------+
Loss: 
 0.11016799791348786
+---------EPOCH 3000---------+
Loss: 
 0.10124714221423707
+---------EPOCH 4000---------+
Loss: 
 0.1011770248536761
+---------EPOCH 5000---------+
Loss: 
 0.10115736297390696
+---------EPOCH 6000---------+
Loss: 
 0.10114816527332042
+---------EPOCH 7000---------+
Loss: 
 0.10114284685639229
+---------EPOCH 8000---------+
Loss: 
 0.10113938474650397
+---------EPOCH 9000---------+
Loss: 
 0.10113695284348583
+---------EPOCH 10000---------+
Loss: 
 0.10113515142281249


In [102]:
compat = pd.DataFrame.from_dict({
    'y_pred': nn_multi.feed_forward(X).reshape(1,-1)[0],
    'y_test': y.reshape(1,-1)[0],
})
compat.head()

Unnamed: 0,y_pred,y_test
0,0.499946,1
1,0.94768,1
2,0.499946,1
3,0.056076,0
4,0.056416,0


In [103]:
# Check actual accuracy
from sklearn.metrics import accuracy_score

accuracy_score(nn_multi.feed_forward(X).round(), y)


0.7229

In [104]:
# We can give a little more confidence manually with a stop function at ~50%

def give_confidence(x):
    if x > 0.45:
        return 1
    return 0

compat.y_pred = compat.y_pred.apply(give_confidence)
accuracy_score(compat.y_pred, compat.y_test)

0.9458

**Boom!** ~95%.  For some reason, the network isn't able to push values further away from 0.5 for true positives.  Maybe more layers, bias optimization, or other techniques could help with that.  With a little confidence, this spunky network is doing ok!

P.S. Don't try candy gummy bears. They're disgusting. 

## 3. Keras MMP <a id="Q3"></a>

Implement a Multilayer Perceptron architecture of your choosing using the Keras library. Train your model and report its baseline accuracy. Then hyperparameter tune at least two parameters and report your model's accuracy.
Use the Heart Disease Dataset (binary classification)
Use an appropriate loss function for a binary classification task
Use an appropriate activation function on the final layer of your network.
Train your model using verbose output for ease of grading.
Use GridSearchCV or RandomSearchCV to hyperparameter tune your model. (for at least two hyperparameters)
When hyperparameter tuning, show you work by adding code cells for each new experiment.
Report the accuracy for each combination of hyperparameters as you test them so that we can easily see which resulted in the highest accuracy.
You must hyperparameter tune at least 3 parameters in order to get a 3 on this section.

In [6]:
import pandas as pd
from sklearn.preprocessing import StandardScaler
import numpy as np

df = pd.read_csv('https://raw.githubusercontent.com/ryanleeallred/datasets/master/heart.csv')
df = df.sample(frac=1)
print(df.shape)
df.head()

(303, 14)


Unnamed: 0,age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
171,48,1,1,110,229,0,1,168,0,1.0,0,0,3,0
254,59,1,3,160,273,0,0,125,0,0.0,2,0,2,0
64,58,1,2,140,211,1,0,165,0,0.0,2,0,2,1
126,47,1,0,112,204,0,1,143,0,0.1,2,0,2,1
6,56,0,1,140,294,0,0,153,0,1.3,1,0,2,1


### Preprocessing

Some of these variables are continuous, others categorical and already encoded.

Summary:
* Categorical: sex, cp, fbs, restecg, exang, slope, ca, thal
* Continuous: everything else


In [7]:
X = df.drop(columns='target')
y = np.array(df.target).reshape(-1,1)

In [8]:
X.shape

(303, 13)

In [9]:
from sklearn.preprocessing import MinMaxScaler

categorical_vars = ['sex', 'cp', 'fbs', 'restecg', 'exang', 'slope', 'ca', 'thal']
continuous_vars = list(set(X.columns) - set(categorical_vars))

scaler = MinMaxScaler()

X_transformed = X.copy()
X_transformed[continuous_vars] = scaler.fit_transform(X_transformed[continuous_vars]) 

X_transformed.head()

Unnamed: 0,age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal
171,0.395833,1,1,0.150943,0.23516,0,1,0.740458,0,0.16129,0,0,3
254,0.625,1,3,0.622642,0.335616,0,0,0.412214,0,0.0,2,0,2
64,0.604167,1,2,0.433962,0.194064,1,0,0.717557,0,0.0,2,0,2
126,0.375,1,0,0.169811,0.178082,0,1,0.549618,0,0.016129,2,0,2
6,0.5625,0,1,0.433962,0.383562,0,0,0.625954,0,0.209677,1,0,2


In [10]:
# Convert X_transformed to np.array
X_transformed = np.array(X_transformed)

### Define Network

In [44]:
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Dropout


# Initialize WANDB
import wandb
from wandb.keras import WandbCallback
wandb.init(project="lambda-ds-424")


W&B Run: https://app.wandb.ai/vincent-a-brandon/lambda-ds-424/runs/onyatt3i

In [45]:
from sklearn.model_selection import RandomizedSearchCV

# Static Parameters
inputs = X_transformed.shape[1]
wandb.config.epochs = 100

# Define model
model = Sequential()
model.add(Dense(13, input_shape=(inputs,)))
model.add(Dense(26, activation='sigmoid'))
model.add(Dense(13, activation='relu'))
model.add(Dense(1))
# Compile Model
def mean_pred(y_true, y_pred):
    return keras.backend.mean(y_pred)

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy', mean_pred])

# Fit Model

model.fit(X_transformed, y, 
          epochs=wandb.config.epochs, 
          callbacks=[WandbCallback()],
          verbose=1,
         )

Train on 303 samples
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100
Epoch 65/100
Epoch 66/100
Epoch 67/100
Epoch 68/100
Epoch 69/100
Epoch 70/100
Epoch 71/100
Epoch 72/100
Epoch 73/100
Epoch 74/100
Epoch 75/100
Epoch 76/100


<tensorflow.python.keras.callbacks.History at 0x7f47345bc748>

[]

In [14]:
# Implementing randomsearchcv
def build_keras_base(hidden_layers = [13, 26, 26], dropout_rate = 0,
                     n_input = 13, n_class = 2):

    model = Sequential()   
    for index, layers in enumerate(hidden_layers):       
        if not index:
            # specify the input_dim to be the number of features for the first layer
            model.add(Dense(layers, input_dim=n_input, activation='sigmoid'))
        else:
            model.add(Dense(layers, activation='sigmoid'))
    
    model.add(Dense(n_class, activation='softmax'))
    
    # the loss for binary and muti-class classification is different 
    loss = 'binary_crossentropy'
    if n_class > 2:
        loss = 'categorical_crossentropy'
    
    model.compile(loss = 'binary_crossentropy', optimizer = 'adam', metrics = ['accuracy'])   
    return model

In [29]:

# pass in fixed parameters n_input and n_class
model_keras = keras.wrappers.scikit_learn.KerasClassifier(
    build_fn = build_keras_base,
    n_input = X_transformed.shape[1],
    n_class = 1,
    verbose = 0,
)

# random search's parameter:
# specify the options and store them inside the dictionary
hidden_layers_opts = [[13, 13, 13, 13], [32, 32, 32, 32, 32], [13, 26, 13], [26, 13, 19]]

some_dict = {'epochs': 10}

keras_param_options = {
    'hidden_layers': hidden_layers_opts,
    'epochs': wandb.config.epochs,
}

In [30]:
# code courtesy of: https://ethen8181.github.io/machine-learning/keras/nn_keras_hyperparameter_tuning.html
rs_model = RandomizedSearchCV(
    model_keras,
    param_distributions=keras_param_options,
    n_iter=3,
    cv=3,
    n_jobs=1,
    verbose=20
)

rs_model.fit(X=X_transformed, y=y)
#           epochs=wandb.config.epochs, 
#           callbacks=[WandbCallback()],)

AttributeError: 'RandomizedSearchCV' object has no attribute 'best_score'

In [31]:
print('Best score: {0}'.format(rs_model.best_score_))
print('parameters:')
for param, value in rs_model.best_params_.items():
    print('\t{}: {}'.format(param, value))

Best score: 0.5445544719696045
parameters:
	hidden_layers: [13, 26, 13]
	epochs: 50
