# Lesson 9 Assignment - Wine Neural Network

## Author - Kenji Oman

## Instructions
For this assignment you will start from the perceptron neural network notebook (Simple Perceptron Neural Network.ipynb) and modify the Python code to make it into a multi-layer neural network. To test your system, use the RedWhiteWine.csv file with the goal of building a red or white wine classifier. Use all the features in the dataset, allowing the network to decide how to build the internal weighting system.

## Tasks
1. Use the provided RedWhiteWine.csv file. Include ALL the features with “Class” being your output vector
2. Use the provided Simple Perceptron Neural Network notebook (copied below) to develop a multi-layer feed-forward/backpropagation neural network
4. Be able to adjust the following between experiments:
<ul>
<li>Learning Rate
<li>Number of epochs
<li>Depth of architecture—number of hidden layers between the input and output layers
<li>Number of nodes in a hidden layer—width of the hidden layers
<li>(optional) Momentum
    </ul>
5. Determine which neural network structure and hyperparameter settings result in the best predictive capability

# Import data and do a train/validate/test split

In [1]:
# Data Set
#URL = "https://library.startlearninglabs.uw.edu/DATASCI420/Datasets/RedWhiteWine.csv"
URL = "RedWhiteWine.csv"

In [2]:
# Imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

from sklearn.model_selection import train_test_split

In [3]:
# Import data, and take a look
df = pd.read_csv(URL)
df.head()

Unnamed: 0,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality,Class
0,7.4,0.7,0.0,1.9,0.076,11.0,34.0,0.9978,3.51,0.56,9.4,5,1
1,7.8,0.88,0.0,2.6,0.098,25.0,67.0,0.9968,3.2,0.68,9.8,5,1
2,7.8,0.76,0.04,2.3,0.092,15.0,54.0,0.997,3.26,0.65,9.8,5,1
3,11.2,0.28,0.56,1.9,0.075,17.0,60.0,0.998,3.16,0.58,9.8,6,1
4,7.4,0.7,0.0,1.9,0.076,11.0,34.0,0.9978,3.51,0.56,9.4,5,1


In [4]:
# Now, split into 3
temp_df, test_df = train_test_split(df, test_size=0.15, stratify=df.Class, random_state=0)
train_df, validate_df = train_test_split(temp_df, test_size=0.2, stratify=temp_df.Class, random_state=1)
train_df.shape, validate_df.shape, test_df.shape

((4417, 13), (1105, 13), (975, 13))
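
The two calls to `train_test_split` nest their fractions: 15% of all rows go to the test set, then 20% of the remaining 85% go to validation, which leaves roughly 68% / 17% / 15% overall. A small check of that arithmetic, assuming the test-set count is rounded up (which matches the shapes reported above):

```python
import math

n_total = 4417 + 1105 + 975           # rows reported in the three shapes above

n_test = math.ceil(n_total * 0.15)    # first split: test_size=0.15
n_temp = n_total - n_test
n_validate = math.ceil(n_temp * 0.2)  # second split: test_size=0.2 of the remainder
n_train = n_temp - n_validate

print(n_train, n_validate, n_test)    # 4417 1105 975
print(round(n_train / n_total, 2), round(n_validate / n_total, 2), round(n_test / n_total, 2))
```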

The data looks like it is in shape for running a perceptron model. Let's now define our perceptron class.

# Perceptron Class

In [5]:
class PerceptronNN(object):
    def __init__(self, learning_rate=0.01, epochs=100, depth=0, hidden_width=12):
        """
        learning_rate {float} = the learning rate, default=0.01
        epochs {int} = the number of training epochs, default=100
        depth {int} = the number of hidden layers, default=0
        hidden_width {int} = number of nodes within a hidden layer, default=12
        """
        
        # Set the internal parameters
        self.learning_rate = learning_rate
        self.epochs = epochs
        self.depth = depth
        self.hidden_width = hidden_width
        
    # Define the sigmoid function
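    def sigmoid(self, x):
        # Clip so np.exp cannot overflow. After clipping, the direct formula
        # is safe for both signs. (The original `x.any()>=0` test was always
        # True, so this is the branch that was actually being used.)
        x = np.clip(x, -500, 500)
        return 1/(1 + np.exp(-x))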
        
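    # Initialize parameters for the network
    # NOTE: the callers in run_grad_desc pass True as the third positional
    # argument, which binds to `std` (True == 1), so the random weights are
    # drawn from [0, 1) without the 0.1 scaling.
    def init_parameters(self, n_layers, n_obs, std=1e-1, random=True):
        if random:
            return np.random.random([n_layers, n_obs])*std
        else:
            return np.zeros([n_layers, n_obs])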
        
    # Define forward propagation for one layer
    def one_fwd_prop(self, W1, bias, X):
        Z1 = np.dot(W1,X) + bias # dot product of the weights and X + bias
        A1 = self.sigmoid(Z1)  # Uses sigmoid to create a predicted vector

        return(A1)
    
    def full_fwd_prop(self, W_l, B_l, X):
        """
        Function to run through all the forward propagation steps for a given epoch.
        
        Args:
        W_l = list of weight vectors
        B_l = list of bias vectors
        X = input vector
        """
        # Set the list of activations, using the input vector as the first
        # input to forward prop
        A_l = [self.one_fwd_prop(W_l[0], B_l[0], X)]

        # And for each of the hidden layers
        for layer in range(self.depth):
            # Calculate its activation
            A_l.append(self.one_fwd_prop(W_l[layer+1], B_l[layer+1], A_l[-1]))
            
        return A_l
    
    # And, back propagation for one layer
    def one_back_prop(self, A1, W1, bias, X, Y):
        m = np.shape(X)[1] # number of observations; used to average the cost and gradients

        # Cross entropy loss function
        cost = (-1/m)*np.sum(Y*np.log(A1) + (1-Y)*np.log(1-A1)) # mean cross-entropy cost
        dZ1 = A1 - Y                                            # prediction error (predicted minus target)
        dW1 = (1/m) * np.dot(dZ1, X.T)                          # gradient of the cost w.r.t. the weights
        dBias = (1/m) * np.sum(dZ1, axis = 1, keepdims = True)  # gradient of the cost w.r.t. the bias

        grads ={"dW1": dW1, "dB1":dBias} # Weight and bias gradients from backprop

        return(grads,cost)
    
    def full_back_prop(self, A_l, W_l, B_l, X, Y):
        """Run through all the back propagation steps, calculating
        the new set of weights/ biases
        
        Args:
        A_l = list of activation layers
        W_l = list of weights
        B_l = list of biases
        X = input matrix
        Y = prediction target
        """
        
        # First, do for the final layer
        # If we have hidden layers
        if self.depth != 0:
            grads, cost = self.one_back_prop(A_l[-1], W_l[-1], B_l[-1], A_l[-2], Y)
            W_l[-1] -= self.learning_rate*grads["dW1"]    # update weight vector LR*gradient*[BP weights]
            B_l[-1] -= self.learning_rate*grads["dB1"]    # update bias LR*gradient[BP bias]

            # Now, do for remaining layers
            for layer in range(self.depth -2, 0, -1):
                grads, temp_cost = self.one_back_prop(A_l[layer], W_l[layer], B_l[layer], A_l[layer - 1], A_l[layer])
                W_l[layer] -= self.learning_rate*grads["dW1"]    # update weight vector LR*gradient*[BP weights]
                B_l[layer] -= self.learning_rate*grads["dB1"]    # update bias LR*gradient[BP bias]

            # And, do for the very first layer
            grads, temp_cost = self.one_back_prop(A_l[0], W_l[0], B_l[0], X, A_l[1])
        else:
            # If we don't have any hidden layers
            grads, cost = self.one_back_prop(A_l[-1], W_l[-1], B_l[-1], X, Y)
        W_l[0] -= self.learning_rate*grads["dW1"]    # update weight vector LR*gradient*[BP weights]
        B_l[0] -= self.learning_rate*grads["dB1"]    # update bias LR*gradient[BP bias]
        
        return W_l, B_l, cost
        
    
    # And finally, gradient descent to run it all
    def run_grad_desc(self, X, Y):
        
        # To make reproducible, set random seed
        np.random.seed(12345)
        
        # Transpose the X/ Y to make them consistent with the rest of the example
        # code.
        X = X.T
        Y = Y.T
        
        # Grab the dimensionality of the X vector (transposed, so
        # we have (num_features, num_observations))
        n_features, m_obs = np.shape(X)

        # Initialize weights for each initial layer
        # If we don't have any hidden layers
        if self.depth == 0:
            W1 = self.init_parameters(1, n_features, True)
            B1 = self.init_parameters(1, 1, True)
            W_h = []
            B_h = []
        else:
            # Otherwise, we need to initialize going to the hidden layers
            W1 = self.init_parameters(self.hidden_width, n_features, True)
            B1 = self.init_parameters(self.hidden_width, 1, True)
        
            # And the parameters for each hidden layer
            W_h = [self.init_parameters(self.hidden_width, self.hidden_width, True) for i in range(0, self.depth - 1)]
            B_h = [self.init_parameters(self.hidden_width, 1, True) for i in range(0, self.depth - 1)]
            
            # And, exiting the hidden layer to make the prediction
            W_h.append(self.init_parameters(1, self.hidden_width, True))
            B_h.append(self.init_parameters(1, 1, True))
        
        # Combine these two into one list to make it easier to manage
        W_l = [W1] + W_h
        B_l = [B1] + B_h

        loss_array = np.ones([self.epochs])*np.nan # resets the loss_array to NaNs

        for i in np.arange(self.epochs):
            
            # Go through all the forward propagations
            A_l = self.full_fwd_prop(W_l, B_l, X)
            
            # Now that we calculated the activation layers, let's do
            # back propagation
            W_l, B_l, cost = self.full_back_prop(A_l, W_l, B_l, X, Y)

            # also, store the loss we had from this epoch
            loss_array[i] = cost                    # loss array gets cross ent values

        # set the parameters (weights/ biases we calculated)
        parameter = {"W_l": W_l, "B_l": B_l}

        return(parameter,loss_array)
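
In `full_back_prop` above, the hidden layers are updated by passing another activation to `one_back_prop` as the target `Y`. For comparison, a conventional backward pass carries the output error back through the weights with the chain rule. The following is a minimal sketch of that approach for sigmoid layers with cross-entropy loss. It is not the notebook's method: `forward`, `backprop_step`, and `loss` are hypothetical standalone helpers that reuse the `W_l`/`B_l` list layout, and the data is synthetic.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))

def forward(W_l, B_l, X):
    """Return the activations of every layer; X has shape (n_features, m)."""
    A_l, A = [], X
    for W, b in zip(W_l, B_l):
        A = sigmoid(W @ A + b)
        A_l.append(A)
    return A_l

def backprop_step(W_l, B_l, X, Y, learning_rate=0.1):
    """One gradient-descent step using the chain rule through all layers."""
    m = X.shape[1]
    A_l = forward(W_l, B_l, X)
    inputs = [X] + A_l[:-1]            # input seen by each layer
    dZ = A_l[-1] - Y                   # output error for sigmoid + cross-entropy
    grads = []
    for layer in range(len(W_l) - 1, -1, -1):
        dW = dZ @ inputs[layer].T / m
        dB = dZ.sum(axis=1, keepdims=True) / m
        grads.append((layer, dW, dB))
        if layer > 0:
            # Push the error back through the weights, then through the
            # sigmoid derivative A * (1 - A) of the previous layer.
            A_prev = A_l[layer - 1]
            dZ = (W_l[layer].T @ dZ) * A_prev * (1 - A_prev)
    # Apply the updates only after every gradient has been computed.
    for layer, dW, dB in grads:
        W_l[layer] = W_l[layer] - learning_rate * dW
        B_l[layer] = B_l[layer] - learning_rate * dB
    return W_l, B_l

# Toy usage on synthetic data (not the wine data)
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 200))
Y = (X[0:1] + X[1:2] > 0).astype(float)      # shape (1, 200)
sizes = [3, 4, 1]                            # 3 inputs, one hidden layer of 4, 1 output
W_l = [rng.normal(scale=0.5, size=(sizes[i + 1], sizes[i])) for i in range(2)]
B_l = [np.zeros((sizes[i + 1], 1)) for i in range(2)]

def loss(W_l, B_l):
    A = np.clip(forward(W_l, B_l, X)[-1], 1e-12, 1 - 1e-12)
    return float(-np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A)))

loss_before = loss(W_l, B_l)
for _ in range(500):
    W_l, B_l = backprop_step(W_l, B_l, X, Y, learning_rate=0.5)
loss_after = loss(W_l, B_l)
print(loss_before, loss_after)
```

The key difference is that only the output layer is compared with `Y`; every earlier layer receives its error signal from the layer after it.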

## Try running the model once

In [6]:
# Define the model
pp_nn = PerceptronNN(depth=2, hidden_width=1)

# Run it
params, loss = pp_nn.run_grad_desc(train_df.drop(columns='Class').values, train_df.Class.values)

# And test it
temp = pp_nn.full_fwd_prop(params['W_l'], params['B_l'], validate_df.drop(columns='Class').T)
((pd.Series(temp[-1][0]) > 0.5).astype(int) == validate_df.Class.reset_index(drop=True)).sum() / validate_df.Class.shape[0]



0.24615384615384617

So, with just a quick run (two hidden layers of width 1), we got a validation accuracy of only 24.6%. Now, let's try searching for a better set of hyperparameters.
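
An accuracy this far below 50% on a two-class problem is the kind of score a model gets when it predicts the same class for every row, so it is worth comparing against that baseline. The sketch below uses a hypothetical helper, `accuracy` (not part of the original notebook), that makes the 0.5 threshold explicit, with made-up toy labels:

```python
import numpy as np

def accuracy(probs, labels, threshold=0.5):
    """Fraction of rows whose thresholded probability matches the label."""
    preds = (np.asarray(probs).ravel() > threshold).astype(int)
    return float(np.mean(preds == np.asarray(labels).ravel()))

# Toy labels (made up for illustration): 1 positive out of 4 rows
labels = np.array([1, 0, 0, 0])

# A network whose output is always above 0.5 predicts class 1 everywhere,
# so its accuracy equals the share of class-1 rows.
always_one = np.full((1, 4), 0.9)
print(accuracy(always_one, labels))                # 0.25
print(accuracy([[0.8, 0.1, 0.3, 0.2]], labels))    # 1.0
```

With the notebook's objects, the same helper could be called as `accuracy(temp[-1], validate_df.Class.values)`.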

# Gridsearch through hyperparameters

In [7]:
# Initialize a list of sets of parameters we want to try
input_params = []
# Now, set up the kinds of parameter values we want to try
for learn in [0.001, 0.01, 0.1]:
    for epoch in [100, 500]:#, 1_000]:
        for depth in [0, 1, 2, 3]:#, 4, 5]:
            if depth != 0:
                for width in [1, 3, 6, 9, 12, 15]:
                    input_params.append({
                        'learning_rate': learn,
                        'epochs': epoch,
                        'depth': depth,
                        'hidden_width': width
                    })
            else:
                input_params.append({
                    'learning_rate': learn,
                    'epochs': epoch,
                    'depth': depth
                })
input_params[:5]

[{'learning_rate': 0.001, 'epochs': 100, 'depth': 0},
 {'learning_rate': 0.001, 'epochs': 100, 'depth': 1, 'hidden_width': 1},
 {'learning_rate': 0.001, 'epochs': 100, 'depth': 1, 'hidden_width': 3},
 {'learning_rate': 0.001, 'epochs': 100, 'depth': 1, 'hidden_width': 6},
 {'learning_rate': 0.001, 'epochs': 100, 'depth': 1, 'hidden_width': 9}]
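
The nested loops above produce one entry per (learning rate, epochs) pair when `depth` is 0, and six entries (one per width) for each non-zero depth. An equivalent sketch using `itertools.product`, with `build_grid` as a hypothetical helper name, makes the grid size easy to check:

```python
from itertools import product

def build_grid(learning_rates, epochs_list, depths, widths):
    """Build the list of hyperparameter dicts; width is omitted when depth is 0."""
    grid = []
    for learn, epoch, depth in product(learning_rates, epochs_list, depths):
        base = {'learning_rate': learn, 'epochs': epoch, 'depth': depth}
        if depth == 0:
            grid.append(base)
        else:
            grid.extend({**base, 'hidden_width': width} for width in widths)
    return grid

grid = build_grid([0.001, 0.01, 0.1], [100, 500], [0, 1, 2, 3],
                  [1, 3, 6, 9, 12, 15])
print(len(grid))   # 3 * 2 * (1 + 3 * 6) = 114
print(grid[:2])
```

`product` iterates its last argument fastest, so the ordering matches the nested loops.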

In [8]:
# Now, run our gridsearch, and store the accuracy values
results = {'hyperparameters': [], 'weights': [], 'loss': [], 'validation_accuracy': []}
for settings in input_params:
    # Store the hyperparameters
    results['hyperparameters'].append(settings)
    
    # Define our model
    pp_nn = PerceptronNN(**settings)

    # Run it
    weights, loss = pp_nn.run_grad_desc(train_df.drop(columns='Class').values, train_df.Class.values)
    
    # Store the parameters and loss
    results['weights'].append(weights)
    results['loss'].append(loss)

    # And test it
    temp = pp_nn.full_fwd_prop(weights['W_l'], weights['B_l'], validate_df.drop(columns='Class').T)
    
    # And store the accuracy
    results['validation_accuracy'].append(((pd.Series(temp[-1][0]) > 0.5).astype(int) == validate_df.Class.reset_index(drop=True)).sum() / validate_df.Class.shape[0])
    
# Once done running everything, let's make into a dataframe, so we can quickly find the best one
results = pd.DataFrame(results)
results.head()



Unnamed: 0,hyperparameters,weights,loss,validation_accuracy
0,"{'learning_rate': 0.001, 'epochs': 100, 'depth...",{'W_l': [[[ 0.85398276 0.31417314 0.17994224...,"[nan, nan, nan, nan, nan, nan, nan, nan, nan, ...",0.905882
1,"{'learning_rate': 0.001, 'epochs': 100, 'depth...",{'W_l': [[[ 0.77619612 0.30865069 0.17785691...,"[0.8163957621617163, 0.8161454319446956, 0.815...",0.246154
2,"{'learning_rate': 0.001, 'epochs': 100, 'depth...",{'W_l': [[[0.88429571 0.31424893 0.1819035 0....,"[2.209779137636132, 2.2078216554054824, 2.2058...",0.246154
3,"{'learning_rate': 0.001, 'epochs': 100, 'depth...",{'W_l': [[[0.91609926 0.31574088 0.18331833 0....,"[3.2032184057435296, 3.199389984160236, 3.1955...",0.246154
4,"{'learning_rate': 0.001, 'epochs': 100, 'depth...",{'W_l': [[[0.9261892 0.31621465 0.18376666 0....,"[4.330096401690009, 4.324461007454486, 4.31882...",0.246154
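
The `nan` values in the `loss` column are what the cross-entropy formula in `one_back_prop` produces when an activation saturates to exactly 0 or 1: `np.log(0)` is `-inf`, and `0 * -inf` is `nan`. One common workaround, sketched here as an assumption rather than something the notebook does, is to clip activations away from 0 and 1 before taking logs (`safe_cross_entropy` and `eps` are hypothetical names):

```python
import numpy as np

def safe_cross_entropy(A, Y, eps=1e-12):
    """Mean binary cross-entropy with activations clipped to [eps, 1 - eps]."""
    A = np.clip(A, eps, 1 - eps)
    return float(-np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A)))

A = np.array([[1.0, 0.0, 0.7]])   # two saturated activations
Y = np.array([[1.0, 0.0, 1.0]])

# Unclipped formula, with NumPy warnings silenced for the demonstration
with np.errstate(divide='ignore', invalid='ignore'):
    naive = -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))

print(naive)                       # nan
print(safe_cross_entropy(A, Y))    # a finite value
```

Clipping only changes the reported loss; the gradient `A1 - Y` used for the weight updates is unaffected.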


In [9]:
# Now, let's see which set of hyper parameters gave us the highest accuracy
results[results.validation_accuracy == results.validation_accuracy.max()]

Unnamed: 0,hyperparameters,weights,loss,validation_accuracy
57,"{'learning_rate': 0.01, 'epochs': 500, 'depth'...",{'W_l': [[[ 0.61793031 0.36127919 0.14693383...,"[nan, nan, 7.583493738988932, 6.27357264387054...",0.949321


In [10]:
# Can't see the hyperparameters, so show that explicitly
results.loc[results.validation_accuracy == results.validation_accuracy.max(), 'hyperparameters'].values

array([{'learning_rate': 0.01, 'epochs': 500, 'depth': 0}], dtype=object)
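
Because the hyperparameters are stored as dicts, they are truncated in the table display. One way to make them readable is to expand each dict into its own columns. The sketch below uses a small made-up stand-in for the `results` frame; `hyper`, `flat`, and `best_row` are hypothetical names.

```python
import pandas as pd

# Toy stand-in for the `results` frame (values made up for illustration)
results = pd.DataFrame({
    'hyperparameters': [
        {'learning_rate': 0.01, 'epochs': 100, 'depth': 0},
        {'learning_rate': 0.01, 'epochs': 500, 'depth': 1, 'hidden_width': 3},
    ],
    'validation_accuracy': [0.90, 0.80],
})

# Expand each dict into columns; missing keys (e.g. hidden_width) become NaN
hyper = pd.DataFrame(results['hyperparameters'].tolist(), index=results.index)
flat = hyper.join(results['validation_accuracy'])

# Row with the highest validation accuracy
best_row = flat.loc[flat['validation_accuracy'].idxmax()]
print(flat.sort_values('validation_accuracy', ascending=False))
print(best_row)
```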

So, it looks like our best performing model used a learning rate of 0.01, trained for 500 epochs, and had no hidden layers. Let's try testing this model on our test data to see what accuracy we get there.

In [11]:
# Final test with best performing hyperparameters
# First, grab the line (includes the weights)
best = results[results.validation_accuracy == results.validation_accuracy.max()]
# Define the model object (to allow us to test the best model)
pp_nn = PerceptronNN(**best.hyperparameters.values[0])

# And test it
temp = pp_nn.full_fwd_prop(best.weights.values[0]['W_l'], best.weights.values[0]['B_l'], test_df.drop(columns='Class').T)
((pd.Series(temp[-1][0]) > 0.5).astype(int) == test_df.Class.reset_index(drop=True)).sum() / test_df.Class.shape[0]

0.9425641025641026

So, we observe that our test accuracy is 94.3%, close to the 94.9% validation accuracy, so pretty good!