<a href="https://colab.research.google.com/github/kpjaskie/SenSIP21/blob/main/6_ML_Algorithms_Neural%20Networks.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Neural Networks Introduction

&#169; *Kristen Jaskie, June 2021*

In this notebook, we will use neural networks from sklearn to solve both regression and classification problems.

Multi-layer Perceptrons (MLPs) are shallow neural networks (as opposed to deep learning networks).  This means that they usually have only a couple of hidden layers.

These networks generally have the following structure:


![picture](https://www.tutorialspoint.com/tensorflow/images/multi_layer_perceptron.jpg)
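
To make the structure in the picture concrete, here is a minimal NumPy sketch of a forward pass through a small MLP. The layer sizes, the random weights, and the helper names `relu` and `mlp_forward` are assumptions chosen for illustration only; sklearn handles all of this internally.

```python
import numpy as np

def relu(z):
    # ReLU activation: negative values become zero
    return np.maximum(0.0, z)

def mlp_forward(x, weights, biases):
    """Forward pass: ReLU on each hidden layer, identity on the output layer."""
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = relu(a @ W + b)
    return a @ weights[-1] + biases[-1]

rng = np.random.default_rng(0)

# Hypothetical network: 3 inputs, two hidden layers of 4 nodes, 1 output
layer_sizes = [3, 4, 4, 1]
weights = [rng.normal(size=(n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

x_demo = rng.normal(size=(5, 3))   # 5 samples with 3 features each
y_demo = mlp_forward(x_demo, weights, biases)
print(y_demo.shape)  # (5, 1)
```

Each hidden layer is a matrix multiplication followed by a nonlinearity, which is why the number of layers and the number of nodes per layer are the main structural choices.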




In [None]:
# First, we need to import the relevant libraries
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# To create simulated data
from sklearn.datasets import make_regression
from sklearn.neural_network import MLPRegressor

from sklearn.datasets import make_blobs
from sklearn.neural_network import MLPClassifier

# To create confusion matrices and metrics
from sklearn.metrics import mean_squared_error

from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.metrics import precision_score, recall_score, f1_score

import seaborn as sns; sns.set()  # for plotting confusion matrices
from tabulate import tabulate


# Neural Networks for Regression

In [None]:
#@title Generate regression data

Dataset_Size = 680 #@param {type:"slider", min:50, max:1000, step:10}
Num_Total_Features = 66 #@param {type:"slider", min:2, max:100, step:1}
Num_Useful_Features = 14 #@param {type:"slider", min:2, max:100, step:1}
Noise = 0.16 #@param {type:"slider", min:0, max:2, step:0.01}

#Let's make simple data for regression.
X, y = make_regression(n_samples=Dataset_Size, 
                       n_features=Num_Total_Features,
                       n_informative=Num_Useful_Features,
                       noise=Noise,
                       random_state=0)

if Num_Total_Features == 2:
  plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='PiYG');

In [None]:
#@title Split data into Train/Validation/Test sets

from sklearn.model_selection import train_test_split

#Here, our training set is composed of 70% of the data and our validation
#and test sets are each composed of 15% of the data
X_train, X_remaining, y_train, y_remaining = train_test_split(X, y, test_size=0.3)
X_test, X_val, y_test, y_val = train_test_split(X_remaining, y_remaining, test_size=0.5)

#Combine the training and validation sets so the final model can be retrained
#on both once hyperparameter tuning is finished.
X_final_train = np.concatenate((X_train, X_val), axis=0)
y_final_train = np.concatenate((y_train, y_val), axis=0)

print("Training Size   = ", X_train.shape[0])
print("Validation Size = ", X_val.shape[0])
print("Test Size       = ",  X_test.shape[0])

### Tune Hyperparameters

We're definitely going to have to do some hyperparameter tuning on this - neural nets have more hyperparameters than other methods.

Here, our hyperparameters consist of:
* The number of hidden layers
* The number of hidden nodes in each hidden layer
* The activation function, which can be 'identity', 'logistic', 'tanh', or 'relu' ('relu' is fairly standard).  See the documentation for more details.
* The type of solver, which can be 'lbfgs', 'sgd', or 'adam'.  'sgd' is stochastic gradient descent, 'adam' is a variant of stochastic gradient descent with adaptive step sizes ('adam' is fairly standard), and 'lbfgs' is a quasi-Newton method.
* `max_iter`, which specifies the maximum number of iterations.
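
To see how these hyperparameters map onto the model, here is a small sketch that fits an `MLPRegressor` with two hidden layers and inspects the resulting weight matrices. The toy dataset, the `(8, 4)` layer sizes, and the `_demo` names are assumptions for illustration, not tuned values.

```python
from sklearn.datasets import make_regression
from sklearn.neural_network import MLPRegressor

# Hypothetical toy problem with 5 input features
X_demo, y_demo = make_regression(n_samples=100, n_features=5, random_state=0)

# Two hidden layers: 8 nodes in the first, 4 in the second
model_demo = MLPRegressor(hidden_layer_sizes=(8, 4),
                          activation='relu',
                          solver='adam',
                          max_iter=200,
                          random_state=0)
model_demo.fit(X_demo, y_demo)

# One weight matrix per connection between consecutive layers
print([w.shape for w in model_demo.coefs_])  # [(5, 8), (8, 4), (4, 1)]

# Counts the input and output layers as well as the hidden ones
print(model_demo.n_layers_)  # 4

# Iterations actually run, which is at most max_iter
print(model_demo.n_iter_)
```

With a small `max_iter` like this one, sklearn will probably emit a `ConvergenceWarning`; that is the signal to raise `max_iter` or rescale the data.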


In [None]:
#@title Tune Hyperparameters

#@markdown Set the hidden layers and hidden nodes in the code
Activation = 'relu' #@param ["relu","identity", "logistic", "tanh"]
Solver = 'adam' #@param ["adam","lbfgs", "sgd"]
Maximum_Iterations = 4990 #@param {type:"slider", min:0, max:10000, step:100}

# The number of nodes in each hidden layer is specified in this first field
NN_model_train = MLPRegressor(hidden_layer_sizes=(25, ), 
                              activation=Activation, 
                              solver=Solver, 
                              max_iter=Maximum_Iterations)

NN_model_train.fit(X_train, y_train)
nn_y_hat_train = NN_model_train.predict(X_train)
nn_y_hat_val = NN_model_train.predict(X_val)

# mean_squared_error already averages over the samples
nn_MSE_train = mean_squared_error(y_train, nn_y_hat_train)
nn_MSE_val = mean_squared_error(y_val, nn_y_hat_val)
print('Training Error: ', nn_MSE_train)
print('Validation Error: ', nn_MSE_val)
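
As a reference for reading these error values, here is a minimal sketch of what `mean_squared_error` computes: the average of the squared residuals. The four-element arrays are made-up numbers for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical targets and predictions
y_true_demo = np.array([3.0, -0.5, 2.0, 7.0])
y_pred_demo = np.array([2.5, 0.0, 2.0, 8.0])

# Mean of squared residuals, computed by hand
mse_manual = np.mean((y_true_demo - y_pred_demo) ** 2)

# The same quantity from sklearn
mse_sklearn = mean_squared_error(y_true_demo, y_pred_demo)

print(mse_manual, mse_sklearn)  # 0.375 0.375
```

Because the metric is already an average, MSE values from sets of different sizes (training vs. validation) can be compared directly.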

In [None]:
#@title Automating Hyperparameter Selection using Grid Search

Activation = 'relu' #@param ["relu","identity", "logistic", "tanh"]
Solver = 'adam' #@param ["adam","lbfgs", "sgd"]
Maximum_Iterations = 10000 #@param {type:"slider", min:0, max:10000, step:100}

#Try different numbers of hidden nodes in each layer 
hidden_nodes = [2, 4, 6, 10, 15, 25, 50]

#Let's try for just one hidden layer
for hl in hidden_nodes:
  NN_model_train = MLPRegressor(hidden_layer_sizes=(hl, ), 
                              activation=Activation, 
                              solver=Solver, 
                              max_iter=Maximum_Iterations)
  
  NN_model_train.fit(X_train, y_train)
  nn_y_hat_train = NN_model_train.predict(X_train)
  nn_y_hat_val = NN_model_train.predict(X_val)

  nn_MSE_train = mean_squared_error(y_train, nn_y_hat_train)
  nn_MSE_val = mean_squared_error(y_val, nn_y_hat_val)
  print('One hidden layer with ', hl, ' nodes. Training MSE: ', round(nn_MSE_train,3), 
        'Validation MSE: ', round(nn_MSE_val,3))

#Now we'll try two hidden layers
for hl in hidden_nodes:
  NN_model_train = MLPRegressor(hidden_layer_sizes=(hl, hl), 
                              activation=Activation, 
                              solver=Solver, 
                              max_iter=Maximum_Iterations)
  
  NN_model_train.fit(X_train, y_train)
  nn_y_hat_train = NN_model_train.predict(X_train)
  nn_y_hat_val = NN_model_train.predict(X_val)

  nn_MSE_train = mean_squared_error(y_train, nn_y_hat_train)
  nn_MSE_val = mean_squared_error(y_val, nn_y_hat_val)
  print('Two hidden layers with ', hl, ' nodes each. Training MSE: ', round(nn_MSE_train,3), 
        'Validation MSE: ', round(nn_MSE_val,3))
  
#Now we'll try three hidden layers
for hl in hidden_nodes:
  NN_model_train = MLPRegressor(hidden_layer_sizes=(hl, hl, hl), 
                              activation=Activation, 
                              solver=Solver, 
                              max_iter=Maximum_Iterations)
  
  NN_model_train.fit(X_train, y_train)
  nn_y_hat_train = NN_model_train.predict(X_train)
  nn_y_hat_val = NN_model_train.predict(X_val)

  nn_MSE_train = mean_squared_error(y_train, nn_y_hat_train)
  nn_MSE_val = mean_squared_error(y_val, nn_y_hat_val)
  print('Three hidden layers with ', hl, ' nodes each. Training MSE: ', round(nn_MSE_train,3), 
        'Validation MSE: ', round(nn_MSE_val,3))

In [None]:
#@title Test Final Model Performance

Activation = 'relu' #@param ["relu","identity", "logistic", "tanh"]
Solver = 'adam' #@param ["adam","lbfgs", "sgd"]
Maximum_Iterations = 7300 #@param {type:"slider", min:0, max:10000, step:100}

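# Set hidden_layer_sizes to the configuration with the lowest validation MSE
# from the search above; (25, ) is only a placeholder.
NN_model_train = MLPRegressor(hidden_layer_sizes=(25, ), 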
                              activation=Activation, 
                              solver=Solver, 
                              max_iter=Maximum_Iterations)
  
NN_model_train.fit(X_final_train, y_final_train)
nn_y_hat_train = NN_model_train.predict(X_final_train)
nn_y_hat_test = NN_model_train.predict(X_test)

nn_MSE_train = mean_squared_error(y_final_train, nn_y_hat_train)
nn_MSE_test = mean_squared_error(y_test, nn_y_hat_test)
print('Training Error: ', nn_MSE_train)
print('Final Testing Error: ', nn_MSE_test)

# Neural Networks for Classification

In [None]:
#@title Generate binary data

Dataset_Size = 840 #@param {type:"slider", min:50, max:1000, step:10}
Standard_Deviation = 0.75 #@param {type:"slider", min:0, max:2, step:0.01}

#Let's make simple data to classify.
X, y = make_blobs(n_samples=Dataset_Size, 
                  centers=2, 
                  random_state=0, 
                  cluster_std=Standard_Deviation)

plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='PiYG');

From this, we can clearly see the two well-defined classes of data: red (drawn in pink by the 'PiYG' colormap) and green.



In [None]:
#Here, our training set is composed of 70% of the data and our validation
#and test sets are each composed of 15% of the data
X_train, X_remaining, y_train, y_remaining = train_test_split(X, y, test_size=0.3)
X_test, X_val, y_test, y_val = train_test_split(X_remaining, y_remaining, test_size=0.5)

#Combine the training and validation sets so the final classifier can be
#retrained on both once hyperparameter tuning is finished.
X_final_train = np.concatenate((X_train, X_val), axis=0)
y_final_train = np.concatenate((y_train, y_val), axis=0)

print("Training Size   = ", X_train.shape[0])
print("Validation Size = ", X_val.shape[0])
print("Test Size       = ",  X_test.shape[0])

In [None]:
#@title Visualize Sets

from matplotlib import gridspec

fig, ax = plt.subplots(1, 3, 
                       gridspec_kw={
                           'width_ratios': [2, 2, 2],
                           'height_ratios': [1]},
                       figsize=(15,5))

ax[0].scatter(X_train[:, 0], X_train[:, 1], c=y_train, s=50, cmap='PiYG');
ax[0].set_title('Training Set')

ax[1].scatter(X_val[:, 0], X_val[:, 1], c=y_val, s=50, cmap='PiYG');
ax[1].set_title('Validation Set')

ax[2].scatter(X_test[:, 0], X_test[:, 1], c=y_test, s=50, cmap='PiYG');
ax[2].set_title('Testing Set')

plt.show()


### NNs on Simulated Data

We're definitely going to have to do some hyperparameter tuning on this - neural nets have more hyperparameters than most other methods.

Here, our hyperparameters consist of:
* The number of hidden layers
* The number of hidden nodes in each hidden layer
* The activation function, which can be 'identity', 'logistic', 'tanh', or 'relu' ('relu' is fairly standard).  See the documentation for more details.
* The type of solver, which can be 'lbfgs', 'sgd', or 'adam'.  'sgd' is stochastic gradient descent, 'adam' is a variant of stochastic gradient descent with adaptive step sizes ('adam' is fairly standard), and 'lbfgs' is a quasi-Newton method.
* `max_iter`, which specifies the maximum number of iterations.

The grid search cell further below automatically tests a variety of hidden layer and hidden node counts, though it uses the standard 'relu' activation function and 'adam' solver.  Feel free to play with these as well.

If you want to see more options, take a look at the documentation:

https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html?highlight=mlp#sklearn.neural_network.MLPClassifier

In [None]:
#@title Tune Hyperparameters

#@markdown Set the hidden layers and hidden nodes in the code
Activation = 'relu' #@param ["relu","identity", "logistic", "tanh"]
Solver = 'adam' #@param ["adam","lbfgs", "sgd"]
Maximum_Iterations = 4990 #@param {type:"slider", min:0, max:10000, step:100}

# The number of nodes in each hidden layer is specified in this first field
NN_classify_model = MLPClassifier(hidden_layer_sizes=(10,), 
                                     activation=Activation, 
                                     solver=Solver, 
                                     max_iter=Maximum_Iterations)


NN_classify_model.fit(X_train, y_train)
class_y_hat_train = NN_classify_model.predict(X_train)
class_y_hat_val = NN_classify_model.predict(X_val)

train_accuracy = accuracy_score(y_train, class_y_hat_train)
val_accuracy = accuracy_score(y_val, class_y_hat_val)
print('Training Accuracy: ', train_accuracy)
print('Validation Accuracy: ', val_accuracy)

In [None]:
#@title Automating Hyperparameter Selection using Grid Search

#Try different hyperparameter values
hidden_nodes = [2, 4, 6, 10, 15, 25, 50]
hidden_layers = [1, 2, 3]
nn_activation = 'relu'
nn_solver = 'adam'
nn_max_iter = 1000

for layer in hidden_layers:
    for node in hidden_nodes:
        # Build `layer` hidden layers of `node` nodes each,
        # e.g. layer=2, node=10 gives (10, 10)
        NN_model = MLPClassifier(hidden_layer_sizes=(node,) * layer, 
                                 activation=nn_activation, 
                                 solver=nn_solver, 
                                 max_iter=nn_max_iter)
        NN_model.fit(X_train, y_train)

        # Score the model that was just fit
        class_y_hat_train = NN_model.predict(X_train)
        class_y_hat_val = NN_model.predict(X_val)

        train_accuracy = accuracy_score(y_train, class_y_hat_train)
        val_accuracy = accuracy_score(y_val, class_y_hat_val)
        print(layer, ' hidden layers, ', node, ' nodes each. Training accuracy: ', 
              round(train_accuracy,3), 'Validation accuracy: ', round(val_accuracy,3))
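
The nested loops above can also be handed to sklearn's `GridSearchCV`. Note that this is a swapped-in technique: it scores each configuration with k-fold cross-validation rather than the single fixed validation set used in this notebook. The smaller dataset, the reduced grid, the `_demo` names, and `cv=3` are assumptions chosen so the sketch runs quickly.

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Hypothetical small dataset so the search finishes quickly
X_demo, y_demo = make_blobs(n_samples=200, centers=2,
                            random_state=0, cluster_std=0.75)

# Every combination of these values is tried
param_grid = {
    'hidden_layer_sizes': [(4,), (10,), (4, 4), (10, 10)],
    'activation': ['relu'],
    'solver': ['adam'],
}

search = GridSearchCV(MLPClassifier(max_iter=1000, random_state=0),
                      param_grid,
                      cv=3,                # 3-fold cross-validation
                      scoring='accuracy')
search.fit(X_demo, y_demo)

print(search.best_params_)
print(round(search.best_score_, 3))
```

After fitting, `search.best_estimator_` is a model retrained on all of the data passed to `fit` using the best configuration found, so it can be used directly for prediction.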

In [None]:
#@title Test Final Model Performance

Activation = 'relu' #@param ["relu","identity", "logistic", "tanh"]
Solver = 'adam' #@param ["adam","lbfgs", "sgd"]
Maximum_Iterations = 7300 #@param {type:"slider", min:0, max:10000, step:100}

NN_classify_model = MLPClassifier(hidden_layer_sizes=(10,), 
                                     activation=Activation, 
                                     solver=Solver, 
                                     max_iter=Maximum_Iterations)


NN_classify_model.fit(X_final_train, y_final_train)
class_y_hat_train = NN_classify_model.predict(X_final_train)
class_y_hat_test = NN_classify_model.predict(X_test)

train_accuracy = accuracy_score(y_final_train, class_y_hat_train)
test_accuracy = accuracy_score(y_test, class_y_hat_test)
print('Training Accuracy: ', train_accuracy)
print('Test Accuracy: ', test_accuracy)

In [None]:
#@title Evaluate Final Model

plt.figure(0)
plt.scatter(X_final_train[:, 0], X_final_train[:, 1], c=y_final_train, s=50, cmap='PiYG', alpha=0.2);
plt.scatter(X_test[:, 0], X_test[:, 1], c=class_y_hat_test, s=50, cmap='PiYG');
plt.show()

accuracy_nn_final = round(accuracy_score(y_test, class_y_hat_test), 2)
precision_nn_final = round(precision_score(y_test, class_y_hat_test), 2)
recall_nn_final = round(recall_score(y_test, class_y_hat_test), 2)
fscore_nn_final = round(f1_score(y_test, class_y_hat_test), 2)

print(tabulate([['Accuracy', accuracy_nn_final], 
                ['Precision', precision_nn_final],
                ['Recall', recall_nn_final],
                ['F-score', fscore_nn_final],
                [' ', ' ']], 
               headers=['Metric', 'Value']))

CM_nn = confusion_matrix(y_test, class_y_hat_test)

ax = sns.heatmap(CM_nn.T, square=True, annot=True, 
                 fmt='d', cbar=False, cmap="Blues")

ax.set_xticklabels(('Red Class','Green Class'))
ax.set_yticklabels(('Red Class','Green Class'), 
    rotation=0, fontsize=10, va="center")

plt.xlabel("True Labels")
plt.ylabel("Predicted Labels");
plt.title("NN Final Test Confusion Matrix")
plt.show()
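
To connect the table of metrics to the confusion matrix, the sketch below derives precision, recall, and F-score by hand from the four confusion matrix counts and compares them with sklearn's functions. The eight labels are made-up values for illustration, with class 1 treated as the positive class (sklearn's default for binary labels).

```python
import numpy as np
from sklearn.metrics import (confusion_matrix, precision_score,
                             recall_score, f1_score)

# Hypothetical true labels and predictions
y_true_demo = np.array([0, 0, 0, 1, 1, 1, 1, 0])
y_pred_demo = np.array([0, 1, 0, 1, 1, 0, 1, 0])

# For binary labels, ravel() yields the counts in this order
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()

precision = tp / (tp + fp)   # of the predicted positives, how many are right
recall = tp / (tp + fn)      # of the actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)

print(precision, recall, f1)  # 0.75 0.75 0.75

# The sklearn functions give the same values
print(precision_score(y_true_demo, y_pred_demo),
      recall_score(y_true_demo, y_pred_demo),
      f1_score(y_true_demo, y_pred_demo))
```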