# Multi-Layer Perceptron
In this Jupyter Notebook, we code a Multi-Layer Perceptron (MLP) for classification and regression tasks, trained with Stochastic Gradient Descent (SGD). The MLP supports both multiclass and binary classification. <br>

The model also allows customization of the activation function and of the training schedule used for optimization. The MLP has a single hidden layer. <br>

This notebook is divided into three sections:
1. Section 1 - Helper functions and multiclass classification
1. Section 2 - Binary classification
1. Section 3 - Regression

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import random
import itertools
import math
from math import exp
from sklearn import datasets
from sklearn.utils import shuffle
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score, classification_report
from sklearn.metrics import accuracy_score
from sklearn.metrics import multilabel_confusion_matrix
from sklearn.metrics import precision_recall_fscore_support

## Section 1

### 1. Implement the following function that creates a weight matrix and initializes it with small random real numbers.

In [3]:
def initializeWeights(input_neurons,output_neurons):
    W = np.random.randn(output_neurons,input_neurons) * 0.01
    return W

### 2. Implement the logistic sigmoid activation function.

In [4]:
def logistic(z):
    return 1/(1+np.exp(-z))

### 3. Implement the ReLU (rectified linear unit) activation function.

In [5]:
def relu(z):
    return np.maximum(0,z) 

### 4. Implement the tanh (hyperbolic tangent) activation function.

In [6]:
def tanh(z):
    return np.tanh(z)
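Backpropagation also needs the derivative of each activation. A minimal sketch, assuming the derivatives are written as functions of the pre-activation z; the `*_derivative` names are hypothetical and not part of the original notebook. Each derivative is checked against a central finite difference:

```python
import numpy as np

def logistic(z):
    return 1 / (1 + np.exp(-z))

def logistic_derivative(z):
    # d/dz sigma(z) = sigma(z) * (1 - sigma(z))
    s = logistic(z)
    return s * (1 - s)

def relu_derivative(z):
    # 1 where z > 0, else 0 (the subgradient at z = 0 is taken as 0)
    return (z > 0).astype(float)

def tanh_derivative(z):
    # d/dz tanh(z) = 1 - tanh(z)^2
    return 1 - np.tanh(z) ** 2

# Compare each analytic derivative with a central finite difference
z = np.linspace(-3, 3, 7) + 0.1   # the offset avoids relu's kink at 0
eps = 1e-6
for f, df in [(logistic, logistic_derivative),
              (lambda x: np.maximum(0, x), relu_derivative),
              (np.tanh, tanh_derivative)]:
    numeric = (f(z + eps) - f(z - eps)) / (2 * eps)
    print(np.allclose(numeric, df(z), atol=1e-6))
```

Writing the derivatives in terms of z fits the `Z_i` value that `forward_propagation` stores in its cache.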

### 5. Implement an MLP Classifier model class for performing multi-class classification.

### The MLP Classifier has a single hidden layer. It should have the following four methods. The model uses the back-propagation algorithm to learn the weights of the features/neurons. Note that the “fit” method should implement the Stochastic Gradient Descent algorithm for optimizing the weight update process.
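A compact sketch of how forward pass, backpropagation and SGD fit together in one class whose `fit` method updates the weights after every single sample. The class name `SketchMLPClassifier`, its method names (`fit`, `predict_proba`, `predict`) and its defaults are assumptions for illustration and need not match the four methods the assignment requires; it also hard-codes tanh in the hidden layer and softmax at the output:

```python
import numpy as np

class SketchMLPClassifier:
    """Single-hidden-layer MLP trained with per-sample SGD (illustrative sketch)."""

    def __init__(self, n_hidden=16, learning_rate=0.1, epochs=50, seed=0):
        self.n_hidden = n_hidden
        self.learning_rate = learning_rate
        self.epochs = epochs
        self.rng = np.random.default_rng(seed)

    def _forward(self, x):
        # x: one sample as a column vector, shape (n_i, 1)
        z1 = self.W1 @ x + self.b1
        a1 = np.tanh(z1)
        z2 = self.W2 @ a1 + self.b2
        e = np.exp(z2 - z2.max())           # numerically stable softmax
        return a1, e / e.sum()

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        n_i, n_o = X.shape[1], len(self.classes_)
        Y = (y.reshape(-1, 1) == self.classes_).astype(float)   # one-hot labels
        self.W1 = self.rng.standard_normal((self.n_hidden, n_i)) * 0.01
        self.b1 = np.zeros((self.n_hidden, 1))
        self.W2 = self.rng.standard_normal((n_o, self.n_hidden)) * 0.01
        self.b2 = np.zeros((n_o, 1))
        for _ in range(self.epochs):
            # Stochastic: visit the samples in a fresh random order each epoch
            for i in self.rng.permutation(X.shape[0]):
                x = X[i].reshape(-1, 1)
                t = Y[i].reshape(-1, 1)
                a1, p = self._forward(x)
                dz2 = p - t                               # softmax + cross-entropy
                dz1 = (self.W2.T @ dz2) * (1 - a1 ** 2)   # tanh derivative
                # Update every weight right after this single sample
                self.W2 -= self.learning_rate * dz2 @ a1.T
                self.b2 -= self.learning_rate * dz2
                self.W1 -= self.learning_rate * dz1 @ x.T
                self.b1 -= self.learning_rate * dz1
        return self

    def predict_proba(self, X):
        # Stack the (n_o, 1) outputs column-wise, then transpose to (n_samples, n_o)
        return np.hstack([self._forward(x.reshape(-1, 1))[1] for x in X]).T

    def predict(self, X):
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]
```

Per-sample updates are the textbook form of SGD; a mini-batch variant would average `dz1` and `dz2` over several samples before each update.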

In [37]:
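'''
Helper function that one-hot encodes the labels
Input:
    Y - target vector, shape (n_samples,)
Output:
    one_hot_matrix - shape (n_samples, n_classes), one column per sorted unique label
'''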
def one_hot_labels(Y):
    
    # Sorted unique labels in Y
    unique_labels = np.unique(Y)
    
    # Compare each sample's label with every class label: shape (n_samples, n_classes)
    one_hot_matrix = (np.asarray(Y).reshape(-1, 1) == unique_labels).astype(int)
    
    return one_hot_matrix

In [45]:
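'''
Helper function for softmax calculation
Given matrix A (output-layer pre-activations, one sample per row),
outputs class probabilities; each row sums to 1
'''
def softmax(A):
    # Subtract the row-wise max for numerical stability (does not change the result)
    A = np.exp(A - np.max(A, axis=1, keepdims=True))
    # Normalize each row so the class probabilities of a sample sum to 1
    return A / np.sum(A, axis=1, keepdims=True)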

In [17]:
'''
Helper function that determines the sizes of the input,
hidden and output layers
Input:
    X - feature matrix, shape (n_samples, n_features)
    Y - one_hot_matrix, shape (n_samples, n_classes)
    n_h - number of neurons in the hidden layer
Output:
    (n_i, n_h, n_o) - tuple of layer sizes
'''
def size_of_layers(X,Y,n_h):
    n_i = X.shape[1]
    n_o = Y.shape[1]
    
    return n_i, n_h, n_o

In [18]:
'''
Helper function to initialize parameters of the MLP
This assumes a single hidden layer architecture
Bias is included
Input:
    n_i - size of input layer (# of X_train columns)
    n_h - size of hidden layer neurons
    n_o - size of output layer neurons (# number of classes)
Output:
    parameters - Dictionary of Parameters
'''
def initializeParameters(n_i, n_h, n_o):
    # Weights of each layer
    W_i = initializeWeights(n_i,n_h)
    W_h = initializeWeights(n_h,n_o)
    
    # Bias of each layer
    b_i = np.zeros((n_h,1))
    b_h = np.zeros((n_o,1))
    
    parameters = {
        'W_input': W_i,
        'b_input': b_i,
        'W_hidden': W_h,
        'b_hidden': b_h
    }
    
    return parameters

In [21]:
# y is the target vector of the digits dataset loaded in step 6
Y = one_hot_labels(y)

In [23]:
n_i, n_h, n_o = size_of_layers(X,Y,4)
params = initializeParameters(n_i, n_h, n_o)

In [78]:
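'''
Forward propagation given the parameters, the feature matrix, the
hidden-layer activation function and the output function used to
calculate probabilities.
Returns the output-layer probabilities A_h and a cache for backprop.

Input:
    parameters - dictionary of parameters
    X - feature matrix, shape (n_samples, n_i)
    activation_function - hidden-layer activation (logistic, relu or tanh)
    classifier - logistic (sigmoid) or softmax for the output layer
Output:
    A_h - output-layer probabilities, shape (n_samples, n_o)
    cache - intermediate values Z_i, A_i, Z_h, A_h
'''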
def forward_propagation(parameters, X, activation_function, classifier=softmax):
    # Parameters definition
    W_i = parameters['W_input']
    b_i = parameters['b_input']
    W_h = parameters['W_hidden']
    b_h = parameters['b_hidden']
    
    # Computation of Input Layer
    Z_i = np.dot(W_i,X.T) + b_i
    A_i = activation_function(Z_i)
    
    # Computation of Hidden Layer
    Z_h = np.dot(W_h,A_i) + b_h
    A_h = classifier(Z_h.T)
    
    # Caching 
    cache = {
        'Z_i': Z_i,
        'A_i': A_i,
        'Z_h': Z_h,
        'A_h': A_h
    }
    
    return A_h, cache
    

In [79]:
pred, cache = forward_propagation(params, X, relu)

In [89]:
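'''
Helper function to calculate the categorical cross-entropy cost
Input:
    pred - predicted class probabilities, shape (n_samples, n_classes)
    Y - one_hot_matrix of actual labels, shape (n_samples, n_classes)
Output:
    cost - average cross-entropy over the samples
'''
def cost(pred, Y):
    # Clip to avoid log(0) when a predicted probability underflows to 0
    pred = np.clip(pred, 1e-12, 1.0)
    # With one-hot Y only the log-probability of the true class contributes
    loss = -np.sum(np.multiply(Y, np.log(pred))) / Y.shape[0]
    return loss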

In [None]:
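'''
Backward propagation for the single hidden layer MLP
(softmax output layer with cross-entropy cost)
Input:
    parameters - dictionary of parameters
    cache - cache returned by forward_propagation
    X - feature matrix, shape (n_samples, n_i)
    Y - one_hot_matrix, shape (n_samples, n_o)
    activation_derivative - derivative of the hidden activation evaluated
        at Z_i, e.g. for relu: lambda Z: (Z > 0).astype(float)
Output:
    grads - dictionary of gradients, same shapes as the parameters
'''
def backward_propagation(parameters, cache, X, Y, activation_derivative):
    # Number of samples
    n = Y.shape[0]
    
    # Output-layer weights are needed to propagate the error back
    W_h = parameters['W_hidden']
    
    # Retrieve cached calculations
    Z_i = cache['Z_i']    # (n_h, n)
    A_i = cache['A_i']    # (n_h, n)
    A_h = cache['A_h']    # (n, n_o)
    
    # Output layer: softmax + cross-entropy gives A_h - Y, transposed to (n_o, n)
    dZ_h = (A_h - Y).T
    dW_h = np.dot(dZ_h, A_i.T)/n                      # (n_o, n_h)
    db_h = np.sum(dZ_h, axis=1, keepdims=True)/n      # (n_o, 1)
    
    # Hidden layer: chain rule through the hidden activation
    dZ_i = np.dot(W_h.T, dZ_h) * activation_derivative(Z_i)   # (n_h, n)
    dW_i = np.dot(dZ_i, X)/n                          # (n_h, n_i)
    db_i = np.sum(dZ_i, axis=1, keepdims=True)/n      # (n_h, 1)
    
    grads = {
        'dW_input': dW_i,
        'db_input': db_i,
        'dW_hidden': dW_h,
        'db_hidden': db_h
    }
    
    return grads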
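The starting gradient of `backward_propagation`, `A_h - Y`, rests on the identity that for softmax outputs with the averaged cross-entropy cost, the derivative with respect to the output pre-activations is (softmax(Z) - Y)/n. A small finite-difference check of that identity, written as a standalone sketch with its own row-wise softmax and cost:

```python
import numpy as np

def softmax_rows(Z):
    # Row-wise softmax: one sample per row
    E = np.exp(Z - Z.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)

def cross_entropy(Z, Y):
    # Categorical cross-entropy averaged over the n samples
    return -np.sum(Y * np.log(softmax_rows(Z))) / Y.shape[0]

rng = np.random.default_rng(0)
n, k = 5, 3
Z = rng.standard_normal((n, k))
Y = np.eye(k)[rng.integers(0, k, size=n)]      # random one-hot targets

analytic = (softmax_rows(Z) - Y) / n

# Central differences, one entry of Z at a time
eps = 1e-6
numeric = np.zeros_like(Z)
for i in range(n):
    for j in range(k):
        Zp, Zm = Z.copy(), Z.copy()
        Zp[i, j] += eps
        Zm[i, j] -= eps
        numeric[i, j] = (cross_entropy(Zp, Y) - cross_entropy(Zm, Y)) / (2 * eps)

print(np.max(np.abs(numeric - analytic)))   # a value very close to zero
```

The same kind of check, applied to `W_input` and `W_hidden` entries, is a practical way to validate the full backward pass.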

### 6. Read the handwritten digits dataset using the sklearn.datasets.load_digits function for performing multi-class classification.

In [14]:
digits = datasets.load_digits()
X = digits.data
y = digits.target

In [15]:
X.shape

(1797, 64)

### 7. Partition the data into train and test set. Use the “Partition” function from your previous assignment or from sklearn.
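If the earlier “Partition” function is not at hand, a shuffled split takes only a few lines. This is a hypothetical stand-in (its name and signature are assumptions) using only NumPy; `train_test_split` from `sklearn.model_selection`, already imported above, does the same job and additionally supports `stratify=y`:

```python
import numpy as np

def partition(X, y, test_fraction=0.2, seed=0):
    # Shuffle the row indices once, then cut them into test and train parts
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_test = int(round(test_fraction * len(y)))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10)
X_train, X_test, y_train, y_test = partition(X_demo, y_demo, test_fraction=0.3)
print(X_train.shape, X_test.shape)   # (7, 2) (3, 2)
```

The return order matches `train_test_split`, so on the digits data either call can be dropped in as `X_train, X_test, y_train, y_test = ...(X, y, ...)`.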

### 8. Standardize the features.
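Standardization should use the mean and standard deviation of the training set only and apply them unchanged to the test set, which is what `StandardScaler().fit(X_train)` followed by `.transform(...)` does. A NumPy sketch of the same computation (the `standardize` helper is hypothetical); since some pixel columns of the digits data can be constant, a zero standard deviation is replaced by 1 to avoid dividing by zero:

```python
import numpy as np

def standardize(X_train, X_test):
    # Per-feature mean and standard deviation from the training data only
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0   # constant features: keep the centred values unscaled
    return (X_train - mean) / std, (X_test - mean) / std

X_train = np.array([[1.0, 5.0], [3.0, 5.0], [5.0, 5.0]])
X_test = np.array([[3.0, 7.0]])
X_train_s, X_test_s = standardize(X_train, X_test)
print(X_test_s)   # [[0. 2.]]
```

Fitting the statistics on the full dataset instead would leak information from the test set into training.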

### 9. Hyperparameter tuning over a fixed set of candidate hyperparameter values.
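Tuning over fixed candidate values can be done as an exhaustive search over a small grid, scoring each combination on held-out data. A sketch using `itertools.product` (imported at the top of the notebook); the grid values and the `train_and_score` callback are hypothetical placeholders for fitting the MLP and returning, for example, validation accuracy:

```python
import itertools

def grid_search(train_and_score, grid):
    # Try every combination of the candidate values; keep the best-scoring one
    best_score, best_params = float("-inf"), None
    names = list(grid)
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = train_and_score(**params)
        if score > best_score:
            best_score, best_params = score, params
    return best_params, best_score

# Hypothetical grid; in the notebook, train_and_score would fit the MLP on the
# training split and return its accuracy on a validation split
grid = {"n_hidden": [16, 32, 64], "learning_rate": [0.01, 0.1], "activation": ["relu", "tanh"]}

def fake_score(n_hidden, learning_rate, activation):
    # Stand-in scorer so the sketch runs on its own
    return n_hidden / 64 + learning_rate - (activation == "relu") * 0.05

print(grid_search(fake_score, grid))
```

Keeping the test set out of this loop, and scoring on a validation split carved from the training data, leaves the final test-set report unbiased.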

### 10. Report the performance of the model.

## Section 2

### 11. Implement a binary classification module in the MLPClassifier. Then, perform binary classification on the handwritten digits dataset to recognize the digits “5” and “not-5”.
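For the “5” versus “not-5” task the targets collapse to 0/1 and the output layer can shrink to a single logistic unit, paired with binary cross-entropy in place of softmax and categorical cross-entropy. A minimal sketch of those two pieces; the function names are hypothetical:

```python
import numpy as np

def binary_labels(y, positive=5):
    # 1 for the positive digit, 0 for every other digit; column vector (n, 1)
    return (np.asarray(y) == positive).astype(int).reshape(-1, 1)

def binary_cross_entropy(pred, Y, eps=1e-12):
    # pred: sigmoid outputs in (0, 1), Y: 0/1 labels, both of shape (n, 1)
    pred = np.clip(pred, eps, 1 - eps)
    return -np.mean(Y * np.log(pred) + (1 - Y) * np.log(1 - pred))

y = np.array([5, 3, 5, 0])
Y = binary_labels(y)
print(Y.ravel())                       # [1 0 1 0]
pred = np.array([[0.9], [0.2], [0.6], [0.1]])
print(binary_cross_entropy(pred, Y))
```

With a logistic output and binary cross-entropy, the gradient with respect to the output pre-activation is again `A_h - Y`, so the rest of backpropagation carries over. Because only about one digit in ten is a 5, accuracy alone is misleading here; precision, recall and F1 (already imported) are more informative.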

### (i) Hyperparameter tuning over a fixed set of candidate hyperparameter values

### (ii) Report the performance of the model.

## Section 3 (Extra Credit)

### 12. Implement a Multi-Layer Perceptron regressor model (an MLP Regressor class) with a single hidden layer. The model implements the backpropagation algorithm. To optimize the process of updating the weight matrices, it uses the Stochastic Gradient Descent (SGD) algorithm with momentum.
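SGD with momentum keeps a running velocity per parameter, v ← βv − η∇L, then θ ← θ + v, so consistent gradient directions build up speed while noisy ones partly cancel. A sketch of that update applied to a plain linear least-squares model (not the MLP itself) to show the mechanics; `momentum_step`, `beta`, `learning_rate` and the synthetic data are assumptions:

```python
import numpy as np

def momentum_step(params, grads, velocity, learning_rate=0.01, beta=0.9):
    # v <- beta * v - learning_rate * grad ;  theta <- theta + v
    for name in params:
        velocity[name] = beta * velocity[name] - learning_rate * grads[name]
        params[name] += velocity[name]

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
true_w = np.array([[2.0], [-1.0], [0.5]])
y = X @ true_w + 0.3                     # noiseless targets, intercept 0.3

params = {"w": np.zeros((3, 1)), "b": np.zeros((1, 1))}
velocity = {name: np.zeros_like(p) for name, p in params.items()}

for epoch in range(20):
    for i in rng.permutation(len(X)):    # one sample per update (SGD)
        x_i, y_i = X[i:i + 1], y[i:i + 1]
        err = x_i @ params["w"] + params["b"] - y_i   # gradient of 0.5 * squared error
        grads = {"w": x_i.T @ err, "b": err}
        momentum_step(params, grads, velocity)

print(params["w"].ravel(), params["b"].ravel())
```

For an MLP regressor the same `momentum_step` would be applied to all four weight and bias arrays, typically with an identity output unit and squared-error loss in place of softmax and cross-entropy.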

### (i) Hyperparameter tuning over a fixed set of candidate hyperparameter values

### (ii) Report the performance of the model.