# Multilayer Perceptron (MLP)

1. This notebook implements a neural network from scratch to show how neural networks work, an understanding that is essential for designing effective models. We create a multilayer perceptron (MLP), i.e. a neural network with input, hidden, and output layers.

2. We then show how easily an MLP can be implemented with scikit-learn's built-in modules, which take care of all the complex linear algebra.


------------------------------------------------------------------------------------------------------------------------------------------------

# MLP from scratch using NumPy with back propagation

In [1]:
import numpy as np
from random import random

Here, we will use:
1. One input layer with a variable number of input nodes
2. A variable number of hidden layers with a variable number of nodes in each
3. One output layer with a variable number of output nodes

The default for the MLP class below is set to 3 input nodes, 2 hidden layers with 3 nodes in each hidden layer, and 2 output nodes. 
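To make the layer layout concrete, here is a minimal sketch of the weight-matrix shapes implied by these default sizes. It only mirrors how consecutive layer sizes pair up; the variable names are illustrative and not part of the class.

```python
import numpy as np

# default layer sizes: 3 inputs, two hidden layers of 3 nodes, 2 outputs
layers = [3] + [3, 3] + [2]

# one weight matrix connects each pair of consecutive layers
weight_shapes = [(layers[i], layers[i + 1]) for i in range(len(layers) - 1)]
print(weight_shapes)  # [(3, 3), (3, 3), (3, 2)]

# chaining vector-matrix products takes a 3-vector to a 2-vector
x = np.ones(layers[0])
for rows, cols in weight_shapes:
    x = x @ np.random.rand(rows, cols)
print(x.shape)  # (2,)
```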

In [2]:
# We create an MLP class.
class MLP(object):
    """A Multilayer Perceptron class.
    """

    def __init__(self, num_inputs=3, hidden_layers=None, num_outputs=2):
        """Constructor for the MLP. Takes the number of inputs,
            a variable number of hidden layers, and number of outputs

        Args:
            num_inputs (int): Number of inputs
            hidden_layers (list): A list of ints for the hidden layers
                (defaults to [3, 3])
            num_outputs (int): Number of outputs
        """

        # avoid a mutable default argument; fall back to two hidden layers of 3 nodes
        if hidden_layers is None:
            hidden_layers = [3, 3]

        self.num_inputs = num_inputs
        self.hidden_layers = hidden_layers
        self.num_outputs = num_outputs

        # create a generic representation of the layers
        layers = [num_inputs] + hidden_layers + [num_outputs]

        # create random connection weights for the layers
        weights = []
        for i in range(len(layers) - 1):
            w = np.random.rand(layers[i], layers[i + 1])
            weights.append(w)
        self.weights = weights

        # save derivatives per layer
        derivatives = []
        for i in range(len(layers) - 1):
            d = np.zeros((layers[i], layers[i + 1]))
            derivatives.append(d)
        self.derivatives = derivatives

        # save activations per layer
        activations = []
        for i in range(len(layers)):
            a = np.zeros(layers[i])
            activations.append(a)
        self.activations = activations


    def forward_propagate(self, inputs):
        """Computes forward propagation of the network based on input signals.

        Args:
            inputs (ndarray): Input signals
        Returns:
            activations (ndarray): Output values
        """

        # the input layer activation is just the input itself
        activations = inputs

        # save the activations for backpropagation
        self.activations[0] = activations

        # iterate through the network layers
        for i, w in enumerate(self.weights):
            # calculate matrix multiplication between previous activation and weight matrix
            net_inputs = np.dot(activations, w)

            # apply sigmoid activation function
            activations = self._sigmoid(net_inputs)

            # save the activations for backpropagation
            self.activations[i + 1] = activations

        # return output layer activation
        return activations


    def back_propagate(self, error):
        """Backpropagates an error signal.
        Args:
            error (ndarray): The error to backprop.
        Returns:
            error (ndarray): The final error of the input
        """

        # iterate backwards through the network layers
        for i in reversed(range(len(self.derivatives))):

            # get activations of the layer these weights feed into (layer i+1)
            activations = self.activations[i+1]

            # apply sigmoid derivative function (expects sigmoid outputs, not net inputs)
            delta = error * self._sigmoid_derivative(activations)

            # reshape delta into a 2d row matrix
            delta_re = delta.reshape(delta.shape[0], -1).T

            # get activations of the layer feeding these weights (layer i)
            current_activations = self.activations[i]

            # reshape activations as to have them as a 2d column matrix
            current_activations = current_activations.reshape(current_activations.shape[0],-1)

            # save derivative after applying matrix multiplication
            self.derivatives[i] = np.dot(current_activations, delta_re)

            # backpropagate the next error
            error = np.dot(delta, self.weights[i].T)

        # return the error propagated back to the input layer
        return error


    def train(self, inputs, targets, epochs, learning_rate):
        """Trains model running forward prop and backprop
        Args:
            inputs (ndarray): X
            targets (ndarray): Y
            epochs (int): Num. epochs we want to train the network for
            learning_rate (float): Step to apply to gradient descent
        """
        # now enter the training loop
        for i in range(epochs):
            sum_errors = 0

            # iterate through all the training data
            for j, input in enumerate(inputs):
                target = targets[j]

                # activate the network!
                output = self.forward_propagate(input)

                error = target - output

                self.back_propagate(error)

                # now perform gradient descent on the derivatives
                # (this will update the weights)
                self.gradient_descent(learning_rate)

                # keep track of the MSE for reporting later
                sum_errors += self._mse(target, output)

            # Epoch complete, report the training error
            print("Error: {} at epoch {}".format(sum_errors / len(inputs), i+1))

        print("Training complete!")
        print("=====")


    def gradient_descent(self, learningRate=1):
        """Learns by descending the gradient
        Args:
            learningRate (float): How fast to learn.
        """
        # update the weights by stepping down the gradient; the stored derivatives
        # come from error = target - output, so adding them reduces the error
        for i in range(len(self.weights)):
            weights = self.weights[i]
            derivatives = self.derivatives[i]
            weights += derivatives * learningRate


    def _sigmoid(self, x):
        """Sigmoid activation function
        Args:
            x (float): Value to be processed
        Returns:
            y (float): Output
        """

        y = 1.0 / (1 + np.exp(-x))
        return y


    def _sigmoid_derivative(self, x):
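        """Sigmoid derivative function, expressed in terms of the sigmoid output
        Args:
            x (float): Sigmoid output, i.e. a value already passed through _sigmoid
        Returns:
            y (float): Derivative of the sigmoid at that point
        """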
        return x * (1.0 - x)


    def _mse(self, target, output):
        """Mean Squared Error loss function
        Args:
            target (ndarray): The ground truth
            output (ndarray): The predicted values
        Returns:
            (float): Output
        """
        return np.average((target - output) ** 2)
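As a sanity check on the backpropagation step, the sketch below compares the derivative formula used in `back_propagate` against a finite-difference gradient for a single layer. It is a standalone illustration, not part of the class: the helper names `sigmoid` and `loss` are hypothetical, and the loss is assumed to be half the squared error, which is the loss whose negative gradient matches `error = target - output`.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def loss(W, x, t):
    # half squared error (assumed); its gradient w.r.t. the output is -(t - o)
    o = sigmoid(x @ W)
    return 0.5 * np.sum((t - o) ** 2)

rng = np.random.default_rng(0)
x = rng.random(2)          # one input sample with 2 features
t = np.array([0.4])        # target
W = rng.random((2, 1))     # weights of a single layer

# backprop-style derivative, following the same formula as back_propagate
o = sigmoid(x @ W)
delta = (t - o) * o * (1.0 - o)
analytic = np.outer(x, delta)

# central finite differences of the loss w.r.t. each weight
eps = 1e-6
numeric = np.zeros_like(W)
for idx in np.ndindex(*W.shape):
    step = np.zeros_like(W)
    step[idx] = eps
    numeric[idx] = (loss(W + step, x, t) - loss(W - step, x, t)) / (2 * eps)

# the stored derivative is the negative gradient, which is why
# gradient_descent adds it to the weights instead of subtracting it
print(np.allclose(analytic, -numeric, atol=1e-8))
```

If the two disagree after a change to the network code, the backward pass is the first place to look.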

Create a dataset to train the network

In [3]:
# create a dataset to train a network for the sum operation
items = np.array([[random()/2 for _ in range(2)] for _ in range(1000)])
targets = np.array([[i[0] + i[1]] for i in items])

Create a Multilayer Perceptron with one hidden layer

In [4]:
mlp = MLP(2, [5], 1)

Train the network

In [5]:
mlp.train(items, targets, 50, 0.1)

Error: 0.04936473020005759 at epoch 1
Error: 0.04080244056294107 at epoch 2
Error: 0.040611124321341806 at epoch 3
Error: 0.040364902600220263 at epoch 4
Error: 0.040042706417859765 at epoch 5
Error: 0.0396183139560821 at epoch 6
Error: 0.03905930363631688 at epoch 7
Error: 0.038326608045949714 at epoch 8
Error: 0.03737521684366198 at epoch 9
Error: 0.03615698040315813 at epoch 10
Error: 0.034626650468516894 at epoch 11
Error: 0.03275191408971237 at epoch 12
Error: 0.030526634382322965 at epoch 13
Error: 0.027983639293044944 at epoch 14
Error: 0.02520059178042336 at epoch 15
Error: 0.022292896386748133 at epoch 16
Error: 0.019393318345151355 at epoch 17
Error: 0.016625900556214247 at epoch 18
Error: 0.014085064593540244 at epoch 19
Error: 0.01182663651559623 at epoch 20
Error: 0.009870323504602924 at epoch 21
Error: 0.008208702760888807 at epoch 22
Error: 0.006817598914827156 at epoch 23
Error: 0.00566478371149638 at epoch 24
Error: 0.004716009027442522 at epoch 25
Error: 0.00393859437

The error decreases with each epoch. Training for more epochs can reduce it further, at the cost of a longer runtime, which grows with the complexity of the problem.

Create dummy data

In [6]:
# create dummy data
input = np.array([0.3, 0.1])
target = np.array([0.4])

Getting predictions

In [7]:
# get a prediction
output = mlp.forward_propagate(input)

print()
print("According to the network, {} + {} is equal to {}".format(input[0], input[1], output[0]))


According to the network, 0.3 + 0.1 is equal to 0.4030851007239451


This is an example of an MLP with forward and back propagation.

### Now that we know how to build an MLP from scratch using just NumPy, let us see what **scikit-learn** has to offer.

-----------------------------------------------------------------------------------------------------------------------------------------------

# MLP using scikit-learn

The MLPClassifier class implements an MLP that trains using backpropagation.

For regression tasks, the MLPRegressor class implements an MLP that trains using backpropagation with no activation function in the output layer, which can also be seen as using the identity function as the activation function. It therefore uses the squared error as the loss function, and the output is a set of continuous values. MLPRegressor also supports multi-output regression, in which a sample can have more than one target.

https://scikit-learn.org/stable/modules/neural_networks_supervised.html

https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html#sklearn.neural_network.MLPClassifier

https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPRegressor.html#sklearn.neural_network.MLPRegressor
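As a rough illustration of MLPRegressor, the sketch below fits it to the same kind of sum data used in the from-scratch section. The dataset size, hidden layer size, solver, and iteration limit are illustrative assumptions, not tuned or recommended values.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# same kind of data as the from-scratch example: two numbers and their sum
rng = np.random.default_rng(0)
X = rng.random((1000, 2)) / 2
y = X.sum(axis=1)

# hyperparameters here are illustrative choices, not tuned values
reg = MLPRegressor(hidden_layer_sizes=(5,), solver='lbfgs',
                   max_iter=1000, random_state=1)
reg.fit(X, y)

# the outputs are continuous values, one per sample
pred = reg.predict([[0.3, 0.1], [0.2, 0.2]])
print(pred)
```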

In [8]:
from sklearn.neural_network import MLPClassifier

In [9]:
X = [[0., 0.], [1., 1.]]
y = [0, 1]

clf = MLPClassifier(solver='lbfgs', alpha=1e-5, hidden_layer_sizes=(5, 2), random_state=1)
clf.fit(X, y)

After fitting (training), the model can predict labels for new samples.

In [10]:
clf.predict([[2., 2.], [-1., -2.]])

array([1, 0])

In [11]:
[coef.shape for coef in clf.coefs_]

[(2, 5), (5, 2), (2, 1)]

MLPClassifier supports only the Cross-Entropy loss function, which allows probability estimates by running the predict_proba method.

MLP trains using Backpropagation. More precisely, it trains using some form of gradient descent and the gradients are calculated using Backpropagation. For classification, it minimizes the Cross-Entropy loss function, giving a vector of probability estimates.
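To illustrate what the cross-entropy loss measures, here is a minimal NumPy sketch. The function `cross_entropy` and the probability values are made up for illustration; this is not scikit-learn's internal implementation, and it ignores the regularization term controlled by `alpha`.

```python
import numpy as np

def cross_entropy(y_true, proba, eps=1e-12):
    """Average cross-entropy between one-hot labels and predicted probabilities."""
    # clip to avoid log(0)
    proba = np.clip(proba, eps, 1.0)
    return -np.mean(np.sum(y_true * np.log(proba), axis=1))

# two samples, two classes; labels are one-hot encoded
y_true = np.array([[1, 0], [0, 1]])

confident = np.array([[0.9, 0.1], [0.2, 0.8]])
unsure = np.array([[0.6, 0.4], [0.5, 0.5]])

# probability estimates closer to the true labels give a lower loss
print(cross_entropy(y_true, confident) < cross_entropy(y_true, unsure))  # True
```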

In [12]:
clf.predict_proba([[2., 2.], [1., 2.]])

array([[1.96718015e-04, 9.99803282e-01],
       [1.96718015e-04, 9.99803282e-01]])

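MLPClassifier supports multi-class classification by applying Softmax as the output function. It also supports multi-label classification, in which a sample can belong to more than one class; the example with `y = [[0, 1], [1, 1]]` that follows is of this multi-label kind.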
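The softmax output function mentioned above can be sketched in NumPy as follows. The scores are made-up numbers standing in for raw output-layer values, and the helper `softmax` is written for illustration rather than taken from scikit-learn.

```python
import numpy as np

def softmax(z):
    # subtract the row maximum for numerical stability
    z = z - np.max(z, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

# raw output-layer scores for two samples and three classes (made-up numbers)
scores = np.array([[2.0, 1.0, 0.1],
                   [0.5, 0.5, 0.5]])

proba = softmax(scores)
print(proba.sum(axis=1))     # each row sums to 1
print(proba.argmax(axis=1))  # index of the most probable class per sample
```

Because each row sums to 1, softmax suits problems where every sample belongs to exactly one class.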

In [13]:
X = [[0., 0.], [1., 1.]]
y = [[0, 1], [1, 1]]

clf = MLPClassifier(solver='lbfgs', alpha=1e-5, hidden_layer_sizes=(15,), random_state=1)
clf.fit(X, y)

clf.predict([[1., 2.]])

array([[1, 1]])

In [14]:
clf.predict([[0., 0.]])

array([[0, 1]])