# Multilayer perceptron using PyTorch

This notebook shows how to create a multilayer perceptron (MLP) using PyTorch. First, we explain what a multilayer perceptron is, and then we go through the steps of building one in a simple but clear way. The objective is to build a multilayer perceptron that makes predictions.

**What is a multilayer perceptron?**

A multilayer perceptron is a feedforward neural network that consists of at least three layers: an input layer, a hidden layer and an output layer. There can be more than one hidden layer between the input and output layers. In a feedforward neural network, data moves in the forward direction only, that is, from the input layer to the output layer. Each layer consists of nodes called neurons, which are also known as units. Each neuron computes a weighted sum of its inputs and passes it through an activation function, which may be linear or non-linear, to produce an output. This output is fed as an input to the neurons in the next layer. The non-linear activations start at the first hidden layer, that is, the layer immediately after the input layer.  
We use multilayer perceptrons for prediction, recognition and classification tasks. Multilayer perceptrons are able to learn non-linear patterns in data. The diagrams below show two examples: the first diagram is a multilayer perceptron with one hidden layer, while the second diagram is a multilayer perceptron with two hidden layers. Diagram sources: https://www.learnopencv.com/wp-content/uploads/2017/10/mlp-mnist-schematic.jpg, https://becominghuman.ai/multi-layer-perceptron-mlp-models-on-real-world-banking-data-f6dd3d7e998f. From the diagrams, we can see that MLPs are fully connected: each node in one layer connects, with its own weight, to every node in the next layer.




1. <img src="MLP.png" alt="Drawing" style="width: 300px;" /> 

2.  <img src="mm.png" alt="Drawing" style="width: 300px;"/> 
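To make the idea of a fully connected feedforward network concrete, the following is a minimal sketch of a forward pass through one hidden layer, written with plain tensor operations. The layer sizes (3 inputs, 5 hidden units, 2 outputs) and the variable names are assumptions chosen only for illustration.

```python
import torch

torch.manual_seed(0)

# A batch of 4 samples, each with 3 input features
x = torch.randn(4, 3)

# Fully connected: every input connects to every hidden unit with its own weight
W1, b1 = torch.randn(3, 5), torch.zeros(5)
W2, b2 = torch.randn(5, 2), torch.zeros(2)

# Hidden layer: linear function followed by a non-linear activation (ReLU)
hidden = torch.relu(x @ W1 + b1)

# Output layer: linear function only
output = hidden @ W2 + b2

print(hidden.shape, output.shape)
```

Data flows in one direction only: from `x` to `hidden` to `output`.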

**Steps in building a multilayer perceptron (MLP)**

1. **Importing required libraries**

As mentioned earlier, we will build the MLP using PyTorch, so the first step is to import this library. PyTorch is a deep learning library that runs on both GPUs and CPUs. It works with tensors, which are similar to the arrays of the NumPy library but can also be placed on a GPU and can track gradients. PyTorch provides several APIs, that is, functions and classes that perform different tasks. For example, the **torch.utils.data.DataLoader** class makes a dataset iterable in batches.



In [1]:
import torch
from torch import nn
import torchvision.datasets as dsets
import torchvision.transforms as transforms
from torch.utils.data import DataLoader 
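As a small illustration of how tensors relate to NumPy arrays, the sketch below converts an array to a tensor and back, and moves the tensor to a GPU if one is available. The variable names are hypothetical and are not used elsewhere in this notebook.

```python
import numpy as np
import torch

# A NumPy array and the tensor created from it (they share the same memory on CPU)
array = np.array([[1.0, 2.0], [3.0, 4.0]])
tensor = torch.from_numpy(array)

# A CPU tensor can be converted back to a NumPy array
back = tensor.numpy()

# Unlike arrays, tensors can be moved to a GPU when one is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tensor_on_device = tensor.to(device)

print(tensor.dtype, tensor_on_device.device)
```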


2. **Loading dataset**

The second step, after importing the necessary libraries, is to load the data. The data used here is the CIFAR10 dataset obtained from **torchvision**, a companion package to PyTorch that provides several datasets. To access and load our dataset, we use the **torchvision.datasets** module. The **transforms.ToTensor()** transform converts each image into a tensor.



In [27]:
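# Training split: downloaded to the root directory on first use
train_dataset = dsets.CIFAR10('path/to/CIFAR10_root/',
                              train=True,
                              transform=transforms.ToTensor(),
                              download=True)
# Test split: read from the same root directory
test_dataset = dsets.CIFAR10('path/to/CIFAR10_root/',
                             train=False,
                             transform=transforms.ToTensor())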



3. **Making dataset iterable**

In step three, we use the **DataLoader** class from **torch.utils.data** to make our dataset iterable. At this step we set the batch size that will be used in the training of our model. The batch size used here is 200, but any reasonable batch size can be used. We can also shuffle the data at this step if necessary: we set shuffle to True so that the training batches are drawn in a different order in every epoch. Here, we also set the number of iterations, from which we derive the number of epochs used in training.

In [20]:
batch_size = 200
no_iterations = 6000
# One epoch takes len(train_dataset) / batch_size iterations
no_epochs = no_iterations / (len(train_dataset) / batch_size)
no_epochs = int(no_epochs)

train_loader = DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)
# Shuffling does not change the test accuracy, so it is not needed here
test_loader = DataLoader(dataset=test_dataset, batch_size=batch_size, shuffle=False)
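The relation between iterations and epochs can be checked with a few lines of arithmetic. This sketch assumes the standard CIFAR10 training split of 50,000 images; the name `iterations_per_epoch` is introduced here only for illustration.

```python
train_size = 50_000      # number of images in the CIFAR10 training split
batch_size = 200
no_iterations = 6000

# One epoch is one full pass over the training data
iterations_per_epoch = train_size // batch_size
no_epochs = no_iterations // iterations_per_epoch

print(iterations_per_epoch)  # 250
print(no_epochs)             # 24
```

So 6000 iterations with a batch size of 200 correspond to 24 passes over the training set.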

4. **Creating MLP model class**

Now it is time to create our MLP model class. The class contains two parts: the initialization part, which defines the layers, and the forward pass. The forward pass alternates between two steps: applying a linear function and then a non-linear function. Our MLP will have 3 hidden layers and will use two activation functions: **Rectified Linear Unit (ReLU) and tanh**. **nn.Module** is the base class from which PyTorch neural networks inherit.

In [22]:
class MLP(nn.Module):
    def __init__(self, input_dim, no_neurons, output_dim):
        super().__init__()
        # Initialization step: three hidden layers followed by an output layer
        self.fc1 = nn.Linear(input_dim, no_neurons)
        self.non_linear1 = nn.ReLU()
        self.fc2 = nn.Linear(no_neurons, no_neurons)
        self.non_linear2 = nn.ReLU()
        self.fc3 = nn.Linear(no_neurons, no_neurons)
        self.non_linear3 = nn.Tanh()
        self.fc4 = nn.Linear(no_neurons, output_dim)

    def forward(self, x):
        # Each hidden layer applies a linear function, then a non-linear one
        out = self.fc1(x)
        out = self.non_linear1(out)
        out = self.fc2(out)
        out = self.non_linear2(out)
        out = self.fc3(out)
        out = self.non_linear3(out)
        # The output layer returns raw scores, one per class
        out = self.fc4(out)
        return out
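As a quick sanity check of the architecture, the sketch below expresses the same stack of layers with `nn.Sequential` and passes a dummy batch through it. The use of `nn.Sequential`, the batch of 8 random images and the hard-coded sizes (3072 inputs, 500 hidden units, 10 outputs) are assumptions for illustration; this is an alternative way of writing the network, not a replacement for the class above.

```python
import torch
from torch import nn

# Same layer order as the MLP class: three hidden layers, then the output layer
sequential_mlp = nn.Sequential(
    nn.Linear(32 * 32 * 3, 500), nn.ReLU(),
    nn.Linear(500, 500), nn.ReLU(),
    nn.Linear(500, 500), nn.Tanh(),
    nn.Linear(500, 10),
)

# A dummy batch of 8 images with 3 channels of 32x32 pixels
dummy_images = torch.randn(8, 3, 32, 32)

# nn.Linear expects one flat vector per sample, so flatten each image first
flat_images = dummy_images.view(-1, 32 * 32 * 3)
logits = sequential_mlp(flat_images)

print(flat_images.shape, logits.shape)
```

Each sample produces 10 scores, one per class.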



**Instantiating model class**

Here we specify the size of our images, which is the input dimension. CIFAR10 images have a width of 32, a height of 32 and 3 channels, so each flattened image has 32*32*3 = 3072 values. 
We specify the number of neurons/units in each hidden layer; this can be any number.
We specify the number of outputs, which depends on the number of labels or categories in the target variable. CIFAR10 has 10 labels, so our output dimension will be 10.
We also instantiate our model class here.

In [23]:
input_dim = 32*32*3
no_neurons = 500
output_dim = 10
model = MLP(input_dim, no_neurons, output_dim)
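With these dimensions, the number of trainable parameters can be worked out by hand: each linear layer has one weight per input-output pair plus one bias per output. The sketch below does this count in plain Python; `layer_sizes` is a name introduced here for illustration.

```python
# Input, three hidden layers and output, as configured above
layer_sizes = [32 * 32 * 3, 500, 500, 500, 10]

# Each linear layer has n_in * n_out weights and n_out biases
params_per_layer = [
    n_in * n_out + n_out
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]
total_params = sum(params_per_layer)

print(params_per_layer)  # [1536500, 250500, 250500, 5010]
print(total_params)      # 2042510
```

Most of the parameters sit in the first layer, because the flattened image has 3072 values.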

**Instantiating loss function**

We take the loss function from the **torch.nn** module. Here we use CrossEntropyLoss since we are predicting classes. The loss function measures how well or how poorly our model performs: it quantifies the difference between the predicted output and the actual output. A large loss implies that our model performs poorly. More precisely, CrossEntropyLoss applies a log-softmax to the raw outputs of the model and then computes the negative log-likelihood of the correct class, which is why the model has no softmax layer of its own.  

In [24]:
criterion = nn.CrossEntropyLoss()
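To show what the loss computes, here is a minimal sketch that compares `nn.CrossEntropyLoss` with a manual calculation on made-up scores for two samples and three classes. The numbers and variable names are assumptions for illustration only.

```python
import torch
from torch import nn
import torch.nn.functional as F

# Made-up raw scores for 2 samples and 3 classes, with their true labels
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
labels = torch.tensor([0, 2])

# Built-in loss, applied directly to the raw scores
loss_builtin = nn.CrossEntropyLoss()(logits, labels)

# Manual version: log-softmax, pick the log-probability of the true class,
# negate it and average over the batch
log_probs = F.log_softmax(logits, dim=1)
loss_manual = -log_probs[torch.arange(len(labels)), labels].mean()

print(loss_builtin.item(), loss_manual.item())
```

The two values agree, which is why the model can return raw scores without a softmax layer.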

**Instantiating optimizer function**

We use Stochastic Gradient Descent (SGD) from the **torch.optim** module, which updates the weights using the gradients of the loss. We set a learning rate, which determines the size of the steps taken in gradient descent. If the learning rate is too large, gradient descent may diverge instead of converging; if it is too small, gradient descent may take a long time to converge. Therefore, the learning rate should be chosen carefully.

In [25]:
learning_rate = 0.1
optimizer = torch.optim.SGD(model.parameters(),lr = learning_rate)
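The update that plain SGD performs is: new weight = old weight - learning rate * gradient. The following sketch checks this on a toy problem with two weights and a simple quadratic loss, both of which are assumptions chosen so that the gradient is easy to verify by hand.

```python
import torch

# Two toy weights and the same learning rate as above
w = torch.tensor([1.0, -2.0], requires_grad=True)
toy_optimizer = torch.optim.SGD([w], lr=0.1)

# Toy loss: sum of squares, whose gradient is 2 * w
toy_loss = (w ** 2).sum()
toy_loss.backward()

# Manual update computed before the optimizer changes w
expected = w.detach() - 0.1 * w.grad

# The optimizer applies the same rule in place
toy_optimizer.step()

print(w.detach(), expected)
```

Both results are `[0.8, -1.6]`: each weight moved a step of size 0.1 against its gradient.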

**Training and testing our model**

This is the final part, in which we train our model and test it. We bring together all the objects defined above and iterate over the training data, measuring the accuracy on the test set every 500 iterations.

In [26]:

Iter = 0
for epoch in range(no_epochs):
    # enumerate assigns an index to each batch in the loader
    for i, (images, labels) in enumerate(train_loader):
        # Flatten each image into a vector of 32*32*3 values
        images = images.view(-1, 32*32*3)

        # Clear gradients with respect to parameters
        optimizer.zero_grad()

        # Forward pass to get outputs
        outputs = model(images)

        # Calculate loss
        loss = criterion(outputs, labels)

        # Backward pass to compute gradients
        loss.backward()

        # Update parameters
        optimizer.step()

        Iter += 1

        if Iter % 500 == 0:
            # Calculate accuracy on the test set
            correct = 0
            total = 0
            # Gradients are not needed for evaluation
            with torch.no_grad():
                for images, labels in test_loader:
                    # Flatten the test images in the same way
                    images = images.view(-1, 32*32*3)

                    # Forward pass to get outputs
                    outputs = model(images)

                    # Get predictions from the maximum value
                    _, predicted = torch.max(outputs, 1)

                    # Total number of labels
                    total += labels.size(0)

                    # Total correct predictions
                    correct += (predicted == labels).sum().item()

            accuracy = 100 * correct / total

            print('Iteration: {}. Loss: {}. Accuracy: {}'.format(Iter, loss.item(), accuracy))

            # Optionally save the model or its parameters
            #torch.save(model, path)
            #torch.save(model.state_dict(),path)

Iteration: 500. Loss: 1.7275830507278442. Accuracy: 37.130001068115234
Iteration: 1000. Loss: 1.6027252674102783. Accuracy: 39.619998931884766
Iteration: 1500. Loss: 1.5468699932098389. Accuracy: 45.63999938964844
Iteration: 2000. Loss: 1.43499755859375. Accuracy: 46.90999984741211
Iteration: 2500. Loss: 1.5316741466522217. Accuracy: 46.07999801635742
Iteration: 3000. Loss: 1.4730250835418701. Accuracy: 45.12000274658203
Iteration: 3500. Loss: 1.3327068090438843. Accuracy: 49.95000076293945
Iteration: 4000. Loss: 1.4402254819869995. Accuracy: 50.790000915527344
Iteration: 4500. Loss: 1.4370384216308594. Accuracy: 47.77000045776367
Iteration: 5000. Loss: 1.251592993736267. Accuracy: 50.55999755859375
Iteration: 5500. Loss: 1.1726099252700806. Accuracy: 52.1099967956543
Iteration: 6000. Loss: 1.2683820724487305. Accuracy: 50.660003662109375
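The accuracy calculation inside the training loop can also be factored into a small helper. The sketch below is one possible version, run on a tiny random dataset and a single linear layer so that it is self-contained; the function name `evaluate`, the fake data and the stand-in model are all assumptions for illustration.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate(model, loader, input_dim):
    """Return the accuracy (in percent) of model on the data in loader."""
    correct, total = 0, 0
    # No gradients are needed when only measuring accuracy
    with torch.no_grad():
        for images, labels in loader:
            outputs = model(images.view(-1, input_dim))
            predicted = outputs.argmax(dim=1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    return 100.0 * correct / total

# Tiny random stand-in for a test set: 40 images of 3x4x4 values, 10 classes
torch.manual_seed(0)
fake_images = torch.randn(40, 3, 4, 4)
fake_labels = torch.randint(0, 10, (40,))
fake_loader = DataLoader(TensorDataset(fake_images, fake_labels), batch_size=8)

# Untrained stand-in model
fake_model = nn.Linear(3 * 4 * 4, 10)
fake_accuracy = evaluate(fake_model, fake_loader, 3 * 4 * 4)
print(fake_accuracy)
```

With the real objects from this notebook, the call would be `evaluate(model, test_loader, 32*32*3)`.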


**Conclusion**

Finally, we have our accuracy result, which is 50.66%. This is well above the 10% expected from random guessing over 10 classes, but it is still not a good accuracy. There are other things we can consider in order to improve it, such as tuning the hyperparameters and applying regularization. At this point, we have achieved our objective, which was to build a multilayer perceptron using PyTorch.
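As a starting point for the regularization mentioned above, the following sketch adds dropout after the first two hidden layers and weight decay (L2 regularization) to the optimizer. The class name `RegularizedMLP`, the dropout probability of 0.2 and the weight decay of 5e-4 are hypothetical choices that were not tested in this notebook, so whether they improve the accuracy remains to be checked.

```python
import torch
from torch import nn

class RegularizedMLP(nn.Module):
    """Same layer sizes as the MLP above, with dropout added (hypothetical variant)."""
    def __init__(self, input_dim, no_neurons, output_dim, dropout_p=0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, no_neurons), nn.ReLU(), nn.Dropout(dropout_p),
            nn.Linear(no_neurons, no_neurons), nn.ReLU(), nn.Dropout(dropout_p),
            nn.Linear(no_neurons, no_neurons), nn.Tanh(),
            nn.Linear(no_neurons, output_dim),
        )

    def forward(self, x):
        return self.net(x)

reg_model = RegularizedMLP(32 * 32 * 3, 500, 10)

# weight_decay adds an L2 penalty on the weights to each update
reg_optimizer = torch.optim.SGD(reg_model.parameters(), lr=0.1, weight_decay=5e-4)

# Dropout is active in train mode and must be switched off for evaluation
reg_model.eval()
with torch.no_grad():
    reg_logits = reg_model(torch.randn(4, 32 * 32 * 3))
print(reg_logits.shape)
```

When training such a model, call `reg_model.train()` before the training batches and `reg_model.eval()` before measuring accuracy.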