The model works as follows:
-It takes an input batch X1 (100,784).
-X1 is multiplied by the weight matrix W1 (784,500), which maps each input to 500 dimensions.
-The bias B1 (500), initialized to zeros, is added to the result, which is saved in Y1 (100,500).
-The model is made non-linear by applying the ReLU activation function to Y1; the result is saved in X2 (100,500).
-X2 is then multiplied by a second set of weights W2 (500,10), and another bias B2 (10) is added.
-The result is saved in Y2 (100,10).
-Y2 is passed through the softmax activation function (built into the cross-entropy loss) to calculate the probability of each class.
-The gradients of the loss with respect to all parameters that require gradients are computed, and the optimizer uses them to update the parameters.
*This is the end of the training phase.*
-In the testing phase, the labels are compared with the predictions and the accuracy of the model is calculated.
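
The shapes described above can be traced with random stand-in data. This is a minimal sketch, assuming a batch of 100 random inputs in place of MNIST images. It also checks that `F.cross_entropy` gives the same value as log-softmax followed by the negative log-likelihood loss, which is why no explicit softmax appears in the model.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)  # only to make the sketch repeatable

# Random stand-ins for one mini-batch (assumption: not real MNIST data)
X1 = torch.rand(100, 784)
labels = torch.randint(0, 10, (100,))

W1 = torch.randn(784, 500) * 2 / 784 ** 0.5
B1 = torch.zeros(500)
W2 = torch.randn(500, 10) * 2 / 500 ** 0.5
B2 = torch.zeros(10)

Y1 = X1 @ W1 + B1   # (100, 500)
X2 = F.relu(Y1)     # (100, 500), negative entries clipped to zero
Y2 = X2 @ W2 + B2   # (100, 10), raw class scores (logits)

# cross_entropy applies log-softmax internally before the NLL loss
loss = F.cross_entropy(Y2, labels)
loss_manual = F.nll_loss(F.log_softmax(Y2, dim=1), labels)

print(Y1.shape, X2.shape, Y2.shape)
print(torch.allclose(loss, loss_manual))
```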

Steps:
1)Import the dependencies
2)Load the data into training and testing sets
3)Divide it into mini-batches
*Up to this point everything is the same as in logistic regression*
4)Initialize the parameters - 2 sets of weights and 2 sets of biases - and make sure to set requires_grad = True for all of them
5)Assign the optimizer - SGD
6)Train the model by iterating through the mini-batches
7)Test the model using the test mini-batches
8)Calculate the accuracy

In [4]:
#Importing Dependencies
import numpy as np
import matplotlib.pyplot as plt
from tqdm.notebook import tqdm
import torch
from torchvision import datasets, transforms
import torch.nn.functional as F
        

#Loading Data
mnist_train = datasets.MNIST(root = './datasets', train = True, transform = transforms.ToTensor(), download = True)
mnist_test = datasets.MNIST(root = './datasets', train = False, transform = transforms.ToTensor(), download = True)

#Distributing Data
train_loader = torch.utils.data.DataLoader(mnist_train, batch_size = 100, shuffle = True)
test_loader = torch.utils.data.DataLoader(mnist_test, batch_size = 100, shuffle = False)

#Initialize Parameters 
W1 = (torch.randn(784,500))*2/np.sqrt(784)
W1.requires_grad_()
W2 = (torch.randn(500,10))*2/np.sqrt(500)
W2.requires_grad_()
B1 = torch.zeros(500, requires_grad = True)
B2 = torch.zeros(10, requires_grad = True)

#Optimizer
optimizer = torch.optim.SGD([W1,W2,B1,B2], lr = 0.7)

#Training 
print('Traning Phase...')
for images,labels in tqdm(train_loader):
    #clear the gradients
    optimizer.zero_grad()
    
    #Forward pass
    X1 = images.view(-1, 28*28)
    Y1 = torch.matmul(X1,W1) + B1
    #Relu
    X2 = torch.max(torch.zeros_like(Y1),Y1)
    
    Y2 = torch.matmul(X2,W2) + B2
    
    #Calculating loss
    ce_loss = F.cross_entropy(Y2,labels)
    
    #Backward pass
    ce_loss.backward()
    optimizer.step()
print('Training Over.\n')

#Testing
correct = 0
total = len(mnist_test)

print('Testing Phase...')
for image, label in tqdm(test_loader):
    x1 = image.view(-1,28*28)
    y1 = torch.matmul(x1,W1) + B1
    x2 = F.relu(y1)
    y2 = torch.matmul(x2, W2) + B2
    pred = torch.argmax(y2, dim = 1)
    correct += torch.sum((pred == label).float())
print('Testing Over.\n')

#Calculating Accuracy
print('The accuracy of the multi-layered perceptron model is: {} %'.format(correct*100/total))

Traning Phase...



Training Over.

Testing Phase...



Testing Over.

The accuracy of the multi-layered perceptron model is: 96.1500015258789 %
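
Gradient tracking is not needed while testing. A minimal sketch, assuming a hypothetical helper named `count_correct` and random stand-in tensors instead of the MNIST test loader: it wraps the forward pass in `torch.no_grad()` and accumulates the count as a plain Python integer, so the final accuracy prints as an ordinary float.

```python
import torch
import torch.nn.functional as F

def count_correct(W1, B1, W2, B2, batches):
    """Count correct predictions over (images, labels) batches (hypothetical helper)."""
    correct = 0
    with torch.no_grad():  # no autograd graph is built inside this block
        for images, labels in batches:
            x1 = images.view(-1, 28 * 28)
            x2 = F.relu(x1 @ W1 + B1)
            y2 = x2 @ W2 + B2
            pred = y2.argmax(dim=1)
            correct += (pred == labels).sum().item()  # .item() gives a Python int
    return correct

torch.manual_seed(0)
W1 = (torch.randn(784, 500) * 2 / 784 ** 0.5).requires_grad_()
B1 = torch.zeros(500, requires_grad=True)
W2 = (torch.randn(500, 10) * 2 / 500 ** 0.5).requires_grad_()
B2 = torch.zeros(10, requires_grad=True)

# Two random batches standing in for test_loader (assumption)
batches = [(torch.rand(100, 1, 28, 28), torch.randint(0, 10, (100,))) for _ in range(2)]
correct = count_correct(W1, B1, W2, B2, batches)
total = sum(len(labels) for _, labels in batches)
print('Accuracy: {:.2f} %'.format(100 * correct / total))
```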


Using Classes

In [7]:
#Importing Dependencies
import numpy as np
import matplotlib.pyplot as plt
from tqdm.notebook import tqdm
import torch
from torchvision import datasets, transforms
import torch.nn.functional as F
        

#Loading Data
mnist_train = datasets.MNIST(root = './datasets', train = True, transform = transforms.ToTensor(), download = True)
mnist_test = datasets.MNIST(root = './datasets', train = False, transform = transforms.ToTensor(), download = True)

#Distributing Data
train_loader = torch.utils.data.DataLoader(mnist_train, batch_size = 100, shuffle = True)
test_loader = torch.utils.data.DataLoader(mnist_test, batch_size = 100, shuffle = False)

#Defining the training classes
class ModelTrain:
    
    def __init__(self,row,col):
        self.W = (torch.randn(row,col))*2/np.sqrt(row)
        self.W.requires_grad_()
        self.B = torch.zeros(col, requires_grad = True)
        
    def forward(self,x):
        return torch.matmul(x,self.W) + self.B
    
#Instantiating the layers of the model
layer_1 = ModelTrain(784,500)
layer_2 = ModelTrain(500,10)

#Optimizer
optimizer = torch.optim.SGD([layer_1.W,layer_1.B,layer_2.W,layer_2.B], lr = 0.7)

#Training 
print('Traning Phase...')

for images,labels in tqdm(train_loader):
    #clear the gradients
    optimizer.zero_grad()
    
    #Forward pass
    X1 = images.view(-1, 28*28)
    
    #Layer-1
    Y1 = layer_1.forward(X1)
    
    #Relu
    X2 = F.relu(Y1)
    
    #Layer-2
    Y2 = layer_2.forward(X2)
    
    #Calculating loss
    ce_loss = F.cross_entropy(Y2,labels)
    
    #Backward pass
    ce_loss.backward()
    optimizer.step()
    
print('Training Over.\n')

#Testing
correct = 0
total = len(mnist_test)

print('Testing Phase...')

for image, label in tqdm(test_loader):
    
    X1 = image.view(-1,28*28)
    Y1 = layer_1.forward(X1)
    
    X2 = F.relu(Y1)
    Y2 = layer_2.forward(X2)
    
    pred = torch.argmax(Y2, dim = 1)
    correct += torch.sum((pred == label).float())
    
print('Testing Over.\n')

#Calculating Accuracy
print('The accuracy of the multi-layered perceptron model is: {} %'.format(correct*100/total))

Traning Phase...



Training Over.

Testing Phase...



Testing Over.

The accuracy of the multi-layered perceptron model is: 96.04000091552734 %
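
The optimizer call above lists each tensor by hand. One possible refinement lets each layer report its own trainable tensors, so the list passed to SGD is built automatically. The sketch below uses a hypothetical class name `Layer` and a hypothetical `parameters()` method, neither of which is in the original code.

```python
import torch

class Layer:
    """Hand-written linear layer; same initialization scheme as ModelTrain."""
    def __init__(self, row, col):
        self.W = (torch.randn(row, col) * 2 / row ** 0.5).requires_grad_()
        self.B = torch.zeros(col, requires_grad=True)

    def forward(self, x):
        return torch.matmul(x, self.W) + self.B

    def parameters(self):
        # Hypothetical helper: the tensors the optimizer should update
        return [self.W, self.B]

layers = [Layer(784, 500), Layer(500, 10)]
# Flatten the per-layer lists into one list for the optimizer
params = [p for layer in layers for p in layer.parameters()]
optimizer = torch.optim.SGD(params, lr=0.7)
print(len(params))
```

This is the same idea that `nn.Module.parameters()` provides in the higher-level API.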


Using Higher Level API (torch.nn module)

In [31]:
#Importing dependencies
import numpy as np
import matplotlib.pyplot as plt
from tqdm.notebook import tqdm
import torch
from torchvision import datasets,transforms
import torch.nn as nn
import torch.nn.functional as F

#Loading datasets
mnist_train = datasets.MNIST(root = './databases', train= True, transform = transforms.ToTensor(), download = True)
mnist_test = datasets.MNIST(root = './databases', train= False, transform = transforms.ToTensor(), download = True)

#Distributing Data
train_loader = torch.utils.data.DataLoader(mnist_train, batch_size = 100, shuffle = True)
test_loader = torch.utils.data.DataLoader(mnist_test, batch_size = 100, shuffle = False)

#Defining the model
class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer_1 = nn.Linear(784,500)
        self.layer_2 = nn.Linear(500,10)
    
    def layer1(self,x):
        y = self.layer_1(x)
        return F.relu(y)
    
    def layer2(self,x):
        return self.layer_2(x)
    
#Instantiating the model
model = MLP()

#Optimizer
optimizer = torch.optim.SGD(model.parameters(), lr = 0.6)

#Training
print('Training...')

for images,labels in tqdm(train_loader):
    optimizer.zero_grad()
    x1 = images.view(-1,28*28)
    x2 = model.layer1(x1)
    y = model.layer2(x2)
    
    loss = F.cross_entropy(y,labels)
    loss.backward()
    optimizer.step()
    
print('Training Over!')

#Testing
print('Testing...')

correct = 0
total = len(mnist_test)

for images, labels in tqdm(test_loader):
    x1 = images.view(-1,28*28)
    x2 = model.layer1(x1)
    y = model.layer2(x2)
    
    pred = torch.argmax(y, dim = 1)
    correct += torch.sum((pred == labels).float())
    
print('Testing Over!')

print('The accuracy of the Multi-Layered Perceptron model is: {} %'.format(correct*100/total))

Training...



Training Over!
Testing...



Testing Over!
The accuracy of the Multi-Layered Perceptron model is: 96.44000244140625 %
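
The `layer1`/`layer2` methods work, but `nn.Module` conventionally puts the whole forward pass in a `forward` method so that the model can be called directly. A minimal sketch of that variant, assuming the same layer sizes; the random input batch is a stand-in for MNIST images.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer_1 = nn.Linear(784, 500)
        self.layer_2 = nn.Linear(500, 10)

    def forward(self, x):
        x = x.view(-1, 28 * 28)        # flatten each image
        x = F.relu(self.layer_1(x))    # hidden layer with ReLU
        return self.layer_2(x)         # raw class scores (logits)

model = MLP()
images = torch.rand(100, 1, 28, 28)    # stand-in batch (assumption)
logits = model(images)                 # calling the model invokes forward()
print(logits.shape)
```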


Adding one more hidden layer - after a single epoch this does no better than the single-hidden-layer model (95.10 % here vs. 96.44 % above).

In [14]:
#Importing dependencies
import numpy as np
import matplotlib.pyplot as plt
from tqdm.notebook import tqdm
import torch
from torchvision import datasets,transforms
import torch.nn as nn
import torch.nn.functional as F

#Loading datasets
mnist_train = datasets.MNIST(root = './databases', train= True, transform = transforms.ToTensor(), download = True)
mnist_test = datasets.MNIST(root = './databases', train= False, transform = transforms.ToTensor(), download = True)

#Distributing Data
train_loader = torch.utils.data.DataLoader(mnist_train, batch_size = 100, shuffle = True)
test_loader = torch.utils.data.DataLoader(mnist_test, batch_size = 100, shuffle = False)

#Defining the model
class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer_1 = nn.Linear(784,500)
        self.layer_2 = nn.Linear(500,1000)
        self.layer_3 = nn.Linear(1000,10)
    
    def layer1(self,x):
        y = self.layer_1(x)
        return F.relu(y)
    
    def layer2(self,x):
        y = self.layer_2(x)
        return F.relu(y)
    
    def layer3(self,x):
        return self.layer_3(x)
    
#Instantiating the model
model = MLP()

#Optimizer
optimizer = torch.optim.SGD(model.parameters(), lr = 0.25)

#Training
print('Training...')

for images,labels in tqdm(train_loader):
    optimizer.zero_grad()
    x1 = images.view(-1,28*28)
    x2 = model.layer1(x1)
    x3 = model.layer2(x2)
    y  = model.layer3(x3)
    
    loss = F.cross_entropy(y,labels)
    loss.backward()
    optimizer.step()
    
print('Training Over!')

#Testing
print('Testing...')

correct = 0
total = len(mnist_test)

for images, labels in tqdm(test_loader):
    x1 = images.view(-1,28*28)
    x2 = model.layer1(x1)
    x3 = model.layer2(x2)
    y  = model.layer3(x3)
    
    pred = torch.argmax(y, dim = 1)
    correct += torch.sum((pred == labels).float())
    
print('Testing Over!')

print('The accuracy of the Multi-Layered Perceptron model is: {} %'.format(correct*100/total))

Training...



Training Over!
Testing...



Testing Over!
The accuracy of the Multi-Layered Perceptron model is: 95.0999984741211 %
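
One thing worth checking when comparing the two models is their size. The sketch below counts trainable parameters for both architectures, using `nn.Sequential` as a compact stand-in for the `MLP` classes above. The larger model has more than twice as many weights to fit in the same single pass over the data, which may be part of why it does not come out ahead here. That explanation is a guess; the runs above do not establish it.

```python
import torch.nn as nn

def count_parameters(model):
    """Total number of trainable scalars in a model."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

one_hidden = nn.Sequential(nn.Linear(784, 500), nn.ReLU(), nn.Linear(500, 10))
two_hidden = nn.Sequential(
    nn.Linear(784, 500), nn.ReLU(),
    nn.Linear(500, 1000), nn.ReLU(),
    nn.Linear(1000, 10),
)

print(count_parameters(one_hidden))  # 397510
print(count_parameters(two_hidden))  # 903510
```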


Method to estimate the best learning rate

In [15]:
#Importing dependencies
import numpy as np
import matplotlib.pyplot as plt
from tqdm.notebook import tqdm
import torch
from torchvision import datasets,transforms
import torch.nn as nn
import torch.nn.functional as F
import operator as op
#Loading datasets
mnist_train = datasets.MNIST(root = './databases', train= True, transform = transforms.ToTensor(), download = True)
mnist_test = datasets.MNIST(root = './databases', train= False, transform = transforms.ToTensor(), download = True)


#Defining the model
class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer_1 = nn.Linear(784,500)
        self.layer_2 = nn.Linear(500,1000)
        self.layer_3 = nn.Linear(1000,10)

    def layer1(self,x):
        y = self.layer_1(x)
        return F.relu(y)

    def layer2(self,x):
        y = self.layer_2(x)
        return F.relu(y)

    def layer3(self,x):
        return self.layer_3(x)
        
#Distributing Data
train_loader = torch.utils.data.DataLoader(mnist_train, batch_size = 100, shuffle = True)
test_loader = torch.utils.data.DataLoader(mnist_test, batch_size = 100, shuffle = False)

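#Instantiating the model
#Note: it is created once, outside the loop, so every learning rate below keeps training
#the same weights; the accuracies are cumulative, not independent per learning rate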
model = MLP()
    
#Initiating learning rate 
l_r = 0.1
d = {}

while True:
    print('\nThe learning rate for the iteration is - ', l_r)
    
    #Optimizer
    optimizer = torch.optim.SGD(model.parameters(), lr = l_r)

    #Training
    print('Training...')

    for images,labels in tqdm(train_loader):
        optimizer.zero_grad()
        x1 = images.view(-1,28*28)
        x2 = model.layer1(x1)
        x3 = model.layer2(x2)
        y  = model.layer3(x3)

        loss = F.cross_entropy(y,labels)
        loss.backward()
        optimizer.step()

    print('Training Over!')

    #Testing
    print('Testing...')

    correct = 0
    total = len(mnist_test)

    for images, labels in tqdm(test_loader):
        x1 = images.view(-1,28*28)
        x2 = model.layer1(x1)
        x3 = model.layer2(x2)
        y  = model.layer3(x3)

        pred = torch.argmax(y, dim = 1)
        correct += torch.sum((pred == labels).float())

    print('Testing Over!')
    
    accuracy = correct*100/total
    print('The accuracy of the Multi-Layered Perceptron model is: {} %'.format(accuracy))
    
    d[l_r] = float(accuracy)
    
    if accuracy >= 97:
        print('Accuracy Achieved! at {}.'.format(l_r))
        break
    elif l_r >= 1:
        print('Learning rate limit reached')
        break
    
    l_r += 0.05  #repeated float addition drifts slightly (e.g. 0.15000000000000002)

dict_sorted = dict(sorted(d.items(),key = op.itemgetter(1),reverse = True))
print('\n',dict_sorted,'\n')
for key,value in dict_sorted.items():
    print('The best learning rate is', key)
    break


The learning rate for the iteration is -  0.1
Training...



Training Over!
Testing...



Testing Over!
The accuracy of the Multi-Layered Perceptron model is: 92.5 %

The learning rate for the iteration is -  0.15000000000000002
Training...



Training Over!
Testing...



Testing Over!
The accuracy of the Multi-Layered Perceptron model is: 95.08999633789062 %

The learning rate for the iteration is -  0.2
Training...



Training Over!
Testing...



Testing Over!
The accuracy of the Multi-Layered Perceptron model is: 96.38999938964844 %

The learning rate for the iteration is -  0.25
Training...



Training Over!
Testing...



Testing Over!
The accuracy of the Multi-Layered Perceptron model is: 97.19000244140625 %
Accuracy Achieved! at 0.25.

 {0.25: 97.19000244140625, 0.2: 96.38999938964844, 0.15000000000000002: 95.08999633789062, 0.1: 92.5} 

The best learning rate is 0.25
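
In the loop above the model is created once, so each learning rate continues training the weights left by the previous one, and each candidate is scored on the test set. A variant that compares learning rates independently might look like the sketch below. The helper name `sweep_learning_rates`, the small stand-in network, and the random data are all assumptions made to keep the example self-contained. With MNIST one would pass real batches and hold out part of the training set for validation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def sweep_learning_rates(make_model, train_batches, val_batches, learning_rates):
    """Train a fresh model for each learning rate and return {lr: accuracy in %}."""
    results = {}
    for lr in learning_rates:
        model = make_model()  # fresh weights for every candidate
        optimizer = torch.optim.SGD(model.parameters(), lr=lr)
        for x, y in train_batches:
            optimizer.zero_grad()
            F.cross_entropy(model(x), y).backward()
            optimizer.step()
        correct, total = 0, 0
        with torch.no_grad():
            for x, y in val_batches:
                correct += (model(x).argmax(dim=1) == y).sum().item()
                total += len(y)
        results[lr] = 100 * correct / total
    return results

torch.manual_seed(0)

def make_model():
    # Small stand-in network (assumption), not the MNIST MLP
    return nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 3))

# Random stand-in data (assumption): 20 features, 3 classes
train_batches = [(torch.randn(50, 20), torch.randint(0, 3, (50,))) for _ in range(4)]
val_batches = [(torch.randn(50, 20), torch.randint(0, 3, (50,)))]

# Building the grid from integer steps avoids the drift of repeated `l_r += 0.05`
learning_rates = [round(0.1 + 0.05 * i, 2) for i in range(4)]
results = sweep_learning_rates(make_model, train_batches, val_batches, learning_rates)
best_lr = max(results, key=results.get)
print(results, best_lr)
```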


In [8]:
import operator as op
D = {}
for i in range(10,0,-1):
    D[i] = i+1
sortkey = dict(sorted(D.items(), key = op.itemgetter(1), reverse = True))
print(D)
print(sortkey)
for key,value in sortkey.items():
    print('The best learning rate is', value)
    break

{10: 11, 9: 10, 8: 9, 7: 8, 6: 7, 5: 6, 4: 5, 3: 4, 2: 3, 1: 2}
{10: 11, 9: 10, 8: 9, 7: 8, 6: 7, 5: 6, 4: 5, 3: 4, 2: 3, 1: 2}
The best learning rate is 11
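
The scratch cell above prints `value`, which is the accuracy slot and not the learning rate; the main loop correctly prints `key`. If only the best entry is needed, sorting the whole dictionary can be skipped. A small sketch using the accuracies recorded in the learning-rate run (rounded):

```python
# Accuracies per learning rate, copied (rounded) from the recorded run
d = {0.1: 92.5, 0.15: 95.09, 0.2: 96.39, 0.25: 97.19}

# max() with a key function returns the dictionary key whose value is largest
best_lr = max(d, key=d.get)
print('The best learning rate is', best_lr)  # The best learning rate is 0.25
```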
