## Multi-layer Perceptron
In this example, we will build a simple classifier for the MNIST dataset.

Let's begin by importing the required packages.

In [1]:
import torch
import torch.nn as nn
import torchvision         # pip install torchvision
import torchvision.transforms as transforms

It's important to write code that runs on both CPU and GPU without modification.

In [2]:
# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

Let's define some hyper-parameters for our simple MLP model. Note that in real code, it is good practice to pass hyper-parameters as program arguments rather than hard-coding them.

In [3]:
# Hyper-parameters 
input_size = 784
hidden_size = 500
num_classes = 10
num_epochs = 5
batch_size = 100
learning_rate = 0.001
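
As a minimal sketch of the "program arguments" practice mentioned above, the same hyper-parameters could be exposed through `argparse`. The flag names and the `build_parser` helper are illustrative assumptions, not part of the original notebook; the defaults mirror the values in the cell above.

```python
import argparse


def build_parser():
    """Build a parser whose defaults mirror the hyper-parameters above."""
    parser = argparse.ArgumentParser(description="Train a simple MLP on MNIST")
    parser.add_argument("--input-size", type=int, default=784)
    parser.add_argument("--hidden-size", type=int, default=500)
    parser.add_argument("--num-classes", type=int, default=10)
    parser.add_argument("--num-epochs", type=int, default=5)
    parser.add_argument("--batch-size", type=int, default=100)
    parser.add_argument("--learning-rate", type=float, default=0.001)
    return parser


# Passing an explicit list keeps the example runnable inside a notebook;
# in a script, call parse_args() with no arguments to read sys.argv.
args = build_parser().parse_args(["--num-epochs", "10"])
print(args.num_epochs, args.batch_size, args.learning_rate)
```

Here only `--num-epochs` is overridden, so every other value falls back to its default.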


Many standard datasets can be downloaded through torchvision. Here we fetch the MNIST data.

In [4]:
# MNIST dataset 
train_dataset = torchvision.datasets.MNIST(root='../../data', 
                                           train=True, 
                                           transform=transforms.ToTensor(),  
                                           download=True)
# Each element in the dataset is an (image, label) pair; the image has shape (1 x 28 x 28), corresponding to (channels x height x width)

test_dataset = torchvision.datasets.MNIST(root='../../data', 
                                          train=False, 
                                          transform=transforms.ToTensor())

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Processing...
Done!


PyTorch provides a DataLoader class that handles creating minibatches and shuffling the data during training. You can also write your own data loader.

In [15]:
# Data loader
train_loader = torch.utils.data.DataLoader(dataset=train_dataset, 
                                           batch_size=batch_size, 
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset, 
                                          batch_size=batch_size, 
                                          shuffle=False)
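
To illustrate what "writing your own data loader" might involve, here is a minimal plain-Python sketch of the batching and shuffling logic. The `iterate_minibatches` function is a hypothetical helper for illustration only; it yields index lists and is not how `DataLoader` is implemented internally.

```python
import random


def iterate_minibatches(num_samples, batch_size, shuffle=True, seed=None):
    """Yield lists of sample indices, one list per minibatch."""
    indices = list(range(num_samples))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, num_samples, batch_size):
        # The last batch is shorter when batch_size does not divide num_samples
        yield indices[start:start + batch_size]


# The MNIST training set has 60000 images; with batch_size=100 that is 600 steps
batches = list(iterate_minibatches(num_samples=60000, batch_size=100, seed=0))
print(len(batches), len(batches[0]))
```

The 600 batches per epoch match the `Step [.../600]` counter printed by the training loop.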

Let's implement a simple feed-forward network with one hidden layer.

In [17]:
# Fully connected neural network with one hidden layer
class NeuralNet(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(NeuralNet, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.sigmoid = nn.Sigmoid()
        self.fc2 = nn.Linear(hidden_size, num_classes)

    def forward(self, s):
        out = self.fc1(s)       # batch_size x hidden_size
        out = self.sigmoid(out)
        out = self.fc2(out)     # batch_size x num_classes (raw scores, no softmax)
        return out
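
As a quick sanity check on the size of this network, the number of trainable parameters can be worked out by hand: each fully connected layer has one weight per input-output pair plus one bias per output. The `linear_param_count` helper below is a hypothetical name used only for this sketch.

```python
def linear_param_count(in_features, out_features):
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features


input_size, hidden_size, num_classes = 784, 500, 10

fc1_params = linear_param_count(input_size, hidden_size)    # 784*500 + 500
fc2_params = linear_param_count(hidden_size, num_classes)   # 500*10 + 10
total_params = fc1_params + fc2_params
print(fc1_params, fc2_params, total_params)
```

Once the model is instantiated, the same total can be cross-checked with `sum(p.numel() for p in model.parameters())`.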

We have defined a model above. Let's try to instantiate the model.

In [18]:
model = NeuralNet(input_size, hidden_size, num_classes).to(device)       # .to() moves the model to the GPU if available

# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)  
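
Note that `nn.CrossEntropyLoss` expects raw scores (logits) and applies log-softmax internally, which is why the model's `forward` returns the output of the last linear layer directly. The plain-Python sketch below computes the same quantity for a single sample; `cross_entropy_from_logits` is a hypothetical helper written for illustration, not a PyTorch function.

```python
import math


def cross_entropy_from_logits(logits, target):
    """Cross-entropy for one sample: -log(softmax(logits)[target])."""
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum_exp - logits[target]


# A 10-class model that is equally unsure about every class
uniform_loss = cross_entropy_from_logits([0.0] * 10, target=3)
print(round(uniform_loss, 4))
```

For equal logits over 10 classes the loss is ln(10), roughly 2.30, which is a useful reference point: a loss well below that indicates the model has learned something.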

We are all set to train our first model. Let's begin the **training** loop...

In [19]:
# Train the model
total_step = len(train_loader)
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        # Tensor dimensions:
        # images: (batch_size x 1 x 28 x 28)
        # labels: (batch_size)
        
        # Flatten each image to a 784-vector and move the tensors to the configured device
        images = images.view(-1, 28*28).to(device)   # batch_size x 784
        labels = labels.to(device)
        
        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)
        
        # Backward and optimize
        
        # Every time backward() is called, the new gradients are added to the .grad of each parameter instead of replacing it.
        # Since you don't want to mix up gradients between minibatches, you have to zero them out
        # at the start of each new minibatch. In some scenarios you might want to accumulate them, though.
        optimizer.zero_grad()
        
        loss.backward()
        
        # optimizer.step() performs a parameter update based on the current gradient
        # (stored in the .grad attribute of each parameter) and the update rule.
        optimizer.step()
        
        if (i+1) % 100 == 0:
            print('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}' 
                   .format(epoch+1, num_epochs, i+1, total_step, loss.item()))

Epoch [1/5], Step [100/600], Loss: 0.5206
Epoch [1/5], Step [200/600], Loss: 0.4570
Epoch [1/5], Step [300/600], Loss: 0.3054
Epoch [1/5], Step [400/600], Loss: 0.2922
Epoch [1/5], Step [500/600], Loss: 0.3086
Epoch [1/5], Step [600/600], Loss: 0.2555
Epoch [2/5], Step [100/600], Loss: 0.2131
Epoch [2/5], Step [200/600], Loss: 0.1772
Epoch [2/5], Step [300/600], Loss: 0.1982
Epoch [2/5], Step [400/600], Loss: 0.3384
Epoch [2/5], Step [500/600], Loss: 0.3317
Epoch [2/5], Step [600/600], Loss: 0.1485
Epoch [3/5], Step [100/600], Loss: 0.2184
Epoch [3/5], Step [200/600], Loss: 0.2601
Epoch [3/5], Step [300/600], Loss: 0.1159
Epoch [3/5], Step [400/600], Loss: 0.2290
Epoch [3/5], Step [500/600], Loss: 0.0902
Epoch [3/5], Step [600/600], Loss: 0.1996
Epoch [4/5], Step [100/600], Loss: 0.0943
Epoch [4/5], Step [200/600], Loss: 0.1360
Epoch [4/5], Step [300/600], Loss: 0.0974
Epoch [4/5], Step [400/600], Loss: 0.2533
Epoch [4/5], Step [500/600], Loss: 0.0889
Epoch [4/5], Step [600/600], Loss:
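
The comments in the training loop mention that gradients accumulate across `backward()` calls. The sketch below, which uses a tiny stand-in layer and synthetic data rather than the MLP above, shows one scenario where that is useful: building the gradient of a larger batch from smaller micro-batches. The variable names are assumptions made for this example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(4, 1)          # tiny stand-in model, not the MLP above
x = torch.randn(8, 4)            # synthetic inputs
y = torch.randn(8, 1)            # synthetic targets
mse = nn.MSELoss()

# Gradient of the loss over the full batch of 8 samples
layer.zero_grad()
mse(layer(x), y).backward()
full_batch_grad = layer.weight.grad.clone()

# Same gradient built from two micro-batches of 4, without zeroing in between
layer.zero_grad()
for x_part, y_part in zip(x.chunk(2), y.chunk(2)):
    part_loss = mse(layer(x_part), y_part) / 2   # average over the 2 micro-batches
    part_loss.backward()                         # adds to .grad
accumulated_grad = layer.weight.grad.clone()

grads_match = torch.allclose(full_batch_grad, accumulated_grad, atol=1e-6)
print(grads_match)
```

In a real training loop, `optimizer.step()` and `optimizer.zero_grad()` would then be called once per group of micro-batches instead of once per minibatch.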

Once the loss converges, we can check the performance on the test set.

In [13]:
# Test the model
# In the test phase, we don't need to compute gradients (this saves memory)
# Inside "with torch.no_grad()", gradient tracking is disabled, so results of computations have requires_grad=False

with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        images = images.view(-1, 28*28).to(device)
        labels = labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)   # index of the highest score = predicted class
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    print('Accuracy of the network on the 10000 test images: {} %'.format(100 * correct / total))

Accuracy of the network on the 10000 test images: 97.63 %
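
To make the prediction step in the evaluation loop concrete, here is a small sketch on hand-written scores (the values are illustrative, not model outputs). `torch.max` over dimension 1 returns both the maximum values and their indices; the indices are the predicted classes, and `argmax` gives the same result.

```python
import torch

# Hand-written logits for 3 samples and 4 classes (illustrative values)
logits = torch.tensor([[0.1, 2.0, -1.0, 0.3],
                       [1.5, 0.2,  0.1, 0.0],
                       [0.0, 0.1,  0.2, 3.0]])
targets = torch.tensor([1, 0, 2])

values, predicted = torch.max(logits, 1)     # max over the class dimension
same = torch.equal(predicted, logits.argmax(dim=1))
num_correct = (predicted == targets).sum().item()
print(predicted.tolist(), same, num_correct)
```

Two of the three predictions match their targets here, so the accuracy on this toy batch would be 2/3.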


Save the model for later use. You can also save checkpoints periodically during the training loop.

In [14]:
# Save the model checkpoint
torch.save(model.state_dict(), 'model.ckpt')
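
Since only the `state_dict` is saved, loading requires creating a model with the same architecture first. The sketch below shows the round trip with a small stand-in layer and a temporary file, both of which are assumptions made to keep the example self-contained.

```python
import os
import tempfile

import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Linear(784, 10)                      # stand-in for the trained model

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.ckpt")
    torch.save(net.state_dict(), path)        # save parameters only

    restored = nn.Linear(784, 10)             # must match the saved architecture
    restored.load_state_dict(torch.load(path, map_location="cpu"))

restored.eval()                               # switch to inference mode
sample = torch.randn(2, 784)
with torch.no_grad():
    outputs_match = torch.allclose(net(sample), restored(sample))
print(outputs_match)
```

For the MLP in this notebook, the equivalent would be to instantiate `NeuralNet(input_size, hidden_size, num_classes)` and call `load_state_dict` on it with the contents of `model.ckpt`.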