## 1 .How to build neural networks using nn.Module class
## 2.How to build custom data input pipelines with data augmentation using Dataset and Dataloader classes.
## 3.How to configure your learning rate with different learning rate schedules
## 4.Training a Resnet bases image classifier to classify images from the CIFAR-10 dataset.

Tutorial from https://blog.paperspace.com/pytorch-101-building-neural-networks/

In [1]:
#Import the libraries
import torch.nn as nn
import torch
import numpy as np
import os
import random
from PIL import Image
import time

The nn.Module class has two methods that we have to override: the __init__ (here we define various parameters of a layer such as filters, kernel size and etc) function and the foward function (doesn't need to be explicitly called)

In [2]:
#Example of use of foward

class MyLayer(nn.Module):
  def __init__(self, param):
    super().__init__()
    self.param = param 
  
  def forward(self, x):
    return x * self.param

In [3]:
myLayerObject = MyLayer(5)
output = myLayerObject(torch.Tensor([5,4,3]))
print(output)

tensor([25., 20., 15.])


Another widely used and important class is the nn.Sequential class.  When initiating this class we can pass a list of nn.Module objects in a particular sequence. The object returned by nn.Sequential is itself a nn.Module object. When this object is run with an input, it sequentially runs the input through all the nn.Module object we passed to it, in the very same order as we passed them.

In [4]:
combinedNetwork = nn.Sequential(MyLayer(5), MyLayer(10))
output = combinedNetwork([3,4])

Lets start the implementation of our network

In [5]:
#Implementing the residual block

class ResidualBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1):
        super(ResidualBlock, self).__init__()
        
        # Conv Layer 1
        self.conv1 = nn.Conv2d(
            in_channels=in_channels, out_channels=out_channels,
            kernel_size=(3, 3), stride=stride, padding=1, bias=False
        )
        self.bn1 = nn.BatchNorm2d(out_channels)
        
        # Conv Layer 2
        self.conv2 = nn.Conv2d(
            in_channels=out_channels, out_channels=out_channels,
            kernel_size=(3, 3), stride=1, padding=1, bias=False
        )
        self.bn2 = nn.BatchNorm2d(out_channels)
    
        # Shortcut connection to downsample residual
        # In case the output dimensions of the residual block is not the same 
        # as it's input, have a convolutional layer downsample the layer 
        # being bought forward by approporate striding and filters
        self.shortcut = nn.Sequential()
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(
                    in_channels=in_channels, out_channels=out_channels,
                    kernel_size=(1, 1), stride=stride, bias=False
                ),
                nn.BatchNorm2d(out_channels)
            )

    def forward(self, x):
        out = nn.ReLU()(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out += self.shortcut(x)
        out = nn.ReLU()(out)
        return out

    

In [6]:
class ResNet(nn.Module):
    def __init__(self, num_classes=10):
        super(ResNet, self).__init__()
        
        # Initial input conv
        self.conv1 = nn.Conv2d(
            in_channels=3, out_channels=64, kernel_size=(3, 3),
            stride=1, padding=1, bias=False
        )

        self.bn1 = nn.BatchNorm2d(64)
        
        # Create blocks
        self.block1 = self._create_block(64, 64, stride=1)
        self.block2 = self._create_block(64, 128, stride=2)
        self.block3 = self._create_block(128, 256, stride=2)
        self.block4 = self._create_block(256, 512, stride=2)
        self.linear = nn.Linear(512, num_classes)
    
    # A block is just two residual blocks for ResNet18
    def _create_block(self, in_channels, out_channels, stride):
        return nn.Sequential(
            ResidualBlock(in_channels, out_channels, stride),
            ResidualBlock(out_channels, out_channels, 1)
        )

    def forward(self, x):
	# Output of one layer becomes input to the next
        out = nn.ReLU()(self.bn1(self.conv1(x)))
        out = self.block1(out)
        out = self.block2(out)
        out = self.block3(out)
        out = self.block4(out)
        out = nn.AvgPool2d(4)(out)
        out = out.view(out.size(0), -1)
        out = self.linear(out)
        return out

In [7]:
data_dir = 'cifar/train/'

with open('cifar/labels.txt') as label_file:
    labels = label_file.read().split()
    label_mapping = dict(zip(labels, list(range(len(labels)))))

In [8]:
"""
Preprocessing will:

1. Randomly horizontally the image with a probability of 0.5
2. Normalise the image with mean and standard deviation of CIFAR dataset
3. Reshape it from W  H  C to C  H  W.
"""

def preprocess(image):
    image = np.array(image)
    
    if random.random() > 0.5:
        image = image[::-1,:,:]
    
    cifar_mean = np.array([0.4914, 0.4822, 0.4465]).reshape(1,1,-1)
    cifar_std  = np.array([0.2023, 0.1994, 0.2010]).reshape(1,1,-1)
    image = (image - cifar_mean) / cifar_std
    image = image.transpose(2,1,0)
    
    return image


Normally, there are two classes PyTorch provides you in relation to build input pipelines to load data.

1. torch.data.utils.dataset, which we will just refer as the dataset class now.

2. torch.data.utils.dataloader , which we will just refer as the dataloader class now.

## torch.utils.data.dataset
dataset is a class that loads the data and returns a generator so that you iterate over it. It also lets you incorporate data augmentation techniques into the input Pipeline.

If you want to create a dataset object for your data, you need to overload three functions.

__init__ function. Here, you define things related to your dataset here. Most importantly, the location of your data. You can also define various data augmentations you want to apply.

__len__ function. Here, you just return the length of the dataset.

__getitem__ function. The function takes as an argument an index i and returns a data example. This function would be called every iteration during our training loop with a different i by the dataset object.

In [9]:
class Cifar10Dataset(torch.utils.data.Dataset):
    def __init__(self, data_dir, data_size = 0, transforms = None):
        files = os.listdir(data_dir)
        files = [os.path.join(data_dir,x) for x in files]
        
        
        if data_size < 0 or data_size > len(files):
            assert("Data size should be between 0 to number of files in the dataset")
        
        if data_size == 0:
            data_size = len(files)
        
        self.data_size = data_size
        self.files = random.sample(files, self.data_size)
        self.transforms = transforms
        
    def __len__(self):
        return self.data_size
    
    def __getitem__(self, idx):
        image_address = self.files[idx]
        image = Image.open(image_address)
        image = preprocess(image)
        label_name = image_address[:-4].split("_")[-1]
        label = label_mapping[label_name]
        
        image = image.astype(np.float32)
        
        if self.transforms:
            image = self.transforms(image)

        return image, label

## torch.utils.data.Dataloader
The Dataloader class facilitates

1.Batching of Data

2.Shuffling of Data

3.Loading multiple data at a single time using threads

4.Prefetching, that is, while GPU crunches the current batch, Dataloader can load the next batch into memory in meantime. This means GPU doesn't have to wait for the next batch and it speeds up training.


In [10]:
trainset = Cifar10Dataset(data_dir='cifar/train/',transforms=None)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=128, shuffle=True, num_workers=2)

testset = Cifar10Dataset(data_dir='cifar/test/',transforms=None)
testloader = torch.utils.data.DataLoader(testset, batch_size=128, shuffle=True, num_workers=2)

In [11]:
for data in trainloader: #or trainset
    img, label = data

In [12]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

clf = ResNet()
clf.to(device)

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (block1): Sequential(
    (0): ResidualBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential()
    )
    (1): ResidualBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, a

In [13]:
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(clf.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[150,200],gamma=0.1)

The main goal of this tutorial is to show how Pytorch works, so we are not trying to obtain a high accurate model

In [14]:
for epoch in range(10):
    losses = []
    scheduler.step()
    # Train
    start = time.time()
    for batch_idx, (inputs, targets) in enumerate(trainloader):
        inputs, targets = inputs.to(device), targets.to(device)

        optimizer.zero_grad()                 # Zero the gradients

        outputs = clf(inputs)                 # Forward pass
        loss = criterion(outputs, targets)    # Compute the Loss
        loss.backward()                       # Compute the Gradients

        optimizer.step()                      # Updated the weights
        losses.append(loss.item())
        end = time.time()
        
        if batch_idx % 100 == 0:
          print('Batch Index : %d Loss : %.3f Time : %.3f seconds ' % (batch_idx, np.mean(losses), end - start))
      
          start = time.time()
    # Evaluate
    clf.eval()
    total = 0
    correct = 0
    
    with torch.no_grad():
      for batch_idx, (inputs, targets) in enumerate(testloader):
          inputs, targets = inputs.to(device), targets.to(device)

          outputs = clf(inputs)
          _, predicted = torch.max(outputs.data, 1)
          total += targets.size(0)
          correct += predicted.eq(targets.data).cpu().sum()

      print('Epoch : %d Test Acc : %.3f' % (epoch, 100.*correct/total))
      print('--------------------------------------------------------------')
    clf.train()   
            



Batch Index : 0 Loss : 2.450 Time : 1.612 seconds 
Batch Index : 100 Loss : 2.700 Time : 121.797 seconds 
Batch Index : 200 Loss : 2.334 Time : 120.991 seconds 
Batch Index : 300 Loss : 2.167 Time : 120.002 seconds 
Epoch : 0 Test Acc : 35.920
--------------------------------------------------------------
Batch Index : 0 Loss : 1.673 Time : 1.413 seconds 
Batch Index : 100 Loss : 1.615 Time : 121.027 seconds 
Batch Index : 200 Loss : 1.582 Time : 120.643 seconds 
Batch Index : 300 Loss : 1.544 Time : 121.539 seconds 
Epoch : 1 Test Acc : 49.710
--------------------------------------------------------------
Batch Index : 0 Loss : 1.373 Time : 1.596 seconds 
Batch Index : 100 Loss : 1.338 Time : 121.403 seconds 
Batch Index : 200 Loss : 1.308 Time : 122.117 seconds 
Batch Index : 300 Loss : 1.279 Time : 122.142 seconds 
Epoch : 2 Test Acc : 41.440
--------------------------------------------------------------
Batch Index : 0 Loss : 0.952 Time : 1.414 seconds 
Batch Index : 100 Loss : 1.0