# Image recognition (with ResNets)

The goal is to train a neural network for visual recognition (i.e., given an image, predict its class label). Residual networks (ResNets) are a state-of-the-art architecture for this type of problem.

**Resources** (data available via PyTorch)
1. CIFAR-10

First we set up our Google Colab environment.

In [0]:
!apt update -qq;
!wget https://developer.nvidia.com/compute/cuda/8.0/Prod2/local_installers/cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64-deb;
!dpkg -i cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64-deb;
!apt-key add /var/cuda-repo-8-0-local-ga2/7fa2af80.pub;
!apt-get update -qq;
!apt-get install cuda gcc-5 g++-5 -y -qq;
!ln -s /usr/bin/gcc-5 /usr/local/cuda/bin/gcc;
!ln -s /usr/bin/g++-5 /usr/local/cuda/bin/g++;
!apt install cuda-8.0;


In [0]:
!/usr/local/cuda/bin/nvcc --version

In [0]:
# http://pytorch.org/
from os.path import exists
from wheel.pep425tags import get_abbr_impl, get_impl_ver, get_abi_tag
platform = '{}{}-{}'.format(get_abbr_impl(), get_impl_ver(), get_abi_tag())
cuda_output = !ldconfig -p|grep cudart.so|sed -e 's/.*\.\([0-9]*\)\.\([0-9]*\)$/cu\1\2/'
accelerator = cuda_output[0] if exists('/dev/nvidia0') else 'cpu'

!pip install -q http://download.pytorch.org/whl/{accelerator}/torch-0.4.1-{platform}-linux_x86_64.whl torchvision


A quick check of the available RAM and GPU memory:

In [0]:
# memory footprint support libraries/code
!ln -sf /opt/bin/nvidia-smi /usr/bin/nvidia-smi
!pip install gputil
!pip install psutil
!pip install humanize
import psutil
import humanize
import os
import GPUtil as GPU
GPUs = GPU.getGPUs()
# XXX: only one GPU on Colab and isn’t guaranteed
gpu = GPUs[0]
def printm():
 process = psutil.Process(os.getpid())
 print("Gen RAM Free: " + humanize.naturalsize( psutil.virtual_memory().available ), " | Proc size: " + humanize.naturalsize( process.memory_info().rss))
 print("GPU RAM Free: {0:.0f}MB | Used: {1:.0f}MB | Util {2:3.0f}% | Total {3:.0f}MB".format(gpu.memoryFree, gpu.memoryUsed, gpu.memoryUtil*100, gpu.memoryTotal))
printm()

Now we import torch.

In [0]:
import torch

import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import math

from torchvision import datasets, transforms
from torch.autograd import Variable

# CIFAR10

In [22]:
# Data

print('==> Preparing data...')

transform_train = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])

transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])

trainset = datasets.CIFAR10(root='./data', train=True, download=True, transform=transform_train)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True, num_workers=2) # batch size 128

testset = datasets.CIFAR10(root='./data', train=False, download=True, transform=transform_test)
testloader = torch.utils.data.DataLoader(testset, batch_size=64, shuffle=False, num_workers=2) # batch_size 100


==> Preparing data...
Files already downloaded and verified
Files already downloaded and verified
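
The `transforms.Normalize` step above standardizes each colour channel as `(value - mean) / std`. As a rough illustration of that arithmetic, here is a pure-Python sketch for a single RGB pixel with values in [0, 1]. The helper names `normalize_pixel` and `denormalize_pixel` are made up for this example and are not part of torchvision.

```python
# Per-channel statistics used in the transforms above
MEAN = (0.4914, 0.4822, 0.4465)
STD = (0.2023, 0.1994, 0.2010)


def normalize_pixel(rgb, mean=MEAN, std=STD):
    """Standardize one RGB pixel channel by channel: (value - mean) / std."""
    return tuple((v - m) / s for v, m, s in zip(rgb, mean, std))


def denormalize_pixel(rgb, mean=MEAN, std=STD):
    """Invert the standardization, e.g. before plotting an image."""
    return tuple(v * s + m for v, m, s in zip(rgb, mean, std))


# A pixel equal to the channel means maps to zero in every channel
print(normalize_pixel(MEAN))
# Round trip: normalizing and denormalizing recovers the original pixel
print(denormalize_pixel(normalize_pixel((1.0, 0.5, 0.0))))
```

The same statistics are applied to the training and the test set, so both are on the same scale when they reach the network.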


In [0]:
# CIFAR10
train_data = trainloader
test_data = testloader

# ResNet implementation
We are now able to implement our first ResNet model. We use the sample code provided in the PyTorch documentation as a starting point to explore and tweak the model.

First we define the 3x3 and 1x1 convolutions that we will later use in our BasicBlock and Bottleneck. Note that the stride defaults to one and that the 3x3 convolution uses a padding of one. With these settings the spatial dimensions (height and width) of the feature maps stay constant.

In [0]:
def conv3x3(in_planes, out_planes, stride=1):
    """3x3 convolution with padding"""
    return nn.Conv2d(in_planes,
                     out_planes,
                     kernel_size=3,
                     stride=stride,
                     padding=1,
                     bias=False)

def conv1x1(in_planes, out_planes, stride=1):
    """1x1 convolution"""
    return nn.Conv2d(in_planes,
                     out_planes,
                     kernel_size = 1,
                     stride = stride,
                     bias=False)
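
The statement that these settings preserve the spatial size can be checked with the standard output-size formula for convolutions (dilation 1). The sketch below is plain Python; `conv_out_size` is a hypothetical helper for this example, not a PyTorch function.

```python
def conv_out_size(size, kernel_size, stride=1, padding=0):
    """Output side length of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel_size) // stride + 1


# 3x3 convolution, stride 1, padding 1: a 32-pixel side stays at 32
print(conv_out_size(32, kernel_size=3, stride=1, padding=1))
# 1x1 convolution, stride 1, no padding: the size is also unchanged
print(conv_out_size(32, kernel_size=1))
# With stride 2 the same 3x3 convolution halves the side length
print(conv_out_size(32, kernel_size=3, stride=2, padding=1))
```

Only a stride other than one changes the resolution here, which is how the later stages of the network reduce the feature-map size.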

Now we can build the BasicBlock as described in the ResNet paper.

In [0]:
class BasicBlock(nn.Module):
    expansion = 1
    
    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(BasicBlock, self).__init__()
        
        self.conv1 = conv3x3(inplanes, planes, stride)
        self.bn1 = nn.BatchNorm2d(planes)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(planes, planes)
        self.bn2 = nn.BatchNorm2d(planes)
        self.downsample = downsample
        self.stride = stride
        
        
    def forward(self, x):
        residual = x # save x
        
        # conv -> bn -> relu
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        
        if self.downsample is not None:
            residual = self.downsample(x)
        
        # x + F(x) - this realizes the shortcut conn.
        out += residual
        out = self.relu(out) # final relu
        
        return out

If we would like to build a deeper model as described by He et al. (2015), such as ResNet-50/101/152, we will also need a Bottleneck block.

In [0]:
class Bottleneck(nn.Module):
    """
    The expansion factor controls the number of output
    channels of the last 1x1 convolution layer.
    """
    expansion = 4
    
    def __init__(self, inplanes, planes, stride = 1, downsample=None):
        super(Bottleneck, self).__init__()
        
        self.conv1 = conv1x1(inplanes, planes)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = conv3x3(planes, planes, stride)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = conv1x1(planes, planes * self.expansion)
        self.bn3 = nn.BatchNorm2d(planes*self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
        self.stride = stride
        
    def forward(self, x):
        residual = x
    
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        
        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)
        
        out = self.conv3(out)
        out = self.bn3(out)
        
        if self.downsample is not None:
            residual = self.downsample(x)
            
        out += residual
        out = self.relu(out)
        
        return out
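
To make the role of `expansion` concrete, the following sketch lists the input and output channels of the three convolutions in a bottleneck and checks when the shortcut can no longer be the identity. Both helper names are hypothetical and exist only for this illustration. The rule in `needs_projection` is the usual one: a projection is needed when the stride is not one or the channel counts differ.

```python
def bottleneck_channels(inplanes, planes, expansion=4):
    """(in, out) channels of conv1 (1x1), conv2 (3x3) and conv3 (1x1)."""
    return [(inplanes, planes), (planes, planes), (planes, planes * expansion)]


def needs_projection(inplanes, planes, stride=1, expansion=4):
    """True if x and F(x) differ in shape, so x must pass through `downsample`."""
    return stride != 1 or inplanes != planes * expansion


# 256 channels are squeezed to 64, processed by the 3x3 conv, then expanded again
print(bottleneck_channels(256, 64))
# Same shape in and out: the identity shortcut works
print(needs_projection(256, 64))
# 64 input channels but 256 output channels: a projection is required
print(needs_projection(64, 64))
```

The expensive 3x3 convolution therefore operates on the reduced channel count, while the block as a whole still outputs `planes * expansion` channels.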

We can now specify the ResNet model. Parameters such as kernel_size, stride, padding, or the number of output channels can be changed to tune the model.

In [0]:
#from torchvision.models import resnet18

class ResNet(nn.Module):
    
    def __init__(self, block, layers, num_classes=10):
        self.inplanes = 64
        super(ResNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size = 3, stride = 1, padding = 1,
                               bias=False) 
        self.bn1 = nn.BatchNorm2d(64) 
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=1, padding=1) 
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2) 
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)  
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)  
        self.avgpool = nn.AvgPool2d(4, stride=1)
        self.fc = nn.Linear(512 * block.expansion, num_classes)
        
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()

    def _make_layer(self, block, planes, blocks, stride=1):
        downsample = None
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.inplanes, planes * block.expansion,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(planes * block.expansion),
            )

        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample))
        self.inplanes = planes * block.expansion
        for i in range(1, blocks):
            layers.append(block(self.inplanes, planes))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        x = self.avgpool(x)
        x = x.view(x.size(0), -1)
        x = self.fc(x)      

        return x
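
To see why `nn.AvgPool2d(4, stride=1)` fits this model, it helps to follow the feature-map side length through the stages for a 32x32 CIFAR-10 image. The sketch below is plain Python and only mirrors the kernel sizes, strides and paddings used above; `out_size` and `trace_sizes` are hypothetical helpers.

```python
def out_size(size, kernel_size, stride, padding):
    """Output side length of a conv/pool layer (dilation 1, floor rounding)."""
    return (size + 2 * padding - kernel_size) // stride + 1


def trace_sizes(input_size=32):
    """Follow the feature-map side length through the stages of the model."""
    sizes = {}
    s = out_size(input_size, kernel_size=3, stride=1, padding=1)
    sizes["conv1"] = s
    s = out_size(s, kernel_size=3, stride=1, padding=1)
    sizes["maxpool"] = s
    # Only the strided 3x3 convolution of the first block in each stage
    # changes the size; all other convolutions in the stage preserve it.
    for name, stride in [("layer1", 1), ("layer2", 2), ("layer3", 2), ("layer4", 2)]:
        s = out_size(s, kernel_size=3, stride=stride, padding=1)
        sizes[name] = s
    s = out_size(s, kernel_size=4, stride=1, padding=0)
    sizes["avgpool"] = s
    return sizes


print(trace_sizes(32))
```

For a 32x32 input the last stage produces 4x4 feature maps, so the 4x4 average pool leaves one value per channel and the flattened vector has `512 * block.expansion` entries, which matches the input size of `fc`. For other input sizes the pooled map is no longer 1x1 and `fc` would have to be adjusted.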

With the generic model specified above we can now build a ResNet-18 (18 layers deep).

In [0]:
def resnet18(pretrained=False, **kwargs):
    """Constructs a ResNet-18 model.

    Args:
        pretrained (bool): kept for compatibility with the reference code.
            ImageNet weights do not fit this CIFAR-10 variant (3x3 stem
            convolution), so True is rejected.
    """
    if pretrained:
        raise NotImplementedError('no pretrained weights for this CIFAR-10 variant')
    return ResNet(BasicBlock, [2, 2, 2, 2], **kwargs)
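
The "18" can be recovered from the block configuration. By the usual convention one counts the stem convolution, the convolutions inside the blocks and the final fully connected layer, while the 1x1 projection convolutions in the shortcuts are left out. The helper below is a hypothetical sketch of that count. The `[3, 4, 6, 3]` configuration is the ResNet-50 setting from He et al. and is not defined in this notebook.

```python
def count_weighted_layers(blocks_per_stage, convs_per_block):
    """Stem convolution + convolutions inside all blocks + final linear layer."""
    return 1 + convs_per_block * sum(blocks_per_stage) + 1


# BasicBlock has two 3x3 convolutions per block
print(count_weighted_layers([2, 2, 2, 2], convs_per_block=2))
# Bottleneck has three convolutions per block (1x1, 3x3, 1x1)
print(count_weighted_layers([3, 4, 6, 3], convs_per_block=3))
```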

# Run the ResNet Model
We will now train our model using the cross-entropy loss (`nn.CrossEntropyLoss`), since this is a standard classification problem for which cross-entropy usually performs well.
Note that we evaluate the model after each epoch to find the number of epochs that minimizes our test error and to observe overfitting.

In [0]:
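def evaluate(model, loader):
    model.eval()  # use the running BatchNorm statistics
    running_corrects = 0
    with torch.no_grad():  # no gradients are needed for evaluation
        for x, y in loader:
            x = x.to(device)
            y = y.to(device)

            outputs = model(x)
            _, preds = torch.max(outputs, 1)
            running_corrects += torch.sum(preds == y).item()

    # divide by the true number of samples; the last batch may be smaller than 64
    print("Accuracy: ", running_corrects / len(loader.dataset) * 100.0)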
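
One detail worth checking when computing accuracy is the denominator. The number of batches times the batch size is only equal to the number of samples if the batch size divides the dataset size evenly. The plain-Python sketch below uses the 10,000 CIFAR-10 test images and the batch size of 64 from above; `num_batches` is a hypothetical helper that mimics how a data loader counts batches when the last, smaller batch is kept.

```python
import math


def num_batches(num_samples, batch_size, drop_last=False):
    """Number of batches a loader yields for a dataset of num_samples items."""
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


num_samples, batch_size = 10000, 64
batches = num_batches(num_samples, batch_size)

print(batches)               # batches per pass over the test set
print(batches * batch_size)  # overcounts: the last batch holds only 16 images
print(num_samples)           # the correct denominator for accuracy
```

Dividing the number of correct predictions by `batches * batch_size` would slightly underestimate the accuracy, so the dataset size is the safer denominator.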

According to the benchmark table at https://benchmarks.ai/cifar-10, Yamada et al. (2018) report a top-1 error of 4.08 on the CIFAR10 dataset after 1800 epochs.
We take this as our benchmark and would like to come as close to it as possible with our simple model.
Now, finding the best hyperparameters is the tricky part...

In [0]:
device = 'cuda:0' if torch.cuda.is_available() else 'cpu'  # fall back to CPU if no GPU is available

model = resnet18()

opt = torch.optim.SGD(model.parameters(), lr=0.5, momentum=0.9)
#opt = torch.optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.999), eps=1e-08,
#                       weight_decay=5e-4, amsgrad=False)

criterion = nn.CrossEntropyLoss()

model.to(device)
model.train()

n_epochs = 30
losses = []
#accuracyList = []

for i in range(n_epochs):
    model.train()  # evaluate() switches to eval mode, so re-enable training mode each epoch
    e_loss = 0
    
    for x,y in train_data:
        
        x = x.to(device)
        y = y.to(device)
        
        model.zero_grad()    # zero gradients
        out = model(x)       # forward pass
        loss = criterion(out, y)  # compute loss
        e_loss += loss.item()     # track loss (optional)
        loss.backward()           # compute gradients via backprop.
        opt.step()                # take an optimizer step
        
    if i % 1 == 0:  # records every epoch; raise the modulus to log less often
        losses.append(e_loss)

    print('Epoch {}: {:.4f}'.format(i, e_loss))
    evaluate(model, test_data)
    #print("Accuracy: ", accuracy)
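
The `opt.step()` call above applies SGD with momentum. As a rough illustration of that update rule (without weight decay, dampening or Nesterov momentum), here is a scalar pure-Python sketch on the toy objective f(w) = w². The function `sgd_momentum`, the toy objective and the learning rate of 0.1 are assumptions for this example and are not taken from the training run above.

```python
def sgd_momentum(grad_fn, w, lr=0.1, momentum=0.9, steps=1):
    """Run `steps` SGD-with-momentum updates on a scalar parameter w."""
    buf = 0.0  # momentum buffer: decaying sum of past gradients
    for _ in range(steps):
        g = grad_fn(w)
        buf = momentum * buf + g  # accumulate velocity
        w = w - lr * buf          # parameter update
    return w


def grad(w):
    """Gradient of the toy objective f(w) = w**2."""
    return 2.0 * w


# One step from w = 1.0, then many steps towards the minimum at w = 0
print(sgd_momentum(grad, 1.0, steps=1))
print(sgd_momentum(grad, 1.0, steps=200))
```

Because the buffer carries over previous gradients, the effective step can be much larger than `lr` alone suggests, which is one reason the learning rate and momentum need to be tuned together.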