# DAWNBench ResNet with PyTorch
A DAWNBench solution for CIFAR10 using ResNet18. We use PyTorch and torchvision to train a classifier for CIFAR10 in around 6 mins with an accuracy of about 94% on the test set. (The run recorded in this notebook reached 93.82% and took about 25.6 mins.)

In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms
import time

We use the GPU when one is available (falling back to the CPU with a warning) and set the hyper-parameters.

In [2]:
# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
if not torch.cuda.is_available():
    print("Warning: CUDA not found. Using CPU.")

# Hyper-parameters
num_epochs = 35
learning_rate = 0.1



Set up data regularisation through weak augmentation of the training images. Without this, the model overfits and reaches only around 80% accuracy on the test set. Note the different pipelines for the training and test sets: the initial pre-processing (tensor conversion and normalisation) is the same, but only the training images are randomly flipped and cropped.

In [3]:
transform_train = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32, padding=4, padding_mode='reflect'),
])

transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])
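
To make the `RandomCrop(32, padding=4, padding_mode='reflect')` step more concrete, here is a minimal pure-Python sketch of the same idea on a single row of four values instead of a 32x32 image. The helper names `reflect_pad` and `random_crop` are hypothetical and are not part of torchvision; the sketch assumes the pad width is smaller than the row length.

```python
import random


def reflect_pad(row, pad):
    """Pad a 1-D sequence by mirroring it about its end values.

    The edge value itself is not repeated, which matches the 'reflect'
    padding convention. Assumes pad < len(row).
    """
    left = [row[i] for i in range(pad, 0, -1)]
    right = [row[-1 - i] for i in range(1, pad + 1)]
    return left + list(row) + right


def random_crop(row, size, rng):
    """Take a window of `size` elements at a uniformly random offset."""
    start = rng.randint(0, len(row) - size)
    return row[start:start + size]


row = [1, 2, 3, 4]
padded = reflect_pad(row, 2)
print(padded)  # [3, 2, 1, 2, 3, 4, 3, 2]

# Cropping back to the original length shifts the content by a random offset.
rng = random.Random(0)
crop = random_crop(padded, len(row), rng)
print(crop)
```

Padding first and then cropping back to the original size is what turns the crop into a small random translation of the image.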

Load the standard dataset using PyTorch data pipelines.

In [4]:
trainset = torchvision.datasets.CIFAR10(
    root='cifar10', train=True, download=True, transform=transform_train)
train_loader = torch.utils.data.DataLoader(trainset, batch_size=128, shuffle=True) #num_workers=6

testset = torchvision.datasets.CIFAR10(
    root='cifar10', train=False, download=True, transform=transform_test)
test_loader = torch.utils.data.DataLoader(testset, batch_size=100, shuffle=False) #num_workers=6

100%|██████████| 170M/170M [01:16<00:00, 2.23MB/s] 
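
The batch sizes above determine how many optimisation steps make up one epoch. The sketch below works through that arithmetic, assuming the standard CIFAR10 split of 50,000 training and 10,000 test images; the variable names are illustrative only.

```python
import math

train_images, test_images = 50_000, 10_000  # standard CIFAR10 split sizes
train_batch, test_batch = 128, 100          # batch sizes passed to the loaders

# DataLoader keeps the final, smaller batch by default, hence the ceiling.
steps_per_epoch = math.ceil(train_images / train_batch)
last_batch = train_images - (steps_per_epoch - 1) * train_batch
test_steps = math.ceil(test_images / test_batch)

print(steps_per_epoch, last_batch, test_steps)  # 391 80 100
```

The 391 steps per epoch match the `Step [.../391]` counter in the training log further down.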


Now set up the ResNet model from scratch. You could use the pre-built model from torchvision, but building our own version is more instructive. The original version of the code can be found [here](https://github.com/kuangliu/pytorch-cifar). Start with the basic ResNet block.

In [5]:
class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, in_planes, planes, stride=1):
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(
            in_planes, planes, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)

        self.shortcut = nn.Sequential()
        if stride != 1 or in_planes != self.expansion*planes:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_planes, self.expansion*planes,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(self.expansion*planes)
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out += self.shortcut(x)
        out = F.relu(out)
        return out
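
The shortcut branch in `BasicBlock` exists so that the two tensors summed in `forward` have the same shape. A small sketch of the standard convolution output-size formula shows why a strided block needs the 1x1 projection; `conv_out_size` is a hypothetical helper, not a PyTorch function.

```python
def conv_out_size(size, kernel, stride, padding):
    """Spatial output size of a convolution (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


# Main branch: a 3x3 conv with padding 1 keeps the size at stride 1
# and halves it at stride 2.
same = conv_out_size(32, kernel=3, stride=1, padding=1)
halved = conv_out_size(32, kernel=3, stride=2, padding=1)

# Shortcut branch: the 1x1 conv with the same stride and no padding
# produces a matching size, so `out += self.shortcut(x)` is well defined.
shortcut = conv_out_size(32, kernel=1, stride=2, padding=0)

print(same, halved, shortcut)  # 32 16 16
```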

Then construct the general ResNet network, whose block type and depth can be configured to create different versions, and define the ResNet18 variant used for CIFAR10.

In [6]:
class ResNet(nn.Module):
    def __init__(self, block, num_blocks, num_classes=10):
        super(ResNet, self).__init__()
        self.in_planes = 64

        self.conv1 = nn.Conv2d(3, 64, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.layer1 = self._make_layer(block, 64, num_blocks[0], stride=1)
        self.layer2 = self._make_layer(block, 128, num_blocks[1], stride=2)
        self.layer3 = self._make_layer(block, 256, num_blocks[2], stride=2)
        self.layer4 = self._make_layer(block, 512, num_blocks[3], stride=2)
        self.linear = nn.Linear(512*block.expansion, num_classes)

    def _make_layer(self, block, planes, num_blocks, stride):
        strides = [stride] + [1]*(num_blocks-1)
        layers = []
        for stride in strides:
            layers.append(block(self.in_planes, planes, stride))
            self.in_planes = planes * block.expansion
        return nn.Sequential(*layers)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.layer1(out)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.layer4(out)
        out = F.avg_pool2d(out, 4)
        out = out.view(out.size(0), -1)
        out = self.linear(out)
        return out


def ResNet18():
    return ResNet(BasicBlock, [2, 2, 2, 2])
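
As a rough check on the architecture, the following sketch traces the channel count and spatial size of a 32x32 input through the stages defined above, without running any PyTorch code. The function names are hypothetical, and the trace only follows the first (possibly strided) convolution of each layer, since the remaining convolutions preserve the size.

```python
def conv_out_size(size, kernel, stride, padding):
    """Spatial output size of a convolution (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_resnet18(size=32):
    """Return (stage, channels, spatial size) tuples and the flattened width."""
    shapes = []
    channels = 64
    size = conv_out_size(size, 3, 1, 1)  # stem conv1: 3x3, stride 1
    shapes.append(("conv1", channels, size))
    stages = [("layer1", 64, 1), ("layer2", 128, 2),
              ("layer3", 256, 2), ("layer4", 512, 2)]
    for name, planes, stride in stages:
        size = conv_out_size(size, 3, stride, 1)
        channels = planes
        shapes.append((name, channels, size))
    size = size // 4  # F.avg_pool2d(out, 4) on a 4x4 map
    shapes.append(("avg_pool", channels, size))
    return shapes, channels * size * size


shapes, features = trace_resnet18()
for stage in shapes:
    print(stage)
print(features)  # 512
```

The final 4x4 feature map is why `F.avg_pool2d(out, 4)` reduces each channel to a single value, leaving 512 features for the linear layer.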

Instantiate the model and move it to the selected device. We print the parameter count and the model structure as a check.

In [7]:
model = ResNet18()
model = model.to(device)

if device.type == 'cuda':
    print(torch.cuda.get_device_name(0))

#model info
print("Model No. of Parameters:", sum([param.nelement() for param in model.parameters()]))
print(model)

Model No. of Parameters: 11173962
ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential()
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64
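
The parameter count printed above can be cross-checked by hand. The sketch below mirrors the layer definitions from the `BasicBlock` and `ResNet` classes in plain Python; the helper names are hypothetical. It relies on the convolutions having no bias and on each batch-norm layer contributing a learnable weight and bias per channel (the running statistics are buffers, not parameters).

```python
def conv_params(c_in, c_out, kernel):
    """Weights of a bias-free convolution."""
    return c_in * c_out * kernel * kernel


def bn_params(channels):
    """Learnable scale and shift of a batch-norm layer."""
    return 2 * channels


def basic_block_params(in_planes, planes, stride):
    total = conv_params(in_planes, planes, 3) + bn_params(planes)
    total += conv_params(planes, planes, 3) + bn_params(planes)
    if stride != 1 or in_planes != planes:  # projection shortcut
        total += conv_params(in_planes, planes, 1) + bn_params(planes)
    return total


def resnet18_params(num_blocks=(2, 2, 2, 2), num_classes=10):
    total = conv_params(3, 64, 3) + bn_params(64)  # stem
    in_planes = 64
    for planes, n, stride in zip((64, 128, 256, 512), num_blocks, (1, 2, 2, 2)):
        for s in [stride] + [1] * (n - 1):
            total += basic_block_params(in_planes, planes, s)
            in_planes = planes
    total += in_planes * num_classes + num_classes  # linear weights + bias
    return total


print(resnet18_params())  # 11173962
```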

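Define the loss, the optimiser and the learning rate schedule. We use a piecewise linear schedule, stepped once per epoch, that ramps the learning rate up and back down and then decays it over the final epochs, so that the schedule ends at the pre-defined number of epochs for the quickest training.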

In [None]:
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate, momentum=0.9, weight_decay=5e-4)
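# Piecewise linear schedule (stepped once per epoch in the training loop)
total_step = len(train_loader)  # batches per epoch, used for progress logging
# First 30 epochs: triangular ramp from 0.005 up to learning_rate and back down
sched_linear_1 = torch.optim.lr_scheduler.CyclicLR(optimizer, base_lr=0.005, max_lr=learning_rate, step_size_up=15, step_size_down=15, mode="triangular")
# Remaining epochs: linear decay (LinearLR's default total_iters is 5)
sched_linear_3 = torch.optim.lr_scheduler.LinearLR(optimizer, start_factor=0.005/learning_rate, end_factor=0.005/5)
# Switch from the triangular ramp to the linear decay at epoch 30
scheduler = torch.optim.lr_scheduler.SequentialLR(optimizer, schedulers=[sched_linear_1, sched_linear_3], milestones=[30])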

TypeError: LinearLR.__init__() got an unexpected keyword argument 'verbose'
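
The intended shape of the schedule can be sketched in plain Python. This is an idealised curve derived from the arguments in the cell above, not a reproduction of the exact values PyTorch's chained schedulers emit, which may differ at the segment boundaries. It assumes that the final decay is scaled relative to `learning_rate` (0.1) and lasts 5 epochs; `lr_at` and its parameters are hypothetical names.

```python
import math


def lr_at(epoch, base_lr=0.005, max_lr=0.1, ramp=15, tail=5,
          end_factor=0.005 / 5):
    """Idealised piecewise linear learning rate at the start of `epoch`."""
    if epoch <= ramp:  # linear warm-up to the peak
        return base_lr + (max_lr - base_lr) * epoch / ramp
    if epoch <= 2 * ramp:  # linear ramp back down
        return max_lr - (max_lr - base_lr) * (epoch - ramp) / ramp
    # Final segment: linear decay from base_lr to max_lr * end_factor
    start, end = base_lr, max_lr * end_factor
    t = min(epoch - 2 * ramp, tail) / tail
    return start + (end - start) * t


for epoch in (0, 15, 30, 35):
    print(epoch, f"{lr_at(epoch):.4f}")
```

Under these assumptions the learning rate peaks at epoch 15, returns to its starting value at epoch 30, and is reduced by a further factor of 50 over the last 5 epochs.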

Train the model with a simple training loop and report loss progress periodically.

In [None]:
# Train the model
model.train()
print("> Training")
start = time.time()  # start the training timer
for epoch in range(num_epochs):

    for i, (images, labels) in enumerate(train_loader):
        images = images.to(device)
        labels = labels.to(device)

        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if (i+1) % 100 == 0:
            print("Epoch [{}/{}], Step [{}/{}] Loss: {:.5f}"
                  .format(epoch+1, num_epochs, i+1, total_step, loss.item()))

    scheduler.step()  # advance the learning rate schedule once per epoch
end = time.time()
elapsed = end - start
print(f"Training took {elapsed:.2f} secs or {elapsed/60:.2f} mins in total")

> Training
Epoch [1/35], Step [100/391] Loss: 1.88376
Epoch [1/35], Step [200/391] Loss: 1.77183
Epoch [1/35], Step [300/391] Loss: 1.70245
Epoch [2/35], Step [100/391] Loss: 1.51951
Epoch [2/35], Step [200/391] Loss: 1.34386
Epoch [2/35], Step [300/391] Loss: 0.99767
Epoch [3/35], Step [100/391] Loss: 1.09128
Epoch [3/35], Step [200/391] Loss: 0.95656
Epoch [3/35], Step [300/391] Loss: 0.84427
Epoch [4/35], Step [100/391] Loss: 0.70018
Epoch [4/35], Step [200/391] Loss: 0.59718
Epoch [4/35], Step [300/391] Loss: 0.57562
Epoch [5/35], Step [100/391] Loss: 0.54450
Epoch [5/35], Step [200/391] Loss: 0.58696
Epoch [5/35], Step [300/391] Loss: 0.60926
Epoch [6/35], Step [100/391] Loss: 0.46383
Epoch [6/35], Step [200/391] Loss: 0.47299
Epoch [6/35], Step [300/391] Loss: 0.58954
Epoch [7/35], Step [100/391] Loss: 0.57953
Epoch [7/35], Step [200/391] Loss: 0.57442
Epoch [7/35], Step [300/391] Loss: 0.57496
Epoch [8/35], Step [100/391] Loss: 0.32289
Epoch [8/35], Step [200/391] Loss: 0.44576
...
Epoch [31/35], Step [100/391] Loss: 0.04588
Epoch [31/35], Step [200/391] Loss: 0.01889
Epoch [31/35], Step [300/391] Loss: 0.03945
Epoch [32/35], Step [100/391] Loss: 0.03248
Epoch [32/35], Step [200/391] Loss: 0.01975
Epoch [32/35], Step [300/391] Loss: 0.07756
Epoch [33/35], Step [100/391] Loss: 0.02008
Epoch [33/35], Step [200/391] Loss: 0.04290
Epoch [33/35], Step [300/391] Loss: 0.02206
Epoch [34/35], Step [100/391] Loss: 0.04811
Epoch [34/35], Step [200/391] Loss: 0.02198
Epoch [34/35], Step [300/391] Loss: 0.06172
Epoch [35/35], Step [100/391] Loss: 0.05103
Epoch [35/35], Step [200/391] Loss: 0.02956
Epoch [35/35], Step [300/391] Loss: 0.03899
Training took 1534.5668334960938 secs or 25.576113891601562 mins in total
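
To help read the loss values in the log above, here is a minimal pure-Python sketch of softmax cross-entropy for a single sample. `nn.CrossEntropyLoss` computes this quantity from raw logits and averages it over the batch by default; the function below is an illustrative re-implementation, not the library code.

```python
import math


def cross_entropy(logits, target):
    """Negative log-probability of `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


# An untrained 10-class model that is equally unsure about every class
uniform = cross_entropy([0.0] * 10, target=3)
# A model that strongly favours the correct class
confident = cross_entropy([10.0] + [0.0] * 9, target=0)

print(f"{uniform:.4f}")  # 2.3026, i.e. ln(10)
print(confident < 0.01)  # True
```

A loss near ln(10) ≈ 2.30 therefore corresponds to guessing among the ten classes, which is the scale the first logged losses start from, while the values of a few hundredths in the last epochs indicate confident, mostly correct predictions on the training batches.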


Evaluate the model on the test set in inference mode and report the overall test accuracy.

In [None]:
# Test the model
print("> Testing")
start = time.time()  # start the testing timer
model.eval()
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        images = images.to(device)
        labels = labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    print('Test Accuracy: {} %'.format(100 * correct / total))
end = time.time()
elapsed = end - start
print("Testing took " + str(elapsed) + " secs or " + str(elapsed/60) + " mins in total")

> Testing
Test Accuracy: 93.82 %
Training took 4.410189867019653 secs or 0.07350316445032755 mins in total
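
The accuracy computation in the test loop amounts to taking the arg-max over each row of logits and counting matches with the labels. A small sketch with made-up logits (the values and the `accuracy` helper are illustrative only):

```python
def accuracy(logit_rows, labels):
    """Percentage of rows whose largest logit is at the labelled index."""
    correct = 0
    for logits, label in zip(logit_rows, labels):
        predicted = max(range(len(logits)), key=logits.__getitem__)
        correct += predicted == label
    return 100 * correct / len(labels)


toy_logits = [
    [0.1, 2.0, -1.0],  # predicts class 1
    [1.5, 0.2, 0.3],   # predicts class 0
    [0.0, 0.1, 0.9],   # predicts class 2
    [0.4, 0.3, 0.2],   # predicts class 0
]
toy_labels = [1, 0, 2, 1]
print(accuracy(toy_logits, toy_labels))  # 75.0

# On the 10,000-image test set, 93.82 % corresponds to this many correct images.
correct_images = round(93.82 / 100 * 10_000)
print(correct_images)  # 9382
```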
