
(a) (4 points) For VGG11, VGG11 with batch normalization, ResNet18, ResNet34, DenseNet121, and MobileNet-v3-Small, plot the Top-1 accuracy on ImageNet against the inference speed. The Top-1 accuracy of each model can be found on the PyTorch website. Also plot the inference speed against the number of parameters. Does it scale proportionally? Make sure to set the model to evaluation mode, use `torch.no_grad()`, and run on a GPU. Report the inference speed in ms for one image, averaged across multiple forward passes. Briefly describe the trends you observe.

In [1]:
import numpy as np
import torch
import timm

# set the seed for reproducibility of the whole notebook
SEED = 42
np.random.seed(SEED)
torch.manual_seed(SEED)
if torch.cuda.is_available():  # GPU operations have a separate seed
    torch.cuda.manual_seed(SEED)
    torch.cuda.manual_seed_all(SEED)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

def vit_s_8():
    """ViT-S/8 is not a default torchvision model, so we load it via timm."""
    # Accuracy approximation comes from
    # https://openreview.net/pdf?id=LtKcMgGOeLt
    # and DINO
    # https://arxiv.org/abs/2104.14294
    return timm.create_model('vit_small_patch8_224', pretrained=True)

# import models and weights
from torchvision.models import (
    vit_b_32, ViT_B_32_Weights,
    vgg11, VGG11_Weights,
    vgg11_bn, VGG11_BN_Weights,
    resnet18, ResNet18_Weights,
    densenet121, DenseNet121_Weights,
    mobilenet_v3_small, MobileNet_V3_Small_Weights
)

# models data (name, constructor fn, weights)
models = [
    ('ViT-S/8', vit_s_8, None),
    ('ViT-B/32', vit_b_32, ViT_B_32_Weights),
    ('VGG11', vgg11, VGG11_Weights),
    ('VGG11 (BN)', vgg11_bn, VGG11_BN_Weights),
    ('ResNet18', resnet18, ResNet18_Weights),
    ('DenseNet121', densenet121, DenseNet121_Weights),
    ('MobileNet V3', mobilenet_v3_small, MobileNet_V3_Small_Weights)
]

model_accs = {
    'vit_s_8': 80., # Approximated
    'vit_b_32' : 75.912,
    'vgg11' : 69.02,
    'vgg11_bn' : 70.37,
    'resnet18' : 69.758,
    'densenet121' : 74.434,
    'mobilenet_v3_small' : 67.668,
}
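
Part (a) also asks for the inference speed against the number of parameters, which the cell above does not compute. A minimal sketch of one way to count them follows; `count_parameters` and the toy model are hypothetical names introduced only for illustration, and the toy model stands in for the pretrained networks so the example runs without downloading weights.

```python
import torch.nn as nn


def count_parameters(model: nn.Module, trainable_only: bool = False) -> int:
    """Return the number of scalar parameters in `model`."""
    return sum(
        p.numel()
        for p in model.parameters()
        if p.requires_grad or not trainable_only
    )


# Toy stand-in (hypothetical) so the example needs no pretrained weights.
toy = nn.Sequential(nn.Linear(10, 5), nn.ReLU(), nn.Linear(5, 2))
n_params = count_parameters(toy)
print(f"{n_params} parameters ({n_params / 1e6:.6f} M)")
```

The same helper could be applied to each model built from the `models` list, for example `count_parameters(model_ctor(weights=None))`, assuming the constructor accepts `weights=None`.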

In [68]:
import torch
from PIL import Image

def run_inference(device, model_ctor, weights, batch_size, grad_enabled=False, n_reps=10):
    vit_s = weights is None
    # load model and preprocessing transforms
    if vit_s:  # vit-s/8 is loaded via timm
        model = model_ctor()
        data_config = timm.data.resolve_model_data_config(model)
        transforms = timm.data.create_transform(**data_config, is_training=False)
        imarray = np.random.rand(100, 100, 3) * 255
        img = Image.fromarray(imarray.astype('uint8')).convert('RGB')  # 3 channels, as the model expects
    else:
        weights = weights.IMAGENET1K_V1
        model = model_ctor(weights=weights, progress=False)
        transforms = weights.transforms()
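        # random uint8 image in [0, 255]; casting torch.rand() to uint8 would yield all zeros
        img = torch.randint(0, 256, (1, 3, 224, 224), dtype=torch.uint8)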

    # put everything on the device
    model.to(device)        

    # set model to eval
    model.eval()

    # init gpu loggers
    starter, ender = torch.cuda.Event(enable_timing=True), torch.cuda.Event(enable_timing=True)
    timings = torch.zeros(n_reps)

    def prepare_img(img):
        if vit_s:
            transformed_img = transforms(img)
            transformed_img = transformed_img.to(device)
        else:
            img = img.to(device)
            transformed_img = transforms(img)
        
        transformed_img = transformed_img.repeat(batch_size, 1, 1, 1)
        assert transformed_img.shape == (batch_size, 3, 224, 224)
        return transformed_img

    # memory check
    with torch.set_grad_enabled(grad_enabled):
        mem_before = torch.cuda.memory_allocated()
        _ = model(prepare_img(img))
        mem_after = torch.cuda.memory_allocated()

    # GPU warm-up, run under the same grad mode as the timed passes
    with torch.set_grad_enabled(grad_enabled):
        for _ in range(10):
            _ = model(prepare_img(img))

    # run inference
    with torch.set_grad_enabled(grad_enabled):
        for i in range(n_reps):
            transformed_img = prepare_img(img)
            starter.record()
            y_pred_probs = model(transformed_img)
            y_pred_probs.argmax(dim=1)
            ender.record()

            torch.cuda.synchronize()
            elapsed_time = starter.elapsed_time(ender)
            timings[i] = elapsed_time
    
    stats = {
        'mean_inference_time': timings.mean().item(),
        'memory_usage_before': mem_before,
        'memory_usage_after': mem_after,
    }

    return stats

(b) (2 points) Do you expect the inference speed to increase or decrease without torch.no_grad()? Why? What does torch.no_grad() do? For the same models as in (a), plot the inference speed with and without torch.no_grad().
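
As a small illustration of what `torch.no_grad()` changes, the sketch below runs the same layer with and without it on the CPU. The layer and input are hypothetical toy values; the point is only that outputs computed under `torch.no_grad()` carry no autograd graph, so no intermediate results need to be kept for a backward pass.

```python
import torch
import torch.nn as nn

layer = nn.Linear(4, 2)   # toy layer, stands in for a full model
x = torch.rand(3, 4)      # toy batch of 3 inputs

# Default mode: autograd records the operation for a later backward pass.
y_grad = layer(x)

# Inside no_grad: nothing is recorded, the output is a plain tensor.
with torch.no_grad():
    y_no_grad = layer(x)

print("grad mode:    requires_grad =", y_grad.requires_grad,
      "| has grad_fn =", y_grad.grad_fn is not None)
print("no_grad mode: requires_grad =", y_no_grad.requires_grad,
      "| has grad_fn =", y_no_grad.grad_fn is not None)
```

The numerical outputs are the same in both modes; only the bookkeeping differs, which is why the effect is expected to show up mainly in memory use.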

In [69]:
import pandas as pd

assert torch.cuda.is_available(), "Inference should be run on GPU!"
device = 'cuda'
n_reps = 10
batch_size = 64

index = []
rows = []
for model_name, model_ctor, weights in models[1:]:
    print(f'{model_name}')
    stats_no_grad = run_inference(device, model_ctor, weights, batch_size, grad_enabled=False, n_reps=n_reps)
    stats_grad = run_inference(device, model_ctor, weights, batch_size, grad_enabled=True, n_reps=n_reps)

    index.append(model_name)
    renamed_stats_grad = {f'{k} (grad enabled)': v for k, v in stats_grad.items()}
    stats = {**stats_no_grad, **renamed_stats_grad}
    rows.append(stats)

df = pd.DataFrame.from_records(rows, index=index)
df

ViT-B/32
VGG11
VGG11 (BN)
ResNet18
DenseNet121
MobileNet V3


model,mean_inference_time,memory_usage_before,memory_usage_after,mean_inference_time (grad enabled),memory_usage_before (grad enabled),memory_usage_after (grad enabled)
ViT-B/32,37.122356,1099036672,1099292672,39.242653,1099036672,3177423872
VGG11,19.181057,1277671936,1277927936,19.176142,1277671936,4402684416
VGG11 (BN),22.229504,1277721088,1277977088,22.341324,1277721088,6303801856
ResNet18,6.819942,793079296,793335296,6.834688,793079296,2214680064
DenseNet121,29.873972,779120128,779376128,29.957325,779120128,9140257792
MobileNet V3,4.605542,756421120,756677120,5.884314,756421120,1808499712
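
The task asks for the inference speed in ms for one image, while the timings above wrap one forward pass over a batch of 64. A rough sketch of the conversion follows, assuming the `mean_inference_time` column is in ms per batch (CUDA events report milliseconds). Note that this is an amortized per-image time at batch size 64, which is not necessarily the same as the latency of a single-image forward pass.

```python
BATCH_SIZE = 64  # batch size used in the cell above

# mean_inference_time (no grad) per batch, copied from the table above, in ms
batch_ms_no_grad = {
    "ViT-B/32": 37.122356,
    "VGG11": 19.181057,
    "VGG11 (BN)": 22.229504,
    "ResNet18": 6.819942,
    "DenseNet121": 29.873972,
    "MobileNet V3": 4.605542,
}

# Amortized time per image: batch time divided by the number of images.
per_image_ms = {name: t / BATCH_SIZE for name, t in batch_ms_no_grad.items()}

for name, t in sorted(per_image_ms.items(), key=lambda kv: kv[1]):
    print(f"{name:<13} {t:.3f} ms/image")
```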


In [54]:
df

model,inference time (grad enabled),inference time (grad disabled),accuracy
ViT-B/32,6.876672,4.420813,75.912
VGG11,1.205043,1.091994,69.02
VGG11 (BN),1.576448,1.41312,70.37
ResNet18,2.524262,2.098586,69.758
DenseNet121,14.97344,12.321484,74.434
MobileNet V3,5.348864,4.171162,67.668
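
One possible way to produce the accuracy-versus-speed plot from part (a) is sketched below with matplotlib, using the values from the table above. It assumes the inference times are in ms (the table does not state the unit), and the output file name is hypothetical. The `Agg` backend is selected so the script also runs without a display; inside a notebook that line can be dropped.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt

# name: (inference time with grad disabled, Top-1 accuracy in %), from the table above
results = {
    "ViT-B/32": (4.420813, 75.912),
    "VGG11": (1.091994, 69.02),
    "VGG11 (BN)": (1.41312, 70.37),
    "ResNet18": (2.098586, 69.758),
    "DenseNet121": (12.321484, 74.434),
    "MobileNet V3": (4.171162, 67.668),
}

fig, ax = plt.subplots(figsize=(6, 4))
for name, (time_ms, acc) in results.items():
    ax.scatter(time_ms, acc)
    ax.annotate(name, (time_ms, acc), textcoords="offset points", xytext=(5, 5))

ax.set_xlabel("Inference time (ms, assumed unit)")
ax.set_ylabel("ImageNet Top-1 accuracy (%)")
ax.set_title("Accuracy vs. inference time")
fig.tight_layout()
fig.savefig("accuracy_vs_inference_time.png")  # hypothetical file name
```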


(c) (2 points) For the same models as in (a), plot the amount of GPU vRAM used during a forward pass with and without torch.no_grad(). You can measure it in code with torch.cuda.memory_allocated() or from the terminal with nvidia-smi. Does torch.no_grad() influence the memory usage? Why? Make sure to save the output after the forward pass. Use batch_size=64 and report the memory in MB.

Hint: You can create a fake image with torch.rand().
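
The memory columns in the table from part (b) are raw byte counts from `torch.cuda.memory_allocated()`, while this part asks for MB. A small conversion sketch follows, using the `memory_usage_after` values copied from that table and assuming "MB" means 2**20 bytes. In the no-grad runs, `memory_usage_after` exceeds `memory_usage_before` by 256000 bytes for every model, which matches the size of a 64 x 1000 float32 output tensor.

```python
MB = 1024 ** 2  # assumption: 1 MB = 2**20 bytes; use 1e6 for decimal megabytes

# name: (memory_usage_after with no_grad, memory_usage_after with grad enabled), in bytes
mem_after_bytes = {
    "ViT-B/32": (1099292672, 3177423872),
    "VGG11": (1277927936, 4402684416),
    "VGG11 (BN)": (1277977088, 6303801856),
    "ResNet18": (793335296, 2214680064),
    "DenseNet121": (779376128, 9140257792),
    "MobileNet V3": (756677120, 1808499712),
}

# Extra memory held after the forward pass when autograd is enabled.
extra_mb = {
    name: (with_grad - no_grad) / MB
    for name, (no_grad, with_grad) in mem_after_bytes.items()
}

for name, (no_grad, with_grad) in mem_after_bytes.items():
    print(f"{name:<13} no_grad: {no_grad / MB:8.1f} MB   "
          f"grad: {with_grad / MB:8.1f} MB   extra: {extra_mb[name]:8.1f} MB")
```

The `extra` column is the quantity most directly tied to the question: it is the memory that stays allocated because the saved output keeps the autograd graph and its stored activations alive.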