
(a) (4 points) Plot for VGG11, VGG11 with batch normalization, ResNet18, ResNet34, DenseNet121 and MobileNet-v3-Small, the Top-1 accuracy on ImageNet vs the inference speed. The value for the Top-1 accuracy of each model can be found on the PyTorch website. Also plot the inference speed vs the number of parameters. Does it scale proportionately? Make sure to set the model on evaluation mode, use `torch.no_grad()` and a GPU. Report the inference speed in ms for one image. Average the inference speed across multiple forward passes. Shortly, describe the trends you observe.

In [1]:
import numpy as np
import torch
import timm

# set the seed for reproducibility of the whole notebook
SEED = 42
np.random.seed(SEED)
torch.manual_seed(SEED)
if torch.cuda.is_available():  # GPU operation have separate seed
    torch.cuda.manual_seed(SEED)
    torch.cuda.manual_seed_all(SEED)
    torch.backends.cudnn.determinstic = True
    torch.backends.cudnn.benchmark = False

def vit_s_8():
    """ViT-S/8 is not a default torchvision model, so we provide it by timm"""
    # Accuracy approximation comes from
    # https://openreview.net/pdf?id=LtKcMgGOeLt
    # and DINO
    # https://arxiv.org/abs/2104.14294
    return timm.create_model('vit_small_patch8_224', pretrained=True)

# import models and weights
from torchvision.models import (
    vit_b_32, ViT_B_32_Weights,
    vgg11, VGG11_Weights,
    vgg11_bn, VGG11_BN_Weights,
    resnet18, ResNet18_Weights,
    densenet121, DenseNet121_Weights,
    mobilenet_v3_small, MobileNet_V3_Small_Weights
)

# models data (name, constructor fn, weights)
models = [
    ('ViT-S/8', vit_s_8, None),
    ('ViT-B/32', vit_b_32, ViT_B_32_Weights),
    ('VGG11', vgg11, VGG11_Weights),
    ('VGG11 (BN)', vgg11_bn, VGG11_BN_Weights),
    ('ResNet18', resnet18, ResNet18_Weights),
    ('DenseNet121', densenet121, DenseNet121_Weights),
    ('MobileNet V3', mobilenet_v3_small, MobileNet_V3_Small_Weights)
]

model_accs = {
    'vit_s_8': 80., # Approximated
    'vit_b_32' : 75.912,
    'vgg11' : 69.02,
    'vgg11_bn' : 70.37,
    'resnet18' : 69.758,
    'densenet121' : 74.434,
    'mobilenet_v3_small' : 67.668,
}

In [4]:
import torch

def run_inference(device, img, model_ctor, weights, grad_enabled=False, n_reps=10):
    # load model and preprocessing transforms
    if weights is None:  # vit-s/8 is loaded via timm
        model = model_ctor()
        data_config = timm.data.resolve_model_data_config(model)
        transforms = timm.data.create_transform(**data_config, is_training=False)
    else:
        weights = weights.IMAGENET1K_V1
        model = model_ctor(weights=weights, progress=False)
        transforms = weights.transforms()

    # put everything on the device
    model.to(device)
    img.to(device)

    # set model to eval
    model.eval()

    # init gpu loggers
    starter, ender = torch.cuda.Event(enable_timing=True), torch.cuda.Event(enable_timing=True)
    timings = torch.zeros(n_reps)

    # GPU-WARM-UP
    for _ in range(10):
        _ = model(img)

    # run inference
    with torch.set_grad_enabled(grad_enabled):
        for i in range(n_reps):
            transformed_img = transforms(img)
            starter.record()
            y_pred_probs = model(transformed_img)
            y_pred_probs.argmax(dim=1)
            ender.record()

            torch.cuda.synchronize()
            elapsed_time = starter.elapsed_time(ender)
            timings[i] = elapsed_time

    return timings

In [5]:
#assert torch.cuda.is_available(), "Inference should be run on GPU!"
device = torch.device('mps')
img = torch.rand(224, 224)
grad_enabled = False
n_reps = 1

for model_name, model_ctor, weights in models[:1]:
    timings = run_inference(device, img, model_ctor, weights, grad_enabled=grad_enabled, n_reps=n_reps)
    timings = timings.mean()
    print(f'Inference time for {model_name} (grad disabled, {n_reps} images): {timings}')

model.safetensors:   0%|          | 0.00/86.7M [00:00<?, ?B/s]

RuntimeError: Tried to instantiate dummy base class Event

(b) (2 points) Do you expect the inference speed to increase or decrease without torch.no_grad()? Why? What does torch.no_grad() do? For the same models as in (a), plot the inference speed with and without torch.no_grad().

In [None]:
img = torch.rand(224, 224)
grad_enabled = True
n_reps = 1

timings = {}
for model_name, model_ctor, weights in models:
    timings[model_name] = run_inference(device, img, model_ctor, weights, grad_enabled, n_reps)

for k, v in timings.items():
    print(f'Inference time for {k} (grad enabled, {n_reps} images): {v}')

c) (2 points) For the same models as in (a), plot the amount of GPU vRAM (you can check this with code by executing torch.cuda.memory_allocated() or with the terminal using nvidia-smi) while conducting a forward pass with torch.no_grad() and without. Does torch.no_grad() influence the memory usage? Why? Make sure to save the output after the forward pass. Use batch_size=64 and report the memory in MB.

Hint: You can create a fake image with torch.rand().