# A complex yet simple efficient training pipeline for CIFAR-10

This pipeline serves an educational purpose, which is why it is complex yet simple.

For a more complex and more efficient training pipeline for CIFAR-10, do check [CIFAR-10 speedruns: 94% in 2.6 seconds and 96% in 27 seconds](https://github.com/KellerJordan/cifar10-airbench).

In [1]:
import os
import numpy as np

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.optim import Optimizer
from torch.optim.lr_scheduler import StepLR
from torchvision.transforms import v2
import torchvision
from torchvision.datasets import CIFAR10
from torch.utils.data import DataLoader, Dataset
import timm
from tqdm import tqdm

from multiprocessing import freeze_support
import time
from timed_decorator.simple_timed import timed

from pathlib import Path
import sys

In [2]:

LCL_PATH  = str(Path().cwd())
ROOT_PATH = str(Path(LCL_PATH).parent)
DEEPL_PATH = str(Path(ROOT_PATH)/"deep_learning")
print("""
root path:\t{}
local path:\t{}
deep learning path:\t{}""".format(ROOT_PATH, LCL_PATH, DEEPL_PATH))


root path:	/home/gheorghe/Desktop/Proiecte/master/CapitoleAvansateDinReteleNeuronale
local path:	/home/gheorghe/Desktop/Proiecte/master/CapitoleAvansateDinReteleNeuronale/laborator_2
deep learning path:	/home/gheorghe/Desktop/Proiecte/master/CapitoleAvansateDinReteleNeuronale/deep_learning


In [3]:

# adding local_folder to the system path
sys.path.append(ROOT_PATH)
sys.path.append(LCL_PATH)
sys.path.append(DEEPL_PATH)

from sys_function import *  # located in the root folder

In [4]:

sys_remove_modules("dataset.dataset_rand_append_unsupervised")
sys_remove_modules("trainer.unsupervised_trainer")
sys_remove_modules("transformers.one_hot")
sys_remove_modules("transformers.label_smoothing")
sys_remove_modules("models.unsupervised.resnet_unsupervised")
sys_remove_modules("conf_manager.train_conf")
sys_remove_modules("checks.tensor_check")

from dataset.dataset_rand_append_unsupervised import *
from trainer.unsupervised_trainer import *
from transformers.one_hot import *
from transformers.label_smoothing import *
from models.unsupervised.resnet_unsupervised import *
from conf_manager.train_conf import *
from checks.tensor_check import *

First, we define some configuration variables.

In [5]:
disable_compile = True
compile_is_slower = False
BATCH_SIZE = 24
IMAGE_SIZE = 32
NUM_CLASSES = 10

## Data acquisition

### Manipulation/preprocessing

#### Check function

In [6]:

def check_transforms(transform, inputs, shape, max_val, min_val, dtype):
    # apply the random transform 1000 times and validate every output
    for _ in range(1000):
        x = transform(inputs)
        tensor_check(x, shape, max_val, min_val, dtype)
    else:
        print("OK")
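
`tensor_check` comes from the local `checks.tensor_check` module, which is not shown here. As a rough sketch, assuming it only asserts the container type, shape, dtype, and value range, a stand-in could look like this (`tensor_check_sketch` is a hypothetical name, not the author's implementation):

```python
import torch


def tensor_check_sketch(x, shape, max_val, min_val, dtype, arr_type=torch.Tensor):
    """Hypothetical stand-in for the local `tensor_check` helper."""
    assert isinstance(x, arr_type), f"expected {arr_type}, got {type(x)}"
    assert tuple(x.shape) == tuple(shape), f"expected shape {shape}, got {tuple(x.shape)}"
    assert x.dtype == dtype, f"expected dtype {dtype}, got {x.dtype}"
    assert x.max().item() <= max_val, "value above the allowed maximum"
    assert x.min().item() >= min_val, "value below the allowed minimum"
    return True


# A random tensor in [-1, 1), the range expected after Normalize(mean=0.5, std=0.5)
x = torch.rand(3, 32, 32) * 2 - 1
ok = tensor_check_sketch(x, (3, 32, 32), 1, -1, torch.float32)
```

Running such a check many times matters because the transforms are random: a single pass could miss a rarely sampled branch that breaks the shape or range.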

#### Transform

In [7]:

def get_transforms(image_size: int):
    # These transformations are cached.
    # We could have used RandomCrop with padding. But we are smart, and we know we cache the initial_transforms so
    # we don't compute them during runtime. Therefore, we do the padding beforehand, and apply cropping only at
    # runtime
    random_choice = v2.RandomChoice([
        v2.RandomPerspective(
                    distortion_scale=0.15, # controls how much each corner can move. 
                    p=1.0),                # probability of applying the effect
        v2.RandomRotation(degrees=30),     # rotates an image with random angle
        v2.RandomAffine(
                    degrees=30,             # rotation ±30
                    translate=(0.15, 0.15), # horizontal/vertical translation as fraction of image
                    scale=(0.75, 1.05),     # scale factor
                    shear=10),              # shear angle ±10°
        v2.RandomCrop(
                    size=image_size,   # height & width of crop
                    padding=4),        # pixels to pad around the image
        v2.RandomResizedCrop(
                    size=image_size,
                    scale=(0.75, 1.),  # range of area proportion to crop from the original image
                    ratio=(0.8,  1.)), # range of aspect ratio (width/height)
        v2.RandomAdjustSharpness(
                    sharpness_factor=1.5, # controls the degree of sharpness; ( >1 sharpened; <1 slightly blurred)
                    p=1.),                      # probability of applying the transform
        v2.RandomAutocontrast(p=1.), # probability of applying the transform
        v2.RandomEqualize( # equalizes the histogram of pixel values
                    p=1.), # probability of applying the transform
        v2.ColorJitter(  # randomly changes the brightness, contrast, saturation, and hue
                    brightness=0.5, # factor to change brightness
                    contrast=0.3,   # factor to change contrast
                    saturation=0.3, # factor to change saturation
                    hue=0.3,),      # factor to change hue
        v2.GaussianBlur(  # applies a Gaussian blur
                    kernel_size=(7, 7), # size of the Gaussian kernel
                    # standard deviation of the Gaussian kernel; a float or tuple (min, max) for random sampling
                    sigma=(0.1, 5.)),   # sampled uniformly from this (min, max) range
        v2.RandomErasing(
                    scale=(0.01, 0.15), # range of area ratio to erase (relative to image area)
                    value=10,           # fill value: single number, tuple, or 'random'
                    inplace=False,      # whether to erase in place or return a new image
                    p=1.),              # probability of applying the transform
        v2.Grayscale(num_output_channels=3), # number of channels in output image: 1 or 3
        v2.RandomHorizontalFlip(),
        v2.Identity(),  # returns the input image unchanged
    ])
    transforms = v2.Compose([
        # 'ToImage' is used here, so the input should be a numpy array, not a torch tensor!
        v2.ToImage(), # the Dataset manager passes a numpy array; ToImage converts it to a torch image tensor
        v2.Resize(
            size=int(image_size*1.3),),
        v2.CenterCrop(image_size),
        random_choice,
        v2.ToDtype(torch.float32, scale=True), # converts uint8 [0,255] -> float32 [0,1]
        v2.Normalize(mean=(0.5, 0.5, 0.5), 
                     std=(0.5, 0.5, 0.5), 
                     inplace=True),
        ])
    # We use the inplace flag because we can safely change the tensors inplace when normalize is used.
    # For is_train=False, we can safely change the tensors inplace because we do it only once, when caching.
    # For is_train=True, we can safely change the tensors inplace because we clone the cached tensors first.

    # Q: How to make this faster?
    # A: Use batched runtime transformations.
    return transforms
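
One way to read the "batched runtime transformations" answer above: instead of transforming each image separately inside the dataset, apply the transform once to the whole batch tensor. Below is a minimal sketch for a horizontal flip, assuming a `(N, C, H, W)` batch. `batched_random_hflip` is a hypothetical helper, not part of torchvision and not the author's solution.

```python
import torch


def batched_random_hflip(batch: torch.Tensor, p: float = 0.5) -> torch.Tensor:
    """Flip a random subset of the images in a (N, C, H, W) batch in one vectorized op."""
    # One Bernoulli draw per image, broadcast over the C, H and W dimensions
    flip_mask = torch.rand(batch.shape[0], device=batch.device) < p
    return torch.where(flip_mask[:, None, None, None], batch.flip(-1), batch)


batch = torch.arange(8, dtype=torch.float32).reshape(2, 1, 2, 2)
all_flipped = batched_random_hflip(batch, p=1.0)   # every image is flipped
none_flipped = batched_random_hflip(batch, p=0.0)  # no image is flipped
```

The same function also runs on the GPU if the batch already lives there, which is usually where the speedup comes from.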

In [None]:

transform = get_transforms(IMAGE_SIZE)
inputs = np.zeros((IMAGE_SIZE, IMAGE_SIZE, 3), dtype=np.uint8) + 255
shape = (3, IMAGE_SIZE, IMAGE_SIZE)
check_transforms(transform, inputs, shape, 1, -1, torch.float32)

### Acquisition

In [8]:

cifar10_train = CIFAR10(root="./data", train=True, transform=None, download=True)
print(str(cifar10_train))
cifar10_test = CIFAR10(root="./data", train=False, transform=None, download=True)
print(str(cifar10_test))

Dataset CIFAR10
    Number of datapoints: 50000
    Root location: ./data
    Split: Train
Dataset CIFAR10
    Number of datapoints: 10000
    Root location: ./data
    Split: Test


In [9]:

d_train = dict(inputs=np.array(cifar10_train.data, dtype=np.uint8), 
               targets=np.array(cifar10_train.targets, dtype=np.uint16), 
               num_classes=NUM_CLASSES)
d_test  = dict(inputs=np.array(cifar10_test.data,  dtype=np.uint8), 
               targets=np.array(cifar10_test.targets,  dtype=np.uint16), 
               num_classes=NUM_CLASSES)

In [None]:
d_train["inputs"][0].shape

### Efficient in-memory dataset wrapper for caching

Beware that this dataset keeps all data in memory. If it is too large, we might opt to cache the data on the disk and read it in `__getitem__()`.
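
For scale, the CIFAR-10 training images take 50000 × 32 × 32 × 3 bytes, about 153.6 MB as `uint8`, so keeping them in RAM is cheap. The actual `DatasetRandAppendUsupervised` class is local and not shown; the sketch below only illustrates the general in-memory idea, with `InMemoryImageDataset` as a hypothetical name.

```python
import numpy as np
import torch
from torch.utils.data import Dataset


class InMemoryImageDataset(Dataset):
    """Hypothetical sketch: holds every image in RAM and transforms on access."""

    def __init__(self, inputs: np.ndarray, transform=None):
        self.inputs = inputs        # (N, H, W, C) uint8, fully resident in memory
        self.transform = transform  # optional per-sample transform

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, idx):
        image = self.inputs[idx]
        if self.transform is not None:
            return self.transform(image)
        # Default path: HWC uint8 -> CHW float32 in [0, 1]
        return torch.from_numpy(image).permute(2, 0, 1).float() / 255.0


fake_images = np.zeros((4, 32, 32, 3), dtype=np.uint8)
ds = InMemoryImageDataset(fake_images)
sample = ds[0]
```

A disk-backed variant would keep only file paths or a memory-mapped array in `__init__` and load the sample inside `__getitem__()`.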

In [10]:

train_ds = DatasetRandAppendUsupervised(d_train, transform=get_transforms(IMAGE_SIZE), 
                             train=True, freq_rand=10)
test_ds  = DatasetRandAppendUsupervised(d_test,  transform=get_transforms(IMAGE_SIZE), 
                             train=False)

In [None]:

in_shape  = (3, IMAGE_SIZE, IMAGE_SIZE)
out_shape = ()
for idx in range(1000):
    inputs = train_ds[idx]
    tensor_check(inputs, in_shape, 1, -1, torch.float32, arr_type=torchvision.tv_tensors.Image)
else:
    print("OK")

In [None]:

train_dl = DataLoader(train_ds, batch_size=BATCH_SIZE, shuffle=True, num_workers=0, drop_last=False)
test_dl  = DataLoader(test_ds , batch_size=BATCH_SIZE, shuffle=True, num_workers=0, drop_last=False)

In [None]:

in_shape = (BATCH_SIZE, 3, IMAGE_SIZE, IMAGE_SIZE)
for idx, inputs in zip(range(1000), train_dl):
    tensor_check(inputs, in_shape, 1, -1, torch.float32)
else:
    print("OK")

In [None]:

@timed(use_seconds=True, show_args=True, return_time=True)
def load_data(dataset, num_workers: int):
    dataloader = DataLoader(dataset, batch_size=BATCH_SIZE, shuffle=True, num_workers=num_workers, drop_last=False)
    for _ in dataloader:
        pass  # Simulate training
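
The `_, t0 = load_data(...)` calls that follow rely on the decorator returning a `(result, elapsed_time)` pair. If `timed_decorator` is not installed, a stand-in with the same return shape can be sketched with the standard library. `timed_sketch`, `consume`, and the candidate values are hypothetical names used only for illustration.

```python
import time
from functools import wraps


def timed_sketch(fn):
    """Hypothetical stand-in: returns (result, elapsed_seconds)."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        return result, time.perf_counter() - start
    return wrapper


@timed_sketch
def consume(iterable):
    for _ in iterable:
        pass  # stands in for one pass over a DataLoader


candidates = [0, 2, 4]  # hypothetical worker counts to compare
times = [consume(range(10_000 * (n + 1)))[1] for n in candidates]
# Map the index of the fastest run back to the candidate it belongs to
best = candidates[min(range(len(times)), key=times.__getitem__)]
```

Keeping the candidate list next to the timings avoids confusing the position of the fastest run with the worker count itself.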

In [None]:

times = []
freeze_support()
for num_workers in range(32, 36):
    _, t0 = load_data(test_ds, num_workers)
    times.append(t0)
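# argmin is an index into range(32, 36), so add the offset to get the worker count
print("best num_workers: {}".format(32 + int(np.argmin(times))))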

In [None]:

times = []
freeze_support()
for num_workers in range(32, 36):
    _, t0 = load_data(train_ds, num_workers)
    times.append(t0)
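# argmin is an index into range(32, 36), so add the offset to get the worker count
print("best num_workers: {}".format(32 + int(np.argmin(times))))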

In [11]:

# select the best number of workers
train_dl = DataLoader(train_ds, batch_size=BATCH_SIZE, shuffle=True,  num_workers=30, drop_last=False)
test_dl  = DataLoader(test_ds , batch_size=BATCH_SIZE, shuffle=False, num_workers=30, drop_last=False)

In [12]:

in_shape  = (BATCH_SIZE, 3, IMAGE_SIZE, IMAGE_SIZE)
for idx, inputs in zip(range(1000), train_dl):
    tensor_check(inputs, in_shape, 1, -1, torch.float32)
else:
    print("OK")

OK



## Autoencoder

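This is the autoencoder model. It is built by the local `ResNetUnsupervised` class from the encoder/decoder configuration below, and it is trained with a reconstruction loss (`nn.MSELoss`) rather than a classification loss.

Beware that not all PyTorch Image Models (timm) backbones have a fully connected (fc) layer at the end. Some of them do, especially the ResNet variants.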

In [13]:
"""
Input=(img_channels, out_channels, kernel_size, stride)
body={body_name={in_channels, expansion, stride, intermediate_channels, num_residual_blocks}}
Output=(in_features, out_features)
"""
resnet_conf = dict(
    Input=dict(
        img_channels=3, 
        out_channels=9, 
        kernel_size=3, 
        stride=1),
    encode=dict(
        enc_conv_1x=dict(in_channels=9, 
                     expansion=4, 
                     stride=2, 
                     intermediate_channels=32, 
                     num_residual_blocks=3),
        enc_conv_2x=dict(in_channels=128, 
                     expansion=4, 
                     stride=2, 
                     intermediate_channels=64, 
                     num_residual_blocks=4),
        enc_conv_3x=dict(in_channels=256, 
                     expansion=4, 
                     stride=2, 
                     intermediate_channels=256, 
                     num_residual_blocks=3),
    ),
    decode=dict(
        dec_conv_1x=dict(in_channels=1024, 
                     expansion=4, 
                     stride=2, 
                     intermediate_channels=256, 
                     num_residual_blocks=2),
        dec_conv_2x=dict(in_channels=1024, 
                     expansion=4, 
                     stride=2, 
                     intermediate_channels=256, 
                     num_residual_blocks=2),
        dec_conv_3x=dict(in_channels=1024, 
                     expansion=4, 
                     stride=2, 
                     intermediate_channels=256, 
                     num_residual_blocks=2),
    ),
    Output=(1024, 3)
)
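
The configuration above appears to follow the usual bottleneck convention, where a stage outputs `intermediate_channels * expansion` channels and that number must match the next stage's `in_channels`. This is an assumption about `ResNetUnsupervised`, whose source is not shown. Under that assumption, the encoder values can be checked like this (`check_channel_flow` is a hypothetical helper):

```python
def check_channel_flow(stages: dict, first_in: int) -> int:
    """Assumes each stage outputs intermediate_channels * expansion channels."""
    expected_in = first_in
    for name, cfg in stages.items():
        assert cfg["in_channels"] == expected_in, (
            f"{name}: expected in_channels={expected_in}, got {cfg['in_channels']}"
        )
        expected_in = cfg["intermediate_channels"] * cfg["expansion"]
    return expected_in


# Encoder values copied from resnet_conf (stride and block counts omitted)
encode = dict(
    enc_conv_1x=dict(in_channels=9, expansion=4, intermediate_channels=32),
    enc_conv_2x=dict(in_channels=128, expansion=4, intermediate_channels=64),
    enc_conv_3x=dict(in_channels=256, expansion=4, intermediate_channels=256),
)
encoder_out = check_channel_flow(encode, first_in=9)  # 9 comes from Input.out_channels
```

With these values the encoder ends at 1024 channels, which matches the `in_channels=1024` of the first decoder stage and the first element of `Output=(1024, 3)`.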

In [14]:

model = ResNetUnsupervised("resnet_32x32_cifar", **resnet_conf)

In [None]:
model

In [15]:

optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = nn.MSELoss()
trainer_obj = UnsupervisedTrainer(
            model,
            optimizer,
            criterion,
            use_cpu=False,
            type_compile="normal",
            disable_tqdm=True, )

Using device: cuda


  _C._set_float32_matmul_precision(precision)
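
`UnsupervisedTrainer` is a local class, so its training loop is not visible here. What an unsupervised (reconstruction) step with `Adam` and `nn.MSELoss` typically looks like can be sketched with a tiny stand-in model. `tiny_autoencoder` is hypothetical and far smaller than `ResNetUnsupervised`. The point is that the target of the loss is the input itself.

```python
import math

import torch
import torch.nn as nn

torch.manual_seed(0)
tiny_autoencoder = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1),           # 32x32 -> 16x16
    nn.ReLU(),
    nn.ConvTranspose2d(8, 3, kernel_size=4, stride=2, padding=1),  # 16x16 -> 32x32
    nn.Tanh(),                                                     # outputs in [-1, 1]
)
optimizer = torch.optim.Adam(tiny_autoencoder.parameters(), lr=0.001)
criterion = nn.MSELoss()

inputs = torch.rand(4, 3, 32, 32) * 2 - 1  # fake batch in the normalized range
losses = []
for _ in range(20):
    optimizer.zero_grad()
    reconstruction = tiny_autoencoder(inputs)
    loss = criterion(reconstruction, inputs)  # no labels: compare output with input
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
```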


In [16]:

EPOCH = 100
history_path = "{}/{}".format(LCL_PATH, "logs/unsupervised_deep_5x_conf_logs.csv")
run_conf_obj = RunConfigs(model, trainer_obj, epochs=EPOCH, train_dl=train_dl, val_dl=test_dl, history_path=history_path)

is_epoch None


In [17]:

# name: dict(optimizer=..., opt_hyperparameters=..., lr=float, lr_scheduler=..., lr_scheduler_hyperparameters=...)

multiple_runs_conf = dict(
    adam=dict(optimizer=torch.optim.Adam,
            opt_hyperparameters=None,
            lr=0.001,
            lr_scheduler=None,
            lr_scheduler_hyperparameters=None,
    ),
    sgd=dict(optimizer=torch.optim.SGD, 
            opt_hyperparameters=None,
            lr=0.001,
            lr_scheduler=None,
            lr_scheduler_hyperparameters=None,
    ),
    sgd_momentum_nesterov=dict(optimizer=torch.optim.SGD, 
            opt_hyperparameters=dict(momentum=0.9, nesterov=True),
            lr=0.001,
            lr_scheduler=None,
            lr_scheduler_hyperparameters=None,
    ),
    sgd_momentum_nesterov_weight_decay=dict(optimizer=torch.optim.SGD, 
            opt_hyperparameters=dict(momentum=0.9, nesterov=True, weight_decay=0.01, fused=True),
            lr=0.001,
            lr_scheduler=None,
            lr_scheduler_hyperparameters=None,
    ),
    adamW=dict(optimizer=torch.optim.AdamW,
            opt_hyperparameters=dict(fused=True),
            lr=0.001,
            lr_scheduler=None,
            lr_scheduler_hyperparameters=None,
    ),
    # scheduler
    sgd_scheduler=dict(optimizer=torch.optim.SGD,
            opt_hyperparameters=None,
            lr=0.001,
            lr_scheduler=torch.optim.lr_scheduler.OneCycleLR,
            lr_scheduler_hyperparameters=dict(max_lr=0.1, steps_per_epoch=10, epochs=EPOCH), 
    ),
    sgd_momentum_nesterov_scheduler=dict(optimizer=torch.optim.SGD,
            opt_hyperparameters=dict(momentum=0.9, nesterov=True),
            lr=0.001,
            lr_scheduler=torch.optim.lr_scheduler.OneCycleLR,
            lr_scheduler_hyperparameters=dict(max_lr=0.1, steps_per_epoch=10, epochs=EPOCH), 
    ),
    adam_scheduler=dict(optimizer=torch.optim.Adam, 
            opt_hyperparameters=None,
            lr=0.001,
            lr_scheduler=torch.optim.lr_scheduler.OneCycleLR,
            lr_scheduler_hyperparameters=dict(max_lr=0.1, steps_per_epoch=10, epochs=EPOCH), 
    ),
)
multiple_runs_conf

{'adam': {'optimizer': torch.optim.adam.Adam,
  'opt_hyperparameters': None,
  'lr': 0.001,
  'lr_scheduler': None,
  'lr_scheduler_hyperparameters': None},
 'sgd': {'optimizer': torch.optim.sgd.SGD,
  'opt_hyperparameters': None,
  'lr': 0.001,
  'lr_scheduler': None,
  'lr_scheduler_hyperparameters': None},
 'sgd_momentum_nesterov': {'optimizer': torch.optim.sgd.SGD,
  'opt_hyperparameters': {'momentum': 0.9, 'nesterov': True},
  'lr': 0.001,
  'lr_scheduler': None,
  'lr_scheduler_hyperparameters': None},
 'sgd_momentum_nesterov_weight_decay': {'optimizer': torch.optim.sgd.SGD,
  'opt_hyperparameters': {'momentum': 0.9,
   'nesterov': True,
   'weight_decay': 0.01,
   'fused': True},
  'lr': 0.001,
  'lr_scheduler': None,
  'lr_scheduler_hyperparameters': None},
 'adamW': {'optimizer': torch.optim.adamw.AdamW,
  'opt_hyperparameters': {'fused': True},
  'lr': 0.001,
  'lr_scheduler': None,
  'lr_scheduler_hyperparameters': None},
 'sgd_scheduler': {'optimizer': torch.optim.sgd.SGD,
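
How `RunConfigs` consumes these entries is not shown. A plausible reading is that each entry is expanded into an optimizer and an optional scheduler, as in the sketch below (`build_from_conf` is a hypothetical helper, and `epochs=2` is shortened for the example). Two details are worth knowing. `OneCycleLR` is designed to be stepped once per batch for `steps_per_epoch * epochs` steps in total, so if the trainer steps it per batch, `steps_per_epoch` would normally be `len(train_dl)`. It also replaces the optimizer's initial learning rate with `max_lr / div_factor`, where `div_factor` defaults to 25.

```python
import torch
import torch.nn as nn


def build_from_conf(model, conf):
    """Hypothetical helper: instantiate the optimizer and optional scheduler of one entry."""
    opt_kwargs = conf["opt_hyperparameters"] or {}
    optimizer = conf["optimizer"](model.parameters(), lr=conf["lr"], **opt_kwargs)
    scheduler = None
    if conf["lr_scheduler"] is not None:
        sched_kwargs = conf["lr_scheduler_hyperparameters"] or {}
        scheduler = conf["lr_scheduler"](optimizer, **sched_kwargs)
    return optimizer, scheduler


conf = dict(
    optimizer=torch.optim.SGD,
    opt_hyperparameters=dict(momentum=0.9, nesterov=True),
    lr=0.001,
    lr_scheduler=torch.optim.lr_scheduler.OneCycleLR,
    lr_scheduler_hyperparameters=dict(max_lr=0.1, steps_per_epoch=10, epochs=2),
)
model = nn.Linear(4, 2)
optimizer, scheduler = build_from_conf(model, conf)
initial_lr = optimizer.param_groups[0]["lr"]  # set by OneCycleLR, not the 0.001 above

for _ in range(10 * 2):  # steps_per_epoch * epochs scheduler steps in total
    optimizer.step()
    scheduler.step()
```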


In [18]:

run_conf_obj(**multiple_runs_conf)

Start running name: 'adam', conf {'optimizer': <class 'torch.optim.adam.Adam'>, 'opt_hyperparameters': None, 'lr': 0.001, 'lr_scheduler': None, 'lr_scheduler_hyperparameters': None}
Running 100 epochs


Training: 100%|██████████| 100/100 [4:47:12<00:00, 172.33s/it, train_loss=4.96e-5, val_loss=5.53e-5]


Start running name: 'sgd', conf {'optimizer': <class 'torch.optim.sgd.SGD'>, 'opt_hyperparameters': None, 'lr': 0.001, 'lr_scheduler': None, 'lr_scheduler_hyperparameters': None}
Running 100 epochs


Training:  27%|██▋       | 27/100 [1:17:50<3:30:26, 172.97s/it, train_loss=0.0025, val_loss=0.00192]


KeyboardInterrupt: 

The comments are self-explanatory. If you do not know what a transformation does, the official documentation is your friend.
Reading documentation helps your brain.

The full training script is available in [complex_yet_simple_training_pipeline.py](./complex_yet_simple_training_pipeline.py).

## Exercises

1. Create your own efficient training pipeline for CIFAR-10.
2. Adapt your pipeline (and this pipeline) to use some batched transformations. Measure the speedup!
3. Adapt your pipeline (and this pipeline) to include Automatic Mixed Precision. Read the documentation first!
4. Adjust your pipeline (or this pipeline) to achieve 96% on CIFAR-10 (hard). You may change the model, but pretrained weights are forbidden.
