# CIFAR-10 Image Classifier Using ResNet50 (Transfer Learning)


In [1]:
!pip install ColossalAI deepspeed

Collecting ColossalAI
  Downloading colossalai-0.0.1b0-py3-none-any.whl (234 kB)
Collecting deepspeed
  Downloading deepspeed-0.5.8.tar.gz (517 kB)
Collecting tensorboardX
  Downloading tensorboardX-2.4.1-py2.py3-none-any.whl (124 kB)
Collecting ninja
  Downloading ninja-1.10.2.3-py2.py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.whl (108 kB)
Collecting hjson
  Downloading hjson-3.0.2-py3-none-any.whl (54 kB)
Collecting triton
  Downloading triton-1.1.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.2 MB)
Building wheels for collected packages: deepspeed
  Building wheel for deepspeed (setup.py) 

In [2]:
import colossalai
from colossalai.engine import Engine, NoPipelineSchedule
from colossalai.trainer import Trainer
from colossalai.context import Config
import torch

Colossalai should be built with cuda extension to use the FP16 optimizer
Colossalai should be built with cuda extension to use the FP16 optimizer
apex is required for mixed precision training


In [3]:
import os
import torch
import torchvision
import tarfile
import torch.nn as nn
import numpy as np
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torch.utils.data import random_split


First, initialize the distributed environment. Although this example uses a single GPU, ColossalAI still requires the distributed environment to be initialized for compatibility. This is the simplest case, so every parallel size (data, pipeline, and tensor) is set to 1.

In [4]:
parallel_cfg = Config(dict(parallel=dict(
    data=dict(size=1),
    pipeline=dict(size=1),
    tensor=dict(size=1, mode=None),
)))
colossalai.init_dist(config=parallel_cfg,
          local_rank=0,
          world_size=1,
          host='127.0.0.1',
          port=8888,
          backend='nccl')

colossalai - torch.distributed.distributed_c10d - 2021-12-06 04:24:56,971 INFO: Added key: store_based_barrier_key:1 to store for rank: 0
colossalai - torch.distributed.distributed_c10d - 2021-12-06 04:24:56,973 INFO: Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes.
colossalai - torch.distributed.distributed_c10d - 2021-12-06 04:24:56,980 INFO: Added key: store_based_barrier_key:2 to store for rank: 0
colossalai - torch.distributed.distributed_c10d - 2021-12-06 04:24:56,984 INFO: Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 1 nodes.
colossalai - torch.distributed.distributed_c10d - 2021-12-06 04:24:56,987 INFO: Added key: store_based_barrier_key:3 to store for rank: 0
colossalai - torch.distributed.distributed_c10d - 2021-12-06 04:24:56,991 INFO: Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 1 nodes.


process rank 0 is bound to device 0
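
For readers curious what a one-process "distributed" setup looks like underneath, here is a minimal sketch in plain PyTorch. It is not ColossalAI's implementation: the helper names `find_free_port` and `init_single_process_group` are hypothetical, and the `gloo` backend is assumed instead of `nccl` so that the sketch also runs on a machine without a GPU.

```python
import socket

import torch.distributed as dist


def find_free_port() -> int:
    """Ask the OS for an unused TCP port (hypothetical helper for this sketch)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def init_single_process_group(host: str = "127.0.0.1", port: int = 0) -> None:
    """Create a process group that contains only this process (rank 0 of 1)."""
    if port == 0:
        port = find_free_port()
    dist.init_process_group(
        backend="gloo",  # assumption: CPU-friendly backend; the notebook uses 'nccl'
        init_method=f"tcp://{host}:{port}",
        rank=0,
        world_size=1,
    )
```

After calling `init_single_process_group()`, `dist.get_world_size()` returns 1 and `dist.get_rank()` returns 0, which mirrors the `world_size=1` and `local_rank=0` arguments used in the cell above.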


Load and normalize the CIFAR-10 training and test datasets using `colossalai.nn.data`. ColossalAI wraps `torchvision.transforms`, so transforms can be specified with a config dict.

The training set also gets light augmentation (random crop and horizontal flip) to help the model generalize.

In [5]:

transform_cfg = [
    dict(type='ToTensor'),
    dict(type='Normalize',
        mean=[0.4914, 0.4822, 0.4465],
        std=[0.2023, 0.1994, 0.2010]),
]
transform_cfg1 = [
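    # Note: with torchvision's RandomCrop and no `padding` argument, a 32x32 crop of a
    # 32x32 image returns the image unchanged, so `padding_mode` has no effect here.
    dict(type='RandomCrop', size=32, padding_mode="reflect"),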
    dict(type='RandomHorizontalFlip'),
    # dict(type='RandomResizedCrop',size=256,scale=(0.5,0.9), ratio=(1, 1)),
    # dict(type='ColorJitter',brightness=0.1, contrast=0.1, saturation=0.1, hue=0.1),
    dict(type='ToTensor'),
    dict(type='Normalize',
        mean=[0.4914, 0.4822, 0.4465],
        std=[0.2023, 0.1994, 0.2010]),
]

batch_size = 128

trainset = colossalai.nn.data.CIFAR10Dataset(transform_cfg1, root='./data', train=True,download=True)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size, shuffle=True, num_workers=2)

testset = colossalai.nn.data.CIFAR10Dataset(transform_cfg, root='./data', train=False,download=True)
testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size, shuffle=False, num_workers=2)

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz


Extracting ./data/cifar-10-python.tar.gz to ./data
Files already downloaded and verified
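
To make the `Normalize` step concrete, the sketch below applies the same per-channel formula, `(value - mean) / std`, to a single pixel in plain Python. The function name `normalize_pixel` is hypothetical and only for illustration; the mean and std values are the ones from the transform config above.

```python
# Per-channel statistics copied from the transform config above.
CIFAR10_MEAN = [0.4914, 0.4822, 0.4465]
CIFAR10_STD = [0.2023, 0.1994, 0.2010]


def normalize_pixel(rgb, mean=CIFAR10_MEAN, std=CIFAR10_STD):
    """Apply (value - mean) / std to each channel of one RGB pixel in [0, 1]."""
    return [(v - m) / s for v, m, s in zip(rgb, mean, std)]


# ToTensor scales 8-bit values to [0, 1] first, so a white pixel is (1.0, 1.0, 1.0).
white = normalize_pixel([1.0, 1.0, 1.0])
black = normalize_pixel([0.0, 0.0, 0.0])
print([round(v, 3) for v in white])
print([round(v, 3) for v in black])
```

A pixel equal to the dataset mean maps to zero in every channel, so the network sees inputs centered around zero with roughly unit spread.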


Next, define the model: a ResNet50 convolutional neural network pretrained on ImageNet, with its last layer replaced by a fully connected layer that outputs one score per CIFAR-10 class.

In [10]:
import torch.nn as nn
import torch.nn.functional as F

from torchvision import models

class Cifar(nn.Module):
    def __init__(self, num_classes, pretrained=True):
        super().__init__()
        # Use a pretrained model
        self.network = models.resnet50(pretrained=pretrained)
        # Replace last layer
        self.network.fc = nn.Linear(self.network.fc.in_features, num_classes)

    def forward(self, xb):
        return self.network(xb)
model = Cifar(10).cuda()
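
As a rough illustration of how small the replaced layer is, the sketch below counts the parameters of the original and the new classification head. The helper `linear_param_count` is hypothetical; the value 2048 is what `fc.in_features` evaluates to for torchvision's ResNet50.

```python
def linear_param_count(in_features: int, out_features: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


RESNET50_FEATURES = 2048  # value of `fc.in_features` in torchvision's ResNet50

imagenet_head = linear_param_count(RESNET50_FEATURES, 1000)  # original 1000-class head
cifar_head = linear_param_count(RESNET50_FEATURES, 10)       # replacement 10-class head

print(imagenet_head, cifar_head)
```

Only this new head starts from random weights; every other layer starts from the pretrained weights. Because the optimizers in this notebook receive `model.parameters()`, all layers are fine-tuned, not just the head.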

Define a loss function and an optimizer, then use them to initialize the `Engine` and `Trainer`. ColossalAI provides various training and evaluation hooks; here we use the simplest ones, which compute and print the loss and accuracy. The optimizer is SGD with lr = 0.001 and momentum = 0.9, and the loss is `CrossEntropyLoss`.


In [11]:
import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

# optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=1e-4)
schedule = NoPipelineSchedule()
engine = Engine(
        model=model,
        criterion=criterion,
        optimizer=optimizer,
        lr_scheduler=None,
        schedule=schedule
    )
trainer = Trainer(engine=engine,
          hooks_cfg=[dict(type='LossHook'), dict(type='LogMetricByEpochHook'), dict(type='AccuracyHook')],
          verbose=True)

colossalai - rank_0 - 2021-12-06 04:38:37,365 INFO: build LogMetricByEpochHook for train, priority = 1
colossalai - rank_0 - 2021-12-06 04:38:37,366 INFO: build LossHook for train, priority = 10
colossalai - rank_0 - 2021-12-06 04:38:37,370 INFO: build AccuracyHook for train, priority = 10
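
To help interpret the loss values that the hooks print, here is a small sketch of the cross-entropy of a single sample in plain Python. It is an illustration of the formula, not the `nn.CrossEntropyLoss` implementation, and the logits are made-up values.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy of one sample: -log(softmax(logits)[target])."""
    peak = max(logits)  # subtract the max for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[target]


num_classes = 10
# All classes equally likely: the loss of a model that is guessing.
chance_loss = cross_entropy([0.0] * num_classes, target=3)
# One logit much larger than the rest: a confident, correct prediction.
confident_loss = cross_entropy(
    [8.0 if i == 3 else 0.0 for i in range(num_classes)], target=3
)
print(round(chance_loss, 4), round(confident_loss, 4))
```

With 10 classes, a model that guesses uniformly has a loss of ln(10), about 2.303, so any logged loss well below that value indicates the model has learned something.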


Then set the training configuration. The model is trained for 5 epochs and evaluated after every epoch. Setting `display_progress` to `True` displays the training and evaluation progress bars.

In [12]:
%%time
num_epochs = 5
test_interval = 1
trainer.fit(
        train_dataloader=trainloader,
        test_dataloader=testloader,
        max_epochs=num_epochs,
        display_progress=True,
        test_interval=test_interval
    )

[Epoch 0 train]: 100%|██████████| 391/391 [01:02<00:00,  6.23it/s]
colossalai - rank_0 - 2021-12-06 04:39:51,454 INFO: Training - Epoch 1 - LogMetricByEpochHook: Loss = 1.10606
[Epoch 0 val]: 100%|██████████| 79/79 [00:04<00:00, 18.39it/s]
colossalai - rank_0 - 2021-12-06 04:39:55,958 INFO: Testing - Epoch 1 - LogMetricByEpochHook: Loss = 0.70309, Accuracy = 0.75960
[Epoch 1 train]: 100%|██████████| 391/391 [01:02<00:00,  6.24it/s]
colossalai - rank_0 - 2021-12-06 04:40:58,868 INFO: Training - Epoch 2 - LogMetricByEpochHook: Loss = 0.59152
[Epoch 1 val]: 100%|██████████| 79/79 [00:04<00:00, 18.56it/s]
colossalai - rank_0 - 2021-12-06 04:41:03,326 INFO: Testing - Epoch 2 - LogMetricByEpochHook: Loss = 0.58491, Accuracy = 0.80020
[Epoch 2 train]: 100%|██████████| 391/391 [01:02<00:00,  6.25it/s]
colossalai - rank_0 - 2021-12-06 04:42:06,253 INFO: Training - Epoch 3 - LogMetricByEpochHook: Loss = 0.44258
[Epoch 2 val]: 100%|██████████| 79/79 [00:04<00:00, 18.24it/s]
colossalai - rank_0 - 

CPU times: user 4min 48s, sys: 39.5 s, total: 5min 28s
Wall time: 5min 37s
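
The progress bars above show 391 training iterations and 79 validation iterations per epoch. These numbers follow from the dataset sizes (CIFAR-10 has 50,000 training and 10,000 test images) and the batch size of 128, as this small sketch shows. The helper name `batches_per_epoch` is hypothetical.

```python
import math


def batches_per_epoch(num_samples: int, batch_size: int, drop_last: bool = False) -> int:
    """Number of iterations a DataLoader yields per epoch."""
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


train_batches = batches_per_epoch(50_000, 128)
test_batches = batches_per_epoch(10_000, 128)
print(train_batches, test_batches)  # 391 79
```

The last training batch is smaller than the others: 50,000 - 390 * 128 = 80 images.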


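Repeat the same setup with SGD and lr = 1e-4. The model is not re-created, so training continues from the weights learned in the previous run.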

In [13]:
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)

# optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=1e-4)
schedule = NoPipelineSchedule()
engine = Engine(
        model=model,
        criterion=criterion,
        optimizer=optimizer,
        lr_scheduler=None,
        schedule=schedule
    )
trainer = Trainer(engine=engine,
          hooks_cfg=[dict(type='LossHook'), dict(type='LogMetricByEpochHook'), dict(type='AccuracyHook')],
          verbose=True)

colossalai - rank_0 - 2021-12-06 04:44:45,608 INFO: build LogMetricByEpochHook for train, priority = 1
colossalai - rank_0 - 2021-12-06 04:44:45,611 INFO: build LossHook for train, priority = 10
colossalai - rank_0 - 2021-12-06 04:44:45,614 INFO: build AccuracyHook for train, priority = 10
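
To see how the learning rate enters the update, here is a minimal sketch of SGD with momentum for a single scalar parameter. It follows the update form documented for `torch.optim.SGD` with `dampening=0` and `nesterov=False`; the function name `sgd_momentum_step` and the constant gradient of 1.0 are assumptions made for illustration.

```python
def sgd_momentum_step(param, grad, velocity, lr, momentum=0.9):
    """One SGD-with-momentum update for a scalar parameter.

    velocity <- momentum * velocity + grad (the first step uses grad directly),
    then param <- param - lr * velocity.
    """
    velocity = grad if velocity is None else momentum * velocity + grad
    return param - lr * velocity, velocity


# Compare how far the parameter moves in three steps for the two learning rates.
for lr in (1e-3, 1e-4):
    param, velocity = 1.0, None
    for _ in range(3):
        param, velocity = sgd_momentum_step(param, grad=1.0, velocity=velocity, lr=lr)
    print(lr, 1.0 - param)
```

For the same gradients, lowering the learning rate from 1e-3 to 1e-4 makes every step ten times smaller, which suits fine-tuning a model that is already partly trained.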


Train for another 5 epochs.

In [14]:
%%time
num_epochs = 5
test_interval = 1
trainer.fit(
        train_dataloader=trainloader,
        test_dataloader=testloader,
        max_epochs=num_epochs,
        display_progress=True,
        test_interval=test_interval
    )

Exception ignored in: <function _MultiProcessingDataLoaderIter.__del__ at 0x7f6caf3773b0>
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 1328, in __del__
    self._shutdown_workers()
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 1320, in _shutdown_workers
    if w.is_alive():
  File "/usr/lib/python3.7/multiprocessing/process.py", line 151, in is_alive
    assert self._parent_pid == os.getpid(), 'can only test a child process'
AssertionError: can only test a child process

CPU times: user 4min 48s, sys: 39.6 s, total: 5min 28s
Wall time: 5min 37s


We are already at 83.8% accuracy.
Finally, switch to the Adam optimizer with lr = 1e-4 and weight decay = 1e-4, again continuing from the current weights.

In [15]:
criterion = nn.CrossEntropyLoss()
# optimizer = optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-4)
schedule = NoPipelineSchedule()
engine = Engine(
        model=model,
        criterion=criterion,
        optimizer=optimizer,
        lr_scheduler=None,
        schedule=schedule
    )
trainer = Trainer(engine=engine,
          hooks_cfg=[dict(type='LossHook'), dict(type='LogMetricByEpochHook'), dict(type='AccuracyHook')],
          verbose=True)

colossalai - rank_0 - 2021-12-06 04:50:42,975 INFO: build LogMetricByEpochHook for train, priority = 1
colossalai - rank_0 - 2021-12-06 04:50:42,977 INFO: build LossHook for train, priority = 10
colossalai - rank_0 - 2021-12-06 04:50:42,978 INFO: build AccuracyHook for train, priority = 10
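
For comparison with SGD, the sketch below performs one Adam update on a single scalar parameter. It mirrors the algorithm documented for `torch.optim.Adam`, where `weight_decay` is added to the gradient as an L2 penalty; the function name `adam_step` and the example gradients are assumptions made for illustration.

```python
import math


def adam_step(param, grad, m, v, t, lr=1e-4, betas=(0.9, 0.999), eps=1e-8,
              weight_decay=1e-4):
    """One Adam update for a scalar parameter at step t (t starts at 1)."""
    grad = grad + weight_decay * param               # L2 penalty folded into the gradient
    m = betas[0] * m + (1 - betas[0]) * grad         # running mean of gradients
    v = betas[1] * v + (1 - betas[1]) * grad * grad  # running mean of squared gradients
    m_hat = m / (1 - betas[0] ** t)                  # bias correction for the first steps
    v_hat = v / (1 - betas[1] ** t)
    return param - lr * m_hat / (math.sqrt(v_hat) + eps), m, v


# A tiny and a huge gradient produce almost the same first step.
for grad in (0.01, 100.0):
    new_param, _, _ = adam_step(param=1.0, grad=grad, m=0.0, v=0.0, t=1)
    print(grad, 1.0 - new_param)
```

Because Adam divides by the running gradient magnitude, the first step is close to `lr` in size regardless of the gradient scale, unlike SGD where the step grows with the gradient.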


Train for another 5 epochs.

In [16]:
%%time
num_epochs = 5
test_interval = 1
trainer.fit(
        train_dataloader=trainloader,
        test_dataloader=testloader,
        max_epochs=num_epochs,
        display_progress=True,
        test_interval=test_interval
    )

[Epoch 0 train]: 100%|██████████| 391/391 [01:07<00:00,  5.80it/s]
colossalai - rank_0 - 2021-12-06 04:52:12,838 INFO: Training - Epoch 1 - LogMetricByEpochHook: Loss = 0.34013
[Epoch 0 val]: 100%|██████████| 79/79 [00:04<00:00, 18.52it/s]
colossalai - rank_0 - 2021-12-06 04:52:17,290 INFO: Testing - Epoch 1 - LogMetricByEpochHook: Loss = 0.51375, Accuracy = 0.83460
[Epoch 1 train]: 100%|██████████| 391/391 [01:06<00:00,  5.84it/s]
colossalai - rank_0 - 2021-12-06 04:53:24,492 INFO: Training - Epoch 2 - LogMetricByEpochHook: Loss = 0.26153
[Epoch 1 val]: 100%|██████████| 79/79 [00:04<00:00, 18.42it/s]
colossalai - rank_0 - 2021-12-06 04:53:28,980 INFO: Testing - Epoch 2 - LogMetricByEpochHook: Loss = 0.50180, Accuracy = 0.84080
[Epoch 2 train]: 100%|██████████| 391/391 [01:07<00:00,  5.83it/s]
colossalai - rank_0 - 2021-12-06 04:54:36,236 INFO: Training - Epoch 3 - LogMetricByEpochHook: Loss = 0.20791
[Epoch 2 val]: 100%|██████████| 79/79 [00:04<00:00, 18.46it/s]
colossalai - rank_0 - 

CPU times: user 4min 29s, sys: 1min 18s, total: 5min 47s
Wall time: 5min 58s


The model reached 85.3% accuracy after only about 17 minutes of training. ColossalAI made this straightforward: the code was easy to implement, and it let me train on a GPU without changing my usual PyTorch coding habits.
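
The training time quoted above can be checked by adding up the wall times that `%%time` reported for the three `trainer.fit` calls (15 epochs in total). The helper `to_seconds` is a hypothetical name used only in this sketch.

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a minutes/seconds pair to seconds."""
    return minutes * 60 + seconds


# Wall times reported by %%time for the three trainer.fit calls.
runs = [to_seconds(5, 37), to_seconds(5, 37), to_seconds(5, 58)]
total = sum(runs)
print(f"{total // 60} min {total % 60} s")  # 17 min 12 s
```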

References:
[ColossalAI Cifar10 example](https://github.com/hpcaitech/ColossalAI/blob/main/examples/colossal_cifar_demo.ipynb)