# Introduction

In this notebook, we demonstrate how to use BigDL-Nano to accelerate a custom training loop with only a few code changes.

### Prepare Environment

Before using the APIs provided by BigDL-Nano, make sure BigDL-Nano is correctly installed for PyTorch. If it is not, follow [this guide](../../../../../docs/readthedocs/source/doc/Nano/Overview/nano.md) to set up your environment.

### Load CIFAR10 Dataset

Load the CIFAR10 dataset and define the training transform. See the [CIFAR10](https://www.cs.toronto.edu/~kriz/cifar.html) page for an overview of the whole dataset.

Leveraging OpenCV and libjpeg-turbo, BigDL-Nano can accelerate computer vision data pipelines by providing drop-in replacements for torchvision's `datasets` and `transforms`. This is why both are imported from `bigdl.nano.pytorch.vision` here rather than from torchvision.

In [14]:
from torch.utils.data import DataLoader, Subset

from bigdl.nano.pytorch.vision import transforms
from bigdl.nano.pytorch.vision.datasets import CIFAR10

def create_dataloader(data_path, batch_size):
    train_transform = transforms.Compose([
        transforms.Resize(256),
        transforms.ColorJitter(),
        transforms.RandomCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.Resize(128),
        transforms.ToTensor()
    ])

    full_dataset = CIFAR10(root=data_path, train=True,
                           download=True, transform=train_transform)

    # use a subset (1/20) of the full dataset to shorten the training time
    train_dataset = Subset(dataset=full_dataset, indices=list(range(len(full_dataset) // 20)))

    train_loader = DataLoader(train_dataset, batch_size=batch_size,
                              shuffle=True, num_workers=0)

    return train_loader
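
The sizes implied by `create_dataloader` can be worked out by hand. The sketch below assumes the CIFAR10 training split has 50,000 images and that `batch_size=256` is used, as in the train loop later in this notebook. It is plain arithmetic and does not touch the dataset itself.

```python
import math

# Assumption: the CIFAR10 training split contains 50,000 images.
full_size = 50_000
subset_size = full_size // 20      # mirrors indices=list(range(len(full_dataset) // 20))
batch_size = 256                   # value used by the train loop in this notebook

# DataLoader keeps the final partial batch by default (drop_last=False).
batches_per_epoch = math.ceil(subset_size / batch_size)
last_batch_size = subset_size - (batches_per_epoch - 1) * batch_size

# After Resize(256) -> RandomCrop(224) -> Resize(128) -> ToTensor(),
# each image is a 3 x 128 x 128 tensor.
image_shape = (3, 128, 128)

print(subset_size, batches_per_epoch, last_batch_size, image_shape)
```

Under these assumptions, each epoch iterates over 2,500 images in 10 batches, the last of which holds 196 images.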

### Custom Model

We use ResNet18 as the backbone and add a linear layer on top that maps its output to 10 values, one for each of the 10 classes in CIFAR10.

In [15]:
from torch import nn

from bigdl.nano.pytorch.vision.models import vision

class ResNet18(nn.Module):
    def __init__(self, num_classes, pretrained=True, include_top=False, freeze=True):
        super().__init__()
        backbone = vision.resnet18(pretrained=pretrained, include_top=include_top, freeze=freeze)
        output_size = backbone.get_output_size()
        head = nn.Linear(output_size, num_classes)
        self.model = nn.Sequential(backbone, head)

    def forward(self, x):
        return self.model(x)
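
As a rough size check on the added head, the sketch below counts the parameters of `nn.Linear(output_size, num_classes)`. It assumes that `get_output_size()` returns 512, the feature size of a standard ResNet18 without its final fully connected layer. The notebook does not show this value.

```python
# Assumption: the headless ResNet18 backbone outputs a 512-dimensional feature vector.
feature_size = 512
num_classes = 10

# nn.Linear(in_features, out_features) holds an (out x in) weight matrix plus one bias per output.
weight_params = feature_size * num_classes
bias_params = num_classes
head_params = weight_params + bias_params

print(head_params)
```

Under this assumption the head has 5,130 parameters, which is small compared with the backbone.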

### Define Train Loop

Suppose the custom train loop is as follows:

In [16]:
import os
import torch

data_path = os.environ.get("DATA_PATH", ".")
batch_size = 256
max_epochs = 10
lr = 0.01

model = ResNet18(10, pretrained=False, include_top=False, freeze=True)
loss = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=lr)
train_loader = create_dataloader(data_path, batch_size)

model.train()

for _i in range(max_epochs):
    total_loss, num = 0, 0
    for X, y in train_loader:
        optimizer.zero_grad()
        l = loss(model(X), y)
        l.backward()
        optimizer.step()
        
        total_loss += l.sum()
        num += 1
    print(f'avg_loss: {total_loss / num}')

Files already downloaded and verified
avg_loss: 3.4890761375427246
avg_loss: 2.6124730110168457
avg_loss: 2.3601737022399902
avg_loss: 2.2400059700012207
avg_loss: 2.140841484069824
avg_loss: 2.0801239013671875
avg_loss: 2.0631775856018066
avg_loss: 2.008615493774414
avg_loss: 1.989851951599121
avg_loss: 1.9810504913330078
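
To put the loss values above in context, it helps to know what `nn.CrossEntropyLoss` returns for a model that assigns equal probability to every class. This sketch computes that reference value for 10 classes. It is a sanity-check baseline, not part of the BigDL-Nano workflow.

```python
import math

num_classes = 10

# Cross-entropy of a uniform prediction: -ln(1 / num_classes) = ln(num_classes).
uniform_loss = -math.log(1.0 / num_classes)

print(f"{uniform_loss:.4f}")  # 2.3026
```

An average loss below about 2.30 therefore means the model is doing better than uniform guessing.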


The `LightningLite` (`bigdl.nano.pytorch.lite.LightningLite`) class is the place where we integrate most optimizations. It extends PyTorch Lightning's `LightningLite` class and has a few more parameters and methods specific to BigDL-Nano.

We can accelerate the train loop above with the following steps:

- define a class `Lite` derived from BigDL-Nano's `LightningLite`
- copy all the code into the `run` method of `Lite`
- add two extra lines to set up the model, optimizer, and dataloader
- replace the `l.backward()` call with `self.backward(l)`

In [17]:
import os
import torch

from bigdl.nano.pytorch.lite import LightningLite

class Lite(LightningLite):
    def run(self):
        # copy all the code into this method
        data_path = os.environ.get("DATA_PATH", ".")
        batch_size = 256
        max_epochs = 10
        lr = 0.01

        model = ResNet18(10, pretrained=False, include_top=False, freeze=True)
        loss = nn.CrossEntropyLoss()
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        train_loader = create_dataloader(data_path, batch_size)

        model, optimizer = self.setup(model, optimizer)      # add this line to set up the model and optimizer
        train_loader = self.setup_dataloaders(train_loader)  # add this line to set up the dataloader
        model.train()

        for _i in range(max_epochs):
            total_loss, num = 0, 0
            for X, y in train_loader:
                optimizer.zero_grad()
                l = loss(model(X), y)
                self.backward(l)  # change the backward call
                optimizer.step()
                
                total_loss += l.sum()
                num += 1
            print(f'avg_loss: {total_loss / num}')

### Train in Non-distributed Mode

To run the train loop, we only need to create an instance of `Lite` and call its `run` method.

In [18]:
Lite().run()

Files already downloaded and verified
avg_loss: 3.367298126220703
avg_loss: 2.558518886566162
avg_loss: 2.347032070159912
avg_loss: 2.122859477996826
avg_loss: 2.083937883377075
avg_loss: 2.042186737060547
avg_loss: 2.0038819313049316
avg_loss: 1.9734426736831665
avg_loss: 1.972240686416626
avg_loss: 1.9732662439346313


Intel Extension for PyTorch (a.k.a. [IPEX](https://github.com/intel/intel-extension-for-pytorch)) extends PyTorch with optimizations for Intel hardware. BigDL-Nano integrates IPEX into `LightningLite`, so you can turn on IPEX optimization by setting `use_ipex=True`.

In [19]:
Lite(use_ipex=True).run()

Files already downloaded and verified




avg_loss: 3.6688168048858643
avg_loss: 2.7911651134490967
avg_loss: 2.3961710929870605
avg_loss: 2.2255048751831055
avg_loss: 2.1438281536102295
avg_loss: 2.08095121383667
avg_loss: 2.044187068939209
avg_loss: 2.0520944595336914
avg_loss: 2.0577313899993896
avg_loss: 2.0262503623962402
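
If a notebook has to run on machines both with and without IPEX, the `use_ipex` flag can be chosen at runtime. The sketch below is one way to do that, assuming the IPEX package is importable under the name `intel_extension_for_pytorch`. `ipex_available` is a hypothetical helper written for this example and is not part of BigDL-Nano.

```python
import importlib.util


def ipex_available() -> bool:
    """Hypothetical helper: True if the IPEX package can be found, without importing it."""
    return importlib.util.find_spec("intel_extension_for_pytorch") is not None


use_ipex = ipex_available()
print(f"use_ipex={use_ipex}")

# Usage with the Lite class defined earlier in this notebook:
# Lite(use_ipex=use_ipex).run()
```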


### Train in Distributed Mode

You can enable distributed training by setting the number of processes, which can accelerate training. You can also choose a distributed strategy: BigDL-Nano currently supports 'spawn', 'subprocess', and 'ray', with 'subprocess' as the default.

- Note: only the 'subprocess' strategy can be used in an interactive environment.

In [20]:
Lite(num_processes=2, strategy="subprocess").run()

Initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/2
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/2
----------------------------------------------------------------------------------------------------
distributed_backend=gloo
All distributed processes registered. Starting with 2 processes
----------------------------------------------------------------------------------------------------



Files already downloaded and verified
Files already downloaded and verified
avg_loss: 3.551307201385498
avg_loss: 3.5610828399658203
avg_loss: 2.8926703929901123
avg_loss: 2.9128172397613525
avg_loss: 2.4087586402893066
avg_loss: 2.458486557006836
avg_loss: 2.3466358184814453
avg_loss: 2.3318121433258057
avg_loss: 2.183196544647217
avg_loss: 2.2092738151550293
avg_loss: 2.1372807025909424
avg_loss: 2.129611015319824
avg_loss: 2.079845905303955
avg_loss: 2.0982301235198975
avg_loss: 2.0651657581329346
avg_loss: 2.0590758323669434
avg_loss: 2.0040674209594727
avg_loss: 2.0231480598449707
avg_loss: 1.9930778741836548
avg_loss: 1.9897146224975586
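
The way work is split between the two processes can be estimated with simple arithmetic. The sketch below assumes that `setup_dataloaders` shards the data with a `DistributedSampler` (without dropping the last samples), which is the default behaviour of PyTorch Lightning's `LightningLite`. Whether BigDL-Nano changes this is not shown in this notebook. It also assumes the 2,500-image subset (50,000 // 20) used above.

```python
import math

subset_size = 2500      # assumption: 50_000 // 20 training images
batch_size = 256        # per-process batch size passed to DataLoader
num_processes = 2

# Assumption: each process receives an equal shard of the dataset.
samples_per_process = math.ceil(subset_size / num_processes)
batches_per_process = math.ceil(samples_per_process / batch_size)

# Gradients are averaged across processes, so each optimizer step
# effectively covers one batch from every process.
effective_batch_size = batch_size * num_processes

print(samples_per_process, batches_per_process, effective_batch_size)
```

Under these assumptions each process runs 5 optimizer steps per epoch instead of 10, with an effective global batch size of 512. Each process also executes the `print` call in `run` on its own shard, which is why every epoch reports two `avg_loss` lines in the output above.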


Of course, you can enable both distributed training and IPEX:

In [21]:
Lite(use_ipex=True, num_processes=2, strategy="subprocess").run()

Initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/2
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/2
----------------------------------------------------------------------------------------------------
distributed_backend=gloo
All distributed processes registered. Starting with 2 processes
----------------------------------------------------------------------------------------------------



Files already downloaded and verified
Files already downloaded and verified


2022-07-20 15:12:28,746 - root - INFO - Reducer buckets have been rebuilt in this iteration.
2022-07-20 15:12:28,746 - root - INFO - Reducer buckets have been rebuilt in this iteration.


avg_loss: 3.7563118934631348
avg_loss: 3.6480026245117188
avg_loss: 3.02734637260437
avg_loss: 3.014892101287842
avg_loss: 2.8125622272491455
avg_loss: 2.8375699520111084
avg_loss: 2.434741497039795
avg_loss: 2.4648845195770264
avg_loss: 2.283917188644409
avg_loss: 2.327892780303955
avg_loss: 2.189699411392212
avg_loss: 2.1382477283477783
avg_loss: 2.1003966331481934
avg_loss: 2.1151137351989746
avg_loss: 2.056234836578369
avg_loss: 2.065807342529297
avg_loss: 2.0304946899414062
avg_loss: 2.0351834297180176
avg_loss: 1.9886785745620728
avg_loss: 1.9950816631317139
