# Introduction

In this notebook, we demonstrate how to use BigDL-Nano to accelerate a custom PyTorch training loop with only a few code changes.

### Prepare Environment

Before using the APIs provided by BigDL-Nano, make sure BigDL-Nano is correctly installed for PyTorch. If it is not, follow [this guide](../../../../../docs/readthedocs/source/doc/Nano/Overview/nano.md) to set up your environment.

### Load CIFAR10 Dataset

Import the CIFAR10 dataset and define the training transform. See [CIFAR10](https://www.cs.toronto.edu/~kriz/cifar.html) for an overview of the whole dataset.

By leveraging OpenCV and libjpeg-turbo, BigDL-Nano accelerates computer vision data pipelines through drop-in replacements for torchvision's `datasets` and `transforms`.

In [9]:
from torch.utils.data import DataLoader, Subset

from bigdl.nano.pytorch.vision import transforms
from bigdl.nano.pytorch.vision.datasets import CIFAR10

def create_dataloader(data_path, batch_size):
    train_transform = transforms.Compose([
        transforms.Resize(256),
        transforms.ColorJitter(),
        transforms.RandomCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.Resize(128),
        transforms.ToTensor()
    ])

    full_dataset = CIFAR10(root=data_path, train=True,
                           download=True, transform=train_transform)

    # use a subset of full dataset to shorten the training time
    train_dataset = Subset(dataset=full_dataset, indices=list(range(len(full_dataset) // 40)))

    train_loader = DataLoader(train_dataset, batch_size=batch_size,
                              shuffle=True, num_workers=0)

    return train_loader
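
To get a feel for how small the training subset is, the sketch below works out the subset size and the number of batches per epoch. It assumes the standard CIFAR-10 training split of 50,000 images and the batch size of 256 used later in this notebook. `NUM_TRAIN_IMAGES` and `batches_per_epoch` are illustrative names, not part of BigDL-Nano.

```python
import math

# Assumption: the CIFAR-10 training split contains 50,000 images.
NUM_TRAIN_IMAGES = 50_000


def batches_per_epoch(num_samples: int, batch_size: int) -> int:
    """Number of batches a DataLoader yields when drop_last is False (the default)."""
    return math.ceil(num_samples / batch_size)


# Same slicing as in create_dataloader: keep 1/40 of the dataset.
subset_size = NUM_TRAIN_IMAGES // 40
num_batches = batches_per_epoch(subset_size, 256)

print(subset_size)   # 1250
print(num_batches)   # 5
```

With so few batches per epoch, each epoch finishes quickly, but the reported average loss is also noisier than it would be on the full dataset.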

### Custom Model

We use a ResNet18 backbone and add a linear layer that maps its output to 10 values, because the CIFAR10 dataset has 10 classes.

In [10]:
from torch import nn

from bigdl.nano.pytorch.vision.models import vision

class ResNet18(nn.Module):
    def __init__(self, num_classes, pretrained=True, include_top=False, freeze=True):
        super().__init__()
        backbone = vision.resnet18(pretrained=pretrained, include_top=include_top, freeze=freeze)
        output_size = backbone.get_output_size()
        head = nn.Linear(output_size, num_classes)
        self.model = nn.Sequential(backbone, head)

    def forward(self, x):
        return self.model(x)
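
As a rough illustration of how small the added head is, the sketch below counts the parameters of the linear layer. It assumes that the headless backbone outputs 512 features, which is the feature width of the standard ResNet-18 before its classifier; the actual value returned by `get_output_size()` is not shown in this notebook. `linear_param_count` is a hypothetical helper.

```python
def linear_param_count(in_features: int, out_features: int) -> int:
    """Parameters of a fully connected layer: weight matrix plus bias vector."""
    return in_features * out_features + out_features


# Assumption: the headless ResNet-18 backbone outputs 512 features.
assumed_backbone_features = 512
num_classes = 10

head_params = linear_param_count(assumed_backbone_features, num_classes)
print(head_params)  # 5130
```

If `freeze=True` keeps the backbone weights fixed, as the argument name suggests, these few thousand parameters would be the only ones updated during training.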

### Define Train Loop

Suppose the custom training loop looks like this:

In [11]:
import os
import torch

data_path = os.environ.get("DATA_PATH", ".")
batch_size = 256
max_epochs = 10
lr = 0.01

model = ResNet18(10, pretrained=False, include_top=False, freeze=True)
loss_func = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=lr)
train_loader = create_dataloader(data_path, batch_size)

model.train()

for _i in range(max_epochs):
    total_loss, num = 0, 0
    for X, y in train_loader:
        optimizer.zero_grad()
        loss = loss_func(model(X), y)
        loss.backward()
        optimizer.step()
        
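        # the loss is already a scalar; detach it so the running total
        # does not hold on to the autograd graph
        total_loss += loss.detach()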
        num += 1
    print(f'avg_loss: {total_loss / num}')

Files already downloaded and verified
avg_loss: 3.538733959197998
avg_loss: 3.2777256965637207
avg_loss: 3.0464797019958496
avg_loss: 2.5146260261535645
avg_loss: 2.3445262908935547
avg_loss: 2.301717519760132
avg_loss: 2.1688358783721924
avg_loss: 2.1054847240448
avg_loss: 2.1210415363311768
avg_loss: 2.1197376251220703
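
A useful reference point for reading these numbers: a classifier that assigns equal probability to all 10 classes has a cross-entropy loss of ln 10, about 2.30. The sketch below computes that baseline in plain Python. The `cross_entropy` helper is written here for illustration and is not the PyTorch implementation.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy of one sample: -log(softmax(logits)[target])."""
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum_exp - logits[target]


# Equal logits give a uniform distribution over the 10 classes.
uniform_loss = cross_entropy([0.0] * 10, target=3)
print(round(uniform_loss, 4))  # 2.3026
```

The final averages printed above (about 2.1) sit only slightly below this baseline, which is plausible for a short run on a small subset.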


The `TorchNano` class (`bigdl.nano.pytorch.TorchNano`) is what we use to accelerate raw PyTorch code. With it, only a few changes are needed to accelerate a custom training loop.

Only the following steps are needed:

- define a class `MyNano` derived from `TorchNano`
- copy all the training code into the `train` method of `MyNano`
- add one line to set up the model, optimizer and dataloader
- replace `loss.backward()` with `self.backward(loss)`

In [12]:
import os
import torch

from bigdl.nano.pytorch import TorchNano

class MyNano(TorchNano):
    def train(self):
        # copy all lines of code into this method
        data_path = os.environ.get("DATA_PATH", ".")
        batch_size = 256
        max_epochs = 10
        lr = 0.01

        model = ResNet18(10, pretrained=False, include_top=False, freeze=True)
        loss_func = nn.CrossEntropyLoss()
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        train_loader = create_dataloader(data_path, batch_size)

        # add this line to set up the model, optimizer and dataloader
        model, optimizer, train_loader = self.setup(model, optimizer, train_loader)

        model.train()

        for _i in range(max_epochs):
            total_loss, num = 0, 0
            for X, y in train_loader:
                optimizer.zero_grad()
                loss = loss_func(model(X), y)
                self.backward(loss)
                optimizer.step()
                
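                # the loss is already a scalar; detach it so the running total
                # does not hold on to the autograd graph
                total_loss += loss.detach()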
                num += 1
            print(f'avg_loss: {total_loss / num}')

### Train in Non-distributed Mode

To run the train loop, we only need to create an instance of `MyNano` and call its `train` method.

In [13]:
MyNano().train()

Files already downloaded and verified
avg_loss: 3.8871588706970215
avg_loss: 4.426192283630371
avg_loss: 3.148921251296997
avg_loss: 2.879124641418457
avg_loss: 2.5443203449249268
avg_loss: 2.3415424823760986
avg_loss: 2.2631752490997314
avg_loss: 2.1276562213897705
avg_loss: 2.108708143234253
avg_loss: 2.091210126876831


Intel Extension for PyTorch (a.k.a. [IPEX](https://github.com/intel/intel-extension-for-pytorch)) extends PyTorch with optimizations for Intel hardware. BigDL-Nano integrates IPEX into `TorchNano`; you can turn on IPEX optimization by setting `use_ipex=True`.

In [14]:
MyNano(use_ipex=True).train()

Files already downloaded and verified




avg_loss: 3.9080657958984375
avg_loss: 3.6406493186950684
avg_loss: 3.0531580448150635
avg_loss: 2.6435179710388184
avg_loss: 2.420058488845825
avg_loss: 2.249678134918213
avg_loss: 2.1675872802734375
avg_loss: 2.126716136932373
avg_loss: 2.095839500427246
avg_loss: 2.0563669204711914


### Train in Distributed Mode

You can enable distributed training by setting the number of processes, which can speed up training. You can also choose the distributed backend: BigDL-Nano currently supports `spawn`, `subprocess` and `ray`, and the default backend is `subprocess`.

- Note: only the `subprocess` backend can be used in an interactive environment.
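
The effect on per-process workload can be estimated roughly as follows. This sketch assumes the dataloader is sharded the way PyTorch's `DistributedSampler` does by default, where each process receives ceil(N / num_processes) samples. Whether `TorchNano.setup` applies exactly this sharding is an assumption here, not something this notebook states. The subset size of 1250 also assumes the standard 50,000-image CIFAR-10 training split, and `per_process_batches` is a hypothetical helper.

```python
import math


def per_process_batches(num_samples: int, num_processes: int, batch_size: int):
    """Return (samples per process, batches per process per epoch).

    Assumes DistributedSampler-style sharding with drop_last=False,
    and that batch_size is applied separately in each process.
    """
    shard = math.ceil(num_samples / num_processes)
    return shard, math.ceil(shard / batch_size)


# Assumption: 1250 training samples (50,000 // 40), 2 processes, batch size 256.
shard_size, batches = per_process_batches(1250, 2, 256)
print(shard_size, batches)  # 625 3
```

Because every process runs the whole `train` method, each one prints its own `avg_loss` line per epoch, which is why the distributed runs produce two lines per epoch.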

In [15]:
MyNano(num_processes=2, distributed_backend="subprocess").train()

Initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/2
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/2
----------------------------------------------------------------------------------------------------
distributed_backend=gloo
All distributed processes registered. Starting with 2 processes
----------------------------------------------------------------------------------------------------



Files already downloaded and verified
Files already downloaded and verified
avg_loss: 4.357134819030762
avg_loss: 4.236389636993408
avg_loss: 3.013183832168579
avg_loss: 3.0220577716827393
avg_loss: 3.2535336017608643
avg_loss: 3.0781147480010986
avg_loss: 3.075800657272339
avg_loss: 3.1724421977996826
avg_loss: 3.1078386306762695
avg_loss: 2.918447494506836
avg_loss: 2.6689560413360596
avg_loss: 2.783597230911255
avg_loss: 2.3848423957824707
avg_loss: 2.309765100479126
avg_loss: 2.3010752201080322
avg_loss: 2.279109001159668
avg_loss: 2.2204744815826416
avg_loss: 2.2609469890594482
avg_loss: 2.1395022869110107
avg_loss: 2.211986541748047


You can also enable distributed training and IPEX together:

In [16]:
MyNano(use_ipex=True, num_processes=2, distributed_backend="subprocess").train()

Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/2
Initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/2
----------------------------------------------------------------------------------------------------
distributed_backend=gloo
All distributed processes registered. Starting with 2 processes
----------------------------------------------------------------------------------------------------



Files already downloaded and verified
Files already downloaded and verified


2022-07-27 15:45:26,008 - root - INFO - Reducer buckets have been rebuilt in this iteration.
2022-07-27 15:45:26,009 - root - INFO - Reducer buckets have been rebuilt in this iteration.


avg_loss: 3.8353183269500732
avg_loss: 4.010594844818115
avg_loss: 3.5082931518554688
avg_loss: 3.6064693927764893
avg_loss: 3.8697359561920166
avg_loss: 3.9609947204589844
avg_loss: 3.309493064880371
avg_loss: 3.2898263931274414
avg_loss: 2.688565969467163
avg_loss: 2.639798879623413
avg_loss: 2.828411340713501
avg_loss: 2.84289288520813
avg_loss: 2.4235198497772217
avg_loss: 2.4322192668914795
avg_loss: 2.3998563289642334
avg_loss: 2.399547576904297
avg_loss: 2.259463310241699
avg_loss: 2.217374563217163
avg_loss: 2.2705495357513428
avg_loss: 2.2279140949249268
