[View the runnable example on GitHub](https://github.com/intel-analytics/BigDL/tree/main/python/nano/tutorial/notebook/training/pytorch-lightning/pytorch_lightning_cv_data_pipeline.ipynb)

# Accelerate Computer Vision Data Processing Pipeline

You can use `transforms` and `datasets` from `bigdl.nano.pytorch.vision` as drop-in replacements for `torchvision.transforms` and `torchvision.datasets` to easily accelerate the computer vision data processing pipeline of your PyTorch Lightning applications.
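Because the Nano modules are meant as drop-in replacements, code can prefer them when BigDL-Nano is installed and fall back to torchvision otherwise. The helper below is a minimal sketch of that pattern. `import_first` is a hypothetical name, not part of BigDL, and the usage assumes the Nano `transforms` is importable as a submodule path.

```python
import importlib
from types import ModuleType


def import_first(*module_names: str) -> ModuleType:
    """Return the first module in `module_names` that imports successfully (hypothetical helper)."""
    errors = []
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
            errors.append(f"{name}: {exc}")
    raise ImportError("none of the candidate modules could be imported:\n" + "\n".join(errors))
```

For example, `transforms = import_first("bigdl.nano.pytorch.vision.transforms", "torchvision.transforms")` would pick the Nano version whenever it imports cleanly and plain torchvision otherwise.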

To use Nano to accelerate your computer vision data processing pipeline in PyTorch Lightning, first install BigDL-Nano for PyTorch:

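```
!pip install --pre --upgrade bigdl-nano[pytorch]  # install the nightly-built version
!source bigdl-nano-init  # set environment variables
# note: `!` runs each command in a subshell, so variables set this way do not reach the running kernel
```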

> 📝 **Note**
>
> Before starting your PyTorch Lightning application, it is highly recommended to run `source bigdl-nano-init` to set several environment variables based on your current hardware. Empirically, these variables bring a significant performance improvement to most PyTorch Lightning training workloads.

> ⚠️ **Warning**
> 
> For Jupyter Notebook users, we recommend running the commands above, especially `source bigdl-nano-init`, before the Jupyter kernel is started; otherwise some of the optimizations may not take effect.
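One way to check whether the kernel actually inherited these variables is to look them up from Python. The variable list below (`OMP_NUM_THREADS`, `KMP_AFFINITY`, `KMP_BLOCKTIME`, `LD_PRELOAD`) is an assumption about what `bigdl-nano-init` typically sets; the exact set may vary with the BigDL-Nano version and your hardware. `nano_env_status` is a hypothetical helper.

```python
import os
from typing import Dict, Iterable, Optional

# Assumed (illustrative) list of variables set by `bigdl-nano-init`;
# the real set depends on the BigDL-Nano version and the hardware.
NANO_ENV_VARS = ("OMP_NUM_THREADS", "KMP_AFFINITY", "KMP_BLOCKTIME", "LD_PRELOAD")


def nano_env_status(names: Iterable[str] = NANO_ENV_VARS) -> Dict[str, Optional[str]]:
    """Map each variable name to its value in this process, or None if it is unset."""
    return {name: os.environ.get(name) for name in names}


missing = [name for name, value in nano_env_status().items() if value is None]
if missing:
    print("Not set in this kernel (was it started after `source bigdl-nano-init`?):", missing)
```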

Let's take a self-defined `LightningModule` (based on a [ResNet-18 model](https://pytorch.org/vision/main/models/generated/torchvision.models.resnet18.html) pretrained on the ImageNet dataset) as an example, and suppose we would like to finetune the model on the OxfordIIITPet dataset:

```python
# Define LightningModule

import torch
from torchvision.models import resnet18
import pytorch_lightning as pl

class MyLightningModule(pl.LightningModule):

    def __init__(self):
        super().__init__()
        self.model = resnet18(pretrained=True)
        num_ftrs = self.model.fc.in_features
        # OxfordIIITPet has 37 pet breeds, so replace the final layer with a 37-way classifier
        self.model.fc = torch.nn.Linear(num_ftrs, 37)
        self.criterion = torch.nn.CrossEntropyLoss()

    def forward(self, x):
        return self.model(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        output = self.model(x)
        loss = self.criterion(output, y)
        self.log('train_loss', loss)
        return loss

    def validation_step(self, batch, batch_idx):
        x, y = batch
        output = self.forward(x)
        loss = self.criterion(output, y)
        pred = torch.argmax(output, dim=1)
        acc = torch.sum(y == pred).item() / (len(y) * 1.0)
        # these are validation metrics, so log them under val_* names
        metrics = {'val_acc': acc, 'val_loss': loss}
        self.log_dict(metrics)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.002, momentum=0.9, weight_decay=5e-4)
```
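The accuracy logged in `validation_step` takes, for each sample, the class with the highest logit (`torch.argmax(output, dim=1)`) and counts how often it matches the label. A plain-Python restatement of that arithmetic, for illustration only (`argmax` and `batch_accuracy` are hypothetical helpers, not part of the tutorial code):

```python
from typing import List, Sequence


def argmax(row: Sequence[float]) -> int:
    """Index of the largest value in one row; the first maximum wins on ties."""
    return max(range(len(row)), key=lambda i: row[i])


def batch_accuracy(logits: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Fraction of samples whose highest-scoring class equals the label."""
    preds: List[int] = [argmax(row) for row in logits]
    correct = sum(int(p == y) for p, y in zip(preds, labels))
    return correct / len(labels)


# Three samples, three classes: predictions are [1, 0, 2], labels are [1, 2, 2].
logits = [[0.1, 2.0, -1.0], [1.5, 0.3, 0.2], [0.0, 0.1, 0.9]]
print(batch_accuracy(logits, [1, 2, 2]))  # → 0.6666666666666666
```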

```python
model = MyLightningModule()
```

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; _The definition of_ `MyLightningModule` _can be found in the_ [runnable example](https://github.com/intel-analytics/BigDL/tree/main/python/nano/tutorial/notebook/training/pytorch-lightning/pytorch_lightning_cv_data_pipeline.ipynb).

To finetune the model on the OxfordIIITPet dataset, we need to create the required training/validation datasets and dataloaders. To accelerate the data processing pipeline, you can simply **import BigDL-Nano** `transforms` **and** `datasets` **to replace** `torchvision.transforms` **and** `torchvision.datasets`:

```python
# Original torchvision imports, replaced by the BigDL-Nano drop-ins below:
# from torchvision import transforms
# from torchvision.datasets import OxfordIIITPet
from bigdl.nano.pytorch.vision import transforms
from bigdl.nano.pytorch.vision.datasets import OxfordIIITPet

# Data processing steps are the same as with torchvision
train_transform = transforms.Compose([transforms.Resize(256),
                                      transforms.RandomCrop(224),
                                      transforms.RandomHorizontalFlip(),
                                      transforms.ColorJitter(brightness=.5, hue=.3),
                                      transforms.ToTensor(),
                                      transforms.Normalize([0.485, 0.456, 0.406],
                                                           [0.229, 0.224, 0.225])])
val_transform = transforms.Compose([transforms.Resize(256),
                                    transforms.CenterCrop(224),
                                    transforms.ToTensor(),
                                    transforms.Normalize([0.485, 0.456, 0.406],
                                                         [0.229, 0.224, 0.225])])

# Both datasets read the same images; create_dataloaders splits them by index
train_dataset = OxfordIIITPet(root="/tmp/data", transform=train_transform, download=True)
val_dataset = OxfordIIITPet(root="/tmp/data", transform=val_transform)
```
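`ToTensor()` scales 8-bit pixel values from [0, 255] to [0, 1], and `Normalize(mean, std)` then computes `(x - mean) / std` per channel; the constants above are the usual ImageNet statistics, matching the ImageNet-pretrained ResNet-18. A single-pixel sketch of that arithmetic in plain Python (a hypothetical helper, not how the Nano or torchvision transforms are implemented):

```python
from typing import List, Sequence

# ImageNet per-channel mean and std, the same constants passed to transforms.Normalize above.
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)


def to_tensor_then_normalize(rgb: Sequence[int]) -> List[float]:
    """Mimic ToTensor() + Normalize(MEAN, STD) for a single 8-bit RGB pixel."""
    scaled = [c / 255.0 for c in rgb]                            # ToTensor: [0, 255] -> [0, 1]
    return [(x - m) / s for x, m, s in zip(scaled, MEAN, STD)]  # Normalize: (x - mean) / std


print([round(v, 3) for v in to_tensor_then_normalize((255, 0, 128))])  # → [2.249, -2.036, 0.426]
```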

```python
import torch
from torch.utils.data import DataLoader, Subset

def create_dataloaders(train_dataset, val_dataset):
    # Both datasets wrap the same images (with different transforms), so shuffle
    # the indices once and hold out a quarter of them for validation
    indices = torch.randperm(len(train_dataset))
    val_size = len(train_dataset) // 4
    train_size = len(train_dataset) - val_size
    train_dataset = Subset(train_dataset, indices[:train_size])
    val_dataset = Subset(val_dataset, indices[train_size:])

    # prepare data loaders; reshuffle the training subset every epoch
    train_dataloader = DataLoader(train_dataset, batch_size=32, shuffle=True)
    val_dataloader = DataLoader(val_dataset, batch_size=32)

    return train_dataloader, val_dataloader
```

```python
train_loader, val_loader = create_dataloaders(train_dataset, val_dataset)
```

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; _The definition of_ `create_dataloaders` _can be found in the_ [runnable example](https://github.com/intel-analytics/BigDL/tree/main/python/nano/tutorial/notebook/training/pytorch-lightning/pytorch_lightning_cv_data_pipeline.ipynb).
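`create_dataloaders` builds both splits from the same images: it shuffles the indices once and holds out the last quarter for validation, so the augmented training subset and the center-cropped validation subset never share a sample. A plain-Python sketch of that index split, with `random.Random.shuffle` standing in for `torch.randperm` (`split_indices` is a hypothetical helper):

```python
import random
from typing import List, Optional, Tuple


def split_indices(n: int, seed: Optional[int] = None) -> Tuple[List[int], List[int]]:
    """Shuffle range(n) once and hold out the last n // 4 indices for validation."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # plays the role of torch.randperm(n)
    val_size = n // 4
    train_size = n - val_size
    # Slicing at train_size (rather than -val_size) stays correct when val_size is 0,
    # since indices[:-0] would be an empty list.
    return indices[:train_size], indices[train_size:]


train_idx, val_idx = split_indices(10, seed=42)
print(len(train_idx), len(val_idx))  # → 8 2
```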

You can then train and evaluate the model with the Nano `Trainer`:

```python
from bigdl.nano.pytorch import Trainer

trainer = Trainer(max_epochs=5)
trainer.fit(model, train_dataloaders=train_loader)
trainer.validate(model, dataloaders=val_loader)
```
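As a rough check on what the progress bar will show: a `DataLoader` with the default `drop_last=False` yields `ceil(num_samples / batch_size)` batches per epoch. The sketch below assumes the OxfordIIITPet `trainval` split holds 3680 images (verify against your download) and no gradient accumulation; `num_batches` is a hypothetical helper.

```python
def num_batches(num_samples: int, batch_size: int, drop_last: bool = False) -> int:
    """Batches per epoch yielded by a DataLoader over a map-style dataset."""
    if drop_last:
        return num_samples // batch_size                 # incomplete last batch is dropped
    return (num_samples + batch_size - 1) // batch_size  # ceiling division


# Assumption: 3680 images in the "trainval" split, of which 3680 // 4 are held out.
total = 3680
val_size = total // 4          # 920
train_size = total - val_size  # 2760
print(num_batches(train_size, 32), num_batches(val_size, 32))  # → 87 29
print(5 * num_batches(train_size, 32))                          # → 435 steps for max_epochs=5
```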

> 📚 **Related Readings**
> 
> - [How to install BigDL-Nano](https://bigdl.readthedocs.io/en/latest/doc/Nano/Overview/nano.html#install)