# Fine-tune Transformers Faster with Lightning Flash and Torch ORT
References:
* https://devblog.pytorchlightning.ai/fine-tune-transformers-faster-with-lightning-flash-and-torch-ort-ec2d53789dc3
* https://lightning-flash.readthedocs.io/en/latest/reference/text_classification.html

In [1]:
# torch-ort must be installed alongside CUDA 10.2 and cuDNN 7.6.
# lightning-flash has only been verified on Ubuntu here; an LLVM-related issue on macOS still needs further investigation.
import torch
import flash
from flash.core.data.utils import download_data
from flash.text import TextClassificationData, TextClassifier
from pytorch_lightning.plugins import DeepSpeedPlugin

In [2]:
flash.__version__

'0.7.0rc0'

In [3]:
# Download the IMDB movie-review dataset (original source: https://www.kaggle.com/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews)
download_data("https://pl-flash-data.s3.amazonaws.com/imdb.zip", "./data/")



In [4]:
# 1. Create the DataModule
datamodule = TextClassificationData.from_csv(
    "review",
    "sentiment",
    train_file="data/imdb/train.csv",
    val_file="data/imdb/valid.csv",
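    # NOTE: with facebook/bart-large, this run ends in a CUDA out-of-memory error
    # on a 23.65 GiB GPU; a smaller batch size is likely needed.
    batch_size=1024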
)
# 2. Build the task
model = TextClassifier(backbone="facebook/bart-large", 
                       num_classes=datamodule.num_classes,
                       enable_ort=True)

Using custom data configuration default-ea1faad28a555881


Downloading and preparing dataset csv/default to /home/kemove/storage/dev/workspace/datasets/cache/csv/default-ea1faad28a555881/0.0.0/6b9057d9e23d9d8a2f05b985917a0da84d70c5dae3d22ddd8a3f22fb01c69d9e...


  0%|          | 0/1 [00:00<?, ?it/s]

  0%|          | 0/1 [00:00<?, ?it/s]

Dataset csv downloaded and prepared to /home/kemove/storage/dev/workspace/datasets/cache/csv/default-ea1faad28a555881/0.0.0/6b9057d9e23d9d8a2f05b985917a0da84d70c5dae3d22ddd8a3f22fb01c69d9e. Subsequent calls will reuse this data.


  0%|          | 0/1 [00:00<?, ?it/s]

0ex [00:00, ?ex/s]

Using custom data configuration default-20ce708e62a61405


Downloading and preparing dataset csv/default to /home/kemove/storage/dev/workspace/datasets/cache/csv/default-20ce708e62a61405/0.0.0/6b9057d9e23d9d8a2f05b985917a0da84d70c5dae3d22ddd8a3f22fb01c69d9e...


  0%|          | 0/1 [00:00<?, ?it/s]

  0%|          | 0/1 [00:00<?, ?it/s]

Dataset csv downloaded and prepared to /home/kemove/storage/dev/workspace/datasets/cache/csv/default-20ce708e62a61405/0.0.0/6b9057d9e23d9d8a2f05b985917a0da84d70c5dae3d22ddd8a3f22fb01c69d9e. Subsequent calls will reuse this data.


  0%|          | 0/1 [00:00<?, ?it/s]

0ex [00:00, ?ex/s]

  rank_zero_deprecation(
Using 'facebook/bart-large' provided by Hugging Face/transformers (https://github.com/huggingface/transformers).
Some weights of BartForSequenceClassification were not initialized from the model checkpoint at facebook/bart-large and are newly initialized: ['classification_head.out_proj.bias', 'classification_head.dense.weight', 'classification_head.out_proj.weight', 'classification_head.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [5]:
# 3. Create the trainer and finetune the model
# ! pip install deepspeed
trainer = flash.Trainer(max_epochs=3, gpus=1, plugins=DeepSpeedPlugin(stage=1))
trainer.fit(model, datamodule=datamodule)

GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name          | Type       | Params
---------------------------------------------
0 | train_metrics | ModuleDict | 0     
1 | val_metrics   | ModuleDict | 0     
2 | test_metrics  | ModuleDict | 0     
3 | model         | ORTModule  | 407 M 
---------------------------------------------
407 M     Trainable params
0         Non-trainable params
407 M     Total params
1,629.372 Total estimated model params size (MB)
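
The size on the last line of the summary can be cross-checked against the parameter count. The sketch below assumes the estimate is simply the number of parameters times the bytes per parameter at 32-bit precision, with 1 MB taken as 10^6 bytes. That formula is an assumption, not something the output states, and the helper names are hypothetical.

```python
def params_size_mb(num_params, precision_bits=32):
    """Estimated parameter size in MB (10**6 bytes), assuming one uniform precision."""
    return num_params * (precision_bits / 8) * 1e-6


def params_from_size_mb(size_mb, precision_bits=32):
    """Inverse of params_size_mb: parameter count implied by a size in MB."""
    return size_mb / ((precision_bits / 8) * 1e-6)


reported_mb = 1629.372  # "Total estimated model params size (MB)" from the summary
num_params = params_from_size_mb(reported_mb)
print(f"{num_params / 1e6:.1f} M parameters")
```

Under that assumption the weights alone take about 1.6 GB, a small fraction of the GPU's capacity, so most of the memory pressure in this run would have to come from elsewhere (activations, gradients, optimizer state).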


Validation sanity check: 0it [00:00, ?it/s]

RuntimeError: CUDA out of memory. Tried to allocate 2.00 GiB (GPU 0; 23.65 GiB total capacity; 19.65 GiB already allocated; 1.60 GiB free; 21.05 GiB reserved in total by PyTorch)
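
The error above reports a 2.00 GiB request with only 1.60 GiB free. A common way to work around this is to lower the per-step batch size and accumulate gradients so the effective batch size stays the same. The sketch below reads the figures out of the message and computes the accumulation factor. The helper names are hypothetical, and the micro-batch size of 16 is an illustrative assumption that has not been tested on this setup.

```python
import re

OOM_MESSAGE = (
    "CUDA out of memory. Tried to allocate 2.00 GiB (GPU 0; 23.65 GiB total capacity; "
    "19.65 GiB already allocated; 1.60 GiB free; 21.05 GiB reserved in total by PyTorch)"
)


def parse_oom(message):
    """Extract the GiB figures from a CUDA OOM message in the format shown above."""
    pattern = (
        r"Tried to allocate ([\d.]+) GiB.*?([\d.]+) GiB total capacity; "
        r"([\d.]+) GiB already allocated; ([\d.]+) GiB free; ([\d.]+) GiB reserved"
    )
    match = re.search(pattern, message)
    if match is None:
        raise ValueError("unrecognized OOM message format")
    keys = ("requested", "total", "allocated", "free", "reserved")
    return dict(zip(keys, map(float, match.groups())))


def accumulation_steps(effective_batch_size, micro_batch_size):
    """Number of micro-batches to accumulate to reach the effective batch size."""
    if effective_batch_size % micro_batch_size != 0:
        raise ValueError("micro_batch_size must divide effective_batch_size")
    return effective_batch_size // micro_batch_size


stats = parse_oom(OOM_MESSAGE)
shortfall = stats["requested"] - stats["free"]
print(f"request exceeds free memory by {shortfall:.2f} GiB")

# Keep the original effective batch size of 1024 with a much smaller per-step batch.
steps = accumulation_steps(effective_batch_size=1024, micro_batch_size=16)
print(f"batch_size=16 with accumulate_grad_batches={steps}")
```

`accumulate_grad_batches` is an argument of the PyTorch Lightning `Trainer`. Whether passing it to `flash.Trainer` together with the DeepSpeed plugin behaves as expected in this version has not been verified here; the micro-batch size would go into `TextClassificationData.from_csv(..., batch_size=16)`.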