# Fine-tune Transformers Faster with Lightning Flash and Torch ORT
References:
* https://devblog.pytorchlightning.ai/fine-tune-transformers-faster-with-lightning-flash-and-torch-ort-ec2d53789dc3
* https://lightning-flash.readthedocs.io/en/latest/reference/text_classification.html

In [1]:
# torch-ort must be installed alongside CUDA 10.2 and cuDNN 7.6.
# lightning-flash currently works only on Ubuntu in this setup; an LLVM-related issue on macOS still needs investigation.
import torch
import flash
from flash.core.data.utils import download_data
from flash.text import TextClassificationData, TextClassifier
from pytorch_lightning.plugins import DeepSpeedPlugin
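
Because this setup is sensitive to the CUDA and package versions noted above, it can help to record what is actually installed before running the remaining cells. The following is a minimal sketch that uses only the standard library. The helper name `installed_versions` is hypothetical, and the distribution names (`torch`, `torch-ort`, `lightning-flash`, `deepspeed`) are assumed to match the names used with `pip install`.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            versions[name] = None
    return versions


# Distribution names below are assumptions, not taken from the notebook.
for name, version in installed_versions(
    ["torch", "torch-ort", "lightning-flash", "deepspeed"]
).items():
    print(f"{name}: {version or 'not installed'}")
```

A `None` entry for `deepspeed` would indicate that the `pip install deepspeed` step mentioned in cell 4 has not been run yet.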

In [2]:
# Download the IMDB movie review dataset (source: https://www.kaggle.com/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews)
download_data("https://pl-flash-data.s3.amazonaws.com/imdb.zip", "./data/")
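
Before building the DataModule, it may be worth confirming that the downloaded CSV files contain the `review` and `sentiment` columns that the `from_csv` call relies on. The sketch below assumes the files have a header row; the helper name `missing_columns` is hypothetical and not part of Flash.

```python
import csv
from typing import List, Sequence


def missing_columns(csv_path: str, required: Sequence[str]) -> List[str]:
    """Return the required column names that are absent from the CSV header."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        # Read only the first row; an empty file yields an empty header.
        header = next(csv.reader(handle), [])
    return [name for name in required if name not in header]
```

Calling `missing_columns("data/imdb/train.csv", ["review", "sentiment"])` should return an empty list if the file matches what the notebook expects.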



In [3]:
# 1. Create the DataModule (this configuration uses about 22 GB of GPU memory)
datamodule = TextClassificationData.from_csv(
    "review",
    "sentiment",
    train_file="data/imdb/train.csv",
    val_file="data/imdb/valid.csv",
    batch_size=4
)
# 2. Build the task
model = TextClassifier(backbone="facebook/bart-large", 
                       num_classes=datamodule.num_classes,
                       enable_ort=True)

Using custom data configuration default-9bbd85eb52fb9e3e


Downloading and preparing dataset csv/default to /home/i058959/.cache/huggingface/datasets/csv/default-9bbd85eb52fb9e3e/0.0.0/9144e0a4e8435090117cea53e6c7537173ef2304525df4a077c435d8ee7828ff...


Dataset csv downloaded and prepared to /home/i058959/.cache/huggingface/datasets/csv/default-9bbd85eb52fb9e3e/0.0.0/9144e0a4e8435090117cea53e6c7537173ef2304525df4a077c435d8ee7828ff. Subsequent calls will reuse this data.


  0%|          | 0/1 [00:00<?, ?it/s]

  0%|          | 0/22500 [00:00<?, ?ex/s]

  0%|          | 0/23 [00:00<?, ?ba/s]

Using custom data configuration default-0f24d2bf518729b4


Downloading and preparing dataset csv/default to /home/i058959/.cache/huggingface/datasets/csv/default-0f24d2bf518729b4/0.0.0/9144e0a4e8435090117cea53e6c7537173ef2304525df4a077c435d8ee7828ff...


Dataset csv downloaded and prepared to /home/i058959/.cache/huggingface/datasets/csv/default-0f24d2bf518729b4/0.0.0/9144e0a4e8435090117cea53e6c7537173ef2304525df4a077c435d8ee7828ff. Subsequent calls will reuse this data.


  0%|          | 0/1 [00:00<?, ?it/s]

  0%|          | 0/2500 [00:00<?, ?ex/s]

  0%|          | 0/3 [00:00<?, ?ba/s]

Using 'facebook/bart-large' provided by Hugging Face/transformers (https://github.com/huggingface/transformers).
Some weights of BartForSequenceClassification were not initialized from the model checkpoint at facebook/bart-large and are newly initialized: ['classification_head.out_proj.weight', 'classification_head.dense.bias', 'classification_head.out_proj.bias', 'classification_head.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [4]:
# 3. Create the trainer and fine-tune the model.
# DeepSpeed must be installed first: !pip install deepspeed
trainer = flash.Trainer(max_epochs=3, gpus=2, plugins=DeepSpeedPlugin(stage=1))
trainer.fit(model, datamodule=datamodule)

GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
initializing deepspeed distributed: GLOBAL_RANK: 0, MEMBER: 1/2


RuntimeError: Timed out initializing process group in store based barrier on rank: 0, for key: store_based_barrier_key:1 (world_size=2, worker_count=1, timeout=0:30:00)
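
The timeout above reports `worker_count=1` for `world_size=2`, which means only rank 0 reached the barrier. One plausible cause, not confirmed by this notebook, is that DDP-style plugins such as DeepSpeed start the additional ranks by re-launching the script, which a notebook kernel cannot do. Under that assumption, a minimal sketch of a guard is to detect an interactive kernel and fall back to a single GPU without the plugin. The helper names `running_in_notebook` and `choose_trainer_kwargs` are hypothetical; the only Trainer arguments used are the ones already shown in cell 4.

```python
import sys
from typing import Any, Dict


def running_in_notebook() -> bool:
    """Heuristic: the ipykernel module is normally loaded only inside a Jupyter kernel."""
    return "ipykernel" in sys.modules


def choose_trainer_kwargs(requested_gpus: int, interactive: bool) -> Dict[str, Any]:
    """Pick Trainer arguments; multi-GPU DeepSpeed is used only outside notebooks (assumption)."""
    kwargs: Dict[str, Any] = {"max_epochs": 3}
    if interactive or requested_gpus < 2:
        # Single process: there is no cross-rank barrier to wait on.
        kwargs["gpus"] = min(requested_gpus, 1)
        return kwargs
    # Deferred import so the sketch also runs where Lightning is not installed.
    from pytorch_lightning.plugins import DeepSpeedPlugin

    kwargs["gpus"] = requested_gpus
    kwargs["plugins"] = DeepSpeedPlugin(stage=1)
    return kwargs


print(choose_trainer_kwargs(requested_gpus=2, interactive=True))
```

In the notebook this could be used as `trainer = flash.Trainer(**choose_trainer_kwargs(2, running_in_notebook()))`. To train on both GPUs with DeepSpeed, moving the cells into a plain Python script and running it from the command line is likely the more reliable route.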