# Fine-Tuning a Foundation Model (BERT) Using NeMo
Now, it's time for you to take the reins! In this section, you will have the opportunity to experience the process of loading your own foundation model, fine-tuning it for a specific downstream task, and performing inference! Let's roll up our sleeves and start NeMo-ing away!

In our lecture on the Foundation Model, we discussed how a model can be fine-tuned for various downstream tasks.
> Our specific task will be **Sentiment Analysis**. Sentiment Analysis involves detecting the sentiment or emotional tone expressed in a piece of text. For example, `Just got promoted at work and feeling good!` has a positive sentiment while `Lost my wallet today. Feeling frustrated and worried.` has a negative sentiment.

In NeMo, we can simplify this problem by treating it as a text classification task.<br><br><br>

## Setup and Load NeMo
Our first and most important step is to set up NeMo! Since we will be training a model, also check that an NVIDIA GPU is enabled!
*Information to setup and additional details from: https://github.com/NVIDIA/NeMo*
<br>
1. Setup NeMo (already installed)
2. Load NeMo and dependencies
3. Check your NeMo version
4. Check if GPU is enabled

We can skip the installation step since NeMo is already pre-installed!

In [27]:
### Setup NeMo
# BRANCH is needed later on to download the configuration file from the NeMo GitHub repo!
BRANCH = 'main' 
# !python -m pip install -q git+https://github.com/NVIDIA/NeMo.git@$BRANCH#egg=nemo_toolkit[nlp]

In [28]:
# Load NeMo and dependencies
import nemo
from nemo.collections import nlp as nemo_nlp
from nemo.utils.exp_manager import exp_manager  # TAKE NOTE!

import os
import wget
import torch
import pytorch_lightning as pl
from omegaconf import OmegaConf

In [29]:
# Check your NeMo version

nemo.__version__

'1.22.0'

In [30]:
# Check if GPU is enabled
!nvidia-smi

Wed May  1 17:31:39 2024       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA A10-24Q      On   | 00000002:00:00.0 Off |                    0 |
| N/A   N/A    P8    N/A /  N/A |    290MiB / 24512MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

<br><br><br>

## Dataset
We are going to use the [Stanford Sentiment Treebank (SST-2)](https://nlp.stanford.edu/sentiment/index.html) corpus for sentiment analysis. This version of the dataset contains a collection of sentences with binary labels, positive and negative. It is a standard benchmark for sentence classification and is part of the GLUE Benchmark: https://gluebenchmark.com/tasks. Please download and unzip the SST-2 dataset from GLUE. It should contain three files, train.tsv, dev.tsv, and test.tsv, which can be used for training, validation, and testing respectively.



### Download Dataset

First, we need to prepare the environment for working with the dataset by defining our data and work directory.

In [31]:
# Prepare the environment for working with the dataset

DATA_DIR = "DATA_DIR"
WORK_DIR = "WORK_DIR"
os.environ['DATA_DIR'] = DATA_DIR
os.makedirs(WORK_DIR, exist_ok=True)
os.makedirs(DATA_DIR, exist_ok=True)


- Download the `SST-2` .zip file dataset from https://dl.fbaipublicfiles.com/glue/data/SST-2.zip

Since it is a .zip file, execute the cell below to unzip the dataset and perform some preprocessing steps that format it appropriately for NeMo. The resulting `train_nemo_format.tsv` and `dev_nemo_format.tsv` files are the training and validation datasets that our model will train on and be evaluated on later!
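
The `sed 1d` commands in the cell below simply drop the first (header) line of each GLUE file, leaving one `text<TAB>label` row per line. As a rough illustration, here is a minimal Python sketch of the same step. The helper name `strip_header` is our own, and the demo runs on a tiny temporary file rather than the real dataset.

```python
from pathlib import Path
import tempfile


def strip_header(src: Path, dst: Path) -> int:
    """Copy src to dst without its first line; return the number of rows written."""
    rows = 0
    with src.open(encoding="utf-8") as fin, dst.open("w", encoding="utf-8") as fout:
        next(fin, None)  # skip the header row (does nothing on an empty file)
        for line in fin:
            fout.write(line)
            rows += 1
    return rows


# Tiny self-contained demo on a temporary file (not the real SST-2 data)
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "train.tsv"
    dst = Path(tmp) / "train_nemo_format.tsv"
    src.write_text("sentence\tlabel\ncontains no wit , only labored gags \t0\n", encoding="utf-8")
    n_rows = strip_header(src, dst)
    converted = dst.read_text(encoding="utf-8")

print(n_rows, repr(converted))
```

Either approach should give the same result; the shell version is just shorter inside a notebook.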

In [32]:
# Download the SST-2 dataset, unzip and preprocess the dataset

!wget https://dl.fbaipublicfiles.com/glue/data/SST-2.zip
!unzip -o SST-2.zip -d {DATA_DIR}
!sed 1d {DATA_DIR}/SST-2/train.tsv > {DATA_DIR}/SST-2/train_nemo_format.tsv
!sed 1d {DATA_DIR}/SST-2/dev.tsv > {DATA_DIR}/SST-2/dev_nemo_format.tsv

--2024-05-01 17:31:40--  https://dl.fbaipublicfiles.com/glue/data/SST-2.zip
Resolving dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)... 108.157.254.124, 108.157.254.102, 108.157.254.15, ...
Connecting to dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)|108.157.254.124|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 7439277 (7.1M) [application/zip]
Saving to: ‘SST-2.zip.2’


2024-05-01 17:31:40 (297 MB/s) - ‘SST-2.zip.2’ saved [7439277/7439277]

Archive:  SST-2.zip
  inflating: DATA_DIR/SST-2/dev.tsv  
  inflating: DATA_DIR/SST-2/original/README.txt  
  inflating: DATA_DIR/SST-2/original/SOStr.txt  
  inflating: DATA_DIR/SST-2/original/STree.txt  
  inflating: DATA_DIR/SST-2/original/datasetSentences.txt  
  inflating: DATA_DIR/SST-2/original/datasetSplit.txt  
  inflating: DATA_DIR/SST-2/original/dictionary.txt  
  inflating: DATA_DIR/SST-2/original/original_rt_snippets.txt  
  inflating: DATA_DIR/SST-2/original/sentiment_labels.txt  
  inflating: DATA_DIR/SST

### Explore Dataset

We want to better understand what data we are working with, so let's take a look at the dataset!


- Check dataset folder in directory
- Inspect the first 5 lines of the training and validation datasets

In [33]:
# check dataset folder in directory

!ls -l {DATA_DIR}/SST-2

total 38232
-rw-r--r-- 1 root root   677859 May  1 17:31 cached_dev_nemo_format.tsv_RobertaTokenizer_256_50265_-1_1_False.pkl
-rw-r--r-- 1 root root 30447299 May  1 17:27 cached_train_nemo_format.tsv_RobertaTokenizer_256_50265_-1_1_True.pkl
-rw-rw-r-- 1 root root    94931 May  2  2018 dev.tsv
-rw-r--r-- 1 root root    94916 May  1 17:31 dev_nemo_format.tsv
drwxr-xr-x 2 root root     4096 May  1 17:31 original
-rw-rw-r-- 1 root root   197335 May  2  2018 test.tsv
-rw-rw-r-- 1 root root  3806081 May  2  2018 train.tsv
-rw-r--r-- 1 root root  3806066 May  1 17:31 train_nemo_format.tsv


In [34]:
# Inspect the training and validation datasets

print('\nContents (first 5 lines) of train_nemo_format.tsv:\n')
!head -n 5 {DATA_DIR}/SST-2/train_nemo_format.tsv

print('\nContents (first 5 lines) of dev_nemo_format.tsv:\n')
!head -n 5 {DATA_DIR}/SST-2/dev_nemo_format.tsv


Contents (first 5 lines) of train_nemo_format.tsv:

hide new secretions from the parental units 	0
contains no wit , only labored gags 	0
that loves its characters and communicates something rather beautiful about human nature 	1
remains utterly satisfied to remain the same throughout 	0
on the worst revenge-of-the-nerds clichés the filmmakers could dredge up 	0

Contents (first 5 lines) of dev_nemo_format.tsv:

it 's a charming and often affecting journey . 	1
unflinchingly bleak and desperate 	0
allows us to hope that nolan is poised to embark a major career as a commercial yet inventive filmmaker . 	1
the acting , costumes , music , cinematography and sound are all astounding given the production 's austere locales . 	1
it 's slow -- very , very slow . 	0
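
To make the file layout concrete, here is a small sketch that parses rows of the form `text<TAB>label` and counts how many examples fall into each class. It assumes the usual SST-2 convention that `0` means negative and `1` means positive; `parse_line` and `LABEL_NAMES` are illustrative names, and the sample rows are copied from the output above.

```python
from collections import Counter
from typing import Tuple

# Assumed SST-2 convention: 0 = negative, 1 = positive
LABEL_NAMES = {0: "negative", 1: "positive"}


def parse_line(line: str) -> Tuple[str, int]:
    """Split one `text<TAB>label` row into (text, label)."""
    text, label = line.rstrip("\n").rsplit("\t", 1)
    return text.strip(), int(label)


sample_rows = [
    "it 's a charming and often affecting journey . \t1",
    "unflinchingly bleak and desperate \t0",
    "it 's slow -- very , very slow . \t0",
]

parsed = [parse_line(row) for row in sample_rows]
label_counts = Counter(LABEL_NAMES[label] for _, label in parsed)
print(label_counts)
```

Running the same count over the full training file is a quick way to check whether the classes are roughly balanced.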


<br><br><br>

## Create Model/Training Configuration

In the previous part, we covered the basics of NeMo, where we demonstrated how to load a model and make modifications to its configuration. Now, let's explore an alternative approach where we create the model's configuration and experiment manager (*we talked about this before!*) before instantiating or loading a pretrained model. NeMo is flexible and often offers more than one way to do things in its workflow.

Here, we set up all the configuration that will be used by the Trainer when we train the model later on. The *model is defined in a config file which declares multiple important sections*. The most important ones are:
- **model**: All arguments that are related to the Model - language model, tokenizer, head classifier, optimizer, schedulers, and datasets/data loaders.

- **trainer**: Any argument to be passed to PyTorch Lightning including number of epochs, number of GPUs, precision level, etc.<br><br>
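
To picture how these nested sections are addressed, the sketch below uses a plain Python dictionary as a stand-in for the config and a hypothetical helper, `set_by_path`, that applies a dotted key such as `model.dataset.num_classes`. This is only an illustration of the idea: the notebook itself uses OmegaConf attribute access, and the placeholder values here are not NeMo's defaults.

```python
def set_by_path(cfg: dict, dotted_key: str, value) -> None:
    """Set cfg['a']['b']['c'] = value given the dotted key 'a.b.c'."""
    *parents, leaf = dotted_key.split(".")
    node = cfg
    for key in parents:
        node = node.setdefault(key, {})  # create missing levels on the way down
    node[leaf] = value


# A cut-down stand-in for the two top-level sections (placeholder values)
config_sketch = {
    "model": {"dataset": {"num_classes": None}, "train_ds": {"batch_size": 64}},
    "trainer": {"max_epochs": 100, "precision": 32},
}

set_by_path(config_sketch, "model.dataset.num_classes", 2)
set_by_path(config_sketch, "trainer.max_epochs", 2)
print(config_sketch["model"]["dataset"], config_sketch["trainer"])
```

Model-related settings live under `model`, while anything the trainer consumes lives under `trainer`, so changing one section does not affect the other.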






### Download Configuration File
(The config directory and path have been set for you!)
- Download configuration file


In [35]:
# download the text classification configuration file
MODEL_CONFIG = "text_classification_config.yaml"
CONFIG_DIR = WORK_DIR + '/configs/'

os.makedirs(CONFIG_DIR, exist_ok=True)
if not os.path.exists(CONFIG_DIR + MODEL_CONFIG):
    print('Downloading config file...')
    wget.download(f'https://raw.githubusercontent.com/NVIDIA/NeMo/{BRANCH}/examples/nlp/text_classification/conf/' + MODEL_CONFIG, CONFIG_DIR)
    print('Config file downloaded!')
else:
    print('config file already exists')
config_path = f'{WORK_DIR}/configs/{MODEL_CONFIG}'
print(config_path)
config = OmegaConf.load(config_path)

config file already exists
WORK_DIR/configs/text_classification_config.yaml


### Prepare Configuration
Now, we will make changes to the configuration as required for our model/task.

- Set the number of classes for sentiment analysis (*hint: how many classes does SST-2 have?*)
- Set other model parameters e.g. epochs

In [36]:
# Set the number of classes for sentiment analysis (hint: how many classes does SST-2 have)
config.model.dataset.num_classes = 2
config.model.train_ds.file_path = os.path.join(DATA_DIR, 'SST-2/train_nemo_format.tsv')
config.model.train_ds.batch_size = 128
config.model.train_ds.num_workers = 2
config.model.validation_ds.file_path = os.path.join(DATA_DIR, 'SST-2/dev_nemo_format.tsv')
config.model.validation_ds.batch_size = 128
config.model.validation_ds.num_workers = 2


# Name of the .nemo file where trained model will be saved.
config.save_to = 'trained-model.nemo'
config.export_to = 'trained-model.onnx'

# Training stops when max_steps or max_epochs is reached (whichever comes first)
config.trainer.max_epochs = 2
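
To get a feel for how `batch_size`, `max_epochs`, and `max_steps` interact, here is a back-of-the-envelope sketch. The function name and the example dataset size of 1,000 are made up for illustration. It assumes a single device and no gradient accumulation, which matches `devices: 1` and `accumulate_grad_batches: 1` in this config.

```python
import math


def total_training_steps(num_examples, batch_size, max_epochs, max_steps=-1, drop_last=False):
    """Estimate optimizer steps for one device without gradient accumulation."""
    if drop_last:
        steps_per_epoch = num_examples // batch_size  # incomplete last batch is skipped
    else:
        steps_per_epoch = math.ceil(num_examples / batch_size)
    steps_from_epochs = steps_per_epoch * max_epochs
    # max_steps = -1 means "no step limit", so only max_epochs applies
    return steps_from_epochs if max_steps < 0 else min(steps_from_epochs, max_steps)


steps_unlimited = total_training_steps(1000, batch_size=128, max_epochs=2)
steps_capped = total_training_steps(1000, batch_size=128, max_epochs=2, max_steps=10)
print(steps_unlimited, steps_capped)
```

With a smaller batch size the same number of epochs takes more steps, which is worth keeping in mind when you later reduce `batch_size` to save GPU memory.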

- Inspect the train dataloader, dataset, and trainer config

In [37]:
# Inspect config
# OmegaConf.to_yaml() is used to create a proper format for printing the train dataloader's config

print("Train dataloader:", OmegaConf.to_yaml(config.model.train_ds))
print("Train dataset:\n", OmegaConf.to_yaml(config.model.dataset))
print("Trainer config:\n", OmegaConf.to_yaml(config.trainer))

Train dataloader: file_path: DATA_DIR/SST-2/train_nemo_format.tsv
batch_size: 128
shuffle: true
num_samples: -1
num_workers: 2
drop_last: false
pin_memory: false

Train dataset:
 num_classes: 2
do_lower_case: false
max_seq_length: 256
class_balancing: null
use_cache: false

Trainer config:
 devices: 1
num_nodes: 1
max_epochs: 2
max_steps: -1
accumulate_grad_batches: 1
gradient_clip_val: 0.0
precision: 32
accelerator: gpu
log_every_n_steps: 1
val_check_interval: 1.0
num_sanity_val_steps: 0
enable_checkpointing: false
logger: false



- Include GPU-related config (for the accelerator, `gpu` or `cpu`)

In [38]:
# Use GPU if available

config.trainer.accelerator = 'gpu' if torch.cuda.is_available() else 'cpu' 
config.trainer.devices = 1
config.trainer.strategy = 'auto'

### Setting up the Trainer (Experiment Manager)
As we discussed earlier, the experiment manager is used in NeMo to manage and organize training experiments. It simplifies handling checkpoints, logs, and experiment configuration, and gives you a centralized way to track the progress of training runs, making it easier to reproduce results, analyze performance, and iterate on model development.

Let's set up the experiment manager. We need the PyTorch Lightning (PT) trainer and the `exp_manager` config. `exp_manager` is already available in our config template!

- Create the PT Trainer object

In [None]:
trainer = pl.Trainer(**config.trainer)

GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs


In [None]:
# (Optional): Set up the Experiment Manager directory for logging
# exp_dir specifies the path where checkpoints and logs are stored; its default is "./nemo_experiments"
# You may change it by editing the following line
config.exp_manager.exp_dir = 'nemo_experiments'

# OmegaConf.to_yaml() is used to create a proper format for printing the trainer config
print(OmegaConf.to_yaml(config.exp_manager))

exp_dir = exp_manager(trainer, config.exp_manager)
exp_dir

In [None]:
print(OmegaConf.to_yaml(config.exp_manager))

<br><br><br>

## Load Pre-trained Model

Often, we get the best results by using models pre-trained on large amounts of data. Here, we load a foundation model using NeMo. There are many different models available and some are more suited for certain tasks.

Here, we will use the DistilRoBERTa model (`distilroberta-base`), a distilled BERT-like model that offers good performance while being relatively fast to train!

In [None]:
# complete list of supported BERT-like models
print(nemo_nlp.modules.get_pretrained_lm_models_list())

- Load the DistilRoBERTa model

In [None]:
# specify the BERT-like model you want to use
# set the `model.language_model.pretrained_model_name` parameter in the config to the model you want to use
config.model.language_model.pretrained_model_name = 'distilroberta-base'

Passing the config and trainer object directly to the model constructor is the alternative way to load the model, and it is the preferred one!

In [None]:
model = nemo_nlp.models.TextClassificationModel(cfg=config.model, trainer=trainer)

<br>

## Monitor Training Progress with TensorBoard Visualization
Optionally, you can use TensorBoard to monitor training progress. Click the [TensorBoard](/tensorboard/) link to open it in your browser and see graphs of experiment metrics such as loss and accuracy saved in the `training_info` folder.

<br>

## Training Model

Time to train your model! This is where the magic happens and your model starts to learn and improve its performance on the specific downstream task, in this case sentiment classification on SST-2.

During the training phase, your model will process the training data you've prepared earlier, learning from the patterns and relationships within the dataset. It will iteratively update its internal parameters to minimize the loss function, ultimately aiming to maximize its performance on the task you've defined.

- Train your model (*hint: use trainer object with your model name*)

In [None]:
# Train your model
trainer.fit(model)

<br><br>

## Evaluate Model

Now let's assess the performance of our fine-tuned DistilRoBERTa model on held-out data (here, the SST-2 dev set). We start by extracting the path of the best checkpoint from the training process. This checkpoint represents the state of the model with the best performance on the validation set during training. By using the best checkpoint, we ensure that we evaluate the model at its peak performance.
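
Conceptually, "best checkpoint" just means the saved state whose monitored validation metric is best. The toy sketch below picks it from a dictionary of scores. The file names and numbers are invented, and we assume that the monitored metric is a validation loss where lower is better; check your `exp_manager` config for the actual metric and mode.

```python
def best_checkpoint(scores: dict, mode: str = "min") -> str:
    """Return the checkpoint name with the lowest (mode='min') or highest (mode='max') score."""
    pick = min if mode == "min" else max
    return pick(scores, key=scores.get)


# Hypothetical checkpoint names and validation losses, for illustration only
hypothetical_scores = {"epoch=0.ckpt": 0.31, "epoch=1.ckpt": 0.24}

best_by_loss = best_checkpoint(hypothetical_scores, mode="min")
print(best_by_loss)
```

In the notebook you do not need to do this by hand, since `trainer.checkpoint_callback.best_model_path` already exposes the path of the best checkpoint.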


- Define the checkpoint path to load the evaluation model at its best performing checkpoint

In [None]:
# Extract the path of the best checkpoint from the training
checkpoint_path = trainer.checkpoint_callback.best_model_path

# Load our best model from the previous training
eval_model = nemo_nlp.models.TextClassificationModel.load_from_checkpoint(checkpoint_path=checkpoint_path)

Once again for the evaluation model, we will need to create a config and trainer object for evaluation.

**For convenience, we have provided the code and all you need is to run the cell!**

In [None]:
# create a dataloader config for evaluation, the same data file provided in validation_ds is used here
# file_path can get updated with any file
eval_config = OmegaConf.create({'file_path': config.model.validation_ds.file_path, 'batch_size': 64, 'shuffle': False, 'num_samples': -1})
eval_model.setup_test_data(test_data_config=eval_config)
#eval_dataloader = eval_model._create_dataloader_from_config(cfg=eval_config, mode='test')

# a new trainer is created to show how to evaluate a checkpoint from an already trained model
# create a copy of the trainer config and update it to be used for final evaluation
eval_trainer_cfg = config.trainer.copy()
eval_trainer_cfg.accelerator = 'gpu' if torch.cuda.is_available() else 'cpu'
eval_trainer_cfg.devices = 1  # it is safer to evaluate on a single device, as PT is buggy with the last batch on multi-GPUs
eval_trainer_cfg.strategy = 'auto'
eval_trainer = pl.Trainer(**eval_trainer_cfg)

For text classification, we can measure the accuracy and F1-score of the model's predictions compared to the ground truth labels. Accuracy represents the proportion of correctly classified samples, while F1-score takes into account both precision and recall of the model's predictions.
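
As a quick refresher on these metrics, the sketch below computes accuracy, precision, recall, and F1 for a binary task from two lists of labels. The labels are made up, `binary_metrics` is our own helper name, and NeMo's own report may aggregate per-class scores differently, so treat this as an illustration of the definitions rather than a reproduction of its output.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up ground truth and predictions, for illustration only
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Accuracy can look good even when one class is handled poorly, which is why F1 is reported alongside it.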

- By evaluating the model's performance on the evaluation dataset, we gain valuable insights into how well the trained DistilRoBERTa model generalizes to unseen data.

- This evaluation step helps us validate the effectiveness of our model and provides a quantitative measure of its performance on the sentiment classification task!

- Evaluate the best model from the checkpoint
> Take note that we use `eval_trainer.test` for evaluation as compared to `trainer.fit`
- Interpret the results
> Is your model performing well?

In [None]:
eval_trainer.test(model=eval_model, verbose=False)

<br><br>

## Making Inference
Now for the final and most exciting part: using our trained model for inference! As before, we create an inference model, `infer_model`, by loading our best model from the previous training checkpoint.

In [None]:
# extract the path of the best checkpoint from the training, you may update it to any other checkpoint file
checkpoint_path = trainer.checkpoint_callback.best_model_path
# Create an inference model and load the checkpoint
infer_model = nemo_nlp.models.TextClassificationModel.load_from_checkpoint(checkpoint_path=checkpoint_path)

Move to GPU if available to speed up inference

In [None]:
# move the model to the desired device for inference
# we move the model to "cuda" if available otherwise "cpu" would be used
if torch.cuda.is_available():
    infer_model.to("cuda")
else:
    infer_model.to("cpu")

The time has come to unleash the potential of our trained model for making predictions!

- Input `queries`, `batch_size`, and `max_seq_length` details to your `infer_model`.
- Try a different query

In [None]:
# define the list of queries for inference
queries = ['by the end of no such thing the audience , like beatrice , has a watchful affection for the monster .',
           'director rob marshall went out gunning to make a great one .',
           'uneasy mishmash of styles and genres .']

# max_seq_length=512 is the maximum length BERT supports.
results = infer_model.classifytext(queries=queries, batch_size=3, max_seq_length=512)

print('The prediction results of some sample queries with the trained model:')
for query, result in zip(queries, results):
    print(f'Query : {query}')
    print(f'Predicted label: {result}')
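
The loop above prints raw class ids. If you prefer readable labels, a small mapping like the one sketched here can help. It assumes that each prediction is a single integer class id and that `0` means negative and `1` means positive, as in SST-2. The predictions below are made up and are not real model output.

```python
# Assumed SST-2 convention: 0 = negative, 1 = positive
LABEL_NAMES = {0: "negative", 1: "positive"}


def describe(queries, predictions):
    """Pair each query with a human-readable label name."""
    return [f"{LABEL_NAMES.get(pred, 'unknown')}: {query}" for query, pred in zip(queries, predictions)]


example_queries = ["a great one .", "uneasy mishmash of styles and genres ."]
example_predictions = [1, 0]  # made-up class ids, not real model output

described = describe(example_queries, example_predictions)
for line in described:
    print(line)
```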

<br><br><br>

## Optional Exercises
We are not done yet! If you are interested or keen to explore further, we have an optional section prepared for you.

### Optional: Change Hyperparameters or Language Model
So far, you've used the basic DistilRoBERTa language model, but that is just one of many you could try. In this optional section, try to experiment with different settings or models to observe the effects on model performance!

**Exercise:**
- Experiment with different hyperparameters!
- Try a different model to compare performance!

In [None]:
# complete list of supported NeMo NLP models
from nemo.collections import nlp as nemo_nlp
nemo_nlp.modules.get_pretrained_lm_models_list()

You may need to restart the notebook kernel to clear memory. If you use a large model, other ways to save GPU memory are to reduce the `batch_size` to 32, 16, or even 8 and to reduce the `max_seq_length` to 64. There is no right answer to this exercise. Rather, this is an opportunity for you to experiment. Some of the models can take several minutes to run, so feel free to play around!
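
If you want to be systematic about it, one option is to list the combinations you plan to try before running anything. The sketch below expands a small search space into individual runs. The dotted keys mirror fields in this notebook's config, while `expand_grid` and the particular values are just an example.

```python
from itertools import product

# Candidate values to try; the dotted keys mirror fields in this notebook's config
search_space = {
    "model.train_ds.batch_size": [32, 16, 8],
    "model.dataset.max_seq_length": [64, 128],
}


def expand_grid(space: dict) -> list:
    """Return one dict of settings per combination of candidate values."""
    keys = list(space)
    return [dict(zip(keys, combo)) for combo in product(*(space[k] for k in keys))]


runs = expand_grid(search_space)
for run in runs:
    print(run)
```

For each run, you would assign the values to the matching `config` fields, then rebuild the trainer and model before calling `trainer.fit` again.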

<br><br><br><br>**Congratulations on completing this section of the NeMo workshop!**<br>
*In this section, we demonstrated NeMo's NLP capabilities by fine-tuning a foundation model for sentiment analysis, a natural language understanding downstream task. NeMo provides pre-trained models and tools to fine-tune them for specific tasks.*

> Remember that NeMo offers a vast array of capabilities beyond what we've covered here. This tutorial merely scratches the surface of what NeMo can do. We hope it has given you a basic understanding of NeMo. There are many resources for NeMo, including documentation, example code, and a supportive community, to help you on your NeMo journey!