# From SDG to SFT with Llama

This playbook demonstrates how to fine-tune a model on a synthetically generated dataset.

## Prerequisites

- Access to at least 4 GPUs
- NGC Key

## Section 1: Synthetic Data Generation

This section demonstrates a basic loop with two stages, repeated until the desired dataset size is reached:

1. **Data processing**: perform operations such as HTML tag cleaning, quality-based filtering, and semantic deduplication on the records.
2. **Synthetic data generation**: query a synthetic data generation model (such as Llama 3.1 405B Instruct or Nemotron-4 340B Instruct) to produce synthetic variants of existing records. Each synthetic record is then fed to a reward model (such as Nemotron-4 340B Reward) and assigned a quality score. All records are then fed back to the data processing stage for further processing.
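The two-stage loop described above can be sketched roughly as follows. This is a toy illustration only: `clean_record`, `process`, `generate_variants`, `score_record`, and `sdg_loop` are hypothetical names, the generator and reward model are replaced by trivial placeholders, and exact-match deduplication stands in for semantic deduplication. It is not the pipeline that the tutorial actually runs.

```python
import re


def clean_record(text: str) -> str:
    """Hypothetical cleaning step: strip HTML tags and collapse whitespace."""
    no_tags = re.sub(r"<[^>]+>", " ", text)
    return " ".join(no_tags.split())


def process(records, min_score=0.0):
    """Data processing stage: clean, filter by score, drop exact duplicates."""
    seen, kept = set(), []
    for rec in records:
        text = clean_record(rec["text"])
        score = rec.get("score", 0.0)
        if score < min_score or text in seen:
            continue
        seen.add(text)
        kept.append({"text": text, "score": score})
    return kept


def generate_variants(record):
    """Placeholder for a call to a synthetic data generation model."""
    return [{"text": record["text"] + " (variant)"}]


def score_record(record):
    """Placeholder for a call to a reward model; always returns 1.0."""
    return 1.0


def sdg_loop(seed_records, target_size, max_rounds=5):
    """Alternate generation and processing until the dataset is large enough."""
    records = process(seed_records)
    for _ in range(max_rounds):
        if len(records) >= target_size:
            break
        synthetic = []
        for rec in records:
            for variant in generate_variants(rec):
                variant["score"] = score_record(variant)
                synthetic.append(variant)
        # All records, original and synthetic, go back through processing.
        records = process(records + synthetic)
    return records


seed = [
    {"text": "<p>Can my landlord  evict me?</p>"},
    {"text": "Can my landlord evict me?"},
]
curated = sdg_loop(seed, target_size=4)
print(len(curated), curated[0]["text"])
```

The `max_rounds` cap is a design choice of this sketch: it keeps the loop from running indefinitely if filtering removes records as fast as generation adds them.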

In [52]:
import os
import json
import numpy as np
from rouge_score import rouge_scorer, scoring



In [50]:
NEMO_DIR = os.path.join("/opt/NeMo")
NEMO_CURATOR_DIR = os.path.join("/opt/NeMo-Curator")
HF_TOKEN = ""

In [51]:
YOUR_WORKING_DIR = os.path.join(os.path.expanduser('~'), "exp1")
os.makedirs(YOUR_WORKING_DIR, exist_ok=True)

In [4]:
!ls -ld {NEMO_DIR} {NEMO_CURATOR_DIR} {YOUR_WORKING_DIR}

drwxr-xr-x 14 root root 4096 Aug  2 22:47 /opt/NeMo
drwxr-xr-x 11 root root 4096 Aug  2 22:41 /opt/NeMo-Curator
drwxr-xr-x  2 root root 4096 Oct  9 09:42 /root/exp1


In [5]:
os.getcwd()

'/nemo-curator'

In [9]:
!python peft-curation-with-sdg/main.py \
    --api-key "" \
    --device gpu \
    --synth-gen-rounds 1 --synth-gen-ratio 0.001 --synth-gen-model "nvidia/nemotron-4-340b-instruct" \
    --working-dir {YOUR_WORKING_DIR}


Download directory:  /root/exp1/data/raw/downloads
Downloading Law QA dataset from 'https://huggingface.co/datasets/ymoslem/Law-StackExchange/resolve/main/law-stackexchange-questions-answers.json'...
Running the initial curation pipeline on '/root/exp1/data/raw/splits/law-qa-train.jsonl'...
Reading 1 files
tokenizer_config.json: 100%|███████████████████| 350/350 [00:00<00:00, 4.29MB/s]
vocab.txt: 100%|█████████████████████████████| 232k/232k [00:00<00:00, 25.3MB/s]
tokenizer.json: 100%|████████████████████████| 466k/466k [00:00<00:00, 24.8MB/s]
special_tokens_map.json: 100%|█████████████████| 112/112 [00:00<00:00, 1.26MB/s]
config.json: 100%|█████████████████████████████| 612/612 [00:00<00:00, 8.61MB/s]
model.safetensors: 100%|████████████████████| 90.9M/90.9M [00:00<00:00, 231MB/s]
Fitting memory estimate curve for model: sentence-transformers/all-MiniLM-L6-v2
100%|█████████████████████████████████████████████| 8/8 [00:48<00:00,  6.00s/it]
2024-10-09 09:44:45,603 | 7086167 | Rank 0 | 

In [13]:
!ls -l {YOUR_WORKING_DIR}/data/curated

total 8
drwxr-xr-x 2 root root 4096 Oct  9 09:47 final
drwxr-xr-x 2 root root 4096 Oct  9 09:46 round-1


In [19]:
DATA_DIR = os.path.join(YOUR_WORKING_DIR, "data/curated/final")
!ls {DATA_DIR}

law-qa-test.jsonl  law-qa-train.jsonl  law-qa-val.jsonl


You should see the `law-qa-{train/val/test}.jsonl` splits produced by the SDG pipeline run above.

In [20]:
TRAIN_DS = os.path.join(DATA_DIR, "law-qa-train.jsonl")
VAL_DS = os.path.join(DATA_DIR, "law-qa-val.jsonl")
TEST_DS = os.path.join(DATA_DIR, "law-qa-test.jsonl")
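Before moving on, it can be useful to confirm that each split is non-empty and parses as JSON Lines. The helper below is a minimal sketch: `count_jsonl_records` is a name introduced here for illustration, and the demo runs on a temporary file rather than on the real splits.

```python
import json
import os
import tempfile


def count_jsonl_records(path):
    """Count non-empty lines in a JSONL file; raises if a line is not valid JSON."""
    count = 0
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                json.loads(line)
                count += 1
    return count


# Demo on a throwaway file with two made-up records.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.jsonl")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write('{"question": "q1", "title": "t1"}\n')
        f.write('{"question": "q2", "title": "t2"}\n')
    demo_count = count_jsonl_records(demo_path)

print(demo_count)
```

In the notebook, the same function could be called on `TRAIN_DS`, `VAL_DS`, and `TEST_DS`.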

2. **Get the model**: Download the `Meta Llama 3.1 8B Instruct .nemo` model and mount the corresponding folder to the container.

In [21]:
!mkdir -p {YOUR_WORKING_DIR}/model

In [22]:
!wget --content-disposition https://api.ngc.nvidia.com/v2/models/nvidia/nemo/llama-3_1-8b-instruct-nemo/versions/1.0/zip -O {YOUR_WORKING_DIR}/model/llama-3_1-8b-instruct-nemo_1.0.zip

--2024-10-09 09:55:26--  https://api.ngc.nvidia.com/v2/models/nvidia/nemo/llama-3_1-8b-instruct-nemo/versions/1.0/zip
Resolving api.ngc.nvidia.com (api.ngc.nvidia.com)... 54.149.87.155, 34.223.159.105
Connecting to api.ngc.nvidia.com (api.ngc.nvidia.com)|54.149.87.155|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://files.ngc.nvidia.com/org/nvidia/team/nemo/models/llama-3_1-8b-instruct-nemo/versions/1.0/files.zip?versionId=rh8etjB3R7KsBSm8C3GnCV6pRPpjAAcX&Expires=1728554127&Signature=szoLx3aTZiq6G4aWF63ebhyKdFvzD4iluUFB091CMV~6jRC27BgZST1sl0ffUFy5NcdVbpdaBnqbRLe4AfvTKGSFcJgzt3CDHCQn3QRvNK5KlUtaAHxJGUaGfgNZRzW8MfDYbPds57FYtVQfSJXRqPWzFcZ51o~~JMnk7Y2X5NkSp~8tXnet4IjI-Sa0u0j2-rY-Ac99rNIkP2djKP7jEQrGgCrmcIYnaJ5wxB7-m5Urhe4hTWKYCUzD88LceNAtOxCKHX~hJyXYrjlAh7EqpfWcrtn0QvK1NEJy2XzKO-sRHNR1dLYjkM7xXQ94zeuk19QWjQI4b3gcc9ONkzkPjg__&Key-Pair-Id=KCX06E8E9L60W [following]
--2024-10-09 09:55:27--  https://files.ngc.nvidia.com/org/nvidia/team/nemo/models/llama-3_

In [23]:
!unzip {YOUR_WORKING_DIR}/model/llama-3_1-8b-instruct-nemo_1.0.zip -d {YOUR_WORKING_DIR}/model

Archive:  /root/exp1/model/llama-3_1-8b-instruct-nemo_1.0.zip
  inflating: /root/exp1/model/llama3_1_8b_instruct.nemo  


In [24]:
!ls {YOUR_WORKING_DIR}/model

llama-3_1-8b-instruct-nemo_1.0.zip  llama3_1_8b_instruct.nemo


### Set the Hugging Face Access Token

You can obtain this from your [Hugging Face account](https://huggingface.co/docs/hub/en/security-tokens).

In [26]:
from huggingface_hub import login

login(token=HF_TOKEN)

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: read).
Your token has been saved to /root/.cache/huggingface/token
Login successful
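To avoid pasting the token into the notebook, one option is to read it from an environment variable. The sketch below assumes the token is stored in a variable named `HF_TOKEN`; `resolve_hf_token` is a hypothetical helper, not part of `huggingface_hub`, and the demo uses a placeholder value.

```python
import os


def resolve_hf_token(env=os.environ, var_name="HF_TOKEN"):
    """Return the token from the given mapping, or raise a clear error if it is missing."""
    token = env.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"Set the {var_name} environment variable before logging in.")
    return token


# Demo with a fake environment mapping; the value is a placeholder, not a real token.
demo_token = resolve_hf_token({"HF_TOKEN": " hf_placeholder "})
print(demo_token)
```

In the notebook, the result could then be passed to the existing call as `login(token=resolve_hf_token())`.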




---
##  Section 2: Data Curation

This notebook is structured into four steps:
1. Prepare the dataset
2. Run the PEFT finetuning script
3. Inference with NeMo Framework
4. Check the model accuracy

### Step 1: Prepare the dataset

This dataset has already undergone several filtering and processing operations, and it can be used to train the model for several tasks: question title generation (summarization), law-domain question answering, and question tag generation (multi-label classification).

Take a look at a single row in the dataset.

In [27]:
# TRAIN, VAL and TEST splits all follow the same structure
!head -n1 {TRAIN_DS}

{"answer":"To find out who owns a property in Australia, you can contact your local council or the Land Titles Office. You cannot take action on the property without the owner's consent. Reach out to the owner and request permission. If they refuse and the property poses a fire hazard, consult a solicitor. The solicitor can send a notice to the owner, making them aware of potential liability for damages. This may encourage the owner to take action or grant you permission to address the issue.","answer_score":0,"filename":"law-qa-train-synth-round-1.jsonl","id":"law-stackexchange-qa-5126-synth-0","question":"I'm concerned about a property near mine, covered in tall grass and Gorse Bushes, which could exacerbate a fire during a fire ban. I'd like to clear the land for fire defense but am unsure of its ownership. I'm seeking legal advice on how to address this potential hazard, as neither the local council nor any private owner seems to be taking action.","question_score":0,"tags":"austra

You will see several fields in the `.jsonl`, including `title`, `question`, `answer`, and other associated metadata.

For this tutorial, our input will be the `question` field, and the output will be its `title`.

The following cell does two things:
* Adds a template: an optional prompt instruction followed by the format `{PROMPT} \nQUESTION: {data["question"]} \nTITLE: `.
* Saves the processed splits to the same location, appending a `_preprocessed` suffix to each file name.

In [28]:
# Add a prompt instruction.
PROMPT='''Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more.'''

# Creates a preprocessed version of the data files
for input_file in [TRAIN_DS, VAL_DS, TEST_DS]:
    output_file = input_file.rsplit('.', 1)[0] + '_preprocessed.jsonl'
    with open(input_file, 'r', encoding='utf-8') as infile, open(output_file, 'w', encoding='utf-8') as outfile:
        for line in infile:
            # Parse each line as JSON
            data = json.loads(line)

            # Create a new dictionary with only the desired fields, renamed and formatted
            new_data = {
                "input": f'''{PROMPT} \nQUESTION: {data["question"]} \nTITLE: ''',
                "output": data['title']
            }

            # Write the new data as a JSON line to the output file
            json.dump(new_data, outfile)
            outfile.write('\n')  # Add a newline after each JSON object

    print(f"Processed {input_file} and created {output_file}")

Processed /root/exp1/data/curated/final/law-qa-train.jsonl and created /root/exp1/data/curated/final/law-qa-train_preprocessed.jsonl
Processed /root/exp1/data/curated/final/law-qa-val.jsonl and created /root/exp1/data/curated/final/law-qa-val_preprocessed.jsonl
Processed /root/exp1/data/curated/final/law-qa-test.jsonl and created /root/exp1/data/curated/final/law-qa-test_preprocessed.jsonl


After running the above cell, you will see `law-qa-{train/test/val}_preprocessed.jsonl` files appear in the data directory.

A formatted example looks like this:

```json
{"input": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: In order to be sued in a particular jurisdiction, say New York, a company must have a minimal business presence in the jurisdiction. What constitutes such a presence? Suppose the company engaged a New York-based Plaintiff, and its representatives signed the contract with the Plaintiff in New York City. Does this satisfy the minimum presence rule? Suppose, instead, the plaintiff and contract signing were in New Jersey, but the company hired a law firm with offices in New York City. Does this qualify? \nTITLE: ", 
 "output": "What constitutes \"doing business in a jurisdiction?\""}
```
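The same template can be exercised on a single in-memory record, which makes it easy to check the prompt format before processing whole files. This is only a sketch: `format_example` and the sample record are illustrative and not part of the original notebook, and the prompt string is abbreviated.

```python
import json

# Abbreviated stand-in for the PROMPT defined in the notebook.
PROMPT = "Generate a concise, engaging title for the following legal question on an internet forum."


def format_example(record, prompt=PROMPT):
    """Map a raw record to the input/output pair used for fine-tuning."""
    return {
        "input": f'{prompt} \nQUESTION: {record["question"]} \nTITLE: ',
        "output": record["title"],
    }


# Made-up record with the same fields the tutorial relies on.
sample = {
    "question": "Can a landlord raise rent mid-lease?",
    "title": "Mid-lease rent increases",
    "answer": "It depends on the lease terms.",
}
example = format_example(sample)
print(json.dumps(example))
```

Only `question` and `title` are used; other fields such as `answer` are dropped from the preprocessed output.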


In [29]:
# clear up any cached mem-map file
!rm {DATA_DIR}/*idx*

rm: cannot remove '/root/exp1/data/curated/final/*idx*': No such file or directory


### Step 2: Run PEFT finetuning script for LoRA

NeMo Framework includes a high-level Python script for fine-tuning, [megatron_gpt_finetuning.py](https://github.com/NVIDIA/NeMo/blob/main/examples/nlp/language_modeling/tuning/megatron_gpt_finetuning.py), that abstracts away some of the lower-level API calls. Once you have downloaded the model and prepared the dataset, LoRA fine-tuning with NeMo essentially amounts to running this script.

For this demonstration, the training run is capped by `max_steps`, and validation is carried out at the interval set by `val_check_interval`. With `max_steps=1000` and `val_check_interval=0.2`, as in the command below, the log shows validation running about every 200 training steps. If the validation loss does not improve after a few checks, training is halted to avoid overfitting.
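The relationship between the batch-size settings and the step counts can be checked with a little arithmetic. The numbers below are copied from the training command and log in this step (global batch size 32, micro batch size 1, 2 GPUs, tensor and pipeline parallel size 1, 32160 training records, 2434 validation records). The variable names are only for illustration, and rounding the validation batch count up is an assumption that happens to match the 77 validation batches shown in the log.

```python
import math

# Values taken from the training command and its log output.
global_batch_size = 32
micro_batch_size = 1
devices = 2
tp_size = 1
pp_size = 1
train_records = 32160
val_records = 2434
max_steps = 1000

# With no tensor or pipeline parallelism, every GPU is a data-parallel replica.
data_parallel_size = devices // (tp_size * pp_size)

# Micro-batches accumulated per replica for each optimizer step.
micro_batches_per_step = global_batch_size // (micro_batch_size * data_parallel_size)

# Samples consumed once 201 global batches have been processed.
samples_after_201_batches = 201 * global_batch_size

# Optimizer steps needed for one full pass over the training split.
steps_per_epoch = train_records // global_batch_size

# Validation batches per check, assuming the last partial batch is kept.
val_batches = math.ceil(val_records / global_batch_size)

# Fraction of one epoch covered by max_steps.
epochs_covered = max_steps * global_batch_size / train_records

print(micro_batches_per_step)      # 16, as in "setting number of micro-batches to constant 16"
print(samples_after_201_batches)   # 6432, as in "consumed_samples=6432.0"
print(steps_per_epoch)             # 1005
print(val_batches)                 # 77
print(round(epochs_covered, 3))    # 0.995
```

Since 1000 steps cover slightly less than one epoch of 1005 steps, the whole run stays within epoch 0 in the progress bar.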

> `NOTE:` In the block of code below, pass the paths to your train, test and validation data files as well as path to the .nemo model.

In [36]:
print(DATA_DIR)

/root/exp1/data/curated/final


In [37]:
%%bash

# Set paths to the model, train, validation and test sets.
MODEL="/root/exp1/model/llama3_1_8b_instruct.nemo" #FIXME

TRAIN_DS="[/root/exp1/data/curated/final/law-qa-train_preprocessed.jsonl]" #FIXME
VALID_DS="[/root/exp1/data/curated/final/law-qa-val_preprocessed.jsonl]"   #FIXME
TEST_DS="[/root/exp1/data/curated/final/law-qa-test_preprocessed.jsonl]"   #FIXME
TEST_NAMES="[law]"

SCHEME="lora"
TP_SIZE=1
PP_SIZE=1

OUTPUT_DIR="/root/exp1/results/Meta-llama3.1-8B-Instruct-titlegen"
rm -r $OUTPUT_DIR

torchrun --nproc_per_node=2 \
/opt/NeMo/examples/nlp/language_modeling/tuning/megatron_gpt_finetuning.py \
    exp_manager.exp_dir=${OUTPUT_DIR} \
    exp_manager.explicit_log_dir=${OUTPUT_DIR} \
    trainer.devices=2 \
    trainer.num_nodes=1 \
    trainer.precision=bf16-mixed \
    trainer.val_check_interval=0.2 \
    trainer.max_steps=1000 \
    model.megatron_amp_O2=True \
    ++model.mcore_gpt=True \
    model.tensor_model_parallel_size=${TP_SIZE} \
    model.pipeline_model_parallel_size=${PP_SIZE} \
    model.micro_batch_size=1 \
    model.global_batch_size=32 \
    model.restore_from_path=${MODEL} \
    model.data.train_ds.file_names=${TRAIN_DS} \
    model.data.train_ds.concat_sampling_probabilities=[1.0] \
    model.data.validation_ds.file_names=${VALID_DS} \
    model.peft.peft_scheme=${SCHEME}

rm: cannot remove '/root/exp1/results/Meta-llama3.1-8B-Instruct-titlegen': No such file or directory
`zarr` distributed checkpoint backend is deprecated. Please switch to PyTorch Distributed format (`torch_dist`).
    See https://hydra.cc/docs/1.2/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
      ret = run_job(
    


[NeMo I 2024-10-09 10:09:03 megatron_gpt_finetuning:56] 
    
    ************** Experiment configuration ***********
[NeMo I 2024-10-09 10:09:03 megatron_gpt_finetuning:57] 
    name: megatron_gpt_peft_${model.peft.peft_scheme}_tuning
    trainer:
      devices: 2
      accelerator: gpu
      num_nodes: 1
      precision: bf16-mixed
      logger: false
      enable_checkpointing: false
      use_distributed_sampler: false
      max_epochs: 9999
      max_steps: 1000
      log_every_n_steps: 10
      val_check_interval: 0.2
      gradient_clip_val: 1.0
    exp_manager:
      explicit_log_dir: /root/exp1/results/Meta-llama3.1-8B-Instruct-titlegen
      exp_dir: /root/exp1/results/Meta-llama3.1-8B-Instruct-titlegen
      name: ${name}
      create_wandb_logger: false
      wandb_logger_kwargs:
        project: null
        name: null
      resume_if_exists: true
      resume_ignore_no_checkpoint: true
      create_checkpoint_callback: true
      checkpoint_callback_params:
        monito

[NeMo W 2024-10-09 10:09:03 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/pytorch_lightning/_graveyard/precision.py:49: The `MixedPrecisionPlugin` is deprecated. Use `pytorch_lightning.plugins.precision.MixedPrecision` instead.
    
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs


[NeMo I 2024-10-09 10:09:03 exp_manager:396] ExpManager schema
[NeMo I 2024-10-09 10:09:03 exp_manager:397] {'explicit_log_dir': None, 'exp_dir': None, 'name': None, 'version': None, 'use_datetime_version': True, 'resume_if_exists': False, 'resume_past_end': False, 'resume_ignore_no_checkpoint': False, 'resume_from_checkpoint': None, 'create_tensorboard_logger': True, 'summary_writer_kwargs': None, 'create_wandb_logger': False, 'wandb_logger_kwargs': None, 'create_mlflow_logger': False, 'mlflow_logger_kwargs': {'experiment_name': None, 'tracking_uri': None, 'tags': None, 'save_dir': './mlruns', 'prefix': '', 'artifact_location': None, 'run_id': None, 'log_model': False}, 'create_dllogger_logger': False, 'dllogger_logger_kwargs': {'verbose': False, 'stdout': False, 'json_file': './dllogger.json'}, 'create_clearml_logger': False, 'clearml_logger_kwargs': {'project': None, 'task': None, 'connect_pytorch': False, 'model_name': None, 'tags': None, 'log_model': False, 'log_cfg': False, 'log_

[NeMo E 2024-10-09 10:09:03 exp_manager:830] exp_manager received explicit_log_dir: /root/exp1/results/Meta-llama3.1-8B-Instruct-titlegen and at least one of exp_dir: /root/exp1/results/Meta-llama3.1-8B-Instruct-titlegen, or version: None. Please note that exp_dir, name, and version will be ignored.
[NeMo W 2024-10-09 10:09:03 exp_manager:835] Exp_manager is logging to /root/exp1/results/Meta-llama3.1-8B-Instruct-titlegen, but it already exists.
[NeMo W 2024-10-09 10:09:03 exp_manager:757] There were no checkpoints found in checkpoint_dir or no checkpoint folder at checkpoint_dir :/root/exp1/results/Meta-llama3.1-8B-Instruct-titlegen/checkpoints. Training from scratch.


[NeMo I 2024-10-09 10:09:03 exp_manager:455] Experiments will be logged at /root/exp1/results/Meta-llama3.1-8B-Instruct-titlegen
[NeMo I 2024-10-09 10:09:03 exp_manager:983] TensorboardLogger has been set up


[NeMo W 2024-10-09 10:09:03 exp_manager:1111] The checkpoint callback was told to monitor a validation value and trainer's max_steps was set to 1000. Please ensure that max_steps will run for at least 1 epochs to ensure that checkpointing will not error out.
[NeMo W 2024-10-09 10:09:42 megatron_base_model:1182] The model: MegatronGPTSFTModel() does not have field.name: context_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-09 10:09:42 megatron_base_model:1182] The model: MegatronGPTSFTModel() does not have field.name: expert_model_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-09 10:09:42 megatron_base_model:1182] The model: MegatronGPTSFTModel() does not have field.name: moe_extended_tp in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-09 10:09:42 megatron_base_model:1182] The model: MegatronGPTSFTModel() d

[NeMo I 2024-10-09 10:09:42 megatron_init:269] Rank 0 has data parallel group : [0, 1]
[NeMo I 2024-10-09 10:09:42 megatron_init:275] Rank 0 has combined group of data parallel and context parallel : [0, 1]
[NeMo I 2024-10-09 10:09:42 megatron_init:280] All data parallel group ranks with context parallel combined: [[0, 1]]
[NeMo I 2024-10-09 10:09:42 megatron_init:283] Ranks 0 has data parallel rank: 0
[NeMo I 2024-10-09 10:09:42 megatron_init:291] Rank 0 has context parallel group: [0]
[NeMo I 2024-10-09 10:09:42 megatron_init:294] All context parallel group ranks: [[0], [1]]
[NeMo I 2024-10-09 10:09:42 megatron_init:295] Ranks 0 has context parallel rank: 0
[NeMo I 2024-10-09 10:09:42 megatron_init:302] Rank 0 has model parallel group: [0]
[NeMo I 2024-10-09 10:09:42 megatron_init:303] All model parallel group ranks: [[0], [1]]
[NeMo I 2024-10-09 10:09:42 megatron_init:312] Rank 0 has tensor model parallel group: [0]
[NeMo I 2024-10-09 10:09:42 megatron_init:316] All tensor model par

[NeMo W 2024-10-09 10:09:42 megatron_base_model:1182] The model: MegatronGPTSFTModel() does not have field.name: expert_model_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-09 10:09:42 megatron_base_model:1182] The model: MegatronGPTSFTModel() does not have field.name: moe_extended_tp in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-09 10:09:42 megatron_base_model:1182] The model: MegatronGPTSFTModel() does not have field.name: finalize_model_grads_func in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-09 10:09:42 megatron_base_model:1182] The model: MegatronGPTSFTModel() does not have field.name: deterministic_mode in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-09 10:09:42 megatron_base_model:1182] The model: MegatronGPTSFTModel() does not have field.name: use_te_rng_trac

[NeMo I 2024-10-09 10:09:42 tokenizer_utils:183] Getting HuggingFace AutoTokenizer with pretrained_model_name: meta-llama/Meta-Llama-3-8B


    
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


[NeMo I 2024-10-09 10:09:42 megatron_base_model:595] Padded vocab_size: 128256, original vocab_size: 128256, dummy tokens: 0.


[NeMo W 2024-10-09 10:09:42 megatron_base_model:1182] The model: MegatronGPTSFTModel() does not have field.name: context_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-09 10:09:42 megatron_base_model:1182] The model: MegatronGPTSFTModel() does not have field.name: expert_model_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-09 10:09:42 megatron_base_model:1182] The model: MegatronGPTSFTModel() does not have field.name: moe_extended_tp in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-09 10:09:42 megatron_base_model:1182] The model: MegatronGPTSFTModel() does not have field.name: finalize_model_grads_func in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-09 10:09:42 megatron_base_model:1182] The model: MegatronGPTSFTModel() does not have field.name: deterministi

Loading distributed checkpoint with TensorStoreLoadShardedStrategy
[NeMo I 2024-10-09 10:11:01 nlp_overrides:1346] Model MegatronGPTSFTModel was successfully restored from /root/exp1/model/llama3_1_8b_instruct.nemo.
[NeMo I 2024-10-09 10:11:01 megatron_gpt_finetuning:72] Adding adapter weights to the model for PEFT
[NeMo I 2024-10-09 10:11:01 nlp_adapter_mixins:240] Before adding PEFT params:
      | Name  | Type          | Params | Mode 
    ------------------------------------------------
    0 | model | Float16Module | 8.0 B  | train
    ------------------------------------------------
    0         Trainable params
    8.0 B     Non-trainable params
    8.0 B     Total params
    32,121.045Total estimated model params size (MB)
[NeMo I 2024-10-09 10:11:04 nlp_adapter_mixins:245] After adding PEFT params:
      | Name  | Type          | Params | Mode 
    ------------------------------------------------
    0 | model | Float16Module | 8.0 B  | train
    -----------------------------

[NeMo W 2024-10-09 10:11:04 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/configuration_validator.py:161: You have overridden `MegatronGPTSFTModel.configure_sharded_model` which is deprecated. Please override the `configure_model` hook instead. Instantiation with the newer hook will be created on the device right away and have the right data type depending on the precision setting in the Trainer.
    
[NeMo W 2024-10-09 10:11:04 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/configuration_validator.py:143: You are using the `dataloader_iter` step flavor. If you consume the iterator more than once per step, the `batch_idx` argument in any hook that takes it will not match with the batch index of the last batch consumed. This might have unforeseen effects on callbacks or code that expects to get the correct index. This will also not work well with gradient accumulation. This feature is very experimental and subjec

[NeMo I 2024-10-09 10:11:04 megatron_gpt_sft_model:801] Building GPT SFT validation datasets.
[NeMo I 2024-10-09 10:11:04 text_memmap_dataset:116] Building data files
[NeMo I 2024-10-09 10:11:04 text_memmap_dataset:525] Processing 1 data files using 2 workers


huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


[NeMo I 2024-10-09 10:11:04 text_memmap_dataset:495] Building indexing for fn = /root/exp1/data/curated/final/law-qa-val_preprocessed.jsonl
[NeMo I 2024-10-09 10:11:04 text_memmap_dataset:507] Saving idx file = /root/exp1/data/curated/final/law-qa-val_preprocessed.jsonl.idx.npy
[NeMo I 2024-10-09 10:11:04 text_memmap_dataset:509] Saving metadata file = /root/exp1/data/curated/final/law-qa-val_preprocessed.jsonl.idx.info
[NeMo I 2024-10-09 10:11:04 text_memmap_dataset:535] Time building 1 / 1 mem-mapped files: 0:00:00.068018
[NeMo I 2024-10-09 10:11:04 text_memmap_dataset:525] Processing 1 data files using 2 workers




[NeMo I 2024-10-09 10:11:04 text_memmap_dataset:535] Time building 0 / 1 mem-mapped files: 0:00:00.048851
[NeMo I 2024-10-09 10:11:04 text_memmap_dataset:158] Loading data files
[NeMo I 2024-10-09 10:11:04 text_memmap_dataset:249] Loading /root/exp1/data/curated/final/law-qa-val_preprocessed.jsonl
[NeMo I 2024-10-09 10:11:04 text_memmap_dataset:161] Time loading 1 mem-mapped files: 0:00:00.001096
[NeMo I 2024-10-09 10:11:04 text_memmap_dataset:165] Computing global indices
[NeMo I 2024-10-09 10:11:04 megatron_gpt_sft_model:805] Length of val dataset: 2434
[NeMo I 2024-10-09 10:11:04 megatron_gpt_sft_model:812] Building GPT SFT traing datasets.
[NeMo I 2024-10-09 10:11:04 text_memmap_dataset:116] Building data files
[NeMo I 2024-10-09 10:11:04 text_memmap_dataset:525] Processing 1 data files using 2 workers




[NeMo I 2024-10-09 10:11:04 text_memmap_dataset:495] Building indexing for fn = /root/exp1/data/curated/final/law-qa-train_preprocessed.jsonl
[NeMo I 2024-10-09 10:11:04 text_memmap_dataset:507] Saving idx file = /root/exp1/data/curated/final/law-qa-train_preprocessed.jsonl.idx.npy
[NeMo I 2024-10-09 10:11:04 text_memmap_dataset:509] Saving metadata file = /root/exp1/data/curated/final/law-qa-train_preprocessed.jsonl.idx.info
[NeMo I 2024-10-09 10:11:04 text_memmap_dataset:535] Time building 1 / 1 mem-mapped files: 0:00:00.075523
[NeMo I 2024-10-09 10:11:04 text_memmap_dataset:525] Processing 1 data files using 2 workers




[NeMo I 2024-10-09 10:11:04 text_memmap_dataset:535] Time building 0 / 1 mem-mapped files: 0:00:00.133688
[NeMo I 2024-10-09 10:11:04 text_memmap_dataset:158] Loading data files
[NeMo I 2024-10-09 10:11:04 text_memmap_dataset:249] Loading /root/exp1/data/curated/final/law-qa-train_preprocessed.jsonl
[NeMo I 2024-10-09 10:11:04 text_memmap_dataset:161] Time loading 1 mem-mapped files: 0:00:00.000959
[NeMo I 2024-10-09 10:11:04 text_memmap_dataset:165] Computing global indices


      counts = torch.cuda.LongTensor([1])
    


make: Entering directory '/opt/NeMo/nemo/collections/nlp/data/language_modeling/megatron'
make: Nothing to be done for 'default'.
make: Leaving directory '/opt/NeMo/nemo/collections/nlp/data/language_modeling/megatron'
> building indices for blendable datasets ...
 > sample ratios:
   dataset 0, input: 1, achieved: 1
[NeMo I 2024-10-09 10:11:06 blendable_dataset:67] > elapsed time for building blendable dataset indices: 0.13 (sec)
[NeMo I 2024-10-09 10:11:06 megatron_gpt_sft_model:814] Length of train dataset: 32160
[NeMo I 2024-10-09 10:11:06 megatron_gpt_sft_model:819] Building dataloader with consumed samples: 0
[NeMo I 2024-10-09 10:11:06 megatron_gpt_sft_model:819] Building dataloader with consumed samples: 0


LOCAL_RANK: 1 - CUDA_VISIBLE_DEVICES: [0,1]
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]
[NeMo W 2024-10-09 10:11:06 megatron_base_model:1223] Ignoring `trainer.max_epochs` when computing `max_steps` because `trainer.max_steps` is already set to 1000.


[NeMo I 2024-10-09 10:11:06 adapter_mixins:495] Unfrozen adapter : lora_kqv_adapter


  | Name  | Type          | Params | Mode 
------------------------------------------------
0 | model | Float16Module | 8.0 B  | train
------------------------------------------------
10.5 M    Trainable params
8.0 B     Non-trainable params
8.0 B     Total params
32,162.988Total estimated model params size (MB)
[NeMo W 2024-10-09 10:11:06 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/data_connector.py:424: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=126` in the `DataLoader` to improve performance.
    
[NeMo W 2024-10-09 10:11:06 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/pytorch_lightning/loops/utilities.py:149: Found `dataloader_iter` argument in the `validation_step`. Note that the support for this signature is experimental and the behavior is subject to change.
    


Sanity Checking: |          | 0/? [00:00<?, ?it/s]
[NeMo I 2024-10-09 10:11:06 num_microbatches_calculator:119] setting number of micro-batches to constant 16
Sanity Checking DataLoader 0: 100%|██████████| 2/2 [00:04<00:00,  0.42it/s]
[NeMo I 2024-10-09 10:11:11 num_microbatches_calculator:119] setting number of micro-batches to constant 16


[NeMo W 2024-10-09 10:11:11 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/logger_connector/result.py:439: It is recommended to use `self.log('val_loss', ..., sync_dist=True)` when logging on epoch level in distributed setting to accumulate the metric across devices.
    
[NeMo W 2024-10-09 10:11:11 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/logger_connector/result.py:439: It is recommended to use `self.log('validation_loss_dataloader0', ..., sync_dist=True)` when logging on epoch level in distributed setting to accumulate the metric across devices.
    
[NeMo W 2024-10-09 10:11:11 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/logger_connector/result.py:439: It is recommended to use `self.log('validation_loss', ..., sync_dist=True)` when logging on epoch level in distributed setting to accumulate the metric across devices.
    
[NeMo W 202

Epoch 0: :  20%|██        | 201/1000 [06:52<27:20, reduced_train_loss=1.950, global_step=200.0, consumed_samples=6432.0, train_step_timing in s=2.180]
Validation: |          | 0/? [00:00<?, ?it/s]
[NeMo I 2024-10-09 10:18:04 num_microbatches_calculator:119] setting number of micro-batches to constant 16

Validation:   0%|          | 0/77 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/77 [00:00<?, ?it/s]
Validation DataLoader 0:   1%|▏         | 1/77 [00:01<01:22,  0.92it/s]
Validation DataLoader 0:   3%|▎         | 2/77 [00:02<01:21,  0.92it/s]
Validation DataLoader 0:   4%|▍         | 3/77 [00:03<01:20,  0.92it/s]
Validation DataLoader 0:   5%|▌         | 4/77 [00:05<01:38,  0.74it/s]
Validation DataLoader 0:   6%|▋         | 5/77 [00:06<01:33,  0.77it/s]
Validation DataLoader 0:   8%|▊         | 6/77 [00:09<01:48,  0.65it/s]
Validation DataLoader 0:   9%|▉         | 7/77 [00:10<01:47,  0.65it/s]
Validation DataLoader 0:  10%|█         | 8/77

[rank: 0] Metric val_loss improved. New best score: 1.666
[rank: 1] Metric val_loss improved. New best score: 1.666
Epoch 0, global step 201: 'validation_loss' reached 1.66580 (best 1.66580), saving model to '/root/exp1/results/Meta-llama3.1-8B-Instruct-titlegen/checkpoints/megatron_gpt_peft_lora_tuning--validation_loss=1.666-step=201-consumed_samples=6432.0.ckpt' as top 1
[NeMo W 2024-10-09 10:19:46 nlp_overrides:609] DistributedCheckpointIO configured but should not be used. Reverting back to TorchCheckpointIO


Epoch 0: :  40%|████      | 402/1000 [15:34<23:10, reduced_train_loss=1.540, global_step=401.0, consumed_samples=12864.0, train_step_timing in s=2.460, val_loss=1.670]
Validation: |          | 0/? [00:00<?, ?it/s]
[NeMo I 2024-10-09 10:26:46 num_microbatches_calculator:119] setting number of micro-batches to constant 16

Validation:   0%|          | 0/77 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/77 [00:00<?, ?it/s]
Validation DataLoader 0:   1%|▏         | 1/77 [00:01<01:19,  0.96it/s]
Validation DataLoader 0:   3%|▎         | 2/77 [00:02<01:17,  0.96it/s]
Validation DataLoader 0:   4%|▍         | 3/77 [00:03<01:17,  0.95it/s]
Validation DataLoader 0:   5%|▌         | 4/77 [00:05<01:35,  0.76it/s]
Validation DataLoader 0:   6%|▋         | 5/77 [00:06<01:31,  0.79it/s]
Validation DataLoader 0:   8%|▊         | 6/77 [00:08<01:46,  0.67it/s]
Validation DataLoader 0:   9%|▉         | 7/77 [00:10<01:44,  0.67it/s]
Validation DataLoader 0:  10%

[rank: 0] Metric val_loss improved by 0.013 >= min_delta = 0.001. New best score: 1.653
[rank: 1] Metric val_loss improved by 0.013 >= min_delta = 0.001. New best score: 1.653
Epoch 0, global step 402: 'validation_loss' reached 1.65309 (best 1.65309), saving model to '/root/exp1/results/Meta-llama3.1-8B-Instruct-titlegen/checkpoints/megatron_gpt_peft_lora_tuning--validation_loss=1.653-step=402-consumed_samples=12864.0.ckpt' as top 1


Epoch 0: :  40%|████      | 402/1000 [17:16<25:41, reduced_train_loss=1.540, global_step=401.0, consumed_samples=12864.0, train_step_timing in s=2.460, val_loss=1.650][NeMo I 2024-10-09 10:28:28 nlp_overrides:593] Removing checkpoint: /root/exp1/results/Meta-llama3.1-8B-Instruct-titlegen/checkpoints/megatron_gpt_peft_lora_tuning--validation_loss=1.666-step=201-consumed_samples=6432.0.ckpt
[NeMo I 2024-10-09 10:28:28 nlp_overrides:593] Removing checkpoint: /root/exp1/results/Meta-llama3.1-8B-Instruct-titlegen/checkpoints/megatron_gpt_peft_lora_tuning--validation_loss=1.666-step=201-consumed_samples=6432.0-last.ckpt
Epoch 0: :  60%|██████    | 603/1000 [24:21<16:02, reduced_train_loss=1.540, global_step=602.0, consumed_samples=19296.0, train_step_timing in s=2.060, val_loss=1.650]
Validation: |          | 0/? [00:00<?, ?it/s]
[NeMo I 2024-10-09 10:35:33 num_microbatches_calculator:119] setting number of micro-batches to constant 16

Validation:   0%|          | 0/77 [00:00<?, ?it/s]

[rank: 0] Metric val_loss improved by 0.003 >= min_delta = 0.001. New best score: 1.650
[rank: 1] Metric val_loss improved by 0.003 >= min_delta = 0.001. New best score: 1.650
Epoch 0, global step 603: 'validation_loss' reached 1.65015 (best 1.65015), saving model to '/root/exp1/results/Meta-llama3.1-8B-Instruct-titlegen/checkpoints/megatron_gpt_peft_lora_tuning--validation_loss=1.650-step=603-consumed_samples=19296.0.ckpt' as top 1


Epoch 0: :  60%|██████    | 603/1000 [26:03<17:09, reduced_train_loss=1.540, global_step=602.0, consumed_samples=19296.0, train_step_timing in s=2.060, val_loss=1.650][NeMo I 2024-10-09 10:37:15 nlp_overrides:593] Removing checkpoint: /root/exp1/results/Meta-llama3.1-8B-Instruct-titlegen/checkpoints/megatron_gpt_peft_lora_tuning--validation_loss=1.653-step=402-consumed_samples=12864.0.ckpt
[NeMo I 2024-10-09 10:37:15 nlp_overrides:593] Removing checkpoint: /root/exp1/results/Meta-llama3.1-8B-Instruct-titlegen/checkpoints/megatron_gpt_peft_lora_tuning--validation_loss=1.653-step=402-consumed_samples=12864.0-last.ckpt
Epoch 0: :  80%|████████  | 804/1000 [33:03<08:03, reduced_train_loss=1.380, global_step=803.0, consumed_samples=25728.0, train_step_timing in s=2.280, val_loss=1.650]
Validation: |          | 0/? [00:00<?, ?it/s][A[NeMo I 2024-10-09 10:44:14 num_microbatches_calculator:119] setting number of micro-batches to constant 16

Validation:   0%|          | 0/77 [00:00<?, ?it/s]

Epoch 0, global step 804: 'validation_loss' was not in top 1


Epoch 0: :  80%|████████  | 804/1000 [34:44<08:28, reduced_train_loss=1.380, global_step=803.0, consumed_samples=25728.0, train_step_timing in s=2.280, val_loss=1.650][NeMo I 2024-10-09 10:45:56 nlp_overrides:593] Removing checkpoint: /root/exp1/results/Meta-llama3.1-8B-Instruct-titlegen/checkpoints/megatron_gpt_peft_lora_tuning--validation_loss=1.650-step=603-consumed_samples=19296.0-last.ckpt
Epoch 0: :  98%|█████████▊| 980/1000 [40:47<00:49, reduced_train_loss=1.660, global_step=979.0, consumed_samples=31360.0, train_step_timing in s=2.040, val_loss=1.650]

`Trainer.fit` stopped: `max_steps=1000` reached.


Epoch 0: : 100%|██████████| 1000/1000 [41:29<00:00, reduced_train_loss=1.590, global_step=999.0, consumed_samples=3.2e+4, train_step_timing in s=2.040, val_loss=1.650]
[NeMo I 2024-10-09 10:52:41 nlp_overrides:593] Removing checkpoint: /root/exp1/results/Meta-llama3.1-8B-Instruct-titlegen/checkpoints/megatron_gpt_peft_lora_tuning--validation_loss=1.653-step=804-consumed_samples=25728.0-last.ckpt


Restoring states from the checkpoint path at /root/exp1/results/Meta-llama3.1-8B-Instruct-titlegen/checkpoints/megatron_gpt_peft_lora_tuning--validation_loss=1.650-step=603-consumed_samples=19296.0.ckpt
Restored all states from the checkpoint at /root/exp1/results/Meta-llama3.1-8B-Instruct-titlegen/checkpoints/megatron_gpt_peft_lora_tuning--validation_loss=1.650-step=603-consumed_samples=19296.0.ckpt
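
The training log above contains enough information to sanity-check the batch settings of this run. The sketch below recomputes the samples consumed per optimizer step from the `(step, consumed_samples)` pairs that appear in the checkpoint messages. The relation `global_batch = micro_batch * data_parallel * num_microbatches` is the usual Megatron-style accounting and is stated here as an assumption, not read from the NeMo source.

```python
# (step, consumed_samples) pairs copied from the checkpoint messages in the log above.
logged = [(201, 6432), (402, 12864), (603, 19296), (804, 25728)]

# Samples consumed per optimizer step, i.e. the effective global batch size.
per_step = {samples // step for step, samples in logged}
assert len(per_step) == 1, "global batch size should be constant across the run"
global_batch_size = per_step.pop()

# The log reports "setting number of micro-batches to constant 16".
num_microbatches = 16

# Assumed relation: global_batch = micro_batch * data_parallel * num_microbatches.
micro_batch_times_dp = global_batch_size // num_microbatches

print(global_batch_size)         # samples per optimizer step
print(micro_batch_times_dp)      # micro_batch_size * data_parallel_size
print(global_batch_size * 1000)  # samples consumed at max_steps=1000
```

The last figure can be compared with the `consumed_samples` value that the progress bar reports when `max_steps=1000` is reached.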


This creates a LoRA adapter: a file named `megatron_gpt_peft_lora_tuning.nemo` in `{YOUR_WORKING_DIR}/results/Meta-llama3.1-8B-Instruct-titlegen/checkpoints/`. We'll use it for inference in the next step.
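
A `.nemo` file is a tar archive, so its contents can be listed with the Python standard library. The following is a minimal sketch; the helper name `list_nemo_contents` is hypothetical, and the member names inside the archive depend on the NeMo version.

```python
import tarfile


def list_nemo_contents(nemo_path: str) -> list[str]:
    """Return the member names stored in a .nemo (tar) archive."""
    # "r:*" lets tarfile detect whether the archive is compressed.
    with tarfile.open(nemo_path, "r:*") as archive:
        return archive.getnames()
```

Calling it on the adapter file in the checkpoints directory should show the adapter weights and the model configuration that were packaged together.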

To further configure the run above:

* **A different PEFT technique**: The `model.peft.peft_scheme` parameter determines the technique used. This run used LoRA, but NeMo Framework also supports other techniques such as P-tuning, Adapters, and IA3. For more information, refer to the [PEFT support matrix](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/nlp/nemo_megatron/peft/landing_page.html). For example, to use P-tuning, set:

```bash
model.peft.peft_scheme="ptuning" # instead of "lora"
```
You can override many other settings (such as the learning rate and the adapter dimension) when running the script. The full set of configuration options is available in the [NeMo Framework GitHub repository](https://github.com/NVIDIA/NeMo/blob/main/examples/nlp/language_modeling/tuning/conf/megatron_gpt_finetuning_config.yaml).
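
Because the script takes Hydra-style `key=value` overrides, a set of changes can be kept in a dictionary and rendered into command-line arguments. This is only a sketch: `to_overrides` is a hypothetical helper, and the keys `model.optim.lr` and `model.peft.lora_tuning.adapter_dim` are assumptions that should be checked against the linked configuration file before use.

```python
def to_overrides(settings: dict) -> list[str]:
    """Render a dict of settings as Hydra-style key=value arguments."""
    return [f"{key}={value}" for key, value in settings.items()]


overrides = to_overrides({
    "model.peft.peft_scheme": "lora",
    # Assumed key names; verify them in megatron_gpt_finetuning_config.yaml.
    "model.optim.lr": 1e-4,
    "model.peft.lora_tuning.adapter_dim": 16,
})
print(" ".join(overrides))
```

The resulting strings can be appended to the fine-tuning command in place of hand-typed overrides.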

### Step 3: Inference with NeMo Framework

You can also run text generation within the framework using a Python script. Note that this is intended for testing and validation; it is not a full-fledged deployment solution like NVIDIA NIM.

In [38]:
# Check that the LoRA model file exists
!ls -l {YOUR_WORKING_DIR}/results/Meta-llama3.1-8B-Instruct-titlegen/checkpoints

total 307500
-rw-r--r-- 1 root root 146928238 Oct  9 10:37 'megatron_gpt_peft_lora_tuning--validation_loss=1.650-step=603-consumed_samples=19296.0.ckpt'
-rw-r--r-- 1 root root 146928238 Oct  9 10:52 'megatron_gpt_peft_lora_tuning--validation_loss=1.653-step=1000-consumed_samples=32000.0-last.ckpt'
-rw-r--r-- 1 root root  21012480 Oct  9 10:52  megatron_gpt_peft_lora_tuning.nemo
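
The checkpoint file names encode the validation loss, the step, and the number of consumed samples. The sketch below extracts them, assuming the naming pattern seen in the listing above; the regular expression and the helper name `parse_checkpoint_name` are illustrative and not part of NeMo.

```python
import re

# Pattern inferred from the file names in the listing above.
CKPT_PATTERN = re.compile(
    r"validation_loss=(?P<loss>\d+\.\d+)-step=(?P<step>\d+)"
    r"-consumed_samples=(?P<samples>\d+(?:\.\d+)?)"
)


def parse_checkpoint_name(name: str):
    """Return (validation_loss, step, consumed_samples), or None if no match."""
    match = CKPT_PATTERN.search(name)
    if match is None:
        return None
    return float(match["loss"]), int(match["step"]), int(float(match["samples"]))


names = [
    "megatron_gpt_peft_lora_tuning--validation_loss=1.650-step=603-consumed_samples=19296.0.ckpt",
    "megatron_gpt_peft_lora_tuning--validation_loss=1.653-step=1000-consumed_samples=32000.0-last.ckpt",
    "megatron_gpt_peft_lora_tuning.nemo",
]
parsed = [p for p in map(parse_checkpoint_name, names) if p is not None]
best = min(parsed)  # tuples sort by validation loss first
print(best)
```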


In the code snippet below, the following configurations are worth noting:

1. `model.restore_from_path`: the path to the base model's `.nemo` file (here, `llama3_1_8b_instruct.nemo`).
2. `model.peft.restore_from_path`: the path to the PEFT checkpoint created by the fine-tuning run in the previous step.
3. `model.data.test_ds.file_names`: the path to the preprocessed test file.

In [41]:
# Create a smaller test subset for a quick eval demonstration.

!head -n 128 {DATA_DIR}/law-qa-test_preprocessed.jsonl > {DATA_DIR}/law-qa-test_preprocessed-n128.jsonl
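
The same subset can be produced in Python, which is convenient if you want a different size or are running outside a notebook. A minimal sketch using only the standard library; `write_head` is a hypothetical helper name.

```python
from itertools import islice


def write_head(src_path: str, dst_path: str, n: int = 128) -> int:
    """Copy the first n lines of src_path to dst_path; return lines written."""
    written = 0
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in islice(src, n):
            dst.write(line)
            written += 1
    return written
```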

In [42]:
DATA_DIR

'/root/exp1/data/curated/final'

In [44]:
YOUR_WORKING_DIR

'/root/exp1'

If you have made any changes in model or experiment paths, please ensure they are configured correctly below.
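
A quick pre-flight check can catch a wrong path before the model load begins. The sketch below is illustrative; `missing_paths` is a hypothetical helper, and the paths are the ones used in this run.

```python
import os


def missing_paths(paths: dict) -> list[str]:
    """Return the labels of entries whose path does not exist on disk."""
    return [label for label, path in paths.items() if not os.path.exists(path)]


required = {
    "base model": "/root/exp1/model/llama3_1_8b_instruct.nemo",
    "LoRA adapter": "/root/exp1/results/Meta-llama3.1-8B-Instruct-titlegen/checkpoints/megatron_gpt_peft_lora_tuning.nemo",
    "test set": "/root/exp1/data/curated/final/law-qa-test_preprocessed-n128.jsonl",
}
print(missing_paths(required))  # an empty list means every path was found
```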

In [45]:
%%bash
MODEL="/root/exp1/model/llama3_1_8b_instruct.nemo"

TEST_DS="[/root/exp1/data/curated/final/law-qa-test_preprocessed-n128.jsonl]" # Smaller test split
# TEST_DS="[/root/exp1/data/curated/final/law-qa-test_preprocessed.jsonl]" # Full test set
TEST_NAMES="[law]"

TP_SIZE=1
PP_SIZE=1

# This is where your LoRA checkpoint was saved
PATH_TO_TRAINED_MODEL="/root/exp1/results/Meta-llama3.1-8B-Instruct-titlegen/checkpoints/megatron_gpt_peft_lora_tuning.nemo"

# The generation run will save the generated outputs over the test dataset in a file prefixed like so
OUTPUT_PREFIX="law_titlegen_lora"

python /opt/NeMo/examples/nlp/language_modeling/tuning/megatron_gpt_generate.py \
    model.restore_from_path=${MODEL} \
    model.peft.restore_from_path=${PATH_TO_TRAINED_MODEL} \
    trainer.devices=1 \
    trainer.num_nodes=1 \
    model.data.test_ds.file_names=${TEST_DS} \
    model.data.test_ds.names=${TEST_NAMES} \
    model.data.test_ds.global_batch_size=32 \
    model.data.test_ds.micro_batch_size=1 \
    model.data.test_ds.tokens_to_generate=25 \
    model.tensor_model_parallel_size=${TP_SIZE} \
    model.pipeline_model_parallel_size=${PP_SIZE} \
    inference.greedy=True  \
    model.data.test_ds.output_file_path_prefix=${OUTPUT_PREFIX} \
    model.data.test_ds.write_predictions_to_file=True \
    model.data.test_ds.truncation_field="null" \
    model.data.test_ds.add_bos=False \
    model.data.test_ds.add_eos=True \
    model.data.test_ds.add_sep=False \
    model.data.test_ds.label_key="output" \
    model.data.test_ds.prompt_template="\{input\}\ \{output\}"

`zarr` distributed checkpoint backend is deprecated. Please switch to PyTorch Distributed format (`torch_dist`).
    See https://hydra.cc/docs/1.2/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
      ret = run_job(
    


[NeMo I 2024-10-09 14:26:15 megatron_gpt_generate:125] 
    
    ************** Experiment configuration ***********
[NeMo I 2024-10-09 14:26:15 megatron_gpt_generate:126] 
    name: megatron_gpt_peft_${model.peft.peft_scheme}_tuning
    trainer:
      devices: 1
      accelerator: gpu
      num_nodes: 1
      precision: 16
      logger: false
      enable_checkpointing: false
      use_distributed_sampler: false
      max_epochs: 9999
      max_steps: 20000
      log_every_n_steps: 10
      val_check_interval: 200
      gradient_clip_val: 1.0
    exp_manager:
      explicit_log_dir: null
      exp_dir: null
      name: ${name}
      create_wandb_logger: false
      wandb_logger_kwargs:
        project: null
        name: null
      resume_if_exists: true
      resume_ignore_no_checkpoint: true
      create_checkpoint_callback: true
      checkpoint_callback_params:
        monitor: validation_${model.data.test_ds.metric.name}
        save_top_k: 1
        mode: max
        save_nemo_o

[NeMo W 2024-10-09 14:26:15 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/pytorch_lightning/_graveyard/precision.py:49: The `MixedPrecisionPlugin` is deprecated. Use `pytorch_lightning.plugins.precision.MixedPrecision` instead.
    
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
[NeMo W 2024-10-09 14:26:41 megatron_base_model:1182] The model: MegatronGPTSFTModel() does not have field.name: context_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-09 14:26:41 megatron_base_model:1182] The model: MegatronGPTSFTModel() does not have field.name: expert_model_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-09 14:26:41 megatron_base_model:1182] The model: MegatronGPTSFTModel() does not have field.name: moe_extended_tp in its cfg. Add this key to cfg or config_mapping to make to make it 

[NeMo I 2024-10-09 14:26:41 megatron_init:269] Rank 0 has data parallel group : [0]
[NeMo I 2024-10-09 14:26:41 megatron_init:275] Rank 0 has combined group of data parallel and context parallel : [0]
[NeMo I 2024-10-09 14:26:41 megatron_init:280] All data parallel group ranks with context parallel combined: [[0]]
[NeMo I 2024-10-09 14:26:41 megatron_init:283] Ranks 0 has data parallel rank: 0
[NeMo I 2024-10-09 14:26:41 megatron_init:291] Rank 0 has context parallel group: [0]
[NeMo I 2024-10-09 14:26:41 megatron_init:294] All context parallel group ranks: [[0]]
[NeMo I 2024-10-09 14:26:41 megatron_init:295] Ranks 0 has context parallel rank: 0
[NeMo I 2024-10-09 14:26:41 megatron_init:302] Rank 0 has model parallel group: [0]
[NeMo I 2024-10-09 14:26:41 megatron_init:303] All model parallel group ranks: [[0]]
[NeMo I 2024-10-09 14:26:41 megatron_init:312] Rank 0 has tensor model parallel group: [0]
[NeMo I 2024-10-09 14:26:41 megatron_init:316] All tensor model parallel group ranks: 

[NeMo W 2024-10-09 14:26:41 megatron_base_model:1182] The model: MegatronGPTSFTModel() does not have field.name: context_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-09 14:26:41 megatron_base_model:1182] The model: MegatronGPTSFTModel() does not have field.name: expert_model_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-09 14:26:41 megatron_base_model:1182] The model: MegatronGPTSFTModel() does not have field.name: moe_extended_tp in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-09 14:26:41 megatron_base_model:1182] The model: MegatronGPTSFTModel() does not have field.name: finalize_model_grads_func in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-09 14:26:41 megatron_base_model:1182] The model: MegatronGPTSFTModel() does not have field.name: deterministi

[NeMo I 2024-10-09 14:26:42 megatron_base_model:595] Padded vocab_size: 128256, original vocab_size: 128256, dummy tokens: 0.


[NeMo W 2024-10-09 14:26:42 megatron_base_model:1182] The model: MegatronGPTSFTModel() does not have field.name: context_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-09 14:26:42 megatron_base_model:1182] The model: MegatronGPTSFTModel() does not have field.name: expert_model_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-09 14:26:42 megatron_base_model:1182] The model: MegatronGPTSFTModel() does not have field.name: moe_extended_tp in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-09 14:26:42 megatron_base_model:1182] The model: MegatronGPTSFTModel() does not have field.name: finalize_model_grads_func in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-10-09 14:26:42 megatron_base_model:1182] The model: MegatronGPTSFTModel() does not have field.name: deterministi

Loading distributed checkpoint with TensorStoreLoadShardedStrategy
[NeMo I 2024-10-09 14:27:57 nlp_overrides:1346] Model MegatronGPTSFTModel was successfully restored from /root/exp1/model/llama3_1_8b_instruct.nemo.
[NeMo I 2024-10-09 14:27:58 nlp_adapter_mixins:240] Before adding PEFT params:
      | Name  | Type     | Params | Mode 
    -------------------------------------------
    0 | model | GPTModel | 8.0 B  | train
    -------------------------------------------
    0         Trainable params
    8.0 B     Non-trainable params
    8.0 B     Total params
    32,121.045Total estimated model params size (MB)
[NeMo I 2024-10-09 14:28:00 nlp_adapter_mixins:245] After adding PEFT params:
      | Name  | Type     | Params | Mode 
    -------------------------------------------
    0 | model | GPTModel | 8.0 B  | train
    -------------------------------------------
    10.5 M    Trainable params
    8.0 B     Non-trainable params
    8.0 B     Total params
    32,162.988Total estimate

[NeMo W 2024-10-09 14:28:03 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/configuration_validator.py:161: You have overridden `MegatronGPTSFTModel.configure_sharded_model` which is deprecated. Please override the `configure_model` hook instead. Instantiation with the newer hook will be created on the device right away and have the right data type depending on the precision setting in the Trainer.
    
[NeMo W 2024-10-09 14:28:03 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/configuration_validator.py:143: You are using the `dataloader_iter` step flavor. If you consume the iterator more than once per step, the `batch_idx` argument in any hook that takes it will not match with the batch index of the last batch consumed. This might have unforeseen effects on callbacks or code that expects to get the correct index. This will also not work well with gradient accumulation. This feature is very experimental and subjec

[NeMo I 2024-10-09 14:28:03 megatron_gpt_sft_model:793] Building GPT SFT test datasets.
[NeMo I 2024-10-09 14:28:03 text_memmap_dataset:116] Building data files
[NeMo I 2024-10-09 14:28:03 text_memmap_dataset:525] Processing 1 data files using 127 workers


huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)

[NeMo I 2024-10-09 14:28:06 text_memmap_dataset:495] Building indexing for fn = /root/exp1/data/curated/final/law-qa-test_preprocessed-n128.jsonl
[NeMo I 2024-10-09 14:28:06 text_memmap_dataset:507] Saving idx file = /root/exp1/data/curated/final/law-qa-test_preprocessed-n128.jsonl.idx.npy
[NeMo I 2024-10-09 14:28:06 text_memmap_dataset:509] Saving metadata file = /root/exp1/data/curated/final/law-qa-test_preprocessed-n128.jsonl.idx.info
[NeMo I 2024-10-09 14:28:06 text_memmap_dataset:535] Time building 1 / 1 mem-mapped files: 0:00:02.992823
[NeMo I 2024-10-09 14:28:06 text_memmap_dataset:525] Processing 1 data files using 127 workers


huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)

[NeMo I 2024-10-09 14:28:09 text_memmap_dataset:535] Time building 0 / 1 mem-mapped files: 0:00:03.054275
[NeMo I 2024-10-09 14:28:09 text_memmap_dataset:158] Loading data files
[NeMo I 2024-10-09 14:28:09 text_memmap_dataset:249] Loading /root/exp1/data/curated/final/law-qa-test_preprocessed-n128.jsonl
[NeMo I 2024-10-09 14:28:09 text_memmap_dataset:161] Time loading 1 mem-mapped files: 0:00:00.001248
[NeMo I 2024-10-09 14:28:09 text_memmap_dataset:165] Computing global indices
[NeMo I 2024-10-09 14:28:09 megatron_gpt_sft_model:796] Length of test dataset: 128
[NeMo I 2024-10-09 14:28:09 megatron_gpt_sft_model:819] Building dataloader with consumed samples: 0


LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]
[NeMo W 2024-10-09 14:28:09 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/data_connector.py:424: The 'test_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=254` in the `DataLoader` to improve performance.
    
[NeMo W 2024-10-09 14:28:09 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/pytorch_lightning/loops/utilities.py:149: Found `dataloader_iter` argument in the `test_step`. Note that the support for this signature is experimental and the behavior is subject to change.
    


Testing: |          | 0/? [00:00<?, ?it/s]setting number of micro-batches to constant 32


      input_info_tensor = torch.cuda.FloatTensor(input_info)
    
      string_tensor = torch.as_tensor(
    


Testing DataLoader 0:   0%|          | 0/4 [00:00<?, ?it/s]setting number of micro-batches to constant 1
setting number of micro-batches to constant 32
Testing DataLoader 0:  25%|██▌       | 1/4 [01:00<03:01,  0.02it/s]setting number of micro-batches to constant 1
setting number of micro-batches to constant 32
Testing DataLoader 0:  50%|█████     | 2/4 [02:16<02:16,  0.01it/s]setting number of micro-batches to constant 1
setting number of micro-batches to constant 32
Testing DataLoader 0:  75%|███████▌  | 3/4 [03:29<01:09,  0.01it/s]setting number of micro-batches to constant 1
setting number of micro-batches to constant 32
Testing DataLoader 0: 100%|██████████| 4/4 [04:25<00:00,  0.02it/s][NeMo I 2024-10-09 14:32:35 megatron_gpt_sft_model:551] Total deduplicated inference data size: 128 to 128
[NeMo I 2024-10-09 14:32:35 megatron_gpt_sft_model:702] Predictions saved to law_titlegen_lora_test_law_inputs_preds_labels.jsonl


[NeMo W 2024-10-09 14:32:35 megatron_gpt_sft_model:642] No training data found, reconfiguring microbatches based on validation batch sizes.


setting number of micro-batches to constant 32


[NeMo W 2024-10-09 14:32:35 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/logger_connector/result.py:439: It is recommended to use `self.log('val_loss', ..., sync_dist=True)` when logging on epoch level in distributed setting to accumulate the metric across devices.
    
[NeMo W 2024-10-09 14:32:35 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/logger_connector/result.py:439: It is recommended to use `self.log('test_loss_law', ..., sync_dist=True)` when logging on epoch level in distributed setting to accumulate the metric across devices.
    
[NeMo W 2024-10-09 14:32:35 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/logger_connector/result.py:439: It is recommended to use `self.log('test_loss', ..., sync_dist=True)` when logging on epoch level in distributed setting to accumulate the metric across devices.
    


Testing DataLoader 0: 100%|██████████| 4/4 [04:25<00:00,  0.02it/s]
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃        Test metric        ┃       DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│         test_loss         │    1.5235300064086914     │
│       test_loss_law       │    1.5235300064086914     │
│         val_loss          │    1.5235300064086914     │
└───────────────────────────┴───────────────────────────┘
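
If the reported loss is the mean per-token cross-entropy in nats (an assumption; the log does not state the unit or which tokens are counted), it can be converted to perplexity with `exp(loss)`. A short sketch:

```python
import math

test_loss = 1.5235300064086914  # value from the test metric table above

# Perplexity is the exponential of the mean per-token negative log-likelihood.
perplexity = math.exp(test_loss)
print(f"perplexity ~ {perplexity:.2f}")
```

Perplexity is easier to compare across runs than raw loss, but it still says nothing about title quality, which is why the next step uses a text-overlap metric.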


### Step 4: Check the model accuracy

Now that the results are in, let's read them and score the model on the question title generation task.
First, take a look at one of the predictions in the generated output file. The `pred` key holds the generated text, and the `label` key holds the reference title.

In [46]:
# Take a look at predictions
!head -n1  law_titlegen_lora_test_law_inputs_preds_labels.jsonl

{"input": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: In order to be sued in a particular jurisdiction, say New York, a company must have a minimal business presence in the jurisdiction. What constitutes such a presence? Suppose the company engaged a New York-based Plaintiff, and its representatives signed the contract with the Plaintiff in New York City. Does this satisfy the minimum presence rule? Suppose, instead, the plaintiff and contract signing were in New Jersey, but the company hired a law firm with offices in New York City. Does this qualify? \nTITLE:", "pred": " What constitutes a minimal business presence in a jurisdiction?", "label": " What constitutes \"doing business in a jurisdiction?\""}
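
To compare several predictions against their references at once, the JSONL file can be read in Python. A minimal sketch, assuming each line has the `pred` and `label` keys shown above; `preview_predictions` is a hypothetical helper.

```python
import json
from itertools import islice


def preview_predictions(path: str, n: int = 5) -> list[tuple[str, str]]:
    """Return (prediction, reference) pairs for the first n records."""
    pairs = []
    with open(path, encoding="utf-8") as f:
        for line in islice(f, n):
            record = json.loads(line)
            # Strip surrounding whitespace (the sample above has a leading space).
            pairs.append((record["pred"].strip(), record["label"].strip()))
    return pairs
```

For example, `preview_predictions("law_titlegen_lora_test_law_inputs_preds_labels.jsonl")` returns the first five pairs, which can then be printed side by side.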


To evaluate this task, we use [ROUGE](https://en.wikipedia.org/wiki/ROUGE_(metric)). It measures n-gram overlap between the prediction and the reference, and a higher score is better. Although it is imperfect and does not capture the semantics of the prediction, it is a popular metric in academia and industry for evaluating such systems.
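
To make the metric concrete, the sketch below computes a simplified ROUGE-1 F-measure by hand for the prediction shown above. It lowercases the text and splits on non-alphanumeric characters but applies no stemming, so its numbers can differ slightly from those of the `rouge_score` library with `use_stemmer=True`. `rouge1_f` is a hypothetical helper.

```python
import re
from collections import Counter


def rouge1_f(reference: str, prediction: str) -> float:
    """Simplified ROUGE-1 F-measure: clipped unigram overlap, no stemming."""
    ref = Counter(re.findall(r"[a-z0-9]+", reference.lower()))
    pred = Counter(re.findall(r"[a-z0-9]+", prediction.lower()))
    overlap = sum((ref & pred).values())  # per-token minimum of the two counts
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


label = ' What constitutes "doing business in a jurisdiction?"'
pred = " What constitutes a minimal business presence in a jurisdiction?"
print(rouge1_f(label, pred))  # 6 shared unigrams, 9 predicted, 7 reference
```

Here 6 unigrams are shared out of 9 predicted and 7 reference tokens, giving a precision of 6/9, a recall of 6/7, and an F-measure of 0.75.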

The following function uses the `rouge_score` library to compute the scores. It reports the `ROUGE_{1/2/L/Lsum}` metrics.

In [53]:
def compute_rouge(input_file: str) -> dict:
    """Compute bootstrap-aggregated ROUGE F-measures (scaled to 0-100) for a predictions file."""
    ROUGE_KEYS = ["rouge1", "rouge2", "rougeL", "rougeLsum"]
    scorer = rouge_scorer.RougeScorer(ROUGE_KEYS, use_stemmer=True)
    aggregator = scoring.BootstrapAggregator()
    # Each line is a JSON record with "input", "pred", and "label" keys.
    with open(input_file) as f:
        lines = [json.loads(line) for line in f]
    num_response_words = []
    num_ref_words = []
    for line in lines:
        response = line['pred']
        answer = line['label']
        # RougeScorer.score takes (target, prediction), so the reference goes first.
        scores = scorer.score(answer, response)
        aggregator.add_scores(scores)
        num_response_words.append(len(response.split()))
        num_ref_words.append(len(answer.split()))

    # Aggregate with bootstrap resampling and report the mid F-measure.
    result = aggregator.aggregate()
    rouge_scores = {k: round(v.mid.fmeasure * 100, 4) for k, v in result.items()}
    print(rouge_scores)
    print(f"Average and stddev of response length: {np.mean(num_response_words):.2f}, {np.std(num_response_words):.2f}")
    print(f"Average and stddev of ref length: {np.mean(num_ref_words):.2f}, {np.std(num_ref_words):.2f}")

    return rouge_scores

In [54]:
compute_rouge("./law_titlegen_lora_test_law_inputs_preds_labels.jsonl")

{'rouge1': 39.972, 'rouge2': 19.6546, 'rougeL': 35.9545, 'rougeLsum': 35.8495}
Average and stddev of response length: 10.40, 4.38
Average and stddev of ref length: 11.26, 4.97


{'rouge1': 39.972, 'rouge2': 19.6546, 'rougeL': 35.9545, 'rougeLsum': 35.8495}

For the Llama-3.1-8B-Instruct model, you should see ROUGE scores comparable to the following:
```
{'rouge1': 39.2082, 'rouge2': 18.8573, 'rougeL': 35.4098, 'rougeLsum': 35.3906}
```
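
One way to make "comparable" concrete is to check each score against the reference values within a tolerance. The 2-point tolerance below is an arbitrary illustrative choice, not a threshold from NeMo, and `within_tolerance` is a hypothetical helper.

```python
def within_tolerance(scores: dict, reference: dict, tol: float = 2.0) -> dict:
    """Map each metric to True if it is within tol points of the reference."""
    return {k: abs(scores[k] - reference[k]) <= tol for k in reference}


reference = {'rouge1': 39.2082, 'rouge2': 18.8573, 'rougeL': 35.4098, 'rougeLsum': 35.3906}
observed = {'rouge1': 39.972, 'rouge2': 19.6546, 'rougeL': 35.9545, 'rougeLsum': 35.8495}
print(within_tolerance(observed, reference))
```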