# Finetuning a large language model to generate next best actions in a Text Adventure Game.
### This notebook finetunes a Llama 3.1-8B-Instruct model using a LoRA adapter and SciWorld data.

This notebook showcases LoRA finetuning of **Llama 3.1-8B-Instruct** on the SciWorld dataset. It is derived from the tutorial posted at https://github.com/NVIDIA/NeMo/blob/main/tutorials/llm/llama-3/sdg-law-title-generation/llama3-sdg-lora-nemofw.ipynb, which finetunes the Llama 3.1 8B Instruct model, and has been adapted to the DragonFire use case: exploring the current game state and generating the next recommended game action. A tutorial for the original source notebook, which uses law data, is available at https://github.com/NVIDIA/NeMo/blob/main/tutorials/llm/llama-3/sdg-law-title-generation/README.rst.  

We use the NVIDIA NeMo Framework, which simplifies the pipeline for finetuning the LLM. We run the finetuning portion of the experiment inside the NeMo container that we pulled from https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo. This container is publicly available and should be run on either a Linux machine or Windows with WSL. It can also be run on a cloud GPU instance. Sufficient memory, GPU, and other resources should be available; we used a workstation with an NVIDIA A6000 GPU with 48 GB of memory.  

To replicate this setup, run the following commands.  
This command creates the directory to store code files:  
mkdir ~/nvdata  

This command pulls and runs the NeMo container:  
docker run --gpus all --runtime=nvidia -it --rm --shm-size=16g -p 8888:8888 -p 6006:6006 \  
 -v ~/nvdata:/workspace/nvdata \  
--ulimit memlock=-1 --ulimit stack=67108864 nvcr.io/nvidia/nemo:24.07  

Once the container has downloaded and you are at the workspace prompt, run the following inside the container:  
pip install ipywidgets  
pip install scienceworld  
jupyter-lab --allow-root --ip='0.0.0.0'  

Now you can go to a browser and run this notebook.  

For this demonstration, we will tune the model on the task of next-action generation: given the current game state (the task and the observations so far), generate the next recommended game action.


In [1]:
import os
import json
import numpy as np
from rouge_score import rouge_scorer, scoring

### Retrieve the SciWorld dataset
We are finetuning the Llama model using the SciWorld dataset. SciWorld is an environment containing multistep science tasks that is used to test how well text-based game agents and science agents can solve these tasks. The environment is at https://github.com/allenai/ScienceWorld.   

For this experiment, we are using a version of this dataset used by Calmworld-GPT2, located at https://github.com/cognitiveailab/calm-scienceworld/tree/master/data. This is the CALM project adapted for SciWorld. The CALM project is the code implementation for the paper and study "Keep CALM and Explore: Language Models for Action Generation in Text-based Games". The paper is available at https://arxiv.org/abs/2010.02903. While the study focused on the GPT-2 model, we are expanding it to use Llama 3.1 8B Instruct and GPT-4.  

In [None]:
#if needed, create a data directory and pull in the formatted sciworld data
#use the raw file URL; the /blob/ URL returns an HTML page instead of the zip archive
#unzip the sciworld data and save to data directory
!mkdir -p data
!cd data && wget https://github.com/cognitiveailab/calm-scienceworld/raw/master/data/sciworld_formatted_data.zip
!unzip -o data/sciworld_formatted_data.zip -d data
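
A browser-style GitHub URL can return an HTML page rather than the archive itself, so it can be worth confirming that the downloaded file really is a zip before unpacking it. The following is a minimal sketch using only the standard library. The helper name `extract_if_zip` is hypothetical, and the paths in the example call assume the `data` layout used above.

```python
import zipfile
from pathlib import Path

def extract_if_zip(archive_path: str, dest_dir: str) -> list[str]:
    """Extract archive_path into dest_dir and return the member names.

    Raises ValueError if the file is not a valid zip archive (for example,
    an HTML page that was saved under a .zip name).
    """
    if not zipfile.is_zipfile(archive_path):
        raise ValueError(f"{archive_path} is not a zip archive")
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(dest_dir)
        return zf.namelist()

# Example call (paths assumed from the cell above):
# extract_if_zip("data/sciworld_formatted_data.zip", "data")
```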

#### Preparing dataset
Helper functions to clean up the data and format it for fine-tuning.  

The two conversion functions build the prompts and instruct the model on how to output the results of a query.  
There are two versions: `convert_for_gpt_finetuning` for GPT-4o and other chat-style LLMs, and `convert_for_finetuning` for Llama and other instruct-style LLMs.  

In [2]:
import json
import pandas as pd

def write_jsonl(fname, json_objs):
    with open(fname, 'wt') as f:
        for o in json_objs:
            f.write(json.dumps(o)+"\n")

#parse json file into list for exploration
def load_jsonl(file_path):
    data = []
    with open(file_path, 'r') as file:
        for line in file:
            data.append(json.loads(line))
    return data

#used for llama instruct style models
def convert_for_finetuning(json_filename, output_path):
    """Wrap each record in the instruction prompt and write input/output pairs as JSONL."""
    sci_data = load_jsonl(json_filename)
    json_objs = []
    prompt = '''Generate the next action towards solving a task for the following observations in an interactive fiction game. The action should be the optimal action.'''
    for item in sci_data:
        # "input" is the prompt plus the game state; "output" is the target action
        data = {"input": f'''{prompt} {item["input"]} \nAction: ''',
                "output": item['target']}
        json_objs.append(data)
    write_jsonl(output_path, json_objs)
    return json_objs

#used for GPT chat style models
def convert_for_gpt_finetuning(json_filename, output_path):
    """Build system/user/assistant chat records and write them as JSONL."""
    sci_data = load_jsonl(json_filename)
    # Each fragment ends with a space so the concatenated sentences stay separated
    system_message = "You are trying to solve a task in an interactive fiction game. You will be provided with the task " \
    "and observations about your location such as the room and items. You must decide what the recommended next action is " \
    "to help complete the task given the observations. You will respond with just the action."
    json_objs = []
    for item in sci_data:
        messages = [{"role": "system", "content": system_message},
                    {"role": "user", "content": item['input']},
                    {"role": "assistant", "content": item['target']}]
        json_objs.append({"messages": messages})
    write_jsonl(output_path, json_objs)
    return json_objs
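
To make the two record layouts concrete, the sketch below builds both from a single sample. The sample text, the shortened system message, and the helper names `to_instruct_record` and `to_chat_record` are made up for illustration; only the prompt wording and the field names mirror the functions above.

```python
import json

PROMPT = ("Generate the next action towards solving a task for the following "
          "observations in an interactive fiction game. "
          "The action should be the optimal action.")

def to_instruct_record(item: dict) -> dict:
    """Instruct-style record: one prompt string plus the target action."""
    return {"input": f"{PROMPT} {item['input']} \nAction: ",
            "output": item["target"]}

def to_chat_record(item: dict, system_message: str) -> dict:
    """Chat-style record: system, user, and assistant messages."""
    return {"messages": [
        {"role": "system", "content": system_message},
        {"role": "user", "content": item["input"]},
        {"role": "assistant", "content": item["target"]},
    ]}

# Hypothetical sample; real records come from the sciworld_formatted_*.jsonl files.
sample = {"input": "Task: boil water. You are in the hallway.",
          "target": "open door to kitchen [SEP]"}

print(json.dumps(to_instruct_record(sample), indent=2))
print(json.dumps(to_chat_record(sample, "Respond with just the action."), indent=2))
```

The instruct layout folds everything into one `input` string that ends in `Action: `, so the model's completion is the action itself, while the chat layout keeps the instruction in a separate system message.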

In [3]:
#function takes in pandas df with input and output columns and converts to multichoice format for RoBERTa model training
def convert_to_multichoice(df):
    df = pd.DataFrame(df)
    rows = []

    # Loop through the original dataset
    for index, row in df.iterrows():
        # Randomly select three outputs from the entire output column
        outputs = df['output'].sample(n=3, replace=False).tolist()

        # Add the actual ground truth output to the list
        outputs.append(row['output'])

        # Shuffle the list to randomize the position of the ground truth output
        np.random.shuffle(outputs)

        # sent2 holds the input prompt; label is the position of the ground truth output
        rows.append({
            'sent2': row['input'],
            'ending0': outputs[0],
            'ending1': outputs[1],
            'ending2': outputs[2],
            'ending3': outputs[3],
            'label': outputs.index(row['output'])
        })

    # Build the dataframe once at the end; appending row by row is slow
    # and relied on the private DataFrame._append method
    df_multichoice = pd.DataFrame(rows, columns=['sent2', 'ending0', 'ending1', 'ending2', 'ending3', 'label'])
    return df_multichoice
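
One thing to watch in the approach above is that the three distractors are sampled from the whole output column, so they can coincide with the ground-truth action or with each other. A possible variation, shown here as a plain-Python sketch rather than the method used in this notebook, samples distractors only from actions that differ from the ground truth and uses a seeded generator so the result is reproducible. The helper name `make_choices` and the fixed seed are assumptions made for illustration.

```python
import random

def make_choices(truth: str, all_actions: list[str], rng: random.Random, k: int = 3):
    """Return (choices, label): k distinct distractors plus the truth, shuffled."""
    # Candidate distractors: unique actions that are not the ground truth.
    pool = sorted(set(all_actions) - {truth})
    if len(pool) < k:
        raise ValueError("not enough distinct actions to sample distractors")
    choices = rng.sample(pool, k) + [truth]
    rng.shuffle(choices)
    return choices, choices.index(truth)

rng = random.Random(0)  # fixed seed: an assumption, chosen only for reproducibility
actions = ["open door to kitchen [SEP]", "go to kitchen [SEP]",
           "pick up thermometer [SEP]", "open cupboard [SEP]",
           "pick up metal pot [SEP]"]
choices, label = make_choices("go to kitchen [SEP]", actions, rng)
print(choices, label)
```

With distinct endings, the label always points at the only correct option, which keeps the multiple-choice accuracy unambiguous.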

In [None]:
test_dataset = Dataset.from_pandas(test_data)

In [None]:
test_json_objs = to_multichoice("./data/sciworld_test1_set.jsonl","data/sciworld_test1_rset.jsonl")
type(test_json_objs)

In [10]:
from datasets import load_dataset
dataset = load_dataset('json', data_files='data/sciworld_test1_set.jsonl')


Generating train split: 0 examples [00:00, ? examples/s]

In [13]:
train_dataset = dataset['train']

In [None]:
from datasets import load_dataset
dataset = load_dataset('json', data_files='data/sciworld_train1_set.jsonl')


In [14]:
test_df = pd.DataFrame(train_dataset)
test_df.head()

| | input | output |
|---|---|---|
| 0 | Generate the next action towards solving a tas... | open door to kitchen [SEP] |
| 1 | Generate the next action towards solving a tas... | go to kitchen [SEP] |
| 2 | Generate the next action towards solving a tas... | pick up thermometer [SEP] |
| 3 | Generate the next action towards solving a tas... | open cupboard [SEP] |
| 4 | Generate the next action towards solving a tas... | pick up metal pot [SEP] |


In [16]:
df_mc = convert_to_multichoice(test_df)
print(df_mc.head())
#write one JSON record per line so the file is valid JSONL
df_mc.to_json('data/sciworld_mc.jsonl', orient='records', lines=True)

                                               sent2  \
0  Generate the next action towards solving a tas...   
1  Generate the next action towards solving a tas...   
2  Generate the next action towards solving a tas...   
3  Generate the next action towards solving a tas...   
4  Generate the next action towards solving a tas...   

                                             ending0  \
0               examine stopwatch in inventory [SEP]   
1                                go to kitchen [SEP]   
2  look at inclined plane with a rubber surface [...   
3  look at inclined plane with a steel surface [SEP]   
4  look at inclined plane with a plastic surface ...   

                                             ending1  \
0                         open door to kitchen [SEP]   
1                         open door to outside [SEP]   
2                   pour jug into flower pot 2 [SEP]   
3                                open cupboard [SEP]   
4  look at inclined plane with a sandpaper sur

In [17]:
tr_dataset = load_dataset('json', data_files='data/sciworld_train1_set.jsonl')
train_dataset = tr_dataset['train']
train_df = pd.DataFrame(train_dataset)
df_train_mc = convert_to_multichoice(train_df)

Generating train split: 0 examples [00:00, ? examples/s]

In [20]:
df_train_mc.head()

| | sent2 | ending0 | ending1 | ending2 | ending3 | label |
|---|---|---|---|---|---|---|
| 0 | Generate the next action towards solving a tas... | connect unknown substance F terminal 2 to blac... | open door to kitchen [SEP] | focus on inclined plane with a ceramic surface... | look at inclined plane with a rubber surface [... | 1 |
| 1 | Generate the next action towards solving a tas... | go to kitchen [SEP] | focus on inclined plane with a brass surface [... | pour jug into flower pot 7 [SEP] | go to outside [SEP] | 0 |
| 2 | Generate the next action towards solving a tas... | pick up thermometer [SEP] | deactivate sink [SEP] | go to kitchen [SEP] | move jug to sink [SEP] | 0 |
| 3 | Generate the next action towards solving a tas... | look at inclined plane with a sandpaper surfac... | look at inclined plane with a chocolate surfac... | open cupboard [SEP] | connect green wire terminal 2 to cathode in re... | 2 |
| 4 | Generate the next action towards solving a tas... | pick up metal pot [SEP] | look at inclined plane with a platinum surface... | look at inclined plane with a sandpaper surfac... | look at inclined plane with a rubber surface [... | 0 |


In [18]:
v_dataset = load_dataset('json', data_files='data/sciworld_val1_set.jsonl')
val_dataset = v_dataset['train']
val_df = pd.DataFrame(val_dataset)
val_train_mc = convert_to_multichoice(val_df)

Generating train split: 0 examples [00:00, ? examples/s]

In [48]:
val_train_mc.shape

(92610, 11)

In [49]:
df_train_mc.shape

(202760, 11)

In [43]:
#now let's process all the data files to generate datasets for finetuning
test_json_objs = convert_for_finetuning("./data/sciworld_formatted_test.jsonl", "data/sciworld_test1_set.jsonl")
train_json_objs = convert_for_finetuning("./data/sciworld_formatted_train.jsonl", "data/sciworld_train1_set.jsonl")
dev_json_objs = convert_for_finetuning("./data/sciworld_formatted_val.jsonl", "data/sciworld_val1_set.jsonl")

In [12]:
DATA_DIR = os.path.join("data")

!ls {DATA_DIR}

sciworld_formatted_test.jsonl	sciworld_train_set.jsonl.idx.info
sciworld_formatted_train.jsonl	sciworld_train_set.jsonl.idx.npy
sciworld_formatted_val.jsonl	sciworld_val_set.jsonl
sciworld_test_set.jsonl		sciworld_val_set.jsonl.idx.info
sciworld_train_set.jsonl	sciworld_val_set.jsonl.idx.npy


In [44]:
TRAIN_DS = os.path.join(DATA_DIR, "sciworld_train1_set.jsonl")
VAL_DS = os.path.join(DATA_DIR, "sciworld_val1_set.jsonl")
TEST_DS = os.path.join(DATA_DIR, "sciworld_test1_set.jsonl")
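
Before handing these files to a training run, a quick structural check can catch malformed lines early. The sketch below assumes each line is a JSON object with the `input` and `output` keys written by `convert_for_finetuning`; the helper name `check_jsonl` is hypothetical.

```python
import json

def check_jsonl(path: str, required_keys=("input", "output")) -> int:
    """Return the number of records; raise ValueError on a record with missing keys."""
    count = 0
    with open(path, "r", encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            missing = [k for k in required_keys if k not in record]
            if missing:
                raise ValueError(f"{path}:{line_no} is missing keys {missing}")
            count += 1
    return count

# Example (paths defined in the cell above):
# for path in (TRAIN_DS, VAL_DS, TEST_DS):
#     print(path, check_jsonl(path))
```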

In [None]:
#these are extra columns in multichoice data, not sure if I need them
#videoid = 'xxx'
#fold_ind = index
#startphrase = 'xxx'
#sent1 = 'xxx'
#goldsource = 'xxx'


In [42]:
dftest = df_mc
dftest.head()

Unnamed: 0,sent2,ending0,ending1,ending2,ending3,label,videoid,startphrase,sent1,goldsource,fold_ind
0,Generate the next action towards solving a tas...,examine stopwatch in inventory [SEP],open door to kitchen [SEP],look at inclined plane with a sandpaper surfac...,connect battery anode to orange wire terminal ...,1,xxx,xxx,xxx,xxx,0
1,Generate the next action towards solving a tas...,go to kitchen [SEP],open door to outside [SEP],move egg crocodile egg in inventory to orange ...,look at inclined plane with a sandpaper surfac...,0,xxx,xxx,xxx,xxx,1
2,Generate the next action towards solving a tas...,look at inclined plane with a rubber surface [...,pour jug into flower pot 2 [SEP],open cupboard [SEP],pick up thermometer [SEP],3,xxx,xxx,xxx,xxx,2
3,Generate the next action towards solving a tas...,look at inclined plane with a steel surface [SEP],open cupboard [SEP],look at inclined plane with a sandpaper surfac...,look at inclined plane with a sandpaper surfac...,1,xxx,xxx,xxx,xxx,3
4,Generate the next action towards solving a tas...,look at inclined plane with a plastic surface ...,look at inclined plane with a sandpaper surfac...,look at inclined plane with a brass surface [SEP],pick up metal pot [SEP],3,xxx,xxx,xxx,xxx,4


In [36]:
dftest.shape

(133628, 6)

In [43]:
dftest['videoid'] = 'xxx'
dftest['fold_ind'] = dftest.index
dftest['startphrase'] = 'xxx'
dftest['sent1'] = 'xxx'
dftest['gold-source'] = 'xxx'
dftest.head()

Unnamed: 0,sent2,ending0,ending1,ending2,ending3,label,videoid,startphrase,sent1,goldsource,fold_ind,gold-source
0,Generate the next action towards solving a tas...,examine stopwatch in inventory [SEP],open door to kitchen [SEP],look at inclined plane with a sandpaper surfac...,connect battery anode to orange wire terminal ...,1,xxx,xxx,xxx,xxx,0,xxx
1,Generate the next action towards solving a tas...,go to kitchen [SEP],open door to outside [SEP],move egg crocodile egg in inventory to orange ...,look at inclined plane with a sandpaper surfac...,0,xxx,xxx,xxx,xxx,1,xxx
2,Generate the next action towards solving a tas...,look at inclined plane with a rubber surface [...,pour jug into flower pot 2 [SEP],open cupboard [SEP],pick up thermometer [SEP],3,xxx,xxx,xxx,xxx,2,xxx
3,Generate the next action towards solving a tas...,look at inclined plane with a steel surface [SEP],open cupboard [SEP],look at inclined plane with a sandpaper surfac...,look at inclined plane with a sandpaper surfac...,1,xxx,xxx,xxx,xxx,3,xxx
4,Generate the next action towards solving a tas...,look at inclined plane with a plastic surface ...,look at inclined plane with a sandpaper surfac...,look at inclined plane with a brass surface [SEP],pick up metal pot [SEP],3,xxx,xxx,xxx,xxx,4,xxx


In [44]:
dftest['fold_ind'] = dftest.index

dftest.head()

In [45]:
dftest = dftest.reindex(columns=['videoid', 'fold_ind', 'startphrase', 'sent1', 'sent2','gold-source','ending0','ending1','ending2','ending3','label'])
dftest.head()

Unnamed: 0,videoid,fold_ind,startphrase,sent1,sent2,gold-source,ending0,ending1,ending2,ending3,label
0,xxx,0,xxx,xxx,Generate the next action towards solving a tas...,xxx,examine stopwatch in inventory [SEP],open door to kitchen [SEP],look at inclined plane with a sandpaper surfac...,connect battery anode to orange wire terminal ...,1
1,xxx,1,xxx,xxx,Generate the next action towards solving a tas...,xxx,go to kitchen [SEP],open door to outside [SEP],move egg crocodile egg in inventory to orange ...,look at inclined plane with a sandpaper surfac...,0
2,xxx,2,xxx,xxx,Generate the next action towards solving a tas...,xxx,look at inclined plane with a rubber surface [...,pour jug into flower pot 2 [SEP],open cupboard [SEP],pick up thermometer [SEP],3
3,xxx,3,xxx,xxx,Generate the next action towards solving a tas...,xxx,look at inclined plane with a steel surface [SEP],open cupboard [SEP],look at inclined plane with a sandpaper surfac...,look at inclined plane with a sandpaper surfac...,1
4,xxx,4,xxx,xxx,Generate the next action towards solving a tas...,xxx,look at inclined plane with a plastic surface ...,look at inclined plane with a sandpaper surfac...,look at inclined plane with a brass surface [SEP],pick up metal pot [SEP],3


In [21]:
#save sciworld data in multichoice format to train RoBERTa for intent detection on sciworld
val_dataset = Dataset.from_pandas(validation_data)
val_dataset.to_json("data/fireball_val.jsonl")
#make sure the output directory exists before writing
os.makedirs('data/intent', exist_ok=True)
#save training mc data to json file (one record per line)
df_train_mc.to_json('data/intent/sciworld_train_mc.jsonl', orient='records', lines=True)
#save validation to json file
val_train_mc.to_json('data/intent/sciworld_val_mc.jsonl', orient='records', lines=True)
#save test mc data to json file
df_mc.to_json('data/intent/sciworld_test_mc.jsonl', orient='records', lines=True)

In [50]:
#add the placeholder columns for the SWAG-style csv layout and save each split to csv
SWAG_COLUMNS = ['videoid', 'fold_ind', 'startphrase', 'sent1', 'sent2', 'gold-source',
                'ending0', 'ending1', 'ending2', 'ending3', 'label']

def save_swag_csv(df_split, csv_path):
    #placeholder columns are filled with 'xxx'; fold_ind is the row index
    df_split['videoid'] = 'xxx'
    df_split['fold_ind'] = df_split.index
    df_split['startphrase'] = 'xxx'
    df_split['sent1'] = 'xxx'
    df_split['gold-source'] = 'xxx'
    df_out = df_split.reindex(columns=SWAG_COLUMNS)
    df_out.to_csv(csv_path)
    return df_out

#save train to csv
dftrain = save_swag_csv(df_train_mc, 'data/intent/train.csv')

#save validation to csv file
dfval = save_swag_csv(val_train_mc, 'data/intent/val.csv')

#save test mc data to csv file
dftest = save_swag_csv(df_mc, 'data/intent/test.csv')
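
Since the CSV files are later read by a script run with `--task_name swag`, a misordered or misnamed column is easy to miss until evaluation. A small header check such as the following can catch it early. The helper name `check_swag_header` is hypothetical, and the expected header assumes pandas wrote the index as an unnamed first column, which `to_csv` does by default.

```python
import csv

# Column order used when saving the splits, preceded by the unnamed index column.
EXPECTED_HEADER = ["", "videoid", "fold_ind", "startphrase", "sent1", "sent2",
                   "gold-source", "ending0", "ending1", "ending2", "ending3", "label"]

def check_swag_header(csv_path: str) -> bool:
    """Return True if the first row of the CSV matches the expected column order."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        header = next(csv.reader(f))
    return header == EXPECTED_HEADER

# Example (paths assumed from the cell above):
# for name in ("train", "val", "test"):
#     print(name, check_swag_header(f"data/intent/{name}.csv"))
```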

## Derived from wikiHow

In [None]:
import os

# the benchmark task you want to train/evaluate on. 'goal': goal inference, 'step': step inference, 'order': step ordering.
#task = ['sgd', 'snips', 'fb-en', 'fb-es', 'fb-th'][0]
#task = 'intent'

# the name of the model you want to train/evaluate on.
#modelName = ['intent_enwh_rl', 'intent_enwh_xlmr', 'intent_eswh', 'intent_thwh', 'intent_snips_wh_id', 'intent_snips_id', 'intent_sgd_wh_id', 'intent_sgd_id', 'intent_fb-en_wh_id_rl', 'intent_fb-en_id_rl', 'intent_fb-es_wh_id', 'intent_fb-es_enwh_id', 'intent_fb-es_id', 'intent_fb-th_wh_id', 'intent_fb-th_enwh_id', 'intent_fb-th_id'][0]

In [28]:
#for convenience clone transformers libraries into the root of your project
#git clone https://github.com/huggingface/transformers.git
!ls transformers/examples/legacy/multiple_choice

run_multiple_choice.py	utils_multiple_choice.py


In [29]:
#use RoBERTa-large
import os
task = 'intent'
modelName = 'FacebookAI/roberta-large'
#os.environ['MODEL_NAME'] = f'zharry29/{modelName}'
os.environ['MODEL_NAME'] = 'FacebookAI/roberta-large'
os.environ['DATA_DIR'] = f'./data/{task}/'
os.environ['OUTPUT_DIR'] = f'./output/{task}_{modelName}'

!python transformers/examples/legacy/multiple_choice/run_multiple_choice.py \
  --task_name swag \
  --model_name_or_path $MODEL_NAME \
  --do_eval \
  --data_dir $DATA_DIR \
  --max_seq_length 200 \
  --per_gpu_eval_batch_size=16 \
  --output_dir $OUTPUT_DIR \
  --overwrite_output_dir

12/07/2024 16:53:18 - INFO - __main__ - Training/evaluation parameters TrainingArguments(
_n_gpu=1,
accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'gradient_accumulation_kwargs': None},
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_persistent_workers=False,
dataloader_pin_memory=True,
dataloader_prefetch_factor=None,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
dispatch_batches=None,
do_eval=True,
do_predict=False,
do_train=False,
eval_accumulation_steps=None,
eval_delay=0,
eval_do_concat_batches=True,
eval_steps=None,
evaluation_strategy=no,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_

### Evaluation Inference 

We will check how well the model predicts the next action on a subset of the test data.

In [51]:
# Create a smaller test subset for a quick eval demonstration.

!head -n 128 ./data/sciworld_test1_set.jsonl > ./data/sciworld_test1_set-n128.jsonl

### Step 4: Check the model accuracy

Now that the results are in, let's read them and calculate the accuracy on the next-action generation task.
Let's take a look at one of the predictions in the generated output file. The `pred` key indicates what was generated.

The predictions for the subset of the test dataset are written to the sciworld_lora_test_sci_inputs_preds_labels.jsonl file. Each row ends with a prediction and a label. The prediction is the action the LLM predicts given the game state, and the label is the ground-truth value. You can see that the above row accurately predicts the action.

For evaluating this task, we will use [ROUGE](https://en.wikipedia.org/wiki/ROUGE_(metric)). It measures the overlap of n-grams between the prediction and the reference, and a higher score is better. While it is not perfect and does not capture the semantics of the prediction, it is a popular metric in academia and industry for evaluating such systems. 
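
To show what n-gram overlap means in practice, here is a hand-rolled ROUGE-1 F-measure on whitespace tokens. It is a simplified sketch rather than the `rouge_score` implementation: it does no stemming and uses a plain lowercase split, and the function name `rouge1_f` is made up for illustration.

```python
from collections import Counter

def rouge1_f(reference: str, prediction: str) -> float:
    """Unigram-overlap F-measure on lowercase whitespace tokens (no stemming)."""
    ref_counts = Counter(reference.lower().split())
    pred_counts = Counter(prediction.lower().split())
    # Clipped overlap: each token counts at most as often as it appears in both.
    overlap = sum((ref_counts & pred_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)

print(rouge1_f("open door to kitchen", "open door to kitchen"))  # 1.0
print(rouge1_f("open door to kitchen", "go to kitchen"))         # 4/7, about 0.571
```

In the second call, two of the three predicted tokens appear in the reference (precision 2/3) and two of the four reference tokens are recovered (recall 1/2), which gives an F-measure of 4/7.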

The following method uses the `rouge_score` library to implement scoring. It will report `ROUGE_{1/2/L/Lsum}` metrics.

In [57]:
def compute_rouge(input_file: str) -> dict:
    ROUGE_KEYS = ["rouge1", "rouge2", "rougeL", "rougeLsum"]
    scorer = rouge_scorer.RougeScorer(ROUGE_KEYS, use_stemmer=True)
    aggregator = scoring.BootstrapAggregator()
    # Read one JSON record per line and close the file when done
    with open(input_file) as f:
        lines = [json.loads(line) for line in f]
    num_response_words = []
    num_ref_words = []
    for line in lines:
        response = line['pred']   # model prediction
        answer = line['label']    # ground-truth action
        # RougeScorer.score expects (target, prediction)
        scores = scorer.score(answer, response)
        aggregator.add_scores(scores)
        num_response_words.append(len(response.split()))
        num_ref_words.append(len(answer.split()))

    result = aggregator.aggregate()
    # Report the bootstrap mid-point F-measure as a percentage
    rouge_scores = {k: round(v.mid.fmeasure * 100, 4) for k, v in result.items()}
    print(rouge_scores)
    print(f"Average and stddev of response length: {np.mean(num_response_words):.2f}, {np.std(num_response_words):.2f}")
    print(f"Average and stddev of ref length: {np.mean(num_ref_words):.2f}, {np.std(num_ref_words):.2f}")

    return rouge_scores

In [58]:
compute_rouge("./sciworld_lora_test_sci_inputs_preds_labels.jsonl")

{'rouge1': 78.2972, 'rouge2': 65.9794, 'rougeL': 78.3603, 'rougeLsum': 78.2974}
Average and stddev of response length: 4.92, 1.89
Average and stddev of ref length: 5.20, 1.81


{'rouge1': 78.2972, 'rouge2': 65.9794, 'rougeL': 78.3603, 'rougeLsum': 78.2974}
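
Because each action is a short command, an exact-match rate can complement ROUGE by counting a prediction as correct only when it equals the label. This metric is an addition and was not part of the evaluation above. The sketch assumes the same `pred` and `label` keys, and it assumes that a trailing `[SEP]` marker, letter case, and extra whitespace should be ignored; the helper names are hypothetical.

```python
import json

def normalize(action: str) -> str:
    """Lowercase, drop a trailing [SEP] marker, and collapse whitespace."""
    text = action.strip()
    if text.endswith("[SEP]"):
        text = text[: -len("[SEP]")]
    return " ".join(text.lower().split())

def exact_match(records) -> float:
    """Fraction of records whose normalized pred equals the normalized label."""
    records = list(records)
    if not records:
        return 0.0
    hits = sum(normalize(r["pred"]) == normalize(r["label"]) for r in records)
    return hits / len(records)

def exact_match_from_file(path: str) -> float:
    """Compute the exact-match rate over a JSONL predictions file."""
    with open(path, "r", encoding="utf-8") as f:
        return exact_match(json.loads(line) for line in f if line.strip())

# Example (file name taken from the cell above):
# exact_match_from_file("./sciworld_lora_test_sci_inputs_preds_labels.jsonl")
```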

#### See the inferencing notebook for deploying the model with the fine-tuned weights