![NVIDIA Logo](images/nvidia.png)

# LoRA for Extractive Question Answering

In this notebook you will fine-tune GPT8B with LoRA to perform extractive question answering.

![Extract LoRA](images/extract_lora.png)

---

## Learning Objectives

By the time you complete this notebook you will be able to:
- LoRA fine-tune a GPT8B model for extractive question answering.

---

## Imports

In [None]:
import json

from llm_utils.models import LoraModels, Models
from llm_utils.nemo_service_models import NemoServiceBaseModel
from llm_utils.mocks import upload_qa as upload
from llm_utils.mocks import create_qa_lora_customization as create_customization

---

## List Models

In [None]:
LoraModels.list_models()

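Later cells refer to models as `LoraModels.gpt8b.value`, which suggests `LoraModels` is a standard Python `Enum` whose members wrap model-name strings. The pattern in isolation looks like this; `ExampleLoraModels` and its string value are hypothetical stand-ins, not the real `llm_utils` definitions.

```python
from enum import Enum


class ExampleLoraModels(Enum):
    """Hypothetical stand-in for llm_utils.models.LoraModels."""

    # Placeholder value; the real enum's model-name strings may differ.
    gpt8b = "example-gpt8b-lora"

    @classmethod
    def list_models(cls):
        # Methods defined on an Enum are not members, so iterating cls
        # yields only the model entries.
        return [member.value for member in cls]


# .value gives the plain string, which is what this notebook passes to
# create_customization and NemoServiceBaseModel.
model_name = ExampleLoraModels.gpt8b.value
print(model_name)
print(ExampleLoraModels.list_models())
```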
---

## Load Train Data From File

We will begin by loading the prompt and answer (label) pairs we created in the previous notebook. We will split them into train, validation, and test sets below.

---

In [None]:
with open('data/squad_prompts_and_answers.json', 'r') as f:
    prompts_and_answers = json.load(f)

In [None]:
len(prompts_and_answers)

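The solution further down unpacks each entry as `prompt, answer`, which implies the file holds a JSON array of two-element `[prompt, answer]` arrays. A minimal sketch of a structural check under that assumption; `check_pairs` is a hypothetical helper, and the sample data is made up for illustration.

```python
import json


def check_pairs(items):
    """Return indices of entries that are not [prompt, answer] string pairs."""
    bad = []
    for i, item in enumerate(items):
        # json.load turns JSON arrays into Python lists.
        if not (isinstance(item, list) and len(item) == 2
                and all(isinstance(x, str) for x in item)):
            bad.append(i)
    return bad


# Made-up sample in the assumed layout; the third entry is malformed.
sample = json.loads('[["prompt one", "answer one"],'
                    ' ["prompt two", "answer two"],'
                    ' ["missing answer"]]')
print(check_pairs(sample))  # -> [2]
# In the notebook, check_pairs(prompts_and_answers) should return [].
```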
---

## Split Data

In preparation for fine-tuning, let's split the data, which currently contains over 2000 samples. We'll create a training set of 1000 samples, a validation set of 200 samples, and a small test set of 20 samples.

In [None]:
train_n = 1000
val_n = 200
test_n = 20

# Cumulative end offsets so the three splits are contiguous and non-overlapping.
train_end = train_n
val_end = train_end + val_n
test_end = val_end + test_n

train_prompts_and_answers = prompts_and_answers[:train_end]
val_prompts_and_answers = prompts_and_answers[train_end:val_end]
test_prompts_and_answers = prompts_and_answers[val_end:test_end]

In [None]:
len(train_prompts_and_answers)

In [None]:
len(val_prompts_and_answers)

In [None]:
len(test_prompts_and_answers)

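The cumulative `*_end` offsets generalize to any number of contiguous splits. A minimal sketch; `contiguous_splits` is a hypothetical helper, not part of `llm_utils`, and the toy list stands in for `prompts_and_answers`.

```python
from itertools import accumulate


def contiguous_splits(items, sizes):
    """Slice items into consecutive, non-overlapping chunks of the given sizes."""
    if sum(sizes) > len(items):
        raise ValueError(f"need {sum(sizes)} items, only have {len(items)}")
    ends = list(accumulate(sizes))   # e.g. [1000, 1200, 1220]
    starts = [0] + ends[:-1]         # e.g. [0, 1000, 1200]
    return [items[s:e] for s, e in zip(starts, ends)]


# Toy data standing in for prompts_and_answers.
data = list(range(2500))
train, val, test = contiguous_splits(data, [1000, 200, 20])
print(len(train), len(val), len(test))  # -> 1000 200 20
```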
---

## Exercise: Format Data for Fine-tuning

For this exercise, you will format `train_prompts_and_answers` and `val_prompts_and_answers` for NeMo Service fine-tuning.

As a reminder, NeMo Service expects that data be in JSON Lines (`jsonl`) format, with each line in the file being in the following format:

```python
{"prompt": <prompt>, "completion": <completion/label>}
```
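Each record must fit on a single line, and this works even though the prompts contain line breaks: `json.dumps` escapes them as `\n` inside the string. A short illustration with a made-up prompt (the real prompt template from the previous notebook may differ):

```python
import json

# Made-up prompt containing real newline characters.
record = {
    "prompt": "Context: The Eiffel Tower is in Paris.\n"
              "Question: Where is the Eiffel Tower?\n",
    "completion": "Paris",
}

line = json.dumps(record)
print(line)

# The serialized line has no raw newline, so it is one valid JSONL row...
assert "\n" not in line
# ...and parsing it restores the original newlines.
assert json.loads(line) == record
```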

Your task is to populate the `qa_lora_train_data` and `qa_lora_val_data` lists with one dictionary for each data sample in `train_prompts_and_answers` and `val_prompts_and_answers` respectively, formatted as needed for NeMo Service LoRA fine-tuning.

If you get stuck, feel free to look at the solution below.

### Your Work Here

In [None]:
qa_lora_train_data = []
qa_lora_val_data = []

### Solution

In [None]:
qa_lora_train_data = [{'prompt': prompt, 'completion': answer} for prompt, answer in train_prompts_and_answers]

In [None]:
qa_lora_val_data = [{'prompt': prompt, 'completion': answer} for prompt, answer in val_prompts_and_answers]

Here we see examples of data well-formatted for LoRA fine-tuning.

In [None]:
qa_lora_train_data[0]

In [None]:
qa_lora_val_data[0]

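Before writing the lists to disk, a quick check can confirm that every record has exactly the `prompt` and `completion` keys with string values. A minimal sketch; `invalid_records` is a hypothetical helper and the example records are made up.

```python
EXPECTED_KEYS = {"prompt", "completion"}


def invalid_records(records):
    """Return indices of records that don't match {"prompt": str, "completion": str}."""
    return [
        i for i, r in enumerate(records)
        if not (isinstance(r, dict)
                and set(r) == EXPECTED_KEYS
                and all(isinstance(v, str) for v in r.values()))
    ]


examples = [
    {"prompt": "Q1", "completion": "A1"},
    {"prompt": "Q2", "label": "A2"},        # wrong key name
    {"prompt": "Q3", "completion": None},   # non-string completion
]
print(invalid_records(examples))  # -> [1, 2]
# In the notebook, invalid_records(qa_lora_train_data) should return [].
```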
---

## Write NeMo Customization Data to File

We will ultimately upload our LoRA training data to NeMo Service, where it can be used for fine-tuning. First we need to write it to a file.

In [None]:
qa_nemo_train_filename = 'data/squad_nemo_train_prompts_and_answers_1000.jsonl'
qa_nemo_val_filename = 'data/squad_nemo_val_prompts_and_answers_200.jsonl'

In [None]:
with open(qa_nemo_train_filename, 'w') as f:
    for p_and_a in qa_lora_train_data:
        f.write(json.dumps(p_and_a) + '\n')

In [None]:
with open(qa_nemo_val_filename, 'w') as f:
    for p_and_a in qa_lora_val_data:
        f.write(json.dumps(p_and_a) + '\n')

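To confirm a written file is valid JSON Lines, you can read it back one line at a time. A minimal sketch using only the standard library; `write_jsonl` and `read_jsonl` are hypothetical helpers, and the demo uses a temporary file instead of the notebook's `data/` paths.

```python
import json
import os
import tempfile


def write_jsonl(path, records):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for r in records:
            f.write(json.dumps(r) + "\n")


def read_jsonl(path):
    """Parse one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


records = [{"prompt": "Q1\nContext", "completion": "A1"},
           {"prompt": "Q2", "completion": "A2"}]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.jsonl")
    write_jsonl(path, records)
    round_trip = read_jsonl(path)
    # Count physical lines: one per record, since newlines in prompts are escaped.
    with open(path, encoding="utf-8") as f:
        n_lines = sum(1 for _ in f)

print(round_trip == records, n_lines)  # -> True 2
```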
---

## Upload Data to NeMo Service

With the data written to files in JSON Lines format, we can now upload it to NeMo Service. As we did earlier, we will mock this step.

In [None]:
train_response = upload(qa_nemo_train_filename)

In [None]:
train_response

In [None]:
val_response = upload(qa_nemo_val_filename)

In [None]:
val_response

---

## Exercise: LoRA Fine-tune GPT8B for Extractive QA

For this exercise you will perform LoRA fine-tuning on GPT8B with the training and validation data you just uploaded.

### Your Work Here

Launch a (mock) LoRA customization by calling `create_customization` in the cell immediately below. Once the call succeeds, find the customization ID in its response and assign it to the `customization_id` variable below for use later in the notebook.

In order to complete this task you'll need to pass `create_customization` the following arguments:
- `model`: This should be a LoRA fine-tuneable GPT8B model. You can use the `LoraModels` enum provided above if you wish.
- `training_dataset_file_id`: This should be the file ID returned to you above when you (mock) uploaded the training data to NeMo Service.
- `validation_dataset_file_id`: This should be the file ID returned to you above when you (mock) uploaded the validation data to NeMo Service.
- `adapter_dim`: Use the default value of `32`.
- `epochs`: Train for 1 epoch.

Worth mentioning is that if we did not provide validation data explicitly, NeMo Service would instead use 10% of the training data we provide for validation.

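For intuition about `adapter_dim`: in NeMo's LoRA configuration it corresponds to the rank r of the low-rank update. Instead of updating a frozen weight W (d_out × d_in), LoRA trains two small matrices A (r × d_in) and B (d_out × r) and adds (alpha / r) · B A x to W's output. The plain-Python sketch below counts trainable parameters and runs a toy forward pass; the hidden size of 4096 and the `alpha` values are illustrative assumptions, not documented GPT8B or NeMo Service settings.

```python
def lora_trainable_params(d_in, d_out, r):
    """Parameters in A (r x d_in) plus B (d_out x r)."""
    return r * d_in + d_out * r


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W, A, B, x, alpha, r):
    """y = W x + (alpha / r) * B (A x); W stays frozen, only A and B train."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + (alpha / r) * d for b, d in zip(base, delta)]


# Parameter count for one square projection (illustrative hidden size).
d, r = 4096, 32
full = d * d
lora = lora_trainable_params(d, d, r)
print(full, lora, lora / full)  # -> 16777216 262144 0.015625

# Toy forward pass: with B initialized to zeros, the output equals W x,
# so training starts from the base model's behaviour.
W = [[1.0, 2.0], [3.0, 4.0]]
A = [[0.5, -0.5]]          # r = 1, d_in = 2
B = [[0.0], [0.0]]         # d_out = 2, r = 1
x = [1.0, 1.0]
print(lora_forward(W, A, B, x, alpha=1.0, r=1))  # -> [3.0, 7.0]
```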
If you get stuck, feel free to check out the *Solution* below.

In [None]:
create_customization()

In [None]:
customization_id = ''

### Solution

In [None]:
create_customization(model=LoraModels.gpt8b.value,
                     training_dataset_file_id='f17e25cd-fd08-42b4-a508-12f48985be35',
                     validation_dataset_file_id='30655aa3-17de-41b1-8d73-ddd4a3fadded',
                     adapter_dim=32,
                     epochs=1)

In [None]:
customization_id = 'ebd552dc-a050-4987-afca-9136d45fbad1'

---

## Perform Extractive QA with GPT8B LoRA

Next we will try the LoRA fine-tuned GPT8B model on the extractive QA task. First we create a model instance, using the LoRA GPT8B base model and providing the customization ID returned by NeMo Service.

In [None]:
gpt8b_lora = NemoServiceBaseModel(LoraModels.gpt8b.value, customization_id=customization_id)

### Sanity Check

Let's try out a single QA prompt on the LoRA fine-tuned GPT8B model.

In [None]:
prompt, label = test_prompts_and_answers[10]

In [None]:
prompt

In [None]:
label

In [None]:
gpt8b_lora.generate(prompt).strip()

At a glance, the LoRA fine-tuned GPT8B model appears to be doing well. Unlike the base GPT8B model we used in the previous notebook, it does not ramble on: the answer appears to be extracted directly from the text, and it is correct.

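Whether an answer is extracted directly from the text can be checked mechanically: an extractive answer should appear verbatim (up to case and whitespace) in the prompt. A minimal sketch; `is_extractive` is a hypothetical helper and the prompt is made up. Because the prompt also contains the question, a match is a useful signal rather than proof that the answer came from the context.

```python
import re


def _norm(text):
    # Lowercase and collapse runs of whitespace so formatting differences don't matter.
    return re.sub(r"\s+", " ", text.lower()).strip()


def is_extractive(response, prompt):
    """True if the (normalized, non-empty) response occurs as a substring of the prompt."""
    response = _norm(response)
    return bool(response) and response in _norm(prompt)


prompt = "Context: The tower was completed in 1889.\nQuestion: When was the tower completed?"
print(is_extractive("1889", prompt))      # -> True
print(is_extractive("in  1889", prompt))  # -> True
print(is_extractive("1890", prompt))      # -> False
# In the notebook: is_extractive(gpt8b_lora.generate(prompt).strip(), prompt)
```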
### Try on Test Data

Now let's try the fine-tuned GPT8B model on the full test set.

In [None]:
for prompt, answer in test_prompts_and_answers:
    response = gpt8b_lora.generate(prompt).strip()
    print(f'Response: {response}')
    print(f'Label: {answer}\n')

### Analysis

The LoRA fine-tuned GPT8B model is not performing perfectly, but it is doing a relatively good job. Some of its answers are incorrect, and it sometimes returns a list instead of a single answer, but for the most part it performs the task we want. It will be interesting to see how it performs on the task we actually intend it for, rather than on SQuAD questions.
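To make "relatively good" more concrete, the responses could be scored against the labels with the exact-match (EM) and token-level F1 metrics commonly reported for SQuAD. A minimal sketch modeled on the answer normalization in the official SQuAD v1.1 evaluation script (lowercasing, stripping punctuation and the articles a/an/the, collapsing whitespace); the helper names are our own and the example pairs are made up.

```python
import re
import string
from collections import Counter

PUNCT = set(string.punctuation)


def normalize_answer(s):
    s = s.lower()
    s = "".join(ch for ch in s if ch not in PUNCT)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match(prediction, truth):
    return float(normalize_answer(prediction) == normalize_answer(truth))


def f1(prediction, truth):
    pred_tokens = normalize_answer(prediction).split()
    true_tokens = normalize_answer(truth).split()
    # Multiset intersection counts tokens shared by prediction and label.
    num_same = sum((Counter(pred_tokens) & Counter(true_tokens)).values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(true_tokens)
    return 2 * precision * recall / (precision + recall)


def score(pairs):
    """Average EM and F1 over (response, label) pairs, as percentages."""
    em = sum(exact_match(p, t) for p, t in pairs) / len(pairs)
    f = sum(f1(p, t) for p, t in pairs) / len(pairs)
    return 100 * em, 100 * f


pairs = [("The Eiffel Tower", "Eiffel Tower"),  # exact match after normalization
         ("in Paris, France", "Paris")]         # partial credit only
print(score(pairs))
```

In the notebook, you could append `(response, answer)` to a list inside the test-set loop and pass that list to `score`. With only 20 test samples, each one moves the averages by up to 5 percentage points, so treat the numbers as rough.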