### Fine Tune Large Language Model on a Custom Dataset with QLoRA
1. Import required libraries
2. Wandb setting
3. Loading dataset
4. Create Bitsandbytes configuration
5. Loading the Pre-Trained model
6. Tokenization
7. Test the Model with Zero Shot Inferencing
8. Pre-processing dataset
9. Preparing the model for QLoRA
10. Setup PEFT for Fine-Tuning
11. Train PEFT Adapter
12. Evaluate the Model Qualitatively (Human Evaluation)
13. Evaluate the Model Quantitatively (with ROUGE Metric)

#### 1. Import required libraries

In [9]:
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    HfArgumentParser,
    TrainingArguments,
    Trainer,
    GenerationConfig
)
from tqdm import tqdm
from trl import SFTTrainer
import torch
import time
import pandas as pd
import numpy as np

#### 2. Wandb setting

In [2]:
import wandb

wandb.init(entity="sinjy1203", project="qlora_finetuning")

Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
wandb: Currently logged in as: sinjy1203. Use `wandb login --relogin` to force relogin


#### 3. Loading dataset
- DialogSum dataset
- dialog summarization task

In [10]:
huggingface_dataset_name = "neil-code/dialogsum-test"
dataset = load_dataset(huggingface_dataset_name, cache_dir="/media/shin/T7/huggingface/datasets")

In [11]:
dataset['train'][0]

{'id': 'train_0',
 'dialogue': "#Person1#: Hi, Mr. Smith. I'm Doctor Hawkins. Why are you here today?\n#Person2#: I found it would be a good idea to get a check-up.\n#Person1#: Yes, well, you haven't had one for 5 years. You should have one every year.\n#Person2#: I know. I figure as long as there is nothing wrong, why go see the doctor?\n#Person1#: Well, the best way to avoid serious illnesses is to find out about them early. So try to come at least once a year for your own good.\n#Person2#: Ok.\n#Person1#: Let me see here. Your eyes and ears look fine. Take a deep breath, please. Do you smoke, Mr. Smith?\n#Person2#: Yes.\n#Person1#: Smoking is the leading cause of lung cancer and heart disease, you know. You really should quit.\n#Person2#: I've tried hundreds of times, but I just can't seem to kick the habit.\n#Person1#: Well, we have classes and some medications that might help. I'll give you more information before you leave.\n#Person2#: Ok, thanks doctor.",
 'summary': "Mr. Smith'

#### 4. Create Bitsandbytes configuration
- Load the model with 4-bit quantization (NF4)

In [5]:
from transformers import BitsAndBytesConfig, AutoModelForCausalLM, AutoTokenizer
import torch
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16, # torch.bfloat16 is the more commonly used choice
    bnb_4bit_use_double_quant=True,
)
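
To get a rough feel for what this configuration saves, the sketch below estimates the storage cost per weight with and without double quantization. It is only back-of-the-envelope arithmetic: the block sizes (64 weights per first-level constant, 256 constants per second-level constant) are the defaults described in the QLoRA paper and are assumed here, not read from `bnb_config`, and `bits_per_param` is a hypothetical helper name.

```python
def bits_per_param(weight_bits=4, block_size=64, double_quant=True,
                   const_bits=32, dq_const_bits=8, dq_block_size=256):
    """Approximate storage per weight, including quantization constants."""
    if double_quant:
        # 8-bit first-level constants, plus one 32-bit constant per 256 of them
        overhead = dq_const_bits / block_size + const_bits / (block_size * dq_block_size)
    else:
        # one 32-bit absmax constant per block of 64 weights
        overhead = const_bits / block_size
    return weight_bits + overhead


n_params = 2.7e9  # Phi-2 size as stated in this notebook
for dq in (False, True):
    bpp = bits_per_param(double_quant=dq)
    gib = n_params * bpp / 8 / 2**30
    print(f"double_quant={dq}: {bpp:.3f} bits/param, ~{gib:.2f} GiB")
```

Layers that are not quantized (for example embeddings) are stored at higher precision, so the real footprint will be somewhat larger than this estimate.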

#### 5. Loading the Pre-Trained model
- Phi-2: 2.7B

In [6]:
model_name = 'microsoft/phi-2'
original_model = AutoModelForCausalLM.from_pretrained(
    model_name, 
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
    use_auth_token=True,
    cache_dir="/media/shin/T7/huggingface/models"
)

Loading checkpoint shards: 100%|██████████| 2/2 [00:00<00:00,  2.67it/s]


#### 6. Tokenization
- left-padding
    - when the input is "i love apple"
    - right-padding -> output: i love apple [pad] [pad] because delicious
    - left-padding -> output: [pad] [pad] i love apple because delicious
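
A decoder-only model appends each new token after the last position of a row, so where the padding sits decides whether the continuation directly follows the real text. The toy sketch below pads lists of string tokens to make that visible. `pad_batch` and the `[pad]` marker are hypothetical stand-ins for illustration, not the tokenizer's API.

```python
PAD = "[pad]"


def pad_batch(sequences, side="left", pad_token=PAD):
    """Pad token lists to a common length on the given side."""
    width = max(len(seq) for seq in sequences)
    padded = []
    for seq in sequences:
        fill = [pad_token] * (width - len(seq))
        padded.append(fill + seq if side == "left" else seq + fill)
    return padded


batch = [["i", "love", "apple"], ["i", "love", "green", "apple", "pie"]]
for side in ("right", "left"):
    for row in pad_batch(batch, side=side):
        # Generation continues after the last element of each row.
        print(side, row + ["<generated tokens>"])
```

With right-padding the shorter row continues after pad tokens; with left-padding every row continues directly after its own text.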

In [7]:
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True,
    padding_side='left',
    add_eos_token=True,
    add_bos_token=True,
    use_fast=False,
    cache_dir="/media/shin/T7/huggingface/tokenizers",
)

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [8]:
tokenizer.pad_token = tokenizer.eos_token

#### 7. Test the Model with Zero Shot Inferencing

In [9]:
%%time
from transformers import set_seed
from transformers import pipeline
seed = 42
set_seed(seed)

index = 10

prompt = dataset['test'][index]['dialogue']
summary = dataset['test'][index]['summary']

formatted_prompt = f"Instruct: Summarize the following conversation. \n{prompt}\nOutput:\n"
generator = pipeline('text-generation', model=original_model, tokenizer=tokenizer)
res = generator(formatted_prompt, max_length=512)
output = res[0]['generated_text'].split("Output:\n")[1]

dash_line = '-'.join('' for x in range(100))
print(dash_line)
print(f'INPUT PROMPT:\n{formatted_prompt}')
print(dash_line)
print(f'BASELINE HUMAN SUMMARY:\n{summary}\n')
print(dash_line)
print(f'MODEL GENERATION - ZERO SHOT:\n{output}')

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


---------------------------------------------------------------------------------------------------
INPUT PROMPT:
Instruct: Summarize the following conversation. 
#Person1#: Happy Birthday, this is for you, Brian.
#Person2#: I'm so happy you remember, please come in and enjoy the party. Everyone's here, I'm sure you have a good time.
#Person1#: Brian, may I have a pleasure to have a dance with you?
#Person2#: Ok.
#Person1#: This is really wonderful party.
#Person2#: Yes, you are always popular with everyone. and you look very pretty today.
#Person1#: Thanks, that's very kind of you to say. I hope my necklace goes with my dress, and they both make me look good I feel.
#Person2#: You look great, you are absolutely glowing.
#Person1#: Thanks, this is a fine party. We should have a drink together to celebrate your birthday
Output:

---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# attends Brian's birthday p

#### 8. Pre-processing dataset
- The prompt must be formatted so that the model can understand it
- It should follow the prompt format described in the Hugging Face model documentation

In [16]:
def create_prompt_formats(sample):
    INTRO_BLURB = "Below is an instruction that describes a task. Write a response that appropriately completes the request."
    INSTRUCTION_KEY = "### Instruct: Summarized the below conversation."
    RESPONSE_KEY = "### output:"
    END_KEY = "### End"

    blurb = f"\n{INTRO_BLURB}"
    instruction = f"{INSTRUCTION_KEY}"
    input_context = f"{sample['dialogue']}" if sample["dialogue"] else None
    response = f"{RESPONSE_KEY}\n{sample['summary']}"
    end = f"{END_KEY}"

    parts = [part for part in [blurb, instruction, input_context, response, end] if part]

    formatted_prompt = "\n\n".join(parts)
    sample['text'] = formatted_prompt

    return sample
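
To make the resulting layout concrete, here is a self-contained sketch that builds the same kind of training text for a made-up sample. `build_training_prompt` is a hypothetical, condensed stand-in for `create_prompt_formats`, and the toy dialogue and summary are invented for illustration; the prompt strings are copied from the function above.

```python
INTRO_BLURB = "Below is an instruction that describes a task. Write a response that appropriately completes the request."
INSTRUCTION_KEY = "### Instruct: Summarized the below conversation."
RESPONSE_KEY = "### output:"
END_KEY = "### End"


def build_training_prompt(dialogue, summary):
    """Join blurb, instruction, dialogue, response and end marker with blank lines."""
    parts = [
        f"\n{INTRO_BLURB}",
        INSTRUCTION_KEY,
        dialogue or None,  # skipped when the dialogue is empty
        f"{RESPONSE_KEY}\n{summary}",
        END_KEY,
    ]
    return "\n\n".join(part for part in parts if part)


# Invented sample, only to show the layout
toy = {
    "dialogue": "#Person1#: Hello.\n#Person2#: Hi, nice to meet you.",
    "summary": "Two people greet each other.",
}
text = build_training_prompt(toy["dialogue"], toy["summary"])
print(text)
```

The `### End` marker gives the fine-tuned model a pattern to emit after the summary, which is what the later output parsing relies on when it cuts the generation at the next `###`.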

In [11]:
from functools import partial

# SOURCE https://github.com/databrickslabs/dolly/blob/master/training/trainer.py
def get_max_length(model):
    conf = model.config
    max_length = None
    # Try the attribute names that different architectures use for the context length
    for length_setting in ["n_positions", "max_position_embeddings", "seq_length"]:
        max_length = getattr(conf, length_setting, None)
        if max_length:
            print(f"Found max lenth: {max_length}")
            break
    if not max_length:
        # Fall back to a conservative default when the config exposes none of them
        max_length = 1024
        print(f"Using default max length: {max_length}")
    return max_length

In [12]:
def preprocess_batch(batch, tokenizer, max_length):
    return tokenizer(
        batch["text"],
        max_length=max_length,
        truncation=True
    )

In [13]:
def preprocess_dataset(tokenizer: AutoTokenizer, max_length: int, seed: int, dataset):
    """Format and tokenize the dataset so it is ready for training.
    :param tokenizer (AutoTokenizer): Model tokenizer
    :param max_length (int): Maximum number of tokens to emit from the tokenizer
    :param seed (int): Random seed used when shuffling
    :param dataset: Dataset split to preprocess
    """

    # Add prompt to each sample
    print("Preprocessing dataset...")
    dataset = dataset.map(create_prompt_formats)

    # Tokenize each batch and remove the original 'id', 'topic', 'dialogue', 'summary' columns
    _preprocessing_function = partial(preprocess_batch, max_length=max_length, tokenizer=tokenizer)
    dataset = dataset.map(
        _preprocessing_function,
        batched=True,
        remove_columns=['id', 'topic', 'dialogue', 'summary'],
    )

    # Filter out samples that have input_ids exceeding max_length
    dataset = dataset.filter(lambda sample: len(sample["input_ids"]) < max_length)
    
    # Shuffle dataset
    dataset = dataset.shuffle(seed=seed)

    return dataset

In [14]:
max_length = get_max_length(original_model)
print(max_length)

seed = 42
train_dataset = preprocess_dataset(tokenizer, max_length, seed, dataset['train'])
eval_dataset = preprocess_dataset(tokenizer, max_length, seed, dataset['validation'])

Found max lenth: 2048
2048
Preprocessing dataset...


Preprocessing dataset...


#### 9. Preparing the model for QLoRA

In [16]:
from peft import prepare_model_for_kbit_training
original_model = prepare_model_for_kbit_training(original_model) # preprocess the quantized model for training

#### 10. Setup PEFT for Fine-Tuning
- lora_alpha
    - scaling factor applied to the learned weights = alpha / r
    - higher lora_alpha -> the LoRA update is given more weight
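
The role of `lora_alpha` can be sketched with a tiny pure-Python forward pass: the frozen weight `W` is left alone and the low-rank product `B A` is added after scaling by `lora_alpha / r`. This is an illustrative toy with arbitrary 2x2 values, not the PEFT implementation; `matvec` and `lora_forward` are hypothetical helper names.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, lora_alpha, r):
    """y = W x + (lora_alpha / r) * B (A x); W is frozen, A and B are trained."""
    scaling = lora_alpha / r
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scaling * u for b, u in zip(base, update)]


# Toy shapes: d_in = d_out = 2, rank r = 1 (values are arbitrary)
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]      # r x d_in
B = [[0.5], [0.0]]    # d_out x r
x = [2.0, 3.0]

print(lora_forward(W, A, B, x, lora_alpha=1, r=1))  # [4.5, 3.0]
print(lora_forward(W, A, B, x, lora_alpha=2, r=1))  # [7.0, 3.0]
```

With `r=32` and `lora_alpha=32`, as configured in this section, the scaling factor is 1.0, so the low-rank update is added unscaled.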

In [18]:
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

config = LoraConfig(
    r=32, #Rank
    lora_alpha=32,
    target_modules=[
        'q_proj',
        'k_proj',
        'v_proj',
        'dense'
    ],
    bias="none",
    lora_dropout=0.05,  # Conventional
    task_type="CAUSAL_LM",
)

# 1 - Enabling gradient checkpointing to reduce memory usage during fine-tuning
original_model.gradient_checkpointing_enable()

peft_model = get_peft_model(original_model, config)

In [19]:
peft_model.print_trainable_parameters()

trainable params: 20,971,520 || all params: 2,800,655,360 || trainable%: 0.7488075933770015
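
The trainable-parameter count printed above can be reproduced by hand. The sketch assumes values that are not stated in this notebook: a Phi-2 hidden size of 2560, 32 decoder layers, every targeted module being a square 2560x2560 projection, and `'dense'` matching only the attention output projection. Each adapted module adds an `r x d_in` matrix and a `d_out x r` matrix.

```python
hidden_size = 2560   # assumed Phi-2 hidden size
num_layers = 32      # assumed number of decoder layers
r = 32               # LoRA rank from the config above
target_modules = ["q_proj", "k_proj", "v_proj", "dense"]

# A is (r x d_in) and B is (d_out x r) for every adapted module
per_module = r * (hidden_size + hidden_size)
trainable = per_module * len(target_modules) * num_layers

all_params = 2_800_655_360  # "all params" as printed above
print(f"trainable params: {trainable:,}")
print(f"trainable%: {100 * trainable / all_params:.4f}")
```

Under these assumptions the result matches the 20,971,520 reported by `print_trainable_parameters()`.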


#### 11. Train PEFT Adapter

In [20]:
output_dir = f'/media/shin/T7/model_ckpt/peft-dialogue-summary-training-{str(int(time.time()))}'
import transformers

peft_training_args = TrainingArguments(
    output_dir = output_dir,
    warmup_steps=1,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    max_steps=1000,
    learning_rate=2e-4,
    optim="paged_adamw_8bit",
    logging_steps=25,
    logging_dir="./logs",
    save_strategy="steps",
    save_steps=25,
    evaluation_strategy="steps",
    eval_steps=25,
    do_eval=True,
    gradient_checkpointing=True,
    report_to="wandb",
    overwrite_output_dir=True,
    group_by_length=True,
)

peft_model.config.use_cache = False

peft_trainer = transformers.Trainer(
    model=peft_model,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    args=peft_training_args,
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)

In [21]:
peft_trainer.train()



| Step | Training Loss | Validation Loss |
|---|---|---|
| 25 | 1.6813 | 1.397089 |
| 50 | 1.1921 | 1.389977 |
| 75 | 1.4495 | 1.353761 |
| 100 | 1.2053 | 1.365932 |
| 125 | 1.4403 | 1.344379 |
| 150 | 1.1351 | 1.36107 |
| 175 | 1.4031 | 1.339692 |
| 200 | 1.1498 | 1.34489 |
| 225 | 1.4489 | 1.334857 |
| 250 | 1.2225 | 1.33761 |




TrainOutput(global_step=1000, training_loss=1.2936927890777588, metrics={'train_runtime': 3275.5253, 'train_samples_per_second': 1.221, 'train_steps_per_second': 0.305, 'total_flos': 1.850993530739712e+16, 'train_loss': 1.2936927890777588, 'epoch': 2.0})
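
The throughput figures in `TrainOutput` follow from the batch settings, which the short check below recomputes. It assumes a single GPU (so the effective batch is just batch size times accumulation steps); the implied dataset size is inferred from the reported epoch count, not read from the dataset.

```python
per_device_train_batch_size = 1
gradient_accumulation_steps = 4
max_steps = 1000
train_runtime = 3275.5253  # seconds, from TrainOutput
epochs = 2.0               # from TrainOutput

# Samples consumed per optimizer step (single device assumed)
effective_batch = per_device_train_batch_size * gradient_accumulation_steps
samples_seen = effective_batch * max_steps

print("effective batch size:", effective_batch)
print("samples seen:", samples_seen)
print("samples/sec:", round(samples_seen / train_runtime, 3))
print("steps/sec:", round(max_steps / train_runtime, 3))
print("implied training-set size:", samples_seen / epochs)
```

The recomputed rates agree with `train_samples_per_second` and `train_steps_per_second` above, and the run corresponds to roughly two passes over about 2,000 training samples.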

In [23]:
wandb.finish()



Run history: wandb sparkline charts for eval/loss, eval/runtime, eval/samples_per_second, eval/steps_per_second, train/epoch, train/global_step, train/grad_norm, train/learning_rate, train/loss, and train/total_flos.

Run summary:

| Metric | Value |
|---|---|
| eval/loss | 1.31748 |
| eval/runtime | 54.9429 |
| eval/samples_per_second | 9.082 |
| eval/steps_per_second | 1.147 |
| train/epoch | 2.0 |
| train/global_step | 1000.0 |
| train/grad_norm | 0.29146 |
| train/learning_rate | 0.0 |
| train/loss | 1.1124 |
| train/total_flos | 1.850993530739712e+16 |


In [2]:
from transformers import BitsAndBytesConfig, AutoModelForCausalLM, AutoTokenizer
import torch
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16, # torch.bfloat16 is the more commonly used choice
    bnb_4bit_use_double_quant=True,
)



In [4]:
base_model_id = 'microsoft/phi-2'
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id, 
    quantization_config=bnb_config,
    device_map="auto",
    cache_dir="/media/shin/T7/huggingface/models"
)

Loading checkpoint shards: 100%|██████████| 2/2 [00:00<00:00,  2.90it/s]


In [5]:
eval_tokenizer = AutoTokenizer.from_pretrained(
    base_model_id, 
    add_bos_token=True, 
    trust_remote_code=True, 
    use_fast=False,
    cache_dir="/media/shin/T7/huggingface/tokenizers"
)
eval_tokenizer.pad_token = eval_tokenizer.eos_token

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [6]:
from peft import PeftModel

ft_model = PeftModel.from_pretrained(
    base_model, 
    "/media/shin/T7/model_ckpt/peft-dialogue-summary-training-1709195425/checkpoint-1000",
    torch_dtype=torch.float16,
    is_trainable=False
)

#### 12. Evaluate the Model Qualitatively (Human Evaluation)

In [None]:
INTRO_BLURB = "Below is an instruction that describes a task. Write a response that appropriately completes the request."
INSTRUCTION_KEY = "### Instruct: Summarized the below conversation."
RESPONSE_KEY = "### output:"
END_KEY = "### End"

In [18]:
def create_test_prompt_formats(dialogue):
    INTRO_BLURB = "Below is an instruction that describes a task. Write a response that appropriately completes the request."
    INSTRUCTION_KEY = "### Instruct: Summarized the below conversation."
    RESPONSE_KEY = "### output:"
    END_KEY = "### End"

    blurb = f"\n{INTRO_BLURB}"
    instruction = f"{INSTRUCTION_KEY}"
    input_context = f"{dialogue}" if dialogue else None
    response = f"{RESPONSE_KEY}\n"

    parts = [part for part in [blurb, instruction, input_context, response] if part]

    formatted_prompt = "\n\n".join(parts)
    return formatted_prompt

In [19]:
index = 0
dialogue = create_test_prompt_formats(dataset['test'][index]['dialogue'])
print(dialogue)


Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruct: Summarized the below conversation.

#Person1#: Ms. Dawson, I need you to take a dictation for me.
#Person2#: Yes, sir...
#Person1#: This should go out as an intra-office memorandum to all employees by this afternoon. Are you ready?
#Person2#: Yes, sir. Go ahead.
#Person1#: Attention all staff... Effective immediately, all office communications are restricted to email correspondence and official memos. The use of Instant Message programs by employees during working hours is strictly prohibited.
#Person2#: Sir, does this apply to intra-office communications only? Or will it also restrict external communications?
#Person1#: It should apply to all communications, not only in this office between employees, but also any outside communications.
#Person2#: But sir, many employees use Instant Messaging to communicate with their clients.
#Person1#: They will just have to chan

In [30]:
from transformers import set_seed
from transformers import pipeline
seed = 42
set_seed(seed)

index = 7
dialogue = dataset['test'][index]['dialogue']
summary = dataset['test'][index]['summary']

# prompt = f"Instruct: Summarize the following conversation.\n{dialogue}\nOutput:\n"
prompt = create_test_prompt_formats(dialogue)

generator = pipeline('text-generation', model=ft_model, tokenizer=eval_tokenizer)
peft_model_res = generator(prompt, max_length=512)
# print(peft_model_res[0]['generated_text'])
peft_model_output = peft_model_res[0]['generated_text'].split('### output:\n')[1].split("###")[0]
#print(peft_model_output)
prefix, success, result = peft_model_output.partition('###')

dash_line = '-'.join('' for x in range(100))
print(dash_line)
print(f'INPUT PROMPT:\n{prompt}')
print(dash_line)
print(f'BASELINE HUMAN SUMMARY:\n{summary}\n')
print(dash_line)
print(f'PEFT MODEL:\n{prefix}')
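
The chained `split` calls above raise an `IndexError` if the generated text does not contain the `### output:` marker, for example when a long prompt is truncated. A slightly more defensive variant is sketched below. `extract_summary` is a hypothetical helper, and unlike the cell above it also strips surrounding whitespace.

```python
def extract_summary(generated_text, response_key="### output:\n", stop="###"):
    """Return the text between the response key and the next stop marker."""
    _, found, tail = generated_text.partition(response_key)
    if not found:
        return ""  # marker missing, e.g. the prompt was cut off
    return tail.split(stop)[0].strip()


sample = "### Instruct: ...\n\n#Person1#: Hi.\n\n### output:\nA short summary.\n\n### End"
print(repr(extract_summary(sample)))
print(repr(extract_summary("text without the marker")))
```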

The model 'PeftModelForCausalLM' is not supported for text-generation. Supported models are ['BartForCausalLM', 'BertLMHeadModel', 'BertGenerationDecoder', 'BigBirdForCausalLM', 'BigBirdPegasusForCausalLM', 'BioGptForCausalLM', 'BlenderbotForCausalLM', 'BlenderbotSmallForCausalLM', 'BloomForCausalLM', 'CamembertForCausalLM', 'LlamaForCausalLM', 'CodeGenForCausalLM', 'CpmAntForCausalLM', 'CTRLLMHeadModel', 'Data2VecTextForCausalLM', 'ElectraForCausalLM', 'ErnieForCausalLM', 'FalconForCausalLM', 'FuyuForCausalLM', 'GemmaForCausalLM', 'GitForCausalLM', 'GPT2LMHeadModel', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTNeoForCausalLM', 'GPTNeoXForCausalLM', 'GPTNeoXJapaneseForCausalLM', 'GPTJForCausalLM', 'LlamaForCausalLM', 'MarianForCausalLM', 'MBartForCausalLM', 'MegaForCausalLM', 'MegatronBertForCausalLM', 'MistralForCausalLM', 'MixtralForCausalLM', 'MptForCausalLM', 'MusicgenForCausalLM', 'MvpForCausalLM', 'OpenLlamaForCausalLM', 'OpenAIGPTLMHeadModel', 'OPTForCausalLM', 'PegasusForCa

---------------------------------------------------------------------------------------------------
INPUT PROMPT:

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruct: Summarized the below conversation.

#Person1#: Kate, you never believe what's happened.
#Person2#: What do you mean?
#Person1#: Masha and Hero are getting divorced.
#Person2#: You are kidding. What happened?
#Person1#: Well, I don't really know, but I heard that they are having a separation for 2 months, and filed for divorce.
#Person2#: That's really surprising. I always thought they are well matched. What about the kids? Who get custody?
#Person1#: Masha, it seems quiet and makable, no quarrelling about who get the house and stock and then contesting the divorce with other details worked out.
#Person2#: That's the change from all the back stepping we usually hear about. Well, I still can't believe it, Masha and Hero, the perfect couple. When would they

#### 13. Evaluate the Model Quantitatively (with ROUGE Metric)
- ROUGE  
    a metric for evaluating summarization and translation tasks
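
As a rough intuition for what ROUGE measures, the sketch below computes a simplified ROUGE-1 F1 score from unigram overlap. It is a deliberately minimal illustration (lowercasing and whitespace tokenization, no stemming), so its numbers will not match the `evaluate` implementation used later in this section. `rouge1_f1` is a hypothetical helper name.

```python
from collections import Counter


def rouge1_f1(prediction, reference):
    """Simplified ROUGE-1: F1 over clipped unigram overlap."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Counter intersection keeps the minimum count of each shared token
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


score = rouge1_f1("the cat sat on the mat", "the cat is on the mat")
print(f"{score:.4f}")
```

ROUGE-2 applies the same idea to bigrams, and ROUGE-L replaces the overlap count with the longest common subsequence.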

In [21]:
model_name = 'microsoft/phi-2'
original_model = AutoModelForCausalLM.from_pretrained(
    model_name, 
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
    use_auth_token=True,
    cache_dir="/media/shin/T7/huggingface/models"
)

Loading checkpoint shards: 100%|██████████| 2/2 [00:00<00:00,  2.69it/s]


In [None]:
import pandas as pd

original_generator = pipeline('text-generation', model=original_model, tokenizer=eval_tokenizer)
peft_generator = pipeline('text-generation', model=ft_model, tokenizer=eval_tokenizer)

dialogues = dataset['test'][0:10]['dialogue']
human_baseline_summaries = dataset['test'][0:10]['summary']

original_model_summaries = []
peft_model_summaries = []

for idx, dialogue in enumerate(dialogues):
    human_baseline_text_output = human_baseline_summaries[idx]
    prompt = f"Instruct: Summarize the following conversation.\n{dialogue}\nOutput:\n"
    
    original_model_res = original_generator(prompt, max_length=512)
    original_model_text_output = original_model_res[0]['generated_text'].split('Output:\n')[1]
    
    prompt_ft = create_test_prompt_formats(dialogue)
    peft_model_res = peft_generator(prompt_ft, max_length=512)
    peft_model_text_output = peft_model_res[0]['generated_text'].split('### output:\n')[1].split("###")[0]

    original_model_summaries.append(original_model_text_output)
    peft_model_summaries.append(peft_model_text_output)

zipped_summaries = list(zip(human_baseline_summaries, original_model_summaries, peft_model_summaries))
 
df = pd.DataFrame(zipped_summaries, columns = ['human_baseline_summaries', 'original_model_summaries', 'peft_model_summaries'])
df

In [34]:
df.head()

| | human_baseline_summaries | original_model_summaries | peft_model_summaries |
|---|---|---|---|
| 0 | Ms. Dawson helps #Person1# to write a memo to ... | #Person1#: Ms. Dawson, I need you to take a di... | #Person1# asks Ms. Dawson to take a dictation ... |
| 1 | In order to prevent employees from wasting tim... | #Person1#: Ms. Dawson, I need you to take a di... | #Person1# asks Ms. Dawson to take a dictation ... |
| 2 | Ms. Dawson takes a dictation for #Person1# abo... | #Person1#: Ms. Dawson, I need you to take a di... | #Person1# asks Ms. Dawson to take a dictation ... |
| 3 | #Person2# arrives late because of traffic jam.... | Person1: You're finally here! What took so lon... | #Person2# got stuck in traffic again and #Pers... |
| 4 | #Person2# decides to follow #Person1#'s sugges... | Person1: You're finally here! What took so lon... | #Person2# got stuck in traffic again and #Pers... |


In [36]:
import evaluate

rouge = evaluate.load('rouge')

original_model_results = rouge.compute(
    predictions=original_model_summaries,
    references=human_baseline_summaries[0:len(original_model_summaries)],
    use_aggregator=True,
    use_stemmer=True,
)

peft_model_results = rouge.compute(
    predictions=peft_model_summaries,
    references=human_baseline_summaries[0:len(peft_model_summaries)],
    use_aggregator=True,
    use_stemmer=True,
)

print('ORIGINAL MODEL:')
print(original_model_results)
print('PEFT MODEL:')
print(peft_model_results)

print("Absolute percentage improvement of PEFT MODEL over ORIGINAL MODEL")

improvement = (np.array(list(peft_model_results.values())) - np.array(list(original_model_results.values())))
for key, value in zip(peft_model_results.keys(), improvement):
    print(f'{key}: {value*100:.2f}%')

ORIGINAL MODEL:
{'rouge1': 0.20253279604070257, 'rouge2': 0.0649589839530041, 'rougeL': 0.15137962833431348, 'rougeLsum': 0.17036497472682016}
PEFT MODEL:
{'rouge1': 0.4388984541596961, 'rouge2': 0.1637302244700488, 'rougeL': 0.3163552534377434, 'rougeLsum': 0.3168525507995582}
Absolute percentage improvement of PEFT MODEL over ORIGINAL MODEL
rouge1: 23.64%
rouge2: 9.88%
rougeL: 16.50%
rougeLsum: 14.65%
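
The figures above are differences in ROUGE score expressed in percentage points. For comparison, the sketch below recomputes them from the printed dictionaries and also shows the relative improvement over the original model, which is a separate quantity. The dictionaries are copied from the output above; the variable names are illustrative.

```python
original = {'rouge1': 0.20253279604070257, 'rouge2': 0.0649589839530041,
            'rougeL': 0.15137962833431348, 'rougeLsum': 0.17036497472682016}
peft = {'rouge1': 0.4388984541596961, 'rouge2': 0.1637302244700488,
        'rougeL': 0.3163552534377434, 'rougeLsum': 0.3168525507995582}

points = {}
relative = {}
for key in original:
    # Difference in score, expressed in percentage points
    points[key] = (peft[key] - original[key]) * 100
    # Gain relative to the original model's score, in percent
    relative[key] = (peft[key] / original[key] - 1) * 100
    print(f"{key}: +{points[key]:.2f} points, {relative[key]:.1f}% relative")
```

Keep in mind that these scores come from only 10 test dialogues, so they are a rough indication, not a stable estimate.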


### Reference
https://dassum.medium.com/fine-tune-large-language-model-llm-on-a-custom-dataset-with-qlora-fb60abdeba07