## **Perform Parameter Efficient Fine-Tuning (PEFT)**


Now install the required packages for the LLM and datasets.

In [None]:

%pip install \
    transformers \
    datasets \
    evaluate \
    rouge_score \
    loralib \
    peft --quiet

In [None]:
from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, GenerationConfig, TrainingArguments, Trainer
import torch
import time
import evaluate
import pandas as pd
import numpy as np

### Load Dataset and LLM
We are going to continue experimenting with the DialogSum Hugging Face dataset. It contains 10,000+ dialogues with the corresponding manually labeled summaries and topics.

In [None]:
huggingface_dataset_name = "knkarthick/dialogsum"

dataset = load_dataset(huggingface_dataset_name)

dataset

DatasetDict({
    train: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 12460
    })
    validation: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 500
    })
    test: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 1500
    })
})

Load the pre-trained FLAN-T5-small model and its tokenizer directly from Hugging Face.

In [None]:

model_name='google/flan-t5-small'

original_model = AutoModelForSeq2SeqLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_name)

You can count the model's parameters and check how many of them are trainable. The following function does that; at this stage, you do not need to go into the details of it.

In [None]:
def print_number_of_trainable_model_parameters(model):
    all_model_params = model.num_parameters()
    trainable_model_params = sum(param.numel() for param in model.parameters() if param.requires_grad)

    percentage_trainable = 100 * trainable_model_params / all_model_params if all_model_params > 0 else 0

    return (f"Trainable model parameters: {trainable_model_params}\n"
            f"All model parameters: {all_model_params}\n"
            f"Percentage of trainable model parameters: {percentage_trainable:.2f}%")

print(print_number_of_trainable_model_parameters(original_model))

Trainable model parameters: 76961152
All model parameters: 76961152
Percentage of trainable model parameters: 100.00%


### Preprocess the Dialog-Summary Dataset


You need to convert the dialog-summary (prompt-response) pairs into explicit instructions for the LLM. Prepend the instruction "Summarize the following conversation." to the start of the dialog and append "Summary:" to its end, so that the summary follows as the response:

Training prompt (dialogue):

    Summarize the following conversation.

    Chris: This is his part of the conversation.
    Antje: This is her part of the conversation.

    Summary:

Training response (summary):

Both Chris and Antje participated in the conversation.
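
As a minimal sketch of the template above, the prompt can be assembled with plain string concatenation. The helper name `build_prompt` and the two constants are hypothetical and not part of the notebook; the template strings themselves are the ones used in the tokenization step.

```python
# Hypothetical helper (not part of the notebook) that applies the prompt template.
START_PROMPT = "Summarize the following conversation.\n\n"
END_PROMPT = "\n\nSummary: "


def build_prompt(dialogue: str) -> str:
    """Wrap a raw dialogue in the instruction template."""
    return START_PROMPT + dialogue + END_PROMPT


dialogue = (
    "Chris: This is his part of the conversation.\n"
    "Antje: This is her part of the conversation."
)
prompt = build_prompt(dialogue)
print(prompt)
```

The model is then expected to continue the text after "Summary: " with the reference summary.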

Then preprocess the prompt-response dataset into tokens and pull out their input_ids (1 per token).

In [None]:
def tokenize_function(example):
    start_prompt = 'Summarize the following conversation.\n\n'
    end_prompt = '\n\nSummary: '
    prompt = [start_prompt + dialogue + end_prompt for dialogue in example["dialogue"]]
    example['input_ids'] = tokenizer(prompt, padding="max_length", truncation=True, return_tensors="pt").input_ids
    example['labels'] = tokenizer(example["summary"], padding="max_length", truncation=True, return_tensors="pt").input_ids

    return example

# The dataset contains three splits: train, validation, and test.
# With batched=True, map() applies tokenize_function to every split in batches.
tokenized_datasets = dataset.map(tokenize_function, batched=True)
tokenized_datasets = tokenized_datasets.remove_columns(['id', 'topic', 'dialogue', 'summary',])
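
Because the labels are padded to `max_length`, the padded positions contain the pad-token id. A common refinement, which this notebook does not apply, is to replace those positions with -100 so that the cross-entropy loss skips them. The sketch below shows the idea on toy token ids; `mask_padding` is a hypothetical helper, and the ids are illustrative rather than real tokenizer output.

```python
# Hypothetical helper (not part of the notebook).
# -100 is the index that PyTorch's cross-entropy loss ignores by default.
IGNORE_INDEX = -100


def mask_padding(label_ids, pad_token_id):
    """Replace pad-token ids with IGNORE_INDEX in a batch of label sequences."""
    return [
        [tok if tok != pad_token_id else IGNORE_INDEX for tok in seq]
        for seq in label_ids
    ]


# Toy label ids: 0 stands in for the pad id (illustrative values only).
labels = [[71, 4521, 1, 0, 0], [37, 1, 0, 0, 0]]
masked = mask_padding(labels, pad_token_id=0)
print(masked)
```

In the notebook itself, the pad id would come from `tokenizer.pad_token_id` rather than a hard-coded value.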

### Set Up the PEFT/LoRA Model for Fine-Tuning
You need to set up the PEFT/LoRA model for fine-tuning with a new layer/parameter adapter. Using PEFT/LoRA, you are freezing the underlying LLM and only training the adapter. Have a look at the LoRA configuration below. Note the rank (r) hyper-parameter, which defines the rank/dimension of the adapter to be trained.

In [None]:
from peft import LoraConfig, get_peft_model, TaskType

lora_config = LoraConfig(
    r=32, # Rank
    lora_alpha=32,
    target_modules=["q", "v"],
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.SEQ_2_SEQ_LM # FLAN-T5-small
)

Add LoRA adapter layers/parameters to the original LLM to be trained.

In [None]:
peft_model = get_peft_model(original_model,
                            lora_config)
print(print_number_of_trainable_model_parameters(peft_model))

Trainable model parameters: 1376256
All model parameters: 78337408
Percentage of trainable model parameters: 1.76%
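
The trainable-parameter count can be checked by hand. The sketch below assumes the FLAN-T5-small dimensions from the published model configuration (hidden size 512, 6 attention heads of size 64, 8 encoder and 8 decoder layers); these values are not stated in this notebook, so treat them as assumptions. Each adapted projection gains two low-rank matrices, A and B, and LoRA is applied to the `q` and `v` projections of every attention block.

```python
# Assumed FLAN-T5-small dimensions (from the published model config, not from this notebook).
d_model = 512            # hidden size
inner_dim = 6 * 64       # num_heads * d_kv: output size of the q and v projections
num_encoder_layers = 8
num_decoder_layers = 8
r = 32                   # LoRA rank from lora_config

# Each adapted projection gains A (r x d_model) and B (inner_dim x r).
params_per_module = r * d_model + inner_dim * r

# One attention block per encoder layer, two per decoder layer
# (self-attention and cross-attention); q and v are adapted in each block.
attention_blocks = num_encoder_layers + 2 * num_decoder_layers
adapted_modules = 2 * attention_blocks

lora_params = params_per_module * adapted_modules
print(lora_params)             # 1376256
print(76961152 + lora_params)  # 78337408
```

Under these assumptions the arithmetic reproduces both numbers printed above: 1,376,256 trainable adapter parameters on top of the 76,961,152 frozen base parameters.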


In [None]:
import os
os.environ["WANDB_DISABLED"] = "true"

### Train PEFT Adapter
Define training arguments and create Trainer instance.

In [None]:
output_dir = f'./peft-dialogue-summary-training-{str(int(time.time()))}'

peft_training_args = TrainingArguments(
    output_dir=output_dir,
    auto_find_batch_size=True,
    learning_rate=1e-3, # Higher learning rate than full fine-tuning.
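    num_train_epochs=10, # Ignored here: a positive max_steps takes precedence.
    logging_steps=20, # Larger than max_steps, so no training loss is logged in this run.
    max_steps=10, # Stop after 10 optimizer steps (a quick demo, not a full training run).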
    label_names=['labels']
)

peft_trainer = Trainer(
    model=peft_model,
    args=peft_training_args,
    train_dataset=tokenized_datasets["train"],

)

Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
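
A quick calculation shows how short this run is. In `TrainingArguments`, a positive `max_steps` takes precedence over `num_train_epochs`, so training stops after 10 optimizer steps. The sketch assumes the Trainer's default per-device batch size of 8, a single device, and no gradient accumulation; `auto_find_batch_size` may lower the batch size if memory runs out, so the real numbers can differ.

```python
import math

train_examples = 12460  # size of the train split shown earlier
batch_size = 8          # assumption: Trainer default, one device, no gradient accumulation
max_steps = 10          # value passed to TrainingArguments

steps_per_epoch = math.ceil(train_examples / batch_size)
examples_seen = max_steps * batch_size
fraction_of_epoch = examples_seen / train_examples

print(steps_per_epoch)             # 1558
print(examples_seen)               # 80
print(f"{fraction_of_epoch:.2%}")  # 0.64%
```

Under these assumptions the adapter sees well under 1% of a single epoch, which is worth keeping in mind when reading the evaluation results.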


Now everything is ready to train the PEFT adapter and save the model.

In [None]:

peft_trainer.train()

peft_model_path="./peft-dialogue-summary-checkpoint-local"

peft_trainer.model.save_pretrained(peft_model_path)
tokenizer.save_pretrained(peft_model_path)

Step,Training Loss


('./peft-dialogue-summary-checkpoint-local/tokenizer_config.json',
 './peft-dialogue-summary-checkpoint-local/special_tokens_map.json',
 './peft-dialogue-summary-checkpoint-local/spiece.model',
 './peft-dialogue-summary-checkpoint-local/added_tokens.json',
 './peft-dialogue-summary-checkpoint-local/tokenizer.json')

Prepare this model by adding an adapter to the original FLAN-T5-small model. We are setting is_trainable=False because the plan is only to perform inference with this PEFT model. If you were preparing the model for further training, you would set is_trainable=True.

In [None]:
from peft import PeftModel, PeftConfig

peft_model_base = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small", torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")

peft_model = PeftModel.from_pretrained(peft_model_base,
                                       './peft-dialogue-summary-checkpoint-local',
                                       torch_dtype=torch.bfloat16,
                                       is_trainable=False)

The number of trainable parameters is 0 because of the is_trainable=False setting:

In [None]:
print(print_number_of_trainable_model_parameters(peft_model))

Trainable model parameters: 0
All model parameters: 78337408
Percentage of trainable model parameters: 0.00%


### Evaluate the Model Quantitatively (with ROUGE Metric)
Generate summaries for a small sample of the test dataset (only 10 dialogues and their reference summaries, to save time), then score them with ROUGE.

In [None]:
from evaluate import load
rouge = load("rouge")
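# Caveat: get_peft_model modifies the base model in place, so original_model still carries
# the LoRA layers injected above; reload the base checkpoint to get a clean baseline.
original_model = original_model.to("cuda")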
peft_model = peft_model.to('cuda')
# 1) Take 10 test dialogues and their 10 human-written summaries.
test_dialogues = dataset["test"][0:10]["dialogue"]
reference_summaries = dataset["test"][0:10]["summary"]
# 2) Generate summaries with each model.
predictions_original = []
for dialogue in test_dialogues:
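    # The raw dialogue is tokenized as-is, without the "Summarize the following conversation." prompt used in training.
    inputs = tokenizer(dialogue, return_tensors="pt").to("cuda")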
    outputs = original_model.generate(**inputs, max_new_tokens=200)
    summary = tokenizer.decode(outputs[0], skip_special_tokens=True)
    predictions_original.append(summary)

predictions_peft = []
for dialogue in test_dialogues:
    inputs = tokenizer(dialogue, return_tensors="pt").to("cuda")
    outputs = peft_model.generate(**inputs, max_new_tokens=200)
    summary = tokenizer.decode(outputs[0], skip_special_tokens=True)
    predictions_peft.append(summary)



scores_original = rouge.compute(
    predictions=predictions_original,
    references=reference_summaries,
    use_stemmer=True,
)

scores_peft = rouge.compute(
    predictions=predictions_peft,
    references=reference_summaries,
    use_stemmer=True,
)

print("ORIGINAL MODEL:")
print(scores_original)

print("PEFT MODEL:")
print(scores_peft)


ORIGINAL MODEL:
{'rouge1': np.float64(0.14700941637677367), 'rouge2': np.float64(0.027025045396923287), 'rougeL': np.float64(0.1373872208920477), 'rougeLsum': np.float64(0.1370316385333234)}
PEFT MODEL:
{'rouge1': np.float64(0.09365009990009993), 'rouge2': np.float64(0.006666666666666666), 'rougeL': np.float64(0.07498479330375882), 'rougeLsum': np.float64(0.07545451036830347)}
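
To make the `rouge1` number more concrete, here is a simplified sketch of a ROUGE-1 F-measure based on unigram overlap. It is not the implementation behind `evaluate.load("rouge")`, which uses its own tokenization and optional stemming; `rouge1_f1` is a hypothetical helper that splits on whitespace only.

```python
from collections import Counter


def rouge1_f1(prediction: str, reference: str) -> float:
    """Simplified ROUGE-1 F-measure: unigram overlap, whitespace tokens, no stemming."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Clipped overlap: each token counts at most as often as it appears in both texts.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


reference = "both chris and antje participated in the conversation"
prediction = "chris and antje talked"
score = rouge1_f1(prediction, reference)
print(round(score, 3))  # 0.5
```

Here 3 of the 4 predicted words appear in the 8-word reference, giving a precision of 0.75, a recall of 0.375, and an F-measure of 0.5.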


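Calculate the difference between the PEFT model's ROUGE scores and the original model's. The values are absolute differences expressed in percentage points; a negative value means the PEFT model scored lower:

In [None]:
print("Absolute ROUGE difference (PEFT model minus original model), in percentage points:")
for metric in scores_original:
    diff = scores_peft[metric] - scores_original[metric]
    print(f"{metric}: {diff * 100:.2f}")


Absolute ROUGE difference (PEFT model minus original model), in percentage points:
rouge1: -5.34
rouge2: -2.04
rougeL: -6.24
rougeLsum: -6.16

On this 10-dialogue sample, and after only 10 training steps, the PEFT model scores lower than the original model on every ROUGE metric.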
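
The differences printed above are absolute: one score subtracted from the other, multiplied by 100. A relative change, measured against the original model's score, can be sketched as follows. The score dictionaries are rounded copies of the ROUGE output shown earlier, hard-coded so that the snippet runs on its own.

```python
# ROUGE scores copied (rounded to 4 decimals) from the evaluation output above.
scores_original = {"rouge1": 0.1470, "rouge2": 0.0270, "rougeL": 0.1374, "rougeLsum": 0.1370}
scores_peft = {"rouge1": 0.0937, "rouge2": 0.0067, "rougeL": 0.0750, "rougeLsum": 0.0755}

relative_change = {}
for metric, original_score in scores_original.items():
    # Relative change in percent: (new - old) / old * 100.
    relative_change[metric] = (scores_peft[metric] - original_score) / original_score * 100

for metric, change in relative_change.items():
    print(f"{metric}: {change:.1f}%")
```

Reporting both views is a design choice: the absolute difference is easier to compare across metrics, while the relative change shows how large the drop is compared with the baseline.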
