# Parameter Efficient Fine-Tuning of a Generative AI Model for Dialogue Summarization

This notebook demonstrates parameter-efficient fine-tuning (PEFT) of a large language model. It runs on CPU, so it uses a small model from Hugging Face: [t5-small](https://huggingface.co/google-t5/t5-small).

In [1]:
# Installing dependencies

%pip install --upgrade pip

%pip install -U datasets==2.17.0  --quiet
%pip install numpy==1.26.4  --quiet
%pip install tf-keras  --quiet
%pip install torch==2.2.0 \
    torchdata==0.10.1 --quiet

%pip install \
    transformers \
    evaluate==0.4.0 \
    rouge_score==0.1.2 \
    loralib==0.1.1 \
    peft==0.3.0 --quiet

Note: you may need to restart the kernel to use updated packages.


In [2]:
# Import dependencies

from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, GenerationConfig, TrainingArguments, Trainer
import torch
import time
import evaluate
import pandas as pd
import numpy as np



In [3]:
# Load dataset and LLM

huggingface_dataset_name = "knkarthick/dialogsum"

dataset = load_dataset(huggingface_dataset_name)

dataset

DatasetDict({
    train: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 12460
    })
    validation: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 500
    })
    test: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 1500
    })
})

In [4]:
# Load pre-trained model from huggingface

model_name='t5-small'

original_model = AutoModelForSeq2SeqLM.from_pretrained(model_name, torch_dtype=torch.float32)
tokenizer = AutoTokenizer.from_pretrained(model_name)

In [5]:
# Check number of trainable parameters

def print_number_of_trainable_model_parameters(model):
    trainable_model_params = 0
    all_model_params = 0
    for _, param in model.named_parameters():
        all_model_params += param.numel()
        if param.requires_grad:
            trainable_model_params += param.numel()
    return f"trainable model parameters: {trainable_model_params}\nall model parameters: {all_model_params}\npercentage of trainable model parameters: {100 * trainable_model_params / all_model_params:.2f}%"

print(print_number_of_trainable_model_parameters(original_model))

trainable model parameters: 60506624
all model parameters: 60506624
percentage of trainable model parameters: 100.00%


In [6]:
# Test the model with zero-shot inference

index = 200

dialogue = dataset['test'][index]['dialogue']
summary = dataset['test'][index]['summary']

prompt = f"""
Summarize the following conversation.

{dialogue}

Summary:
"""

inputs = tokenizer(prompt, return_tensors='pt')
output = tokenizer.decode(
    original_model.generate(
        inputs["input_ids"], 
        max_new_tokens=200,
    )[0], 
    skip_special_tokens=True
)

dash_line = '-' * 99  # same 99-character separator the join-based expression produced
print(dash_line)
print(f'INPUT PROMPT:\n{prompt}')
print(dash_line)
print(f'BASELINE HUMAN SUMMARY:\n{summary}\n')
print(dash_line)
print(f'MODEL GENERATION - ZERO SHOT:\n{output}')

---------------------------------------------------------------------------------------------------
INPUT PROMPT:

Summarize the following conversation.

#Person1#: Have you considered upgrading your system?
#Person2#: Yes, but I'm not sure what exactly I would need.
#Person1#: You could consider adding a painting program to your software. It would allow you to make up your own flyers and banners for advertising.
#Person2#: That would be a definite bonus.
#Person1#: You might also want to upgrade your hardware because it is pretty outdated now.
#Person2#: How can we do that?
#Person1#: You'd probably need a faster processor, to begin with. And you also need a more powerful hard disc, more memory and a faster modem. Do you have a CD-ROM drive?
#Person2#: No.
#Person1#: Then you might want to add a CD-ROM drive too, because most new software programs are coming out on Cds.
#Person2#: That sounds great. Thanks.

Summary:

-------------------------------------------------------------------

In [7]:
# Convert the dialog-summary (prompt-response) pairs into explicit instructions for the LLM

def tokenize_function(example):
    start_prompt = 'Summarize the following conversation.\n\n'
    end_prompt = '\n\nSummary: '
    prompt = [start_prompt + dialogue + end_prompt for dialogue in example["dialogue"]]
    example['input_ids'] = tokenizer(prompt, padding="max_length", truncation=True, return_tensors="pt").input_ids
    example['labels'] = tokenizer(example["summary"], padding="max_length", truncation=True, return_tensors="pt").input_ids
    
    return example

# The dataset contains three splits: train, validation, and test.
# dataset.map applies tokenize_function to every split, in batches.
tokenized_datasets = dataset.map(tokenize_function, batched=True)
tokenized_datasets = tokenized_datasets.remove_columns(['id', 'topic', 'dialogue', 'summary',])
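
The tokenization above pads `labels` with the tokenizer's pad token. A common refinement, not used in this notebook, is to replace those padding ids with `-100` so the loss skips them. The sketch below shows the idea on plain Python lists; it assumes the T5 pad id is `0`, and the token ids in `example_labels` are hypothetical values chosen for illustration.

```python
PAD_TOKEN_ID = 0      # assumption: T5 tokenizers use 0 as the pad token id
IGNORE_INDEX = -100   # label value that PyTorch cross-entropy ignores by default


def mask_padding(label_ids, pad_token_id=PAD_TOKEN_ID):
    """Replace padding ids in each label sequence with IGNORE_INDEX."""
    return [
        [tok if tok != pad_token_id else IGNORE_INDEX for tok in seq]
        for seq in label_ids
    ]


# Hypothetical padded label sequences (1 stands in for the end-of-sequence id).
example_labels = [[71, 1712, 1, 0, 0], [27, 1, 0, 0, 0]]
masked = mask_padding(example_labels)
print(masked)
```

Inside `tokenize_function`, the same replacement could be applied to the `labels` ids before they are stored, so that padded positions do not contribute to the training loss.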

In [8]:
tokenized_datasets = tokenized_datasets.filter(lambda example, index: index % 100 == 0, with_indices=True)

In [9]:
print("Shapes of the datasets:")
print(f"Training: {tokenized_datasets['train'].shape}")
print(f"Validation: {tokenized_datasets['validation'].shape}")
print(f"Test: {tokenized_datasets['test'].shape}")

print(tokenized_datasets)

Shapes of the datasets:
Training: (125, 2)
Validation: (5, 2)
Test: (15, 2)
DatasetDict({
    train: Dataset({
        features: ['input_ids', 'labels'],
        num_rows: 125
    })
    validation: Dataset({
        features: ['input_ids', 'labels'],
        num_rows: 5
    })
    test: Dataset({
        features: ['input_ids', 'labels'],
        num_rows: 15
    })
})
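
The row counts above follow directly from keeping every 100th example. As a quick check, this sketch recomputes them from the split sizes printed when the dataset was loaded; `kept_rows` is a helper name introduced here for illustration.

```python
def kept_rows(num_rows: int, stride: int = 100) -> int:
    """Count the indices i in [0, num_rows) for which i % stride == 0."""
    return len(range(0, num_rows, stride))


# Split sizes as printed by the dataset-loading cell.
split_sizes = {"train": 12460, "validation": 500, "test": 1500}
subsampled = {name: kept_rows(n) for name, n in split_sizes.items()}
print(subsampled)
```

With only 125 training rows, the run is a smoke test of the pipeline rather than a meaningful fine-tune.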


In [10]:
# Create LoRA config

from peft import LoraConfig, get_peft_model, TaskType

lora_config = LoraConfig(
    r=32, # Rank
    lora_alpha=32,
    target_modules=["q", "v"],
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.SEQ_2_SEQ_LM
)

In [11]:
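# Add LoRA adapter layers/parameters to the original LLM to be trained.
# Caution: get_peft_model is understood to inject the adapters into original_model in place,
# so later calls to original_model may also run through the LoRA layers.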

peft_model = get_peft_model(original_model, 
                            lora_config)
print(print_number_of_trainable_model_parameters(peft_model))

trainable model parameters: 1179648
all model parameters: 61686272
percentage of trainable model parameters: 1.91%
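
The trainable-parameter count can be reproduced by hand. The sketch below assumes the published t5-small dimensions (hidden size 512, 8 heads of size 64, 6 encoder and 6 decoder layers), which are not stated in this notebook, and counts one pair of LoRA matrices per adapted `q` or `v` projection.

```python
# Assumed t5-small dimensions (from its published config, not from this notebook).
D_MODEL = 512              # hidden size, input width of the q/v projections
INNER_DIM = 8 * 64         # num_heads * d_kv, output width of the q/v projections
ENCODER_LAYERS = 6
DECODER_LAYERS = 6
RANK = 32                  # r in lora_config
TARGETS_PER_ATTENTION = 2  # "q" and "v"


def lora_params_per_linear(in_features: int, out_features: int, rank: int) -> int:
    """LoRA adds A (rank x in_features) and B (out_features x rank) per layer."""
    return rank * in_features + out_features * rank


# Each encoder layer has self-attention; each decoder layer has self- and cross-attention.
attention_blocks = ENCODER_LAYERS * 1 + DECODER_LAYERS * 2
adapted_linears = attention_blocks * TARGETS_PER_ATTENTION
lora_trainable = adapted_linears * lora_params_per_linear(D_MODEL, INNER_DIM, RANK)

base_params = 60_506_624   # "all model parameters" printed for the original model
total_params = base_params + lora_trainable
print(lora_trainable, total_params, f"{100 * lora_trainable / total_params:.2f}%")
```

Under these assumptions the count scales linearly with `r`, so halving the rank would halve the number of trainable parameters.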


In [12]:
# Train PEFT Adapter

output_dir = f'./peft-dialogue-summary-training-{str(int(time.time()))}'

peft_training_args = TrainingArguments(
    output_dir=output_dir,
    auto_find_batch_size=True,
    learning_rate=1e-3,
    num_train_epochs=1,
    logging_steps=1,
    max_steps=1,  # overrides num_train_epochs: training stops after one optimizer step
)
    
peft_trainer = Trainer(
    model=peft_model,
    args=peft_training_args,
    train_dataset=tokenized_datasets["train"],
)

In [13]:
peft_trainer.train()

peft_model_path="./peft-dialogue-summary-checkpoint-local"

peft_trainer.model.save_pretrained(peft_model_path)
tokenizer.save_pretrained(peft_model_path)

Passing a tuple of `past_key_values` is deprecated and will be removed in Transformers v4.48.0. You should pass an instance of `EncoderDecoderCache` instead, e.g. `past_key_values=EncoderDecoderCache.from_legacy_cache(past_key_values)`.


Step,Training Loss
1,2.1381


('./peft-dialogue-summary-checkpoint-local/tokenizer_config.json',
 './peft-dialogue-summary-checkpoint-local/special_tokens_map.json',
 './peft-dialogue-summary-checkpoint-local/tokenizer.json')

In [14]:
# Prepare model for inference

from peft import PeftModel, PeftConfig

peft_model_base = AutoModelForSeq2SeqLM.from_pretrained("t5-small", torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained("t5-small")

peft_model = PeftModel.from_pretrained(peft_model_base, 
                                       './peft-dialogue-summary-checkpoint-local', 
                                       torch_dtype=torch.bfloat16,
                                       is_trainable=False)

In [15]:
# Evaluate the Model Qualitatively (Human Evaluation)

original_model = original_model.to("cpu")

index = 200
dialogue = dataset['test'][index]['dialogue']
human_baseline_summary = dataset['test'][index]['summary']

prompt = f"""
Summarize the following conversation.

{dialogue}

Summary: """

input_ids = tokenizer(prompt, return_tensors="pt").input_ids

original_model_outputs = original_model.generate(input_ids=input_ids, generation_config=GenerationConfig(max_new_tokens=200, num_beams=1))
original_model_text_output = tokenizer.decode(original_model_outputs[0], skip_special_tokens=True)

peft_model_outputs = peft_model.generate(input_ids=input_ids, generation_config=GenerationConfig(max_new_tokens=200, num_beams=1))
peft_model_text_output = tokenizer.decode(peft_model_outputs[0], skip_special_tokens=True)

print(dash_line)
print(f'BASELINE HUMAN SUMMARY:\n{human_baseline_summary}')
print(dash_line)
print(f'ORIGINAL MODEL:\n{original_model_text_output}')
print(dash_line)
print(f'PEFT MODEL: {peft_model_text_output}')

---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# teaches #Person2# how to upgrade software and hardware in #Person2#'s system.
---------------------------------------------------------------------------------------------------
ORIGINAL MODEL:
:: You might also want to add a new processor, or a CD-ROM drive..Person2#: Yes, but I'm not sure what exactly I would need.:: Yes, but I'm not sure what exactly I would need. upgrading your system? #Person2#: Yes.: Yes, but painting programs..,::::: it:
---------------------------------------------------------------------------------------------------
PEFT MODEL: ::: You might want to upgrade your hardware because it is pretty outdated now.: No., more memory and a faster modem.: Yes, but I'm not sure what exactly I would need.: Yes.: Yes.: You might want to add a painting program to your software. It would allow you to make up your own flyers and banners for adve

In [16]:
# Generate summaries for the first 10 test dialogues; ROUGE is computed on them in the next cell.

original_model = original_model.to("cpu")

dialogues = dataset['test'][0:10]['dialogue']
human_baseline_summaries = dataset['test'][0:10]['summary']

original_model_summaries = []
peft_model_summaries = []

for idx, dialogue in enumerate(dialogues):
    prompt = f"""
Summarize the following conversation.

{dialogue}

Summary: """
    
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids

    human_baseline_text_output = human_baseline_summaries[idx]
    
    original_model_outputs = original_model.generate(input_ids=input_ids, generation_config=GenerationConfig(max_new_tokens=200))
    original_model_text_output = tokenizer.decode(original_model_outputs[0], skip_special_tokens=True)

    peft_model_outputs = peft_model.generate(input_ids=input_ids, generation_config=GenerationConfig(max_new_tokens=200))
    peft_model_text_output = tokenizer.decode(peft_model_outputs[0], skip_special_tokens=True)

    original_model_summaries.append(original_model_text_output)
    peft_model_summaries.append(peft_model_text_output)

zipped_summaries = list(zip(human_baseline_summaries, original_model_summaries, peft_model_summaries))
 
df = pd.DataFrame(zipped_summaries, columns = ['human_baseline_summaries', 'original_model_summaries', 'peft_model_summaries'])
df

,human_baseline_summaries,original_model_summaries,peft_model_summaries
0,Ms. Dawson helps #Person1# to write a memo to ...,"#Person2#: Yes, sir. Go ahead. #Person1#: Yes,...",": Yes, sir. Go ahead. #Person2#: Yes. Please g..."
1,In order to prevent employees from wasting tim...,": Yes, sir... #Person1#: Yes, sir. Go ahead. #...",": Yes, sir. Go ahead. #Person2#: Yes. Please g..."
2,Ms. Dawson takes a dictation for #Person1# abo...,": Yes, sir. Go ahead. #Person1#: Yes. Yes. Any...",": Yes, sir. Go ahead. #Person2#: Yes. Please g..."
3,#Person2# arrives late because of traffic jam....,: I'm going to miss having the freedom that yo...,: I'm going to really miss having the freedom ...
4,#Person2# decides to follow #Person1#'s sugges...,#Person1#: I'm going to quit driving to work. ...,: I'm going to really miss having the freedom ...
5,#Person2# complains to #Person1# about the tra...,: I'm going to be1#: You're finally here! What...,: I'm going to really miss having the freedom ...
6,#Person1# tells Kate that Masha and Hero get d...,"#Person1#: Kate, you never believe what's happ...",":: Kate, you never believe what's happened.: K..."
7,#Person1# tells Kate that Masha and Hero are g...,": Kate, you never believe what's happened. #Pe...",":: Kate, you never believe what's happened.: K..."
8,#Person1# and Kate talk about the divorce betw...,: You never believe what happened. #Person2#: ...,":: Kate, you never believe what's happened.: K..."
9,#Person1# and Brian are at the birthday party ...,":: Happy Birthday, Brian.: Happy Birthday, thi...",":: Happy Birthday, this is for you, Brian.: Ha..."


In [17]:
rouge = evaluate.load('rouge')

original_model_results = rouge.compute(
    predictions=original_model_summaries,
    references=human_baseline_summaries[0:len(original_model_summaries)],
    use_aggregator=True,
    use_stemmer=True,
)

peft_model_results = rouge.compute(
    predictions=peft_model_summaries,
    references=human_baseline_summaries[0:len(peft_model_summaries)],
    use_aggregator=True,
    use_stemmer=True,
)

print('ORIGINAL MODEL:')
print(original_model_results)
print('PEFT MODEL:')
print(peft_model_results)

ORIGINAL MODEL:
{'rouge1': 0.16174146405808468, 'rouge2': 0.035647851079991105, 'rougeL': 0.139505346603512, 'rougeLsum': 0.1394246501933532}
PEFT MODEL:
{'rouge1': 0.1407530475468729, 'rouge2': 0.014619390368707626, 'rougeL': 0.11237975440934007, 'rougeLsum': 0.11372318411742868}
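
For intuition about what these numbers measure, here is a simplified sketch of a ROUGE-1 F-measure: the overlap of unigrams between a prediction and a reference. It is an illustration only, not the `rouge_score` implementation. It assumes lowercasing and whitespace tokenization and skips the stemming enabled by `use_stemmer=True`; `rouge1_f1` is a name introduced here.

```python
from collections import Counter


def rouge1_f1(prediction: str, reference: str) -> float:
    """Simplified ROUGE-1 F-measure using lowercase whitespace tokens."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Clipped unigram overlap: each token counts at most as often as it
    # appears in both texts.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Five of the six unigrams match, so precision and recall are both 5/6.
score = rouge1_f1("the cat sat on the mat", "the cat lay on the mat")
print(round(score, 4))
```

ROUGE-2 applies the same idea to bigrams, and ROUGE-L uses the longest common subsequence instead of n-gram counts.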


In [18]:
print("Absolute percentage improvement of PEFT MODEL over ORIGINAL MODEL")

# Positive values mean the PEFT model scored higher than the original model.
improvement = (np.array(list(peft_model_results.values())) - np.array(list(original_model_results.values())))
for key, value in zip(peft_model_results.keys(), improvement):
    print(f'{key}: {value*100:.2f}%')

Absolute percentage improvement of PEFT MODEL over ORIGINAL MODEL
rouge1: -2.10%
rouge2: -2.10%
rougeL: -2.71%
rougeLsum: -2.57%
