# Fine-Tune a Generative AI Model for Dialogue Summarization

Fine-tune the FLAN-T5 model from Hugging Face to improve dialogue summarization. Both full fine-tuning and Parameter Efficient Fine-Tuning (PEFT) are explored, and the results are evaluated with ROUGE metrics.

In [2]:
from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, GenerationConfig, TrainingArguments, Trainer
import torch
import time
import evaluate
import pandas as pd
import numpy as np



In [3]:
hf_dataset_name = "knkarthick/dialogsum"
dataset = load_dataset(hf_dataset_name)
print(dataset)

DatasetDict({
    train: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 12460
    })
    validation: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 500
    })
    test: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 1500
    })
})


In [4]:
model_name = "google/flan-t5-base"
# original_model = AutoModelForSeq2SeqLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
# 'xpu' is PyTorch's device type for Intel GPUs; use 'cuda' or 'cpu' on other hardware
original_model = AutoModelForSeq2SeqLM.from_pretrained(model_name).to('xpu')
tokenizer = AutoTokenizer.from_pretrained(model_name)

The following function counts the model's parameters and reports how many of them are trainable.

In [5]:
def print_number_of_trainable_model_parameters(model):
    trainable_model_params = 0
    all_model_params = 0

    for _, param in model.named_parameters():
        all_model_params += param.numel()
        if param.requires_grad:
            trainable_model_params += param.numel()
    
    return f"trainable model parameters: {trainable_model_params}\nall model parameters: {all_model_params}\npercentage of trainable model parameters {trainable_model_params/all_model_params * 100}%"

print(print_number_of_trainable_model_parameters(original_model)) 

trainable model parameters: 247577856
all model parameters: 247577856
percentage of trainable model parameters 100.0%
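With full fine-tuning, every parameter is trainable. Once the PEFT step freezes the base model and trains only small adapter matrices, the same function should report a much smaller fraction. The back-of-the-envelope count below is a sketch under assumptions that this notebook has not stated yet: LoRA with rank 32 applied to the query (`q`) and value (`v`) projections of every attention block in FLAN-T5-base (hidden size 768, 12 encoder and 12 decoder layers).

```python
# Back-of-the-envelope LoRA parameter count for FLAN-T5-base.
# Assumptions (not from this notebook): LoRA rank 32 on the "q" and "v"
# projections of every attention block; base weights frozen.
d_model = 768        # hidden size of flan-t5-base (q and v are 768 x 768)
rank = 32            # assumed LoRA rank
num_layers = 12      # encoder layers; the decoder also has 12

# Encoder blocks have one self-attention; decoder blocks have self-attention
# plus cross-attention. Each attention block adapts two matrices (q and v).
encoder_matrices = num_layers * 1 * 2
decoder_matrices = num_layers * 2 * 2
adapted_matrices = encoder_matrices + decoder_matrices

# LoRA learns A (d_in x r) and B (r x d_out) per adapted d_in x d_out matrix
lora_params_per_matrix = rank * (d_model + d_model)
trainable = adapted_matrices * lora_params_per_matrix

base_params = 247_577_856  # "all model parameters" reported above
total = base_params + trainable
print(f"trainable model parameters: {trainable}")
print(f"all model parameters: {total}")
print(f"percentage of trainable model parameters {trainable / total * 100:.2f}%")
```

The rank and target modules are the main knobs: doubling the rank doubles the adapter size, while the frozen base weights stay the same.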


### 1.3 Test the Model with Zero-Shot Inference

In [6]:
index = 200

dialogue = dataset["test"][index]["dialogue"]
summary = dataset["test"][index]["summary"]

prompt = f"""
Summarize the following conversation.

{dialogue}

Summary:
"""

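# Tokenize the prompt, generate up to 200 new tokens, and decode the result
inputs = tokenizer(prompt, return_tensors="pt")
output = tokenizer.decode(
    original_model.generate(inputs["input_ids"].to('xpu'), max_new_tokens=200)[0],
    skip_special_tokens=True,
)

dash_line = "-" * 49  # same 49-character separator as "-".join("" for i in range(50))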
print(prompt)
print(dash_line)
print("Baseline Summary:\n", summary)
print(dash_line)
print("Model Generation - Zero Shot:\n", output)



Summarize the following conversation.

#Person1#: Have you considered upgrading your system?
#Person2#: Yes, but I'm not sure what exactly I would need.
#Person1#: You could consider adding a painting program to your software. It would allow you to make up your own flyers and banners for advertising.
#Person2#: That would be a definite bonus.
#Person1#: You might also want to upgrade your hardware because it is pretty outdated now.
#Person2#: How can we do that?
#Person1#: You'd probably need a faster processor, to begin with. And you also need a more powerful hard disc, more memory and a faster modem. Do you have a CD-ROM drive?
#Person2#: No.
#Person1#: Then you might want to add a CD-ROM drive too, because most new software programs are coming out on Cds.
#Person2#: That sounds great. Thanks.

Summary:

-------------------------------------------------
Baseline Summary:
 #Person1# teaches #Person2# how to upgrade software and hardware in #Person2#'s system.
------------------------
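This notebook writes the summarization prompt three slightly different ways: the zero-shot cell above (leading newline, `Summary:` on its own line), `tokenize_function` (`\n\nSummary: `), and the evaluation loop (a colon after "conversation" plus indentation). An instruction-tuned model may respond differently to such wording and whitespace changes, so it can help to define the template once. The `build_prompt` and `build_prompts` helpers below are hypothetical, not part of the original notebook; the template string is the one `tokenize_function` uses.

```python
# Hypothetical helpers: define the prompt template in one place so that
# training and evaluation share exactly the same wording and whitespace.
START_PROMPT = "Summarize the following conversation.\n\n"
END_PROMPT = "\n\nSummary: "


def build_prompt(dialogue: str) -> str:
    """Wrap a single dialogue in the summarization instruction."""
    return START_PROMPT + dialogue + END_PROMPT


def build_prompts(dialogues: list[str]) -> list[str]:
    """Batched version, matching how dataset.map(..., batched=True) passes a column."""
    return [build_prompt(d) for d in dialogues]


example = "#Person1#: Have you considered upgrading your system?\n#Person2#: Yes."
print(build_prompt(example))
```

`tokenize_function` could then call `build_prompts(example["dialogue"])`, and the zero-shot and evaluation cells could call `build_prompt(dialogue)`.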

## 2. Perform Full Fine-Tuning

### 2.1 Preprocess the Dialog-Summary Dataset

In [7]:
def tokenize_function(example):
    start_prompt = "Summarize the following conversation.\n\n"
    end_prompt = "\n\nSummary: "
    # With batched=True, example["dialogue"] is a list of dialogues
    # (with batched=False it would be one string: start_prompt + example["dialogue"] + end_prompt)
    prompt = [start_prompt + dialogue + end_prompt for dialogue in example["dialogue"]]
    # Keep the tokenized features on the CPU: dataset.map stores them in Arrow,
    # and the Trainer moves each batch to the training device itself
    example['input_ids'] = tokenizer(prompt, padding="max_length", truncation=True, return_tensors="pt").input_ids
    example['labels'] = tokenizer(example['summary'], padding="max_length", truncation=True, return_tensors="pt").input_ids
    return example

# The dataset contains three splits: train, validation, and test.
# dataset.map applies tokenize_function to every split, in batches.
tokenized_dataset = dataset.map(tokenize_function, batched=True)
tokenized_dataset = tokenized_dataset.remove_columns(['id', 'topic', 'dialogue', 'summary'])
# print(tokenized_dataset['validation'][0]['input_ids'])

Map: 100%|██████████| 500/500 [00:00<00:00, 1736.53 examples/s]
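Because the labels are padded to `max_length` with the pad token, the loss is also computed on those padded positions. A common refinement, not applied in this notebook, is to replace pad ids in the labels with -100, the default `ignore_index` of PyTorch's cross-entropy loss, so padding does not contribute to training. The sketch below works on plain lists; `mask_label_padding` is a hypothetical helper, and the non-pad token ids in the example are arbitrary (T5 tokenizers use 0 for padding and 1 for end-of-sequence).

```python
# Hypothetical post-processing step: replace padding ids in the labels with
# -100 so the cross-entropy loss skips padded positions.
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def mask_label_padding(label_batch, pad_token_id):
    """Return a copy of label_batch with every pad_token_id replaced by IGNORE_INDEX."""
    return [
        [tok if tok != pad_token_id else IGNORE_INDEX for tok in labels]
        for labels in label_batch
    ]


# Arbitrary example ids; 1 is T5's end-of-sequence token and 0 its pad token
labels = [[363, 19, 1, 0, 0], [8, 1, 0, 0, 0]]
print(mask_label_padding(labels, pad_token_id=0))
```

Inside `tokenize_function`, this would amount to converting the labels tensor with `.tolist()` and passing it through `mask_label_padding(..., tokenizer.pad_token_id)`. T5 shifts the labels right to build decoder inputs and swaps -100 back to the pad id at that point, so the masked labels remain valid model inputs.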


In [8]:
# To save some time, subsample the dataset:

tokenized_dataset = tokenized_dataset.filter(lambda example, index: index % 100 == 0, with_indices=True)

Filter: 100%|██████████| 500/500 [00:00<00:00, 2447.95 examples/s]
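Keeping only indices that are multiples of 100 leaves `ceil(n / 100)` rows per split. The quick check below uses the split sizes from `print(dataset)` and reproduces the shapes printed in the next cell.

```python
# Sanity check for the subsampling: indices 0, 100, 200, ... survive the filter.
import math

split_sizes = {"train": 12460, "validation": 500, "test": 1500}  # from print(dataset)

kept_sizes = {split: len(range(0, n, 100)) for split, n in split_sizes.items()}

for split, n in split_sizes.items():
    assert kept_sizes[split] == math.ceil(n / 100)
    print(f"{split}: {n} -> {kept_sizes[split]}")
```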


In [9]:
print("Shapes of dataset:")
print(f"Training: {tokenized_dataset['train'].shape}")
print(f"Validation: {tokenized_dataset['validation'].shape}")
print(f"Test: {tokenized_dataset['test'].shape}")
print(tokenized_dataset)

Shapes of dataset:
Training: (125, 2)
Validation: (5, 2)
Test: (15, 2)
DatasetDict({
    train: Dataset({
        features: ['input_ids', 'labels'],
        num_rows: 125
    })
    validation: Dataset({
        features: ['input_ids', 'labels'],
        num_rows: 5
    })
    test: Dataset({
        features: ['input_ids', 'labels'],
        num_rows: 15
    })
})


### 2.2 Fine-Tune the Model with the Preprocessed Dataset

In [10]:
# output_dir = f"./dialogue-summary-training-{str(int(time.time()))}"

# training_args = TrainingArguments(
#     output_dir=output_dir,
#     learning_rate=1e-5,
#     num_train_epochs=1,
#     weight_decay=0.01,
#     logging_steps=1,
#     max_steps=1
# )

# trainer = Trainer(
#     model=original_model,
#     args=training_args,
#     train_dataset=tokenized_dataset['train'],
#     eval_dataset=tokenized_dataset['validation']
# )

In [11]:
# trainer.train()
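As configured, the commented-out run is closer to a smoke test than a full fine-tune, because a positive `max_steps` overrides `num_train_epochs`. The arithmetic below is a sketch that assumes a single device, the `TrainingArguments` default `per_device_train_batch_size` of 8, and no gradient accumulation.

```python
# Rough step count for the commented-out Trainer configuration.
# Assumptions: single device, default per_device_train_batch_size=8,
# gradient_accumulation_steps=1, and the last partial batch is kept.
import math

num_train_examples = 125   # size of the subsampled train split
per_device_batch_size = 8  # TrainingArguments default (assumed unchanged)
num_train_epochs = 1
max_steps = 1              # a positive value overrides num_train_epochs

steps_per_epoch = math.ceil(num_train_examples / per_device_batch_size)
planned_steps = max_steps if max_steps > 0 else steps_per_epoch * num_train_epochs
# With max_steps=1, only the first batch is ever used
examples_in_first_step = min(per_device_batch_size, num_train_examples)

print(f"steps per epoch: {steps_per_epoch}")
print(f"optimizer steps run: {planned_steps}")
print(f"examples seen in the first step: {examples_in_first_step}")
```

Setting `max_steps=-1` (the default) would instead run `steps_per_epoch * num_train_epochs` steps over the subsampled data.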

### 2.4 Evaluate the Model Quantitatively (with ROUGE Metric)

In [13]:
rouge = evaluate.load('rouge')
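ROUGE-1 scores unigram overlap between a generated summary and a reference, and ROUGE-L scores their longest common subsequence. Each is reported as an F-measure combining precision (overlap divided by prediction length) and recall (overlap divided by reference length). The pure-Python sketch below illustrates the idea. It assumes lowercase whitespace tokenization and no stemming, whereas the `evaluate` metric uses its own tokenizer (and the stemmer when `use_stemmer=True`), so its numbers will differ somewhat.

```python
# Simplified ROUGE-1 and ROUGE-L F1 for one prediction/reference pair.
# Assumption: lowercase whitespace tokens, no stemming.
from collections import Counter


def _tokens(text):
    return text.lower().split()


def _f1(overlap, pred_len, ref_len):
    if overlap == 0:
        return 0.0
    precision = overlap / pred_len
    recall = overlap / ref_len
    return 2 * precision * recall / (precision + recall)


def rouge1_f1(prediction, reference):
    pred, ref = _tokens(prediction), _tokens(reference)
    # Clipped overlap: a token counts at most as often as it appears in both texts
    overlap = sum((Counter(pred) & Counter(ref)).values())
    return _f1(overlap, len(pred), len(ref))


def _lcs_length(a, b):
    # Dynamic-programming longest common subsequence, one row at a time
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rougeL_f1(prediction, reference):
    pred, ref = _tokens(prediction), _tokens(reference)
    return _f1(_lcs_length(pred, ref), len(pred), len(ref))


pred = "the cat lay on the mat"
ref = "the cat sat on the mat"
print(rouge1_f1(pred, ref), rougeL_f1(pred, ref))
```

Here five of the six tokens match in order, so both scores equal 5/6. ROUGE-2 works like ROUGE-1 but counts bigrams.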

In [32]:
human_baseline_summaries = []
original_model_summaries = []

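for i in range(10):
    human_baseline_summaries.append(dataset["test"][i]['summary'])
    # Note: this template differs slightly from the one in tokenize_function
    # (colon after "conversation", indentation inside the string)
    prompt = f"""
        Summarize the following conversation:
        {dataset["test"][i]['dialogue']}
        Summary: 
"""
    # No max_new_tokens is passed, so generation stops at the default maximum length
    # (20 tokens unless the model's generation config overrides it), which truncates
    # longer outputs such as row 9 below
    model_output = original_model.generate(tokenizer(prompt, return_tensors='pt')['input_ids'].to('xpu'))[0]
    original_model_summaries.append(tokenizer.decode(model_output, skip_special_tokens=True))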

zipped_summaries = list(zip(human_baseline_summaries, original_model_summaries))
df = pd.DataFrame(zipped_summaries, columns=['Human Baseline', 'Original Model'])
df

| | Human Baseline | Original Model |
|---|---|---|
| 0 | Ms. Dawson helps #Person1# to write a memo to ... | #Person1#: I need to take a dictation for you. |
| 1 | In order to prevent employees from wasting tim... | #Person1#: I need to take a dictation for you. |
| 2 | Ms. Dawson takes a dictation for #Person1# abo... | #Person1#: I need to take a dictation for you. |
| 3 | #Person2# arrives late because of traffic jam.... | The traffic jam at the Carrefour intersection ... |
| 4 | #Person2# decides to follow #Person1#'s sugges... | The traffic jam at the Carrefour intersection ... |
| 5 | #Person2# complains to #Person1# about the tra... | The traffic jam at the Carrefour intersection ... |
| 6 | #Person1# tells Kate that Masha and Hero get d... | Masha and Hero are getting divorced. |
| 7 | #Person1# tells Kate that Masha and Hero are g... | Masha and Hero are getting divorced. |
| 8 | #Person1# and Kate talk about the divorce betw... | Masha and Hero are getting divorced. |
| 9 | #Person1# and Brian are at the birthday party ... | #Person1#: Happy Birthday, Brian. #Person2#: I'm |


In [33]:
original_model_results = rouge.compute(
    predictions=original_model_summaries, 
    references=human_baseline_summaries, 
    use_aggregator=True, 
    use_stemmer=True)

print(original_model_results)

{'rouge1': 0.29614779598603125, 'rouge2': 0.10811082693947144, 'rougeL': 0.255038575906223, 'rougeLsum': 0.2551552887729358}


## 3. Perform Parameter Efficient Fine-Tuning (PEFT)

