# Fine Tunning Generative IA model 

Neste notebook, será treinado uma modelo de LLM já existente da biblioteca Hugging Face para realizar o resumo de dialogos. O modelo utilizado será o FLAN-5.


# Table of contents

- 1- [Configuração do ambiente](#1)
    - 1.1- [Configuração do kernel e das denpendencias necessárias para realizar a tarefas, o conjunto de dados e o modelo LLM](#1.1);
    - 1.2- Carrega o conjunto de dados e o modelo;
    - 1.3- Realiza o treinamento para a tarefa de zero-shot-learning

<a name='1'></a>
# 1- Configuração do ambiente

<a name='1.1'></a>
## 1.1 - Configurar o kernel e as denpendencias necessárias para realizar a tarefas, o conjunto de dados e o modelo LLM; 

In [3]:
import os
import click

In [4]:
! pip install -U datasets==2.17.0
! pip install --upgrade pip
! pip install --disable-pip-version-check \
                torch==1.13.1 \
                torchdata==0.5.1 --quiet
    
! pip install \
    transformers==4.27.2 \
    evaluate==0.4.0 \
    rouge_score==0.1.2 \
    loralib==0.1.1  \
    peft==0.3.0

Collecting datasets==2.17.0
  Downloading datasets-2.17.0-py3-none-any.whl.metadata (20 kB)
Collecting pyarrow-hotfix (from datasets==2.17.0)
  Downloading pyarrow_hotfix-0.6-py3-none-any.whl.metadata (3.6 kB)
Collecting xxhash (from datasets==2.17.0)
  Downloading xxhash-3.4.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Collecting fsspec<=2023.10.0,>=2023.1.0 (from fsspec[http]<=2023.10.0,>=2023.1.0->datasets==2.17.0)
  Downloading fsspec-2023.10.0-py3-none-any.whl.metadata (6.8 kB)
Collecting aiohttp (from datasets==2.17.0)
  Downloading aiohttp-3.9.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (7.4 kB)
Collecting huggingface-hub>=0.19.4 (from datasets==2.17.0)
  Downloading huggingface_hub-0.21.3-py3-none-any.whl.metadata (13 kB)
Collecting aiosignal>=1.1.2 (from aiohttp->datasets==2.17.0)
  Downloading aiosignal-1.3.1-py3-none-any.whl.metadata (4.0 kB)
Collecting frozenlist>=1.1.1 (from aiohttp->datasets==2.17.0)
  Downloading

Import some components.

In [5]:
from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, GenerationConfig, TrainingArguments, Trainer
import torch
import time
import evaluate
import pandas as pd
import numpy as np

<a name='1.2'></a>
## 1.2 - Carrega o conjunto de dados e o modelo

In [6]:
huggingface_data_name = "knkarthick/dialogsum"
dataset = load_dataset(huggingface_data_name)

dataset

Downloading readme:   0%|          | 0.00/4.65k [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/11.3M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/442k [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/1.35M [00:00<?, ?B/s]

Generating train split: 0 examples [00:00, ? examples/s]

Generating validation split: 0 examples [00:00, ? examples/s]

Generating test split: 0 examples [00:00, ? examples/s]

DatasetDict({
    train: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 12460
    })
    validation: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 500
    })
    test: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 1500
    })
})

Carrega o modelo FLAN-T5 (colocar a referencia ao modelo)

In [7]:
model_name = "google/flan-t5-base"

original_model = AutoModelForSeq2SeqLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_name)

config.json:   0%|          | 0.00/1.40k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/990M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/147 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/2.54k [00:00<?, ?B/s]

spiece.model:   0%|          | 0.00/792k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/2.42M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/2.20k [00:00<?, ?B/s]

In [8]:
def print_the_number_of_trainables_parameters(model):
    
    trainable_model_params = 0
    all_model_params = 0
    for _, param in model.named_parameters():   # Parametros de toda a rede
        all_model_params += param.numel()
        if param.requires_grad:
            trainable_model_params +=param.numel()
    return f"trainable model parameters: {trainable_model_params}. Model parameters:{all_model_params}. Percentage of trainable model parameters: {100*(trainable_model_params/all_model_params)}"

print_the_number_of_trainables_parameters(original_model)

'trainable model parameters: 247577856. Model parameters:247577856. Percentage of trainable model parameters: 100.0'

<a name='1'></a>
## 1.3 - Realiza a tarefa de zero-shot-learning

In [9]:
dialogue = dataset["test"][0]["dialogue"]
summary = dataset["test"][0]["summary"]

prompt = dialogue

tokens = tokenizer(prompt, return_tensors="pt")
output = tokenizer.decode(original_model.generate(
    tokens["input_ids"], max_new_tokens=200)[0],
    skip_special_tokens=True
)

dash_line = "*"*100
print(f"Dialogue: {dialogue}")
print(dash_line)
print(f"Real summary: {summary}")
print(dash_line)
print(f"Model (LLLM) summary: {output}")

Dialogue: #Person1#: Ms. Dawson, I need you to take a dictation for me.
#Person2#: Yes, sir...
#Person1#: This should go out as an intra-office memorandum to all employees by this afternoon. Are you ready?
#Person2#: Yes, sir. Go ahead.
#Person1#: Attention all staff... Effective immediately, all office communications are restricted to email correspondence and official memos. The use of Instant Message programs by employees during working hours is strictly prohibited.
#Person2#: Sir, does this apply to intra-office communications only? Or will it also restrict external communications?
#Person1#: It should apply to all communications, not only in this office between employees, but also any outside communications.
#Person2#: But sir, many employees use Instant Messaging to communicate with their clients.
#Person1#: They will just have to change their communication methods. I don't want any - one using Instant Messaging in this office. It wastes too much time! Now, please continue with th

<a name='1'></a>
# 2 - Full fine-tunning

In [10]:
def tokenizer_function(dialogues):
    start_prompt = "Summaryze the following conversation:"
    end_prompt = "\n\nSummary:"
    prompt = [start_prompt + dialogue + end_prompt for dialogue in dialogues["dialogue"]]
    dialogues['input_ids'] = tokenizer(prompt, padding='max_length', truncation=True, return_tensors='pt').input_ids
    dialogues['labels'] = tokenizer(dialogues["summary"], padding='max_length', truncation=True, return_tensors='pt').input_ids
    return dialogues

# The dataset contain 3 diff splits: train, validation, test
# The tokenizer function is handle all data in batchs
tokenized_dataset = dataset.map(tokenizer_function, batched=True)
tokenized_dataset = tokenized_dataset.remove_columns(['id', 'topic', 'dialogue', 'summary'])

Map:   0%|          | 0/12460 [00:00<?, ? examples/s]

Map:   0%|          | 0/500 [00:00<?, ? examples/s]

Map:   0%|          | 0/1500 [00:00<?, ? examples/s]

In [11]:
# Check the dataset before apply the subsample method
print(tokenized_dataset)

DatasetDict({
    train: Dataset({
        features: ['input_ids', 'labels'],
        num_rows: 12460
    })
    validation: Dataset({
        features: ['input_ids', 'labels'],
        num_rows: 500
    })
    test: Dataset({
        features: ['input_ids', 'labels'],
        num_rows: 1500
    })
})


In [12]:
tokenized_dataset = tokenized_dataset.filter(lambda examples, index: index % 100 == 0, with_indices=True)
# Check the dataset before apply the subsample method
print(tokenized_dataset)

Filter:   0%|          | 0/12460 [00:00<?, ? examples/s]

Filter:   0%|          | 0/500 [00:00<?, ? examples/s]

Filter:   0%|          | 0/1500 [00:00<?, ? examples/s]

DatasetDict({
    train: Dataset({
        features: ['input_ids', 'labels'],
        num_rows: 125
    })
    validation: Dataset({
        features: ['input_ids', 'labels'],
        num_rows: 5
    })
    test: Dataset({
        features: ['input_ids', 'labels'],
        num_rows: 15
    })
})


In [14]:
output_dir = f' ./dialogue-summary-model-full-{str(int(time.time()))}'

training_args = TrainingArguments(
    output_dir=output_dir,
    learning_rate=1e-5,
    num_train_epochs=1,
    weight_decay=0.01,
    logging_steps=1,
    max_steps=1
)

trainer = Trainer(
    model=original_model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["validation"],
)

In [13]:
trainer.train()



KeyboardInterrupt: 

In [15]:
!aws s3 cp --recursive s3://dlai-generative-ai/models/flan-dialogue-summary-checkpoint/ ./flan-dialogue-summary-checkpoint/

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
download: s3://dlai-generative-ai/models/flan-dialogue-summary-checkpoint/generation_config.json to flan-dialogue-summary-checkpoint/generation_config.json
download: s3://dlai-generative-ai/models/flan-dialogue-summary-checkpoint/config.json to flan-dialogue-summary-checkpoint/config.json
download: s3://dlai-generative-ai/models/flan-dialogue-summary-checkpoint/scheduler.pt to flan-dialogue-summary-checkpoint/scheduler.pt
download: s3://dlai-generative-ai/models/flan-dialogue-summary-checkpoint/trainer_state.json to flan-dialogue-summary-checkpoint/trainer_state.json
download: s3://dlai-generative-ai/models/flan-dialogue-summary-checkpoint/training_args.bin to flan-dialogue-summary-checkpoint/training_args.b

In [16]:
!ls -alh ./flan-dialogue-summary-checkpoint/pytorch_model.bin

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
-rw-r--r-- 1 root root 945M May 15  2023 ./flan-dialogue-summary-checkpoint/pytorch_model.bin


In [17]:
instruct_model = AutoModelForSeq2SeqLM.from_pretrained("./flan-dialogue-summary-checkpoint", torch_dtype=torch.bfloat16)

<a name='2.3'></a>
## 2.3- Avaliando o novo modelo após o fine-tunning

In [18]:
dialogue = dataset["test"][0]["dialogue"]
summary = dataset["test"][0]["summary"]

prompt = dialogue

tokens = tokenizer(prompt, return_tensors="pt")
inputs = tokens["input_ids"]

original_model_output = tokenizer.decode(original_model.generate(
    inputs, max_new_tokens=200)[0],
    skip_special_tokens=True
)

fine_tunned_model_output = tokenizer.decode(instruct_model.generate(
    inputs, max_new_tokens=200)[0],
    skip_special_tokens=True
)

dash_line = "*"*100
print(f"Dialogue: {dialogue}")
print(dash_line)
print(f"Real summary: {summary}")
print(dash_line)
print(f"Original Moldel summary: {original_model_output}")
print(dash_line)
print(f"Fine Tunned Model summary: {fine_tunned_model_output}")

Dialogue: #Person1#: Ms. Dawson, I need you to take a dictation for me.
#Person2#: Yes, sir...
#Person1#: This should go out as an intra-office memorandum to all employees by this afternoon. Are you ready?
#Person2#: Yes, sir. Go ahead.
#Person1#: Attention all staff... Effective immediately, all office communications are restricted to email correspondence and official memos. The use of Instant Message programs by employees during working hours is strictly prohibited.
#Person2#: Sir, does this apply to intra-office communications only? Or will it also restrict external communications?
#Person1#: It should apply to all communications, not only in this office between employees, but also any outside communications.
#Person2#: But sir, many employees use Instant Messaging to communicate with their clients.
#Person1#: They will just have to change their communication methods. I don't want any - one using Instant Messaging in this office. It wastes too much time! Now, please continue with th

<a name='2.3'></a>
## 2.4 - Avaliando o modelo quantitativamente com a métrica ROUGE

In [19]:
rouge = evaluate.load("rouge")

Downloading builder script:   0%|          | 0.00/6.27k [00:00<?, ?B/s]

In [20]:
# dataset["test"][0:10]["dialogue"]

In [54]:
dialogues = dataset["test"][0:10]["dialogue"]
summarys = dataset["test"][0:10]["summary"]

original_model_summaries = []
instructed_model_summaries = []

for dialogue in dialogues:
    prompt = f"""
    
    Summaryze the following conversation:
    
    {dialogue}
    
    Summary: """
    input_ids = tokenizer.encode(prompt, return_tensors='pt')
    
    original_model_outputs = tokenizer.decode(
        original_model.generate(input_ids, generation_config=GenerationConfig(max_new_tokens=200))[0],
        skip_special_tokens=True
    )
    original_model_summaries.append(original_model_outputs)
    
    instructed_model_outputs = tokenizer.decode(
        instruct_model.generate(input_ids, generation_config=GenerationConfig(max_new_tokens=200))[0],
        skip_special_tokens=True
    )
    instructed_model_summaries.append(instructed_model_outputs)
    
zipped_summaries = list(zip(summarys, original_model_summaries, instructed_model_summaries))

df = pd.DataFrame(zipped_summaries, columns=["Human Summary", "Original Model Summary", "Instructed Model Summary"])
df

Unnamed: 0,Human Summary,Original Model Summary,Instructed Model Summary
0,Ms. Dawson helps #Person1# to write a memo to ...,Employees are now obligated to use instant mes...,#Person1# asks Ms. Dawson to take a dictation ...
1,In order to prevent employees from wasting tim...,Employees will be notified of the new policy r...,#Person1# asks Ms. Dawson to take a dictation ...
2,Ms. Dawson takes a dictation for #Person1# abo...,#Person1#: This memo should go out as an intra...,#Person1# asks Ms. Dawson to take a dictation ...
3,#Person2# arrives late because of traffic jam....,The people in the city are rethinking their wa...,#Person2# got stuck in traffic again. #Person1...
4,#Person2# decides to follow #Person1#'s sugges...,#Person1: You're finally here.,#Person2# got stuck in traffic again. #Person1...
5,#Person2# complains to #Person1# about the tra...,The driver of the car is stuck in traffic.,#Person2# got stuck in traffic again. #Person1...
6,#Person1# tells Kate that Masha and Hero get d...,Masha and Hero are getting divorced. #Person1:...,Masha and Hero are getting divorced. Kate can'...
7,#Person1# tells Kate that Masha and Hero are g...,#Person1#: Masha and Hero are getting divorced...,Masha and Hero are getting divorced. Kate can'...
8,#Person1# and Kate talk about the divorce betw...,Masha and Hero are getting divorced.,Masha and Hero are getting divorced. Kate can'...
9,#Person1# and Brian are at the birthday party ...,"#Person1: Happy birthday, Brian! #Person2: Yes...",Brian's birthday is coming. Brian dances with ...


Evaluate the models results with ROUGE metric

In [24]:
original_model_results = rouge.compute(
        predictions=original_model_summaries,
        references=summarys,
        use_aggregator=True,
        use_stemmer=True
)

instruct_model_results = rouge.compute(
        predictions=instructed_model_summaries,
        references=summarys,
        use_aggregator=True,
        use_stemmer=True
)

print("ORIGINAL MODEL ROUGE METRIC:")
print(original_model_results)
print("INSCTRUCT MODEL ROUGE METRIC:")
print(instruct_model_results)

ORIGINAL MODEL ROUGE METRIC:
{'rouge1': 0.15691605479034373, 'rouge2': 0.025, 'rougeL': 0.12340679885790447, 'rougeLsum': 0.12025441862951552}
INSCTRUCT MODEL ROUGE METRIC:
{'rouge1': 0.23133932393487108, 'rouge2': 0.05470946579194001, 'rougeL': 0.1663099334843125, 'rougeLsum': 0.16418563150808485}


O arquivo dialogue-summary-training-results.csv contem os vários retornos dos modelos original e instruído que podem ser utilizados para uma validação mais extensiva.

In [25]:
results = pd.read_csv("./data/dialogue-summary-training-results.csv")

summarys = results.human_baseline_summaries
original_model_summaries = results.original_model_summaries
instruct_model_summaries = results.instruct_model_summaries

original_model_results = rouge.compute(
        predictions=original_model_summaries,
        references=summarys,
        use_aggregator=True,
        use_stemmer=True
)

instruct_model_results = rouge.compute(
        predictions=instruct_model_summaries,
        references=summarys,
        use_aggregator=True,
        use_stemmer=True
)

print("ORIGINAL MODEL ROUGE METRIC:")
print(original_model_results)
print("INSCTRUCT MODEL ROUGE METRIC:")
print(instruct_model_results)

ORIGINAL MODEL ROUGE METRIC:
{'rouge1': 0.2334158581572823, 'rouge2': 0.07603964187010573, 'rougeL': 0.20145520923859048, 'rougeLsum': 0.20145899339006135}
INSCTRUCT MODEL ROUGE METRIC:
{'rouge1': 0.42161291557556113, 'rouge2': 0.18035380596301792, 'rougeL': 0.3384439349963909, 'rougeLsum': 0.33835653595561666}


In [26]:
print("Absolute percentage improvemment of the instructed model over the original model")
improvement = np.array(list(instruct_model_results.values())) - np.array(list(original_model_results.values()))

for key, value in zip(instruct_model_results.keys(), improvement):
    print(f"{key}: {value*100:.2f}%")

Absolute percentage improvemment of the instructed model over the original model
rouge1: 18.82%
rouge2: 10.43%
rougeL: 13.70%
rougeLsum: 13.69%


In [27]:
original_model_results.values()

dict_values([0.2334158581572823, 0.07603964187010573, 0.20145520923859048, 0.20145899339006135])

<a name='3'></a>
# 3 - Desempenhando o Parameter Efficient Fine-Tunning (PEFT)

<a name='3.1'></a>
## 3.1 - Configurando o ambiente para o PEFT/Lora

In [28]:
from peft import LoraConfig, get_peft_model, TaskType

lora_config = LoraConfig(
    r=32,   # Rank
    lora_alpha=32,
    target_modules=["q", "v"],
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.SEQ_2_SEQ_LM   # FLAN-T5
)

Adiciona o adaptador Lora na LLM original para ser treinado

In [29]:
peft_model = get_peft_model(original_model, lora_config)

print_the_number_of_trainables_parameters(peft_model)

'trainable model parameters: 3538944. Model parameters:251116800. Percentage of trainable model parameters: 1.4092820552029972'

<a name='3.2'></a>
## 3.2 - Treina o adaptador PEFT

In [30]:
output_dir = f"./peft-dialogue-summaries-training-{str(int(time.time()))}"

peft_training_args =  TrainingArguments(
    output_dir=output_dir,
    auto_find_batch_size=True,
    learning_rate=1e-3,
    num_train_epochs=1,
    logging_steps=1,
    max_steps=1
)

peft_trainer = Trainer(
    model=peft_model,
    args=peft_training_args,
    train_dataset=tokenized_dataset["train"]
)

In [31]:
peft_trainer.train()

peft_model_path = "./peft-dialogue-summary-model-checkpoint-local"

peft_trainer.model.save_pretrained(peft_model_path)
tokenizer.save_pretrained(peft_model_path)



Step,Training Loss
1,50.75


('./peft-dialogue-summary-model-checkpoint-local/tokenizer_config.json',
 './peft-dialogue-summary-model-checkpoint-local/special_tokens_map.json',
 './peft-dialogue-summary-model-checkpoint-local/tokenizer.json')

O modelo foi treinado apenas com uma amostra dos dados. Faz então o download do modelo treinado com todos os dados.

In [42]:
!aws s3 cp --recursive s3://dlai-generative-ai/models/peft-dialogue-summary-checkpoint/ ./peft-dialogue-summary-checkpoint-from-s3/ 

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
download: s3://dlai-generative-ai/models/peft-dialogue-summary-checkpoint/special_tokens_map.json to peft-dialogue-summary-checkpoint-from-s3/special_tokens_map.json
download: s3://dlai-generative-ai/models/peft-dialogue-summary-checkpoint/adapter_config.json to peft-dialogue-summary-checkpoint-from-s3/adapter_config.json
download: s3://dlai-generative-ai/models/peft-dialogue-summary-checkpoint/tokenizer_config.json to peft-dialogue-summary-checkpoint-from-s3/tokenizer_config.json
download: s3://dlai-generative-ai/models/peft-dialogue-summary-checkpoint/adapter_model.bin to peft-dialogue-summary-checkpoint-from-s3/adapter_model.bin
download: s3://dlai-generative-ai/models/peft-dialogue-summary-checkpoint/tok

Verifica que o tamanho do modelo é muito menor do que o LLM original.

In [44]:
!ls -al ./peft-dialogue-summary-checkpoint-from-s3/adapter_model.bin

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
-rw-r--r-- 1 root root 14208525 May 15  2023 ./peft-dialogue-summary-checkpoint-from-s3/adapter_model.bin


In [45]:
from peft import PeftModel, PeftConfig

peft_model_base = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base", torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")

peft_model = PeftModel.from_pretrained(peft_model_base,
                                       "./peft-dialogue-summary-checkpoint-from-s3/",
                                       torch_dtype=torch.bfloat16,
                                       is_trainable=False
                                      )

Verifica a quantidade de parâmetros treinaveis do modelo. A quantidade vai ser zero porque is_trainable está como falso.

In [46]:
print_the_number_of_trainables_parameters(peft_model)

'trainable model parameters: 0. Model parameters:251116800. Percentage of trainable model parameters: 0.0'

<a name='3.3'></a>
## 3.3 - Verifica o desempenho do novo modelo

In [56]:
dialogue = dataset["test"][0]["dialogue"]
summary = dataset["test"][0]["summary"]

dialogues = dataset["test"][0:10]["dialogue"]
summarys = dataset["test"][0:10]["summary"]

original_model_summaries = []
instructed_model_summaries = []
peft_model_summaries = []

for dialogue in dialogues:
    
    prompt = f"""
    Summaryze the following conversation.
    
    {dialogue}
    
    Summary: """

    input_ids = tokenizer(prompt, return_tensors='pt').input_ids

    original_model_output = tokenizer.decode(
        original_model.generate(input_ids, generation_config=GenerationConfig(max_new_tokens=200))[0],
        skip_special_tokens=True
    )

    fine_tunned_model_output = tokenizer.decode(
        instruct_model.generate(input_ids, generation_config=GenerationConfig(max_new_tokens=200))[0],
        skip_special_tokens=True
    )
    peft_model_output = tokenizer.decode(
        peft_model.generate(input_ids=input_ids, generation_config=GenerationConfig(max_new_tokens=200))[0],
        skip_special_tokens=True
    )
    
    original_model_summaries.append(original_model_output)
    instructed_model_summaries.append(fine_tunned_model_output)
    peft_model_summaries.append(peft_model_output)
    
zipped_summaries = list(zip(summarys, original_model_summaries, instructed_model_summaries, peft_model_summaries))

df = pd.DataFrame(zipped_summaries, columns=["Human Summary", "Original Model Summary", "Instructed Model Summary", "PEFT Model Summary"])
df

Unnamed: 0,Human Summary,Original Model Summary,Instructed Model Summary,PEFT Model Summary
0,Ms. Dawson helps #Person1# to write a memo to ...,#Person1: This memo is to be distributed to al...,#Person1# asks Ms. Dawson to take a dictation ...,#Person1# asks Ms. Dawson to take a dictation ...
1,In order to prevent employees from wasting tim...,The Office of Management and Budgeting is now ...,#Person1# asks Ms. Dawson to take a dictation ...,#Person1# asks Ms. Dawson to take a dictation ...
2,Ms. Dawson takes a dictation for #Person1# abo...,Employees are required to use instant messaging.,#Person1# asks Ms. Dawson to take a dictation ...,#Person1# asks Ms. Dawson to take a dictation ...
3,#Person2# arrives late because of traffic jam....,A driver's slammed by a traffic jam.,#Person2# got stuck in traffic again. #Person1...,#Person2# got stuck in traffic and #Person1# s...
4,#Person2# decides to follow #Person1#'s sugges...,Taking public transport to work is a good idea...,#Person2# got stuck in traffic again. #Person1...,#Person2# got stuck in traffic and #Person1# s...
5,#Person2# complains to #Person1# about the tra...,You're finally here!,#Person2# got stuck in traffic again. #Person1...,#Person2# got stuck in traffic and #Person1# s...
6,#Person1# tells Kate that Masha and Hero get d...,Masha and Hero are getting a divorce.,Masha and Hero are getting divorced. Kate can'...,Kate tells #Person2# Masha and Hero are gettin...
7,#Person1# tells Kate that Masha and Hero are g...,Masha and Hero are getting divorced.,Masha and Hero are getting divorced. Kate can'...,Kate tells #Person2# Masha and Hero are gettin...
8,#Person1# and Kate talk about the divorce betw...,Masha and Hero are getting divorced.,Masha and Hero are getting divorced. Kate can'...,Kate tells #Person2# Masha and Hero are gettin...
9,#Person1# and Brian are at the birthday party ...,Brian's birthday is on his birthday.,Brian's birthday is coming. Brian dances with ...,Brian remembers his birthday and invites #Pers...


Calcula o Rouge.

In [57]:
original_model_results = rouge.compute(
        predictions=original_model_summaries,
        references=summarys,
        use_aggregator=True,
        use_stemmer=True
)

instruct_model_results = rouge.compute(
        predictions=instructed_model_summaries,
        references=summarys,
        use_aggregator=True,
        use_stemmer=True
)

peft_model_results = rouge.compute(
        predictions=peft_model_summaries,
        references=summarys,
        use_aggregator=True,
        use_stemmer=True
)

print("ORIGINAL MODEL ROUGE METRIC:")
print(original_model_results)
print("INSCTRUCT MODEL ROUGE METRIC:")
print(instruct_model_results)
print("PEFT MODEL ROUGE METRIC:")
print(peft_model_results)

ORIGINAL MODEL ROUGE METRIC:
{'rouge1': 0.25266498349618555, 'rouge2': 0.08959803921568629, 'rougeL': 0.21588637506284564, 'rougeLsum': 0.2180194111089252}
INSCTRUCT MODEL ROUGE METRIC:
{'rouge1': 0.3937097869450811, 'rouge2': 0.1722944839254224, 'rougeL': 0.2764648459738816, 'rougeLsum': 0.27772528954524955}
PEFT MODEL ROUGE METRIC:
{'rouge1': 0.3725351062275605, 'rouge2': 0.12138811933618107, 'rougeL': 0.27620639623170606, 'rougeLsum': 0.2758134870822362}


Nada mal em comparação com o modelo com full fine-tunning (treinado por completo)

Calcula o ROUGE para o conjunto geral de sumarios. Serão utilizados os dados já carregados de uma fonte externa para diminuit o tempo do experimento.

In [59]:
summarys = results.human_baseline_summaries
original_model_summaries = results.original_model_summaries
instruct_model_summaries = results.instruct_model_summaries
peft_model_summaries = results.peft_model_summaries

original_model_results = rouge.compute(
        predictions=original_model_summaries,
        references=summarys,
        use_aggregator=True,
        use_stemmer=True
)

instruct_model_results = rouge.compute(
        predictions=instruct_model_summaries,
        references=summarys,
        use_aggregator=True,
        use_stemmer=True
)
peft_model_results = rouge.compute(
        predictions=peft_model_summaries,
        references=summarys,
        use_aggregator=True,
        use_stemmer=True
)


print("ORIGINAL MODEL ROUGE METRIC:")
print(original_model_results)
print("INSCTRUCT MODEL ROUGE METRIC:")
print(instruct_model_results)
print("PEFT MODEL ROUGE METRIC:")
print(peft_model_results)

ORIGINAL MODEL ROUGE METRIC:
{'rouge1': 0.2334158581572823, 'rouge2': 0.07603964187010573, 'rougeL': 0.20145520923859048, 'rougeLsum': 0.20145899339006135}
INSCTRUCT MODEL ROUGE METRIC:
{'rouge1': 0.42161291557556113, 'rouge2': 0.18035380596301792, 'rougeL': 0.3384439349963909, 'rougeLsum': 0.33835653595561666}
PEFT MODEL ROUGE METRIC:
{'rouge1': 0.40810631575616746, 'rouge2': 0.1633255794568712, 'rougeL': 0.32507074586565354, 'rougeLsum': 0.3248950182867091}


Calcula a otimização utilizando o peft em comparação com o modelo original.

In [61]:
print("Absolute percentage improvemment of the instructed model over the original model")
improvement = np.array(list(peft_model_results.values())) - np.array(list(original_model_results.values()))

for key, value in zip(peft_model_results.keys(), improvement):
    print(f"{key}: {value*100:.2f}%")

Absolute percentage improvemment of the instructed model over the original model
rouge1: 17.47%
rouge2: 8.73%
rougeL: 12.36%
rougeLsum: 12.34%


Calcula a otimização utilizando o peft em comparação com o modelo que foi aplicado
o full fine-tunning.

In [63]:
print("Absolute percentage improvemment of the instructed model over the original model")
improvement = np.array(list(peft_model_results.values())) - np.array(list(instruct_model_results.values()))

for key, value in zip(peft_model_results.keys(), improvement):
    print(f"{key}: {value*100:.2f}%")

Absolute percentage improvemment of the instructed model over the original model
rouge1: -1.35%
rouge2: -1.70%
rougeL: -1.34%
rougeLsum: -1.35%
