# T5-Small with LoRA Fine-Tuning and Beam Search Decoding

This notebook evaluates the T5-Small model fine-tuned with LoRA, decoded both without and with Beam Search. The LoRA fine-tuning experiment explores how well a lightweight adaptation of the pre-trained model generates SMART goals, while the Beam Search experiment investigates whether keeping several candidate sequences during decoding improves output quality. Both approaches are evaluated on the validation dataset with BERTScore, perplexity, and a Sentence-BERT similarity score (Faithfulness) to assess how well the generated goals adhere to SMART criteria.

In [None]:
!pip install -q datasets

T5-Small with LoRA Fine-Tuning: Experiment 3

This section evaluates the T5-Small model fine-tuned with Low-Rank Adaptation (LoRA). LoRA freezes the pre-trained weights and trains small low-rank matrices added to selected layers (here the attention query and value projections), so fewer than 1% of the model’s parameters are updated. The validation dataset is used to assess the model’s performance in generating SMART goals, highlighting the impact of fine-tuning on goal quality.
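
The low-rank update behind LoRA can be illustrated with a minimal sketch in plain Python. This is not how `peft` implements it internally; the helper names (`matvec`, `lora_forward`) and the tiny matrices are made up for illustration. The sketch assumes the usual LoRA formulation, output = W x + (alpha / r) · B (A x), with B initialised to zero so the adapted layer starts out identical to the frozen one.

```python
# Minimal LoRA sketch in plain Python; all names and values are illustrative.

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def lora_forward(W, A, B, x, alpha, r):
    """Return W x + (alpha / r) * B (A x).

    W is the frozen pre-trained weight (d_out x d_in).
    A (r x d_in) and B (d_out x r) are the only trainable matrices.
    """
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 0.0]]        # frozen 2 x 3 weight
A = [[0.5, 0.5, 0.5]]        # rank r = 1, shape 1 x 3
B_zero = [[0.0], [0.0]]      # B starts at zero, so the update starts at zero
B_trained = [[1.0], [-1.0]]  # hypothetical values after some training
x = [1.0, 2.0, 3.0]

y_initial = lora_forward(W, A, B_zero, x, alpha=2, r=1)
y_adapted = lora_forward(W, A, B_trained, x, alpha=2, r=1)
print(y_initial)  # same as the frozen layer's output
print(y_adapted)  # shifted by the scaled low-rank update
```

The scale factor alpha / r is 2 in this toy example, which matches the ratio of `lora_alpha=32` to `r=16` used in this notebook.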

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
import pandas as pd
import torch
from datasets import load_dataset, Dataset
from sklearn.model_selection import train_test_split
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments, DataCollatorForSeq2Seq
from peft import LoraConfig, TaskType, get_peft_model
import warnings




In [None]:
# Load the pre-split, cleaned datasets

train_data = pd.read_csv("/content/drive/My Drive/train_data_cleaned.csv")
val_data = pd.read_csv("/content/drive/My Drive/validation_data_cleaned.csv")
test_data = pd.read_csv("/content/drive/My Drive/test_data_cleaned.csv")

# Verify the sizes of each split
print(f"Training data: {len(train_data)} samples")
print(f"Validation data: {len(val_data)} samples")
print(f"Testing data: {len(test_data)} samples")

Training data: 3616 samples
Validation data: 452 samples
Testing data: 452 samples


In [None]:
model_id = "t5-small"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



In [None]:
class SmartGoalDataset(torch.utils.data.Dataset):
    def __init__(self, data, tokenizer, source_max_length=512, target_max_length=1024):
        self.data = data
        self.tokenizer = tokenizer
        self.source_max_length = source_max_length
        self.target_max_length = target_max_length

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        augmented_vague_goal = str(self.data.iloc[idx]['Augmented Vague Goal'])
        smart_goal = str(self.data.iloc[idx]['SMART Goal'])

        source = self.tokenizer(
            " vague goal to SMART goal: " + augmented_vague_goal,
            max_length=self.source_max_length,
            padding="max_length",
            truncation=True,
            return_tensors="pt"
        )

        target = self.tokenizer(
            smart_goal,
            max_length=self.target_max_length,
            padding="max_length",
            truncation=True,
            return_tensors="pt"
        )

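        # Note: padded label positions keep the pad token id (they are not set
        # to -100), so the training loss also covers padding tokens.
        return {
            "input_ids": source["input_ids"].squeeze(dim=0),
            "attention_mask": source["attention_mask"].squeeze(dim=0),
            "labels": target["input_ids"].squeeze(dim=0)
        }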
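
Because the targets are padded to `target_max_length`, the label tensors contain many pad tokens. A common alternative, which is not what this notebook does and not what produced the numbers reported here, is to replace padding ids in the labels with -100, the value the cross-entropy loss in `transformers` ignores. A minimal sketch in plain Python follows; the helper names and the example token ids are hypothetical.

```python
PAD_TOKEN_ID = 0      # T5's pad token id
IGNORE_INDEX = -100   # label value skipped by the cross-entropy loss in transformers

def mask_padding(label_ids, pad_token_id=PAD_TOKEN_ID):
    """Replace padding ids with IGNORE_INDEX so they do not contribute to the loss."""
    return [IGNORE_INDEX if tok == pad_token_id else tok for tok in label_ids]

def unmask_padding(label_ids, pad_token_id=PAD_TOKEN_ID):
    """Restore pad ids before decoding; tokenizers cannot decode -100."""
    return [pad_token_id if tok == IGNORE_INDEX else tok for tok in label_ids]

labels = [57, 8, 414, 1, 0, 0, 0]   # hypothetical token ids; 1 is T5's end-of-sequence id
masked = mask_padding(labels)
print(masked)
print(unmask_padding(masked))
```

If labels were masked this way, any later code that decodes the label ids (as the evaluation cells do) would need the reverse step first.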

In [None]:
# Create datasets
train_dataset = SmartGoalDataset(train_data, tokenizer)
val_dataset = SmartGoalDataset(val_data, tokenizer)
test_dataset = SmartGoalDataset(test_data, tokenizer)

In [None]:
lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=16,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["q", "v"],
    bias="none"
)

In [None]:
peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()

trainable params: 589,824 || all params: 61,096,448 || trainable%: 0.9654
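
The trainable parameter count can be reproduced by hand. The sketch below assumes the standard t5-small configuration (hidden size 512, 8 heads of size 64, 6 encoder and 6 decoder layers); these values are not printed in the notebook, so treat them as assumptions. Each adapted projection gains an r × 512 matrix A and a 512 × r matrix B.

```python
# Assumed t5-small configuration values (not printed in the notebook).
d_model = 512          # hidden size
inner_dim = 512        # num_heads (8) * d_kv (64)
encoder_layers = 6     # one self-attention block each
decoder_layers = 6     # one self-attention and one cross-attention block each
r = 16                 # LoRA rank from lora_config

attention_blocks = encoder_layers + 2 * decoder_layers
adapted_modules = attention_blocks * 2             # "q" and "v" in every block
params_per_module = r * d_model + inner_dim * r    # A is r x d_model, B is inner_dim x r
trainable = adapted_modules * params_per_module

total = 61_096_448     # "all params" as reported above
trainable_pct = round(100 * trainable / total, 4)
print(trainable)
print(trainable_pct)
```

Under these assumptions the arithmetic agrees with the figures printed by `print_trainable_parameters()`.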


In [None]:
#label_pad_token_id = -100
data_collator = DataCollatorForSeq2Seq(
    tokenizer,
    model=model,
    #label_pad_token_id=label_pad_token_id,
)

In [None]:
output_dir = "t5-small-chat"
training_args = Seq2SeqTrainingArguments(
    output_dir=output_dir,
    learning_rate=1e-3,
    num_train_epochs=3,
    logging_dir=f"{output_dir}/logs",
    logging_strategy="epoch",
    save_strategy="epoch",
    push_to_hub=False
)

In [None]:
trainer = Seq2SeqTrainer(
    model=peft_model,  # train the LoRA-wrapped model (base weights stay frozen)
    args=training_args,
    data_collator=data_collator,
    train_dataset=train_dataset,
)

In [None]:
trainer.train()

Step,Training Loss
452,0.4116
904,0.3892
1356,0.3795


TrainOutput(global_step=1356, training_loss=0.3934482585715685, metrics={'train_runtime': 1329.1156, 'train_samples_per_second': 8.162, 'train_steps_per_second': 1.02, 'total_flos': 1487843780198400.0, 'train_loss': 0.3934482585715685, 'epoch': 3.0})
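
The step counts in the log follow from the dataset size. The sketch below assumes the default per-device batch size of 8 for `Seq2SeqTrainingArguments`, a single device, and no gradient accumulation, since none of these are set explicitly in the notebook.

```python
import math

train_samples = 3616
batch_size = 8      # assumed: default per-device train batch size, one device
epochs = 3

steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)

train_runtime = 1329.1156   # seconds, from the TrainOutput above
samples_per_second = round(train_samples * epochs / train_runtime, 3)
print(samples_per_second)
```

With `logging_strategy="epoch"`, one loss value is logged per epoch, which is why the table has rows at steps 452, 904, and 1356.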

In [None]:
peft_model.save_pretrained("/content/drive/My Drive/t5_smart_goal_model_lora")
tokenizer.save_pretrained("/content/drive/My Drive/t5_smart_goal_model_lora")

('/content/drive/My Drive/t5_smart_goal_model_lora/tokenizer_config.json',
 '/content/drive/My Drive/t5_smart_goal_model_lora/special_tokens_map.json',
 '/content/drive/My Drive/t5_smart_goal_model_lora/spiece.model',
 '/content/drive/My Drive/t5_smart_goal_model_lora/added_tokens.json',
 '/content/drive/My Drive/t5_smart_goal_model_lora/tokenizer.json')

In [None]:
# Path to the saved LoRA fine-tuned T5 model
lora_t5_model_path = "/content/drive/My Drive/t5_smart_goal_model_lora"

In [None]:
# Load LoRA fine-tuned T5 model and tokenizer
lora_t5_model = AutoModelForSeq2SeqLM.from_pretrained(lora_t5_model_path)
lora_t5_tokenizer = AutoTokenizer.from_pretrained(lora_t5_model_path)

In [None]:
# Move model to the appropriate device
device = "cuda" if torch.cuda.is_available() else "cpu"
lora_t5_model = lora_t5_model.to(device)

In [None]:
from sentence_transformers import SentenceTransformer, util

In [None]:
# Initialize Sentence-BERT model for Faithfulness calculation
sbert_model = SentenceTransformer('paraphrase-MiniLM-L6-v2')


In [None]:
# Create a DataLoader for validation dataset
from torch.utils.data import DataLoader
val_loader = DataLoader(val_dataset, batch_size=10, shuffle=False)

In [None]:
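# Helper function to calculate Perplexity
def calculate_perplexity(input_ids, model):
    # exp of the model's mean cross-entropy loss. The same ids serve as encoder
    # input and as labels, and padding ids are included, so this is not a
    # perplexity conditioned on the vague-goal input.
    with torch.no_grad():
        outputs = model(input_ids=input_ids, labels=input_ids)
        return torch.exp(outputs.loss).item()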
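
Perplexity is the exponential of the mean negative log-likelihood per token. The plain-Python sketch below uses made-up token probabilities and a hypothetical `perplexity` helper to show how the value changes when padding positions are included in the average, as happens when padded label ids are passed to the model unmasked.

```python
import math

def perplexity(token_log_probs, is_padding):
    """exp of the mean negative log-likelihood over non-padding tokens."""
    kept = [lp for lp, pad in zip(token_log_probs, is_padding) if not pad]
    if not kept:
        raise ValueError("no non-padding tokens")
    return math.exp(-sum(kept) / len(kept))

# Hypothetical per-token log-probabilities for a 3-token target padded to length 5.
# The two padding positions are predicted with probability 1.0.
log_probs = [math.log(0.5), math.log(0.25), math.log(0.5), math.log(1.0), math.log(1.0)]
padding = [False, False, False, True, True]

ppl_masked = perplexity(log_probs, padding)        # padding excluded
ppl_unmasked = perplexity(log_probs, [False] * 5)  # padding counted as real tokens
print(ppl_masked, ppl_unmasked)
```

In this toy case, counting easily predicted padding tokens lowers the perplexity, so values computed on heavily padded sequences should be read with that in mind.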

In [None]:
# Helper function for Faithfulness calculation
def calculate_faithfulness(input_text, generated_text):
    input_embedding = sbert_model.encode(input_text, convert_to_tensor=True)
    output_embedding = sbert_model.encode(generated_text, convert_to_tensor=True)
    return util.pytorch_cos_sim(input_embedding, output_embedding).item()
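
The score returned here is a cosine similarity between two sentence embeddings. As a rough illustration of what `util.pytorch_cos_sim` computes for a single pair, here is a plain-Python version; the three-dimensional vectors are hypothetical stand-ins for real Sentence-BERT embeddings.

```python
import math

def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their Euclidean norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

first_vec = [1.0, 0.0, 1.0]    # hypothetical sentence embeddings
second_vec = [1.0, 1.0, 0.0]

similarity = cosine_similarity(first_vec, second_vec)
print(similarity)
```

Values close to 1 mean the two texts point in nearly the same direction in embedding space; values near 0 mean they are largely unrelated.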

In [None]:
!pip install -q bert_score

In [None]:
from bert_score import score as bert_score
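
BERTScore compares contextual token embeddings of the generated and reference texts by greedy matching. The sketch below is a simplified illustration of that idea, assuming no IDF weighting and no baseline rescaling; it is not the `bert_score` implementation, and the similarity matrix and the `greedy_match_f1` helper are made up.

```python
def greedy_match_f1(sim):
    """sim[i][j] is the cosine similarity between candidate token i and reference token j."""
    # Precision: each candidate token is matched to its most similar reference token.
    precision = sum(max(row) for row in sim) / len(sim)
    # Recall: each reference token is matched to its most similar candidate token.
    columns = list(zip(*sim))
    recall = sum(max(col) for col in columns) / len(columns)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical similarities: 2 candidate tokens (rows) vs. 3 reference tokens (columns).
sim = [
    [0.9, 0.1, 0.2],
    [0.2, 0.8, 0.3],
]
p, r, f1 = greedy_match_f1(sim)
print(p, r, f1)
```

The evaluation cells in this notebook store only the F1 component in the `BERTScore` column.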

In [None]:
# Evaluation function
def evaluate_model(model, tokenizer, model_name, data_loader, max_length=512, repetition_penalty=1.2, min_length=50):

    results_df = pd.DataFrame(columns=["Model", "Input", "Reference", "Output", "BERTScore", "Perplexity", "Faithfulness"])

    for batch_idx, batch in enumerate(data_loader):
        try:
            # Move batch data to device
            input_ids_batch = batch["input_ids"].to(device)
            attention_mask_batch = batch["attention_mask"].to(device)
            reference_ids = batch["labels"].to(device)

            # Generate outputs using default greedy decoding (num_beams defaults to 1);
            # length_penalty and early_stopping only affect beam search
            output_ids_batch = model.generate(
                input_ids=input_ids_batch,
                attention_mask=attention_mask_batch,
                max_length=max_length,
                repetition_penalty=repetition_penalty,
                length_penalty=1.0,
                early_stopping=True,
                pad_token_id=tokenizer.pad_token_id
            )

            # Decode generated and reference texts
            input_texts = [tokenizer.decode(input_ids, skip_special_tokens=True) for input_ids in input_ids_batch]
            generated_texts = [tokenizer.decode(output, skip_special_tokens=True) for output in output_ids_batch]
            reference_texts = [tokenizer.decode(ref, skip_special_tokens=True) for ref in reference_ids]

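            # Calculate metrics
            # BERTScore: generated vs. reference text (F1 is stored below)
            P, R, F1 = bert_score(generated_texts, reference_texts, lang="en", model_type="bert-base-uncased")
            # Faithfulness: SBERT cosine similarity between reference and generated text
            faithfulness_scores = [calculate_faithfulness(ref, gen) for ref, gen in zip(reference_texts, generated_texts)]
            # Perplexity: computed on the padded reference token ids
            perplexities = [calculate_perplexity(ref.unsqueeze(0), model) for ref in reference_ids]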

            # Append results
            for i in range(len(generated_texts)):
                results_df.loc[len(results_df)] = [
                    model_name,
                    input_texts[i],  # Use decoded input text
                    reference_texts[i],
                    generated_texts[i],
                    F1[i].item(),
                    perplexities[i],
                    faithfulness_scores[i]
                ]

        except Exception as e:
            print(f"Error processing batch {batch_idx}: {e}")
            continue

    # Save final results to CSV
    results_df.to_csv("/content/drive/My Drive/model_evaluation_T5_LoRA_results_new.csv", index=False)
    print("Final results saved for LoRA")



# Run the evaluation
evaluate_model(
    lora_t5_model,
    lora_t5_tokenizer,
    "LoRA Fine-Tuned T5",
    val_loader,
)











Final results saved for LoRA


In [None]:
import pandas as pd

In [None]:
result_t5=pd.read_csv("/content/drive/My Drive/model_evaluation_T5_LoRA_results_new.csv")
# result_t5[result_t5['BERTScore']=='0']


In [None]:
result_t5.head(3)

Unnamed: 0,Model,Input,Reference,Output,BERTScore,Perplexity,Faithfulness
0,LoRA Fine-Tuned T5,vague goal to SMART goal: i’m looking to explo...,to navigate the challenges of my work environm...,"by the end of the next quarter, i will enhance...",0.686857,1.597824,0.80709
1,LoRA Fine-Tuned T5,vague goal to SMART goal: i’m looking to take ...,"by the end of the next quarter, i will dedicat...","by the end of the next quarter, i will enhance...",0.716913,1.395273,0.790598
2,LoRA Fine-Tuned T5,vague goal to SMART goal: i’m thinking it migh...,"by the end of the next quarter, i will enhance...","by the end of the next quarter, i will enhance...",0.781238,1.334886,0.88061


T5-Small with LoRA Fine-Tuning and Beam Search Decoding: Experiment 4

This section builds on the LoRA fine-tuning experiment by using Beam Search as the decoding strategy. Instead of committing to the single most probable token at each step, Beam Search keeps the `num_beams` highest-scoring partial sequences (two in this experiment) and returns the highest-scoring completed sequence, which can improve the quality of the generated SMART goals. The validation dataset is used to evaluate the combined impact of LoRA fine-tuning and Beam Search decoding on output quality and adherence to SMART criteria.
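
To make the difference from greedy decoding concrete, here is a toy beam search in plain Python. It is a sketch only: the vocabulary and probabilities in `NEXT` are invented, and it omits the length penalty, repetition penalty, and early-stopping logic that `model.generate` applies.

```python
import math

# Hypothetical next-token probabilities keyed by the previous token.
NEXT = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"x": 0.55, "y": 0.45},
    "b": {"x": 0.9, "y": 0.1},
    "x": {"</s>": 1.0},
    "y": {"</s>": 1.0},
}

def beam_search(num_beams, max_steps=5):
    """Return (tokens, probability) of the best finished sequence found."""
    beams = [(0.0, ["<s>"])]   # (cumulative log-probability, tokens)
    finished = []
    for _ in range(max_steps):
        # Extend every live beam by every possible next token.
        candidates = []
        for score, tokens in beams:
            for tok, prob in NEXT[tokens[-1]].items():
                candidates.append((score + math.log(prob), tokens + [tok]))
        # Keep only the num_beams highest-scoring candidates.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for cand in candidates[:num_beams]:
            if cand[1][-1] == "</s>":
                finished.append(cand)
            else:
                beams.append(cand)
        if not beams:
            break
    if not finished:           # fall back to unfinished beams if nothing ended
        finished = beams
    best = max(finished, key=lambda c: c[0])
    return best[1], math.exp(best[0])

greedy_tokens, greedy_prob = beam_search(num_beams=1)
beam_tokens, beam_prob = beam_search(num_beams=2)
print(greedy_tokens, greedy_prob)
print(beam_tokens, beam_prob)
```

With one beam the search commits to "a" because it is the most probable first token, while two beams keep "b" alive long enough to find a sequence with a higher overall probability.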


In [None]:
peft_model.save_pretrained("/content/drive/My Drive/t5_smart_goal_model_lora_beam")
tokenizer.save_pretrained("/content/drive/My Drive/t5_smart_goal_model_lora_beam")

('/content/drive/My Drive/t5_smart_goal_model_lora_beam/tokenizer_config.json',
 '/content/drive/My Drive/t5_smart_goal_model_lora_beam/special_tokens_map.json',
 '/content/drive/My Drive/t5_smart_goal_model_lora_beam/spiece.model',
 '/content/drive/My Drive/t5_smart_goal_model_lora_beam/added_tokens.json',
 '/content/drive/My Drive/t5_smart_goal_model_lora_beam/tokenizer.json')

In [None]:
# Path to the saved LoRA fine-tuned T5 model used for the Beam Search experiment
lora_t5_model_path = "/content/drive/My Drive/t5_smart_goal_model_lora_beam"

In [None]:
# Load LoRA fine-tuned T5 model and tokenizer
lora_t5_model = AutoModelForSeq2SeqLM.from_pretrained(lora_t5_model_path)
lora_t5_tokenizer = AutoTokenizer.from_pretrained(lora_t5_model_path)

In [None]:
# Move model to the appropriate device
device = "cuda" if torch.cuda.is_available() else "cpu"
lora_t5_model = lora_t5_model.to(device)

In [None]:
from sentence_transformers import SentenceTransformer, util
# Initialize Sentence-BERT model for Faithfulness calculation
sbert_model = SentenceTransformer('paraphrase-MiniLM-L6-v2')

In [None]:
# Create a DataLoader for validation dataset
from torch.utils.data import DataLoader
val_loader = DataLoader(val_dataset, batch_size=10, shuffle=False)

In [None]:
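# Helper function to calculate Perplexity
def calculate_perplexity(input_ids, model):
    # exp of the model's mean cross-entropy loss. The same ids serve as encoder
    # input and as labels, and padding ids are included, so this is not a
    # perplexity conditioned on the vague-goal input.
    with torch.no_grad():
        outputs = model(input_ids=input_ids, labels=input_ids)
        return torch.exp(outputs.loss).item()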

In [None]:
# Helper function for Faithfulness calculation
def calculate_faithfulness(input_text, generated_text):
    input_embedding = sbert_model.encode(input_text, convert_to_tensor=True)
    output_embedding = sbert_model.encode(generated_text, convert_to_tensor=True)
    return util.pytorch_cos_sim(input_embedding, output_embedding).item()

In [None]:
!pip install -q bert_score
from bert_score import score as bert_score

In [None]:
# Evaluation function
def evaluate_model_with_beam_width(model, tokenizer, model_name, data_loader, num_beams=2, max_length=512, repetition_penalty=1.2, min_length=50):

    results_df = pd.DataFrame(columns=["Model", "Input", "Reference", "Output", "BERTScore", "Perplexity", "Faithfulness"])

    for batch_idx, batch in enumerate(data_loader):
        try:
            # Move batch data to device
            input_ids_batch = batch["input_ids"].to(device)
            attention_mask_batch = batch["attention_mask"].to(device)
            reference_ids = batch["labels"].to(device)

            # Generate outputs using beam search
            output_ids_batch = model.generate(
                input_ids=input_ids_batch,
                attention_mask=attention_mask_batch,
                max_length=max_length,
                num_beams=num_beams,
                repetition_penalty=repetition_penalty,
                length_penalty=1.0,
                early_stopping=True,
                pad_token_id=tokenizer.pad_token_id
            )

            # Decode generated and reference texts
            input_texts = [tokenizer.decode(input_ids, skip_special_tokens=True) for input_ids in input_ids_batch]
            generated_texts = [tokenizer.decode(output, skip_special_tokens=True) for output in output_ids_batch]
            reference_texts = [tokenizer.decode(ref, skip_special_tokens=True) for ref in reference_ids]

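            # Calculate metrics
            # BERTScore: generated vs. reference text (F1 is stored below)
            P, R, F1 = bert_score(generated_texts, reference_texts, lang="en", model_type="bert-base-uncased")
            # Faithfulness: SBERT cosine similarity between reference and generated text
            faithfulness_scores = [calculate_faithfulness(ref, gen) for ref, gen in zip(reference_texts, generated_texts)]
            # Perplexity: computed on the padded reference token ids
            perplexities = [calculate_perplexity(ref.unsqueeze(0), model) for ref in reference_ids]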

            # Append results
            for i in range(len(generated_texts)):
                results_df.loc[len(results_df)] = [
                    model_name,
                    input_texts[i],  # Use decoded input text
                    reference_texts[i],
                    generated_texts[i],
                    F1[i].item(),
                    perplexities[i],
                    faithfulness_scores[i]
                ]

        except Exception as e:
            print(f"Error processing batch {batch_idx}: {e}")
            continue

    # Save final results to CSV
    results_df.to_csv("/content/drive/MyDrive/model_evaluation_T5_Beam_results_new.csv", index=False)
    print(f"Final results saved for beam width {num_beams}")



# Run the evaluation for beam width 2
evaluate_model_with_beam_width(
    lora_t5_model,
    lora_t5_tokenizer,
    "LoRA Fine-Tuned T5 with Beam Search",
    val_loader,
    num_beams=2
)

Final results saved for beam width 2


In [None]:
result_t5=pd.read_csv("/content/drive/MyDrive/model_evaluation_T5_Beam_results_new.csv")
# result_t5[result_t5['BERTScore']=='0']


In [None]:
result_t5.head(3)

Unnamed: 0,Model,Input,Reference,Output,BERTScore,Perplexity,Faithfulness
0,LoRA Fine-Tuned T5 with Beam Search,vague goal to SMART goal: i’m looking to explo...,to navigate the challenges of my work environm...,"by the end of the next quarter, i will enhance...",0.673848,1.595718,0.717325
1,LoRA Fine-Tuned T5 with Beam Search,vague goal to SMART goal: i’m looking to take ...,"by the end of the next quarter, i will dedicat...","by the end of the next quarter, i will enhance...",0.738156,1.372081,0.81346
2,LoRA Fine-Tuned T5 with Beam Search,vague goal to SMART goal: i’m thinking it migh...,"by the end of the next quarter, i will enhance...","by the end of the next quarter, i will enhance...",0.788314,1.315776,0.909652
