# Lightweight Fine-Tuning Project

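This notebook applies parameter-efficient fine-tuning to a sentiment classifier. The choices are:

* PEFT technique: LoRA (`r=8`, `lora_alpha=64`, `lora_dropout=0.2`) applied to the `c_attn` and `c_proj` modules
* Model: GPT-2 (`gpt2`) with a two-class sequence classification head
* Evaluation approach: accuracy and loss from `Trainer.evaluate()` on the combined validation and test splits
* Fine-tuning dataset: `rotten_tomatoes`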

## Loading and Evaluating a Foundation Model

TODO: In the cells below, load your chosen pre-trained Hugging Face model and evaluate its performance prior to fine-tuning. This step includes loading an appropriate tokenizer and dataset.

In [None]:
!pip install datasets=="3.2.0"
!pip install "transformers[torch]" peft

In [1]:
from datasets import load_dataset, concatenate_datasets

dataset = load_dataset("rotten_tomatoes")

In [2]:
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

Using device: cuda


In [3]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained('gpt2')
model = AutoModelForSequenceClassification.from_pretrained('gpt2',
                                                      num_labels=2,
                                                      id2label={0: "NEGATIVE", 1: "POSITIVE"},
                                                      label2id={"NEGATIVE": 0, "POSITIVE": 1}).to(device)

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [4]:
tokenizer.pad_token = tokenizer.eos_token

def tokenize_function(examples):
    inputs = tokenizer(
        examples["text"],
        truncation=True,
        padding="max_length",
        max_length=128
    )
    inputs["labels"] = examples["label"]  # Keep the original labels
    return inputs

tokenized_datasets = dataset.map(tokenize_function, batched=True)

In [5]:
train_dataset = tokenized_datasets["train"]


eval_dataset = concatenate_datasets([
    tokenized_datasets["validation"],
    tokenized_datasets["test"]
])

In [6]:
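# GPT-2 has no pad token by default, so reuse the EOS token id
model.config.pad_token_id = tokenizer.pad_token_id

# Leave every base-model weight trainable: the baseline below is a full fine-tune
for param in model.base_model.parameters():
    param.requires_grad = True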

In [7]:
import numpy as np
from transformers import DataCollatorWithPadding, Trainer, TrainingArguments

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    accuracy = (predictions == labels).mean()
    return {"accuracy": accuracy}

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="./results",
        learning_rate=3e-4,
        per_device_train_batch_size=16,
        per_device_eval_batch_size=16,
        evaluation_strategy="epoch",
        save_strategy="epoch",
        logging_steps=1,
        num_train_epochs=1,
        weight_decay=0.01,
        load_best_model_at_end=True,
    ),
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
)

trainer.train()


Epoch,Training Loss,Validation Loss,Accuracy
1,0.1434,0.39722,0.819418


TrainOutput(global_step=534, training_loss=0.503573418389099, metrics={'train_runtime': 272.7957, 'train_samples_per_second': 31.269, 'train_steps_per_second': 1.958, 'total_flos': 557215320637440.0, 'train_loss': 0.503573418389099, 'epoch': 1.0})

In [8]:
prior_evaluate = trainer.evaluate()

print("\n=== Evaluation Results Before PEFT Fine-Tuning ===")
for key, value in prior_evaluate.items():
    # Special formatting for numeric values
    if isinstance(value, float):
        if key == "epoch":
            print(f"{key.upper():<25}: {int(value)}")
        else:
            print(f"{key.upper():<25}: {value:.4f}")
    else:
        print(f"{key.upper():<25}: {value}")

print("===================================================")


=== Evaluation Results Before PEFT Fine-Tuning ===
EVAL_LOSS                : 0.3972
EVAL_ACCURACY            : 0.8194
EVAL_RUNTIME             : 18.6913
EVAL_SAMPLES_PER_SECOND  : 114.0640
EVAL_STEPS_PER_SECOND    : 7.1690
EPOCH                    : 1


In [9]:
import pandas as pd

results = trainer.predict(eval_dataset)

df = pd.DataFrame({
    "text": [item["text"] for item in eval_dataset],
    "predictions": results.predictions.argmax(axis=1),
    "labels": results.label_ids,
})

# Return a reproducible random sample of rows
def show_random_samples(df, quantity: int, random_seed: int = 42):
    return df.sample(n=quantity, random_state=random_seed).reset_index(drop=True)

# Number of rows to display
qtd_item = 10

# Show the full review text, then display the random sample
pd.set_option("display.max_colwidth", None)
show_random_samples(df, qtd_item)

index,text,predictions,labels
0,"cool gadgets and creatures keep this fresh . not as good as the original , but what is . . .",1,1
1,an awful movie that will only satisfy the most emotionally malleable of filmgoers .,0,0
2,. . . you can be forgiven for realizing that you've spent the past 20 minutes looking at your watch and waiting for frida to just die already .,0,0
3,"though uniformly well acted , especially by young ballesta and galan ( a first-time actor ) , writer/director achero manas's film is schematic and obvious .",0,0
4,absolutely ( and unintentionally ) terrifying .,0,0
5,"shanghai ghetto , much stranger than any fiction , brings this unknown slice of history affectingly to life .",1,1
6,"while hoffman's performance is great , the subject matter goes nowhere .",0,0
7,"works because we're never sure if ohlinger's on the level or merely a dying , delusional man trying to get into the history books before he croaks .",0,1
8,"as a science fiction movie , "" minority report "" astounds .",0,1
9,"what one is left with , even after the most awful acts are committed , is an overwhelming sadness that feels as if it has made its way into your very bloodstream .",0,1
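
A quick way to see which kind of error dominates is to tally the (prediction, label) pairs. The sketch below does this by hand for the ten sampled rows shown above; the two lists are copied from that table, and the variable names are illustrative rather than part of the notebook.

```python
from collections import Counter

# Predictions and labels copied from the ten sampled rows above
predictions = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0]
labels      = [1, 0, 0, 0, 0, 1, 0, 1, 1, 1]

# Count how often each (prediction, label) pair occurs
counts = Counter(zip(predictions, labels))

true_pos  = counts[(1, 1)]
true_neg  = counts[(0, 0)]
false_pos = counts[(1, 0)]  # predicted POSITIVE, actually NEGATIVE
false_neg = counts[(0, 1)]  # predicted NEGATIVE, actually POSITIVE

sample_accuracy = (true_pos + true_neg) / len(labels)
print(true_pos, true_neg, false_pos, false_neg, sample_accuracy)
```

Every error in this small sample is a false negative, but ten rows are far too few to generalize from; the same tally over the full `df` would give a more reliable picture.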


In [10]:
def get_discrepancies(results_df, num_samples=5):
    discrepancies = results_df[results_df['labels'] != results_df['predictions']]
    
    if discrepancies.empty:
        return pd.DataFrame({'Message': ['✅ No discrepancies found']})
    
    total = len(discrepancies)
    samples = min(num_samples, total)
    
    if total <= samples * 2:
        return discrepancies.reset_index(drop=True)
    
    head_df = discrepancies.head(samples).copy()
    tail_df = discrepancies.tail(samples).copy()
    
    return pd.concat([head_df, tail_df]).reset_index(drop=True)

get_discrepancies(df)

index,text,predictions,labels
0,a mischievous visual style and oodles of charm make 'cherish' a very good ( but not great ) movie .,0,1
1,"the importance of being earnest , so thick with wit it plays like a reading from bartlett's familiar quotations",0,1
2,"made for teens and reviewed as such , this is recommended only for those under 20 years of age . . . and then only as a very mild rental .",0,1
3,imagine o . henry's <b>the gift of the magi</b> relocated to the scuzzy underbelly of nyc's drug scene . merry friggin' christmas !,0,1
4,nothing short of wonderful with its ten-year-old female protagonist and its steadfast refusal to set up a dualistic battle between good and evil .,0,1
5,"like the tuck family themselves , this movie just goes on and on and on and on",1,0
6,a film that plays things so nice 'n safe as to often play like a milquetoast movie of the week blown up for the big screen .,1,0
7,more intellectually scary than dramatically involving .,1,0
8,narc is all menace and atmosphere .,1,0
9,there are many definitions of 'time waster' but this movie must surely be one of them .,1,0


## Performing Parameter-Efficient Fine-Tuning

TODO: In the cells below, create a PEFT model from your loaded model, run a training loop, and save the PEFT model weights.

In [11]:
from peft import LoraConfig, get_peft_model

peft_config = LoraConfig(
    task_type="SEQ_CLS",                  # Sequence classification task
    r=8,                                  # Rank of the LoRA decomposition
    lora_alpha=64,                        # Scaling factor (effective scale is lora_alpha / r)
    lora_dropout=0.2,                     # Dropout on the LoRA path for regularization
    target_modules=['c_attn', 'c_proj'],  # GPT-2 modules that receive LoRA adapters
    bias="none"                           # Do not train bias terms
)
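
For intuition about what this configuration does, here is a minimal NumPy sketch of the LoRA update. It is not PEFT's actual implementation. It assumes the standard formulation, in which a frozen weight `W` is augmented by a low-rank product scaled by `lora_alpha / r`. The matrix shapes, the random initialization, and the names `W`, `A`, `B`, and `lora_forward` are illustrative assumptions, and dropout is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out = 768, 768       # hidden size of GPT-2 small (illustrative layer shape)
r, lora_alpha = 8, 64        # values from the LoraConfig above
scaling = lora_alpha / r     # effective multiplier on the low-rank update

W = rng.normal(size=(d_in, d_out))          # frozen pre-trained weight (random stand-in)
A = rng.normal(scale=0.01, size=(d_in, r))  # trainable down-projection
B = np.zeros((r, d_out))                    # trainable up-projection, zero at the start

def lora_forward(x):
    """Frozen path plus the scaled low-rank path."""
    return x @ W + scaling * (x @ A @ B)

x = rng.normal(size=(4, d_in))
y = lora_forward(x)

# With B initialized to zero, the adapter leaves the base output unchanged
print(np.allclose(y, x @ W))
# Number of trainable values added for this one layer
print(A.size + B.size)
```

Only `A` and `B` would receive gradients during training, which is why the trainable parameter count stays small relative to the frozen weights.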

In [12]:
peft_model = get_peft_model(model, peft_config)



In [13]:
peft_model.print_trainable_parameters()

trainable params: 812,544 || all params: 125,253,888 || trainable%: 0.6487
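
The reported count can be reproduced by hand. The sketch below assumes the GPT-2 small layer shapes (12 blocks, hidden size 768, a 2304-wide `c_attn`, and a 3072-wide MLP). It also assumes that the `c_proj` target matches both the attention and the MLP projection in every block. The classification head is counted as trainable because it is trained alongside the adapters for `SEQ_CLS` tasks.

```python
r = 8
n_layers = 12
hidden = 768

# (input_features, output_features) of each adapted layer in one GPT-2 block
adapted_layers = {
    "attn.c_attn": (hidden, 3 * hidden),
    "attn.c_proj": (hidden, hidden),
    "mlp.c_proj": (4 * hidden, hidden),
}

# Each adapter adds an (in x r) matrix and an (r x out) matrix
lora_per_block = sum(r * (d_in + d_out) for d_in, d_out in adapted_layers.values())
lora_total = n_layers * lora_per_block

# Linear classifier without bias, two labels
score_head = hidden * 2

trainable = lora_total + score_head
all_params = 125_253_888  # taken from the printed output above

print(f"trainable params: {trainable:,}")
print(f"trainable%: {100 * trainable / all_params:.4f}")
```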


In [14]:
training_args = TrainingArguments(
    output_dir="./peft_results",
    learning_rate=3e-4,    
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    evaluation_strategy="epoch",
    num_train_epochs=1,    
    logging_steps=10,
    save_strategy="epoch",
    load_best_model_at_end=True,
    report_to="none"       
)



In [15]:
peft_trainer = Trainer(
    model=peft_model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    compute_metrics=compute_metrics,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
)

peft_trainer.train()

Epoch,Training Loss,Validation Loss,Accuracy
1,0.4342,0.453732,0.829737


TrainOutput(global_step=534, training_loss=0.2785607654503669, metrics={'train_runtime': 188.6974, 'train_samples_per_second': 45.205, 'train_steps_per_second': 2.83, 'total_flos': 562538328883200.0, 'train_loss': 0.2785607654503669, 'epoch': 1.0})

In [16]:
peft_model.save_pretrained("./model/gpt2-rotten-tomatoes-lora")
tokenizer.save_pretrained("./model/gpt2-rotten-tomatoes-lora")

('./model/gpt2-rotten-tomatoes-lora/tokenizer_config.json',
 './model/gpt2-rotten-tomatoes-lora/special_tokens_map.json',
 './model/gpt2-rotten-tomatoes-lora/vocab.json',
 './model/gpt2-rotten-tomatoes-lora/merges.txt',
 './model/gpt2-rotten-tomatoes-lora/added_tokens.json',
 './model/gpt2-rotten-tomatoes-lora/tokenizer.json')

## Performing Inference with a PEFT Model

TODO: In the cells below, load the saved PEFT model weights and evaluate the performance of the trained PEFT model. Be sure to compare the results to the results from prior to fine-tuning.

In [18]:
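from peft import PeftModel

# Reload the base model; GPT-2 needs pad_token_id set for batched classification
loaded_model = AutoModelForSequenceClassification.from_pretrained('gpt2', num_labels=2).to(device)
loaded_model.config.pad_token_id = tokenizer.pad_token_id
# Note: the adapter was trained on top of the weights fine-tuned in In [7], not the stock 'gpt2' checkpoint
peft_loaded = PeftModel.from_pretrained(loaded_model, "./model/gpt2-rotten-tomatoes-lora")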

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [21]:
print("\n=== Performance Comparison ===")
print(f"{'Metric':<25} | {'Original':<10} | {'PEFT':<10}")
print("-" * 50)

# Evaluate the in-memory PEFT model on the same evaluation set
peft_evaluate = peft_trainer.evaluate()

for key in prior_evaluate:
    # Only compare metrics present in both results
    if key not in peft_evaluate:
        continue
    
    # Format original values
    original_val = prior_evaluate[key]
    formatted_original = f"{original_val:.4f}" if isinstance(original_val, float) else str(original_val)
    
    # Format PEFT values
    peft_val = peft_evaluate[key]
    formatted_peft = f"{peft_val:.4f}" if isinstance(peft_val, float) else str(peft_val)
    
    print(f"{key.upper():<25} | {formatted_original:<10} | {formatted_peft:<10}")


=== Performance Comparison ===
Metric                    | Original   | PEFT      
--------------------------------------------------
EVAL_LOSS                 | 0.3972     | 0.4537    
EVAL_ACCURACY             | 0.8194     | 0.8297    
EVAL_RUNTIME              | 18.6913    | 20.9032   
EVAL_SAMPLES_PER_SECOND   | 114.0640   | 101.9940  
EVAL_STEPS_PER_SECOND     | 7.1690     | 6.4100    
EPOCH                     | 1.0000     | 1.0000    


## Key Insights:

### Accuracy-Loss Paradox:

PEFT reaches **1.03 percentage points higher accuracy** (82.97% vs 81.94%) despite a 14.22% higher evaluation loss (0.4537 vs 0.3972).

This pattern is common when:

* The model makes more confident wrong predictions, which cross-entropy penalizes heavily
* Correct predictions are made with lower confidence margins
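
The effect is easy to reproduce with a toy example. The sketch below compares two hypothetical sets of predicted probabilities (invented for illustration, not taken from either model). One is right more often but is badly wrong once; the other is right less often but never confidently wrong.

```python
import math

def accuracy(p_correct):
    """Fraction of examples where the correct class gets more than half the probability."""
    return sum(p > 0.5 for p in p_correct) / len(p_correct)

def cross_entropy(p_correct):
    """Mean negative log-probability assigned to the correct class."""
    return -sum(math.log(p) for p in p_correct) / len(p_correct)

# Probability each hypothetical model assigns to the correct class of four examples
cautious  = [0.60, 0.60, 0.45, 0.45]   # two right, two narrowly wrong
confident = [0.99, 0.99, 0.99, 0.01]   # three right, one confidently wrong

for name, probs in [("cautious", cautious), ("confident", confident)]:
    print(f"{name:<10} accuracy={accuracy(probs):.2f} loss={cross_entropy(probs):.3f}")
```

The `confident` model has both the higher accuracy and the higher loss, because a single prediction with probability 0.01 on the correct class contributes far more to the mean cross-entropy than several narrow misses.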

### Computational Efficiency:

Evaluation throughput **decreased by 10.58%** (101.99 samples/sec vs 114.06), most likely because the unmerged LoRA adapters add extra operations to each forward pass.
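
The percentages in this section can be recomputed directly from the comparison table. The sketch below uses a small hypothetical helper, `relative_change`, with the baseline and PEFT values copied from that table.

```python
def relative_change(new, old):
    """Percentage change of `new` relative to `old`."""
    return 100 * (new - old) / old

# (baseline, PEFT) values copied from the comparison table
eval_loss = (0.3972, 0.4537)
eval_accuracy = (0.8194, 0.8297)
samples_per_second = (114.064, 101.994)

loss_change = relative_change(eval_loss[1], eval_loss[0])
throughput_change = relative_change(samples_per_second[1], samples_per_second[0])
# Accuracy is already a percentage, so report the difference in percentage points
accuracy_points = 100 * (eval_accuracy[1] - eval_accuracy[0])

print(f"loss: {loss_change:+.2f}%")
print(f"throughput: {throughput_change:+.2f}%")
print(f"accuracy: {accuracy_points:+.2f} percentage points")
```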

### Training Dynamics:

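Both models were trained for 1 epoch.

PEFT achieves better accuracy with only ~0.65% of parameters updated (812,544 of 125,253,888). Because the LoRA adapters were added on top of the already fine-tuned baseline model, the PEFT result reflects one epoch of full fine-tuning followed by one epoch of adapter training.
The full fine-tuning baseline shows lower loss but lower accuracy, which may point to overfitting, although a single epoch and a gap of about one percentage point are not conclusive evidence.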