# Lightweight Fine-Tuning Project

TODO: In this cell, describe your choices for each of the following

* PEFT technique: LoRA (Low-Rank Adaptation), applied through the Hugging Face `peft` library
* Model: GPT-2 (`gpt2`), as suggested
* Evaluation approach: classification accuracy on held-out validation and test subsets
* Fine-tuning dataset: Emotion (https://huggingface.co/datasets/dair-ai/emotion)

## Loading and Evaluating a Foundation Model

TODO: In the cells below, load your chosen pre-trained Hugging Face model and evaluate its performance prior to fine-tuning. This step includes loading an appropriate tokenizer and dataset.

In [1]:
!pip install -q peft
!pip install -q transformers
!pip install accelerate -U
!pip install -q "datasets==2.15.0"

In [None]:
!python --version

Python 3.10.12


In [2]:
from peft import LoraConfig,PeftConfig, get_peft_model, AutoPeftModelForCausalLM, TaskType, PeftModel, AutoPeftModelForSequenceClassification
from transformers import AutoTokenizer, pipeline, GPT2ForSequenceClassification, AutoModelForCausalLM, AutoModelForSequenceClassification, DataCollatorWithPadding, Trainer, TrainingArguments
from datasets import load_dataset
import torch
import numpy as np
import pandas as pd

In [3]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device

device(type='cuda')

# Load and Split Dataset

In [4]:
# load emotion dataset:
# https://huggingface.co/datasets/dair-ai/emotion

splits = ["train", "test", "validation"]

train = load_dataset("dair-ai/emotion", split="train")
test = load_dataset("dair-ai/emotion", split="test")
valid = load_dataset("dair-ai/emotion", split="validation")

data={}
data["train"]=train.shuffle(seed=42).select(range(800)) #8000
data["test"]=test.shuffle(seed=42).select(range(100)) #1000
data["validation"]=valid.shuffle(seed=42).select(range(100)) #1000

# View the dataset characteristics
print("train size:",data["train"])
print("test size:",data["test"])
print("valid size:",data["validation"])

train size: Dataset({
    features: ['text', 'label'],
    num_rows: 800
})
test size: Dataset({
    features: ['text', 'label'],
    num_rows: 100
})
valid size: Dataset({
    features: ['text', 'label'],
    num_rows: 100
})


# Sample train dataset

In [5]:
data["train"].shuffle(seed=42).select(range(10))[:10]

{'text': ['ive slowed down i take time to listen to my child and be in the moment and not feel like i need to immediately update my status on fb about the cute thing she did',
  'i had then these were truly terrifying and still feel shaken and uneasy because of them',
  'i am feeling a bit adventurous i put on red lipstick rouge artist and intense from make up for ever',
  'i struggled with feeling like myself because myself liked bands and the s and david hockney and photography and collecting things and no body really understood those things because no body does understand you when you re',
  'id feel completely lost without him',
  'i really feel rotten and my ear hurts so bad but i still managed to work out days and really push the intensity',
  'i feel nervous about trying something new during a lesson or if my horse shies at something',
  'i feel like im gonna be so greedy with him cuz i just love him so much',
  'i think people are merely lacking of professionalism and ethics wh

# Create a Tokenizer and Pre-process Datasets

In [6]:
tokenizer = AutoTokenizer.from_pretrained("gpt2")
#tokenizer.add_special_tokens({'pad_token': '[PAD]'})
tokenizer.pad_token = tokenizer.eos_token
MAX_LEN=100

def preprocess_function(examples):
    """Preprocess the emotion dataset by returning tokenized examples."""
    return tokenizer(examples["text"], padding="max_length", truncation=True, return_tensors="pt", max_length=MAX_LEN)


tokenized_data = {}
for split in splits:
    tokenized_data[split] = data[split].map(preprocess_function, batched=True)

In [None]:
# inputs = tokenizer(train["text"][0], return_tensors="pt")
# outputs = model.generate(input_ids=inputs["input_ids"], max_new_tokens=10)
# print(tokenizer.batch_decode(outputs))

# Show the first example of the tokenized training set
print(tokenized_data["train"][0]["input_ids"])

[4514, 16259, 287, 262, 1499, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256]
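The printed `input_ids` show five content tokens followed by the GPT-2 end-of-text id `50256` repeated up to `MAX_LEN`, because `pad_token` was set to `eos_token`. A minimal plain-Python sketch of that right-padding step, using a hypothetical helper `pad_to_max_length` (not part of `transformers`), might look like this:

```python
EOS_TOKEN_ID = 50256  # GPT-2 end-of-text id, reused here as the padding id
MAX_LEN = 100


def pad_to_max_length(token_ids, max_len=MAX_LEN, pad_id=EOS_TOKEN_ID):
    """Hypothetical helper: truncate, right-pad, and build the attention mask."""
    kept = token_ids[:max_len]
    n_pad = max_len - len(kept)
    input_ids = kept + [pad_id] * n_pad
    # 1 marks a real token, 0 marks padding the model should ignore
    attention_mask = [1] * len(kept) + [0] * n_pad
    return input_ids, attention_mask


# The five content tokens from the first training example shown above
input_ids, attention_mask = pad_to_max_length([4514, 16259, 287, 262, 1499])
print(len(input_ids), sum(attention_mask))
```

The attention mask is what lets the model tell real tokens from padding, which matters here because the padding id and the end-of-text id are the same number.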


## Load and set up the model

We now load the model and freeze the parameters of the base model, leaving only the classification head trainable.

In [41]:
model = AutoModelForSequenceClassification.from_pretrained(  # use GPT-2
    "gpt2",
    num_labels=6,
    ignore_mismatched_sizes=True,
    id2label={0: "sadness", 1: "joy", 2: "love", 3: "anger", 4: "fear", 5: "surprise"},  # For converting predictions to strings
    label2id={"sadness": 0, "joy": 1, "love": 2, "anger": 3, "fear": 4, "surprise": 5},
)
# model = AutoModelForCausalLM.from_pretrained("gpt2")  # use GPT-2
# GPT-2 has no padding token by default; reuse the one set on the tokenizer
model.config.pad_token_id = tokenizer.pad_token_id
# Freeze all the parameters of the base model
# Hint: Check the documentation at https://huggingface.co/transformers/v4.2.2/training.html
for param in model.base_model.parameters():
    param.requires_grad = False

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [42]:
#tokenizer.pad_token = tokenizer.eos_token

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return {"accuracy": (predictions == labels).mean()}


# The HuggingFace Trainer class handles the training and eval loop for PyTorch for us.
# Read more about it here https://huggingface.co/docs/transformers/main_classes/trainer
trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="./data/sentiment_analysis",
        learning_rate=2e-3,
        # Reduce the batch size if you don't have enough memory
        per_device_train_batch_size=4,
        per_device_eval_batch_size=4,
        num_train_epochs=1,
        weight_decay=0.01,
        evaluation_strategy="epoch",
        save_strategy="epoch",
        load_best_model_at_end=True,
    ),
    train_dataset=tokenized_data["train"],
    eval_dataset=tokenized_data["validation"],
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer, padding="max_length"),
    compute_metrics=compute_metrics,
)
#trainer.train()
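
The `compute_metrics` function above takes the arg-max over the six class logits and reports the fraction of predictions that match the labels. As a plain-Python restatement of the same calculation on made-up logits (the helper names and toy values are illustrative only):

```python
def argmax(row):
    """Index of the largest value in a list of logits."""
    return max(range(len(row)), key=lambda i: row[i])


def accuracy(logits, labels):
    """Fraction of rows whose arg-max class equals the label."""
    predictions = [argmax(row) for row in logits]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Toy logits for four examples and six emotion classes
toy_logits = [
    [2.0, 0.1, 0.0, 0.0, 0.0, 0.0],  # predicts class 0
    [0.0, 0.3, 1.5, 0.0, 0.0, 0.0],  # predicts class 2
    [0.0, 0.0, 0.0, 0.2, 0.9, 0.0],  # predicts class 4
    [0.1, 0.8, 0.0, 0.0, 0.0, 0.0],  # predicts class 1
]
toy_labels = [0, 2, 3, 1]
print(accuracy(toy_logits, toy_labels))  # 3 of 4 predictions match
```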

# Evaluate model performance before fine-tuning

In [10]:
test_df = pd.DataFrame(tokenized_data["test"])
test_df = test_df[["text", "label"]]

In [11]:
test_pred = trainer.predict(tokenized_data["test"])
test_df["predicted_label"] = np.argmax(test_pred[0], axis=1)

In [12]:
test_acc = (test_df["label"] == test_df["predicted_label"]).mean()
print('Test Accuracy: ', test_acc)

Test Accuracy:  0.01


In [13]:
trainer.evaluate()

{'eval_loss': 9.601634979248047,
 'eval_accuracy': 0.04,
 'eval_runtime': 7.9128,
 'eval_samples_per_second': 12.638,
 'eval_steps_per_second': 3.159}

## Performing Parameter-Efficient Fine-Tuning

TODO: In the cells below, create a PEFT model from your loaded model, run a training loop, and save the PEFT model weights.

In [44]:
config = LoraConfig(task_type=TaskType.SEQ_CLS)

lora_model = get_peft_model(model, config)  # Create a PEFT model

In [16]:
lora_model.print_trainable_parameters()

trainable params: 299,520 || all params: 124,743,936 || trainable%: 0.2401
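
The reported counts can be reproduced by hand from the LoRA shapes that appear in the printed model summary (`lora_A`: 768 to 8, `lora_B`: 8 to 2304, on `c_attn` in each of the 12 blocks). The sketch below rests on two assumptions that are not stated in the notebook: that the `gpt2` backbone has 124,439,808 parameters, and that PEFT keeps the original `score` head alongside a trainable copy of it for sequence classification.

```python
HIDDEN = 768          # GPT-2 hidden size
QKV_OUT = 3 * HIDDEN  # c_attn projects to query, key and value: 2304
RANK = 8              # LoRA rank shown in the model summary
N_LAYERS = 12         # number of GPT2Block layers
NUM_LABELS = 6        # emotion classes
GPT2_BASE_PARAMS = 124_439_808  # assumption: size of the gpt2 backbone

lora_per_layer = HIDDEN * RANK + RANK * QKV_OUT  # lora_A + lora_B
lora_total = N_LAYERS * lora_per_layer
score_head = HIDDEN * NUM_LABELS                 # bias-free classification head

trainable = lora_total + score_head
# Assumption: the original head and its trainable copy are both counted
all_params = GPT2_BASE_PARAMS + score_head + lora_total + score_head
percent = round(100 * trainable / all_params, 4)
print(trainable, all_params, percent)
```

Under these assumptions the arithmetic matches the figures printed by `print_trainable_parameters()`.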


# Train the model for 7 epochs

In [46]:
trainer = Trainer(
    model=lora_model,
    args=TrainingArguments(
        output_dir="./data/sentiment_analysis",
        learning_rate=2e-3,
        # Reduce the batch size if you don't have enough memory
        per_device_train_batch_size=4,
        per_device_eval_batch_size=4,
        #label_names = ["start_positions", "end_positions"],
        num_train_epochs=7,
        weight_decay=0.01,
        evaluation_strategy="epoch",
        save_strategy="epoch",
        load_best_model_at_end=True,
    ),
    train_dataset=tokenized_data["train"],
    eval_dataset=tokenized_data["validation"],
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer, padding="max_length"),
    compute_metrics=compute_metrics,
)
#trainer.train()

In [47]:
trainer.train()

| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | No log | 0.645943 | 0.8 |
| 2 | No log | 0.724866 | 0.81 |
| 3 | 1.019700 | 0.572864 | 0.81 |
| 4 | 1.019700 | 0.60915 | 0.83 |
| 5 | 0.400200 | 0.613973 | 0.83 |
| 6 | 0.400200 | 0.531866 | 0.84 |
| 7 | 0.400200 | 0.513182 | 0.84 |


TrainOutput(global_step=1400, training_loss=0.5539529636928013, metrics={'train_runtime': 1188.5267, 'train_samples_per_second': 4.712, 'train_steps_per_second': 1.178, 'total_flos': 2936934708019200.0, 'train_loss': 0.5539529636928013, 'epoch': 7.0})
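
The `global_step=1400` value and the "No log" entries for the first two epochs follow from the batch size and the logging interval. A short sketch of that arithmetic, assuming a single device, no gradient accumulation, and a logging interval of 500 steps (none of which are set explicitly in the `TrainingArguments` above):

```python
import math

TRAIN_EXAMPLES = 800
BATCH_SIZE = 4
EPOCHS = 7
LOGGING_STEPS = 500  # assumption: interval used when logging_steps is not set

steps_per_epoch = math.ceil(TRAIN_EXAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS

rows = []
for epoch in range(1, EPOCHS + 1):
    step = epoch * steps_per_epoch
    # Most recent step at which a training loss would have been logged
    last_logged = (step // LOGGING_STEPS) * LOGGING_STEPS
    shown = f"loss logged at step {last_logged}" if last_logged else "No log"
    rows.append((epoch, step, shown))
    print(epoch, step, shown)

print("total steps:", total_steps)
```

This is consistent with the training loss column changing only at epochs 3 and 5: each epoch ends at a multiple of 200 steps, so the table repeats the most recently logged value.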

In [48]:
lora_model

PeftModelForSequenceClassification(
  (base_model): LoraModel(
    (model): GPT2ForSequenceClassification(
      (transformer): GPT2Model(
        (wte): Embedding(50257, 768)
        (wpe): Embedding(1024, 768)
        (drop): Dropout(p=0.1, inplace=False)
        (h): ModuleList(
          (0-11): 12 x GPT2Block(
            (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
            (attn): GPT2Attention(
              (c_attn): lora.Linear(
                (base_layer): Conv1D()
                (lora_dropout): ModuleDict(
                  (default): Identity()
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=768, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=2304, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
 

# Evaluate model performance

In [50]:
val_df = pd.DataFrame(tokenized_data["validation"])
val_df = val_df[["text", "label"]]

In [51]:
val_pred = trainer.predict(tokenized_data["validation"])
val_df["predicted_label"] = np.argmax(val_pred[0], axis=1)

In [52]:
val_acc = (val_df["label"] == val_df["predicted_label"]).mean()
print('Validation Accuracy: ', val_acc)

Validation Accuracy:  0.84


In [53]:
test_pred = trainer.predict(tokenized_data["test"])
test_df["predicted_label"] = np.argmax(test_pred[0], axis=1)

In [54]:
test_acc = (test_df["label"] == test_df["predicted_label"]).mean()
print('Test Accuracy: ', test_acc)

Test Accuracy:  0.87


In [55]:
trainer.evaluate()

{'eval_loss': 0.5131816267967224,
 'eval_accuracy': 0.84,
 'eval_runtime': 8.3207,
 'eval_samples_per_second': 12.018,
 'eval_steps_per_second': 3.005,
 'epoch': 7.0}

# Save model weights

In [57]:
lora_model.save_pretrained("gpt-lora")

## Performing Inference with a PEFT Model

TODO: In the cells below, load the saved PEFT model weights and evaluate the performance of the trained PEFT model. Be sure to compare the results to the results from prior to fine-tuning.

# Load model weights after training

In [59]:
model_name="gpt-lora" #"model_lora"
lora_model = AutoPeftModelForSequenceClassification.from_pretrained(model_name, num_labels=6)

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [60]:
trainer = Trainer(
    model=lora_model,
    args=TrainingArguments(
        output_dir="./data/sentiment_analysis",
        learning_rate=2e-3,
        # Reduce the batch size if you don't have enough memory
        per_device_train_batch_size=1,
        per_device_eval_batch_size=1,
        #label_names = ["start_positions", "end_positions"],
        num_train_epochs=10,
        weight_decay=0.01,
        evaluation_strategy="epoch",
        save_strategy="epoch",
        load_best_model_at_end=True,
    ),
    train_dataset=tokenized_data["train"],
    eval_dataset=tokenized_data["validation"],
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer, padding="max_length"),
    compute_metrics=compute_metrics,
)

# Evaluate model performance after fine-tuning

In [61]:
trainer.evaluate()

{'eval_loss': 0.587780237197876,
 'eval_accuracy': 0.78,
 'eval_runtime': 8.5131,
 'eval_samples_per_second': 11.747,
 'eval_steps_per_second': 11.747}
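
The reloaded adapter scores 0.78 on the validation subset, against 0.84 before saving. One possible contributor, offered as an assumption rather than a diagnosis: the reloaded base model is built from the `gpt2` checkpoint, so `config.pad_token_id` may be unset unless it is assigned again, as it was for the original model. GPT-2 sequence classification pools the hidden state of one token per sequence, and without a padding id it cannot locate the last real token. The sketch below is a simplified plain-Python model of that choice; `pooled_index` is a hypothetical helper, not the library's code.

```python
def pooled_index(input_ids, pad_token_id=None):
    """Hypothetical helper: position whose hidden state feeds the classifier."""
    if pad_token_id is None:
        # No padding id known: fall back to the final position
        return len(input_ids) - 1
    # Right padding assumed: take the token just before the first pad
    for position, token in enumerate(input_ids):
        if token == pad_token_id:
            return max(position - 1, 0)
    return len(input_ids) - 1


# First training example: five content tokens, then 95 padding tokens
ids = [4514, 16259, 287, 262, 1499] + [50256] * 95
print(pooled_index(ids, pad_token_id=50256))  # index of the last content token
print(pooled_index(ids))                      # index of the final padding token
```

If this is the cause, assigning `lora_model.config.pad_token_id = tokenizer.pad_token_id` right after loading would be the first thing to try, and the evaluation batch size would then no longer need to be 1.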

In [62]:
#tokenizer.pad_token = tokenizer.eos_token
test_pred = trainer.predict(tokenized_data["test"])
test_df["predicted_label"] = np.argmax(test_pred[0], axis=1)
test_acc = (test_df["label"] == test_df["predicted_label"]).mean()
print('Test Accuracy: ', test_acc)

Test Accuracy:  0.86
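
To compare the results before and after fine-tuning, the accuracies printed in the cells above can be collected in one place. A small sketch that tabulates them (the dictionary layout and stage names are illustrative; the numbers are the ones reported in this notebook on the 100-example validation and test subsets):

```python
# Accuracies reported in this notebook
results = {
    "before fine-tuning": {"validation": 0.04, "test": 0.01},
    "after LoRA training": {"validation": 0.84, "test": 0.87},
    "reloaded adapter": {"validation": 0.78, "test": 0.86},
}

baseline = results["before fine-tuning"]
print(f"{'stage':<22}{'validation':>12}{'test':>8}{'test gain':>11}")
for stage, acc in results.items():
    # Gain is measured against the untrained classification head
    gain = acc["test"] - baseline["test"]
    print(f"{stage:<22}{acc['validation']:>12.2f}{acc['test']:>8.2f}{gain:>+11.2f}")
```

With only 100 examples per subset, each example is worth one percentage point, so small differences between stages should be read with caution.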
