# Lightweight Fine-Tuning Project

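This notebook makes the following choices:

* PEFT technique: LoRA (`r=4`, `lora_alpha=32`, `lora_dropout=0.01`) applied to the attention `query` projections
* Model: `bert-base-uncased` with a 5-label sequence-classification head
* Evaluation approach: accuracy on a held-out split of 100 reviews, computed with the Hugging Face `Trainer` before and after fine-tuning
* Fine-tuning dataset: the first 1,000 training examples of `yelp_review_full`, split 90/10 into train and test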

## Loading and Evaluating a Foundation Model

TODO: In the cells below, load your chosen pre-trained Hugging Face model and evaluate its performance prior to fine-tuning. This step includes loading an appropriate tokenizer and dataset.

In [36]:
! pip install -q "datasets==2.15.0"
! pip install 'transformers[torch]'
! pip install peft
! pip install evaluate



In [37]:
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer, TrainingArguments, DataCollatorWithPadding
from peft import PeftModel, PeftConfig, get_peft_model, LoraConfig
import evaluate
import torch
import numpy as np
import random
# Random suffix so that output directories from separate runs do not collide.
postfix = random.randint(1, 100)

In [38]:
yelp_ds = load_dataset("yelp_review_full", split="train[0:1000]")
yelp_split_ds = yelp_ds.train_test_split(test_size=0.1)
splits = ["train", "test"]
yelp_split_ds

DatasetDict({
    train: Dataset({
        features: ['label', 'text'],
        num_rows: 900
    })
    test: Dataset({
        features: ['label', 'text'],
        num_rows: 100
    })
})

In [39]:
yelp_split_ds['train'][0]

{'label': 0,
 'text': "I really feel bad about giving a one-star review.  First, it's not about the service, price, decor, or the food.  I'm writing this review after going there twice.  The first time my husband and I went, the dinner was good.  We thought this place has the potential to be our go-to Chinese restaurant in Pgh.  Today I remembered that they have an early-bird special 3-6pm (m-sat).  The won ton soup was not authentic (more like chicken noodle soup) but wasn't bad.  The entree was fine.  \\n\\nWhat really knocked the score down to 1 is the awful tobacco/cigarette smell from the tea!! ? !!   I had to go to the grocery store and buy a pack of gum.  The smell wouldn't leave me so I came home to rinse my mouth with Listerine (2-3 minutes) then brush (as long as I could) to get rid of the smell.  Did they use the tea pot as an ash tray?  \\n\\n[first visit: 3 stars] + [second visit: -5 stars] = [ -2 stars]"}

In [40]:
# yelp_review_full labels run from 0 (1 star) to 4 (5 stars), so the ids must be 0-based.
id2label = {0: "worst", 1: "bad", 2: "ok", 3: "good", 4: "excellent"}
label2id = {"worst": 0, "bad": 1, "ok": 2, "good": 3, "excellent": 4}

model_checkpoint = "bert-base-uncased"
model = AutoModelForSequenceClassification.from_pretrained(
    model_checkpoint, num_labels=5, id2label=id2label, label2id=label2id)
model

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0-11): 12 x BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12,

In [41]:
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)

def preprocess_function(examples):
    # Pad or truncate every review to the model's maximum length (512 tokens for BERT).
    return tokenizer(examples["text"], padding="max_length", truncation=True)

tokenized_ds = {}
for split in splits:
    tokenized_ds[split] = yelp_split_ds[split].map(preprocess_function, batched=True)

print(tokenized_ds["train"][0])
print(tokenized_ds["train"][0]["input_ids"])


{'label': 0, 'text': "I really feel bad about giving a one-star review.  First, it's not about the service, price, decor, or the food.  I'm writing this review after going there twice.  The first time my husband and I went, the dinner was good.  We thought this place has the potential to be our go-to Chinese restaurant in Pgh.  Today I remembered that they have an early-bird special 3-6pm (m-sat).  The won ton soup was not authentic (more like chicken noodle soup) but wasn't bad.  The entree was fine.  \\n\\nWhat really knocked the score down to 1 is the awful tobacco/cigarette smell from the tea!! ? !!   I had to go to the grocery store and buy a pack of gum.  The smell wouldn't leave me so I came home to rinse my mouth with Listerine (2-3 minutes) then brush (as long as I could) to get rid of the smell.  Did they use the tea pot as an ash tray?  \\n\\n[first visit: 3 stars] + [second visit: -5 stars] = [ -2 stars]", 'input_ids': [101, 1045, 2428, 2514, 2919, 2055, 3228, 1037, 2028, 1

In [42]:
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return {"accuracy": (predictions == labels).mean()}
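
To make the metric concrete, here is a dependency-free sketch of the same argmax-then-compare logic applied to made-up logits. The helper name `accuracy_from_logits` and the toy values are illustrative assumptions and are not part of the notebook.

```python
def accuracy_from_logits(logits, labels):
    """Pick the highest-scoring class per row and compare it with the label."""
    predictions = [max(range(len(row)), key=row.__getitem__) for row in logits]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)

# Made-up logits for three examples over five classes (illustrative only).
toy_logits = [
    [0.1, 0.2, 0.3, 0.9, 0.0],  # argmax -> 3
    [2.0, 0.1, 0.1, 0.1, 0.1],  # argmax -> 0
    [0.0, 0.0, 0.5, 0.4, 0.3],  # argmax -> 2
]
toy_labels = [3, 0, 4]

toy_accuracy = accuracy_from_logits(toy_logits, toy_labels)
print(round(toy_accuracy, 3))  # 2 of 3 correct -> 0.667
```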

In [43]:
model.config.pad_token_id = tokenizer.pad_token_id
# Freeze the BERT encoder so that only the new classification head is trainable.
for param in model.base_model.parameters():
    param.requires_grad = False
print(model)
print(model.classifier)

pre_trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir=model_checkpoint + "-yelp-bert-pre-tuning-" + str(postfix),
        learning_rate=2e-5,
        per_device_train_batch_size=32,
        per_device_eval_batch_size=32,
        num_train_epochs=5,
        evaluation_strategy="epoch",
        save_strategy="epoch",
        load_best_model_at_end=True,
    ),
    train_dataset=tokenized_ds["train"],
    eval_dataset=tokenized_ds["test"],
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
)

# Baseline: train only the classification head on top of the frozen encoder, then evaluate.
pre_trainer.train()
pre_trainer.evaluate()

| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | No log | 1.613575 | 0.21 |
| 2 | No log | 1.609515 | 0.21 |
| 3 | No log | 1.608588 | 0.22 |
| 4 | No log | 1.608149 | 0.23 |
| 5 | No log | 1.608179 | 0.21 |


{'eval_loss': 1.6081490516662598,
 'eval_accuracy': 0.23,
 'eval_runtime': 3.7886,
 'eval_samples_per_second': 26.395,
 'eval_steps_per_second': 1.056,
 'epoch': 5.0}
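
The validation loss of about 1.608 and the accuracy of roughly 0.21 to 0.23 are close to what a five-class classifier that has learned nothing would produce. The check below is a small sketch under one assumption: a model that assigns equal probability to each of the five classes.

```python
import math

num_classes = 5

# Cross-entropy of a uniform prediction: -log(1/5) = log(5).
uniform_loss = math.log(num_classes)

# Expected accuracy when every class is guessed with equal probability.
chance_accuracy = 1 / num_classes

print(f"{uniform_loss:.4f}")     # 1.6094
print(f"{chance_accuracy:.2f}")  # 0.20
```

Seen this way, the baseline with a frozen encoder sits essentially at chance level, which is the reference point for the fine-tuned results.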

In [44]:
import pandas as pd
df = pd.DataFrame(tokenized_ds["test"])
df = df[["text", "label"]]
predictions = pre_trainer.predict(tokenized_ds["test"])
df["predicted_label"] = np.argmax(predictions[0], axis=1)
df.head(10)

| | text | label | predicted_label |
|---|---|---|---|
| 0 | Pros:\n- At least there's a climbing gym in th... | 0 | 2 |
| 1 | For $10 you can get an 8 inch sub and a GIANT ... | 3 | 1 |
| 2 | The food here was absolute perfection; each bi... | 4 | 2 |
| 3 | Andrew here said it best: \"... food served b... | 1 | 3 |
| 4 | I really like the chili. Burgers are fair and ... | 1 | 1 |
| 5 | The coffee here was excellent! Had a double es... | 4 | 4 |
| 6 | I have a soft spot for this place -- probably ... | 4 | 2 |
| 7 | This is the best Giant Eagle in the area and i... | 4 | 2 |
| 8 | Another store which has gone the way of the Do... | 1 | 1 |
| 9 | Amazing brunch! I'd never been to Casbah befor... | 3 | 2 |


## Performing Parameter-Efficient Fine-Tuning

TODO: In the cells below, create a PEFT model from your loaded model, run a training loop, and save the PEFT model weights.

In [45]:
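peft_config = LoraConfig(
    task_type="SEQ_CLS",
    r=4,                       # rank of the low-rank update matrices
    lora_alpha=32,             # scaling factor applied to the update
    lora_dropout=0.01,
    target_modules=["query"],  # adapt only the attention query projections
)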

lora_peft_model = get_peft_model(model, peft_config)
lora_peft_model.print_trainable_parameters()

data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return {"accuracy": (predictions == labels).mean()}

training_args = TrainingArguments(
    output_dir= model_checkpoint + "-yelp-bert-lora-peft-fine-tuning-" + str(postfix),
    learning_rate=1e-3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=10,
    weight_decay=0.01,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
)
ft_trainer = Trainer(
    model=lora_peft_model,
    args=training_args,
    train_dataset=tokenized_ds["train"],
    eval_dataset=tokenized_ds["test"],
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)

# train model
ft_trainer.train()
ft_trainer.evaluate()

trainable params: 77,573 || all params: 109,563,658 || trainable%: 0.07080176165713635


| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | No log | 1.544792 | 0.32 |
| 2 | No log | 1.087707 | 0.54 |
| 3 | No log | 1.18538 | 0.45 |
| 4 | No log | 1.110459 | 0.61 |
| 5 | No log | 1.175595 | 0.52 |
| 6 | No log | 1.217371 | 0.56 |
| 7 | No log | 1.177991 | 0.6 |
| 8 | No log | 1.210592 | 0.59 |
| 9 | 0.906100 | 1.238329 | 0.61 |
| 10 | 0.906100 | 1.242028 | 0.58 |


{'eval_loss': 1.0877071619033813,
 'eval_accuracy': 0.54,
 'eval_runtime': 3.4732,
 'eval_samples_per_second': 28.792,
 'eval_steps_per_second': 2.015,
 'epoch': 10.0}
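
The `trainable params: 77,573` figure printed above can be reproduced by hand. The arithmetic below is a sketch that assumes the trainable set consists of the LoRA matrices on the 12 `query` layers plus the 5-way classification head. The sizes are read off the model printout and the `LoraConfig`.

```python
hidden_size = 768  # in_features / out_features of each query Linear layer
num_layers = 12    # number of BertLayer blocks in the model printout
rank = 4           # r in LoraConfig
num_labels = 5

# Each adapted layer adds two matrices: A (rank x hidden) and B (hidden x rank).
lora_per_layer = rank * hidden_size + hidden_size * rank
lora_total = num_layers * lora_per_layer

# Classification head: weight (num_labels x hidden) plus one bias per label.
classifier_params = num_labels * hidden_size + num_labels

trainable = lora_total + classifier_params
print(lora_total, classifier_params, trainable)  # 73728 3845 77573
```

The total matches the printed count, which is consistent with the adapter touching only the query projections and the classifier.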

In [46]:
# Save the adapter weights and their config; the PEFT defaults already cover both.
lora_peft_model.save_pretrained("yelp-bert-peft-fine-tuned")
lora_peft_model.merge_and_unload()


In [47]:
import pandas as pd
df = pd.DataFrame(tokenized_ds["test"])
df = df[["text", "label"]]
predictions = ft_trainer.predict(tokenized_ds["test"])
df["predicted_label"] = np.argmax(predictions[0], axis=1)
df.head(10)

| | text | label | predicted_label |
|---|---|---|---|
| 0 | Pros:\n- At least there's a climbing gym in th... | 0 | 0 |
| 1 | For $10 you can get an 8 inch sub and a GIANT ... | 3 | 3 |
| 2 | The food here was absolute perfection; each bi... | 4 | 4 |
| 3 | Andrew here said it best: \"... food served b... | 1 | 0 |
| 4 | I really like the chili. Burgers are fair and ... | 1 | 1 |
| 5 | The coffee here was excellent! Had a double es... | 4 | 3 |
| 6 | I have a soft spot for this place -- probably ... | 4 | 3 |
| 7 | This is the best Giant Eagle in the area and i... | 4 | 3 |
| 8 | Another store which has gone the way of the Do... | 1 | 1 |
| 9 | Amazing brunch! I'd never been to Casbah befor... | 3 | 3 |
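
As a rough side-by-side, the ten rows previewed before and after fine-tuning can be scored directly. The sketch below transcribes the labels and predictions from those two previews. The helper names are illustrative, and the mean absolute error is an optional extra that treats star ratings as ordinal. Ten rows is a very small sample, so the full test-set accuracies reported by `evaluate()` (0.23 before, 0.54 after) remain the figures to rely on.

```python
# Values transcribed from the two ten-row previews in this notebook.
labels = [0, 3, 4, 1, 1, 4, 4, 4, 1, 3]
pred_before = [2, 1, 2, 3, 1, 4, 2, 2, 1, 2]
pred_after = [0, 3, 4, 0, 1, 3, 3, 3, 1, 3]

def exact_match(preds, truth):
    """Fraction of rows where the predicted class equals the label."""
    return sum(p == t for p, t in zip(preds, truth)) / len(truth)

def mean_abs_error(preds, truth):
    """Average distance, in star classes, between prediction and label."""
    return sum(abs(p - t) for p, t in zip(preds, truth)) / len(truth)

for name, preds in [("before", pred_before), ("after", pred_after)]:
    print(name, exact_match(preds, labels), mean_abs_error(preds, labels))
# before 0.3 1.3
# after 0.6 0.4
```

On these rows the fine-tuned model's mistakes are all off by a single class, whereas most of the baseline's errors are off by two.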


## Performing Inference with a PEFT Model

TODO: In the cells below, load the saved PEFT model weights and evaluate the performance of the trained PEFT model. Be sure to compare the results to the results from prior to fine-tuning.

In [48]:
from peft import AutoPeftModelForSequenceClassification
lora_model = AutoPeftModelForSequenceClassification.from_pretrained("yelp-bert-peft-fine-tuned",  num_labels=5)

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [49]:
infer_input = tokenizer('one of the best restaurants i ever visited', padding="max_length", truncation=True, return_tensors='pt')
lora_model.eval()  # make sure dropout is disabled for inference
with torch.no_grad():
    infer_output = lora_model(**infer_input)
    logits = infer_output.logits

probabilities = torch.nn.functional.softmax(logits, dim=-1)
# Class ids are 0-based: 4 is the highest rating (5 stars).
predicted_class = torch.argmax(probabilities, dim=-1).numpy()[0]
print("Predicted class:", predicted_class)

Predicted class: 4
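
To read the predicted class as a rating, the 0-based class id can be mapped back to stars. This is a minimal sketch that assumes ids 0 to 4 correspond to 1 to 5 stars, consistent with the one-star review labeled `0` earlier in the notebook. The names `STAR_LABELS` and `describe_prediction` are hypothetical; the label strings reuse the ones chosen for `id2label`.

```python
# Assumed mapping: class id 0 is a 1-star review, class id 4 is a 5-star review.
STAR_LABELS = {0: "worst", 1: "bad", 2: "ok", 3: "good", 4: "excellent"}

def describe_prediction(class_id):
    """Turn a 0-based class id into a human-readable star rating."""
    stars = class_id + 1
    return f"{stars} stars ({STAR_LABELS[class_id]})"

print(describe_prediction(4))  # 5 stars (excellent)
```

Under that assumption, the prediction of class 4 for "one of the best restaurants i ever visited" is the top rating, which is the expected outcome for such a positive sentence.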
