# Lightweight Fine-Tuning Project

TODO: In this cell, describe your choices for each of the following

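* PEFT technique: LoRA (`LoraConfig` with `r=64`, `lora_alpha=16`, `lora_dropout=0.1`)
* Model: `distilbert-base-uncased` for the sequence-classification baseline; `gpt2` for the LoRA cells
* Evaluation approach: accuracy on a 500-example test subset, computed by `compute_metrics` through `Trainer.evaluate()`
* Fine-tuning dataset: `imdb` (500 shuffled examples each from the train and test splits)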

## Loading and Evaluating a Foundation Model

TODO: In the cells below, load your chosen pre-trained Hugging Face model and evaluate its performance prior to fine-tuning. This step includes loading an appropriate tokenizer and dataset.

In [1]:
!pip install -q "datasets==2.15.0" "transformers==4.27.2"
!pip install "peft==0.7.0"



In [2]:
# Importing necessary libraries
from peft import LoraConfig, get_peft_model, PeftModel, PeftConfig
from transformers import (
    AutoModelForCausalLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorWithPadding,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset
import numpy as np

In [3]:
# Load the train and test splits of the imdb dataset
splits = ["train", "test"]
ds = {split: ds for split, ds in zip(splits, load_dataset("imdb", split=splits))}

# Thin out the dataset to make it run faster for this example
for split in splits:
    ds[split] = ds[split].shuffle(seed=42).select(range(500))

# Show the dataset
ds


{'train': Dataset({
     features: ['text', 'label'],
     num_rows: 500
 }),
 'test': Dataset({
     features: ['text', 'label'],
     num_rows: 500
 })}

In [4]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")


def preprocess_function(examples):
    """Preprocess the imdb dataset by returning tokenized examples."""
    return tokenizer(examples["text"], padding="max_length", truncation=True)


tokenized_ds = {}
for split in splits:
    tokenized_ds[split] = ds[split].map(preprocess_function, batched=True)


# Check that we tokenized the examples properly
assert tokenized_ds["train"][0]["input_ids"][:5] == [101, 2045, 2003, 2053, 7189]

# Show the first example of the tokenized training set
print(tokenized_ds["train"][0]["input_ids"])


[101, 2045, 2003, 2053, 7189, 2012, 2035, 2090, 3481, 3771, 1998, 6337, 2099, 2021, 1996, 2755, 2008, 2119, 2024, 2610, 2186, 2055, 6355, 6997, 1012, 6337, 2099, 3504, 15594, 2100, 1010, 3481, 3771, 3504, 4438, 1012, 6337, 2099, 14811, 2024, 3243, 3722, 1012, 3481, 3771, 1005, 1055, 5436, 2024, 2521, 2062, 8552, 1012, 1012, 1012, 3481, 3771, 3504, 2062, 2066, 3539, 8343, 1010, 2065, 2057, 2031, 2000, 3962, 12319, 1012, 1012, 1012, 1996, 2364, 2839, 2003, 5410, 1998, 6881, 2080, 1010, 2021, 2031, 1000, 17936, 6767, 7054, 3401, 1000, 1012, 2111, 2066, 2000, 12826, 1010, 2000, 3648, 1010, 2000, 16157, 1012, 2129, 2055, 2074, 9107, 1029, 6057, 2518, 2205, 1010, 2111, 3015, 3481, 3771, 3504, 2137, 2021, 1010, 2006, 1996, 2060, 2192, 1010, 9177, 2027, 9544, 2137, 2186, 1006, 999, 999, 999, 1007, 1012, 2672, 2009, 1005, 1055, 1996, 2653, 1010, 2030, 1996, 4382, 1010, 2021, 1045, 2228, 2023, 2186, 2003, 2062, 2394, 2084, 2137, 1012, 2011, 1996, 2126, 1010, 1996, 5889, 2024, 2428, 2204, 1998, 6
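
To make the effect of `padding="max_length"` and `truncation=True` concrete, here is a minimal pure-Python sketch of the idea. It is an illustration only, not the tokenizer's implementation: `pad_or_truncate` is a hypothetical helper, `max_length=8` is a toy value (the DistilBERT tokenizer's limit is 512), and the real tokenizer also keeps the closing `[SEP]` token when it truncates, which this sketch does not do.

```python
# Toy illustration of padding to a fixed length with truncation.
# pad_id=0 is DistilBERT's [PAD] token id; max_length=8 is a hypothetical
# small value chosen so the result is easy to read.

def pad_or_truncate(input_ids, max_length, pad_id=0):
    """Return (ids, attention_mask), both exactly max_length long."""
    ids = list(input_ids[:max_length])   # truncation: drop anything past max_length
    mask = [1] * len(ids)                # 1 marks a real token
    n_pad = max_length - len(ids)        # how many pad tokens are needed
    return ids + [pad_id] * n_pad, mask + [0] * n_pad


short = [101, 2045, 2003, 102]  # hypothetical 4-token sequence
ids, mask = pad_or_truncate(short, max_length=8)
print(ids)   # [101, 2045, 2003, 102, 0, 0, 0, 0]
print(mask)  # [1, 1, 1, 1, 0, 0, 0, 0]
```

Every example ends up the same length, which is why the `input_ids` list printed above is long even for a short review.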

In [5]:
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=2,
    id2label={0: "NEGATIVE", 1: "POSITIVE"},  # For converting predictions to strings
    label2id={"NEGATIVE": 0, "POSITIVE": 1},
)

# Freeze all the parameters of the base model
# Hint: Check the documentation at https://huggingface.co/transformers/v4.2.2/training.html
for param in model.base_model.parameters():
    param.requires_grad = False

model.classifier

Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_transform.bias', 'vocab_transform.weight', 'vocab_layer_norm.bias', 'vocab_projector.bias', 'vocab_layer_norm.weight']
- This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['pre_classifier.weight', 'classifier.weight', 'classifier.bias', 'pre_classifier.

Linear(in_features=768, out_features=2, bias=True)
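
With the base model frozen, only the classification head is left to train. As a rough check, the sketch below counts the head's parameters by hand. It assumes the head consists of `pre_classifier` as a 768-to-768 linear layer (the shape is an assumption; only its name appears in the warning above) plus the 768-to-2 `classifier` printed above. `linear_params` is a hypothetical helper.

```python
def linear_params(in_features, out_features, bias=True):
    """Parameter count of a dense layer: weight matrix plus optional bias."""
    return in_features * out_features + (out_features if bias else 0)


pre_classifier = linear_params(768, 768)  # assumed shape: 768 -> 768
classifier = linear_params(768, 2)        # matches Linear(768, 2) printed above
trainable = pre_classifier + classifier

print(pre_classifier)  # 590592
print(classifier)      # 1538
print(trainable)       # 592130
```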

In [6]:
print(model)

DistilBertForSequenceClassification(
  (distilbert): DistilBertModel(
    (embeddings): Embeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (transformer): Transformer(
      (layer): ModuleList(
        (0-5): 6 x TransformerBlock(
          (attention): MultiHeadSelfAttention(
            (dropout): Dropout(p=0.1, inplace=False)
            (q_lin): Linear(in_features=768, out_features=768, bias=True)
            (k_lin): Linear(in_features=768, out_features=768, bias=True)
            (v_lin): Linear(in_features=768, out_features=768, bias=True)
            (out_lin): Linear(in_features=768, out_features=768, bias=True)
          )
          (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (ffn): FFN(
            (dropout): Dropout(p=0.1, inplace=False)
 

In [7]:
import numpy as np
from transformers import DataCollatorWithPadding, Trainer, TrainingArguments


def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return {"accuracy": (predictions == labels).mean()}


# The HuggingFace Trainer class handles the training and eval loop for PyTorch for us.
# Read more about it here https://huggingface.co/docs/transformers/main_classes/trainer
training_args = TrainingArguments(
    output_dir="./results",
    learning_rate=2e-3,
    # Reduce the batch size if you don't have enough memory
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    num_train_epochs=5,
    weight_decay=0.01,
    warmup_steps=5,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    logging_dir="./logs",
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_ds["train"],
    eval_dataset=tokenized_ds["test"],
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
)

trainer.train()

You're using a DistilBertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | No log | 0.472583 | 0.778 |
| 2 | No log | 0.477359 | 0.786 |
| 3 | No log | 0.475553 | 0.81 |
| 4 | 0.531000 | 0.679966 | 0.772 |
| 5 | 0.531000 | 0.465353 | 0.83 |


TrainOutput(global_step=625, training_loss=0.5002173095703125, metrics={'train_runtime': 90.7053, 'train_samples_per_second': 27.562, 'train_steps_per_second': 6.89, 'total_flos': 331168496640000.0, 'train_loss': 0.5002173095703125, 'epoch': 5.0})
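
The step counts in this output can be reproduced with a little arithmetic. The sketch below assumes a single device, no gradient accumulation, and the `Trainer` default of logging the training loss every 500 steps (`logging_steps` is not set in the cell above, so that value is an assumption). Under those assumptions it accounts for `global_step=625` and for the "No log" entries in the first three epochs.

```python
import math

train_examples = 500   # size of the thinned-out training split
batch_size = 4         # per_device_train_batch_size
epochs = 5             # num_train_epochs
logging_steps = 500    # assumed Trainer default, not set explicitly above

steps_per_epoch = math.ceil(train_examples / batch_size)
total_steps = steps_per_epoch * epochs
# First epoch whose end-of-epoch evaluation can report a logged training loss
first_logged_epoch = math.ceil(logging_steps / steps_per_epoch)

print(steps_per_epoch)     # 125
print(total_steps)         # 625
print(first_logged_epoch)  # 4
```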

In [8]:
eval_result = trainer.evaluate()
print(eval_result)

{'eval_loss': 0.4653528332710266, 'eval_accuracy': 0.83, 'eval_runtime': 8.875, 'eval_samples_per_second': 56.338, 'eval_steps_per_second': 14.085, 'epoch': 5.0}


In [9]:
import pandas as pd

df = pd.DataFrame(tokenized_ds["test"])
df = df[["text", "label"]]

# Replace <br /> tags in the text with spaces (literal match, not a regex)
df["text"] = df["text"].str.replace("<br />", " ", regex=False)

# Add the model predictions to the dataframe
predictions = trainer.predict(tokenized_ds["test"])
df["predicted_label"] = np.argmax(predictions.predictions, axis=1)

df.head(5)

|   | text | label | predicted_label |
|---|---|---|---|
| 0 | When I unsuspectedly rented A Thousand Acres... | 1 | 1 |
| 1 | This is the latest entry in the long series of... | 1 | 1 |
| 2 | This movie was so frustrating. Everything seem... | 0 | 0 |
| 3 | I was truly and wonderfully surprised at "O' B... | 1 | 0 |
| 4 | This movie spends most of its time preaching t... | 0 | 0 |
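
As a small worked example of how the accuracy metric relates to these columns, the sketch below recomputes accuracy and a confusion count for just the five rows shown above. The two lists are copied from the `label` and `predicted_label` columns; the variable names are illustrative, and five rows are of course not representative of the full 500-example test subset.

```python
from collections import Counter

# label and predicted_label for the five rows displayed above
labels = [1, 1, 0, 1, 0]
predictions = [1, 1, 0, 0, 0]

# Fraction of rows where the prediction matches the label
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Count (true label, predicted label) pairs
confusion = Counter(zip(labels, predictions))

print(accuracy)           # 0.8
print(confusion[(1, 1)])  # 2 positives predicted positive
print(confusion[(0, 0)])  # 2 negatives predicted negative
print(confusion[(1, 0)])  # 1 positive predicted negative (row 3)
print(confusion[(0, 1)])  # 0 negatives predicted positive
```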


In [10]:
# Show full cell output
pd.set_option("display.max_colwidth", None)

df[df["label"] != df["predicted_label"]].head(2)

|   | text | label | predicted_label |
|---|---|---|---|
| 3 | I was truly and wonderfully surprised at "O' Brother, Where Art Thou?" The video store was out of all the movies I was planning on renting, so then I came across this. I came home and as I watched I became engrossed and found myself laughing out loud. The Coen's have made a magnificiant film again. But I think the first time you watch this movie, you get to know the characters. The second time, now that you know them, you laugh sooo hard it could hurt you. I strongly would reccomend ANYONE seeing this because if you are not, you are truly missing a film gem for the ages. 10/10 | 1 | 0 |
| 21 | Coming from Kiarostami, this art-house visual and sound exposition is a surprise. For a director known for his narratives and keen observation of humans, especially children, this excursion into minimalist cinematography begs for questions: Why did he do it? Was it to keep him busy during a vacation at the shore? "Five, 5 Long Takes" consists of, you guessed it, five long takes. They are (the title names are my own and the times approximate): "Driftwood and waves". The camera stands nearly still looking at a small piece of driftwood as it gets moved around by small waves splashing on a beach. Ten minutes. "Watching people on the boardwalk". The camera stands still looking at the ocean horizon and a boardwalk. People walk across the camera frame, their faces too far and blurry to make them interesting. Eleven minutes. "Six dogs at the water's edge". The camera stands still looking at the ocean horizon with a sandy stretch of beach nearby. Far away at the water's edge, six dogs not doing much, just relaxing. Sixteen minutes. "Ducks in line, gaggle of ducks". The camera stands still looking at the ocean horizon near the water's edge. Dozen and dozen of ducks stream in single file from left to right. I assume that Kiarostami released them gradually. The last two ducks stop dead on their track and suddenly a gaggle of ducks rolls quietly from right to left. I assume Kiarostami collected the ducks and re-released all at the same time. It is not the first time that he deals with the contrast between organized and disorganized behavior. Eight minutes. "Frog symphony, oops, I mean cacophony, for a stormy night". The camera stands over a pond at night. It's pitch black except for what appears to be the reflection of the moon on the undulating water. It is a stormy night and clouds race to cover the moon. The screen goes dark. What remains for us is the cacophony of frogs, howling dogs and, eventually, morning roosters. Hit me on the head if this was done in a single take. I saw this segment as a sound composition put together in the editing room and accompanied by a simple visualization. Twenty seven minutes! Except for the mildly amusing ducks, this exercise in minimalism left me cold. A nonessential film for Kiarostami admirers. I thought I would rate "Five" a five, but four is what it deserves. The film is dedicated to Yasujiru Ozu. | 0 | 1 |


## Performing Parameter-Efficient Fine-Tuning

TODO: In the cells below, create a PEFT model from your loaded model, run a training loop, and save the PEFT model weights.

In [20]:

config = LoraConfig(
    r=64,
    lora_alpha=16,
    bias="none",
    lora_dropout=0.1,
    task_type="CAUSAL_LM",
)
model = AutoModelForCausalLM.from_pretrained("gpt2")
lora_model = get_peft_model(model, config)
lora_model.print_trainable_parameters()

trainable params: 2,359,296 || all params: 126,799,104 || trainable%: 1.8606566809809635
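
The trainable-parameter figure can be checked by hand. The sketch below assumes LoRA is attached only to the `c_attn` projection in each of GPT-2's 12 blocks, with the adapter shapes that appear in the module printout later in this notebook (`lora_A`: 768 to 64, `lora_B`: 64 to 2304, no bias). The variable names are illustrative.

```python
r = 64                    # LoRA rank from LoraConfig
d_in, d_out = 768, 2304   # c_attn input and output widths
n_layers = 12             # GPT-2 blocks, one adapted c_attn each (assumed)

# Each adapter adds A (r x d_in) and B (d_out x r), both without bias
per_layer = r * d_in + d_out * r
trainable = per_layer * n_layers

all_params = 126_799_104  # total reported by print_trainable_parameters()
pct = 100 * trainable / all_params

print(per_layer)       # 196608
print(trainable)       # 2359296
print(round(pct, 4))   # 1.8607
```

Subtracting the adapter weights from the total leaves 124,439,808 parameters for the frozen GPT-2 base model.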


In [17]:
lora_model.save_pretrained("gpt-lora")

## Performing Inference with a PEFT Model

TODO: In the cells below, load the saved PEFT model weights and evaluate the performance of the trained PEFT model. Be sure to compare the results to the results from prior to fine-tuning.

In [18]:
from peft import AutoPeftModelForCausalLM
lora_model = AutoPeftModelForCausalLM.from_pretrained("gpt-lora")

In [23]:

tokenizer = AutoTokenizer.from_pretrained("gpt2")
inputs = tokenizer("Hello, my name is ", return_tensors="pt")
# Generate with the PEFT model and pass the attention mask explicitly.
# GPT-2 has no pad token, so reuse the end-of-sequence token for padding.
outputs = lora_model.generate(**inputs, max_new_tokens=10, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.batch_decode(outputs))


['Hello, my name is _____. I am a student at the University of']
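
No training step appears between `get_peft_model` and `save_pretrained` in the cells above, so the saved adapter presumably still holds its initial weights. PEFT initializes the LoRA `B` matrix to zeros by default, which means an untrained adapter leaves the base model's outputs unchanged. The toy sketch below illustrates that property with plain Python lists. It is not PEFT's implementation: `matvec` and `lora_forward` are hypothetical helpers, the matrices are tiny made-up values, and only the scale `lora_alpha / r = 16 / 64` is taken from the config above.

```python
def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def lora_forward(W, A, B, x, scale):
    """Compute W x + scale * B (A x), the LoRA-adapted linear map."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scale * u for b, u in zip(base, update)]


scale = 16 / 64                   # lora_alpha / r from the notebook's LoraConfig
W = [[1.0, 2.0], [3.0, 4.0]]      # toy frozen weight (2 x 2)
A = [[0.5, -0.5]]                 # toy rank-1 down-projection (1 x 2)
x = [1.0, 0.0]

B_init = [[0.0], [0.0]]           # zero-initialized up-projection (2 x 1)
print(lora_forward(W, A, B_init, x, scale))     # [1.0, 3.0], same as W x

B_trained = [[1.0], [2.0]]        # hypothetical values after some training
print(lora_forward(W, A, B_trained, x, scale))  # [1.125, 3.25]
```

Only once `B` moves away from zero does the adapter change the output, so a comparison against the base model is informative only after the adapter has been trained.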


In [24]:
lora_model.eval()

PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): GPT2LMHeadModel(
      (transformer): GPT2Model(
        (wte): Embedding(50257, 768)
        (wpe): Embedding(1024, 768)
        (drop): Dropout(p=0.1, inplace=False)
        (h): ModuleList(
          (0-11): 12 x GPT2Block(
            (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
            (attn): GPT2Attention(
              (c_attn): lora.Linear(
                (base_layer): Conv1D()
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=768, out_features=64, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=64, out_features=2304, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
        

In [29]:
!zip -r /content/results.zip /content/results

  adding: content/results/ (stored 0%)
  adding: content/results/checkpoint-625/ (stored 0%)
  adding: content/results/checkpoint-625/tokenizer_config.json (deflated 42%)
  adding: content/results/checkpoint-625/training_args.bin (deflated 51%)
  adding: content/results/checkpoint-625/vocab.txt (deflated 53%)
  adding: content/results/checkpoint-625/config.json (deflated 46%)
  adding: content/results/checkpoint-625/trainer_state.json (deflated 69%)
  adding: content/results/checkpoint-625/pytorch_model.bin (deflated 8%)
  adding: content/results/checkpoint-625/special_tokens_map.json (deflated 42%)
  adding: content/results/checkpoint-625/rng_state.pth (deflated 25%)
  adding: content/results/checkpoint-625/tokenizer.json (deflated 71%)
  adding: content/results/checkpoint-625/optimizer.pt (deflated 13%)
  adding: content/results/checkpoint-625/scheduler.pt (deflated 56%)
  adding: content/results/checkpoint-125/ (stored 0%)
  adding: content/results/checkpoint-125/tokenizer_config.js

In [None]:
from google.colab import files
files.download("/content/results.zip")