# Lightweight Fine-Tuning Project

To fine-tune a large foundation model (such as BERT or GPT-2) for a more specific task, you can take the pretrained version of the model and update its parameters using task-specific data. These pretrained models are very large, so to save on computational resources you can use parameter-efficient fine-tuning (PEFT).

### PEFT technique

Parameter-efficient fine-tuning (PEFT) tailors a model to a specific task by updating only a small set of parameters while the rest of the model stays frozen, which saves computational resources.

One specific PEFT technique is Low-Rank Adaptation (LoRA). It approximates the update to a large weight matrix as the product of two much smaller, thin matrices, greatly reducing the number of parameters that have to be updated.
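The low-rank idea can be illustrated with a toy example. The sketch below is not how the `peft` library implements LoRA; it only shows that the product of two thin matrices has the same shape as a full weight matrix while needing fewer numbers. The names `matmul`, `d_in`, `d_out` and `r`, and the toy sizes, are assumptions made for illustration.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

d_in, d_out, r = 4, 6, 1  # toy sizes; real layers are far larger

# Low-rank factors: B is d_out x r, A is r x d_in
B = [[float(i + 1)] for i in range(d_out)]
A = [[0.1 * (j + 1) for j in range(d_in)]]

# Their product has the same shape as a full d_out x d_in weight matrix
delta_W = matmul(B, A)

full_params = d_out * d_in          # entries in a full update of the matrix
lora_params = d_out * r + r * d_in  # entries in B and A combined

print(len(delta_W), len(delta_W[0]))  # 6 4
print(full_params, lora_params)       # 24 10
```

The saving grows with the size of the layer, because the full count scales with `d_in * d_out` while the low-rank count scales with `r * (d_in + d_out)`.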

### Model

In this notebook we examine the GPT-2 model, as it is fairly small but powerful.

### Evaluation approach

The model will be evaluated on how accurately it predicts the sentiment (positive or negative) of IMDb movie reviews.

### Fine-tuning dataset

The fine-tuning will experiment with the LoRA PEFT technique on a subset of the IMDb dataset, and the resulting performance will be measured and compared to that of the model without LoRA fine-tuning.

## Loading and Evaluating a Foundation Model

In the cells below, load your chosen pre-trained Hugging Face model and evaluate its performance prior to fine-tuning. This step includes loading an appropriate tokenizer and dataset.

In [1]:
TRAIN_TEST_SIZE = 250

In [2]:
from datasets import load_dataset

# Load the train and test splits of the imdb dataset
splits = ["train", "test"]
ds = load_dataset("imdb")

# Thin out the dataset to make it run faster for this example
for split in splits:
    ds[split] = ds[split].shuffle(seed=42).select(range(TRAIN_TEST_SIZE))

ds.pop('unsupervised')
# Show the dataset
ds


DatasetDict({
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 250
    })
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 250
    })
})

In [3]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
# GPT-2 Tokenizer doesn't have a padding token.
tokenizer.pad_token = tokenizer.eos_token

def preprocess_function(examples):
    """Preprocess the imdb dataset by returning tokenized examples."""
    tokens = tokenizer(examples['text'], padding='max_length', truncation=True)
    return tokens

# Tokenize all the examples in each split with a batched map
tokenized_ds = {}
for split in splits:
    tokenized_ds[split] = ds[split].map(preprocess_function, batched=True)

# Inspect the available columns in the dataset
tokenized_ds

Map: 100%|██████████| 250/250 [00:00<00:00, 735.85 examples/s]


{'train': Dataset({
     features: ['text', 'label', 'input_ids', 'attention_mask'],
     num_rows: 250
 }),
 'test': Dataset({
     features: ['text', 'label', 'input_ids', 'attention_mask'],
     num_rows: 250
 })}

Load GPT-2 Model

In [4]:
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "gpt2",
    num_labels=2,
    id2label={0: "NEGATIVE", 1: "POSITIVE"},  # For converting predictions to strings
    label2id={"NEGATIVE": 0, "POSITIVE":1},
)
model.config.pad_token_id = model.config.eos_token_id

# Freeze the pretrained GPT-2 body so that only the new classification head is trained
for param in model.base_model.parameters():
    param.requires_grad = False

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [5]:
import numpy as np
from transformers import DataCollatorWithPadding, Trainer, TrainingArguments


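def compute_metrics(eval_pred):
    """Compute accuracy from the (logits, labels) pair supplied by the Trainer."""
    predictions, labels = eval_pred
    # The predicted class is the index of the largest logit
    predictions = np.argmax(predictions, axis=1)
    return {"accuracy": (predictions == labels).mean()}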

training_args = TrainingArguments(
    output_dir="./data/sentiment_analysis",
    learning_rate=2e-3,
    # Reduce the batch size if you don't have enough memory
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    num_train_epochs=5,
    weight_decay=0.01,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_ds["train"],
    eval_dataset=tokenized_ds["test"],
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
)

trainer.train()

You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


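| Epoch | Training Loss | Validation Loss | Accuracy |
|-------|---------------|-----------------|----------|
| 1 | No log | 0.582934 | 0.684 |
| 2 | No log | 0.938597 | 0.516 |
| 3 | No log | 0.613515 | 0.672 |
| 4 | No log | 0.623425 | 0.688 |
| 5 | No log | 0.547336 | 0.732 |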


TrainOutput(global_step=315, training_loss=0.6116632855127728, metrics={'train_runtime': 238.7531, 'train_samples_per_second': 5.236, 'train_steps_per_second': 1.319, 'total_flos': 653241876480000.0, 'train_loss': 0.6116632855127728, 'epoch': 5.0})
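As a sanity check, the `global_step=315` reported above can be reproduced from the settings used in this notebook. This is a small sketch that assumes training runs on a single device with no gradient accumulation; the variable names are illustrative.

```python
import math

train_examples = 250   # TRAIN_TEST_SIZE
batch_size = 4         # per_device_train_batch_size
epochs = 5             # num_train_epochs

# Assumes one device and no gradient accumulation
steps_per_epoch = math.ceil(train_examples / batch_size)
total_steps = steps_per_epoch * epochs

print(steps_per_epoch, total_steps)  # 63 315
```

With only 315 steps in total, the training loss would never be logged if the logging interval is left at the `Trainer` default of 500 steps, which is consistent with the "No log" entries in the table.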

In [6]:
trainer.evaluate()

{'eval_loss': 0.5473360419273376,
 'eval_accuracy': 0.732,
 'eval_runtime': 22.9662,
 'eval_samples_per_second': 10.886,
 'eval_steps_per_second': 2.743,
 'epoch': 5.0}

In [7]:
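def get_sentiment(sentence, model):
    """Classify a sentence as "positive" or "negative" using the given model."""
    # Put the inputs on the same device as the model instead of assuming CUDA
    device = next(model.parameters()).device
    inputs = tokenizer(sentence, return_tensors="pt").to(device)
    outputs = model(**inputs)
    prediction = outputs.logits.argmax(-1).item()
    return "positive" if prediction == 1 else "negative"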

sentence = "I did not like this movie!"
print(get_sentiment(sentence, model))

negative


## Performing Parameter-Efficient Fine-Tuning

Create a PEFT model from the loaded model, run a training loop, and save the PEFT model weights.

In [8]:
model2 = AutoModelForSequenceClassification.from_pretrained(
    "gpt2",
    num_labels=2,
    id2label={0: "NEGATIVE", 1: "POSITIVE"},  # For converting predictions to strings
    label2id={"NEGATIVE": 0, "POSITIVE":1},
)

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [9]:
from peft import LoraConfig, TaskType, get_peft_model

lora_config = LoraConfig(
    # GPT-2 uses Conv1D layers, which store weights as (fan_in, fan_out)
    fan_in_fan_out=True,
    task_type=TaskType.SEQ_CLS,
)

lora_model = get_peft_model(model2, lora_config)
lora_model.config.pad_token_id = lora_model.config.eos_token_id

In [10]:
lora_model

PeftModelForSequenceClassification(
  (base_model): LoraModel(
    (model): GPT2ForSequenceClassification(
      (transformer): GPT2Model(
        (wte): Embedding(50257, 768)
        (wpe): Embedding(1024, 768)
        (drop): Dropout(p=0.1, inplace=False)
        (h): ModuleList(
          (0-11): 12 x GPT2Block(
            (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
            (attn): GPT2Attention(
              (c_attn): Linear(
                in_features=768, out_features=2304, bias=True
                (lora_dropout): ModuleDict(
                  (default): Identity()
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=768, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=2304, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
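The shapes in the printout above give a rough idea of how few parameters LoRA trains. The sketch below counts only the `lora_A` and `lora_B` weights attached to `c_attn`. It assumes `c_attn` is the only adapted module, which the truncated printout does not confirm, and it ignores the classification head.

```python
n_blocks = 12                   # "(0-11): 12 x GPT2Block"
d_in, d_out, r = 768, 2304, 8   # shapes of c_attn, lora_A and lora_B

lora_per_block = d_in * r + r * d_out   # lora_A plus lora_B
full_per_block = d_in * d_out           # the frozen c_attn weight matrix

lora_total = n_blocks * lora_per_block
full_total = n_blocks * full_per_block

print(lora_per_block, lora_total)   # 24576 294912
print(full_total // lora_total)     # 72
```

Under these assumptions, the adapters hold 72 times fewer weights than the `c_attn` matrices they modify.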

In [11]:
# The PEFT-wrapped model expects a "labels" column, not "label",
# so rename it in the training and testing sets
train_lora = tokenized_ds['train'].rename_column('label', 'labels')
test_lora = tokenized_ds['test'].rename_column('label', 'labels')

In [12]:
trainer_lora = Trainer(
    model=lora_model,
    args=training_args,
    train_dataset=train_lora,
    eval_dataset=test_lora,
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
)
trainer_lora.train()

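| Epoch | Training Loss | Validation Loss | Accuracy |
|-------|---------------|-----------------|----------|
| 1 | No log | 0.761167 | 0.62 |
| 2 | No log | 0.36876 | 0.856 |
| 3 | No log | 0.707778 | 0.852 |
| 4 | No log | 0.748657 | 0.872 |
| 5 | No log | 0.699763 | 0.86 |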


TrainOutput(global_step=315, training_loss=0.6527285136873759, metrics={'train_runtime': 389.6191, 'train_samples_per_second': 3.208, 'train_steps_per_second': 0.808, 'total_flos': 655518597120000.0, 'train_loss': 0.6527285136873759, 'epoch': 5.0})

In [13]:
lora_model.save_pretrained("gpt-lora")

In [14]:
trainer_lora.evaluate()

{'eval_loss': 0.3687603175640106,
 'eval_accuracy': 0.856,
 'eval_runtime': 23.5729,
 'eval_samples_per_second': 10.605,
 'eval_steps_per_second': 2.673,
 'epoch': 5.0}

## Performing Inference with a PEFT Model

Load the saved PEFT model weights and evaluate the performance of the trained PEFT model. Be sure to compare the results to the results from prior to fine-tuning.

In [15]:
from peft import AutoPeftModelForSequenceClassification
lora_model_loaded = AutoPeftModelForSequenceClassification.from_pretrained("gpt-lora")

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [18]:
sentence = "I did not like this movie!"
print(get_sentiment(sentence, lora_model_loaded.to(device='cuda')))

negative
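To put the two runs side by side, the snippet below compares the metrics reported by the two `evaluate()` calls in this notebook. The values are copied from the outputs above rather than recomputed, and the dictionary names are illustrative. Note that the baseline here is GPT-2 with a frozen body and a trained classification head, and that both runs were scored on the same 250 test reviews, so the gap is only a rough indication.

```python
# Values copied from the two evaluate() outputs in this notebook
baseline = {"eval_loss": 0.5473360419273376, "eval_accuracy": 0.732}
lora = {"eval_loss": 0.3687603175640106, "eval_accuracy": 0.856}

accuracy_gain = lora["eval_accuracy"] - baseline["eval_accuracy"]
loss_drop = baseline["eval_loss"] - lora["eval_loss"]

print(f"accuracy: +{accuracy_gain:.3f}, loss: -{loss_drop:.3f}")
# accuracy: +0.124, loss: -0.179
```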
