<a href="https://colab.research.google.com/github/qswitcher/udacity-llm-fine-tuning/blob/main/LightweightFineTuning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Lightweight Fine-Tuning Project

My choices for this project include

* PEFT technique: LoRA
* Model: gpt2
* Evaluation approach: Compare accuracy on the test set before and after training using `Trainer.evaluate`
* Fine-tuning dataset: The `cornell-movie-review-data/rotten_tomatoes` dataset, a collection of movie reviews with sentiment labels from the Rotten Tomatoes review website.

## Loading and Evaluating a Foundation Model

In the cells below, we load the chosen pre-trained Hugging Face model and evaluate its performance prior to fine-tuning. This step includes loading an appropriate tokenizer and dataset.

In [2]:
# Install the required version of datasets in case you have an older version
# You will need to choose "Kernel > Restart Kernel" from the menu after executing this cell
! pip install -q "datasets==2.15.0"

In [3]:
# model weight directory root
directory_root = "drive/MyDrive/udacity/project1"

In [4]:
# loading the dataset

from datasets import load_dataset

dataset = load_dataset("cornell-movie-review-data/rotten_tomatoes")

splits = ["train", "test"]



Generating train split: 8530 examples
Generating validation split: 1066 examples
Generating test split: 1066 examples

In [5]:
# loading the transformer

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
if tokenizer.pad_token is None:
    tokenizer.add_special_tokens({'pad_token': '[PAD]'})



# tokenize all the examples
tokenized_dataset = {}
for split in splits:
    tokenized_dataset[split] = dataset[split].map(
        lambda x: tokenizer(x["text"]), batched=True
    )

# Inspect the available columns in the dataset
tokenized_dataset["train"]

Dataset({
    features: ['text', 'label', 'input_ids', 'attention_mask'],
    num_rows: 8530
})
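
The tokenizer call above applies no padding, so every example keeps its own length; padding happens later, per batch, through the `DataCollatorWithPadding` passed to the trainer. The sketch below imitates that step in plain Python. It assumes right-side padding, and both the helper name `pad_batch` and the pad id `50257` are illustrative assumptions rather than values read from the notebook.

```python
def pad_batch(batch, pad_id):
    """Right-pad every sequence to the longest one in the batch (illustrative helper)."""
    max_len = max(len(ids) for ids in batch)
    input_ids, attention_mask = [], []
    for ids in batch:
        n_pad = max_len - len(ids)
        input_ids.append(ids + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


PAD_ID = 50257  # assumed id of the added '[PAD]' token, for illustration only
padded = pad_batch([[11, 12, 13], [21]], PAD_ID)
print(padded["input_ids"])
print(padded["attention_mask"])
```

Padding per batch rather than to a global maximum keeps short reviews from being padded out to the length of the longest review in the whole dataset.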

In [6]:
# load the gpt2 model
from transformers import AutoModelForSequenceClassification


model = AutoModelForSequenceClassification.from_pretrained(
  "gpt2",
  num_labels=2,
  id2label={0: "negative", 1: "positive"},
  label2id={"negative": 0, "positive": 1},
)

model.resize_token_embeddings(len(tokenizer))
model.config.pad_token_id = tokenizer.pad_token_id

# save pre-trained weights
model.save_pretrained(directory_root + '/gp2-text-classification-untrained')


Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
The new embeddings will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
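
Setting `model.config.pad_token_id` matters because GPT-2's sequence-classification head scores the last non-padding token of each row, so the model has to know which id counts as padding. A minimal sketch of that lookup, assuming right-padded rows; `last_token_index` and the pad id are hypothetical names and values chosen for illustration:

```python
def last_token_index(input_ids, pad_id):
    """Index of the last non-padding token in a right-padded row (illustrative helper)."""
    for i in range(len(input_ids) - 1, -1, -1):
        if input_ids[i] != pad_id:
            return i
    return -1  # the row contains only padding


PAD_ID = 50257  # assumed pad id, for illustration only
rows = [
    [11, 12, 13, PAD_ID],
    [21, PAD_ID, PAD_ID, PAD_ID],
]
positions = [last_token_index(row, PAD_ID) for row in rows]
print(positions)
```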


## Performing Parameter-Efficient Fine-Tuning

In the cells below, we create a PEFT model from the loaded model, run a training loop, and save the PEFT model weights.

In [7]:
# create a PEFT Config
from peft import LoraConfig, TaskType

peft_config = LoraConfig(task_type=TaskType.SEQ_CLS, inference_mode=False, r=8, lora_alpha=32, lora_dropout=0.1)
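
With `r=8` and `lora_alpha=32`, the low-rank update is scaled by `lora_alpha / r = 4`. The toy sketch below shows the LoRA forward pass `y = W x + (lora_alpha / r) * B (A x)` in plain Python. The matrices are made-up values with tiny dimensions, not GPT-2 weights, and dropout on the adapter input is omitted for brevity.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, r, lora_alpha):
    """Frozen base projection plus the scaled low-rank update."""
    scaling = lora_alpha / r
    base = matvec(W, x)               # frozen pretrained weight
    update = matvec(B, matvec(A, x))  # trainable low-rank path
    return [b + scaling * u for b, u in zip(base, update)]


# Toy sizes: d_in = 3, d_out = 2, rank = 2; lora_alpha = 8 gives the same
# scaling factor (4) as the notebook's lora_alpha=32 with r=8.
W = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]
A = [[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # shape (r, d_in)
B_zero = [[0.0, 0.0], [0.0, 0.0]]       # shape (d_out, r), zero at initialization
B_trained = [[0.5, 0.0], [0.0, 0.25]]   # pretend values after training
x = [1.0, 2.0, 3.0]

y_initial = lora_forward(W, A, B_zero, x, r=2, lora_alpha=8)
y_adapted = lora_forward(W, A, B_trained, x, r=2, lora_alpha=8)
print(y_initial, y_adapted)
```

Because `B` starts at zero, the adapted layer initially reproduces the frozen layer exactly; training only moves the output through `A` and `B`.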

In [8]:
# create a PEFT model from the pretrained model
from peft import get_peft_model

model = get_peft_model(model, peft_config)
model.print_trainable_parameters()

trainable params: 296,448 || all params: 124,738,560 || trainable%: 0.2377
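
The trainable-parameter count can be reproduced by hand. The sketch below assumes that PEFT's default LoRA target for GPT-2 is the fused attention projection `c_attn` (768 inputs, 2304 outputs) in each of the 12 transformer blocks, and that the 2-label `score` head is also kept trainable for sequence classification. Both are assumptions about library defaults, not something the notebook states.

```python
HIDDEN = 768             # GPT-2 (small) hidden size
C_ATTN_OUT = 3 * HIDDEN  # fused query/key/value projection width
N_LAYERS = 12            # transformer blocks in GPT-2 (small)
RANK = 8                 # r from the LoraConfig above
NUM_LABELS = 2

# Each adapted layer adds A (RANK x HIDDEN) and B (C_ATTN_OUT x RANK).
lora_per_layer = RANK * HIDDEN + C_ATTN_OUT * RANK
lora_total = N_LAYERS * lora_per_layer

# Assumed: the classification head (no bias) is trained alongside the adapters.
score_head = HIDDEN * NUM_LABELS

trainable = lora_total + score_head
all_params = 124_738_560  # taken from the printed output above
trainable_pct = round(100 * trainable / all_params, 4)
print(trainable, trainable_pct)
```

Under these assumptions the total matches the 296,448 reported by `print_trainable_parameters()`.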




In [9]:
# create the trainer
from transformers import DataCollatorWithPadding, Trainer, TrainingArguments
import numpy as np

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return {"accuracy": (predictions == labels).mean()}

def create_trainer(model):
  return Trainer(
      model=model,
      args=TrainingArguments(
          output_dir=directory_root + "/checkpoints",
          learning_rate=5e-5,
          per_device_train_batch_size=128,
          per_device_eval_batch_size=128,
          num_train_epochs=10,
          weight_decay=0.01,
          eval_strategy="epoch",
          save_strategy="epoch",
          load_best_model_at_end=True,
          report_to="none",
      ),
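      train_dataset=tokenized_dataset["train"],
      # the test split doubles as the per-epoch evaluation set, so the
      # "Validation Loss" reported during training is measured on test data
      eval_dataset=tokenized_dataset["test"],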
      tokenizer=tokenizer,
      data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
      compute_metrics=compute_metrics,
  )
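
To make the metric concrete, here is a dependency-free sketch of what `compute_metrics` calculates: take the index of the largest logit per row as the prediction, then report the fraction that match the labels. The logits and labels are invented toy values, and `accuracy_from_logits` is a hypothetical stand-in for the NumPy version above.

```python
def accuracy_from_logits(logits, labels):
    """Argmax each row of logits and compare against the labels."""
    predictions = [max(range(len(row)), key=row.__getitem__) for row in logits]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return {"accuracy": correct / len(labels)}


# Toy batch: columns are (negative, positive) logits.
toy_logits = [[2.0, -1.0], [0.1, 0.3], [-0.5, 0.2], [1.2, 1.1]]
toy_labels = [0, 1, 0, 0]

result = accuracy_from_logits(toy_logits, toy_labels)
print(result)
```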

In [10]:
# create the trainer and train the model
trainer = create_trainer(model)

trainer.train()


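| Epoch | Training Loss | Validation Loss | Accuracy |
|-------|---------------|-----------------|----------|
| 1 | No log | 0.660931 | 0.670732 |
| 2 | No log | 0.577955 | 0.727955 |
| 3 | No log | 0.479249 | 0.790807 |
| 4 | No log | 0.415669 | 0.824578 |
| 5 | No log | 0.391249 | 0.830206 |
| 6 | No log | 0.384716 | 0.829268 |
| 7 | No log | 0.38287 | 0.827392 |
| 8 | 0.560300 | 0.373556 | 0.836773 |
| 9 | 0.560300 | 0.371256 | 0.833021 |
| 10 | 0.560300 | 0.371059 | 0.832083 |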




TrainOutput(global_step=670, training_loss=0.520096212358617, metrics={'train_runtime': 638.3231, 'train_samples_per_second': 133.631, 'train_steps_per_second': 1.05, 'total_flos': 2484224972992512.0, 'train_loss': 0.520096212358617, 'epoch': 10.0})
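
The numbers in the training output can be cross-checked with a little arithmetic. The sketch below assumes a single device, no gradient accumulation, and the `Trainer` default of logging the training loss every 500 steps (the notebook does not set `logging_steps`). Under those assumptions it accounts for the 670 global steps and for the "No log" entries before epoch 8.

```python
import math

TRAIN_EXAMPLES = 8530      # size of the train split
BATCH_SIZE = 128           # per_device_train_batch_size
EPOCHS = 10                # num_train_epochs
LOGGING_STEPS = 500        # assumed Trainer default
TRAIN_RUNTIME_S = 638.3231 # train_runtime from the output above

steps_per_epoch = math.ceil(TRAIN_EXAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS

# The first training-loss value appears in the epoch that contains step 500.
first_logged_epoch = math.ceil(LOGGING_STEPS / steps_per_epoch)

samples_per_second = TRAIN_EXAMPLES * EPOCHS / TRAIN_RUNTIME_S
print(steps_per_epoch, total_steps, first_logged_epoch, round(samples_per_second, 3))
```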

In [13]:
# save weights
model.save_pretrained(directory_root + '/gp2-text-classification-trained')



## Performing Inference with a PEFT Model

In the cells below, we load the saved PEFT model weights, evaluate the trained PEFT model, and compare its accuracy with the accuracy measured before fine-tuning.

In [19]:
from peft import PeftModel

model_base = AutoModelForSequenceClassification.from_pretrained(directory_root + "/gp2-text-classification-untrained")
eval_results_before = create_trainer(model_base).evaluate()

model_after = PeftModel.from_pretrained(model_base, directory_root + "/gp2-text-classification-trained")
eval_results_after = create_trainer(model_after).evaluate()

print("Accuracy before: ", eval_results_before["eval_accuracy"])
print("Accuracy after: ", eval_results_after["eval_accuracy"])


Accuracy before:  0.49906191369606
Accuracy after:  0.8320825515947468
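
To put the two accuracies in context, the sketch below converts them back into counts of correct predictions on the 1,066-example test split and adds a rough binomial standard error for the fine-tuned score. The standard-error formula treats test examples as independent draws, which is a simplifying assumption.

```python
import math

TEST_EXAMPLES = 1066
acc_before = 0.49906191369606
acc_after = 0.8320825515947468

# Recover whole-number counts of correct predictions.
correct_before = round(acc_before * TEST_EXAMPLES)
correct_after = round(acc_after * TEST_EXAMPLES)

# Absolute accuracy gain from fine-tuning.
gain = acc_after - acc_before

# Rough binomial standard error of the fine-tuned accuracy.
std_err_after = math.sqrt(acc_after * (1 - acc_after) / TEST_EXAMPLES)

print(correct_before, correct_after, round(gain, 3), round(std_err_after, 4))
```

The pre-training figure sits at roughly chance level for a two-class task, which is consistent with the warning that `score.weight` was newly initialized; the gain of about 33 percentage points is many standard errors wide.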
