In [34]:
# Nice blog post on PEFT fine-tuning with BERT:
# https://jaotheboss.medium.com/peft-with-bert-8763d8b8a4ca

# Project Instructions
To pass this project, your code must:

1. Load a pre-trained model and evaluate its performance
2. Perform parameter-efficient fine-tuning using the pre-trained model
3. Perform inference using the fine-tuned model and compare its performance to the original model

# Getting Started
This project is fairly open-ended. As long as you follow the prescribed steps, you may choose any appropriate PEFT technique, model, evaluation approach, and fine-tuning dataset.

## PEFT technique
* The PEFT technique covered in this course was LoRA, but new techniques are continuously being developed. See the PEFT README for links to the papers behind each of the supported techniques.
* If you are unsure, we recommend using LoRA as your PEFT technique. LoRA is the only PEFT technique that is compatible with all models at this time.
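
As a rough illustration of what LoRA does, the sketch below adds a low-rank update `(alpha / r) * B @ A` to a frozen weight matrix `W`. The helper names (`matmul`, `transpose`, `lora_linear`) and the toy dimensions are assumptions made for this example; they are not part of the PEFT library.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]

def lora_linear(x, W, A, B, alpha, r):
    """Apply a linear layer whose frozen weight W gets a low-rank update.

    W: (d_out, d_in) frozen weight, A: (r, d_in), B: (d_out, r).
    The effective weight is W + (alpha / r) * B @ A.
    """
    scale = alpha / r
    delta = matmul(B, A)  # (d_out, d_in) matrix of rank at most r
    w_eff = [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(W, delta)
    ]
    return matmul(x, transpose(w_eff))

random.seed(0)
d_in, d_out, r, alpha = 8, 8, 2, 4  # toy sizes, chosen only for illustration

W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]
B = [[0.0] * r for _ in range(d_out)]  # B starts at zero, so the update starts at zero
x = [[random.gauss(0, 1) for _ in range(d_in)]]

base_out = matmul(x, transpose(W))
lora_out = lora_linear(x, W, A, B, alpha, r)

full_params = d_in * d_out        # parameters in the frozen weight
lora_params = r * (d_in + d_out)  # parameters in A and B together
print(full_params, lora_params)
```

Only `A` and `B` would be trained; `W` stays frozen. With `B` initialized to zero, the adapted layer starts out identical to the base layer.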
## Model
* Your choice of model will depend on your choice of PEFT technique.
* Unless you plan to use your own hardware/GPU rather than the Udacity Workspace, it's best to choose a smaller model.
* The model must be compatible with a sequence classification task.
* If you are unsure, we recommend using GPT-2 as your model. This is a relatively small model that is compatible with sequence classification and LoRA.

For specific model names in the Hugging Face registry, you can use the widget at the bottom of the PEFT documentation homepage (select "sequence classification" from the drop-down).

## Evaluation approach
* The evaluation approach covered in this course was the evaluate method with a Hugging Face Trainer. You may use the same approach, or any other reasonable evaluation approach for a sequence classification task.
* The key requirement for the evaluation is that you must be able to compare the original foundation model's performance and the fine-tuned model's performance.
## Dataset
* Your PEFT process must use a dataset from Hugging Face's datasets library. As with the selection of model, you will need to ensure that the dataset is small enough that it is usable in the Udacity Workspace.
* The key requirement for the dataset is that it matches the task. See the Hugging Face datasets filtered by the text classification task.

# Loading and Evaluating a Foundation Model
## Loading the model
Once you have selected a model, load it in your notebook.

## Evaluating the model
Perform an initial evaluation of the model on your chosen sequence classification task. This step will require that you also load an appropriate tokenizer and dataset.

## Performing Parameter-Efficient Fine-Tuning
### Creating a PEFT config
Create a PEFT config with appropriate hyperparameters for your chosen model.

### Creating a PEFT model
Using the PEFT config and foundation model, create a PEFT model.

### Training the model
Using the PEFT model and dataset, run a training loop with at least one epoch.

### Saving the trained model
Depending on your training loop configuration, your PEFT model may have already been saved. If not, use *save_pretrained* to save your progress.

## Performing Inference with a PEFT Model
### Loading the model
Using the appropriate PEFT model class, load your trained model.

### Evaluating the model
Repeat the previous evaluation process, this time using the PEFT model. Compare the results to the results from the original foundation model.

## Submission Instructions
Projects may be submitted using the Project Workspace or by uploading a zip file or sharing a GitHub repository link on the Project Submission page.

Submission should include the notebook file and saved weights.

In [35]:
from datasets import load_dataset

In [36]:
# Load the IMDB movie-review dataset (binary sentiment classification).
# See: https://huggingface.co/datasets/stanfordnlp/imdb
# It ships "train", "test", and "unsupervised" splits; only the first two carry usable labels.
dataset = load_dataset("stanfordnlp/imdb")

In [38]:
dataset

DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 25000
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 25000
    })
    unsupervised: Dataset({
        features: ['text', 'label'],
        num_rows: 50000
    })
})

In [39]:
# model_id = "facebook/opt-350m"
# adapter_model_id = "ybelkada/opt-350m-lora"

In [40]:
from transformers import AutoTokenizer

In [41]:
model_id = "google-bert/bert-base-cased"
tokenizer = AutoTokenizer.from_pretrained(model_id)

In [42]:
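# BERT's tokenizer already defines a [PAD] token, so no pad_token override is needed.
# (A GPT-2-style tokenizer would need tokenizer.pad_token = tokenizer.eos_token to avoid
# the error "Asking to pad but the tokenizer does not have a padding token".)
def tokenize_function(examples):
    # Pad or truncate every review to the model's maximum sequence length.
    return tokenizer(examples["text"], padding="max_length", truncation=True)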
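
To make the effect of `padding="max_length"` and `truncation=True` concrete, here is a minimal sketch in plain Python. The function `pad_or_truncate` is a hypothetical helper, not a tokenizer API. It simplifies the real behavior: the Hugging Face tokenizer also keeps the closing `[SEP]` token when it truncates, which this sketch does not.

```python
def pad_or_truncate(token_ids, max_length, pad_id=0):
    """Return (input_ids, attention_mask), each exactly max_length long.

    pad_id=0 matches the padding_idx=0 shown in the BERT embedding layer.
    """
    ids = token_ids[:max_length]          # truncation: drop tokens past the limit
    mask = [1] * len(ids)                 # 1 marks real tokens
    padding = max_length - len(ids)       # number of pad positions to append
    return ids + [pad_id] * padding, mask + [0] * padding

short_ids, short_mask = pad_or_truncate([101, 1247, 102], max_length=5)
long_ids, long_mask = pad_or_truncate([1, 2, 3, 4, 5, 6], max_length=4)
print(short_ids, short_mask)
print(long_ids, long_mask)
```

The attention mask is what lets the model ignore the padded positions, which is why `attention_mask` appears among the dataset features after tokenization.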

In [43]:
# tokenized_datasets = dataset.map(tokenize_function, batched=True)

tokenized_train_datasets = dataset["train"].map(tokenize_function, batched=True)
tokenized_test_datasets = dataset["test"].map(tokenize_function, batched=True)

In [44]:
small_train_dataset = tokenized_train_datasets.shuffle(seed=42).select(range(3000))
small_eval_dataset = tokenized_test_datasets.shuffle(seed=42).select(range(1000))

In [45]:
print(small_train_dataset)

Dataset({
    features: ['text', 'label', 'input_ids', 'token_type_ids', 'attention_mask'],
    num_rows: 3000
})


In [46]:
print(small_train_dataset[0])

{'text': 'There is no relation at all between Fortier and Profiler but the fact that both are police series about violent crimes. Profiler looks crispy, Fortier looks classic. Profiler plots are quite simple. Fortier\'s plot are far more complicated... Fortier looks more like Prime Suspect, if we have to spot similarities... The main character is weak and weirdo, but have "clairvoyance". People like to compare, to judge, to evaluate. How about just enjoying? Funny thing too, people writing Fortier looks American but, on the other hand, arguing they prefer American series (!!!). Maybe it\'s the language, or the spirit, but I think this series is more English than American. By the way, the actors are really good and funny. The acting is not superficial at all...', 'label': 1, 'input_ids': [101, 1247, 1110, 1185, 6796, 1120, 1155, 1206, 3144, 2852, 1105, 26890, 1197, 1133, 1103, 1864, 1115, 1241, 1132, 2021, 1326, 1164, 5973, 6969, 119, 26890, 1197, 2736, 19501, 1183, 117, 3144, 2852, 273

## Train with PyTorch Trainer

In [12]:
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(model_id)

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google-bert/bert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [13]:
print(model)

BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(28996, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0-11): 12 x BertLayer(
          (attention): BertAttention(
            (self): BertSdpaSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e

## Training hyperparameters

In [14]:
from transformers import TrainingArguments

training_args = TrainingArguments(output_dir="test_trainer_google-bert")

## Evaluate

In [15]:
!pip install scikit-learn




In [16]:
import numpy as np
import evaluate

metric = evaluate.load("accuracy")

In [17]:
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)
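
The following toy example sketches what `compute_metrics` does to a batch of logits, without the `evaluate` library. The helpers `argmax_rows` and `accuracy` and the sample values are made up for illustration.

```python
def argmax_rows(logits):
    """Return the index of the largest logit in each row (the predicted class)."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]

def accuracy(predictions, references):
    """Return the fraction of predictions that match the reference labels."""
    correct = sum(p == r for p, r in zip(predictions, references))
    return {"accuracy": correct / len(references)}

# Four hypothetical examples with two classes (0 = negative, 1 = positive).
logits = [[2.0, -1.0], [0.1, 0.3], [-0.5, 0.2], [1.2, 1.1]]
labels = [0, 1, 0, 0]

preds = argmax_rows(logits)
result = accuracy(preds, labels)
print(preds, result)
```

The third example is misclassified, so three of four predictions are correct.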

## Train

In [18]:
from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(output_dir="test_trainer_google-bert", eval_strategy="epoch")

In [19]:
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=small_train_dataset,
    eval_dataset=small_eval_dataset,
    compute_metrics=compute_metrics,
)

trainer.train()

Epoch,Training Loss,Validation Loss,Accuracy
1,No log,0.330745,0.875
2,0.391700,0.69501,0.835
3,0.175500,0.571986,0.895


TrainOutput(global_step=1125, training_loss=0.263149776034885, metrics={'train_runtime': 2246.4762, 'train_samples_per_second': 4.006, 'train_steps_per_second': 0.501, 'total_flos': 2367999498240000.0, 'train_loss': 0.263149776034885, 'epoch': 3.0})
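
The `global_step=1125` above can be reproduced with a little arithmetic. The sketch assumes the `TrainingArguments` defaults (batch size 8 per device, 3 epochs, logging every 500 steps) and a single device; none of these values are set explicitly in the notebook.

```python
import math

num_train_samples = 3000      # size of small_train_dataset
per_device_batch_size = 8     # assumed default per_device_train_batch_size
num_epochs = 3                # assumed default num_train_epochs
logging_steps = 500           # assumed default logging_steps

steps_per_epoch = math.ceil(num_train_samples / per_device_batch_size)
total_steps = steps_per_epoch * num_epochs

# Epoch 1 ends before the first logging step, so no training loss is available yet.
first_epoch_has_log = steps_per_epoch >= logging_steps

print(steps_per_epoch, total_steps, first_epoch_has_log)
```

Under these assumptions, epoch 1 ends at step 375, which would explain the "No log" entry in the first row of the table.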

In [1]:
# Show the performance of the model on the test set
# What do you think the evaluation accuracy will be?
trainer.evaluate() 

NameError: name 'trainer' is not defined

## Load adapters with 🤗 PEFT

In [21]:
!pip install peft -q




In [47]:
from transformers import AutoModelForSequenceClassification

# Load a fresh copy of the base checkpoint with a 2-class classification head.
# AutoModelForCausalLM would be the wrong class here: the task is sequence classification.
peft_model_id = model_id
model2 = AutoModelForSequenceClassification.from_pretrained(peft_model_id, num_labels=2)

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google-bert/bert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [48]:
print(model2)

BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(28996, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0-11): 12 x BertLayer(
          (attention): BertAttention(
            (self): BertSdpaSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e

In [49]:
from peft import LoraConfig, TaskType

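lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,  # sequence classification (TOKEN_CLS is for per-token labels)
    r=16,  # rank of the low-rank update matrices
    lora_alpha=32,  # scaling factor; the update is scaled by lora_alpha / r
    lora_dropout=0.05,
    bias="none",  # keep all bias terms frozen
    target_modules=["query", "value"],  # BERT attention projections (not "q_proj"/"k_proj")
)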

In [50]:
from peft import get_peft_model

# lora_model = get_peft_model(model2, lora_config) <- problem!
lora_model = get_peft_model(model, lora_config)
lora_model.print_trainable_parameters()

trainable params: 591,362 || all params: 108,903,172 || trainable%: 0.5430
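
The trainable-parameter count printed above can be reconstructed by hand. This sketch assumes that PEFT, for classification task types, also trains the 2-class classifier head alongside the LoRA matrices; that assumption is what makes the numbers add up.

```python
hidden_size = 768        # in_features / out_features of query and value
rank = 16                # r in the LoRA config
num_layers = 12          # BERT-base encoder layers
targets_per_layer = 2    # "query" and "value"
num_labels = 2

# Each adapted Linear gets A (rank x hidden) and B (hidden x rank).
lora_per_module = rank * hidden_size + hidden_size * rank
lora_total = num_layers * targets_per_layer * lora_per_module

# Classifier head: weight (hidden x labels) plus bias (assumed trainable).
classifier_params = hidden_size * num_labels + num_labels

trainable = lora_total + classifier_params
all_params = 108_903_172  # total reported by print_trainable_parameters()
percent = 100 * trainable / all_params

print(f"trainable params: {trainable:,} || trainable%: {percent:.4f}")
```

The result matches the reported 591,362 trainable parameters, or about 0.54% of the model.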


## Train a PEFT adapter

In [51]:
%%time
trainer = Trainer(
    model=lora_model,
    train_dataset=small_train_dataset,
    eval_dataset=small_eval_dataset,
    compute_metrics=compute_metrics,
)
trainer.train()

Step,Training Loss
500,0.0302
1000,0.0228


CPU times: user 1min 40s, sys: 36.7 s, total: 2min 16s
Wall time: 27min 21s


TrainOutput(global_step=1125, training_loss=0.028970282236735027, metrics={'train_runtime': 1641.4767, 'train_samples_per_second': 5.483, 'train_steps_per_second': 0.685, 'total_flos': 2384349474816000.0, 'train_loss': 0.028970282236735027, 'epoch': 3.0})

In [52]:
%%time
trainer.evaluate()

CPU times: user 3.6 s, sys: 2.18 s, total: 5.78 s
Wall time: 1min 10s


{'eval_loss': 0.6031149625778198,
 'eval_accuracy': 0.886,
 'eval_runtime': 69.9993,
 'eval_samples_per_second': 14.286,
 'eval_steps_per_second': 1.786,
 'epoch': 3.0}

In [53]:
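# Save through the PEFT wrapper so that only the adapter weights and config are written.
lora_model.save_pretrained("test_trainer_peft_google-bert")
# To reload later: peft.AutoPeftModelForSequenceClassification.from_pretrained("test_trainer_peft_google-bert")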

### Change LoRA parameters

In [54]:
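lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,  # sequence classification (TOKEN_CLS is for per-token labels)
    r=16,
    lora_alpha=16,  # previously 32
    lora_dropout=0.1,  # previously 0.05
    bias="all",  # additionally train every bias term in the model
    target_modules=["query", "value"],  # BERT attention projections (not "q_proj"/"k_proj")
)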

In [55]:
# lora_model = get_peft_model(model2, lora_config) <- problem!
lora_model = get_peft_model(model, lora_config)
lora_model.print_trainable_parameters()

trainable params: 694,274 || all params: 108,903,172 || trainable%: 0.6375
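
Switching to `bias="all"` raises the trainable count from 591,362 to 694,274. The sketch below shows one accounting of BERT-base bias terms that reproduces the difference. It assumes the classifier bias is already included in the earlier count; the grouping of bias terms is my own reading of the printed model structure.

```python
hidden_size = 768
intermediate_size = 3072  # BERT-base feed-forward width
num_layers = 12

# Bias vectors inside one encoder layer.
per_layer_bias = (
    3 * hidden_size        # query, key, value projections
    + hidden_size          # attention output dense
    + hidden_size          # attention output LayerNorm
    + intermediate_size    # feed-forward intermediate dense
    + hidden_size          # feed-forward output dense
    + hidden_size          # feed-forward output LayerNorm
)

embedding_layernorm_bias = hidden_size
pooler_bias = hidden_size

extra_bias_params = (
    num_layers * per_layer_bias + embedding_layernorm_bias + pooler_bias
)

previous_trainable = 591_362  # count reported with bias="none"
new_trainable = previous_trainable + extra_bias_params
print(extra_bias_params, new_trainable)
```

The total agrees with the 694,274 reported above.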


In [56]:
%%time
trainer = Trainer(
    model=lora_model,
    train_dataset=small_train_dataset,
    eval_dataset=small_eval_dataset,
    compute_metrics=compute_metrics,
)
trainer.train()

Step,Training Loss
500,0.0304
1000,0.0229


CPU times: user 3min 4s, sys: 59.1 s, total: 4min 3s
Wall time: 39min 1s


TrainOutput(global_step=1125, training_loss=0.029292932722303603, metrics={'train_runtime': 2340.8927, 'train_samples_per_second': 3.845, 'train_steps_per_second': 0.481, 'total_flos': 2384349474816000.0, 'train_loss': 0.029292932722303603, 'epoch': 3.0})

In [32]:
%%time
trainer.evaluate()

CPU times: user 4.27 s, sys: 1.16 s, total: 5.43 s
Wall time: 1min 39s


{'eval_loss': 0.5888401865959167,
 'eval_accuracy': 0.885,
 'eval_runtime': 99.498,
 'eval_samples_per_second': 10.05,
 'eval_steps_per_second': 1.256,
 'epoch': 3.0}
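
To line up the accuracies recorded in this notebook, a small comparison like the one below may help. The numbers are copied from the outputs above; the dictionary keys are labels chosen for this sketch. The notebook does not record an accuracy for the untrained foundation model, so full fine-tuning serves as the reference point here.

```python
# Evaluation accuracies taken from the recorded cell outputs.
results = {
    "full fine-tuning (epoch 3)": 0.895,
    "LoRA (lora_alpha=32, bias='none')": 0.886,
    "LoRA (lora_alpha=16, bias='all')": 0.885,
}

reference_name = "full fine-tuning (epoch 3)"
reference = results[reference_name]

# Difference of each run relative to the reference run.
deltas = {name: round(acc - reference, 3) for name, acc in results.items()}

for name, acc in results.items():
    print(f"{name}: accuracy={acc:.3f} (delta {deltas[name]:+.3f})")
```

Both LoRA runs land within about one percentage point of full fine-tuning on the 1,000-example evaluation subset.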