# Parameter-Efficient Fine-Tuning | 🤗 PEFT



## 1. Low-Rank Adaptation (LoRA)
![img.png](https://miro.medium.com/v2/resize:fit:786/format:webp/1*-a7Mv5SgObBJN7h4V6b4Iw.png)
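
As a rough illustration of low-rank adaptation, the sketch below applies a LoRA-style update in plain Python: the frozen weight `W` is left untouched, and a scaled low-rank product of two small matrices `B` and `A` is added on top. The toy dimensions, the helper names `matvec` and `lora_forward`, and the random initialization are assumptions made for this example; they are not PEFT internals.

```python
import random

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def lora_forward(W, A, B, x, alpha, r):
    """Compute W x + (alpha / r) * B (A x). W stays frozen; only A and B would be trained."""
    base = matvec(W, x)
    low_rank = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, low_rank)]

random.seed(0)
d_in, d_out, r, alpha = 6, 4, 2, 2  # toy sizes, chosen only for illustration

W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]   # frozen, d_out x d_in
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]    # trainable, r x d_in
B = [[0.0] * r for _ in range(d_out)]                                    # trainable, d_out x r, starts at zero
x = [1.0] * d_in

# With B initialized to zero, the adapted layer reproduces the frozen layer exactly.
print(lora_forward(W, A, B, x, alpha, r) == matvec(W, x))

# Trainable parameters: full weight matrix versus the two low-rank factors.
full_params = d_out * d_in
lora_params = r * (d_in + d_out)
print(full_params, lora_params)
```

At these toy sizes the saving is small, but for a 768 x 768 projection with rank 40 the two factors hold 61,440 values instead of 589,824.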

------------------



### Hands-On

In [1]:
!pip install -U \
 transformers==4.55.2 \
 datasets==4.0.0 \
 evaluate==0.4.5 \
 rouge-score==0.1.2 \
 loralib==0.1.2 \
 peft==0.17.0

In [2]:
# Imports: model/tokenizer classes, dataset loading, PEFT (LoRA), Hub login, and evaluation
from transformers import (
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
    GenerationConfig,
    TrainingArguments,
    Trainer,
)
from datasets import load_dataset
import torch
from peft import LoraConfig, get_peft_model, TaskType, PeftModel, PeftConfig
from huggingface_hub import notebook_login
from evaluate import load

In [3]:
# Load the dataset
dataset = load_dataset("knkarthick/dialogsum")

Dataset splits generated: train (12,460 examples), validation (500), test (1,500).

In [4]:
# Load the model
model_ckpt = 'google/flan-t5-base'
tokenizer = AutoTokenizer.from_pretrained(model_ckpt)
model = AutoModelForSeq2SeqLM.from_pretrained(model_ckpt)

In [5]:
dataset['train']['dialogue'][0]

"#Person1#: Hi, Mr. Smith. I'm Doctor Hawkins. Why are you here today?\n#Person2#: I found it would be a good idea to get a check-up.\n#Person1#: Yes, well, you haven't had one for 5 years. You should have one every year.\n#Person2#: I know. I figure as long as there is nothing wrong, why go see the doctor?\n#Person1#: Well, the best way to avoid serious illnesses is to find out about them early. So try to come at least once a year for your own good.\n#Person2#: Ok.\n#Person1#: Let me see here. Your eyes and ears look fine. Take a deep breath, please. Do you smoke, Mr. Smith?\n#Person2#: Yes.\n#Person1#: Smoking is the leading cause of lung cancer and heart disease, you know. You really should quit.\n#Person2#: I've tried hundreds of times, but I just can't seem to kick the habit.\n#Person1#: Well, we have classes and some medications that might help. I'll give you more information before you leave.\n#Person2#: Ok, thanks doctor."

In [6]:
# Tokenization function
def tokenize_function(batch):
    # Wrap each dialogue in the instruction prompt
    start_prompt = 'Summarize the following conversation.\n\n'
    end_prompt = '\n\nSummary: '
    prompt = [start_prompt + dialogue + end_prompt for dialogue in batch["dialogue"]]

    # Tokenize the prompts (model inputs) and the reference summaries (labels),
    # then store their input_ids in the batch dict
    batch['input_ids'] = tokenizer(prompt, padding="max_length", truncation=True, return_tensors="pt").input_ids
    batch['labels'] = tokenizer(batch["summary"], padding="max_length", truncation=True, return_tensors="pt").input_ids

    return batch

# Apply using map function
tokenized_datasets = dataset.map(tokenize_function, batched=True)
tokenized_datasets = tokenized_datasets.remove_columns(['id', 'topic', 'dialogue', 'summary'])
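
Because the labels above are padded to a fixed length, the padding positions also count toward the loss unless they are masked. A common convention, sketched below, is to replace pad ids in the labels with -100, which PyTorch's cross-entropy loss ignores by default. The helper name `mask_label_padding` is hypothetical, and `pad_token_id=0` is taken from the T5 config printed later in this notebook.

```python
PAD_TOKEN_ID = 0     # T5 pad_token_id, as shown in the model config output
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default

def mask_label_padding(label_ids, pad_token_id=PAD_TOKEN_ID):
    """Return a copy of the label sequences with padding replaced by IGNORE_INDEX."""
    return [
        [tok if tok != pad_token_id else IGNORE_INDEX for tok in sequence]
        for sequence in label_ids
    ]

# Two toy label sequences, padded to length 5 (the token ids are made up).
labels = [[37, 1782, 1, 0, 0], [8, 1, 0, 0, 0]]
print(mask_label_padding(labels))
```

Something like this could be applied to `batch['labels']` before `tokenize_function` returns. The loss values reported in this notebook were produced without such masking.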

In [7]:
# Set up the PEFT/LoRA configuration for fine-tuning
lora_config = LoraConfig(
    r=40,                       # Rank of the low-rank update matrices. A higher rank adds trainable parameters and capacity.
    lora_alpha=40,              # Scaling factor: the LoRA update is scaled by lora_alpha / r (1.0 here).
    target_modules=["q", "v"],  # Modules to apply LoRA to: the query and value projections in T5's attention layers.
    lora_dropout=0.05,          # Dropout applied inside the LoRA layers during training to reduce overfitting.
    bias="none",                # Do not train any bias parameters; only the LoRA matrices are updated.
    task_type=TaskType.SEQ_2_SEQ_LM  # Sequence-to-sequence language modeling (FLAN-T5 is an encoder-decoder model).
)

# PEFT model
peft_model = get_peft_model(model, lora_config)
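
The number of trainable parameters this configuration adds can be checked by hand. The sketch below uses the dimensions from the T5 config printed later in this notebook (`d_model=768`, `num_heads=12`, `d_kv=64`, 12 encoder and 12 decoder layers) and assumes that each decoder layer has both a self-attention and a cross-attention block, each with its own `q` and `v` projection. The variable names are chosen for this example only.

```python
# Dimensions from the flan-t5-base config
d_model = 768
num_heads, d_kv = 12, 64
inner_dim = num_heads * d_kv          # output size of each q/v projection (768)
num_encoder_layers = 12
num_decoder_layers = 12
r = 40                                # LoRA rank from lora_config

# Each adapted projection gets A (r x d_model) and B (inner_dim x r).
params_per_module = r * (d_model + inner_dim)

# Attention blocks: one per encoder layer, two per decoder layer (self + cross).
attention_blocks = num_encoder_layers + 2 * num_decoder_layers
adapted_modules = attention_blocks * 2  # "q" and "v" in every block

trainable_params = adapted_modules * params_per_module
print(adapted_modules, params_per_module, trainable_params)
```

The result, 4,423,680, agrees with the "Number of trainable parameters" line in the training log further down.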

In [8]:
# Log in with a Hugging Face access token (it needs write permission to push to the Hub)
notebook_login()

In [11]:
# Training PEFT Adapter
output_dir = f'{model_ckpt}-peft-dialogue-summary-abdUllahsamir'

batch_size = 2
logging_steps = len(tokenized_datasets['train']) // batch_size

# Training Arguments
peft_training_args = TrainingArguments(
    output_dir=output_dir,
    num_train_epochs=2,
    learning_rate=2e-5,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    eval_strategy='epoch',
    logging_steps=logging_steps,
    push_to_hub=True,
    log_level='info',
    disable_tqdm=False,
    report_to=["none"]  # Explicitly disable all reporting integrations
)

PyTorch: setting up devices
average_tokens_across_devices is True but world size is 1. Setting it to False automatically.
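
The step counts the trainer reports in its log follow directly from these settings. The arithmetic below is a small sketch that assumes a single device and no gradient accumulation, which matches the training log in this notebook; the variable names are illustrative.

```python
import math

num_examples = 12_460   # size of the train split
batch_size = 2          # per_device_train_batch_size
num_epochs = 2          # num_train_epochs

steps_per_epoch = math.ceil(num_examples / batch_size)
total_steps = steps_per_epoch * num_epochs

# logging_steps was set to len(train) // batch_size, i.e. one log entry per epoch.
logging_steps = num_examples // batch_size
log_entries = total_steps // logging_steps

print(steps_per_epoch, total_steps, logging_steps, log_entries)
```

With these values the run takes 12,460 optimization steps and logs the training loss twice, once at the end of each epoch.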


In [None]:
from transformers import TrainingArguments

# 1. Specify where checkpoints are saved, plus the training settings
training_args = TrainingArguments(
    output_dir="/content/drive/MyDrive/peft-checkpoints",  # inside Google Drive
    save_strategy="epoch",   # or "steps"
    save_steps=500,          # only used if save_strategy="steps"
    num_train_epochs=3,
    per_device_train_batch_size=2,
    logging_dir="./logs",
    logging_steps=100,
    eval_strategy="epoch",   # to evaluate during training (called `evaluation_strategy` in older transformers releases)
)

In [12]:
# Training
peft_trainer = Trainer(
    model=peft_model,
    args=peft_training_args,
    train_dataset=tokenized_datasets['train'],
    eval_dataset=tokenized_datasets['validation'],
    processing_class=tokenizer)  # `tokenizer=` is deprecated in recent transformers releases

peft_trainer.train();

***** Running training *****
  Num examples = 12,460
  Num Epochs = 2
  Instantaneous batch size per device = 2
  Total train batch size (w. parallel, distributed & accumulation) = 2
  Gradient Accumulation steps = 1
  Total optimization steps = 12,460
  Number of trainable parameters = 4,423,680


| Epoch | Training Loss | Validation Loss |
|-------|---------------|-----------------|
| 1     | 2.1103        | 0.115174        |
| 2     | 0.1544        | 0.10685         |


Saving model checkpoint to google/flan-t5-base-peft-dialogue-summary-abdUllahsamir/checkpoint-500
loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--google--flan-t5-base/snapshots/7bcac572ce56db69c1ea7c8af255c5d7c9672fc2/config.json
Model config T5Config {
  "architectures": [
    "T5ForConditionalGeneration"
  ],
  "classifier_dropout": 0.0,
  "d_ff": 2048,
  "d_kv": 64,
  "d_model": 768,
  "decoder_start_token_id": 0,
  "dense_act_fn": "gelu_new",
  "dropout_rate": 0.1,
  "eos_token_id": 1,
  "feed_forward_proj": "gated-gelu",
  "initializer_factor": 1.0,
  "is_encoder_decoder": true,
  "is_gated_act": true,
  "layer_norm_epsilon": 1e-06,
  "model_type": "t5",
  "n_positions": 512,
  "num_decoder_layers": 12,
  "num_heads": 12,
  "num_layers": 12,
  "output_past": true,
  "pad_token_id": 0,
  "relative_attention_max_distance": 128,
  "relative_attention_num_buckets": 32,
  "task_specific_params": {
    "summarization": {
      "early_stopping": 

In [13]:
# Save the adapters
peft_model_path = './peft-dialogue-summary-ckpt'
peft_trainer.model.save_pretrained(peft_model_path)
tokenizer.save_pretrained(peft_model_path)

# Zip the checkpoint and download it (Colab only)
from google.colab import files

!zip -r peft-dialogue-summary-ckpt.zip peft-dialogue-summary-ckpt/
files.download("peft-dialogue-summary-ckpt.zip")

('./peft-dialogue-summary-ckpt/tokenizer_config.json',
 './peft-dialogue-summary-ckpt/special_tokens_map.json',
 './peft-dialogue-summary-ckpt/spiece.model',
 './peft-dialogue-summary-ckpt/added_tokens.json',
 './peft-dialogue-summary-ckpt/tokenizer.json')

In [14]:
# Finally, push the trained adapter to the Hugging Face Hub to share it
peft_trainer.push_to_hub(commit_message='Training Complete')

Saving model checkpoint to google/flan-t5-base-peft-dialogue-summary-abdUllahsamir

Uploaded files: training_args.bin (5.37 kB), spiece.model (792 kB), adapter_model.safetensors (17.7 MB)

CommitInfo(commit_url='https://huggingface.co/abdullahzahran/flan-t5-base-peft-dialogue-summary-abdUllahsamir/commit/c6a6efe92d7c6a62901acbbda824099a5956cf16', commit_message='Training Complete', commit_description='', oid='c6a6efe92d7c6a62901acbbda824099a5956cf16', pr_url=None, repo_url=RepoUrl('https://huggingface.co/abdullahzahran/flan-t5-base-peft-dialogue-summary-abdUllahsamir', endpoint='https://huggingface.co', repo_type='model', repo_id='abdullahzahran/flan-t5-base-peft-dialogue-summary-abdUllahsamir'), pr_revision=None, pr_num=None)
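
The size of the uploaded adapter file is consistent with the number of trainable parameters reported during training. The quick check below assumes the adapter weights are stored as 32-bit floats and ignores the small safetensors header; both are assumptions for this estimate.

```python
trainable_params = 4_423_680   # from the training log
bytes_per_param = 4            # assumption: float32 storage

adapter_bytes = trainable_params * bytes_per_param
adapter_megabytes = adapter_bytes / 1_000_000  # decimal megabytes

print(f"{adapter_megabytes:.1f} MB")
```

This gives about 17.7 MB, in line with the `adapter_model.safetensors` size shown in the upload output.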

----

### Inference

In [15]:
from peft import PeftModel
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
import torch
from pprint import pprint

In [17]:
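# Define the checkpoints
base_model_ckpt = "google/flan-t5-base"
# Local output directory written during training. The CommitInfo output above shows the
# adapter was pushed to the Hub as "abdullahzahran/flan-t5-base-peft-dialogue-summary-abdUllahsamir".
peft_model_ckpt = "google/flan-t5-base-peft-dialogue-summary-abdUllahsamir"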

# Load the tokenizer and base model
tokenizer = AutoTokenizer.from_pretrained(base_model_ckpt)
base_model = AutoModelForSeq2SeqLM.from_pretrained(base_model_ckpt, torch_dtype=torch.bfloat16)

# Load the PEFT model
peft_model = PeftModel.from_pretrained(base_model,
                                       peft_model_ckpt,
                                       torch_dtype=torch.bfloat16,
                                       is_trainable=False)  # only for inference

### Testing on Some Samples

In [18]:
# Pick one test example and its human-written reference summary
dialogue = dataset['test'][200]['dialogue']
baseline_human_summary = dataset['test'][200]['summary']

# Full prompt
prompt = f"""
Summarize the following conversation.

{dialogue}

Summary: """

# tokenizing
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# PEFT model inference
peft_model_outputs = peft_model.generate(input_ids=input_ids,
                                         generation_config=GenerationConfig(max_new_tokens=200, num_beams=1))
peft_model_text_output = tokenizer.decode(peft_model_outputs[0], skip_special_tokens=True)

# Dialogue & Output
print("DIALOGUE:")
pprint(dialogue)
print()

print("BASELINE HUMAN SUMMARY:")
pprint(baseline_human_summary)
print()

print("PEFT MODEL:")
pprint(peft_model_text_output)

DIALOGUE:
('#Person1#: Have you considered upgrading your system?\n'
 "#Person2#: Yes, but I'm not sure what exactly I would need.\n"
 '#Person1#: You could consider adding a painting program to your software. It '
 'would allow you to make up your own flyers and banners for advertising.\n'
 '#Person2#: That would be a definite bonus.\n'
 '#Person1#: You might also want to upgrade your hardware because it is pretty '
 'outdated now.\n'
 '#Person2#: How can we do that?\n'
 "#Person1#: You'd probably need a faster processor, to begin with. And you "
 'also need a more powerful hard disc, more memory and a faster modem. Do you '
 'have a CD-ROM drive?\n'
 '#Person2#: No.\n'
 '#Person1#: Then you might want to add a CD-ROM drive too, because most new '
 'software programs are coming out on Cds.\n'
 '#Person2#: That sounds great. Thanks.')

BASELINE HUMAN SUMMARY:
('#Person1# teaches #Person2# how to upgrade software and hardware in '
 "#Person2#'s system.")

PEFT MODEL:
('#Person2# suggest
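
The training prompt in `tokenize_function` and the inference prompt above are written out separately, and the inference version begins with an extra newline. One way to keep the two identical is a shared helper, sketched here. `build_prompt` is a hypothetical name; the template strings are copied from the tokenization cell.

```python
START_PROMPT = "Summarize the following conversation.\n\n"
END_PROMPT = "\n\nSummary: "

def build_prompt(dialogue: str) -> str:
    """Wrap a dialogue in the same instruction template used during fine-tuning."""
    return f"{START_PROMPT}{dialogue}{END_PROMPT}"

# Toy dialogue in the dataset's speaker-tag format.
example = "#Person1#: Hello.\n#Person2#: Hi there."
print(repr(build_prompt(example)))
```

The same function could then be called both inside `tokenize_function` and before `tokenizer(prompt, return_tensors="pt")` at inference time.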

In [19]:
# Load the ROUGE metric
rouge = load("rouge")

# ROUGE-L --> Longest Common Subsequence

# Calculate ROUGE scores
scores = rouge.compute(predictions=[peft_model_text_output], references=[baseline_human_summary])

# Print the ROUGE scores
print(f"ROUGE-1 F1: {scores['rouge1']:.4f}")
print(f"ROUGE-2 F1: {scores['rouge2']:.4f}")
print(f"ROUGE-L F1: {scores['rougeL']:.4f}")

ROUGE-1 F1: 0.2941
ROUGE-2 F1: 0.0000
ROUGE-L F1: 0.2353
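
To make the three scores concrete, here is a simplified pure-Python sketch of how ROUGE-N and ROUGE-L F1 can be computed. It assumes lowercased whitespace tokenization, so its numbers may differ slightly from the `evaluate` metric, which uses its own tokenizer. The function names are chosen for this example.

```python
from collections import Counter

def f1_score(matches, pred_len, ref_len):
    """Harmonic mean of precision (matches / pred_len) and recall (matches / ref_len)."""
    if matches == 0:
        return 0.0
    precision, recall = matches / pred_len, matches / ref_len
    return 2 * precision * recall / (precision + recall)

def rouge_n(prediction, reference, n=1):
    """F1 over overlapping n-grams, with counts clipped to the reference."""
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    pred = ngrams(prediction.lower().split())
    ref = ngrams(reference.lower().split())
    overlap = sum((pred & ref).values())
    return f1_score(overlap, sum(pred.values()), sum(ref.values()))

def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l(prediction, reference):
    """F1 based on the longest common subsequence of tokens."""
    pred, ref = prediction.lower().split(), reference.lower().split()
    return f1_score(lcs_length(pred, ref), len(pred), len(ref))

pred_text = "the cat sat on the mat"
ref_text = "the cat is on the mat"
print(f"{rouge_n(pred_text, ref_text, 1):.4f}")  # unigram overlap
print(f"{rouge_n(pred_text, ref_text, 2):.4f}")  # bigram overlap
print(f"{rouge_l(pred_text, ref_text):.4f}")     # longest common subsequence
```

A ROUGE-2 score of 0.0000, as in the output above, means the prediction and the reference share no bigram at all, even though some individual words overlap.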


----