### 0. Setup

Install dependencies

In [1]:
pip install transformers datasets peft accelerate sentencepiece torch py7zr


Log in to Hugging Face (optional; only needed if you want to upload the model)

In [2]:
from huggingface_hub import notebook_login

notebook_login()


Use the GPU if one is available

In [3]:
import torch
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

Using device: cuda


### 1. Load the dataset

Load the dataset from the Hugging Face Hub and take a look at one example.

In [30]:
import json
from datasets import load_dataset

# Using only 50% of the dataset for demo
dataset = load_dataset("Samsung/samsum", split="train[:50%]")
splits = dataset.train_test_split(test_size=0.1)
train_ds = splits["train"]
val_ds = splits["test"]

# Let's take a look at one example and the size of the training split
example = train_ds[0]
print(json.dumps(example, indent=4))
print(len(train_ds))

{
    "id": "13828555",
    "dialogue": "John: Hey Dan, I'm gonna be late today.\r\nDaniel: Hey man, ugh no, don't leave me with this presentation alone!\r\nJohn: I know, I'm doing my best, but I'm stuck in traffic!\r\nJohn: There was an accident on the 401, ambulances & all...\r\nDaniel: OK, I'll try 2 buy us some time.\r\nJohn: I should make it in 30mins.",
    "summary": "John will be late for the presentation he has with Daniel. He should be there in 30 minutes."
}
6629
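The split sizes can be checked by hand. The sketch below is a plain-Python approximation of the rounding that a fractional `test_size` implies (test set rounded up, training set gets the remainder); that convention and the 7,366-row size of the 50% slice are assumptions, chosen because they are consistent with the 6,629 training rows printed above.

```python
import math

def split_sizes(n_rows: int, test_size: float) -> tuple[int, int]:
    """Return (n_train, n_test) for a fractional test_size.

    Assumed convention: the test set is rounded up and the training
    set receives whatever is left.
    """
    n_test = math.ceil(test_size * n_rows)
    n_train = n_rows - n_test
    return n_train, n_test

# 7,366 is an assumed row count for the 50% slice of the training split.
n_train, n_test = split_sizes(7366, test_size=0.1)
print(n_train, n_test)
```

Because `train_test_split` shuffles by default, the first training example can differ from run to run; passing a fixed `seed` argument would make the split reproducible.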


### 2. Preprocess the Dataset

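Transform each dialogue-and-summary pair into an instruction format for the model: the dialogue becomes the instruction (prefixed with "Summarize the dialogue: ") and the summary becomes the expected response.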

From
```
{
  "dialogue": "Amanda: I baked  cookies. Do you want some?\r\nJerry: Sure!\r\nAmanda: I'll bring you tomorrow :-)",
  "summary": "Amanda baked cookies and will bring Jerry some tomorrow."
}
```
To
```
{
  "instruction": "Summarize the dialogue: Amanda: I baked  cookies. Do you want some?\r\nJerry: Sure!\r\nAmanda: I'll bring you tomorrow :-)",
  "response": "Amanda baked cookies and will bring Jerry some tomorrow."
}
```

In [31]:
import json

def preprocess_function(sample):
    return {
        "instruction": f"Summarize the dialogue: {sample['dialogue']}",
        "response": sample["summary"],
    }

preprocessed_train_ds = train_ds.map(preprocess_function, remove_columns=["dialogue", "summary", "id"])
preprocessed_val_ds = val_ds.map(preprocess_function, remove_columns=["dialogue", "summary", "id"])
print(json.dumps(preprocessed_train_ds[0], indent=4))



{
    "instruction": "Summarize the dialogue: John: Hey Dan, I'm gonna be late today.\r\nDaniel: Hey man, ugh no, don't leave me with this presentation alone!\r\nJohn: I know, I'm doing my best, but I'm stuck in traffic!\r\nJohn: There was an accident on the 401, ambulances & all...\r\nDaniel: OK, I'll try 2 buy us some time.\r\nJohn: I should make it in 30mins.",
    "response": "John will be late for the presentation he has with Daniel. He should be there in 30 minutes."
}


### 3. Load the base model

In [32]:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_checkpoint = "google/flan-t5-small"
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
base_model = AutoModelForSeq2SeqLM.from_pretrained(model_checkpoint).to(device)

Let's check the output of the base model before fine-tuning.

In [33]:
# Define the prompt
prompt = """Summarize the dialogue: Alice: Did you hear that the company is restructuring again?
Bob: Yes, I'm really worried about our job security this time.
Alice: They haven't provided any clear details, which is making everyone anxious.
Bob: I feel like they’re hiding something. It’s so frustrating not knowing what’s really going on.
Alice: I agree. I just hope this time they’re more transparent."""

inputs = tokenizer(prompt, return_tensors="pt").to(device)
outputs = base_model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Bob and Alice are worried about their job security. Bob is worried about their job security.


### 4. Load the LoRA model using get_peft_model()

Configure LoRA and wrap the base model. Look at the number of trainable parameters!

In [34]:
from peft import LoraConfig, TaskType, get_peft_model
lora_config = LoraConfig(
    r=16,  # LoRA rank
    lora_alpha=32,  # Scaling factor
    target_modules=["q", "v"],  # Apply LoRA to query and value projection layers
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.SEQ_2_SEQ_LM,
)

lora_model = get_peft_model(base_model, lora_config).to(device)
lora_model.print_trainable_parameters()  # Check that only LoRA parameters are trainable

trainable params: 688,128 || all params: 77,649,280 || trainable%: 0.8862
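The trainable-parameter count can be reproduced with simple arithmetic. The sketch below assumes the usual flan-t5-small dimensions (hidden size 512, 6 heads of size 64, 8 encoder and 8 decoder blocks); none of these are stated in this notebook, so treat them as assumptions that happen to reproduce the printed figure.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Parameters LoRA adds to one linear layer: A is (r x d_in), B is (d_out x r)."""
    return r * d_in + d_out * r

# Assumed flan-t5-small dimensions (not taken from this notebook).
D_MODEL = 512        # hidden size, input width of the q and v projections
D_ATTN = 6 * 64      # num_heads * d_kv = 384, output width of q and v
ENCODER_LAYERS = 8
DECODER_LAYERS = 8

R, LORA_ALPHA = 16, 32  # values from the LoraConfig above

# Each encoder block has one attention layer (self-attention); each decoder
# block has two (self-attention and cross-attention).
attention_layers = ENCODER_LAYERS * 1 + DECODER_LAYERS * 2
adapted_modules = attention_layers * 2  # one q and one v projection per attention layer

trainable = adapted_modules * lora_param_count(D_MODEL, D_ATTN, R)
all_params = 77_649_280  # total reported by print_trainable_parameters()

print(f"trainable params: {trainable:,}")
print(f"trainable%: {100 * trainable / all_params:.4f}")
print(f"scaling applied to the LoRA update (lora_alpha / r): {LORA_ALPHA / R}")
```

Under these assumptions, 48 adapted projections with 14,336 extra parameters each account for the whole trainable set; the base weights stay frozen.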


Tokenize the preprocessed dataset

In [35]:
def tokenize_fn(example):
    # Tokenize the input (the instruction), truncating to 512 tokens
    model_inputs = tokenizer(example["instruction"], max_length=512, truncation=True)

    # Tokenize the output (the target summary), truncating to 128 tokens.
    # `text_target` replaces the deprecated `as_target_tokenizer()` context manager.
    labels = tokenizer(text_target=example["response"], max_length=128, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized_train_ds = preprocessed_train_ds.map(tokenize_fn, batched=True, remove_columns=["instruction", "response"])
tokenized_val_ds = preprocessed_val_ds.map(tokenize_fn, batched=True, remove_columns=["instruction", "response"])


Create the default data collator

In [36]:
from transformers import DataCollatorForSeq2Seq

data_collator = DataCollatorForSeq2Seq(tokenizer, model=lora_model, padding=True)

In [11]:
sample = tokenized_train_ds[0]
print(tokenized_train_ds)
print("Input IDs:", sample["input_ids"])
print("Attention Mask:", sample.get("attention_mask"))
print("Labels:", sample["labels"])

Dataset({
    features: ['input_ids', 'attention_mask', 'labels'],
    num_rows: 6629
})
Input IDs: [12198, 1635, 1737, 8, 7478, 10, 2185, 10, 1521, 62, 1338, 8988, 58, 2737, 10, 2163, 6, 4306, 1458, 16, 8, 11818, 13, 451, 1795, 106, 389, 155, 9, 10, 8, 80, 416, 12, 21522, 26, 7220, 58, 5417, 10, 27, 317, 78, 233, 2737, 10, 4273, 6, 13, 503, 1]
Attention Mask: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
Labels: [2737, 6, 5417, 6, 389, 155, 9, 11, 2185, 33, 1338, 8988, 44, 4306, 1458, 16, 8, 11818, 13, 451, 1795, 106, 416, 12, 21522, 26, 7220, 5, 1]


In [37]:
import torch

# Convert the sample to tensors (simulate a batch by unsqueezing)
sample = tokenized_train_ds[0]
input_ids = torch.tensor(sample["input_ids"]).unsqueeze(0).to(device)
attention_mask = torch.tensor(sample["attention_mask"]).unsqueeze(0).to(device)
labels = torch.tensor(sample["labels"]).unsqueeze(0).to(device)

# Run the forward pass
outputs = lora_model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
loss = outputs.loss
print("Loss:", loss.item())

Loss: 1.4397177696228027
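The loss printed above is a cross-entropy averaged over the label tokens. As a rough illustration (a toy sketch, not the model's actual code path), the snippet below averages the negative log-likelihood over the positions that count and skips positions labelled `-100`, the value conventionally used to mask padding. The three-token vocabulary and the probabilities are made up.

```python
import math

IGNORE_INDEX = -100  # label value excluded from the loss

def mean_token_loss(token_probs: list[list[float]], labels: list[int]) -> float:
    """Average negative log-likelihood over the label positions that count.

    token_probs[t] is a hypothetical probability distribution over the
    vocabulary at decoder step t; labels[t] is the target token id.
    """
    losses = [
        -math.log(probs[label])
        for probs, label in zip(token_probs, labels)
        if label != IGNORE_INDEX
    ]
    return sum(losses) / len(losses)

# Toy example: three decoder steps, the last one is padding and is ignored.
probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]
labels = [0, 1, IGNORE_INDEX]

loss = mean_token_loss(probs, labels)
print("loss:", loss)
print("perplexity:", math.exp(loss))
```

Exponentiating the loss gives a per-token perplexity, which is sometimes easier to compare across runs than the raw loss.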


In [23]:
from transformers import DataCollatorForSeq2Seq

collator = DataCollatorForSeq2Seq(tokenizer, model=lora_model, padding=True)
batch = [tokenized_train_ds[i] for i in range(4)]  # sample 4 examples
collated_batch = collator(batch)

for key, tensor in collated_batch.items():
    print(f"{key}: shape {tensor.shape}")

input_ids: shape torch.Size([4, 237])
attention_mask: shape torch.Size([4, 237])
labels: shape torch.Size([4, 54])
decoder_input_ids: shape torch.Size([4, 54])
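The shapes above come from padding each field to the longest sequence in the batch. The following plain-Python sketch mimics that behaviour on two tiny examples; it is an approximation for illustration, and the pad id `0`, the label padding value `-100`, and the decoder start id `0` are assumptions about T5 and the collator defaults rather than values read from this notebook.

```python
PAD_TOKEN_ID = 0       # assumed: T5's pad token id
LABEL_PAD_ID = -100    # assumed: the collator's default label padding value
DECODER_START_ID = 0   # assumed: T5 starts decoding from the pad token

def pad_to_longest(seqs, pad_value):
    """Right-pad every sequence to the length of the longest one."""
    longest = max(len(s) for s in seqs)
    return [s + [pad_value] * (longest - len(s)) for s in seqs]

def shift_right(labels):
    """Build decoder inputs: prepend the start token, drop the last label,
    and replace label padding with the ordinary pad token."""
    shifted = [DECODER_START_ID] + labels[:-1]
    return [PAD_TOKEN_ID if tok == LABEL_PAD_ID else tok for tok in shifted]

def collate(features):
    labels = pad_to_longest([f["labels"] for f in features], LABEL_PAD_ID)
    return {
        "input_ids": pad_to_longest([f["input_ids"] for f in features], PAD_TOKEN_ID),
        "attention_mask": pad_to_longest([f["attention_mask"] for f in features], 0),
        "labels": labels,
        "decoder_input_ids": [shift_right(row) for row in labels],
    }

batch = collate([
    {"input_ids": [12198, 1635, 1], "attention_mask": [1, 1, 1], "labels": [2737, 6, 1]},
    {"input_ids": [2185, 1], "attention_mask": [1, 1], "labels": [5417, 1]},
])
for key, rows in batch.items():
    print(key, rows)
```

Padding per batch rather than to a global maximum keeps short batches small, which is why the input width (237) and label width (54) above depend on the four sampled examples.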


Define the training hyperparameters and train the LoRA model

In [38]:
from transformers import Trainer, TrainingArguments

model_name = model_checkpoint.split("/")[-1]

training_args = TrainingArguments(
    f"{model_name}-finetuned-lora-samsum",
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    learning_rate=2e-4,
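    num_train_epochs=2,  # increase the number of epochs for better results
    logging_steps=300,
    fp16=False,
    evaluation_strategy="steps",  # evaluates every `logging_steps` steps; newer transformers releases name this `eval_strategy`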
    save_strategy="epoch",
    push_to_hub=True,
    report_to=[]
)

trainer = Trainer(
    model=lora_model,
    args=training_args,
    train_dataset=tokenized_train_ds,
    eval_dataset=tokenized_val_ds,
    data_collator=data_collator,
)

train_results = trainer.train()




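| Step | Training Loss | Validation Loss |
|------|---------------|-----------------|
| 300 | 1.8875 | 1.596826 |
| 600 | 1.8602 | 1.595743 |
| 900 | 1.8258 | 1.5936 |
| 1200 | 1.8199 | 1.588121 |
| 1500 | 1.8425 | 1.586051 |
| 1800 | 1.8455 | 1.589347 |
| 2100 | 1.8691 | 1.589085 |
| 2400 | 1.8402 | 1.584915 |
| 2700 | 1.8443 | 1.581955 |
| 3000 | 1.8385 | 1.587717 |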
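To put the step numbers in context, the sketch below estimates how many optimizer steps the configuration above implies. It assumes a single device, no gradient accumulation, and that evaluation runs every `logging_steps` steps because no separate evaluation interval was set; the helper name `training_schedule` is made up for this illustration.

```python
import math

def training_schedule(n_examples: int, batch_size: int, epochs: int, eval_every: int):
    """Return (steps_per_epoch, total_steps, steps at which evaluation runs)."""
    steps_per_epoch = math.ceil(n_examples / batch_size)  # last batch may be partial
    total_steps = steps_per_epoch * epochs
    eval_steps = list(range(eval_every, total_steps + 1, eval_every))
    return steps_per_epoch, total_steps, eval_steps

steps_per_epoch, total_steps, eval_steps = training_schedule(
    n_examples=6629,  # training rows reported earlier
    batch_size=2,     # per_device_train_batch_size
    epochs=2,         # num_train_epochs
    eval_every=300,   # logging_steps, assumed to double as the eval interval
)
print(steps_per_epoch, total_steps, len(eval_steps))
```

Under these assumptions a full run has more steps than the last row logged above, so the table covers only the first part of training.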


Share your model

In [39]:
repo_name = f"pantageepapa/{model_name}-finetuned-lora-samsum"
lora_model.push_to_hub(repo_name)


No files have been modified since last commit. Skipping to prevent empty commit.


CommitInfo(commit_url='https://huggingface.co/pantageepapa/flan-t5-small-finetuned-lora-samsum/commit/d15618826e577d02a1495def2ef981ed096c79a1', commit_message='Upload model', commit_description='', oid='d15618826e577d02a1495def2ef981ed096c79a1', pr_url=None, repo_url=RepoUrl('https://huggingface.co/pantageepapa/flan-t5-small-finetuned-lora-samsum', endpoint='https://huggingface.co', repo_type='model', repo_id='pantageepapa/flan-t5-small-finetuned-lora-samsum'), pr_revision=None, pr_num=None)

Let's load the LoRA adapter on top of the base model for inference.

In [40]:
from peft import PeftConfig, PeftModel
from transformers import AutoModelForSeq2SeqLM

config = PeftConfig.from_pretrained(repo_name)
model = AutoModelForSeq2SeqLM.from_pretrained(
    config.base_model_name_or_path,
    ignore_mismatched_sizes=True,  # provide this in case you're planning to fine-tune an already fine-tuned checkpoint
)
# Load the LoRA model
lora_model = PeftModel.from_pretrained(model, repo_name)


In [41]:
from transformers import pipeline

# Set the model to evaluation mode.
lora_model.eval()
generator = pipeline("text2text-generation", model=lora_model, tokenizer=tokenizer)

# Example prompt:
prompt = """Summarize the dialogue: Alice: Did you hear that the company is restructuring again?
Bob: Yes, I'm really worried about our job security this time.
Alice: They haven't provided any clear details, which is making everyone anxious.
Bob: I feel like they’re hiding something. It’s so frustrating not knowing what’s really going on.
Alice: I agree. I just hope this time they’re more transparent."""
result = generator(prompt, max_length=128, do_sample=True)[0]["generated_text"]
print("Generated Response:", result)

Device set to use cuda:0
The model 'PeftModelForSeq2SeqLM' is not supported for text2text-generation. Supported models are ['BartForConditionalGeneration', 'BigBirdPegasusForConditionalGeneration', 'BlenderbotForConditionalGeneration', 'BlenderbotSmallForConditionalGeneration', 'EncoderDecoderModel', 'FSMTForConditionalGeneration', 'GPTSanJapaneseForConditionalGeneration', 'LEDForConditionalGeneration', 'LongT5ForConditionalGeneration', 'M2M100ForConditionalGeneration', 'MarianMTModel', 'MBartForConditionalGeneration', 'MT5ForConditionalGeneration', 'MvpForConditionalGeneration', 'NllbMoeForConditionalGeneration', 'PegasusForConditionalGeneration', 'PegasusXForConditionalGeneration', 'PLBartForConditionalGeneration', 'ProphetNetForConditionalGeneration', 'Qwen2AudioForConditionalGeneration', 'SeamlessM4TForTextToText', 'SeamlessM4Tv2ForTextToText', 'SwitchTransformersForConditionalGeneration', 'T5ForConditionalGeneration', 'UMT5ForConditionalGeneration', 'XLMProphetNetForConditionalGen

Generated Response: Bob feels that the company is restructuring again. Alice felt that the company has not provided any clear details, which makes everyone anxious.
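Because the call above passes `do_sample=True`, the generated summary can change from run to run. The toy sketch below contrasts the two decoding choices on a made-up next-token distribution: greedy decoding always takes the most likely token, while sampling draws from the distribution. It is a conceptual illustration only and does not reproduce the pipeline's internals.

```python
import random

def greedy_pick(probs: list[float]) -> int:
    """Index of the most probable token."""
    return max(range(len(probs)), key=probs.__getitem__)

def sample_pick(probs: list[float], rng: random.Random) -> int:
    """Index drawn at random, weighted by probability."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

next_token_probs = [0.05, 0.6, 0.25, 0.1]  # hypothetical distribution over 4 tokens
rng = random.Random(0)                     # seeded so the sketch itself is repeatable

greedy = [greedy_pick(next_token_probs) for _ in range(5)]
sampled = [sample_pick(next_token_probs, rng) for _ in range(5)]
print("greedy: ", greedy)
print("sampled:", sampled)
```

For a before-and-after comparison with the base model output shown earlier, deterministic decoding (for example `do_sample=False`) makes the two summaries easier to compare.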
