# 2️⃣ Prompt-Tuning with Autoregressive Model

In this notebook, we will explore **Parameter-Efficient Fine-Tuning (PEFT)** techniques, focusing specifically on [**Prompt Tuning**](https://aclanthology.org/2021.emnlp-main.243/). 

---

## What you'll learn:
- The concept and mechanism of **Prompt Tuning** in LLMs:
  - How "soft prompts" are optimized instead of the model weights.
  - Why this additive approach preserves the base model and enables lightweight fine-tuning.
- A comparison between full fine-tuning and prompt tuning in terms of compute, flexibility, and generalization.
- Hands-on implementation of prompt tuning using Hugging Face's PEFT framework.

---

### Prompt Tuning vs Fine-Tuning

<!-- <img src="../../images/prompt-tuning.png" width="600"> -->
<img src="https://raw.githubusercontent.com/ivanvykopal/peft-kinit-2025/heads/master/images/prompt-tuning.png" alt="Fine-tuning vs. Prompt Tuning" width="600"/>


The diagram illustrates that in **fine-tuning**, the model weights are adjusted for each task, often requiring full retraining and higher computational cost. In **prompt tuning**, only a small set of task-specific embeddings (the soft prompt) is learned, while the pre-trained model remains frozen. This lightweight adaptation lets multiple tasks share the same base model, each with its own soft prompt.

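Prompt tuning is closely related to ordinary prompting, where we write the prompt by hand and manually adjust it to obtain the best results. The difference is that the soft prompt is a set of continuous embeddings learned by gradient descent rather than text crafted by a person.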
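
To make the idea concrete, here is a minimal sketch of what "prepending a soft prompt" means, using plain Python lists instead of tensors. The sizes and values (`hidden_size`, `num_virtual_tokens`, the embedding numbers) are made up for illustration and do not come from a real model.

```python
# Toy illustration with plain Python lists; real models use tensors.
hidden_size = 4          # hypothetical tiny embedding width
num_virtual_tokens = 2   # number of soft-prompt vectors

# Frozen token embeddings for a 3-token input (values are made up).
token_embeddings = [[0.1] * hidden_size, [0.2] * hidden_size, [0.3] * hidden_size]

# Trainable soft prompt: one vector per virtual token.
soft_prompt = [[0.0] * hidden_size for _ in range(num_virtual_tokens)]

# The model sees the soft prompt prepended to the real token embeddings.
model_input = soft_prompt + token_embeddings

sequence_length = len(model_input)                  # grows by num_virtual_tokens
trainable_values = len(soft_prompt) * hidden_size   # only these would be optimized
print(sequence_length, trainable_values)
```

In a real model the same concatenation happens on embedding tensors, and only `soft_prompt` would receive gradient updates.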

---

> **Key insights**:
> - Prompt tuning adds **trainable soft prompt embeddings** to the input, leaving the model’s weights untouched.
> - It's an additive PEFT technique that significantly lowers training and storage costs. Because the base weights are never updated, the base model's original capabilities are preserved, which avoids catastrophic forgetting.

You can also open this example in Google Colab:

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ivanvykopal/peft-kinit-2025/blob/master/examples/peft/02_prompt_tuning.ipynb)


## Step 1: Install and Import Dependencies

We begin by installing the required libraries and importing the necessary modules. In this case, we need the `transformers`, `datasets`, and `peft` libraries. The [PEFT](https://huggingface.co/docs/peft/en/index) library contains the functionality to add soft prompts to the LLM and to train only those added soft prompts.

We will also set the constants and parameters for the training.

In [None]:
%pip install -q --user transformers[torch]==4.36.0
%pip install -q --user datasets
%pip install -q --user peft

In [None]:
import torch
from datasets import load_dataset
from peft import get_peft_config, get_peft_model, PromptTuningInit, PromptTuningConfig, TaskType, PeftType
from torch.utils.data import DataLoader
from tqdm import tqdm
from transformers import AutoModelForCausalLM, AutoTokenizer, default_data_collator, get_linear_schedule_with_warmup

device = "cuda" if torch.cuda.is_available() else "cpu"  # fall back to CPU when no GPU is available
dataset_name = "twitter_complaints"

text_column = "Tweet text"
label_column = "text_label"
max_length = 64
lr = 3e-2
num_epochs = 50
batch_size = 8

In this example, we will use the [**BLOOMZ**](https://aclanthology.org/2023.acl-long.891/) model, an autoregressive model that supports around 46 languages and was trained to follow instructions. It was one of the first multilingual models to be fine-tuned on human instructions.

In [None]:
model_name_or_path = "bigscience/bloomz-560m"
tokenizer_name_or_path = "bigscience/bloomz-560m"

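## Step 2: Load the Dataset

With the model and tokenizer checkpoint chosen above, we load the `twitter_complaints` subset of the RAFT dataset from Hugging Face and add a `text_label` column that maps each numeric label to its class name. The tokenizer itself is loaded in Step 3 and the model in Step 5.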

In [None]:
from datasets import load_dataset

dataset = load_dataset("ought/raft", dataset_name)

classes = [k.replace("_", " ") for k in dataset["train"].features["Label"].names]
print(classes)
dataset = dataset.map(
    lambda x: {"text_label": [classes[label] for label in x["Label"]]},
    batched=True,
    num_proc=1,
)
print(dataset)
dataset["train"][0]

## Step 3: Prepare Dataset

Here, we load the tokenizer and preprocess the dataset for training and evaluation. We convert the texts and labels into tokenized inputs by building `input_ids`, `attention_mask`, and `labels`, which we will use later to train and evaluate the soft prompt.
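
The core of the preprocessing can be sketched on toy token ids, without a tokenizer. The helper name `build_example` and all ids below are hypothetical; the sketch only illustrates the three ideas used in the cell that follows: the target is appended to the prompt, prompt positions are masked with `-100` in the labels so they do not contribute to the loss, and sequences are padded on the left.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default


def build_example(prompt_ids, target_ids, eos_id, pad_id, max_length):
    """Build left-padded input_ids, attention_mask and labels for one example."""
    # The model input is the prompt followed by the target and an end-of-sequence token.
    input_ids = prompt_ids + target_ids + [eos_id]
    # Only the target (and EOS) positions are scored; the prompt is masked out.
    labels = [IGNORE_INDEX] * len(prompt_ids) + target_ids + [eos_id]
    attention_mask = [1] * len(input_ids)

    # Left-pad to max_length, then truncate anything that is still too long.
    pad = max_length - len(input_ids)
    input_ids = ([pad_id] * pad + input_ids)[:max_length]
    attention_mask = ([0] * pad + attention_mask)[:max_length]
    labels = ([IGNORE_INDEX] * pad + labels)[:max_length]
    return input_ids, attention_mask, labels


# Made-up ids: a 3-token prompt, a 1-token label, EOS id 2, PAD id 3.
input_ids, attention_mask, labels = build_example([11, 12, 13], [21], 2, 3, 8)
print(input_ids)
print(attention_mask)
print(labels)
```

Left padding keeps the real tokens adjacent to the position where generation starts, which is why it is the usual choice for decoder-only models.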

In [None]:
# data preprocessing
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
if tokenizer.pad_token_id is None:
    tokenizer.pad_token_id = tokenizer.eos_token_id
target_max_length = max([len(tokenizer(class_label)["input_ids"]) for class_label in classes])
print(target_max_length)


def preprocess_function(examples):
    batch_size = len(examples[text_column])
    inputs = [f"{text_column} : {x} Label : " for x in examples[text_column]]
    targets = [str(x) for x in examples[label_column]]
    model_inputs = tokenizer(inputs)
    labels = tokenizer(targets, add_special_tokens=False)  # don't add bos token because we concatenate with inputs
    for i in range(batch_size):
        sample_input_ids = model_inputs["input_ids"][i]
        label_input_ids = labels["input_ids"][i] + [tokenizer.eos_token_id]
        # print(i, sample_input_ids, label_input_ids)
        model_inputs["input_ids"][i] = sample_input_ids + label_input_ids
        labels["input_ids"][i] = [-100] * len(sample_input_ids) + label_input_ids
        model_inputs["attention_mask"][i] = [1] * len(model_inputs["input_ids"][i])
    # print(model_inputs)
    for i in range(batch_size):
        sample_input_ids = model_inputs["input_ids"][i]
        label_input_ids = labels["input_ids"][i]
        model_inputs["input_ids"][i] = [tokenizer.pad_token_id] * (
            max_length - len(sample_input_ids)
        ) + sample_input_ids
        model_inputs["attention_mask"][i] = [0] * (max_length - len(sample_input_ids)) + model_inputs[
            "attention_mask"
        ][i]
        labels["input_ids"][i] = [-100] * (max_length - len(sample_input_ids)) + label_input_ids
        model_inputs["input_ids"][i] = torch.tensor(model_inputs["input_ids"][i][:max_length])
        model_inputs["attention_mask"][i] = torch.tensor(model_inputs["attention_mask"][i][:max_length])
        labels["input_ids"][i] = torch.tensor(labels["input_ids"][i][:max_length])
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs


processed_datasets = dataset.map(
    preprocess_function,
    batched=True,
    num_proc=1,
    remove_columns=dataset["train"].column_names,
    load_from_cache_file=False,
    desc="Running tokenizer on dataset",
)

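train_dataset = processed_datasets["train"]
# NOTE: the training split is reused for evaluation here, so the eval metrics
# printed during training measure fit to the training data, not generalization.
eval_dataset = processed_datasets["train"]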


train_dataloader = DataLoader(
    train_dataset, shuffle=True, collate_fn=default_data_collator, batch_size=batch_size, pin_memory=True
)
eval_dataloader = DataLoader(eval_dataset, collate_fn=default_data_collator, batch_size=batch_size, pin_memory=True)

In [None]:
def test_preprocess_function(examples):
    batch_size = len(examples[text_column])
    inputs = [f"{text_column} : {x} Label : " for x in examples[text_column]]
    model_inputs = tokenizer(inputs)
    # print(model_inputs)
    for i in range(batch_size):
        sample_input_ids = model_inputs["input_ids"][i]
        model_inputs["input_ids"][i] = [tokenizer.pad_token_id] * (
            max_length - len(sample_input_ids)
        ) + sample_input_ids
        model_inputs["attention_mask"][i] = [0] * (max_length - len(sample_input_ids)) + model_inputs[
            "attention_mask"
        ][i]
        model_inputs["input_ids"][i] = torch.tensor(model_inputs["input_ids"][i][:max_length])
        model_inputs["attention_mask"][i] = torch.tensor(model_inputs["attention_mask"][i][:max_length])
    return model_inputs


test_dataset = dataset["test"].map(
    test_preprocess_function,
    batched=True,
    num_proc=1,
    remove_columns=dataset["train"].column_names,
    load_from_cache_file=False,
    desc="Running tokenizer on dataset",
)

test_dataloader = DataLoader(test_dataset, collate_fn=default_data_collator, batch_size=batch_size, pin_memory=True)
next(iter(test_dataloader))

In [None]:
next(iter(train_dataloader))

In [None]:
len(test_dataloader)

In [None]:
next(iter(test_dataloader))

## Step 4: Configure PEFT for Prompt Tuning

We configure the PEFT parameters, specifying how many prompt tokens to learn and how they are initialized. PEFT keeps the weights of the base model frozen.

Here, we define the `PromptTuningConfig`, which describes what the soft prompt added to the model should look like. In our example, we set **task_type** to `CAUSAL_LM`, since we are using an autoregressive model.

Next, we initialize the soft prompt from a given text rather than from random values. Since the task is classification, we use the text _"Classify if the tweet is a complaint or not:"_.

Lastly, we set the number of prepended virtual tokens to 8. You can choose a larger number, but it will increase the GPU memory needed to train the soft prompt.
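
The initialization text and `num_virtual_tokens` do not have to match in length. To our understanding, PEFT truncates the tokenized text when it is longer than the number of virtual tokens and repeats it when it is shorter; treat this as an assumption and check the PEFT source for your version. The sketch below shows that logic on hypothetical token ids, with `fit_init_tokens` being an illustrative helper, not a PEFT function.

```python
import math


def fit_init_tokens(init_token_ids, num_virtual_tokens):
    """Truncate or repeat the tokenized init text to exactly num_virtual_tokens ids."""
    if not init_token_ids:
        raise ValueError("init text must produce at least one token")
    repeats = math.ceil(num_virtual_tokens / len(init_token_ids))
    return (init_token_ids * repeats)[:num_virtual_tokens]


# Hypothetical token ids standing in for the tokenized init text.
short_text = fit_init_tokens([5, 6, 7], 8)                       # repeated to fill 8 slots
long_text = fit_init_tokens([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 8)  # cut down to 8 slots
print(short_text)
print(long_text)
```

The embeddings of the resulting ids serve as the starting values of the soft prompt, which is then trained freely.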

In [None]:
peft_config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    prompt_tuning_init=PromptTuningInit.TEXT,
    num_virtual_tokens=8,
    prompt_tuning_init_text="Classify if the tweet is a complaint or not:",
    tokenizer_name_or_path=model_name_or_path,
)

checkpoint_name = f"{dataset_name}_{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}_v1.pt".replace(
    "/", "_"
)

## Step 5: Initialize Prompt Tuning Model

We create a PEFT model by attaching learnable soft prompts to the pretrained model.

In [None]:

# creating model
model = AutoModelForCausalLM.from_pretrained(model_name_or_path)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()
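
As a rough sanity check on what `print_trainable_parameters()` reports, the number of trainable values in a soft prompt is just the number of virtual tokens times the embedding width. The hidden size of 1024 and the base-model parameter count below are assumptions about `bloomz-560m`, not values read from the model; replace them with `model.config.hidden_size` and the printed totals when running the notebook.

```python
num_virtual_tokens = 8            # from the PromptTuningConfig above
hidden_size = 1024                # assumed embedding width of bloomz-560m
base_model_params = 559_214_592   # assumed size of the frozen base model

# One trainable vector of length hidden_size per virtual token.
trainable_params = num_virtual_tokens * hidden_size
all_params = base_model_params + trainable_params
trainable_percent = 100 * trainable_params / all_params

print(f"trainable params: {trainable_params:,}")
print(f"trainable share:  {trainable_percent:.4f}%")
```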

## Step 6: Train the Prompt-Tuned Model

First, we define the optimizer and the learning rate scheduler that we want to use during training. As usual, we use the AdamW optimizer (Adam with decoupled weight decay).

The model is trained while updating **only** the prompt embeddings, keeping the base model frozen.

In [None]:
# model
# optimizer and lr scheduler
optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
lr_scheduler = get_linear_schedule_with_warmup(
    optimizer=optimizer,
    num_warmup_steps=0,
    num_training_steps=(len(train_dataloader) * num_epochs),
)
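
The linear schedule can be written down in a few lines. The sketch below reflects our understanding of what `get_linear_schedule_with_warmup` computes: a multiplier that rises linearly during warmup and then decays linearly to zero. The function name `linear_schedule_multiplier` is hypothetical, and the step count assumes 50 training examples.

```python
import math


def linear_schedule_multiplier(step, num_warmup_steps, num_training_steps):
    """Factor applied to the base learning rate at a given optimizer step."""
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    remaining = num_training_steps - step
    return max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))


base_lr = 3e-2
# Assumption: 50 training examples with batch_size=8 give ceil(50 / 8) = 7 batches per epoch.
steps_per_epoch = math.ceil(50 / 8)
num_training_steps = steps_per_epoch * 50  # 50 epochs

for step in (0, num_training_steps // 2, num_training_steps):
    factor = linear_schedule_multiplier(step, 0, num_training_steps)
    print(step, base_lr * factor)
```

With `num_warmup_steps=0`, as in this notebook, training starts at the full learning rate and decays steadily to zero by the last step.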

Instead of using _TrainingArguments_ and _Trainer_ from the **transformers** library, we will write a custom loop to show what a basic training loop looks like.

### **Training Phase**

First, we move the model to the selected device (ideally a GPU) so that training does not run on the much slower CPU.


Then, for each epoch, we will do the following:
1. **Set the model to training mode** with `model.train()`, which enables training-time behavior such as dropout.
2. **Iterate over training batches** from the `train_dataloader`.
3. Move each batch of inputs and labels to the correct computation `device` (CPU/GPU).
4. Perform a **forward pass**:  
   ```python
   outputs = model(**batch)
   loss = outputs.loss
   ```
   The model computes predictions and returns the loss based on the given labels.
5. **Accumulate loss** for monitoring purposes.
6. **Backward pass**:
   ```python
   loss.backward()
   ```
   This computes gradients of the loss with respect to the model parameters (or soft prompts, if prompt-tuning is used).
7. **Update parameters**:
   ```python
   optimizer.step()
   lr_scheduler.step()
   optimizer.zero_grad()
   ```
   The optimizer updates trainable parameters (e.g., only soft prompts in PEFT), and the learning rate scheduler adjusts the learning rate dynamically.
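
The forward, backward, and update steps above can be traced by hand on a toy problem. The sketch below is not the notebook's model: it uses a single scalar `soft_prompt`, a scalar `frozen_weight` standing in for the frozen base model, and plain gradient descent instead of AdamW. All numbers are made up.

```python
# Toy "model": prediction = frozen_weight * (soft_prompt + x).
frozen_weight = 2.0   # stands in for the frozen base model; never updated
soft_prompt = 0.0     # the only trainable value
lr = 0.05             # hypothetical learning rate for this toy problem

x, target = 1.0, 6.0  # a single made-up training example

for step in range(100):
    # Forward pass: compute the prediction and a squared-error loss.
    prediction = frozen_weight * (soft_prompt + x)
    loss = (prediction - target) ** 2

    # Backward pass: gradient of the loss with respect to the soft prompt only.
    grad_soft_prompt = 2 * (prediction - target) * frozen_weight

    # Update: plain gradient descent on the trainable value.
    soft_prompt -= lr * grad_soft_prompt

print(frozen_weight)          # unchanged by training
print(round(soft_prompt, 4))  # moves toward the value that makes the prediction hit the target
```

The frozen weight keeps its original value throughout, while the soft prompt alone absorbs the task-specific adaptation.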

In [None]:
# training and evaluation
model = model.to(device)

for epoch in range(num_epochs):
    model.train()
    total_loss = 0
    for step, batch in enumerate(tqdm(train_dataloader)):
        batch = {k: v.to(device) for k, v in batch.items()}
        
        outputs = model(**batch)
        loss = outputs.loss
        
        total_loss += loss.detach().float()
        loss.backward()
        optimizer.step()
        lr_scheduler.step()
        optimizer.zero_grad()

    model.eval()
    eval_loss = 0
    eval_preds = []
    for step, batch in enumerate(tqdm(eval_dataloader)):
        batch = {k: v.to(device) for k, v in batch.items()}
        with torch.no_grad():
            outputs = model(**batch)
        loss = outputs.loss
        eval_loss += loss.detach().float()
        eval_preds.extend(
            tokenizer.batch_decode(torch.argmax(outputs.logits, -1).detach().cpu().numpy(), skip_special_tokens=True)
        )

    eval_epoch_loss = eval_loss / len(eval_dataloader)
    eval_ppl = torch.exp(eval_epoch_loss)
    train_epoch_loss = total_loss / len(train_dataloader)
    train_ppl = torch.exp(train_epoch_loss)
    print(f"{epoch=}: {train_ppl=} {train_epoch_loss=} {eval_ppl=} {eval_epoch_loss=}")
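
The perplexity printed after each epoch is the exponential of the mean cross-entropy loss. A small sketch with made-up per-batch losses:

```python
import math

batch_losses = [0.5, 0.3, 0.1]  # hypothetical per-batch mean losses
epoch_loss = sum(batch_losses) / len(batch_losses)
perplexity = math.exp(epoch_loss)
print(f"{epoch_loss=:.3f} {perplexity=:.3f}")
```

A loss of 0 corresponds to a perplexity of 1, the best possible value, so perplexity values close to 1 indicate that the model assigns high probability to the target tokens.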

## Step 7: Evaluate the Model

Finally, we run the prompt-tuned model on a single example from the test set and inspect the label it generates.

In [None]:
model.eval()
i = 33
inputs = tokenizer(f'{text_column} : {dataset["test"][i]["Tweet text"]} Label : ', return_tensors="pt")
print(dataset["test"][i]["Tweet text"])
print(inputs)

with torch.no_grad():
    inputs = {k: v.to(device) for k, v in inputs.items()}
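    outputs = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=10,
        # stop on the same end-of-sequence token that was appended to the labels during preprocessing
        eos_token_id=tokenizer.eos_token_id,
    )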
    print(outputs)
    print(tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True))
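
The decoded output contains the full prompt followed by the generated label. One way to pull out just the prediction is to split on the `Label : ` marker used in the prompt template. The helper `extract_label` and the example string below are hypothetical and only illustrate the idea.

```python
def extract_label(generated_text, marker="Label : "):
    """Return the text after the last occurrence of the marker, or None if absent."""
    _, found, tail = generated_text.rpartition(marker)
    return tail.strip() if found else None


# Made-up decoded output in the prompt format used above.
decoded = "Tweet text : my order never arrived Label : complaint"
print(extract_label(decoded))
```

Using the last occurrence of the marker keeps the helper robust when a tweet itself happens to contain the same text.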

## Step 8: Save and Load Prompt-Tuned Adapters

We demonstrate how to save the learned prompt embeddings and reload them for inference.

In [None]:
# saving model
peft_model_id = f"{dataset_name}_{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}".replace(
    "/", "_"
)
model.save_pretrained(peft_model_id)

In [None]:
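# The adapter file is adapter_model.safetensors in recent PEFT releases and adapter_model.bin in older ones
ckpt = f"{peft_model_id}/adapter_model.*"
!du -h $ckpt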
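
The small file size follows directly from the size of the soft prompt. The back-of-the-envelope sketch below compares storing one soft prompt per task with storing one fully fine-tuned copy per task. The hidden size, base-model parameter count, float32 storage, and number of tasks are all assumptions made for illustration.

```python
BYTES_PER_FLOAT32 = 4

soft_prompt_values = 8 * 1024     # num_virtual_tokens x assumed hidden size
base_model_values = 559_214_592   # assumed parameter count of the base model
num_tasks = 10                    # hypothetical number of downstream tasks

prompt_bytes = soft_prompt_values * BYTES_PER_FLOAT32
full_copy_bytes = base_model_values * BYTES_PER_FLOAT32

# One shared base model plus one soft prompt per task vs. one full copy per task.
prompt_tuning_total = full_copy_bytes + num_tasks * prompt_bytes
full_finetuning_total = num_tasks * full_copy_bytes

print(f"one soft prompt:  {prompt_bytes / 1024:.0f} KiB")
print(f"prompt tuning:    {prompt_tuning_total / 1e9:.2f} GB for {num_tasks} tasks")
print(f"full fine-tuning: {full_finetuning_total / 1e9:.2f} GB for {num_tasks} tasks")
```

The file on disk is slightly larger than the raw tensor because it also stores metadata.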

In [None]:
from peft import PeftModel, PeftConfig
# loading the model saved locally
peft_model_id = f"{dataset_name}_{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}".replace(
    "/", "_"
)

config = PeftConfig.from_pretrained(peft_model_id)
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path)
model = PeftModel.from_pretrained(model, peft_model_id)

In [None]:
model.to(device)
model.eval()
i = 4
inputs = tokenizer(f'{text_column} : {dataset["test"][i]["Tweet text"]} Label : ', return_tensors="pt")
print(dataset["test"][i]["Tweet text"])
print(inputs)

with torch.no_grad():
    inputs = {k: v.to(device) for k, v in inputs.items()}
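    outputs = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=10,
        # stop on the same end-of-sequence token that was appended to the labels during preprocessing
        eos_token_id=tokenizer.eos_token_id,
    )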
    print(outputs)
    print(tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True))

## References

This tutorial is inspired by the code in the [PEFT](https://github.com/huggingface/peft) GitHub repository. In particular, the implementation is based on the following example: [**peft_prompt_tuning_clm.ipynb**](https://github.com/huggingface/peft/blob/main/examples/causal_language_modeling/peft_prompt_tuning_clm.ipynb).


**Citations:**

[1] Muennighoff et al. (2023). [Crosslingual Generalization through Multitask Finetuning](https://aclanthology.org/2023.acl-long.891/) <br/>
[2] Lester et al. (2021). [The Power of Scale: Parameter-Efficient Prompt Tuning](https://aclanthology.org/2021.emnlp-main.243/) <br/>
[3] [Hugging Face PEFT Documentation](https://huggingface.co/docs/peft/index) <br/>
[4] [Prompt Tuning with PEFT - Hugging Face Tutorial](https://huggingface.co/learn/cookbook/en/prompt_tuning_peft) <br/>
[5] [Fine-Tuning vs Prompt Tuning Explained](https://medium.com/@himanshu_72022/difference-between-fine-tuning-and-prompt-tuning-9f06e5d7ae11) <br/>
