# Sentiment Analysis on IMDb with Soft Prompting [PEFT + T5]

This project adapts a frozen T5-small model to the IMDb dataset using soft prompting, specifically P-tuning via `PromptEncoderConfig` from the [PEFT library](https://github.com/huggingface/peft).  
The goal is to perform sentiment classification (positive/negative) while training only a small set of soft-prompt parameters, which keeps training efficient and lightweight.


# Imports and Device Setup

In [None]:
!pip install --upgrade transformers

Successfully installed tokenizers-0.22.0 transformers-4.56.0


In [None]:
%%capture
!pip install datasets peft

In [None]:
!pip install evaluate

Successfully installed evaluate-0.4.5


In [None]:
# General
import torch

# Dataset
import datasets
from datasets import load_dataset

# Transformers
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
    DataCollatorForSeq2Seq,
)

# PEFT
from peft import get_peft_model, PromptEncoderConfig

# Evaluation
import evaluate


In [None]:
# Use GPU if available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f'Using device: {device}')

Using device: cuda


# Load Pre-trained Model

In [None]:
# Load the pre-trained T5-small model (its weights stay frozen under PEFT)
model = AutoModelForSeq2SeqLM.from_pretrained(
    "google-t5/t5-small",
    dtype=torch.float16,  # `torch_dtype` is deprecated in recent transformers releases
    device_map="auto",
)

# Load IMDb Dataset

In [None]:
# Load the labelled IMDb splits; the "unsupervised" split has no labels (label == -1) and is not used
train_split, test_split = load_dataset("imdb", split=["train", "test"])
dataset = datasets.DatasetDict({"train": train_split, "test": test_split})
# Hold out 20% of the training reviews as a validation split (20,000 train / 5,000 validation)
split = dataset["train"].train_test_split(test_size=0.2, seed=42)
dataset["train"], dataset["validation"] = split["train"], split["test"]

# Tokenization

In [None]:
tokenizer = AutoTokenizer.from_pretrained("google-t5/t5-small")  # load the matching pre-trained tokenizer
if tokenizer.pad_token_id is None:
    tokenizer.pad_token_id = tokenizer.eos_token_id  # fall back to EOS if no padding token is set
classes = ["negative", "positive"]  # IMDb labels: 0 = negative, 1 = positive
target_max_length = max(len(tokenizer(class_label)["input_ids"]) for class_label in classes)
print(target_max_length)  # longest class label in tokens (label word plus EOS)

2


In [None]:
def tokenize(examples):
    # Wrap each review in the task prompt
    inputs = [f"review: {x} sentiment:" for x in examples["text"]]

    # Map label ids to natural-language targets (0 = negative, 1 = positive)
    classes = ["negative", "positive"]
    targets = [classes[label] for label in examples["label"]]

    model_inputs = tokenizer(
        inputs,
        padding="max_length",
        truncation=True,
        max_length=512,
    )

    labels = tokenizer(
        text_target=targets,
        truncation=True,
        max_length=target_max_length,  # computed above: label word plus EOS
    )

    model_inputs["labels"] = labels["input_ids"]
    return model_inputs
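The task framing in `tokenize` comes down to two choices: the input template `"review: {text} sentiment:"` and a verbalizer that maps label ids to the words `negative`/`positive`. The stdlib sketch below isolates those two steps without a tokenizer. `build_pair` and `LABEL_WORDS` are hypothetical names, and the explicit label check is an addition: IMDb's unlabelled reviews carry label `-1`, which would otherwise silently index `classes[-1]`, i.e. `"positive"`.

```python
# Verbalizer: IMDb label ids -> target words (0 = negative, 1 = positive)
LABEL_WORDS = ["negative", "positive"]


def build_pair(text: str, label: int) -> tuple:
    """Return the (source, target) strings fed to T5 for one review (hypothetical helper)."""
    if label not in (0, 1):
        # Unlabelled reviews use -1; reject them instead of mapping to LABEL_WORDS[-1]
        raise ValueError(f"unexpected label: {label}")
    return f"review: {text} sentiment:", LABEL_WORDS[label]


src, tgt = build_pair("A gripping, beautifully shot film.", 1)
print(src)  # review: A gripping, beautifully shot film. sentiment:
print(tgt)  # positive
```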




In [None]:
tokenized_ds = dataset.map(
    tokenize,
    batched=True,
    num_proc=4,
    remove_columns=dataset["train"].column_names,
)


In [None]:
train_ds = tokenized_ds["train"]  # 20,000 training reviews (after the validation hold-out)
eval_ds = tokenized_ds["test"]    # full 25,000-review test split, used for evaluation below

# Prompt Encoder Setup

In [None]:
# Configure the P-tuning prompt encoder (initially tried num_virtual_tokens=20, encoder_hidden_size=128)
prompt_config = PromptEncoderConfig(task_type="SEQ_2_SEQ_LM", num_virtual_tokens=50, encoder_hidden_size=64)
prompt_model = get_peft_model(model, prompt_config)  # wrap the frozen model with the trainable prompt encoder
prompt_model.print_trainable_parameters()  # print parameter statistics

trainable params: 121,472 || all params: 60,628,096 || trainable%: 0.2004
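The 121,472 trainable parameters can be reproduced by hand. This sketch assumes PEFT's default `MLP` reparameterization for `PromptEncoderConfig` (an embedding table followed by three `Linear` layers, `d_model -> hidden -> hidden -> d_model`) and that, for `SEQ_2_SEQ_LM`, PEFT allocates virtual tokens for two transformer submodules (encoder and decoder). T5-small's `d_model` is 512. Under these assumptions the count matches the printed value exactly:

```python
def linear_params(n_in: int, n_out: int) -> int:
    """Weight matrix plus bias of a Linear(n_in, n_out) layer."""
    return n_in * n_out + n_out


def p_tuning_params(num_virtual_tokens: int, d_model: int, hidden: int,
                    num_submodules: int = 2) -> int:
    """Trainable parameters of an MLP prompt encoder (assumed PEFT layout)."""
    # Embedding table: one d_model vector per virtual token per transformer submodule
    embedding = num_virtual_tokens * num_submodules * d_model
    # MLP head: d_model -> hidden -> hidden -> d_model (ReLUs have no parameters)
    mlp = (linear_params(d_model, hidden)
           + linear_params(hidden, hidden)
           + linear_params(hidden, d_model))
    return embedding + mlp


trainable = p_tuning_params(num_virtual_tokens=50, d_model=512, hidden=64)
frozen = 60_628_096 - trainable  # "all params" minus trainable = frozen base model

print(f"{trainable:,}")                      # 121,472
print(f"{frozen:,}")                         # 60,506,624
print(f"{p_tuning_params(20, 512, 128):,}")  # 168,704
```

Under the same assumptions, the commented-out initial setting (`num_virtual_tokens=20`, `encoder_hidden_size=128`) gives 168,704 trainable parameters, so the final configuration is smaller despite using more virtual tokens: the MLP head dominates the count.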


# Data Collator & Evaluation Metric

In [None]:
# Data collator and metric
data_collator = DataCollatorForSeq2Seq(tokenizer, model=None)
accuracy_metric = evaluate.load("accuracy")

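`DataCollatorForSeq2Seq` pads `labels` to the longest sequence in each batch using `label_pad_token_id`, which defaults to `-100` so the loss ignores those positions. Such `-100` entries are not valid token ids, so anything that decodes labels (or generated ids padded the same way) should map them back to the pad id first. A stdlib sketch of both steps, with hypothetical helper names and arbitrary token ids; T5's `pad_token_id` is 0. In this notebook every label is two tokens long, so label padding may never actually occur.

```python
IGNORE_INDEX = -100  # default label_pad_token_id of DataCollatorForSeq2Seq
PAD_ID = 0           # T5's pad_token_id


def pad_labels(batch: list) -> list:
    """Right-pad label sequences to a common length with IGNORE_INDEX."""
    width = max(len(seq) for seq in batch)
    return [seq + [IGNORE_INDEX] * (width - len(seq)) for seq in batch]


def restore_pad(batch: list) -> list:
    """Replace IGNORE_INDEX with PAD_ID so a tokenizer can decode the ids."""
    return [[PAD_ID if tok == IGNORE_INDEX else tok for tok in seq] for seq in batch]


padded = pad_labels([[10, 1], [20, 30, 1]])
print(padded)               # [[10, 1, -100], [20, 30, 1]]
print(restore_pad(padded))  # [[10, 1, 0], [20, 30, 1]]
```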

In [None]:
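import numpy as np

def compute_metrics(eval_preds):
    preds, labels = eval_preds
    # Generated ids and labels may contain -100 padding, which cannot be decoded;
    # map it back to the pad token first
    preds = np.where(preds != -100, preds, tokenizer.pad_token_id)
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    # Exact match on "positive" -> 1; anything else (including malformed generations) -> 0
    pred_labels = [1 if pred.strip() == "positive" else 0 for pred in decoded_preds]
    true_labels = [1 if lab.strip() == "positive" else 0 for lab in decoded_labels]
    return accuracy_metric.compute(predictions=pred_labels, references=true_labels)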
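`compute_metrics` turns each generated string into a binary label by exact match with `"positive"`, so any other output, including an off-vocabulary word, is scored as negative. The sketch below reproduces that rule in plain Python (helper names are hypothetical) and adds a count of generations that are neither class word, which the notebook's metric does not report. The example data are the five sample predictions and labels printed in the Sample Predictions section.

```python
def to_label(text: str) -> int:
    """1 for "positive", 0 for anything else (same rule as compute_metrics)."""
    return 1 if text.strip() == "positive" else 0


def accuracy(preds: list, refs: list) -> float:
    if len(preds) != len(refs):
        raise ValueError("predictions and references differ in length")
    correct = sum(to_label(p) == to_label(r) for p, r in zip(preds, refs))
    return correct / len(refs)


def count_invalid(preds: list) -> int:
    """Generations that are neither class word (silently scored as negative above)."""
    return sum(p.strip() not in {"positive", "negative"} for p in preds)


preds = ["negative", "negative", "negative", "negative", "positive"]
refs = ["negative"] * 5
print(accuracy(preds, refs))                    # 0.8
print(accuracy(["neutral"], ["negative"]))      # 1.0, an off-vocabulary output counts as negative
print(count_invalid(["neutral", " positive"]))  # 1
```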


# Training Setup

In [None]:
training_args = Seq2SeqTrainingArguments(
    output_dir="./soft_prompt_imdb_results",
    # `evaluation_strategy` was renamed `eval_strategy` in newer transformers releases.
    # It is not set here, so no evaluation runs during training and eval_steps has no effect;
    # pass eval_strategy="steps" to evaluate every eval_steps.
    do_eval=True,
    save_strategy="steps",
    eval_steps=500,
    save_steps=500,
    logging_steps=500,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=3,
    learning_rate=5e-5,
    weight_decay=0.01,
    predict_with_generate=True,
    fp16=torch.cuda.is_available(),
)

# Initialize the trainer
prompt_trainer = Seq2SeqTrainer(
    model=prompt_model,  # pass the PEFT-wrapped model to the trainer
    args=training_args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    processing_class=tokenizer,  # replaces the deprecated `tokenizer` argument
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)


# Train the Model

In [None]:
prompt_trainer.train()



| Step | Training Loss |
|-----:|--------------:|
| 500  | 7.8688 |
| 1000 | 0.397  |
| 1500 | 0.2773 |
| 2000 | 0.2312 |
| 2500 | 0.2036 |
| 3000 | 0.1933 |
| 3500 | 0.1862 |
| 4000 | 0.1881 |
| 4500 | 0.1803 |
| 5000 | 0.1853 |


TrainOutput(global_step=7500, training_loss=0.7200985931396484, metrics={'train_runtime': 1385.2152, 'train_samples_per_second': 43.315, 'train_steps_per_second': 5.414, 'total_flos': 8120508088320000.0, 'train_loss': 0.7200985931396484, 'epoch': 3.0})
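`global_step=7500` follows directly from the data split and batch settings. A worked check in Python, assuming a single GPU and no gradient accumulation (neither is set in `training_args`):

```python
import math

n_train_full = 25_000                         # IMDb train split
n_val = n_train_full * 20 // 100              # train_test_split(test_size=0.2) holds out 20%
n_train = n_train_full - n_val                # reviews left for training
steps_per_epoch = math.ceil(n_train / 8)      # per_device_train_batch_size=8, one device assumed
total_steps = steps_per_epoch * 3             # num_train_epochs=3
checkpoints = total_steps // 500              # saves at every multiple of save_steps=500

print(n_train, n_val, steps_per_epoch, total_steps, checkpoints)  # 20000 5000 2500 7500 15
```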

# Evaluation

In [None]:
results = prompt_trainer.evaluate()
print(results)

{'eval_loss': 0.15266896784305573, 'eval_accuracy': 0.87236, 'eval_runtime': 410.4523, 'eval_samples_per_second': 60.908, 'eval_steps_per_second': 7.614, 'epoch': 3.0}
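Because `eval_ds` is the full 25,000-review test split, `eval_accuracy` converts directly into counts, and the reported throughput is consistent with that split size:

```python
n_test = 25_000          # size of the IMDb test split used as eval_ds
eval_accuracy = 0.87236  # from the evaluation output above
eval_runtime = 410.4523  # seconds, from the same output

n_correct = round(eval_accuracy * n_test)
throughput = n_test / eval_runtime

print(n_correct, n_test - n_correct)  # 21809 3191
print(round(throughput, 3))           # 60.908, matching eval_samples_per_second
```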


In [None]:
sample = dataset["test"][0]
# Use the same prompt template and truncation as in training
inputs = tokenizer(
    f"review: {sample['text']} sentiment:",
    truncation=True,
    max_length=512,
    return_tensors="pt",
).to(prompt_model.device)
outputs = prompt_model.generate(**inputs, max_new_tokens=3)
print("Prediction:", tokenizer.decode(outputs[0], skip_special_tokens=True))
print("Ground truth:", "positive" if sample["label"] == 1 else "negative")


Prediction: negative
Ground truth: negative


# Sample Predictions

In [None]:
samples = eval_ds.select(range(5))

inputs = {
    "input_ids": torch.tensor(samples["input_ids"]).to(prompt_model.device),
    "attention_mask": torch.tensor(samples["attention_mask"]).to(prompt_model.device),
}

with torch.no_grad():
    outputs = prompt_model.generate(**inputs, max_new_tokens=3)

decoded_preds = tokenizer.batch_decode(outputs, skip_special_tokens=True)
decoded_labels = tokenizer.batch_decode(samples["labels"], skip_special_tokens=True)

print("Predictions:", decoded_preds)
print("Labels:", decoded_labels)


Predictions: ['negative', 'negative', 'negative', 'negative', 'positive']
Labels: ['negative', 'negative', 'negative', 'negative', 'negative']
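All five reference labels above are negative. That is expected if the IMDb test split is stored sorted by label (negatives first), which these outputs suggest but the notebook does not verify. Sampling a few indices per class gives a more informative spot check. A stdlib sketch on a stand-in label list; the sorted 12,500/12,500 layout is an assumption:

```python
import random

# Stand-in for the test labels, ASSUMING the split is sorted:
# 12,500 negatives (0) followed by 12,500 positives (1)
labels = [0] * 12_500 + [1] * 12_500

first_five = labels[:5]  # what select(range(5)) sees: a single class

# Stratified pick: a few indices from each class, reproducible via a fixed seed
rng = random.Random(42)
neg_idx = [i for i, y in enumerate(labels) if y == 0]
pos_idx = [i for i, y in enumerate(labels) if y == 1]
balanced = rng.sample(neg_idx, 3) + rng.sample(pos_idx, 2)

print(first_five)                     # [0, 0, 0, 0, 0]
print([labels[i] for i in balanced])  # [0, 0, 0, 1, 1]
```

With the real data, `labels` would be `dataset["test"]["label"]`, and the resulting indices could be passed to `eval_ds.select(balanced)`, since `map` preserves row order. `eval_ds.shuffle(seed=42).select(range(5))` is a simpler alternative, though it does not guarantee that both classes appear.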


# Trainable Parameters

In [None]:
total = sum(p.numel() for p in prompt_model.parameters())
trainable = sum(p.numel() for p in prompt_model.parameters() if p.requires_grad)
pct = 100.0 * trainable / total if total > 0 else 0.0
print(f"Trainable: {trainable:,} / Total: {total:,} ({pct:.6f}%)")

Trainable: 121,472 / Total: 60,628,096 (0.200356%)


# Conclusion

The model reaches ~87% accuracy (0.872) on the 25,000-review IMDb test split while training only 0.2% of its parameters.

Only ~121K parameters (the prompt encoder) were updated, compared with over 60M if full fine-tuning were used. The model achieves competitive results for so few trainable parameters, which makes prompt tuning an excellent alternative to full fine-tuning in low-resource or deployment-constrained settings.