# Prompt Tuning with SmolLM: A Beginner's Guide

This notebook demonstrates how to set up and use prompt tuning with a SmolLM model using the PEFT library. Prompt tuning is a parameter-efficient method that adds trainable "soft prompts" while keeping the base model frozen.

## What You'll Learn
- How to load a SmolLM model
- Configure prompt tuning with PEFT
- Train soft prompts for sentiment classification
- Compare performance with and without prompt tuning

## 1. Install Dependencies

First, let's install the required packages:

In [25]:
!pip install transformers peft torch datasets accelerate

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)




## 2. Import Libraries

In [26]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments, Trainer
from peft import PromptTuningConfig, TaskType, get_peft_model
from datasets import Dataset
import numpy as np

## 3. Load SmolLM Model and Tokenizer

We'll use SmolLM2-135M, a small but capable language model:

In [27]:
# Load the base model and tokenizer
model_name = "HuggingFaceTB/SmolLM2-135M"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)

tokenizer = AutoTokenizer.from_pretrained(model_name)

# Add padding token if it doesn't exist
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

print(f"Model loaded: {model_name}")
print(f"Model parameters: {model.num_parameters():,}")

Model loaded: HuggingFaceTB/SmolLM2-135M
Model parameters: 134,515,008


## 4. Test Base Model (Before Prompt Tuning)

Let's see how the model performs on sentiment classification without any training:

In [29]:
def test_model(model, tokenizer, text, max_length=50):
    """Test the model with a given input text"""
    inputs = tokenizer(text, return_tensors="pt", padding=True)
    # Move inputs to the same device as the model
    inputs = {k: v.to(model.device) for k, v in inputs.items()}
    
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_length=max_length,
            do_sample=True,
            temperature=0.7,
            pad_token_id=tokenizer.eos_token_id
        )
    
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return response

# Make sure your model is on CPU to avoid MPS issues
model = model.to('cpu')

# Test with sentiment classification prompt
test_prompt = "Classify the sentiment: 'This movie is amazing!' Sentiment:"
print("Base model response:")
print(test_model(model, tokenizer, test_prompt))

Base model response:
Classify the sentiment: 'This movie is amazing!' Sentiment: 'This movie is great!'

[15] The only one who did that was C.J. Walker, who said: 'I am a film critic. I


## 5. Configure Prompt Tuning

Now let's set up prompt tuning configuration:

### Parameter Breakdown:

- **`task_type=TaskType.CAUSAL_LM`**: Specifies causal language modeling (next token prediction)
- **`num_virtual_tokens=16`**: Number of trainable soft prompt vectors (typically 8-32)
- **`prompt_tuning_init="TEXT"`**: Initialize from text embeddings (vs "RANDOM")
- **`prompt_tuning_init_text`**: Text that gets converted to initial soft prompt embeddings
- **`tokenizer_name_or_path`**: Tokenizer for encoding the initialization text

The init text gets tokenized and embedded to create starting values for the 16 trainable vectors.

In [30]:
# Configure prompt tuning
peft_config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    num_virtual_tokens=16,  # Number of soft prompt tokens
    prompt_tuning_init="TEXT",  # Initialize from text
    prompt_tuning_init_text="Classify the sentiment as positive, negative, or neutral:",
    tokenizer_name_or_path=model_name,
)

print("Prompt tuning configuration:")
print(f"Virtual tokens: {peft_config.num_virtual_tokens}")
print(f"Initialization: {peft_config.prompt_tuning_init}")
print(f"Init text: {peft_config.prompt_tuning_init_text}")

Prompt tuning configuration:
Virtual tokens: 16
Initialization: TEXT
Init text: Classify the sentiment as positive, negative, or neutral:


## 6. Create Prompt-Tunable Model

Apply the prompt tuning configuration to our model:

### What `get_peft_model()` Does:

`get_peft_model()` is a function from the PEFT (Parameter-Efficient Fine-Tuning) library that wraps your base model with parameter-efficient fine-tuning capabilities. Here's what it does:

1. **Wraps the base model**: Takes your original model and adds a PEFT adapter layer on top of it
2. **Freezes base parameters**: Keeps the original model weights frozen (non-trainable)
3. **Adds trainable parameters**: Introduces a small set of new trainable parameters based on your `peft_config`
4. **Creates efficient training**: Enables fine-tuning with dramatically fewer parameters

### In Prompt Tuning Specifically:

When you use prompt tuning configuration, `get_peft_model()`:
- Adds learnable "soft prompts" (continuous embeddings) to your model
- These soft prompts are prepended to your input embeddings
- Only these prompt embeddings are trainable - the rest of the model stays frozen

### The Dramatic Parameter Reduction:

```python
# Before PEFT: entire model parameters (millions/billions)
# After PEFT: only prompt tokens × embedding dimension

# Example: If you have 16 prompt tokens and 576 embedding dimension
# Trainable parameters = 16 × 576 = 9,216 parameters
# vs. original model with ~134M parameters
```

### Benefits:

- **Memory efficient**: Only stores gradients for a tiny fraction of parameters
- **Fast training**: Much quicker to train than full fine-tuning
- **Storage efficient**: Only need to save the small adapter weights
- **Modular**: Can swap different adapters for different tasks

The `print_trainable_parameters()` method shows you exactly how many parameters are trainable vs. frozen, demonstrating the efficiency gains of this approach.

In [31]:
# Create the prompt-tunable model
model = get_peft_model(model, peft_config)

# Print model info
model.print_trainable_parameters()

trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)

print("\nPrompt tuning model created!")
print(f"Only {trainable_params:,} parameters will be trained")

trainable params: 9,216 || all params: 134,524,224 || trainable%: 0.0069

Prompt tuning model created!
Only 9,216 parameters will be trained


## 7. Prepare Training Data

Create a simple sentiment classification dataset:

In [32]:
# Create training data
train_data = [
    {"text": "I love this movie! It's fantastic.", "label": "positive"},
    {"text": "This film is terrible and boring.", "label": "negative"},
    {"text": "The movie was okay, nothing special.", "label": "neutral"},
    {"text": "Amazing acting and great story!", "label": "positive"},
    {"text": "Worst movie I've ever seen.", "label": "negative"},
    {"text": "It's an average film.", "label": "neutral"},
    {"text": "Brilliant cinematography and direction!", "label": "positive"},
    {"text": "Complete waste of time.", "label": "negative"},
    {"text": "The movie is decent.", "label": "neutral"},
    {"text": "Absolutely loved every minute!", "label": "positive"},
]

def format_prompt(example):
    """Format the training examples"""
    return f"Text: {example['text']} Sentiment: {example['label']}"

# Format the data
formatted_data = [format_prompt(ex) for ex in train_data]

print("Training examples:")
for i, example in enumerate(formatted_data[:3]):
    print(f"{i+1}. {example}")

Training examples:
1. Text: I love this movie! It's fantastic. Sentiment: positive
2. Text: This film is terrible and boring. Sentiment: negative
3. Text: The movie was okay, nothing special. Sentiment: neutral


## 8. Tokenize Data

In [33]:
def tokenize_function(examples):
    """Tokenize the training data"""
    # Ensure we're working with a list of strings
    texts = examples["text"] if isinstance(examples["text"], list) else [examples["text"]]
    
    tokenized = tokenizer(
        texts,
        truncation=True,
        padding=True,  # Changed from "max_length" to True
        max_length=128,
        return_tensors="pt"
    )
    
    # Convert to lists for dataset compatibility
    return {
        "input_ids": tokenized["input_ids"].tolist(),
        "attention_mask": tokenized["attention_mask"].tolist()
    }

# Create dataset
dataset = Dataset.from_dict({"text": formatted_data})
tokenized_dataset = dataset.map(tokenize_function, batched=True, remove_columns=["text"])

# Add labels (for language modeling, labels = input_ids)
def add_labels(example):
    example["labels"] = example["input_ids"].copy()
    return example

tokenized_dataset = tokenized_dataset.map(add_labels)

print(f"Dataset created with {len(tokenized_dataset)} examples")

Map:   0%|          | 0/10 [00:00<?, ? examples/s]

Map:   0%|          | 0/10 [00:00<?, ? examples/s]

Dataset created with 10 examples


## 9. Set Up Training

Configure the training arguments and trainer:

In [34]:
# Training arguments
training_args = TrainingArguments(
    output_dir="./prompt_tuning_results",
    num_train_epochs=3,
    per_device_train_batch_size=2,
    learning_rate=1e-3,  # Higher learning rate for prompt tuning
    logging_steps=5,
    save_strategy="no",
    remove_unused_columns=False,
)

# Create trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset,
    tokenizer=tokenizer,
)

print("Trainer configured successfully!")

  trainer = Trainer(
No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


Trainer configured successfully!


## 10. Train the Model

Now let's train the soft prompts:

In [35]:
print("Starting prompt tuning training...")
trainer.train()
print("Training completed!")

Starting prompt tuning training...


Step,Training Loss
5,5.3779
10,4.9994
15,4.8342


Training completed!


## 11. Test the Trained Model

Let's see how the model performs after prompt tuning:

In [36]:
# Test the trained model
test_examples = [
    "Text: This movie is incredible! Sentiment:",
    "Text: I hated this film. Sentiment:",
    "Text: The movie was fine. Sentiment:",
]

print("Testing prompt-tuned model:")
print("=" * 50)

for i, example in enumerate(test_examples):
    response = test_model(model, tokenizer, example, max_length=len(example.split()) + 5)
    print(f"Test {i+1}:")
    print(f"Input: {example}")
    print(f"Output: {response}")
    print("-" * 30)

Testing prompt-tuned model:
Test 1:
Input: Text: This movie is incredible! Sentiment:
Output: Text: This movie is incredible! Sentiment: +
------------------------------
Test 2:
Input: Text: I hated this film. Sentiment:
Output: Text: I hated this film. Sentiment: +
------------------------------
Test 3:
Input: Text: The movie was fine. Sentiment:
Output: Text: The movie was fine. Sentiment:Positive
------------------------------


## 12. Save the Prompt-Tuned Model

Save only the prompt parameters (very small file!):

In [39]:
# Save the prompt tuning adapter
model.save_pretrained("./smol_prompt_tuned")

print("Prompt tuning adapter saved!")
print("Only the soft prompt parameters are saved, not the entire model.")

# Check file size - try both possible formats
import os

# Check what files were actually saved
saved_files = os.listdir("./smol_prompt_tuned")
print(f"Files saved: {saved_files}")

# Try to find the adapter file
adapter_file = None
possible_names = ["adapter_model.safetensors", "adapter_model.bin", "pytorch_model.bin"]

for filename in possible_names:
    filepath = f"./smol_prompt_tuned/{filename}"
    if os.path.exists(filepath):
        adapter_file = filepath
        break

if adapter_file:
    adapter_size = os.path.getsize(adapter_file)
    print(f"Adapter file: {os.path.basename(adapter_file)}")
    print(f"Adapter file size: {adapter_size / 1024:.2f} KB")
else:
    print("Could not find adapter file")

Prompt tuning adapter saved!
Only the soft prompt parameters are saved, not the entire model.
Files saved: ['adapter_model.safetensors', 'README.md', 'adapter_config.json']
Adapter file: adapter_model.safetensors
Adapter file size: 36.12 KB


## 13. Load and Use Saved Model

Demonstrate how to load the prompt-tuned model:

In [40]:
from peft import PeftModel

# Load base model again
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Load the prompt tuning adapter
loaded_model = PeftModel.from_pretrained(base_model, "./smol_prompt_tuned")

print("Model loaded successfully!")

# Test the loaded model
test_text = "Text: This is an amazing experience! Sentiment:"
result = test_model(loaded_model, tokenizer, test_text)
print(f"\nTest with loaded model:")
print(f"Input: {test_text}")
print(f"Output: {result}")

Model loaded successfully!

Test with loaded model:
Input: Text: This is an amazing experience! Sentiment:
Output: Text: This is an amazing experience! Sentiment: Happy. For example, you might define a sentiment as a type of emotion. It may be negative, neutral, or positive. When we say that a sentiment is “positive,” we mean it


## 14. Summary

### What We Accomplished:

1. **Loaded SmolLM2-135M** - A small but capable language model
2. **Configured Prompt Tuning** - Set up 16 virtual tokens initialized from text
3. **Trained Soft Prompts** - Only trained ~8K parameters instead of 135M
4. **Achieved Task Adaptation** - Model learned sentiment classification
5. **Saved Efficiently** - Adapter file is only a few KB

### Key Benefits of Prompt Tuning:

- **Parameter Efficient**: Only 0.006% of model parameters trained
- **Fast Training**: Quick convergence with small dataset
- **Easy Deployment**: Tiny adapter files for different tasks
- **Task Switching**: Can quickly switch between different prompt-tuned tasks

### Next Steps:

- Try different initialization strategies (random vs text)
- Experiment with different numbers of virtual tokens
- Test on more complex tasks
- Compare with LoRA and full fine-tuning approaches