# 1️⃣ LoRA fine-tuning of Qwen2.5 (0.5B)

**Goal:** This notebook guides you through fine-tuning **Qwen2.5 (0.5B)** with **LoRA** (via the `peft` library) on a small custom dataset.

**Notes / prerequisites:**

- Runtime: choose **GPU** in Colab (preferably a T4). Free Colab GPUs have limited memory; the notebook uses 4-bit quantization and LoRA to fit.
- This notebook contains **TODO** cells where you will implement short pieces of code.


You can also open this example in Google Colab:

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ivanvykopal/peft-kinit-2025/blob/master/practice/01_peft_lora_autoregressive.ipynb)

## Installation & Imports

Run the cell below in Colab. It installs `transformers`, `accelerate`, `peft`, `bitsandbytes`, `datasets`, `safetensors`, and `evaluate`.

In [None]:
# NOTE: installing in Colab may take a few minutes
!pip install -q --upgrade pip
!pip install -q git+https://github.com/huggingface/transformers.git
!pip install -q accelerate bitsandbytes peft safetensors evaluate
!pip install -q datasets==3.6.0

# Optional helpful tools
!pip install -q transformers[torch] sentencepiece

In [None]:
# Check versions (useful for debugging)
import transformers, peft, accelerate, bitsandbytes, datasets, torch
print("transformers", transformers.__version__)
print("peft", peft.__version__)
print("accelerate", accelerate.__version__)
print("bitsandbytes", bitsandbytes.__version__)
print("torch", torch.__version__)
print("datasets", datasets.__version__)

## Model selection and memory strategy

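We load **Qwen2.5 (0.5B)** so that it fits on a free Colab GPU. The cell below includes a 4-bit `BitsAndBytesConfig`, but it is commented out, so the model loads in float16 as written. Uncomment the config and the `quantization_config` argument to try 4-bit loading.

**Important:** Loading with `load_in_4bit=True` requires `bitsandbytes` and may need additional CPU RAM during initialization.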
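
To get a feel for why lower precision helps on a small GPU, here is a rough back-of-the-envelope sketch of weight memory at different precisions. The parameter count of 0.5 billion is a round-number assumption taken from the model name, and the estimate ignores activations, gradients, optimizer state, and quantization overhead.

```python
# Rough weight-memory estimate; ignores activations, gradients, and optimizer state.
ASSUMED_NUM_PARAMS = 0.5e9  # assumption: round figure implied by the "0.5B" model name

BITS_PER_PARAM = {"float32": 32, "float16": 16, "4-bit": 4}

def weight_memory_gib(num_params, bits_per_param):
    """Return the approximate size of the weights in GiB."""
    return num_params * bits_per_param / 8 / 1024**3

estimates = {
    name: weight_memory_gib(ASSUMED_NUM_PARAMS, bits)
    for name, bits in BITS_PER_PARAM.items()
}
for name, gib in estimates.items():
    print(f"{name:>8}: ~{gib:.2f} GiB")
```

The weights are only part of the picture: training also needs memory for activations and for the optimizer state of the trainable parameters, which is where LoRA helps.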

In [None]:
# Load the tokenizer and model (4-bit optional) and prepare for LoRA
model_name = "Qwen/Qwen2.5-0.5B"

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
# ensure pad token exists
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Model: the 4-bit config below is commented out, so the model loads in float16 as written.
# NOTE: to use bitsandbytes 4-bit quantization, uncomment bnb_config and the quantization_config argument.
from transformers import BitsAndBytesConfig

# bnb_config = BitsAndBytesConfig(
#     load_in_4bit=True, # This enables 4-bit quantization
#     bnb_4bit_use_double_quant=True,
#     bnb_4bit_quant_type="nf4", # "nf4" is a common choice for 4-bit quantization
#     bnb_4bit_compute_dtype=torch.float16
# )

print("Loading model (this may take a minute)...")
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    # quantization_config=bnb_config, # Here we specify the 4-bit quantization config
    device_map="auto",
    torch_dtype=torch.float16,
    trust_remote_code=True
)

model.config.use_cache = False  # required for training with some transformers versions
print("Model loaded. Parameters:", sum(p.numel() for p in model.parameters()) )


## Task and dataset

The `twitter_complaints` dataset contains real tweets from customers directed at various companies or services. Each tweet is annotated with a label indicating whether it contains a **complaint** or not.

### Why this dataset?
- It is realistic and practical — customer complaints are a common example of short-text classification problems in NLP.
- It provides a supervised learning setup where, given a **prompt** (the tweet text), the model learns to predict the **label** (complaint or non-complaint).

### Task definition
Our goal is to train the model to **read a tweet and classify whether it expresses a complaint**.  
We frame this as a **generative task** where:
- The model receives the tweet text as the input prompt.
- The model generates the corresponding label text (e.g., “complaint” or “non-complaint”) as the output.
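
Because the label is produced as free text, the generated string usually has to be mapped back to a class name before it can be scored. The helper below is a minimal sketch of one way to do that; `parse_label` is a hypothetical name, and the label names and outputs in the example are made up for illustration.

```python
def parse_label(generated_text, label_names):
    """Map free-form generated text to one of the known label names, or None."""
    # Keep only the continuation after the last "Label:" marker, if present.
    continuation = generated_text.rsplit("Label:", 1)[-1].strip().lower()
    # Check longer names first so a name that is a prefix of another does not win.
    for name in sorted(label_names, key=len, reverse=True):
        if continuation.startswith(name.lower()):
            return name
    return None

# Hypothetical label names and model outputs, for illustration only.
example_labels = ["complaint", "no complaint"]
print(parse_label("Twitter post: my order never arrived\nLabel: complaint", example_labels))
print(parse_label("Label: no complaint.", example_labels))
print(parse_label("Label: something else", example_labels))
```

Returning `None` for unrecognized output keeps malformed generations visible instead of silently assigning them to a class.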

In [None]:
from datasets import load_dataset
dataset = load_dataset("ought/raft", "twitter_complaints")
print(dataset["train"][0])

### Tokenization / preprocessing

Participants should implement the tokenization logic below. The pattern used here is **instruction-style**: we build a prompt from the tweet text and use the label text as the target continuation.

**TODO**: fill `tokenize_fn` which returns `input_ids`, `attention_mask`, and `labels` (labels should be -100 for prompt tokens to avoid computing loss on them, only compute loss on target tokens).
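
The masking idea can be illustrated on plain Python lists before involving the tokenizer. The sketch below uses made-up token IDs and a hypothetical helper `build_example`; it is a toy illustration of the layout of `input_ids`, `attention_mask`, and `labels`, not the reference solution for the TODO.

```python
IGNORE_INDEX = -100  # positions with this label are skipped by the cross-entropy loss

def build_example(prompt_ids, label_ids, eos_id, pad_id, max_length):
    """Combine prompt and target IDs, mask the prompt in labels, then pad or truncate."""
    input_ids = prompt_ids + label_ids + [eos_id]
    attention_mask = [1] * len(input_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + label_ids + [eos_id]

    pad = max_length - len(input_ids)
    if pad > 0:
        input_ids = input_ids + [pad_id] * pad
        attention_mask = attention_mask + [0] * pad
        labels = labels + [IGNORE_INDEX] * pad
    else:
        input_ids = input_ids[:max_length]
        attention_mask = attention_mask[:max_length]
        labels = labels[:max_length]
    return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}

# Made-up token IDs: three prompt tokens, one target token.
toy = build_example(prompt_ids=[11, 12, 13], label_ids=[21], eos_id=2, pad_id=0, max_length=8)
print(toy["input_ids"])       # [11, 12, 13, 21, 2, 0, 0, 0]
print(toy["attention_mask"])  # [1, 1, 1, 1, 1, 0, 0, 0]
print(toy["labels"])          # [-100, -100, -100, 21, 2, -100, -100, -100]
```

Only the target token and the end-of-sequence token carry a real label, so the loss is computed on those two positions alone.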

In [None]:
#@title Hint 1 { display-mode: "form" }

"""
Remember to tokenize the prompts and labels separately using the tokenizer.
Use `tokenizer(text, max_length=max_length)` on both inputs and targets.
Make sure to get the 'input_ids' from the tokenizer output.
"""

In [None]:
#@title Hint 2 { display-mode: "form" }

"""
For each example, concatenate the prompt token IDs and label token IDs into one list.
Add the tokenizer’s `eos_token_id` at the end of the label tokens.
This combined list will be your final input_ids.
"""

In [None]:
#@title Hint 3 { display-mode: "form" }

"""
When preparing the labels for loss computation:
- Mask the prompt token positions with -100 to avoid computing loss on them.
- Keep the label token IDs intact (including the eos token).
- Pad input_ids, attention_mask, and labels to max_length with tokenizer.pad_token_id and -100 accordingly.
"""

In [None]:
# TODO: implement the tokenize_fn for instruction-style examples
max_length = 256

# First, we will identify which classes are present in the dataset as labels.
label_names = [k.replace("_", " ") for k in dataset["train"].features["Label"].names]
print("Label names:", label_names)

# Next, we will map the dataset to convert the labels from integers to their corresponding string names.
dataset = dataset.map(
    lambda x: {"text_label": [label_names[label] for label in x["Label"]]},
    batched=True,
    num_proc=1,
)

# Now we will format the examples to create a prompt for each Twitter post.
def format_example(example):
    prompt = (
        "Your task is to classify Twitter posts into categories.\n"
        f"Possible categories: {', '.join(label_names)}\n\n"
        f"Twitter post: {example['Tweet text']}\n"
        "Label:"
    )
    label_text = example["text_label"]
    return {"prompt": prompt, "label_text": label_text}

dataset = dataset.map(format_example)

def tokenize_fn(examples):
    """
    Tokenizes the input examples for the model.
    Converts the text to input IDs and attention masks.
    """
    ## --- TODO 1: YOUR CODE HERE ---
    
    
    
    
    # --- END YOUR CODE --- #
    
    input_ids_list = []
    attention_mask_list = []
    labels_list = []
    
    for prompt_ids, label_ids in zip(model_inputs["input_ids"], labels["input_ids"]):
        ## --- TODO 2: YOUR CODE HERE ---
        
        
        # --- END YOUR CODE --- #
        
        padding_length = max_length - len(input_ids)
        if padding_length > 0:
            input_ids += [tokenizer.pad_token_id] * padding_length
            attention_mask += [0] * padding_length
            labels += [-100] * padding_length
        else:
            input_ids = input_ids[:max_length]
            attention_mask = attention_mask[:max_length]
            labels = labels[:max_length]
            
        input_ids_list.append(input_ids)
        attention_mask_list.append(attention_mask)
        labels_list.append(labels)
    
    return {
        'input_ids': input_ids_list,
        'attention_mask': attention_mask_list,
        'labels': labels_list
    }

# Run mapping
tokenized_datasets = dataset.map(
    tokenize_fn,
    batched=True,
    remove_columns=dataset["train"].column_names # We need to remove original columns to avoid issues
)
tokenized_datasets.set_format("torch")
# Create a validation split from the train split using train_test_split
train_split = tokenized_datasets["train"].train_test_split(test_size=0.1, seed=42)

tokenized_datasets["train"] = train_split["train"]
tokenized_datasets["validation"] = train_split["test"]

print("Tokenized dataset example:", tokenized_datasets["train"][0])

## LoRA configuration

We'll wrap the model with `peft.get_peft_model`. Keep `r` and `lora_alpha` small on a memory-limited GPU.

**TODO**: fill in `r`, `lora_alpha`, and `lora_dropout` with reasonably small values, and define the `LoraConfig`.
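
To make the effect of `r` and `lora_alpha` concrete, the sketch below counts the parameters one LoRA adapter adds to a single linear layer and computes the standard LoRA scaling factor `lora_alpha / r`. The 1024 x 1024 layer shape is a hypothetical example, not the actual shape of any layer in this model, and the helper names are illustrative.

```python
def lora_extra_params(d_in, d_out, r):
    """Parameters added by one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * d_in + d_out * r

def lora_scaling(lora_alpha, r):
    """Factor applied to the low-rank update B @ A in standard LoRA."""
    return lora_alpha / r

# Hypothetical 1024 x 1024 projection layer; real layer shapes depend on the model.
d_in, d_out = 1024, 1024
full = d_in * d_out
for r in (4, 8, 16):
    extra = lora_extra_params(d_in, d_out, r)
    share = 100 * extra / full
    print(f"r={r:>2}: {extra:>6} adapter params ({share:.2f}% of the layer), "
          f"scaling with lora_alpha=16: {lora_scaling(16, r)}")
```

Doubling `r` doubles the adapter size, while `lora_alpha` changes only how strongly the update is scaled, so the two are usually tuned together.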

In [None]:
#@title Hint 1 { display-mode: "form" }

"""
- `r` controls the rank of the low-rank adaptation matrices; typical values are 4 or 8.
- `lora_alpha` scales the update and is often set to 16.
- Larger `r` can improve accuracy but increases trainable parameters and compute.
- Choose values that balance fine-tuning quality and resource constraints.
"""

In [None]:
#@title Hint 2 { display-mode: "form" }

"""
- Target modules like 'q_proj', 'v_proj', 'k_proj', and 'o_proj' correspond to attention projection layers, common targets for LoRA.
- `lora_dropout` adds regularization; typical values are between 0.0 and 0.1.
- Bias is often set to 'none' for causal language modeling.
- The task type should be 'CAUSAL_LM' when fine-tuning autoregressive models.
"""

In [None]:
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Prepare the model for k-bit training (this updates some layers to avoid weight updating issues)
model = prepare_model_for_kbit_training(model)

## --- TODO 3: YOUR CODE HERE ---

# --- END YOUR CODE --- #

model = get_peft_model(model, lora_config)

In [None]:
#@title Hint 1 { display-mode: "form" }

"""
- You can use the predefined function to check the number of trainable parameters.
- Look at the `model.print_trainable_parameters()` method to see how many parameters are trainable.
"""

In [None]:
## --- TODO 4: YOUR CODE HERE ---

# --- END YOUR CODE --- #

## Training

We'll use `transformers.Trainer` for simplicity. Settings are pre-defined so this can run in free Colab. You can later switch to `accelerate` + `transformers` native training for more control.

**Note:** Early stopping and checkpointing are minimal here. The goal is to demonstrate training with LoRA.
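
The `TrainingArguments` in this section combine a per-device batch size of 1 with 4 gradient accumulation steps. The sketch below shows how these two settings determine the effective batch size and, roughly, the number of optimizer updates per epoch. The helper names are illustrative, the dataset size of 100 is a made-up placeholder, and the exact rounding of the last partial batch depends on the trainer.

```python
import math

def effective_batch_size(per_device_batch, grad_accum_steps, num_devices=1):
    """Number of examples contributing to each optimizer update."""
    return per_device_batch * grad_accum_steps * num_devices

def approx_updates_per_epoch(num_examples, per_device_batch, grad_accum_steps, num_devices=1):
    """Approximate optimizer updates per epoch (exact rounding depends on the trainer)."""
    batch = effective_batch_size(per_device_batch, grad_accum_steps, num_devices)
    return math.ceil(num_examples / batch)

# Batch settings mirror this notebook; the dataset size is a made-up placeholder.
print(effective_batch_size(per_device_batch=1, grad_accum_steps=4))
print(approx_updates_per_epoch(num_examples=100, per_device_batch=1, grad_accum_steps=4))
```

Gradient accumulation trades time for memory: each update still sees four examples, but only one example is held on the GPU at a time.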

In [None]:
from transformers import TrainingArguments, Trainer

# Data collator: stacks the pre-padded examples into batch tensors
def data_collator(batch):
    import torch
    pad = torch.nn.utils.rnn.pad_sequence
    input_ids = pad([b['input_ids'] for b in batch], batch_first=True, padding_value=tokenizer.pad_token_id)
    # Reuse the stored attention mask: when pad_token == eos_token, deriving the mask from
    # input_ids would also mask the real EOS token appended to each target.
    attention_mask = pad([b['attention_mask'] for b in batch], batch_first=True, padding_value=0)
    labels = pad([b['labels'] for b in batch], batch_first=True, padding_value=-100)
    return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}

training_args = TrainingArguments(
    output_dir="./qwen2.5_lora_results",
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    learning_rate=2e-4,
    fp16=True,
    logging_steps=10,
    eval_strategy="no",  # renamed from `evaluation_strategy` in recent transformers releases
    save_strategy="no",
    report_to="none",
    remove_unused_columns=False
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets['train'],
    eval_dataset=tokenized_datasets['validation'],
    data_collator=data_collator,
    processing_class=tokenizer  # replaces the deprecated `tokenizer=` argument in recent transformers releases
)

# Small quick test: run training (this will actually fine-tune the LoRA adapters)
trainer.train()


## Inference / generation

After fine-tuning, use the model to generate labels for new tweets. Participants should implement the tokenization and generation steps.

**TODO**: experiment with generation parameters (`max_new_tokens`, `temperature`, `top_k`, `top_p`).
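
To build intuition for what `temperature`, `top_k`, and `top_p` do, here is a simplified pure-Python sketch of sampling one token from a toy logit vector. It is an illustration of the general technique, not the `transformers` implementation; `sample_next_token` and the logit values are made up for this example.

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_k=0, top_p=1.0, rng=random):
    """Sample a token index from raw logits using temperature, top-k, and top-p filtering."""
    # Temperature rescales logits: values below 1 sharpen the distribution, above 1 flatten it.
    scaled = [value / temperature for value in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank tokens from most to least likely.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # top-k: keep only the k most likely tokens (0 disables the filter).
    if top_k > 0:
        ranked = ranked[:top_k]

    # top-p: keep the smallest prefix whose renormalised cumulative probability reaches top_p.
    remaining_mass = sum(probs[i] for i in ranked)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / remaining_mass
        if cumulative >= top_p:
            break

    # Draw one token, weighted by probability (random.choices normalises the weights).
    weights = [probs[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]

toy_logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for a four-token vocabulary
print(sample_next_token(toy_logits, temperature=0.7, top_k=3, top_p=0.9))
# With top_k=1 only the most likely token survives, so the result is always index 0.
print(sample_next_token(toy_logits, top_k=1))
```

In `model.generate()`, these parameters only take effect when sampling is enabled with `do_sample=True`. For a classification-style task like this one, greedy decoding with a small `max_new_tokens` is often sufficient.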

In [None]:
#@title Hint 1 { display-mode: "form" }

"""
- Tokenize your prompt with `return_tensors="pt"` and move inputs to the model's device.
- Use `torch.no_grad()` during generation to avoid storing gradients.
- Use `model.generate()` to generate text based on the input prompt.
- Finally, decode the output tokens to text using `tokenizer.decode()` with `skip_special_tokens=True`.
"""

In [None]:
model.eval()  # Set the model to evaluation mode

# Example: generate from a new input record
def generate_from_record(record_text, max_new_tokens=60, temperature=0.7):
    prompt = (
        "Your task is to classify Twitter posts into categories.\n"
        f"Possible categories: {', '.join(label_names)}\n\n"
        f"Twitter post: {record_text}\n"
        "Label:"
    )
    ## --- TODO 5: YOUR CODE HERE ---

    return ...
    # --- END YOUR CODE --- #

# Test the generation function with a sample record
sample_record = dataset["test"][0]["Tweet text"]
print("Sample record:", sample_record)
print("Generated output:", generate_from_record(sample_record, max_new_tokens=60, temperature=0.7))

## Save and load LoRA adapters only

To avoid saving the full base model, we save only the LoRA adapter weights (much smaller).

In [None]:
# Save adapter weights (only the PEFT/LoRA weights)
adapter_save_path = "./lora_adapter"
model.save_pretrained(adapter_save_path)
print("Adapter saved to", adapter_save_path)

In [None]:
from peft import PeftModel, PeftConfig

config = PeftConfig.from_pretrained(adapter_save_path)

base = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path)
peft_model = PeftModel.from_pretrained(base, adapter_save_path)

In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()  # call the method; a bare `model.eval` does nothing

for i in range(10):
    print(f"Example {i+1}:")
    sample_record = dataset["test"][i]["Tweet text"]
    generated_text = generate_from_record(sample_record, max_new_tokens=60)
    print(generated_text)
    print("-" * 50)  # Separator for clarity
