# Fine-tuning LLMs with LoRA (Low-Rank Adaptation)

<a href="https://colab.research.google.com/github/natnew/Awesome-Prompt-Engineering/blob/main/templates/notebooks/finetuning_tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook demonstrates how to fine-tune a small language model using **PEFT (Parameter-Efficient Fine-Tuning)** and **LoRA**. LoRA freezes the base model and trains only a small set of added low-rank matrices, so fine-tuning fits on consumer hardware or Google Colab's free T4 GPU.

## Conceptual Overview
1. **Load Base Model:** We use GPT-2 for this demonstration; other small models such as TinyLlama are also options.
2. **Apply LoRA:** Inject trainable rank decomposition matrices into the frozen model.
3. **Prepare Data:** Format text for instruction tuning and tokenize it.
4. **Train:** Run the training loop, then save the adapter weights.
5. **Inference:** Generate text with the adapted model.
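
To make the "Apply LoRA" step concrete, here is a minimal pure-Python sketch of a LoRA-augmented linear layer: the frozen weight `W` plus a scaled low-rank update `(alpha / r) * B @ A`. The helper names `matvec` and `lora_forward` and the tiny dimensions are illustrative assumptions, not PEFT's implementation. `B` starts at zero, as proposed in the LoRA paper, so the adapted layer initially reproduces the base layer exactly.

```python
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def lora_forward(W, A, B, x, alpha, r):
    """Compute W x + (alpha / r) * B (A x); only A and B would be trained."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


random.seed(0)
d_in, d_out, r, alpha = 6, 4, 2, 8  # hypothetical toy sizes

W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]   # trainable, r x d_in
B = [[0.0] * r for _ in range(d_out)]                                  # trainable, d_out x r, zero init

x = [1.0] * d_in
y_base = matvec(W, x)
y_lora = lora_forward(W, A, B, x, alpha, r)

# Trainable values in A and B versus values in the frozen weight W
lora_params = r * (d_in + d_out)
full_params = d_in * d_out
print(y_lora == y_base, lora_params, full_params)
```

At these toy sizes the saving is small; it grows with the layer dimensions because the adapter scales with `r * (d_in + d_out)` rather than `d_in * d_out`.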

In [None]:
# Install dependencies (uncomment if running in Colab)
# !pip install transformers peft datasets bitsandbytes accelerate

In [None]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, TrainingArguments, Trainer, DataCollatorForLanguageModeling
from peft import LoraConfig, get_peft_model, TaskType
from datasets import Dataset

# 1. Load Model & Tokenizer
model_id = "gpt2" # Using GPT-2 for speed and broad compatibility in this demo
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token, so reuse end-of-text

model = AutoModelForCausalLM.from_pretrained(model_id)

# 2. Configure LoRA
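peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    inference_mode=False,  # Keep the adapter weights trainable
    r=8,                   # Rank of the low-rank update matrices
    lora_alpha=32,         # Scaling factor; the update is scaled by lora_alpha / r
    lora_dropout=0.1       # Dropout applied on the LoRA path during training
)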

model = get_peft_model(model, peft_config)
model.print_trainable_parameters()
# Only a small fraction of parameters is trainable (roughly 0.24% for GPT-2 with r=8).
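
The figure reported by `print_trainable_parameters()` can be cross-checked by hand. The sketch below assumes that PEFT's default LoRA target for GPT-2 is the fused attention projection `c_attn` (768 inputs, 2304 outputs) in each of the 12 transformer blocks, and that GPT-2 small has 124,439,808 base parameters; treat both as assumptions to verify against your installed versions.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Parameters in one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * d_in + d_out * r


# Assumed GPT-2 small layout (not read from the model in this sketch)
n_layers = 12
d_in, d_out = 768, 2304      # c_attn maps the hidden state to fused Q, K, V
base_params = 124_439_808    # assumed size of the frozen base model
r = 8                        # rank used in the LoRA config above

trainable = n_layers * lora_param_count(d_in, d_out, r)
total = base_params + trainable
pct = 100 * trainable / total

print(f"trainable params: {trainable:,} || all params: {total:,} || trainable%: {pct:.4f}")
```

If the printed numbers differ from what the notebook reports, the most likely cause is a different set of target modules.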

## 3. Prepare Dummy Dataset
For this demo, we'll create a tiny dataset to show the format.
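
The records in this dataset all follow one `Question: ... Answer: ...` template. A small helper keeps that template consistent between training and inference. The names `format_example` and `format_prompt` are hypothetical and not part of any library.

```python
def format_prompt(question: str) -> str:
    """Build the prompt prefix that the model should complete."""
    return f"Question: {question.strip()} Answer:"


def format_example(question: str, answer: str) -> str:
    """Build a full training record: prompt prefix followed by the answer."""
    return f"{format_prompt(question)} {answer.strip()}"


record = format_example("What is LoRA?", "Low-Rank Adaptation for efficient fine-tuning.")
prompt = format_prompt("What is LoRA?")
print(record)
print(prompt)
```

Reusing the same prefix at inference time matters because the model only ever saw answers that follow this exact wording.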

In [None]:
data = [
    {"text": "Question: What is RAG? Answer: Retrieval Augmented Generation combining search with LLMs."},
    {"text": "Question: What is LoRA? Answer: Low-Rank Adaptation for efficient fine-tuning."},
    {"text": "Question: Who is the author? Answer: The Awesome Prompt Engineering community."}
]

dataset = Dataset.from_list(data)

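def tokenize_function(examples):
    # Pad or truncate every example to 64 tokens so the batch has a uniform shape
    return tokenizer(examples["text"], padding="max_length", truncation=True, max_length=64)

tokenized_datasets = dataset.map(tokenize_function, batched=True)
# mlm=False selects causal language modeling: labels are a copy of input_ids
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)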
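
With `mlm=False`, the collator builds labels by copying `input_ids` and replacing padding positions with `-100`, the index that PyTorch's cross-entropy loss ignores by default. The following pure-Python sketch mirrors that behaviour for a single sequence. The function name `make_causal_lm_labels` and the token ids are made up for illustration.

```python
IGNORE_INDEX = -100  # label value skipped by the loss


def make_causal_lm_labels(input_ids, pad_token_id):
    """Copy input_ids to labels, masking every padding position."""
    return [IGNORE_INDEX if tok == pad_token_id else tok for tok in input_ids]


pad_id = 50256  # GPT-2's end-of-text id, reused as the pad token in this notebook
input_ids = [24361, 25, 1867, 318, pad_id, pad_id]  # hypothetical ids, padded to length 6

labels = make_causal_lm_labels(input_ids, pad_id)
print(labels)
```

Because the pad token here is the end-of-text token, a genuine end-of-text token in the data would be masked as well. The shift between inputs and targets happens inside the model, not in the collator.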

## 4. Train
We use the Hugging Face `Trainer` to handle the training loop.

In [None]:
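training_args = TrainingArguments(
    output_dir="./results",
    per_device_train_batch_size=4,
    num_train_epochs=3,
    learning_rate=2e-4,
    logging_steps=1,
    report_to="none",                      # Skip external loggers such as Weights & Biases
    use_cpu=not torch.cuda.is_available()  # Train on the GPU when one is available
)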

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets,
    data_collator=data_collator,
)

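trainer.train()

# Save only the LoRA adapter weights, not the full base model (path is an example)
model.save_pretrained("./results/lora_adapter")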
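
With `logging_steps=1`, each optimizer step produces one log entry, so it helps to know how many steps to expect. The estimate below assumes a single device, no gradient accumulation, and that the last partial batch is kept. The helper `total_optimizer_steps` is a hypothetical name used only for this sketch.

```python
import math


def total_optimizer_steps(num_examples: int, batch_size: int, epochs: int) -> int:
    """Estimate optimizer steps: batches per epoch (rounded up) times epochs."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * epochs


# Values from this notebook: 3 examples, batch size 4, 3 epochs
steps = total_optimizer_steps(num_examples=3, batch_size=4, epochs=3)
print(steps)
```

Three updates on three examples are enough to exercise the pipeline but not to change the model's behaviour in a meaningful way.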

## 5. Inference
After training, we can use the model to generate text.

In [None]:
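model.eval()  # Switch off LoRA dropout for generation
device = next(model.parameters()).device  # Inputs must be on the same device as the model
inputs = tokenizer("Question: What is LoRA? Answer:", return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=20, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))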