# Building and Training Your Own LoRA Adapter

This notebook walks through creating, training, and using a LoRA (Low-Rank Adaptation) adapter to fine-tune a language model efficiently.

## What You'll Learn
- How to set up LoRA configuration
- How to prepare datasets for training
- How to train a LoRA adapter
- How to save and load your trained adapter
- How to use your adapter for inference

## Prerequisites
Make sure you have the required packages installed:
```bash
pip install transformers peft datasets torch accelerate bitsandbytes
```

## Step 1: Import Required Libraries

In [1]:
import json
import os

# Opt in to faster Hub downloads. To take effect this must be set before
# transformers (and with it huggingface_hub) is imported, and it needs the
# optional hf_transfer package (pip install hf_transfer).
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"

import torch
import transformers
from datasets import Dataset, load_dataset
from peft import LoraConfig, PeftModel, TaskType, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Check if CUDA is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")
print(f"PyTorch version: {torch.__version__}")
print(f"Transformers version: {transformers.__version__}")

Using device: cpu
PyTorch version: 2.7.1
Transformers version: 4.56.0.dev0


## Step 2: Choose and Load Base Model

We'll use a small model for this tutorial to ensure it runs on most hardware. You can replace this with any compatible model.

In [None]:
# Model configuration
#model_name = "microsoft/DialoGPT-small"  # Small model for tutorial
model_name = "./DialoGPT-small" # using local
# Alternative options:
# model_name = "HuggingFaceTB/SmolLM2-135M"  # From the smol course
# model_name = "gpt2"  # Classic choice

print(f"Loading model: {model_name}")

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Add padding token if it doesn't exist
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

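# Load model with 8-bit quantization to save memory.
# bitsandbytes 8-bit loading generally expects a CUDA GPU, so fall back to
# a regular full-precision load when none is available.
from transformers import BitsAndBytesConfig

load_kwargs = {}
if torch.cuda.is_available():
    load_kwargs = dict(
        quantization_config=BitsAndBytesConfig(load_in_8bit=True),
        device_map="auto",
        torch_dtype=torch.float16,
    )
model = AutoModelForCausalLM.from_pretrained(model_name, **load_kwargs)

print("Model loaded successfully!")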
print(f"Model parameters: {model.num_parameters():,}")

## Step 3: Configure LoRA Parameters

LoRA freezes the base model's weights and adds a pair of small, trainable low-rank matrices to selected layers; only those matrices are updated during training. Let's configure the LoRA parameters:

In [None]:
# LoRA configuration
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,  # Causal language modeling
    r=16,                          # Rank of adaptation (higher = more parameters)
    lora_alpha=32,                 # LoRA scaling parameter (typically 2x the rank)
    lora_dropout=0.1,              # Dropout for LoRA layers
    target_modules=["c_attn"],     # Target attention modules (varies by model)
    bias="none",                   # Whether to train bias parameters
)

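# Apply LoRA to the model.
# A quantized base model is first prepared for training, which among other
# things upcasts the remaining half-precision parameters to float32.
# Either way the inputs must require gradients, otherwise gradient
# checkpointing (enabled in Step 6) has nothing to backpropagate through.
from peft import prepare_model_for_kbit_training

if getattr(model, "is_loaded_in_8bit", False):
    model = prepare_model_for_kbit_training(model)
else:
    model.enable_input_require_grads()
model = get_peft_model(model, lora_config)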

# Print trainable parameters
model.print_trainable_parameters()

print("\nLoRA configuration applied successfully!")
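
To see where the count reported by `print_trainable_parameters()` comes from, the arithmetic can be sketched by hand. The sketch below assumes DialoGPT-small follows the GPT-2 small layout (12 transformer blocks, hidden size 768, a fused `c_attn` projection mapping 768 to 2304 features). Those dimensions and the helper name `lora_param_count` are assumptions for illustration, not values read from the checkpoint.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Parameters added by one LoRA pair: A is (r x d_in), B is (d_out x r)."""
    return r * d_in + d_out * r


# Assumed GPT-2 small dimensions (not read from the checkpoint).
hidden_size = 768
c_attn_out = 3 * hidden_size   # fused query/key/value projection
num_layers = 12

r, lora_alpha = 16, 32         # values from the LoRA config above

per_layer = lora_param_count(hidden_size, c_attn_out, r)
total = per_layer * num_layers
scaling = lora_alpha / r       # factor applied to the low-rank update

print(f"LoRA parameters per layer: {per_layer:,}")
print(f"LoRA parameters in total:  {total:,}")
print(f"Update scaling (alpha/r):  {scaling}")
```

If the number printed by PEFT differs from this estimate, the assumed dimensions or the set of targeted modules probably do not match the loaded model.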

## Step 4: Prepare Training Data

For this tutorial, we'll create a simple dataset. In practice, you'd use your own domain-specific data.

In [None]:
# Create a simple training dataset
# This is just for demonstration - use your own data in practice
training_texts = [
    "The weather today is beautiful and sunny.",
    "I love learning about machine learning and AI.",
    "LoRA is an efficient way to fine-tune large language models.",
    "Python is a great programming language for data science.",
    "Transformers have revolutionized natural language processing.",
    "Fine-tuning allows models to adapt to specific tasks.",
    "Parameter-efficient methods reduce computational costs.",
    "Small language models can be very effective for focused tasks.",
    "The Hugging Face ecosystem makes ML more accessible.",
    "Open source AI tools democratize machine learning."
]

# Alternative: Load a real dataset
# dataset = load_dataset("imdb", split="train[:1000]")  # Small subset
# training_texts = dataset["text"]

print(f"Training on {len(training_texts)} examples")
print(f"Example text: {training_texts[0]}")

## Step 5: Tokenize the Data

In [None]:
def tokenize_function(examples):
    """Tokenize the input texts"""
    return tokenizer(
        examples["text"],
        truncation=True,
        padding="max_length",
        max_length=128,  # Adjust based on your data
    )

# Create dataset
train_dataset = Dataset.from_dict({"text": training_texts})

# Tokenize
tokenized_dataset = train_dataset.map(
    tokenize_function,
    batched=True,
    remove_columns=["text"]
)

print(f"Tokenized dataset: {tokenized_dataset}")
print(f"Example tokenized: {tokenized_dataset[0]}")
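
The effect of `truncation=True` and `padding="max_length"` can be illustrated without a real tokenizer. The following is a toy sketch: `pad_or_truncate` and the token ids are hypothetical, and a real tokenizer additionally handles subword splitting and special tokens.

```python
def pad_or_truncate(token_ids, max_length, pad_id):
    """Return (input_ids, attention_mask), both exactly max_length long."""
    kept = token_ids[:max_length]                 # truncation
    n_pad = max_length - len(kept)                # padding needed on the right
    input_ids = kept + [pad_id] * n_pad
    attention_mask = [1] * len(kept) + [0] * n_pad
    return input_ids, attention_mask


short_ids, short_mask = pad_or_truncate([11, 12, 13], max_length=5, pad_id=0)
long_ids, long_mask = pad_or_truncate([21, 22, 23, 24, 25, 26, 27], max_length=5, pad_id=0)

print(short_ids, short_mask)
print(long_ids, long_mask)
```

With ten short sentences and `max_length=128`, most positions in each row are padding. That is harmless for a tutorial, but on larger datasets it is usually worth picking a `max_length` closer to the actual text lengths.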

## Step 6: Set Up Training Arguments

In [None]:
# Training configuration
output_dir = "./my-lora-adapter"

training_args = TrainingArguments(
    output_dir=output_dir,
    per_device_train_batch_size=2,    # Small batch size for tutorial
    per_device_eval_batch_size=2,
    num_train_epochs=3,               # Number of training epochs
    learning_rate=3e-4,               # Learning rate (higher for LoRA)
    warmup_steps=10,                  # Warmup steps
    logging_steps=5,                  # Log every N steps
    save_steps=50,                    # Save checkpoint every N steps
    save_total_limit=2,               # Keep only 2 checkpoints
    prediction_loss_only=True,        # Only compute loss for evaluation
    remove_unused_columns=False,      # Keep all columns
    dataloader_pin_memory=False,      # Disable pin memory for compatibility
    gradient_checkpointing=True,      # Save memory
    fp16=torch.cuda.is_available(),   # Mixed precision only on a CUDA GPU
)

print("Training arguments configured!")
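
It helps to check how these settings interact with the size of the dataset. The sketch below works out the number of optimizer steps for the ten sample sentences, assuming a single device and the default `gradient_accumulation_steps` of 1 (neither is set explicitly in this notebook).

```python
import math

num_examples = 10          # size of the sample dataset in Step 4
batch_size = 2             # per_device_train_batch_size
num_epochs = 3             # num_train_epochs
grad_accum = 1             # assumed default gradient_accumulation_steps
num_devices = 1            # assumed single device

steps_per_epoch = math.ceil(num_examples / (batch_size * num_devices * grad_accum))
total_steps = steps_per_epoch * num_epochs

warmup_steps, logging_steps, save_steps = 10, 5, 50
print(f"Optimizer steps per epoch:    {steps_per_epoch}")
print(f"Total optimizer steps:        {total_steps}")
print(f"Warmup share of training:     {warmup_steps / total_steps:.0%}")
print(f"Log lines expected:           {total_steps // logging_steps}")
print(f"Saves triggered by save_steps: {total_steps // save_steps}")
```

Under these assumptions the run is only 15 steps long, so `warmup_steps=10` covers most of training and the `save_steps=50` interval is never reached. With a real dataset the same arithmetic is a quick way to choose sensible values for both.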

## Step 7: Create Data Collator and Trainer

In [None]:
# Data collator for language modeling
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False,  # We're doing causal LM, not masked LM
)

# Create trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset,
    data_collator=data_collator,
)

print("Trainer created successfully!")
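
With `mlm=False`, the collator builds `labels` by copying `input_ids` and replacing padding positions with `-100`, a value the loss ignores. A pure-Python sketch of that step follows. `make_causal_lm_labels` and the token ids are hypothetical stand-ins (the real collator works on tensors), and the pad id of 50256 assumes the GPT-2 vocabulary.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss


def make_causal_lm_labels(input_ids, pad_token_id):
    """Copy input_ids into labels, masking every pad token with IGNORE_INDEX."""
    return [IGNORE_INDEX if tok == pad_token_id else tok for tok in input_ids]


pad_id = 50256                       # assumed GPT-2 style EOS id, reused as padding
input_ids = [464, 6193, 1909, 318, pad_id, pad_id]
labels = make_causal_lm_labels(input_ids, pad_id)
print(labels)
```

Because Step 2 reuses the EOS token as the padding token, any genuine EOS token in the training text would be masked in the same way, so the model receives no loss signal for ending a sequence. For the short demonstration sentences here that does not matter, but it is worth knowing when training on real data.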

## Step 8: Train the LoRA Adapter

Now let's train our LoRA adapter!

In [None]:
print("Starting training...")
print("This may take a few minutes depending on your hardware.")

# Start training
trainer.train()

print("\nTraining completed!")

## Step 9: Save the LoRA Adapter

In [None]:
# Save the LoRA adapter
model.save_pretrained(output_dir)
tokenizer.save_pretrained(output_dir)

print(f"LoRA adapter saved to: {output_dir}")

# Check what files were saved
saved_files = os.listdir(output_dir)
print(f"Saved files: {saved_files}")

# Check adapter size
adapter_file = os.path.join(output_dir, "adapter_model.safetensors")
if os.path.exists(adapter_file):
    size_mb = os.path.getsize(adapter_file) / (1024 * 1024)
    print(f"Adapter size: {size_mb:.2f} MB")
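
As a sanity check on the reported file size, the raw size of the adapter tensors can be estimated from the parameter count. The figure of 589,824 parameters below is an assumption (rank 16 on `c_attn` across 12 GPT-2 small blocks), and `estimate_adapter_mb` is a hypothetical helper that ignores file headers and metadata.

```python
def estimate_adapter_mb(num_params: int, bytes_per_param: int) -> float:
    """Raw tensor size in MiB, ignoring file headers and metadata."""
    return num_params * bytes_per_param / (1024 * 1024)


lora_params = 589_824   # assumed: r=16 on c_attn across 12 GPT-2 small blocks

print(f"float32: {estimate_adapter_mb(lora_params, 4):.2f} MB")
print(f"float16: {estimate_adapter_mb(lora_params, 2):.2f} MB")
```

Which of the two estimates is closer to the printed size depends on the dtype in which the adapter weights were saved.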

## Step 10: Load and Test Your Trained Adapter

In [None]:
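# Load the base model again (simulating a fresh start).
# 8-bit loading is only attempted when a CUDA GPU is available.
from transformers import BitsAndBytesConfig

load_kwargs = {}
if torch.cuda.is_available():
    load_kwargs = dict(
        quantization_config=BitsAndBytesConfig(load_in_8bit=True),
        device_map="auto",
        torch_dtype=torch.float16,
    )
base_model = AutoModelForCausalLM.from_pretrained(model_name, **load_kwargs)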

# Load your trained LoRA adapter
model_with_adapter = PeftModel.from_pretrained(base_model, output_dir)

print("LoRA adapter loaded successfully!")

## Step 11: Test Your Model

In [None]:
def generate_text(model, tokenizer, prompt, max_length=50):
    """Generate text using the model"""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_length=max_length,
            num_return_sequences=1,
            temperature=0.7,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id
        )
    
    generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return generated_text

# Test prompts
test_prompts = [
    "The weather today is",
    "Machine learning is",
    "LoRA adapters are"
]

print("Testing your trained model:")
print("=" * 50)

for prompt in test_prompts:
    generated = generate_text(model_with_adapter, tokenizer, prompt)
    print(f"Prompt: {prompt}")
    print(f"Generated: {generated}")
    print("-" * 30)
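
The `temperature=0.7` argument used in `generate_text` rescales the model's scores before sampling. The sketch below shows the underlying idea on made-up logits for three tokens. `softmax_with_temperature` is a hypothetical helper written for illustration, not the code path `generate` actually runs.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)                          # subtract max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]                        # made-up scores for three tokens
for t in (1.0, 0.7):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}: {[round(p, 3) for p in probs]}")
```

A temperature below 1 sharpens the distribution toward the highest-scoring token, which makes samples less varied. Values above 1 flatten it.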

## Step 12: Compare with Base Model (Optional)

In [None]:
# Compare with base model (without adapter)
print("Comparing with base model:")
print("=" * 50)

test_prompt = "Machine learning is"

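# Base model: PeftModel.from_pretrained injected the LoRA layers into
# base_model in place, so calling base_model directly would still use the
# adapter. Temporarily disable the adapter instead.
with model_with_adapter.disable_adapter():
    base_generated = generate_text(model_with_adapter, tokenizer, test_prompt)
print(f"Base model: {base_generated}")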

# Model with adapter
adapter_generated = generate_text(model_with_adapter, tokenizer, test_prompt)
print(f"With adapter: {adapter_generated}")

## Step 13: Advanced - Multiple Adapters

In [None]:
# You can create multiple adapters for different tasks
# and switch between them easily

# Example: Create a second adapter for a different task
# (This is just for demonstration)

print("Creating a second adapter configuration...")

# Different LoRA config for a different task
lora_config_2 = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,  # Different rank
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["c_attn"],
    bias="none",
)

print("You could train this on different data for a different task!")
print("Then switch between adapters as needed.")
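
Switching adapters leaves the base weights untouched and swaps only the small low-rank pair added on top of them. The toy sketch below illustrates that idea with plain Python lists. The matrices, the adapter names, and the `forward` helper are all hypothetical; in practice the PEFT library handles this bookkeeping.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


# Frozen base weight shared by every adapter (2 x 2 for readability).
W = [[1.0, 0.0],
     [0.0, 1.0]]

# Each adapter stores only a rank-1 pair: A is (r x d_in), B is (d_out x r).
adapters = {
    "task_a": {"A": [[1.0, 1.0]], "B": [[0.5], [0.0]], "alpha": 2, "r": 1},
    "task_b": {"A": [[0.0, 1.0]], "B": [[0.0], [1.0]], "alpha": 1, "r": 1},
}


def forward(x, adapter_name=None):
    """Compute W x, plus (alpha / r) * B (A x) when an adapter is active."""
    y = matvec(W, x)
    if adapter_name is None:
        return y
    ad = adapters[adapter_name]
    delta = matvec(ad["B"], matvec(ad["A"], x))
    scale = ad["alpha"] / ad["r"]
    return [yi + scale * di for yi, di in zip(y, delta)]


x = [1.0, 2.0]
print("base:  ", forward(x))
print("task_a:", forward(x, "task_a"))
print("task_b:", forward(x, "task_b"))
```

The same input produces a different output for each adapter while `W` is never modified, which is why several adapters can share one copy of the base model.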

## Summary and Next Steps

Congratulations! You've successfully:

1. ✅ Configured a LoRA adapter
2. ✅ Prepared training data
3. ✅ Trained your own LoRA adapter
4. ✅ Saved and loaded the adapter
5. ✅ Tested the adapted model

### Key Takeaways:
- LoRA adapters are much smaller than full model weights
- Training is faster and needs less memory than full fine-tuning, because only the adapter weights receive gradients
- You can create multiple adapters for different tasks
- Adapters can be easily shared and distributed

### Next Steps:
1. **Try with your own data**: Replace the sample data with your domain-specific dataset
2. **Experiment with parameters**: Try different `r`, `lora_alpha`, and `target_modules`
3. **Use larger models**: Try with SmolLM2 or other models from the smol course, adjusting `target_modules` to match the new model's layer names
4. **Combine with other techniques**: Explore combining LoRA with other PEFT methods
5. **Deploy your adapter**: Use your adapter in production applications

### Resources:
- [PEFT Documentation](https://huggingface.co/docs/peft)
- [LoRA Paper](https://arxiv.org/abs/2106.09685)
- [Smol Course Module 3](../3_parameter_efficient_finetuning/)

Happy fine-tuning! 🚀