# Basic Fine-Tuning of Llama 3.3 with LoRA

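This notebook demonstrates how to fine-tune a Llama 3.3 model using Low-Rank Adaptation (LoRA). LoRA is a parameter-efficient fine-tuning technique: it freezes the base model's weights and trains small low-rank adapter matrices instead. This sharply reduces the number of trainable parameters and the memory needed for gradients and optimizer state, usually with little loss in quality.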
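
To make the saving concrete, the sketch below counts the parameters LoRA adds to the attention projections. The layer shapes (hidden size 4096, key/value projection width 1024, 32 layers) are assumptions typical of an 8B Llama-style architecture, not values read from a checkpoint, and `lora_params` is a hypothetical helper.

```python
# Rough count of LoRA adapter parameters.
# All shapes below are assumed for illustration, not read from a checkpoint.

def lora_params(d_in: int, d_out: int, r: int) -> int:
    """LoRA adds A (r x d_in) and B (d_out x r) next to a frozen d_out x d_in weight."""
    return r * d_in + d_out * r

# Assumed (d_in, d_out) for each adapted projection in one attention block.
ASSUMED_SHAPES = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
}
ASSUMED_NUM_LAYERS = 32
RANK = 16

per_layer = sum(lora_params(d_in, d_out, RANK) for d_in, d_out in ASSUMED_SHAPES.values())
total_trainable = per_layer * ASSUMED_NUM_LAYERS
frozen_per_layer = sum(d_in * d_out for d_in, d_out in ASSUMED_SHAPES.values())

print(f"Adapter parameters per layer: {per_layer:,}")
print(f"Adapter parameters in total:  {total_trainable:,}")
print(f"Adapter share of the adapted weights: {per_layer / frozen_per_layer:.2%}")
```

Under these assumptions the adapters hold roughly 13.6 million parameters, a small fraction of an 8-billion-parameter model. The figure can be compared with the output of `model.print_trainable_parameters()` later in the notebook.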

## Prerequisites

Before running this notebook, ensure you have:
1. Installed all required dependencies (see `requirements.txt`)
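2. Access to a GPU with sufficient VRAM: an 8B model in bfloat16 takes about 16 GB for the weights alone, so 24 GB or more is a safer target for unquantized LoRA training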
3. Prepared your training dataset in the appropriate format
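
As a back-of-the-envelope check on the VRAM requirement, the weights alone occupy (number of parameters) × (bytes per parameter). The sketch below is only an estimate: it ignores activations, adapter gradients, optimizer state, and framework overhead, so training needs headroom beyond these numbers. `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Memory needed just to hold the weights, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

# Storage size of one parameter for a few common dtypes.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "int8": 1}

for dtype, nbytes in BYTES_PER_PARAM.items():
    print(f"8B parameters in {dtype}: {weight_memory_gb(8e9, nbytes):.0f} GB")
```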

Let's get started!

## 1. Setup and Environment Check

In [None]:
# Import required libraries (only the names this notebook actually uses)
import os
import sys
import torch
import yaml
from peft import LoraConfig, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    TrainingArguments,
    Trainer,
    DataCollatorForLanguageModeling,
)
from datasets import load_dataset

# Add the repository root to the path so we can import modules
sys.path.append(os.path.abspath(os.path.join(os.getcwd(), '..')))

# Check GPU availability
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU device: {torch.cuda.get_device_name(0)}")
    print(f"GPU memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")
else:
    print("WARNING: No GPU detected. Fine-tuning will be extremely slow on CPU.")

## 2. Configuration

Let's define our fine-tuning configuration. You can adjust these parameters based on your specific requirements and hardware constraints.
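
Several of these settings interact. The sketch below derives the effective batch size, the number of optimizer steps, and the number of warmup steps from the batch size, gradient accumulation, epoch count, and warmup ratio. The dataset size of 10,000 examples is hypothetical, `training_schedule` is a hypothetical helper, and the `Trainer`'s own rounding may differ by a step or so.

```python
import math

def training_schedule(num_examples, batch_size, grad_accum, epochs, warmup_ratio, num_devices=1):
    """Approximate effective batch size, total optimizer steps, and warmup steps."""
    # One optimizer step consumes batch_size * grad_accum examples per device.
    effective_batch = batch_size * grad_accum * num_devices
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    total_steps = steps_per_epoch * epochs
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return effective_batch, total_steps, warmup_steps

# Values from the configuration cell, with an assumed dataset of 10,000 examples.
effective_batch, total_steps, warmup_steps = training_schedule(
    num_examples=10_000, batch_size=4, grad_accum=4, epochs=3, warmup_ratio=0.05
)
print(f"Effective batch size: {effective_batch}")
print(f"Optimizer steps:      {total_steps}")
print(f"Warmup steps:         {warmup_steps}")
```

If memory is tight, halving `batch_size` while doubling `gradient_accumulation_steps` keeps the effective batch size unchanged.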

In [None]:
# Define configuration
config = {
    "model": {
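        # Verify this model ID on the Hugging Face Hub before running. Llama 3.3 was
        # released as a 70B instruct model; the 8B checkpoints belong to the Llama 3.1
        # family (e.g. "meta-llama/Llama-3.1-8B").
        "base_model": "meta-llama/Llama-3.3-8B",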
        "output_dir": "../data/models/llama-3-lora-basic/",
    },
    "training": {
        "learning_rate": 2e-5,
        "batch_size": 4,
        "gradient_accumulation_steps": 4,
        "num_train_epochs": 3,
        "warmup_ratio": 0.05,
        "weight_decay": 0.01,
        "lr_scheduler_type": "cosine",
        "bf16": True,  # Use bfloat16 for mixed precision training
        "gradient_checkpointing": True,  # Enable gradient checkpointing to save memory
    },
    "data": {
        "train_file": "../data/processed/dataset/train.jsonl",
        "validation_file": "../data/processed/dataset/val.jsonl",
        "max_seq_length": 2048,
    },
    "lora": {
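        "r": 16,  # LoRA rank: inner dimension of the low-rank update matrices
        "alpha": 32,  # LoRA scaling factor: the update is scaled by alpha / r
        "dropout": 0.05,  # Dropout applied to the LoRA branch during training
        "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],  # Attention projection layers to adapt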
    }
}

# Create output directory if it doesn't exist
os.makedirs(config["model"]["output_dir"], exist_ok=True)

# Save configuration to output directory for reproducibility
with open(os.path.join(config["model"]["output_dir"], "config.yaml"), "w") as f:
    yaml.dump(config, f)

## 3. Data Preparation

Load and prepare the dataset for fine-tuning. The data is expected in JSONL format: one JSON object per line, with `instruction`, `input`, and `output` fields.
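
For reference, the sketch below writes two made-up records in this layout to a temporary file and reads them back, checking that every line carries the three fields the formatting step reads later. The record contents and the temporary path are illustrative only.

```python
import json
import os
import tempfile

# Hypothetical records; the field names match what this notebook reads later.
records = [
    {"instruction": "Translate to French.", "input": "Good morning", "output": "Bonjour"},
    {"instruction": "Name a prime number.", "input": "", "output": "7"},
]

# JSONL: one JSON object per line.
path = os.path.join(tempfile.mkdtemp(), "train.jsonl")
with open(path, "w", encoding="utf-8") as f:
    for record in records:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")

# Read the file back and flag any line that lacks an expected field.
with open(path, encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f if line.strip()]

required = {"instruction", "input", "output"}
missing = [i for i, rec in enumerate(loaded) if not required.issubset(rec)]
print(f"Loaded {len(loaded)} records, {len(missing)} with missing fields")
```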

In [None]:
# Check if data files exist
train_file = config["data"]["train_file"]
validation_file = config["data"]["validation_file"]

if not os.path.exists(train_file):
    raise FileNotFoundError(f"Training file not found: {train_file}")
if not os.path.exists(validation_file):
    raise FileNotFoundError(f"Validation file not found: {validation_file}")

print(f"Loading dataset from {train_file} and {validation_file}")

# Load datasets
dataset = load_dataset('json', data_files={
    'train': train_file,
    'validation': validation_file
})

# Display dataset information
print(f"Dataset format: {dataset}")
print(f"Training examples: {len(dataset['train'])}")
print(f"Validation examples: {len(dataset['validation'])}")
print("\nSample data:")
print(dataset["train"][0])

### Format the Data

For instruction fine-tuning, each example is rendered into a single text string using a fixed prompt template, so the model sees the same structure during training and at inference time.
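
Many instruction datasets leave `input` empty for some examples. One common variation, sketched below, omits the Input section in that case instead of emitting an empty block. `build_prompt` is a hypothetical helper and is not what the cell below uses; `"<EOS>"` is a placeholder for the tokenizer's real end-of-sequence token.

```python
def build_prompt(instruction: str, input_text: str = "", output: str = "", eos_token: str = "") -> str:
    """Build an instruction prompt, skipping the Input section when it is empty."""
    parts = [f"### Instruction:\n{instruction}"]
    if input_text.strip():
        parts.append(f"### Input:\n{input_text}")
    parts.append(f"### Response:\n{output}")
    # Sections are separated by a blank line; an optional EOS marker closes the example.
    return "\n\n".join(parts) + eos_token

print(build_prompt("Name a prime number.", output="7", eos_token="<EOS>"))
print("---")
print(build_prompt("Translate to French.", "Good morning", "Bonjour", eos_token="<EOS>"))
```

Whichever template is chosen, the same function should build both the training text and the inference prompt so the two never drift apart.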

In [None]:
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    config["model"]["base_model"],
    trust_remote_code=True,
)

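# Set padding token if it doesn't exist.
# Note: the language-modeling collator used later masks pad-token positions in the labels,
# so when pad == eos the EOS token is also excluded from the loss. Consider a dedicated
# pad token if the model should learn to emit EOS.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token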

# Define prompt template for instruction tuning
PROMPT_TEMPLATE = """### Instruction:
{instruction}

### Input:
{input}

### Response:
{output}
"""

# Function to format examples
def format_instruction(example):
    instruction = example.get("instruction", "")
    input_text = example.get("input", "")
    output = example.get("output", "")

    # Format according to the template
    text = PROMPT_TEMPLATE.format(
        instruction=instruction,
        input=input_text,
        output=output
    )

    # Append EOS so each training example has an explicit end-of-sequence marker
    text = text.rstrip("\n") + tokenizer.eos_token

    return {"text": text}

# Apply formatting to dataset
formatted_dataset = dataset.map(format_instruction, remove_columns=dataset["train"].column_names)
print("\nFormatted sample:")
print(formatted_dataset["train"][0]["text"])

### Tokenize the Dataset

Now we'll tokenize our formatted dataset.
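
The tokenization cell leaves sequences unpadded (`padding=False`) because padding is applied later, per batch, by the data collator. The pure-Python sketch below approximates what a causal-LM collator does: pad every sequence to the longest one in the batch and copy the inputs into the labels, with pad positions replaced by `-100` so the loss ignores them. `collate_causal_lm` is a hypothetical stand-in, right padding is assumed, and the token IDs are made up.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default

def collate_causal_lm(batch, pad_token_id):
    """Pad token-id lists to a common length and build labels for causal LM training."""
    max_len = max(len(ids) for ids in batch)
    input_ids, attention_mask, labels = [], [], []
    for ids in batch:
        pad = max_len - len(ids)
        input_ids.append(ids + [pad_token_id] * pad)
        attention_mask.append([1] * len(ids) + [0] * pad)
        # Labels copy the inputs; every position holding the pad id is ignored,
        # including real tokens that happen to share that id.
        labels.append([IGNORE_INDEX if t == pad_token_id else t for t in ids] + [IGNORE_INDEX] * pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}

# Token id 2 plays the role of both EOS and pad in this toy example.
batch = collate_causal_lm([[5, 6, 7, 2], [8, 9]], pad_token_id=2)
for key, value in batch.items():
    print(key, value)
```

In the first sequence the trailing `2` is a genuine end-of-sequence token, yet its label is masked because it shares an ID with the pad token. This is the side effect of reusing EOS as the pad token.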

In [None]:
# Function to tokenize examples
def tokenize_function(examples):
    result = tokenizer(
        examples["text"],
        padding=False,
        truncation=True,
        max_length=config["data"]["max_seq_length"]
    )
    return result

# Tokenize dataset
print("Tokenizing dataset...")
tokenized_dataset = formatted_dataset.map(
    tokenize_function,
    batched=True,
    num_proc=4,
    remove_columns=["text"]
)

# Display tokenized dataset info
print(f"Tokenized dataset: {tokenized_dataset}")
print(f"Features: {tokenized_dataset['train'].features}")

## 4. Model Preparation

Load the base model and prepare it for LoRA fine-tuning.
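
As a reminder of what the adapter computes, the sketch below implements a single LoRA-adapted linear layer in plain Python: `y = W x + (alpha / r) * B (A x)`, where `W` stays frozen and only `A` and `B` are trained. The tiny dimensions and the initialization (small random `A`, zero `B`) are illustrative assumptions; the helper names are hypothetical and this is not how PEFT implements it internally.

```python
import random

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def lora_forward(W, A, B, x, alpha, r):
    """y = W x + (alpha / r) * B (A x), with W frozen and only A, B trainable."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

random.seed(0)
d_in, d_out, r, alpha = 6, 4, 2, 4
W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]   # frozen base weight
A = [[random.gauss(0, 0.02) for _ in range(d_in)] for _ in range(r)]    # assumed small random init
B = [[0.0] * r for _ in range(d_out)]                                   # zero init
x = [random.gauss(0, 1) for _ in range(d_in)]

# With B at zero the adapter contributes nothing, so training starts from the base model.
print(lora_forward(W, A, B, x, alpha, r) == matvec(W, x))
```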

In [None]:
# Load base model
print(f"Loading base model: {config['model']['base_model']}")
model = AutoModelForCausalLM.from_pretrained(
    config["model"]["base_model"],
    torch_dtype=torch.bfloat16 if config["training"]["bf16"] else torch.float16,
    trust_remote_code=True,
)

# Display model size
total_params = sum(p.numel() for p in model.parameters()) / 1e9
print(f"Model loaded with {total_params:.2f} billion parameters")

# Set up LoRA configuration
peft_config = LoraConfig(
    r=config["lora"]["r"],
    lora_alpha=config["lora"]["alpha"],
    lora_dropout=config["lora"]["dropout"],
    target_modules=config["lora"]["target_modules"],
    bias="none",
    task_type="CAUSAL_LM"
)

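# Enable gradient checkpointing on the base model before wrapping it with LoRA
if config["training"]["gradient_checkpointing"]:
    model.gradient_checkpointing_enable()
    # With frozen base weights, the inputs must require grad so checkpointed layers can backpropagate
    model.enable_input_require_grads()
    # The generation cache is not used during training and conflicts with checkpointing
    model.config.use_cache = False

# Apply LoRA to the model
model = get_peft_model(model, peft_config)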

# Print trainable parameters info
model.print_trainable_parameters()

## 5. Training Setup

Configure the training arguments and data collator.
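
The `warmup_ratio` and `lr_scheduler_type="cosine"` settings together describe a learning-rate curve: a linear ramp up to the peak rate, followed by a half-cosine decay toward zero. The sketch below reproduces that shape in plain Python. `cosine_lr` is a hypothetical helper, and the step counts (1,000 total, 50 warmup) are made up for illustration.

```python
import math

def cosine_lr(step, total_steps, warmup_steps, peak_lr):
    """Linear warmup to peak_lr, then cosine decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the decay phase completed, from 0.0 to 1.0.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Assumed schedule: 1,000 optimizer steps with 5% warmup at the configured peak rate.
for step in (0, 25, 50, 525, 1000):
    print(f"step {step:>4}: lr = {cosine_lr(step, 1000, 50, 2e-5):.2e}")
```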

In [None]:
# Define training arguments
training_args = TrainingArguments(
    output_dir=config["model"]["output_dir"],
    per_device_train_batch_size=config["training"]["batch_size"],
    per_device_eval_batch_size=config["training"]["batch_size"],
    gradient_accumulation_steps=config["training"]["gradient_accumulation_steps"],
    learning_rate=config["training"]["learning_rate"],
    num_train_epochs=config["training"]["num_train_epochs"],
    warmup_ratio=config["training"]["warmup_ratio"],
    weight_decay=config["training"]["weight_decay"],
    lr_scheduler_type=config["training"]["lr_scheduler_type"],
    bf16=config["training"]["bf16"],
    fp16=not config["training"]["bf16"],
    evaluation_strategy="steps",  # Newer transformers releases rename this to eval_strategy
    eval_steps=100,
    save_strategy="steps",
    save_steps=500,
    save_total_limit=3,
    load_best_model_at_end=True,
    logging_steps=50,
    report_to="tensorboard",
    remove_unused_columns=False
)

# Set up data collator for language modeling
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, 
    mlm=False  # Causal language modeling, not masked language modeling
)

## 6. Training

Now we'll set up the trainer and run the fine-tuning process.

In [None]:
# Initialize trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["validation"],
    data_collator=data_collator,
    tokenizer=tokenizer,  # Newer transformers releases deprecate this in favor of processing_class
)

# Run training
print("Starting training...")
trainer.train()

## 7. Save Model

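After training, save the model and verify the outputs. Because the model is wrapped with PEFT, saving writes only the LoRA adapter weights and configuration, not a full copy of the base model.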
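
A quick way to verify the save is to look for the adapter files in the output directory. The sketch below assumes the file names PEFT typically writes (`adapter_config.json` plus `adapter_model.safetensors`, or `adapter_model.bin` in older versions). `has_lora_adapter` is a hypothetical helper, and the demo runs against a temporary directory with empty placeholder files.

```python
import os
import tempfile

# Assumed file names; check your PEFT version if these do not match.
ADAPTER_CONFIG = "adapter_config.json"
ADAPTER_WEIGHTS = ("adapter_model.safetensors", "adapter_model.bin")

def has_lora_adapter(output_dir: str) -> bool:
    """Return True if the directory holds an adapter config and adapter weights."""
    files = set(os.listdir(output_dir))
    return ADAPTER_CONFIG in files and any(name in files for name in ADAPTER_WEIGHTS)

# Demo: create empty placeholder files in a temporary directory.
demo_dir = tempfile.mkdtemp()
for name in ("adapter_config.json", "adapter_model.safetensors", "tokenizer.json"):
    with open(os.path.join(demo_dir, name), "w") as f:
        pass

print(has_lora_adapter(demo_dir))
```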

In [None]:
# Save the fine-tuned model
print(f"Saving model to {config['model']['output_dir']}")
trainer.save_model()

# Save the tokenizer
tokenizer.save_pretrained(config["model"]["output_dir"])

# List files in output directory
print("\nFiles in output directory:")
output_dir = os.path.normpath(config["model"]["output_dir"])
for root, dirs, files in os.walk(output_dir):
    # Depth relative to the output directory (0 for the directory itself)
    rel_path = os.path.relpath(root, output_dir)
    level = 0 if rel_path == "." else rel_path.count(os.sep) + 1
    indent = ' ' * 4 * level
    print(f"{indent}{os.path.basename(root)}/")
    sub_indent = ' ' * 4 * (level + 1)
    for f in files:
        print(f"{sub_indent}{f}")

## 8. Test Model

Let's test our fine-tuned model with a few examples.
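
The generation call below samples with `temperature` and `top_p`. As a rough illustration of what those two knobs do, the sketch applies temperature scaling to a toy logit vector and then keeps the smallest set of tokens whose cumulative probability reaches `top_p` (nucleus sampling). `top_p_filter` is a hypothetical helper that approximates the idea and is not the library's implementation; the logits are made up.

```python
import math

def top_p_filter(logits, temperature=0.7, top_p=0.9):
    """Return {token_index: probability} for the smallest token set whose mass reaches top_p."""
    # Temperature below 1 sharpens the distribution; above 1 flattens it.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]  # subtract the max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Walk tokens from most to least likely until the cumulative mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalize over the kept tokens; sampling then draws from this distribution.
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}

toy_logits = [2.0, 1.0, 0.0, -3.0]
print(top_p_filter(toy_logits, temperature=1.0, top_p=0.9))
print(top_p_filter(toy_logits, temperature=0.1, top_p=0.9))
```

Lowering the temperature concentrates probability on the top token, so fewer candidates survive the `top_p` cut and the output becomes more deterministic.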

In [None]:
# Load fine-tuned model
from peft import PeftModel, PeftConfig

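# Load the saved LoRA model
print("Loading fine-tuned model for testing...")
peft_config = PeftConfig.from_pretrained(config["model"]["output_dir"])
base_model = AutoModelForCausalLM.from_pretrained(
    peft_config.base_model_name_or_path,
    torch_dtype=torch.bfloat16 if config["training"]["bf16"] else torch.float16,
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(base_model, config["model"]["output_dir"])

# Move the model to the GPU if one is available; from_pretrained loads onto the CPU by default.
# If the training model is still in memory, free it first (or restart the kernel) to avoid running out of VRAM.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()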

# Function for generating text
def generate_text(instruction, input_text=""):
    # Format the prompt so it ends right after the "### Response:" line, as in the training text
    prompt = PROMPT_TEMPLATE.format(
        instruction=instruction,
        input=input_text,
        output=""
    ).rstrip("\n") + "\n"
    
    # Tokenize the prompt
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    
    # Generate text
    with torch.no_grad():
        outputs = model.generate(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            max_new_tokens=512,
            temperature=0.7,
            top_p=0.9,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id
        )
    
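    # Decode only the newly generated tokens (everything after the prompt).
    # Slicing by token count is more reliable than slicing the decoded string,
    # since decoding does not always reproduce the prompt text exactly.
    prompt_length = inputs["input_ids"].shape[1]
    response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)

    return response.strip()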

# Test with a few examples
test_examples = [
    {"instruction": "Explain the concept of fine-tuning in machine learning.", "input": ""},
    {"instruction": "Summarize the following text", "input": "Llama 3 is Meta's latest text generation AI model family. It's available in two sizes and delivers improvements in multiple dimensions."},
    # Add more examples specific to your fine-tuning domain
]

for i, example in enumerate(test_examples):
    print(f"\nExample {i+1}:")
    print(f"Instruction: {example['instruction']}")
    if example['input']:
        print(f"Input: {example['input']}")
    print("\nGenerated Response:")
    response = generate_text(example['instruction'], example['input'])
    print(response)

## 9. Conclusion

Congratulations! You've successfully fine-tuned a Llama 3.3 model using LoRA. Here's a summary of what we've accomplished:

1. Set up the environment and configured the training parameters
2. Prepared and processed the training dataset
3. Set up the model with LoRA for parameter-efficient fine-tuning
4. Trained the model on our custom dataset
5. Saved and tested the fine-tuned model

### Next Steps

Now that you have a basic understanding of how to fine-tune Llama 3.3 models, you can:

1. Experiment with different hyperparameters to improve performance
2. Try different LoRA configurations (changing rank, target modules, etc.)
3. Expand your training dataset for better results
4. Explore more advanced techniques in our other notebooks

For more advanced fine-tuning approaches, check out the [memory-efficient fine-tuning notebook](./02_memory_efficient_fine_tuning.ipynb) or explore the [evaluation and testing notebook](./03_evaluation_and_testing.ipynb) to assess your model's performance.