# LLM Fine-Tuning: A Comprehensive Guide

## Introduction
Fine-tuning is the process of training a pre-trained Large Language Model (LLM) on a specific dataset to adapt it for a particular task, style, or domain.

### Fine-Tuning vs. RAG
- **RAG (Retrieval-Augmented Generation)**: Provides *context* to the model at inference time. The model's weights do not change. Good for up-to-date knowledge.
- **Fine-Tuning**: Modifies the model's *internal weights*. Good for changing the *behavior*, *style*, or teaching a strict format/syntax.
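
To make the contrast concrete, here is a minimal sketch of the RAG side: the model is untouched and only the prompt changes. The word-overlap `retrieve` function and the `build_rag_prompt` helper are hypothetical stand-ins for a real retriever (usually an embedding search), not part of any library.

```python
def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Toy retriever: rank documents by how many query words they share."""
    query_words = set(query.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_rag_prompt(query: str, documents: list[str]) -> str:
    """Prepend retrieved context to the question; no weights are modified."""
    context = "\n".join(retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


docs = [
    "The office closes at 6 pm on Fridays.",
    "Paris is the capital of France.",
]
prompt = build_rag_prompt("What is the capital of France?", docs)
print(prompt)
```

Updating the knowledge here means editing `docs`; with fine-tuning, the equivalent change would require another training run.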

## Fine-Tuning Methods
We will cover the main approaches, focusing on modern efficient methods.

1. **Full Fine-Tuning**: Updates every parameter of the model. Computationally expensive: gradients and optimizer state must be stored for all weights, so multi-billion-parameter models typically need several high-memory GPUs.
2. **PEFT (Parameter-Efficient Fine-Tuning)**: Freezes the base model and trains only a small number of added parameters.
   - **LoRA (Low-Rank Adaptation)**: A widely used method. Injects small trainable rank-decomposition matrices alongside selected weight matrices (typically the attention projections).
   - **QLoRA (Quantized LoRA)**: Stores the frozen base model in 4-bit precision to drastically reduce memory usage, which makes fine-tuning 7B models feasible on free Colab or consumer GPUs.
   - **Prefix Tuning / Prompt Tuning**: Learns continuous vectors (virtual tokens) that are prepended to the input (prompt tuning) or to the attention inputs of every layer (prefix tuning).
3. **Alignment Tuning**:
   - **RLHF (Reinforcement Learning from Human Feedback)**: Trains a reward model on human preference data, then optimizes the LLM against that reward with reinforcement learning (commonly PPO).
   - **DPO (Direct Preference Optimization)**: A simpler, often more stable alternative to RLHF that optimizes essentially the same objective directly on preference pairs, without a separate reward model.
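
The LoRA entry in the list above can be illustrated with a small pure-Python sketch. It assumes the usual formulation, output = `W x + (alpha / r) * B (A x)`, with `W` frozen and `B` initialized to zero. The helper names and the tiny matrices are made up for illustration.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha, r):
    """Compute W x + (alpha / r) * B (A x) without ever forming B A."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


# Frozen 3x3 base weight and a rank-1 adapter (r = 1).
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
A = [[0.5, -0.5, 1.0]]        # shape (r, d_in), trainable
B = [[0.0], [0.0], [0.0]]     # shape (d_out, r), trainable, starts at zero
x = [1.0, 2.0, 3.0]

# With B = 0 the adapter is a no-op, so training starts from the base model.
y_start = lora_forward(W, A, B, x, alpha=2, r=1)

# Once B has been updated, the output shifts by a low-rank correction.
B_trained = [[1.0], [0.0], [-1.0]]
y_trained = lora_forward(W, A, B_trained, x, alpha=2, r=1)

print(y_start, y_trained)
```

Only `A` and `B` would receive gradients; `W` never changes, which is why the adapter can be stored and shipped separately from the base model.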

---

## 1. Setup Environment
We need `transformers`, `peft`, `trl` (Transformer Reinforcement Learning, Hugging Face's library for SFT and DPO), `bitsandbytes`, `accelerate`, and `datasets`.

In [None]:
%pip install -q transformers peft trl bitsandbytes accelerate datasets

## 2. Load Dataset
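Standard instruction datasets usually have columns like `instruction`, `input`, `output` (Alpaca format) or a `messages` column holding a list of role/content turns (chat format). The toy dataset below instead uses a single pre-formatted `text` column.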

In [None]:
from datasets import Dataset

# Example: a tiny dummy instruction dataset with a single `text` column.
# For a real run, load an instruction dataset with `datasets.load_dataset` instead.
data = [
    {"text": "Human: What is the capital of France?\nAssistant: The capital of France is Paris."},
    {"text": "Human: Explain Quantum Computing.\nAssistant: It uses quantum bits (qubits) to perform complex calculations."}
]
dataset = Dataset.from_list(data)
print(dataset[0])
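
If your data is in the Alpaca layout mentioned above, it has to be flattened into a single string before it matches the `text` column used here. The sketch below is one way to do that. The `alpaca_to_text` helper is hypothetical, and the `Human:` / `Assistant:` template is simply copied from the dummy examples rather than taken from any standard.

```python
def alpaca_to_text(example: dict) -> dict:
    """Flatten an Alpaca-style record into a single `text` field."""
    prompt = example["instruction"]
    if example.get("input"):
        # The optional `input` column carries extra context for the instruction.
        prompt = f"{prompt}\n{example['input']}"
    return {"text": f"Human: {prompt}\nAssistant: {example['output']}"}


record = {
    "instruction": "Translate to French.",
    "input": "Good morning",
    "output": "Bonjour",
}
formatted = alpaca_to_text(record)
print(formatted["text"])
```

Because the function takes and returns a dictionary, it can be applied to a whole `Dataset` with `dataset.map(alpaca_to_text)`.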

## 3. QLoRA: Efficient Fine-Tuning
We load the base model with its weights quantized to 4-bit NF4. The weights are stored in 4 bits, while computation runs in float16.

In [None]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

# 4-bit Quantization Configuration
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16
)

# Load Base Model
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto"
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Llama-style tokenizers often ship without a pad token; reuse EOS for padding
tokenizer.pad_token = tokenizer.eos_token
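
A back-of-the-envelope calculation shows why 4-bit loading matters. This sketch counts the weights only and assumes roughly 1.1 billion parameters (read off the model name). The real footprint will differ, because quantization constants, any layers left in higher precision, activations, and optimizer state are all ignored here.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


tinyllama_params = 1.1e9  # assumed parameter count, taken from the model name

fp16_gb = weight_memory_gb(tinyllama_params, 16)
nf4_gb = weight_memory_gb(tinyllama_params, 4)
print(f"fp16: {fp16_gb:.2f} GB, 4-bit: {nf4_gb:.2f} GB")
```

Under these assumptions the weights shrink from about 2.2 GB to about 0.55 GB. The same formula gives roughly 14 GB versus 3.5 GB for a 7B model.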

## 4. LoRA Configuration
Define which modules we are targeting (usually `q_proj`, `v_proj`).

In [None]:
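from peft import LoraConfig

peft_config = LoraConfig(
    r=8,                                  # Rank of the low-rank update matrices
    lora_alpha=16,                        # Scaling factor; the update is scaled by lora_alpha / r
    target_modules=["q_proj", "v_proj"],  # Attention projections that receive adapters
    lora_dropout=0.05,                    # Dropout applied to the adapter input
    bias="none",                          # Leave bias terms frozen
    task_type="CAUSAL_LM"
)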
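
To get a feel for how few parameters are trained with `r=8` on `q_proj` and `v_proj`, the count can be estimated by hand. The layer shapes below are assumptions about TinyLlama-1.1B (hidden size 2048, 22 layers, grouped-query attention with a 256-wide value projection). Check them against `model.config` before relying on the result.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """A is (r, d_in) and B is (d_out, r), so the adapter adds r * (d_in + d_out)."""
    return r * (d_in + d_out)


# Assumed TinyLlama-1.1B shapes; verify against model.config.
hidden_size = 2048
kv_size = 256      # grouped-query attention: 4 KV heads x 64 dims per head
num_layers = 22
r = 8

per_layer = lora_param_count(hidden_size, hidden_size, r)  # q_proj
per_layer += lora_param_count(hidden_size, kv_size, r)     # v_proj
total = per_layer * num_layers
print(f"{total:,} trainable LoRA parameters")
```

Under these assumptions the adapters add about 1.1 million parameters, roughly 0.1% of the base model. Once the model is wrapped by `peft`, `print_trainable_parameters()` reports the actual figure.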

## 5. Training with SFTTrainer
The `SFTTrainer` (Supervised Fine-tuning Trainer) from `trl` simplifies the training loop.

In [None]:
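from trl import SFTConfig, SFTTrainer

# SFTConfig extends TrainingArguments with SFT-specific options.
# This assumes a recent trl release; older versions take `dataset_text_field`
# as an SFTTrainer argument and use plain TrainingArguments instead.
training_args = SFTConfig(
    output_dir="./results",
    dataset_text_field="text",       # Column that holds the training text
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,   # Effective batch size of 4 per device
    optim="paged_adamw_32bit",
    logging_steps=10,
    learning_rate=2e-4,
    max_steps=50,                    # Short run for demo
    fp16=True,
)

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,         # SFTTrainer wraps the model with the LoRA adapters
    args=training_args,
)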

# trainer.train() # Uncomment to run training
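
It helps to work out how much data the settings above actually consume. The sketch below assumes a single GPU and that `max_steps` counts optimizer updates, each covering `per_device_train_batch_size * gradient_accumulation_steps` sequences. The `sequences_seen` helper is hypothetical.

```python
def sequences_seen(
    per_device_batch: int,
    grad_accum: int,
    max_steps: int,
    num_devices: int = 1,
) -> int:
    """Total training sequences processed over the whole run."""
    effective_batch = per_device_batch * grad_accum * num_devices
    return effective_batch * max_steps


total_sequences = sequences_seen(per_device_batch=1, grad_accum=4, max_steps=50)
dataset_size = 2  # the dummy dataset defined earlier has two examples
passes_over_data = total_sequences / dataset_size
print(total_sequences, passes_over_data)
```

With the two-example dummy dataset, 50 steps amount to about 100 passes over the same data, so the model will simply memorize it. For a real dataset, pick `max_steps` (or `num_train_epochs`) with this arithmetic in mind.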

## 6. Saving and Merging
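After training, you have only the small "adapter weights". You can load them on top of the base model with `PeftModel`, or merge them back into the base weights to get a standalone model that no longer needs `peft` at inference time.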

```python
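from peft import PeftModel
from transformers import AutoModelForCausalLM

# Save only the LoRA adapter weights
trainer.model.save_pretrained("final_adapter")

# Reload base and adapter, then merge (`...` stands for your loading options)
base_model = AutoModelForCausalLM.from_pretrained(model_id, ...)
model = PeftModel.from_pretrained(base_model, "final_adapter")
model = model.merge_and_unload()  # Folds the adapter into the base weights
model.save_pretrained("merged_model")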
```
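
Conceptually, merging replaces each adapted weight `W` with `W + (alpha / r) * B A`. The sketch below illustrates that arithmetic on toy matrices. It is not what `merge_and_unload` literally executes, and the helper names are hypothetical.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def merge_lora(W, A, B, alpha, r):
    """Return W + (alpha / r) * (B @ A), a plain dense weight matrix."""
    scale = alpha / r
    delta = matmul(B, A)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(W, delta)
    ]


W = [[1.0, 2.0], [3.0, 4.0]]   # frozen base weight
A = [[1.0, -1.0]]              # shape (r, d_in) with r = 1
B = [[0.5], [2.0]]             # shape (d_out, r)

W_merged = merge_lora(W, A, B, alpha=2, r=1)
print(W_merged)
```

After the merge the layer is an ordinary dense matrix of the original shape, so inference costs the same as for the base model.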

## 7. Other Methods Briefly Explained

### Instruction Tuning
This is fine-tuning on a dataset formatted as Q&A or Instructions (like the one above). It turns a raw text-completion model into a helpful assistant.

### DPO (Direct Preference Optimization)
Instead of just mimicking text (SFT), DPO trains on a dataset of `(prompt, chosen_response, rejected_response)` triples. It raises the probability of the chosen response relative to the rejected one, measured against a frozen reference model so that the policy does not drift too far from its starting point. This is key for making models "safer" or aligned with human preferences.
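
The per-example objective can be sketched in a few lines. This assumes the standard DPO loss, `-log sigmoid(beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)))`, where each term is the summed log-probability of a response. The value `beta=0.1` and the numbers below are illustrative assumptions, not recommendations.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """DPO loss for one (prompt, chosen, rejected) triple."""
    # Implicit rewards: how much more likely the policy makes each response
    # compared with the frozen reference model.
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_reward - rejected_reward)
    # -log(sigmoid(margin)) rewritten as log(1 + exp(-margin))
    return math.log1p(math.exp(-margin))


# Policy identical to the reference: no preference learned yet.
loss_start = dpo_loss(-12.0, -15.0, -12.0, -15.0)

# Policy now favors the chosen response more than the reference does.
loss_better = dpo_loss(-10.0, -18.0, -12.0, -15.0)

print(loss_start, loss_better)
```

The loss starts at `log(2)` when the policy matches the reference and falls as the policy widens the gap between chosen and rejected responses.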

```python
from trl import DPOTrainer
# Requires a dataset with columns: prompt, chosen, rejected
# trainer = DPOTrainer(model, ref_model, args=..., train_dataset=dpo_dataset)
```