# Lab 1: QLoRA Fine-Tuning with Mistral-7B

In this hands-on lab, you'll fine-tune the Mistral-7B model using QLoRA on a custom instruction dataset.

## Learning Objectives
- Load a model with 4-bit quantization
- Configure LoRA adapters
- Prepare training data in the correct format
- Train using SFTTrainer
- Save and test the fine-tuned model

## Requirements
- GPU with 16GB+ VRAM (or use Google Colab with T4/A100)
- Python 3.10+
- PyTorch 2.0+

## Step 1: Install Dependencies

In [None]:
# Install required packages
!pip install -q transformers datasets accelerate peft trl bitsandbytes

## Step 2: Import Libraries

In [None]:
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from trl import SFTTrainer, SFTConfig
from datasets import Dataset

# Check GPU availability
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")

## Step 3: Configure Quantization

We'll use 4-bit quantization with the 4-bit NormalFloat (NF4) data type, which cuts the memory needed for the base model's weights to roughly a quarter of their 16-bit footprint.
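
For a rough sense of the savings, the sketch below estimates weight memory at different precisions. It assumes about 7.24 billion parameters for Mistral-7B and the per-parameter overhead for quantization constants reported in the QLoRA paper (about 0.5 bits without double quantization, about 0.127 bits with it). It ignores layers that stay in higher precision, activations, and optimizer state, so treat the numbers as estimates rather than measurements.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory needed to store the weights, in gigabytes (1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

# Assumption: approximate parameter count for Mistral-7B.
NUM_PARAMS = 7.24e9

estimates = {
    "fp16/bf16": weight_memory_gb(NUM_PARAMS, 16),
    # 4 bits per weight plus the quantization constants stored per block.
    "nf4": weight_memory_gb(NUM_PARAMS, 4 + 0.5),
    # Double quantization compresses those constants as well.
    "nf4 + double quant": weight_memory_gb(NUM_PARAMS, 4 + 0.127),
}

for name, gb in estimates.items():
    print(f"{name:>20}: {gb:5.2f} GB")
```

The gap between the first and last rows is what makes a 7B model trainable on a single 16GB GPU.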

In [None]:
# BitsAndBytes configuration for 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # Load model in 4-bit
    bnb_4bit_quant_type="nf4",             # Use NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16, # Compute in bfloat16
    bnb_4bit_use_double_quant=True,        # Nested quantization for extra savings
)

print("Quantization config created!")

## Step 4: Load the Base Model

We'll load Mistral-7B with our quantization configuration.

In [None]:
# Model name - you can change this to other models
model_name = "mistralai/Mistral-7B-v0.1"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

print(f"Tokenizer loaded: vocab size = {tokenizer.vocab_size}")

In [None]:
# Load model with quantization
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)

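# Prepare model for k-bit training: freeze the base weights, upcast the remaining
# 16-bit parameters (such as layer norms) to fp32, and enable gradient checkpointing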
model = prepare_model_for_kbit_training(model)

print(f"Model loaded! Total parameters: {model.num_parameters():,}")

## Step 5: Configure LoRA

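LoRA freezes the base weights and adds small trainable low-rank matrices alongside selected linear layers. Here we target both the attention projections (`q_proj`, `k_proj`, `v_proj`, `o_proj`) and the MLP projections (`gate_proj`, `up_proj`, `down_proj`).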
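
For a frozen weight `W` with shape (out, in), LoRA learns two matrices, `B` with shape (out, r) and `A` with shape (r, in), and computes `W x + (lora_alpha / r) * B A x`. Each adapted layer therefore adds `r * (in + out)` trainable parameters. The sketch below counts them for two choices of target modules. The layer shapes are assumptions taken from the published Mistral-7B-v0.1 configuration, and `lora_param_count` is a helper written for this illustration, not a PEFT function.

```python
# Assumption: shapes from the Mistral-7B-v0.1 config
# (hidden size 4096, MLP size 14336, 8 key/value heads of dim 128, 32 layers).
HIDDEN = 4096
INTERMEDIATE = 14336
KV_DIM = 8 * 128
NUM_LAYERS = 32

# (in_features, out_features) for each adaptable linear layer in one decoder block.
MODULE_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

def lora_param_count(rank: int, target_modules: list[str]) -> int:
    """Count trainable LoRA parameters: A is (rank x in), B is (out x rank)."""
    per_layer = sum(
        rank * (in_features + out_features)
        for name, (in_features, out_features) in MODULE_SHAPES.items()
        if name in target_modules
    )
    return per_layer * NUM_LAYERS

all_modules = list(MODULE_SHAPES)
attention_only = ["q_proj", "k_proj", "v_proj", "o_proj"]

print(f"r=16, all linear layers: {lora_param_count(16, all_modules):,}")
print(f"r=16, attention only:    {lora_param_count(16, attention_only):,}")
```

Comparing these counts with what `model.print_trainable_parameters()` reports is a quick sanity check that the adapters landed on the modules you intended.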

In [None]:
# LoRA configuration
lora_config = LoraConfig(
    r=16,                                       # Rank of the update matrices
    lora_alpha=32,                              # Scaling factor
    target_modules=[                            # Which modules to adapt
        "q_proj",
        "k_proj", 
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
    ],
    lora_dropout=0.05,                          # Dropout for regularization
    bias="none",                                # Don't train biases
    task_type="CAUSAL_LM",                      # Task type
)

# Apply LoRA to model
model = get_peft_model(model, lora_config)

# Print trainable parameters
model.print_trainable_parameters()

## Step 6: Prepare Training Data

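We'll create a small sample dataset for demonstration. Five examples are enough to check that the pipeline runs end to end, but far too few to change the model's behavior in a meaningful way; in practice, you'd load your own, much larger dataset.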
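
If your own data lives in a JSON Lines file, one way to bring it into the same `instruction`/`response` shape is sketched below. The file name `my_dataset.jsonl`, the helper `load_instruction_records`, and the demo rows are all hypothetical; the demo writes a temporary file only so the example runs on its own.

```python
import json
import tempfile
from pathlib import Path

REQUIRED_KEYS = ("instruction", "response")

def load_instruction_records(path: str) -> list[dict]:
    """Read a JSON Lines file of instruction/response pairs, raising on malformed rows."""
    records = []
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    for line_number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # Skip blank lines
        record = json.loads(line)
        # A field counts as missing if it is absent, not a string, or only whitespace.
        missing = [
            key for key in REQUIRED_KEYS
            if not isinstance(record.get(key), str) or not record[key].strip()
        ]
        if missing:
            raise ValueError(f"Line {line_number} is missing fields: {missing}")
        records.append({key: record[key].strip() for key in REQUIRED_KEYS})
    return records

# Demo with made-up rows written to a temporary file.
sample_lines = [
    json.dumps({"instruction": "Define overfitting.", "response": "Fitting noise in the training data."}),
    "",
    json.dumps({"instruction": "Define underfitting.", "response": "Failing to capture the underlying pattern."}),
]
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "my_dataset.jsonl"
    demo_path.write_text("\n".join(sample_lines), encoding="utf-8")
    loaded_records = load_instruction_records(str(demo_path))

print(f"Loaded {len(loaded_records)} records")
```

The returned list has the same structure as `training_data`, so it can be passed through the formatting step unchanged.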

In [None]:
# Sample training data - Replace with your own dataset
training_data = [
    {
        "instruction": "Explain machine learning in simple terms.",
        "response": "Machine learning is a type of artificial intelligence where computers learn patterns from data instead of being explicitly programmed. Think of it like teaching a child to recognize cats by showing them many pictures of cats, rather than giving them a list of rules about what makes a cat."
    },
    {
        "instruction": "What is the difference between supervised and unsupervised learning?",
        "response": "Supervised learning uses labeled data where we know the correct answers, like predicting house prices from features. Unsupervised learning finds patterns in unlabeled data, like grouping customers by behavior without predefined categories."
    },
    {
        "instruction": "Explain what a neural network is.",
        "response": "A neural network is a computing system inspired by biological brains. It consists of layers of interconnected nodes (neurons) that process information. Each connection has a weight that adjusts during training, allowing the network to learn complex patterns from data."
    },
    {
        "instruction": "What is overfitting in machine learning?",
        "response": "Overfitting occurs when a model learns the training data too well, including its noise and outliers. The model performs excellently on training data but poorly on new, unseen data. It's like memorizing answers instead of understanding concepts."
    },
    {
        "instruction": "Explain gradient descent.",
        "response": "Gradient descent is an optimization algorithm that finds the minimum of a function by iteratively moving in the direction of steepest descent. In machine learning, it adjusts model parameters to minimize the difference between predictions and actual values."
    },
]

print(f"Training samples: {len(training_data)}")

In [None]:
# Format data for training
def format_instruction(sample):
    """Format a sample into the instruction template."""
    return f"""### Instruction:
{sample['instruction']}

### Response:
{sample['response']}"""

# Create formatted texts
formatted_data = [{"text": format_instruction(sample)} for sample in training_data]

# Create Hugging Face dataset
dataset = Dataset.from_list(formatted_data)

print("Sample formatted data:")
print(dataset[0]["text"])

## Step 7: Configure Training

Set up training arguments using SFTConfig.
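
Several of these settings interact: the effective batch size is the per-device batch size times the gradient accumulation steps, and the warmup ratio and cosine schedule are both defined relative to the total number of optimizer steps. The sketch below works through that arithmetic for this lab's values (5 samples, batch size 1, 4 accumulation steps, 3 epochs, one GPU). The helpers are written for illustration only, and the step count the Trainer reports may differ slightly between library versions depending on how a partial final accumulation window is handled.

```python
import math

def optimizer_steps(num_samples, per_device_batch_size, grad_accum_steps, num_epochs, num_devices=1):
    """Estimate the number of optimizer updates for a full training run."""
    micro_batches_per_epoch = math.ceil(num_samples / (per_device_batch_size * num_devices))
    updates_per_epoch = math.ceil(micro_batches_per_epoch / grad_accum_steps)
    return updates_per_epoch * num_epochs

def cosine_lr(step, total_steps, warmup_steps, peak_lr):
    """Learning rate after `step` updates: linear warmup, then cosine decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

total_steps = optimizer_steps(num_samples=5, per_device_batch_size=1, grad_accum_steps=4, num_epochs=3)
warmup_steps = math.ceil(0.03 * total_steps)
effective_batch_size = 1 * 4  # per_device_train_batch_size * gradient_accumulation_steps

print(f"Effective batch size: {effective_batch_size}")
print(f"Optimizer steps: {total_steps}, warmup steps: {warmup_steps}")
for step in range(total_steps + 1):
    print(f"step {step}: lr = {cosine_lr(step, total_steps, warmup_steps, 2e-4):.2e}")
```

With so few optimizer steps, a logging interval larger than the total step count would produce no training logs at all, which is worth keeping in mind when scaling the dataset up or down.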

In [None]:
# Training configuration
training_args = SFTConfig(
    output_dir="./results",                     # Output directory
    num_train_epochs=3,                         # Number of training epochs
    per_device_train_batch_size=1,              # Batch size per GPU
    gradient_accumulation_steps=4,              # Accumulate gradients
    learning_rate=2e-4,                         # Learning rate
    weight_decay=0.01,                          # Weight decay
    warmup_ratio=0.03,                          # Warmup ratio
    lr_scheduler_type="cosine",                 # LR scheduler
    logging_steps=1,                            # Log every step (this tiny demo runs only a handful of steps)
    save_strategy="epoch",                      # Save every epoch
    fp16=True,                                  # FP16 mixed precision; on Ampere or newer GPUs, bf16=True matches the bfloat16 compute dtype from Step 3
    max_seq_length=512,                         # Maximum sequence length (renamed to max_length in recent TRL releases)
    dataset_text_field="text",                  # Field containing text
    packing=False,                              # Don't pack sequences
)

print("Training configuration set!")

## Step 8: Create Trainer and Train

In [None]:
# Create SFTTrainer
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    args=training_args,
    tokenizer=tokenizer,                        # Recent TRL releases expect processing_class=tokenizer instead
)

print("Trainer created! Ready to train.")

In [None]:
# Train the model
print("Starting training...")
trainer.train()
print("Training complete!")

## Step 9: Save the Model

In [None]:
# Save the LoRA adapter
adapter_path = "./lora-adapter"
model.save_pretrained(adapter_path)
tokenizer.save_pretrained(adapter_path)

print(f"Adapter saved to {adapter_path}")

## Step 10: Test the Fine-Tuned Model
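
The inference prompt has to match the template used during training, and the decoded output needs trimming because the model may keep going and invent further instruction/response turns. The sketch below shows one way to handle both with plain string operations. The helper names and the `fake_output` string (a stand-in for decoded model output) are made up for this illustration.

```python
RESPONSE_MARKER = "### Response:"
INSTRUCTION_MARKER = "### Instruction:"

def build_prompt(instruction: str) -> str:
    """Build the inference prompt: the training template with the response left empty."""
    return f"{INSTRUCTION_MARKER}\n{instruction}\n\n{RESPONSE_MARKER}\n"

def extract_response(generated_text: str) -> str:
    """Return the first response, dropping any extra turns the model invents.

    Returns an empty string if the response marker is not present.
    """
    _, _, after_marker = generated_text.partition(RESPONSE_MARKER)
    first_response, _, _ = after_marker.partition(INSTRUCTION_MARKER)
    return first_response.strip()

# Stand-in for decoded model output: the prompt, an answer, and a spurious extra turn.
prompt = build_prompt("What is transfer learning?")
fake_output = prompt + "Reusing a pretrained model on a new task.\n\n### Instruction:\nAnother question"

print(extract_response(fake_output))
```

Taking the text after the first response marker, rather than the last, avoids returning an answer to a question the model made up itself.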

In [None]:
# Test the model
def generate_response(instruction):
    """Generate a response for an instruction."""
    prompt = f"""### Instruction:
{instruction}

### Response:
"""
    
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    
    model.eval()  # Switch off dropout (including LoRA dropout) for inference
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=256,
            temperature=0.7,
            do_sample=True,
            top_p=0.9,
            pad_token_id=tokenizer.eos_token_id,
        )
    
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return response.split("### Response:")[-1].strip()

# Test with a sample question
test_instruction = "What is transfer learning?"
response = generate_response(test_instruction)

print(f"Instruction: {test_instruction}")
print(f"\nResponse: {response}")

## Exercises

1. **Modify the dataset**: Add more training examples relevant to your domain
2. **Adjust hyperparameters**: Try different values for `r`, `lora_alpha`, and learning rate
3. **Change target modules**: Experiment with different target modules
4. **Increase epochs**: Train for more epochs and observe the effect
5. **Compare results**: Test the same prompts on the base model vs fine-tuned model

In [None]:
# YOUR CODE HERE - Add your own experiments!
