# 🦎 CameleonCV - LoRA Fine-Tuning

This notebook fine-tunes LLaMA 3 (8B) using LoRA for style-aware CV transformation.

**What you'll need:**
- Google Colab Pro (A100 GPU recommended)
- Training data uploaded to Google Drive
- ~1-1.5 hours of training on an A100 (2-3 hours on a T4)

**What this notebook does:**
1. Installs Unsloth (efficient LoRA training library)
2. Loads your training and validation splits (the full dataset has 1,050 examples, including the held-out test split)
3. Fine-tunes LLaMA 3 8B with LoRA adapters
4. Saves the adapter to Google Drive
5. Tests inference with a sample

---

⚠️ **Before starting:** Make sure you've selected a GPU runtime!

`Runtime → Change runtime type → A100 GPU` (or T4 if no A100 is available)
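The A100-vs-T4 choice also decides numeric precision: `dtype=None` in Step 3 and the `fp16`/`bf16` flags in Step 6 both depend on bfloat16 support, which NVIDIA GPUs have natively from compute capability 8.0 (Ampere, e.g. the A100); the T4 is 7.5. A minimal sketch of that rule, assuming the capability tuple comes from `torch.cuda.get_device_capability()`; `pick_dtype` is a hypothetical helper, not part of Unsloth:

```python
def pick_dtype(capability: tuple[int, int]) -> str:
    """Return the mixed-precision dtype name for a CUDA compute capability.

    Hypothetical helper: native bfloat16 needs compute capability >= 8.0
    (Ampere or newer, e.g. A100); older cards such as the T4 (7.5) use float16.
    """
    major, _minor = capability
    return "bfloat16" if major >= 8 else "float16"


# In Colab (GPU runtime), pass the real value:
#   import torch
#   print(pick_dtype(torch.cuda.get_device_capability(0)))
for name, cap in [("A100", (8, 0)), ("T4", (7, 5))]:
    print(name, pick_dtype(cap))
```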

## Step 1: Install Dependencies

This installs Unsloth (fast LoRA training) and required libraries. Takes ~2-3 minutes.

In [None]:
%%capture
# Install Unsloth for efficient LoRA training
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps "xformers<0.0.27" "trl<0.9.0" peft accelerate bitsandbytes

# %%capture hides all output from this cell (install errors included); the next cell checks PyTorch and the GPU

In [None]:
# Verify GPU is available
import torch
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
else:
    print("❌ No GPU found! Go to Runtime → Change runtime type → GPU")

## Step 2: Mount Google Drive & Load Data

Upload your `train.jsonl` and `validation.jsonl` to Google Drive first!

**Recommended folder structure:**
```
My Drive/
  CameleonCV/
    data/
      train.jsonl
      validation.jsonl
      test.jsonl
    outputs/
      (adapter will be saved here)
```
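The cells below read specific keys from every record: `example_id`, `input.original_section`, `input.job_posting_excerpt`, `input.instructions`, `target_output`, `metadata.target_style`, and `metadata.section_type`. A quick pre-flight check can catch a malformed record before it surfaces as a `KeyError` mid-notebook. The sketch below uses a hypothetical `validate_record` helper on one parsed JSON line; the schema is inferred from the keys the notebook accesses, not from a published spec.

```python
import json

# Key paths the notebook reads; inferred from its cells, not a formal spec.
REQUIRED_PATHS = [
    ("example_id",),
    ("input", "original_section"),
    ("input", "job_posting_excerpt"),
    ("input", "instructions"),
    ("target_output",),
    ("metadata", "target_style"),
    ("metadata", "section_type"),
]


def validate_record(record: dict) -> list[str]:
    """Return the dotted key paths missing from one JSONL record."""
    missing = []
    for path in REQUIRED_PATHS:
        node = record
        for key in path:
            if not isinstance(node, dict) or key not in node:
                missing.append(".".join(path))
                break
            node = node[key]
    return missing


# Made-up record that lacks two fields
line = '{"example_id": "ex-001", "input": {"original_section": "...", "instructions": "..."}, "target_output": "...", "metadata": {"target_style": "concise"}}'
print(validate_record(json.loads(line)))
```

Running it over every record returned by `load_jsonl` (defined in Step 2) would list any incomplete examples up front.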

In [None]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

# Set paths - UPDATE THESE IF YOUR FOLDER STRUCTURE IS DIFFERENT
DATA_DIR = "/content/drive/MyDrive/CameleonCV/data"
OUTPUT_DIR = "/content/drive/MyDrive/CameleonCV/outputs"

# Create output directory if it doesn't exist
import os
os.makedirs(OUTPUT_DIR, exist_ok=True)

print(f"📁 Data directory: {DATA_DIR}")
print(f"📁 Output directory: {OUTPUT_DIR}")

In [None]:
# Verify data files exist
import os

required_files = ['train.jsonl', 'validation.jsonl']
for f in required_files:
    path = os.path.join(DATA_DIR, f)
    if os.path.exists(path):
        size = os.path.getsize(path) / 1024 / 1024
        print(f"✅ {f} found ({size:.2f} MB)")
    else:
        print(f"❌ {f} NOT FOUND at {path}")
        print("   Please upload your data files to Google Drive!")

In [None]:
# Load and inspect training data
import json
from collections import Counter

def load_jsonl(filepath):
    """Load a JSONL file into a list of dicts, skipping blank lines"""
    with open(filepath, 'r', encoding='utf-8') as f:
        return [json.loads(line) for line in f if line.strip()]

# Load datasets
train_data = load_jsonl(os.path.join(DATA_DIR, 'train.jsonl'))
val_data = load_jsonl(os.path.join(DATA_DIR, 'validation.jsonl'))

print(f"\n📊 Dataset loaded:")
print(f"   Training examples: {len(train_data)}")
print(f"   Validation examples: {len(val_data)}")

# Show distribution
styles = Counter(ex['metadata']['target_style'] for ex in train_data)
print(f"\n📈 Style distribution in training set:")
for style, count in sorted(styles.items()):
    print(f"   {style}: {count}")

In [None]:
# Preview one training example
example = train_data[0]
print("\n📝 Sample training example:")
print(f"\nID: {example['example_id']}")
print(f"Style: {example['metadata']['target_style']}")
print(f"Section: {example['metadata']['section_type']}")
print(f"\nOriginal (first 200 chars):")
print(f"  {example['input']['original_section'][:200]}...")
print(f"\nTarget output (first 200 chars):")
print(f"  {example['target_output'][:200]}...")

## Step 3: Load Base Model with Unsloth

We'll use LLaMA 3 8B with 4-bit quantization (QLoRA) for memory efficiency.

**Why these settings:**
- `load_in_4bit=True` → Stores the base weights in 4-bit, so the 8B model fits on a T4 as well as an A100
- `max_seq_length=2048` → Enough for a CV section plus the job context and instructions
- LLaMA 3 8B → Good balance of capability and trainability
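Why 4-bit matters: weight memory scales with bits per parameter. Here is a back-of-the-envelope sketch. It assumes roughly 8.03 billion parameters for LLaMA 3 8B and ignores quantization constants, layers that stay in 16-bit (such as embeddings), activations, and optimizer state, so real usage is higher.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


LLAMA3_8B_PARAMS = 8.03e9  # approximate parameter count (assumption)

for bits in (32, 16, 4):
    print(f"{bits:>2}-bit: ~{weight_memory_gb(LLAMA3_8B_PARAMS, bits):.1f} GB")
```

The 16-bit figure matches the ~16GB full-model size mentioned in Step 8. At 4 bits the weights need roughly a quarter of that, which leaves headroom on a 16 GB T4 for activations and the LoRA training state.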

In [None]:
from unsloth import FastLanguageModel
import torch

# Model configuration
max_seq_length = 2048  # Enough for CV sections
dtype = None  # Auto-detect (float16 for T4, bfloat16 for A100)
load_in_4bit = True  # Use QLoRA for memory efficiency

# Load LLaMA 3 8B
print("⏳ Loading LLaMA 3 8B (this takes 1-2 minutes)...")

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",  # Pre-quantized for efficiency
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
)

print("✅ Base model loaded!")

## Step 4: Configure LoRA Adapters

**LoRA settings explained:**
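- `r=16` → Rank of the low-rank update; higher means more capacity and more memory
- `lora_alpha=32` → Scaling factor; updates are scaled by `lora_alpha / r` (here 2), and setting alpha to 2× the rank is a common heuristic
- `lora_dropout=0.05` → Light regularization (Unsloth's fast path is optimized for 0, so a nonzero value may train somewhat slower)
- `target_modules` → Which layers to adapt (all attention and MLP projections)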
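For intuition on how `r` and `lora_alpha` interact: LoRA keeps the base weight W frozen and adds a low-rank path. Each adapted layer computes W x + (alpha / r) · B(A x), where A is r × d_in and B is d_out × r. B starts at zero, so the adapted model is initially identical to the base model, and alpha / r scales whatever A and B learn. The toy sketch below is pure Python and is not the PEFT implementation. It uses r = 1 and alpha = 2 to keep the same ratio of 2 as `r=16, lora_alpha=32`, and the matrices are hand-picked.

```python
def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W, A, B, x, alpha: float, r: int) -> list[float]:
    """Compute W x + (alpha / r) * B (A x): frozen base path plus scaled low-rank path."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scaling = alpha / r
    return [b + scaling * u for b, u in zip(base, update)]


W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight, d_out = d_in = 2
A = [[0.5, -0.5]]              # r x d_in with r = 1
B_init = [[0.0], [0.0]]        # d_out x r, zero-initialized
B_trained = [[1.0], [2.0]]     # pretend values after training
x = [3.0, 1.0]

print(lora_forward(W, A, B_init, x, alpha=2.0, r=1))     # same as W x
print(lora_forward(W, A, B_trained, x, alpha=2.0, r=1))  # base output plus 2x the update
```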

In [None]:
# Add LoRA adapters
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # Rank - balance between capacity and efficiency
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # Attention layers
        "gate_proj", "up_proj", "down_proj",      # MLP layers
    ],
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    use_gradient_checkpointing="unsloth",  # Memory optimization
    random_state=42,
)

# Print trainable parameters
def count_parameters(model):
    trainable, total = 0, 0
    for p in model.parameters():
        n = p.numel()
        # bitsandbytes packs two 4-bit weights per byte, so numel() undercounts them
        if p.__class__.__name__ == "Params4bit":
            n *= 2
        total += n
        if p.requires_grad:
            trainable += n
    return trainable, total

trainable, total = count_parameters(model)
print(f"\n📊 Parameter count:")
print(f"   Trainable: {trainable:,} ({trainable/total*100:.2f}%)")
print(f"   Total: {total:,}")
print(f"\n✅ LoRA adapters configured!")

## Step 5: Format Training Data

Convert our JSONL examples into the prompt format the model will learn.
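The model learns to continue text that ends in `### REWRITTEN SECTION`. The prompt used at inference (Step 9) should therefore match the training text exactly up to that point, because a stray space or missing newline there can degrade the outputs. One way to guard against drift is to derive the inference prompt from the training template by leaving the target empty, then assert that the two agree. That is an assumption of this sketch, not something the notebook does. The templates below are shortened copies for illustration, and `"<eos>"` stands in for the tokenizer's real end-of-sequence token.

```python
# Shortened copies of the two templates; the real ones carry all four sections.
PROMPT_TEMPLATE = """### TASK
Rewrite the following CV section.

### ORIGINAL CV SECTION
{original_section}

### REWRITTEN SECTION
{target_output}"""

INFERENCE_TEMPLATE = """### TASK
Rewrite the following CV section.

### ORIGINAL CV SECTION
{original_section}

### REWRITTEN SECTION
"""


def training_text(original: str, target: str, eos: str) -> str:
    """Full supervised example: prompt + target + end-of-sequence marker."""
    return PROMPT_TEMPLATE.format(original_section=original, target_output=target) + eos


def inference_prompt(original: str) -> str:
    """Prompt for generation: the training template with an empty target."""
    return PROMPT_TEMPLATE.format(original_section=original, target_output="")


original = "Managed a team of 8."
assert inference_prompt(original) == INFERENCE_TEMPLATE.format(original_section=original)
assert training_text(original, "Led an 8-person team.", "<eos>").startswith(inference_prompt(original))
print("Templates agree up to the rewritten section.")
```

In this notebook, `INFERENCE_TEMPLATE` in Step 9 is exactly `PROMPT_TEMPLATE` with an empty target, so the same assertion would pass on the full templates.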

In [None]:
# Define prompt template
PROMPT_TEMPLATE = """### TASK
Rewrite the following CV section according to the specified style and constraints.

### ORIGINAL CV SECTION
{original_section}

### TARGET JOB CONTEXT
{job_posting_excerpt}

### INSTRUCTIONS
{instructions}

### REWRITTEN SECTION
{target_output}"""

# Add EOS token to signal end of generation
EOS_TOKEN = tokenizer.eos_token

def format_example(example):
    """Convert a training example to formatted text"""
    text = PROMPT_TEMPLATE.format(
        original_section=example['input']['original_section'],
        job_posting_excerpt=example['input']['job_posting_excerpt'],
        instructions=example['input']['instructions'],
        target_output=example['target_output']
    )
    return {"text": text + EOS_TOKEN}

# Preview formatted example
sample = format_example(train_data[0])
print("📝 Formatted training example (first 800 chars):")
print(sample['text'][:800])
print("...")

In [None]:
from datasets import Dataset

# Format all examples
print("⏳ Formatting training data...")
train_formatted = [format_example(ex) for ex in train_data]
val_formatted = [format_example(ex) for ex in val_data]

# Convert to HuggingFace Dataset
train_dataset = Dataset.from_list(train_formatted)
val_dataset = Dataset.from_list(val_formatted)

print(f"✅ Datasets ready:")
print(f"   Training: {len(train_dataset)} examples")
print(f"   Validation: {len(val_dataset)} examples")

## Step 6: Configure Training

**Training settings explained:**
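- `num_train_epochs=3` → Three full passes over the training data
- `per_device_train_batch_size=2` → 2 examples per forward/backward pass
- `gradient_accumulation_steps=4` → Gradients accumulated over 4 batches before each optimizer step, so the effective batch size is 2 × 4 = 8
- `learning_rate=2e-4` → A common starting point for LoRA fine-tuning
- `warmup_steps=50` → Learning rate ramps up linearly over the first 50 steps, then follows the cosine decay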

**Estimated time:** ~60-90 minutes on A100, ~2-3 hours on T4
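The total step count and the shape of the learning-rate curve follow from these settings. The sketch below approximates how the Hugging Face `Trainer` counts optimizer steps (batches per epoch divided by accumulation steps). It also reimplements linear warmup followed by cosine decay, the shape `lr_scheduler_type="cosine"` produces. It is a standalone re-derivation, not the library code. It assumes 840 training examples, the figure given in the summary at the end of this notebook.

```python
import math


def total_optimizer_steps(n_examples: int, batch_size: int, grad_accum: int, epochs: int) -> int:
    """Approximate optimizer steps: (batches per epoch // accumulation) * epochs."""
    batches_per_epoch = math.ceil(n_examples / batch_size)
    updates_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return updates_per_epoch * epochs


def cosine_lr(step: int, peak_lr: float, warmup: int, total: int) -> float:
    """Linear warmup to peak_lr, then cosine decay to zero at `total`."""
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


total = total_optimizer_steps(840, batch_size=2, grad_accum=4, epochs=3)
print(f"total steps: {total}, warmup share: {50 / total:.0%}")
for step in (0, 25, 50, 183, total):
    print(step, f"{cosine_lr(step, 2e-4, 50, total):.2e}")
```

With about 315 steps, the 50 warmup steps cover roughly 16% of training, and `eval_steps=100` gives evaluations at steps 100, 200, and 300.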

In [None]:
from trl import SFTTrainer
from transformers import TrainingArguments

# Training configuration
training_args = TrainingArguments(
    # Output
    output_dir="./outputs",
    
    # Training duration
    num_train_epochs=3,
    
    # Batch size (effective = per_device × accumulation = 2 × 4 = 8)
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    
    # Learning rate
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_steps=50,
    
    # Optimization
    optim="adamw_8bit",
    weight_decay=0.01,
    max_grad_norm=1.0,
    
    # Memory optimization
    fp16=not torch.cuda.is_bf16_supported(),
    bf16=torch.cuda.is_bf16_supported(),
    
    # Logging
    logging_steps=10,
    logging_dir="./logs",
    
    # Evaluation
    eval_strategy="steps",  # named evaluation_strategy in transformers < 4.41
    eval_steps=100,
    
    # Checkpointing
    save_strategy="steps",
    save_steps=200,
    save_total_limit=3,
    
    # Other
    seed=42,
    report_to="none",  # Disable wandb
)

print("✅ Training configuration ready!")
print(f"\n📊 Training settings:")
print(f"   Epochs: {training_args.num_train_epochs}")
print(f"   Effective batch size: {training_args.per_device_train_batch_size * training_args.gradient_accumulation_steps}")
print(f"   Learning rate: {training_args.learning_rate}")
print(f"   Total steps: ~{len(train_dataset) * training_args.num_train_epochs // (training_args.per_device_train_batch_size * training_args.gradient_accumulation_steps)}")

In [None]:
# Create trainer
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    packing=False,  # Don't pack multiple examples
    args=training_args,
)

print("‚úÖ Trainer initialized!")

## Step 7: Train! 🚀

This is the main training loop. Watch the loss decrease!

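**What to expect (a rough guide; exact values depend on the data and prompt length):**
- Initial loss: ~2.5-3.0
- Final loss: ~0.5-1.0 (good) or ~1.0-1.5 (acceptable)
- Validation loss should track training loss; if it climbs while training loss keeps falling, the model is likely overfitting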
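Loss values are easier to interpret as perplexity (perplexity = e^loss), so a loss of 2.5 is a perplexity of about 12 and 0.5 about 1.65. To check the overfitting signal without eyeballing logs, you can scan the records in `trainer.state.log_history` after training. Training log entries carry a `loss` key, evaluation entries carry an `eval_loss` key, and both include a `step`. The `overfitting_report` helper below is hypothetical, its "rising for the last two evaluations" rule is a simple heuristic, and the sample history is made up.

```python
import math


def overfitting_report(log_history: list[dict], patience: int = 2) -> dict:
    """Summarize Trainer-style log records and flag a rising validation loss.

    `overfitting` is True when eval_loss increased at each of the last
    `patience` evaluations (a heuristic, not a formal test).
    """
    train = [(r["step"], r["loss"]) for r in log_history if "loss" in r]
    eval_losses = [r["eval_loss"] for r in log_history if "eval_loss" in r]
    rising = len(eval_losses) > patience and all(
        later > earlier
        for earlier, later in zip(eval_losses[-patience - 1:-1], eval_losses[-patience:])
    )
    return {
        "last_train_loss": train[-1][1] if train else None,
        "best_eval_loss": min(eval_losses) if eval_losses else None,
        "last_eval_perplexity": math.exp(eval_losses[-1]) if eval_losses else None,
        "overfitting": rising,
    }


history = [  # made-up numbers in the shape Trainer logs them
    {"loss": 2.7, "step": 10},
    {"loss": 1.4, "step": 90},
    {"eval_loss": 1.30, "step": 100},
    {"loss": 0.9, "step": 190},
    {"eval_loss": 1.35, "step": 200},
    {"loss": 0.5, "step": 290},
    {"eval_loss": 1.52, "step": 300},
]
print(overfitting_report(history))
```

After Step 7, `overfitting_report(trainer.state.log_history)` applies the same check to the real run.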

In [None]:
# Check GPU memory before training
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"🔧 GPU: {gpu_stats.name}")
print(f"💾 Memory reserved: {start_gpu_memory} GB / {max_memory} GB")
print(f"\n🚀 Starting training...\n")
print("=" * 60)

In [None]:
# TRAIN!
trainer_stats = trainer.train()

print("\n" + "=" * 60)
print("🎉 Training complete!")
print(f"\n📊 Final stats:")
print(f"   Training loss: {trainer_stats.training_loss:.4f}")
print(f"   Training time: {trainer_stats.metrics['train_runtime']/60:.1f} minutes")
print(f"   Samples/second: {trainer_stats.metrics['train_samples_per_second']:.2f}")

In [None]:
# Check final memory usage
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
print(f"\n💾 Peak GPU memory: {used_memory} GB / {max_memory} GB ({used_memory/max_memory*100:.1f}%)")

## Step 8: Save the Trained Adapter

We save only the LoRA adapter (roughly 80-160 MB for this configuration, depending on whether its weights are stored in 16- or 32-bit), not the full model (~16GB).

This adapter can later be loaded on top of the base LLaMA 3 model.
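The adapter size follows from the LoRA configuration in Step 4. Each adapted linear layer with input width d_in and output width d_out gains r × (d_in + d_out) trainable weights, because A is r × d_in and B is d_out × r. The sketch below counts them. It assumes LLaMA 3 8B's published shape: 32 layers, hidden size 4096, MLP size 14336, and 8 key/value heads of dimension 128, so `k_proj` and `v_proj` output 1024.

```python
def lora_param_count(r: int, n_layers: int, shapes: dict[str, tuple[int, int]]) -> int:
    """Trainable LoRA weights: r * (d_in + d_out) per adapted module, per layer."""
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * n_layers


# (d_in, d_out) of each target module in one LLaMA 3 8B decoder layer (assumed config)
LLAMA3_8B_SHAPES = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
    "gate_proj": (4096, 14336),
    "up_proj": (4096, 14336),
    "down_proj": (14336, 4096),
}

n = lora_param_count(r=16, n_layers=32, shapes=LLAMA3_8B_SHAPES)
print(f"{n:,} trainable parameters")
for label, bytes_per_param in (("16-bit", 2), ("32-bit", 4)):
    # Same MB convention as the save cell (bytes / 1024 / 1024)
    print(f"{label}: {n * bytes_per_param / 1024 / 1024:.0f} MB")
```

The result should match the trainable count printed in Step 4. The adapter weights file reported by the save cell should land near one of these sizes, depending on the dtype it was written in; tokenizer files add a little on top.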

In [None]:
# Save adapter to Google Drive
import os
from datetime import datetime

# Create timestamped folder
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
adapter_name = f"cameleon_lora_{timestamp}"
save_path = os.path.join(OUTPUT_DIR, adapter_name)

print(f"💾 Saving adapter to: {save_path}")

# Save LoRA adapter only (not full model)
model.save_pretrained(save_path)
tokenizer.save_pretrained(save_path)

# Check saved files
saved_files = os.listdir(save_path)
total_size = sum(os.path.getsize(os.path.join(save_path, f)) for f in saved_files)

print(f"\n✅ Adapter saved!")
print(f"   Files: {len(saved_files)}")
print(f"   Total size: {total_size / 1024 / 1024:.1f} MB")
print(f"\n📁 Saved files:")
for f in saved_files:
    size = os.path.getsize(os.path.join(save_path, f)) / 1024 / 1024
    print(f"   {f}: {size:.2f} MB")

## Step 9: Test Inference 🧪

Let's test the fine-tuned model with a sample CV section!
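The test instructions in this step ask the model to "Preserve all facts exactly." A crude automated spot-check is to compare the numeric tokens in the original and the rewrite, on the assumption that numbers are the facts most likely to drift. `missing_numbers` is a hypothetical helper. It misses spelled-out numbers ("twice") and reworded claims, so it complements reading the outputs rather than replacing it.

```python
import re

# Integers, optionally with thousands separators or decimals (e.g. "1,200", "4.5")
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")


def missing_numbers(original: str, rewritten: str) -> set[str]:
    """Numeric tokens present in `original` but absent from `rewritten`."""
    return set(NUMBER.findall(original)) - set(NUMBER.findall(rewritten))


original = ("Managed customer service team of 8 people handling approximately 200 calls per day. "
            "Reduced average call time from 6 minutes to 4 minutes. "
            "Received employee of the month award twice in 2023.")
good = "Led an 8-person team fielding ~200 calls daily; cut call time from 6 to 4 minutes. Employee of the Month twice (2023)."
bad = "Led a team handling 200 calls a day and halved call times."

print(missing_numbers(original, good))
print(missing_numbers(original, bad))
```

Inside the style loop, `missing_numbers(test_original, result)` would flag any rewrite that dropped a figure.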

In [None]:
# Enable inference mode
FastLanguageModel.for_inference(model)

print("✅ Model ready for inference!")

In [None]:
# Test prompt
test_original = """Managed customer service team of 8 people handling approximately 200 calls per day. 
Trained new hires on company procedures and phone etiquette. Reduced average call time from 6 minutes 
to 4 minutes by creating quick reference guides. Received employee of the month award twice in 2023."""

test_job = """**Customer Experience Manager**
Lead our customer support team to deliver exceptional service. You'll manage team performance,
develop training programs, and drive efficiency improvements across support channels."""

# Create inference prompt (without target output)
INFERENCE_TEMPLATE = """### TASK
Rewrite the following CV section according to the specified style and constraints.

### ORIGINAL CV SECTION
{original_section}

### TARGET JOB CONTEXT
{job_posting_excerpt}

### INSTRUCTIONS
{instructions}

### REWRITTEN SECTION
"""

def test_style(style):
    """Generate a CV transformation for a given style"""
    prompt = INFERENCE_TEMPLATE.format(
        original_section=test_original,
        job_posting_excerpt=test_job,
        instructions=f"Rewrite this CV experience section in {style} style. Preserve all facts exactly."
    )
    
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    
    outputs = model.generate(
        **inputs,
        max_new_tokens=300,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )
    
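    # Decode only the newly generated tokens; slicing the decoded text by
    # len(prompt) breaks whenever decoding does not reproduce the prompt exactly
    prompt_length = inputs["input_ids"].shape[1]
    generated = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True).strip()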
    
    return generated

print("🧪 Testing inference...\n")

In [None]:
# Test all 5 styles
styles_to_test = ['confident', 'professional', 'concise', 'academic', 'playful']

print("📝 ORIGINAL:")
print(test_original)
print("\n" + "="*60 + "\n")

for style in styles_to_test:
    print(f"🎯 {style.upper()} STYLE:")
    result = test_style(style)
    print(result)
    print("\n" + "-"*60 + "\n")

## Step 10: Export for Deployment (Optional)
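Merging folds the adapter into the base weights once: W_merged = W + (alpha / r) · B A. The deployed model then needs no PEFT code and runs the same matrix multiplications as the base model. In exact arithmetic the merged layer gives the same outputs as base-plus-adapter; in practice, small differences can appear from rounding when the result is stored in 16-bit. The toy sketch below is pure Python with illustrative matrices and does not reproduce the internals of Unsloth's `save_pretrained_merged`.

```python
def matmul(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(W, A, B, alpha: float, r: int):
    """Return W + (alpha / r) * (B @ A), the single weight used after merging."""
    delta = matmul(B, A)
    scaling = alpha / r
    return [[w + scaling * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]


W = [[1.0, 0.0], [0.0, 1.0]]
A = [[0.5, -0.5]]      # r x d_in, r = 1
B = [[1.0], [2.0]]     # d_out x r
x = [[3.0], [1.0]]     # input as a column vector
scaling = 2.0 / 1      # alpha / r

merged = merge_lora(W, A, B, alpha=2.0, r=1)
# Unmerged path: base output plus scaled adapter output
separate = [[b[0] + scaling * u[0]] for b, u in zip(matmul(W, x), matmul(B, matmul(A, x)))]
print(matmul(merged, x), separate)
```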

If you want to merge the adapter with the base model for easier deployment:

In [None]:
# Optional: Save merged model (full model, not just adapter)
# This is larger (~16GB) but easier to deploy

SAVE_MERGED = False  # Set to True if you want to save merged model

if SAVE_MERGED:
    merged_path = os.path.join(OUTPUT_DIR, f"cameleon_merged_{timestamp}")
    print(f"💾 Saving merged model to: {merged_path}")
    print("⏳ This will take several minutes and ~16GB of space...")
    
    # Save in 16-bit for deployment
    model.save_pretrained_merged(
        merged_path,
        tokenizer,
        save_method="merged_16bit",
    )
    print(f"✅ Merged model saved!")
else:
    print("ℹ️ Skipping merged model export. Set SAVE_MERGED = True to enable.")

---

# 🎉 Training Complete!

## What you've accomplished:
- ✅ Fine-tuned LLaMA 3 8B with LoRA
- ✅ Trained on 840 style transformation examples
- ✅ Saved adapter to Google Drive
- ✅ Tested inference on all 5 styles

## Your saved files:
- **Adapter:** `CameleonCV/outputs/cameleon_lora_[timestamp]/`
- **Size:** roughly 80-160 MB (the save cell in Step 8 prints the exact figure)

## Next steps:
1. **Evaluate** - Run systematic evaluation on test set
2. **Compare** - Test base model (zero-shot) vs fine-tuned
3. **Deploy** - Set up inference API or demo
4. **Integrate** - Add Claude API for job relevance

---

**Questions?** Check your training loss curves and validation metrics to ensure the model learned well!