# 🎯 Project 7: The JSON Specialist

**Objective:** Fine-tune a small language model to consistently output valid JSON.

## 📖 Why Fine-Tune?

Base models sometimes:
- Output invalid JSON syntax
- Add markdown formatting (```json)
- Include explanatory text
- Miss required fields

Fine-tuning on instruction/JSON pairs teaches the model to return only the JSON object, in a consistent shape.
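
The four failure modes listed above can be detected mechanically, which is useful for measuring a model before and after fine-tuning. The following is a minimal sketch, not part of the project code: `diagnose_output`, the problem labels, and the `required_fields` parameter are hypothetical names chosen for illustration.

```python
import json

# Markdown code-fence marker, built from single backticks
FENCE = "`" * 3

def diagnose_output(text: str, required_fields: set) -> list:
    """Return the problems found in a model response (empty list if clean)."""
    problems = []
    stripped = text.strip()

    # Failure mode: markdown formatting around the payload
    if FENCE in stripped:
        problems.append("markdown_fence")

    try:
        parsed = json.loads(stripped)
    except json.JSONDecodeError:
        # Distinguish "valid JSON buried in other text" from "no valid JSON"
        start = stripped.find("{")
        try:
            if start == -1:
                raise ValueError("no JSON object found")
            json.JSONDecoder().raw_decode(stripped, start)
            problems.append("extra_text")
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            problems.append("invalid_syntax")
        return problems

    # Failure mode: parses, but required top-level fields are absent
    keys = set(parsed) if isinstance(parsed, dict) else set()
    if required_fields - keys:
        problems.append("missing_fields")
    return problems

print(diagnose_output('{"tool": "get_weather"}', {"tool", "args"}))
```

Running the same check over base-model and fine-tuned outputs gives a per-category count of what fine-tuning actually fixed.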

## 🎯 QLoRA Benefits

- **Memory efficient:** 4-bit quantization of the frozen base weights reduces VRAM needs
- **Fast:** Only the small adapter layers are trained
- **Effective:** Performance close to full fine-tuning
- **Mergeable:** Adapters can be merged back into the base model

In [None]:
# Install required packages (run in Colab or local GPU environment)
# !pip install transformers datasets peft bitsandbytes accelerate trl

In [None]:
import torch
import json
from datasets import Dataset
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    BitsAndBytesConfig,
    TrainingArguments,
)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from trl import SFTTrainer

print(f"✅ PyTorch version: {torch.__version__}")
print(f"✅ CUDA available: {torch.cuda.is_available()}")

## Task 1: Create Training Dataset

In [None]:
# Create diverse JSON training examples
training_data = [
    {
        "instruction": "Get the weather for San Francisco",
        "output": {"tool": "get_weather", "args": {"city": "San Francisco"}}
    },
    {
        "instruction": "Calculate 125 multiplied by 8",
        "output": {"tool": "calculator", "args": {"expression": "125 * 8"}}
    },
    {
        "instruction": "Search for information about transformers",
        "output": {"tool": "web_search", "args": {"query": "transformers"}}
    },
    {
        "instruction": "Create a user profile for John Doe, age 30",
        "output": {"action": "create_profile", "data": {"name": "John Doe", "age": 30}}
    },
    {
        "instruction": "List the top 5 AI frameworks",
        "output": {
            "type": "list",
            "items": ["TensorFlow", "PyTorch", "JAX", "Keras", "Scikit-learn"]
        }
    },
    # Add more examples for better fine-tuning
    {
        "instruction": "Book a flight from NYC to LAX on May 15th",
        "output": {
            "tool": "book_flight",
            "args": {
                "origin": "NYC",
                "destination": "LAX",
                "date": "2024-05-15"
            }
        }
    },
    {
        "instruction": "Send an email to john@example.com with subject 'Meeting'",
        "output": {
            "tool": "send_email",
            "args": {
                "to": "john@example.com",
                "subject": "Meeting",
                "body": ""
            }
        }
    },
]

# Format for training
def format_instruction(example):
    """
    Format as: Instruction → JSON output
    """
    output_json = json.dumps(example['output'], indent=2)
    return {
        "text": f"""### Instruction:
{example['instruction']}

### Response:
{output_json}"""
    }

formatted_data = [format_instruction(ex) for ex in training_data]
dataset = Dataset.from_list(formatted_data)

print(f"✅ Created dataset with {len(dataset)} examples\n")
print("Example:")
print(dataset[0]['text'])

## Task 2: Load Model with 4-bit Quantization

In [None]:
# Model configuration
model_name = "mistralai/Mistral-7B-v0.1"  # Or "meta-llama/Llama-2-7b-hf"

# 4-bit quantization config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

# Load model
print("📥 Loading model (this may take a few minutes)...")
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)

print("✅ Model loaded successfully")
print(f"   Memory footprint: {model.get_memory_footprint() / 1e9:.2f} GB")

## Task 3: Configure LoRA

In [None]:
# Prepare model for k-bit training
model = prepare_model_for_kbit_training(model)

# LoRA configuration
lora_config = LoraConfig(
    r=16,  # Rank
    lora_alpha=32,  # Alpha parameter
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # Which layers to adapt
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

# Apply LoRA
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

print("\n✅ LoRA adapters added")
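
To make the output of `model.print_trainable_parameters()` less of a black box, the trainable parameter count can be estimated by hand. For a linear layer mapping `d_in` to `d_out`, LoRA adds two matrices of shapes `r x d_in` and `d_out x r`, so `r * (d_in + d_out)` parameters. The sketch below assumes Mistral-7B's shapes (hidden size 4096, key/value projections of width 1024, 32 layers); these numbers are assumptions, not read from the loaded model, so treat the result as a sanity check.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Parameters LoRA adds to one d_in -> d_out linear layer at rank r."""
    return r * (d_in + d_out)

# ASSUMED Mistral-7B shapes: hidden size 4096, k/v projections of width 1024, 32 layers
HIDDEN, KV_DIM, NUM_LAYERS = 4096, 1024, 32
RANK = 16  # matches r=16 in the LoraConfig

# (d_in, d_out) for each targeted attention projection
projections = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
}

per_layer = sum(lora_param_count(d_in, d_out, RANK) for d_in, d_out in projections.values())
total_trainable = per_layer * NUM_LAYERS
print(f"Estimated trainable LoRA parameters: {total_trainable:,}")
```

If the estimate and the printed count disagree, the likely cause is a different base model or a different `target_modules` list.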

## Task 4: Train the Model

In [None]:
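# Training arguments
training_args = TrainingArguments(
    output_dir="./json_specialist",
    num_train_epochs=3,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=2,  # Effective batch size: 2 * 2 = 4
    learning_rate=2e-4,
    fp16=True,
    logging_steps=1,  # With this small dataset the run is only a handful of steps, so log every step
    save_strategy="epoch",
    optim="paged_adamw_8bit",
)

# Initialize trainer
# Note: passing dataset_text_field, max_seq_length, and tokenizer directly works
# in older TRL releases. Newer releases configure these through SFTConfig and
# take the tokenizer as processing_class, so check the installed version.
trainer = SFTTrainer(
    model=model,  # Already wrapped by get_peft_model in Task 3, so no peft_config here
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=512,
    tokenizer=tokenizer,
    args=training_args,
)

# Start training
print("🚀 Starting training...\n")
trainer.train()

print("\n✅ Training complete!")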
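
It helps to know how many optimizer steps a run will take before launching it, because settings such as `logging_steps` and `save_strategy` are expressed relative to that number. The sketch below estimates it from the dataset size and the batch settings used in this project. `optimizer_steps` is a hypothetical helper, and the exact rounding of a final partial accumulation window differs between library versions, so treat the result as an estimate.

```python
import math

def optimizer_steps(num_examples: int, per_device_batch_size: int,
                    grad_accum_steps: int, epochs: int, num_devices: int = 1) -> int:
    """Estimate the total number of optimizer updates in a training run."""
    # Batches the dataloader yields per epoch (last batch may be smaller)
    batches_per_epoch = math.ceil(num_examples / (per_device_batch_size * num_devices))
    # One optimizer update per grad_accum_steps batches
    steps_per_epoch = max(1, math.ceil(batches_per_epoch / grad_accum_steps))
    return steps_per_epoch * epochs

# Values from this project: 7 examples, batch size 2, accumulation 2, 3 epochs
total_steps = optimizer_steps(num_examples=7, per_device_batch_size=2,
                              grad_accum_steps=2, epochs=3)
print(f"Estimated optimizer steps: {total_steps}")
```

A run this short is enough to exercise the pipeline, but far too small to expect reliable formatting; the step count grows linearly as more training examples are added.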

## Task 5: Test and Benchmark

In [None]:
def generate_json(instruction: str, model, tokenizer):
    """
    Generate JSON from instruction.
    """
    prompt = f"""### Instruction:
{instruction}

### Response:
"""
    
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    
    outputs = model.generate(
        **inputs,
        max_new_tokens=200,
        temperature=0.1,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )
    
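    # Decode only the newly generated tokens, not the prompt
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    result = tokenizer.decode(new_tokens, skip_special_tokens=True)
    # If the model runs on into another example, keep only the first response
    response = result.split("### Instruction:")[0].strip()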
    
    return response

# Test cases
test_instructions = [
    "Get weather for Tokyo",
    "Calculate 456 divided by 12",
    "Create a task with title 'Learn LoRA' and priority high",
]

print("🧪 Testing fine-tuned model:\n")
print("="*80)

for instruction in test_instructions:
    print(f"\nInstruction: {instruction}")
    print("-"*80)
    
    output = generate_json(instruction, model, tokenizer)
    print(f"Output:\n{output}")
    
    # Validate JSON
    try:
        json.loads(output)
        print("✅ Valid JSON")
    except json.JSONDecodeError as e:
        print(f"❌ Invalid JSON: {e}")
    
    print("="*80)
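
The `json.loads` check in the loop above is strict: any text after the closing brace makes the whole output count as invalid, even when a well-formed object is present. When debugging, it can help to also try a lenient parse that pulls out the first JSON object. The sketch below uses the standard library's `json.JSONDecoder.raw_decode`; `extract_first_json` is a hypothetical helper name, not part of the project.

```python
import json

def extract_first_json(text: str):
    """Return the first JSON object found in text, or None if there is none."""
    decoder = json.JSONDecoder()
    pos = text.find("{")
    while pos != -1:
        try:
            # raw_decode parses one JSON value starting at pos and ignores what follows
            obj, _end = decoder.raw_decode(text, pos)
            return obj
        except json.JSONDecodeError:
            # Not a valid object at this brace; try the next one
            pos = text.find("{", pos + 1)
    return None

sample = '{"tool": "get_weather", "args": {"city": "Tokyo"}}\n\n### Instruction:\nNext task'
print(extract_first_json(sample))
```

Report the strict and lenient pass rates separately. A lenient parser can hide exactly the formatting errors that fine-tuning is meant to remove.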

## Task 6: Save Model

In [None]:
# Save the fine-tuned adapter
model.save_pretrained("./json_specialist_adapter")
tokenizer.save_pretrained("./json_specialist_adapter")

print("✅ Adapter and tokenizer saved to ./json_specialist_adapter")
print("\nTo load later:")
print("  from peft import PeftModel")
print("  model = AutoModelForCausalLM.from_pretrained(...)")
print("  model = PeftModel.from_pretrained(model, './json_specialist_adapter')")
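
The loading recipe printed above can be extended to merge the adapter back into the base model, which is the "merge" benefit mentioned at the start of this project. The sketch below is one possible way to do it and rests on an assumption: the base model is reloaded in fp16 rather than 4-bit for the merge, which needs considerably more memory than training did. `load_merged_model` is a hypothetical helper name. `PeftModel.from_pretrained` and `merge_and_unload` are PEFT calls.

```python
def load_merged_model(base_model_name: str, adapter_dir: str):
    """Load the base model in fp16, attach the saved adapter, and merge it in."""
    # Imports are local so the function can be defined without the GPU stack installed
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM

    # ASSUMPTION: merge into an fp16 copy of the base weights, not the 4-bit copy
    base = AutoModelForCausalLM.from_pretrained(
        base_model_name,
        torch_dtype=torch.float16,
        device_map="auto",
    )
    model = PeftModel.from_pretrained(base, adapter_dir)
    # Fold the LoRA weights into the base layers and return a plain transformers model
    return model.merge_and_unload()

# Example call (downloads the base model, so it is left commented out):
# merged = load_merged_model("mistralai/Mistral-7B-v0.1", "./json_specialist_adapter")
# merged.save_pretrained("./json_specialist_merged")
```

A merged model no longer needs PEFT at inference time, at the cost of storing a full copy of the weights.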

## 🎓 Key Takeaways

### QLoRA Benefits:
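- **Memory:** A 4-bit 7B model fits in about 6GB of VRAM, versus about 28GB just to hold the weights in fp32 for full fine-tuning
- **Speed:** Faster than full fine-tuning, since only the adapter weights are updated
- **Quality:** Minimal performance loss relative to full fine-tuning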
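
The memory figures above follow from simple arithmetic on bits per parameter. The sketch below covers the weights only and uses a nominal 7 billion parameters, which is an assumption; real checkpoints differ slightly. Activations, adapter weights, optimizer state, and quantization metadata all come on top, which is presumably where the gap between the 4-bit weight size and the 6GB figure comes from.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Memory needed to store the weights alone, in gigabytes (1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

NUM_PARAMS = 7e9  # ASSUMED nominal "7B"; not the exact count of any checkpoint

for label, bits in [("fp32", 32), ("fp16", 16), ("4-bit", 4)]:
    print(f"{label:>5}: {weight_memory_gb(NUM_PARAMS, bits):.1f} GB")
```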

### When to Fine-Tune:
- ✅ Specific output format (JSON, code, etc.)
- ✅ Domain-specific language
- ✅ Consistent behavior patterns
- ❌ General knowledge (use RAG instead)

### Production Tips:
- Collect real failure cases for training data
- Validate outputs in production
- A/B test base vs fine-tuned
- Monitor for distribution drift
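
For the A/B test and the production validation tips above, one simple metric is the fraction of responses that parse as JSON. The sketch below computes it for two sets of outputs. `valid_json_rate` is a hypothetical helper, and the sample strings are made up for illustration; they are not measured results from either model.

```python
import json

def valid_json_rate(outputs: list) -> float:
    """Fraction of outputs that parse as JSON under strict json.loads."""
    if not outputs:
        return 0.0
    valid = 0
    for text in outputs:
        try:
            json.loads(text)
            valid += 1
        except json.JSONDecodeError:
            pass
    return valid / len(outputs)

# HYPOTHETICAL outputs for illustration only, not real model results
variant_a = ['{"tool": "calculator"}', 'Here you go: {"tool": "calculator"}']
variant_b = ['{"tool": "calculator"}', '{"tool": "get_weather"}']

print(f"Variant A valid rate: {valid_json_rate(variant_a):.0%}")
print(f"Variant B valid rate: {valid_json_rate(variant_b):.0%}")
```

Tracking the same rate over time on production traffic is one way to notice distribution drift: a falling rate suggests the incoming instructions no longer resemble the training data.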

### Next: Deployment!
Module 8 wraps this in a production API.