# Flow SDK Fine-tuning Notebook

Fine-tune large language models using Flow SDK with parameter-efficient methods like LoRA and QLoRA.

This notebook covers:
- Quick LoRA fine-tuning
- QLoRA for memory efficiency
- Custom dataset preparation
- Multi-GPU fine-tuning
- Evaluation and deployment

## Setup

First, let's install and configure the Flow SDK:

In [None]:
# Install Flow SDK
!pip install flow-sdk --upgrade

# Import required libraries
import flow
from flow import TaskConfig
import json
import time
import pandas as pd
from typing import Dict, List
import matplotlib.pyplot as plt

In [None]:
# Initialize Flow client
flow_client = flow.Flow()

# Check authentication
print("✓ Flow SDK initialized")
print(f"API Endpoint: {flow_client.api_endpoint}")

# Set HuggingFace token if you have one
import os
HF_TOKEN = os.environ.get('HF_TOKEN', '')
if HF_TOKEN:
    print("✓ HuggingFace token detected")
else:
    print("⚠️  No HF_TOKEN found. Some models may not be accessible.")

## 1. Quick LoRA Fine-tuning

Fine-tune Mistral-7B on a small custom dataset using LoRA adapters on a 4-bit quantized base model (the QLoRA setup):

In [None]:
# Create sample training data
training_data = [
    {
        "instruction": "What is machine learning?",
        "input": "",
        "output": "Machine learning is a subset of artificial intelligence that enables systems to learn and improve from experience without being explicitly programmed."
    },
    {
        "instruction": "Explain neural networks",
        "input": "in simple terms",
        "output": "Neural networks are computing systems inspired by biological brains, consisting of interconnected nodes that process information in layers."
    },
    {
        "instruction": "What are the benefits of cloud computing?",
        "input": "",
        "output": "Cloud computing offers scalability, cost-efficiency, accessibility from anywhere, automatic updates, and reduced IT maintenance."
    }
]

# Save as JSON
with open("/tmp/training_data.json", "w") as f:
    json.dump(training_data, f, indent=2)

print(f"✓ Created training dataset with {len(training_data)} examples")
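
Before uploading a dataset, it can help to check that every record has the fields the training script reads. The following is a minimal sketch, assuming the `instruction` / `input` / `output` schema used above; `validate_records` and `REQUIRED_KEYS` are hypothetical helpers, not part of the Flow SDK.

```python
REQUIRED_KEYS = ("instruction", "input", "output")  # field names used by the records above

def validate_records(records):
    """Return a list of (index, problem) tuples; an empty list means no problems were found."""
    problems = []
    for i, rec in enumerate(records):
        missing = [k for k in REQUIRED_KEYS if k not in rec]
        if missing:
            problems.append((i, f"missing keys: {missing}"))
            continue
        if not rec["instruction"].strip():
            problems.append((i, "empty instruction"))
        if not rec["output"].strip():
            problems.append((i, "empty output"))
    return problems

# Illustrative records: the second one lacks the "input" field.
sample = [
    {"instruction": "What is machine learning?", "input": "", "output": "A subset of AI."},
    {"instruction": "Explain neural networks", "output": "Layers of connected nodes."},
]
print(validate_records(sample))
```

An empty `input` string is allowed here because the formatting function in the training script treats it as "no input".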

In [None]:
# LoRA fine-tuning script (raw string so "\n" escapes reach the script file unchanged)
lora_script = r"""
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from datasets import load_dataset
from peft import LoraConfig, get_peft_model, TaskType
from trl import SFTTrainer
import json
import os

# Configuration
model_name = "mistralai/Mistral-7B-Instruct-v0.1"
output_dir = "./mistral-lora-finetuned"

print(f"Loading model: {model_name}")

# Load the base model in 4-bit; training LoRA adapters on a quantized base is the QLoRA setup
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_4bit=True,
    torch_dtype=torch.float16,
    device_map="auto",
    token=os.environ.get("HF_TOKEN")
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    token=os.environ.get("HF_TOKEN")
)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

# LoRA configuration
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,  # LoRA rank
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    bias="none"
)

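# Prepare the 4-bit model for training, then apply LoRA
from peft import prepare_model_for_kbit_training
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, lora_config)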
model.print_trainable_parameters()

# Load dataset
with open('/workspace/training_data.json', 'r') as f:
    data = json.load(f)

# Format data for training
def format_instruction(sample):
    instruction = sample['instruction']
    input_text = sample.get('input', '')
    output = sample['output']
    
    if input_text:
        text = f"### Instruction: {instruction}\n### Input: {input_text}\n### Response: {output}"
    else:
        text = f"### Instruction: {instruction}\n### Response: {output}"
    
    return {"text": text}

# Create dataset
from datasets import Dataset
dataset = Dataset.from_list([format_instruction(item) for item in data])

print(f"Dataset size: {len(dataset)} examples")

# Training arguments
training_args = TrainingArguments(
    output_dir=output_dir,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    num_train_epochs=3,
    learning_rate=2e-4,
    fp16=True,
    save_strategy="epoch",
    logging_steps=1,
    report_to="none",
    save_total_limit=2,
    warmup_ratio=0.1,
)

# Create trainer
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    tokenizer=tokenizer,
    args=training_args,
    dataset_text_field="text",
    max_seq_length=512,
    packing=False,
)

# Train
print("Starting training...")
trainer.train()

# Save LoRA weights
trainer.save_model(f"{output_dir}/final")
print(f"\nLoRA weights saved to {output_dir}/final")

# Test the fine-tuned model
print("\nTesting fine-tuned model:")
test_prompt = "### Instruction: What is deep learning?\n### Response:"
inputs = tokenizer(test_prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    # do_sample=True is required for temperature to take effect
    outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.7)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
"""

# Save script
with open("/tmp/lora_finetune.py", "w") as f:
    f.write(lora_script)

print("✓ LoRA fine-tuning script created")
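
To get a feel for what the rank `r` controls, the sketch below estimates how many adapter parameters LoRA adds for the four attention projections targeted in the script. The layer shapes are assumptions about Mistral-7B (hidden size 4096, 32 decoder layers, key/value width 1024) and are not stated anywhere in this notebook; `lora_param_count` is a hypothetical helper.

```python
def lora_param_count(shapes, rank):
    """LoRA adds two matrices per adapted layer: A (rank x d_in) and B (d_out x rank)."""
    return sum(rank * (d_in + d_out) for d_in, d_out in shapes)

# Assumed Mistral-7B shapes (not taken from this notebook).
HIDDEN, KV_DIM, LAYERS = 4096, 1024, 32
per_layer = [
    (HIDDEN, HIDDEN),  # q_proj
    (HIDDEN, KV_DIM),  # k_proj
    (HIDDEN, KV_DIM),  # v_proj
    (HIDDEN, HIDDEN),  # o_proj
]

for r in (8, 16, 32, 64):
    total = LAYERS * lora_param_count(per_layer, r)
    print(f"r={r:>2}: {total:,} trainable adapter parameters")
```

The count grows linearly with the rank, so doubling `r` doubles the adapter size. The figure reported by `model.print_trainable_parameters()` on the actual model is the authoritative number.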

In [None]:
# Run LoRA fine-tuning
lora_config = TaskConfig(
    name="mistral-lora-finetuning",
    command="""
    pip install transformers accelerate peft datasets bitsandbytes trl
    python /workspace/lora_finetune.py
    """,
    instance_type="a100",  # Single A100 80GB
    upload_files={
        "/tmp/training_data.json": "training_data.json",
        "/tmp/lora_finetune.py": "lora_finetune.py"
    },
    download_patterns=["mistral-lora-finetuned/*"],
    environment={"HF_TOKEN": HF_TOKEN} if HF_TOKEN else {},
    max_price_per_hour=10.00,
    max_run_time_hours=2
)

print("🚀 Starting LoRA fine-tuning...")
lora_task = flow_client.run(lora_config)
print(f"Task ID: {lora_task.task_id}")
print(f"Status: {lora_task.status}")

## 2. Custom Dataset Fine-tuning

Fine-tune on a larger custom dataset:

In [None]:
# Generate a larger synthetic dataset
import random

# Templates for generating training data
templates = [
    {
        "instruction": "Summarize the following text",
        "topics": ["AI advancements", "climate change", "space exploration", "medical breakthroughs"],
        "generate_output": lambda topic: f"This text discusses {topic}, highlighting recent developments and future implications."
    },
    {
        "instruction": "Translate to a professional tone",
        "inputs": ["hey whats up", "thx for ur help", "gonna be late sry"],
        "outputs": ["Hello, how are you?", "Thank you for your assistance.", "I apologize, but I will be arriving late."]
    },
    {
        "instruction": "Generate a Python function that",
        "tasks": ["sorts a list", "finds prime numbers", "calculates fibonacci"],
        "generate_output": lambda task: f"def function():\n    # Implementation for {task}\n    pass"
    }
]

# Generate dataset
large_dataset = []
for _ in range(100):
    template = random.choice(templates)
    
    if "topics" in template:
        topic = random.choice(template["topics"])
        large_dataset.append({
            "instruction": template["instruction"],
            "input": f"Recent research shows progress in {topic}...",
            "output": template["generate_output"](topic)
        })
    elif "inputs" in template:
        idx = random.randint(0, len(template["inputs"]) - 1)
        large_dataset.append({
            "instruction": template["instruction"],
            "input": template["inputs"][idx],
            "output": template["outputs"][idx]
        })
    elif "tasks" in template:
        task = random.choice(template["tasks"])
        large_dataset.append({
            "instruction": template["instruction"] + " " + task,
            "input": "",
            "output": template["generate_output"](task)
        })

# Save dataset
with open("/tmp/large_training_data.json", "w") as f:
    json.dump(large_dataset, f, indent=2)

print(f"✓ Generated large dataset with {len(large_dataset)} examples")

# Show sample
print("\nSample data:")
for i in range(3):
    sample = large_dataset[i]
    print(f"\nExample {i+1}:")
    print(f"Instruction: {sample['instruction']}")
    if sample['input']:
        print(f"Input: {sample['input']}")
    print(f"Output: {sample['output'][:100]}...")
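
The templates above can only produce 10 distinct records (4 topics, 3 tone pairs, 3 coding tasks), so 100 random draws contain many exact repeats. If such data is later split at random, identical examples can land in both the training and validation sets. A minimal sketch of removing exact duplicates first, using a hypothetical `deduplicate` helper:

```python
import json

def deduplicate(records):
    """Drop exact duplicate records while preserving first-seen order."""
    seen = set()
    unique = []
    for rec in records:
        key = json.dumps(rec, sort_keys=True)  # stable, hashable fingerprint of the record
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

# Illustrative records in the same schema as the generated dataset.
demo = [
    {"instruction": "Translate to a professional tone", "input": "hey whats up", "output": "Hello, how are you?"},
    {"instruction": "Translate to a professional tone", "input": "hey whats up", "output": "Hello, how are you?"},
    {"instruction": "Generate a Python function that sorts a list", "input": "", "output": "def function():\n    pass"},
]
unique = deduplicate(demo)
print(f"{len(demo)} records, {len(unique)} unique")
```

For a real dataset, near-duplicates may matter as well; this sketch only handles exact matches.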

In [None]:
# Advanced fine-tuning with validation (raw string so "\n" escapes reach the script file unchanged)
advanced_script = r"""
import torch
from transformers import (
    AutoModelForCausalLM, 
    AutoTokenizer, 
    TrainingArguments,
    DataCollatorForLanguageModeling
)
from datasets import load_dataset, Dataset
from peft import LoraConfig, get_peft_model, TaskType, PeftModel
from trl import SFTTrainer
import json
import numpy as np
from sklearn.model_selection import train_test_split

# Load data
with open('/workspace/large_training_data.json', 'r') as f:
    data = json.load(f)

# Split into train/validation
train_data, val_data = train_test_split(data, test_size=0.1, random_state=42)
print(f"Train: {len(train_data)}, Validation: {len(val_data)}")

# Model configuration
model_name = "microsoft/phi-2"  # Smaller model for faster iteration
output_dir = "./phi2-custom-finetuned"

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)

tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True
)
tokenizer.pad_token = tokenizer.eos_token

# Enhanced LoRA configuration
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=32,  # Higher rank for more capacity
    lora_alpha=64,
    lora_dropout=0.05,
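    target_modules=["q_proj", "k_proj", "v_proj", "dense", "fc1", "fc2"],  # Phi-2 names in the transformers implementation; older remote-code checkpoints used "Wqkv"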
    bias="none"
)

# Apply LoRA
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# Format datasets
def format_instruction(sample):
    instruction = sample['instruction']
    input_text = sample.get('input', '')
    output = sample['output']
    
    # Use a chat-like format
    if input_text:
        text = f"User: {instruction}\nInput: {input_text}\nAssistant: {output}"
    else:
        text = f"User: {instruction}\nAssistant: {output}"
    
    return {"text": text}

# Create datasets
train_dataset = Dataset.from_list([format_instruction(item) for item in train_data])
val_dataset = Dataset.from_list([format_instruction(item) for item in val_data])

# Advanced training arguments
training_args = TrainingArguments(
    output_dir=output_dir,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=8,
    num_train_epochs=3,
    learning_rate=5e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    fp16=True,
    logging_steps=5,
    evaluation_strategy="steps",
    eval_steps=20,
    save_strategy="steps",
    save_steps=40,  # must be a multiple of eval_steps when load_best_model_at_end=True
    save_total_limit=3,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    report_to="none",
    push_to_hub=False,
)

# Perplexity is exp(mean cross-entropy loss). The trainer already reports
# eval_loss, so derive perplexity from that value rather than from raw logits,
# e.g. perplexity_from_loss(trainer.evaluate()["eval_loss"]).
def perplexity_from_loss(eval_loss):
    return float(np.exp(eval_loss))

# Create trainer with validation
trainer = SFTTrainer(
    model=model,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    tokenizer=tokenizer,
    args=training_args,
    dataset_text_field="text",
    max_seq_length=512,
    packing=True,  # Enable packing for efficiency
)

# Train
print("\nStarting training with validation...")
train_result = trainer.train()

# Save model and metrics
trainer.save_model(f"{output_dir}/final")
trainer.save_state()

# Save training metrics
metrics = train_result.metrics
with open(f"{output_dir}/training_metrics.json", "w") as f:
    json.dump(metrics, f, indent=2)

print(f"\nTraining complete!")
print(f"Final train loss: {metrics.get('train_loss', 'N/A')}")
print(f"Model saved to {output_dir}/final")

# Test the model
print("\nTesting fine-tuned model on new prompts:")
test_prompts = [
    "User: What is quantum computing?\nAssistant:",
    "User: Translate to a professional tone\nInput: gonna grab lunch brb\nAssistant:",
    "User: Generate a Python function that reverses a string\nAssistant:"
]

for prompt in test_prompts:
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs, 
            max_new_tokens=50, 
            temperature=0.7,
            do_sample=True,
            top_p=0.9
        )
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    print(f"\n{response}")
    print("-" * 50)
"""

# Save script
with open("/tmp/advanced_finetune.py", "w") as f:
    f.write(advanced_script)

print("✓ Advanced fine-tuning script created")
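
The step-based settings in the script (`logging_steps`, `eval_steps`, `save_steps`) count optimizer updates, not examples, so it is worth estimating how many updates a run will actually perform. The sketch below is an approximation under stated assumptions: one GPU, no packing, and ceiling rounding (the exact rounding differs between library versions). `optimizer_steps` is a hypothetical helper.

```python
import math

def optimizer_steps(num_examples, per_device_batch, grad_accum, num_gpus=1, epochs=1):
    """Approximate number of optimizer updates for a training run."""
    effective_batch = per_device_batch * grad_accum * num_gpus
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return effective_batch, steps_per_epoch * epochs

# 100 generated examples with a 10% validation split leave 90 for training.
effective_batch, total_steps = optimizer_steps(90, per_device_batch=2, grad_accum=8, epochs=3)
print(f"effective batch size: {effective_batch}, total optimizer steps: ~{total_steps}")
```

With these numbers the run finishes in fewer than 20 updates, so an `eval_steps=20` schedule would never trigger an evaluation on this demo dataset. Packing shortens the run further because several short examples share one sequence. For small datasets, epoch-based evaluation or a smaller step interval may be a better fit.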

In [None]:
# Run advanced fine-tuning
advanced_config = TaskConfig(
    name="phi2-advanced-finetuning",
    command="""
    pip install transformers accelerate peft datasets bitsandbytes trl scikit-learn
    python /workspace/advanced_finetune.py
    """,
    instance_type="a100",
    upload_files={
        "/tmp/large_training_data.json": "large_training_data.json",
        "/tmp/advanced_finetune.py": "advanced_finetune.py"
    },
    download_patterns=["phi2-custom-finetuned/*"],
    max_price_per_hour=10.00,
    max_run_time_hours=3
)

print("🚀 Starting advanced fine-tuning with validation...")
advanced_task = flow_client.run(advanced_config)
print(f"Task ID: {advanced_task.task_id}")

## 3. Multi-GPU Fine-tuning

Scale up fine-tuning with distributed training:

In [None]:
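# Distributed fine-tuning configuration (template only: accelerate_config.yaml and
# finetune_distributed.py are not created in this notebook and must be supplied)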
distributed_config = TaskConfig(
    name="distributed-llama-finetuning",
    command="""
    pip install transformers accelerate peft datasets deepspeed
    
    # Create DeepSpeed config
    cat > ds_config.json << 'EOF'
    {
        "fp16": {
            "enabled": true
        },
        "zero_optimization": {
            "stage": 3,
            "offload_optimizer": {
                "device": "cpu",
                "pin_memory": true
            },
            "offload_param": {
                "device": "cpu",
                "pin_memory": true
            },
            "overlap_comm": true,
            "contiguous_gradients": true,
            "sub_group_size": 1e9
        },
        "gradient_accumulation_steps": 4,
        "train_batch_size": "auto",
        "train_micro_batch_size_per_gpu": "auto"
    }
EOF
    # The heredoc terminator above must start at column 0 for the shell to recognize it

    # Run distributed training
    accelerate launch \
        --config_file accelerate_config.yaml \
        --num_processes 4 \
        --num_machines 1 \
        --mixed_precision fp16 \
        --deepspeed_config_file ds_config.json \
        finetune_distributed.py
    """,
    instance_type="4xa100",  # 4x A100 for distributed training
    environment={"HF_TOKEN": HF_TOKEN} if HF_TOKEN else {},
    max_price_per_hour=40.00,
    max_run_time_hours=6
)

print("🚀 Distributed fine-tuning configuration created")
print("This would fine-tune LLaMA-13B across 4x A100 GPUs")
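
A rough way to see why ZeRO stage 3 matters for a 13B model is to estimate the model-state memory per GPU. The sketch below assumes full fine-tuning with mixed-precision Adam (16 bytes per parameter) and even partitioning across GPUs; it ignores activations, temporary buffers, and the CPU offload enabled in the config above. `zero3_model_state_gb` is a hypothetical helper.

```python
def zero3_model_state_gb(num_params, num_gpus):
    """Per-GPU model-state memory for mixed-precision Adam with ZeRO stage 3.

    Counts 2 bytes (fp16 weights) + 2 bytes (fp16 gradients) + 12 bytes
    (fp32 weights, momentum, variance) per parameter, split evenly across GPUs.
    """
    bytes_per_param = 2 + 2 + 12
    return num_params * bytes_per_param / num_gpus / 1e9

print(f"1 GPU : {zero3_model_state_gb(13e9, 1):.0f} GB")
print(f"4 GPUs: {zero3_model_state_gb(13e9, 4):.0f} GB per GPU")
```

Under these assumptions a single 80 GB GPU cannot hold the model states, while four GPUs bring the share per device below 80 GB before activations are counted. With LoRA, only the adapter weights need gradients and optimizer states, so the real requirement would be considerably lower.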

## 4. Fine-tuning Pipeline

Complete pipeline with data preprocessing, training, and evaluation:

In [None]:
# Complete fine-tuning pipeline (raw string so "\n" escapes reach the script file unchanged)
pipeline_script = r"""
import os
import json
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from datasets import Dataset, load_dataset
from peft import LoraConfig, get_peft_model, TaskType
import evaluate
import numpy as np

class FineTuningPipeline:
    def __init__(self, model_name, dataset_name, output_dir):
        self.model_name = model_name
        self.dataset_name = dataset_name
        self.output_dir = output_dir
        
    def prepare_data(self):
        '''Load and preprocess dataset.'''
        print("Loading dataset...")
        
        # Load from HuggingFace or local file
        if os.path.exists(self.dataset_name):
            with open(self.dataset_name, 'r') as f:
                data = json.load(f)
            dataset = Dataset.from_list(data)
        else:
            dataset = load_dataset(self.dataset_name, split="train")
        
        # Split dataset
        split = dataset.train_test_split(test_size=0.1)
        self.train_dataset = split["train"]
        self.eval_dataset = split["test"]
        
        print(f"Train: {len(self.train_dataset)}, Eval: {len(self.eval_dataset)}")
        
    def setup_model(self):
        '''Initialize model with LoRA.'''
        print(f"Loading model: {self.model_name}")
        
        # Model setup
        self.model = AutoModelForCausalLM.from_pretrained(
            self.model_name,
            load_in_8bit=True,
            torch_dtype=torch.float16,
            device_map="auto"
        )
        
        self.tokenizer = AutoTokenizer.from_pretrained(self.model_name)
        self.tokenizer.pad_token = self.tokenizer.eos_token
        
        # LoRA configuration
        lora_config = LoraConfig(
            task_type=TaskType.CAUSAL_LM,
            r=16,
            lora_alpha=32,
            lora_dropout=0.1,
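            target_modules=["c_attn"],  # GPT-2's fused attention projection; Llama/Mistral-style models use ["q_proj", "v_proj"]
            fan_in_fan_out=True  # GPT-2 stores these weights as (fan_in, fan_out)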
        )
        
        self.model = get_peft_model(self.model, lora_config)
        self.model.print_trainable_parameters()
        
    def train(self):
        '''Fine-tune the model.'''
        from transformers import TrainingArguments
        from trl import SFTTrainer
        
        training_args = TrainingArguments(
            output_dir=self.output_dir,
            per_device_train_batch_size=4,
            gradient_accumulation_steps=4,
            num_train_epochs=3,
            learning_rate=2e-4,
            fp16=True,
            save_strategy="epoch",
            evaluation_strategy="epoch",
            logging_steps=10,
            report_to="none",
            load_best_model_at_end=True,
        )
        
        trainer = SFTTrainer(
            model=self.model,
            train_dataset=self.train_dataset,
            eval_dataset=self.eval_dataset,
            tokenizer=self.tokenizer,
            args=training_args,
            dataset_text_field="text",
            max_seq_length=512,
        )
        
        print("Starting training...")
        trainer.train()
        
        # Save model
        trainer.save_model(f"{self.output_dir}/final")
        
    def evaluate(self):
        '''Evaluate the fine-tuned model.'''
        print("\nEvaluating model...")
        
        # Load evaluation metrics
        perplexity = evaluate.load("perplexity")
        
        # Test on evaluation set
        test_texts = self.eval_dataset["text"][:10]
        
        results = []
        for text in test_texts:
            inputs = self.tokenizer(text, return_tensors="pt", max_length=512, truncation=True).to(self.model.device)
            
            with torch.no_grad():
                outputs = self.model(**inputs, labels=inputs["input_ids"])
                loss = outputs.loss.item()
                
            results.append({"text": text[:100], "loss": loss})
        
        avg_loss = np.mean([r["loss"] for r in results])
        print(f"Average loss: {avg_loss:.4f}")
        print(f"Perplexity: {np.exp(avg_loss):.2f}")
        
        # Save evaluation results
        with open(f"{self.output_dir}/evaluation_results.json", "w") as f:
            json.dump({
                "avg_loss": avg_loss,
                "perplexity": np.exp(avg_loss),
                "sample_results": results[:5]
            }, f, indent=2)
        
    def run(self):
        '''Run complete pipeline.'''
        self.prepare_data()
        self.setup_model()
        self.train()
        self.evaluate()
        print(f"\nPipeline complete! Model saved to {self.output_dir}")

# Run pipeline
if __name__ == "__main__":
    pipeline = FineTuningPipeline(
        model_name="gpt2",  # Using smaller model for demo
        dataset_name="imdb",  # or path to local JSON
        output_dir="./pipeline-output"
    )
    pipeline.run()
"""

# Save pipeline script
with open("/tmp/finetune_pipeline.py", "w") as f:
    f.write(pipeline_script)

print("✓ Complete fine-tuning pipeline created")
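
The pipeline's `evaluate` step averages one loss value per sample and exponentiates the result. When samples differ a lot in length, that unweighted average can differ from the corpus-level perplexity, which weights each sample by its token count. A minimal sketch of the weighted version, with illustrative numbers and a hypothetical `perplexity` helper:

```python
import math

def perplexity(losses, token_counts):
    """Perplexity from per-sample mean losses, weighted by each sample's token count."""
    total_tokens = sum(token_counts)
    weighted_loss = sum(l * n for l, n in zip(losses, token_counts)) / total_tokens
    return math.exp(weighted_loss)

losses = [2.0, 3.0]        # illustrative per-sample mean cross-entropy values
token_counts = [300, 100]  # illustrative number of predicted tokens per sample

print(f"unweighted:     {math.exp(sum(losses) / len(losses)):.2f}")
print(f"token-weighted: {perplexity(losses, token_counts):.2f}")
```

Here the long, low-loss sample pulls the weighted perplexity below the unweighted one. Either convention can be reasonable; the important part is to report which one was used.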

## 5. Model Comparison & Evaluation

Compare different fine-tuning approaches:

In [None]:
# Parameter-efficient fine-tuning comparison
methods = {
    "LoRA": {
        "trainable_params": "0.1-1%",
        "memory": "Low",
        "performance": "90-95% of full fine-tuning",
        "training_time": "Fast",
        "use_case": "Most common, general purpose"
    },
    "QLoRA": {
        "trainable_params": "0.1-1%",
        "memory": "Very Low (4-bit)",
        "performance": "85-90% of full fine-tuning",
        "training_time": "Fast",
        "use_case": "Large models on limited GPU"
    },
    "Prefix Tuning": {
        "trainable_params": "0.01-0.1%",
        "memory": "Very Low",
        "performance": "80-90% of full fine-tuning",
        "training_time": "Very Fast",
        "use_case": "Task-specific adaptations"
    },
    "Full Fine-tuning": {
        "trainable_params": "100%",
        "memory": "Very High",
        "performance": "100% (baseline)",
        "training_time": "Slow",
        "use_case": "When maximum performance needed"
    }
}

# Create comparison dataframe
comparison_df = pd.DataFrame(methods).T
print("🔍 Fine-tuning Methods Comparison\n")
print(comparison_df.to_string())

# Visualize comparison
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

# Memory usage comparison
memory_scores = {"LoRA": 2, "QLoRA": 1, "Prefix Tuning": 1, "Full Fine-tuning": 5}
ax1.bar(memory_scores.keys(), memory_scores.values(), color=['green', 'darkgreen', 'darkgreen', 'red'])
ax1.set_title("Relative Memory Usage")
ax1.set_ylabel("Memory (relative)")

# Training time comparison
time_scores = {"LoRA": 2, "QLoRA": 2, "Prefix Tuning": 1, "Full Fine-tuning": 5}
ax2.bar(time_scores.keys(), time_scores.values(), color=['green', 'green', 'darkgreen', 'red'])
ax2.set_title("Relative Training Time")
ax2.set_ylabel("Time (relative)")

plt.tight_layout()
plt.show()
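
The relative memory scores above are qualitative. For the weights alone, a back-of-the-envelope estimate follows directly from the bytes stored per parameter. The sketch below uses a hypothetical `weight_memory_gb` helper and ignores gradients, optimizer states, activations, and quantization overhead.

```python
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "4-bit": 0.5}

def weight_memory_gb(num_params, precision):
    """Memory for the model weights alone, in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9

for precision in BYTES_PER_PARAM:
    print(f"7B weights in {precision:>5}: {weight_memory_gb(7e9, precision):.1f} GB")
```

This is why loading a 7B base model in 4-bit leaves far more GPU memory for activations and adapter training than loading it in fp16.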

## 6. Deploy Fine-tuned Model

Deploy your fine-tuned model for inference:

In [None]:
# Deployment script for fine-tuned model (raw string so "\n" escapes reach the script file unchanged)
deploy_script = r"""
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
from fastapi import FastAPI
from pydantic import BaseModel
import torch
import uvicorn

app = FastAPI()

# Load fine-tuned model
base_model_name = "mistralai/Mistral-7B-Instruct-v0.1"
adapter_path = "/workspace/model/mistral-lora-finetuned/final"

print("Loading base model...")
model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map="auto"
)

print("Loading LoRA adapter...")
model = PeftModel.from_pretrained(model, adapter_path)
model = model.merge_and_unload()  # Merge LoRA weights

tokenizer = AutoTokenizer.from_pretrained(base_model_name)

class GenerateRequest(BaseModel):
    prompt: str
    max_tokens: int = 100
    temperature: float = 0.7

@app.post("/generate")
async def generate(request: GenerateRequest):
    # Format prompt
    formatted_prompt = f"### Instruction: {request.prompt}\n### Response:"
    
    # Tokenize
    inputs = tokenizer(formatted_prompt, return_tensors="pt").to(model.device)
    
    # Generate
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=request.max_tokens,
            temperature=request.temperature,
            do_sample=True,
            top_p=0.9
        )
    
    # Decode
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    response = response.split("### Response:")[-1].strip()
    
    return {"response": response}

@app.get("/health")
async def health():
    return {"status": "healthy", "model": "fine-tuned-mistral"}

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
"""

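# Save the serving script so it can be uploaded with the task
with open("/tmp/serve_finetuned.py", "w") as f:
    f.write(deploy_script)

# Deployment configuration
deploy_config = TaskConfig(
    name="deploy-finetuned-model",
    command="""
    # Assume model artifacts are uploaded
    pip install transformers peft fastapi uvicorn accelerate bitsandbytes
    python /workspace/serve_finetuned.py
    """,
    instance_type="a100",
    ports=[8000],
    # upload_files would also include the model artifacts
    upload_files={"/tmp/serve_finetuned.py": "serve_finetuned.py"},
    max_price_per_hour=10.00,
    max_run_time_hours=24
)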

print("✓ Deployment configuration created")
print("\nTo deploy your fine-tuned model:")
print("1. Download model artifacts from fine-tuning task")
print("2. Upload with deployment task")
print("3. Access at http://<endpoint>:8000")
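
Once the server is reachable, a client only needs to POST JSON that matches the `GenerateRequest` fields (`prompt`, `max_tokens`, `temperature`). The sketch below builds such a request with the standard library. The base URL is a placeholder and `build_generate_request` is a hypothetical helper; the actual send is commented out because it needs a running deployment.

```python
import json
import urllib.request

def build_generate_request(base_url, prompt, max_tokens=100, temperature=0.7):
    """Build a POST request matching the GenerateRequest schema of the serving script."""
    payload = {"prompt": prompt, "max_tokens": max_tokens, "temperature": temperature}
    return urllib.request.Request(
        url=f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# "http://localhost:8000" is a placeholder; substitute the endpoint of your deployment.
request = build_generate_request("http://localhost:8000", "What is deep learning?")
print(request.get_method(), request.full_url)

# Sending the request requires a running server, so it is left commented out:
# with urllib.request.urlopen(request, timeout=60) as resp:
#     print(json.loads(resp.read())["response"])
```

The same payload works with any HTTP client; the `/health` route can be polled first to check that the model has finished loading.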

## 7. Cost Analysis

Analyze fine-tuning costs and optimization strategies:

In [None]:
# Fine-tuning cost estimates
cost_estimates = [
    {
        "model": "Llama-2-7B",
        "method": "LoRA",
        "dataset_size": "1K samples",
        "instance": "a100",
        "time_hours": 1,
        "notes": "Quick experimentation"
    },
    {
        "model": "Llama-2-7B",
        "method": "LoRA",
        "dataset_size": "10K samples",
        "instance": "a100",
        "time_hours": 3,
        "notes": "Production fine-tuning"
    },
    {
        "model": "Llama-2-13B",
        "method": "QLoRA",
        "dataset_size": "10K samples",
        "instance": "a100",
        "time_hours": 5,
        "notes": "Larger model with QLoRA"
    },
    {
        "model": "Llama-2-70B",
        "method": "LoRA",
        "dataset_size": "1K samples",
        "instance": "4xa100",
        "time_hours": 2,
        "notes": "Multi-GPU required"
    },
]

# Create cost analysis table
cost_df = pd.DataFrame(cost_estimates)

print("💰 Fine-tuning Time Estimates\n")
print(cost_df.to_string(index=False))

print("\n📊 Cost Optimization Tips:")
print("1. Use QLoRA for models > 7B parameters")
print("2. Start with small dataset samples for validation")
print("3. Use gradient checkpointing to reduce memory")
print("4. Enable model sharding for very large models")
print("5. Monitor GPU utilization and adjust batch size")
print("\n💡 Remember: Mithril uses dynamic pricing. Check current rates with 'flow instances'")
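
The table above lists run times only. Combined with the `max_price_per_hour` caps used in this notebook's `TaskConfig` examples, it gives a simple upper bound on spend. This is a sketch with a hypothetical `max_cost` helper; actual prices are dynamic and may be lower than the caps.

```python
def max_cost(time_hours, max_price_per_hour):
    """Upper bound on spend: the price cap multiplied by the expected run time."""
    return time_hours * max_price_per_hour

# Price caps taken from the TaskConfig examples in this notebook.
PRICE_CAPS = {"a100": 10.00, "4xa100": 40.00}

runs = [
    ("Llama-2-7B LoRA, 1K samples", "a100", 1),
    ("Llama-2-13B QLoRA, 10K samples", "a100", 5),
    ("Llama-2-70B LoRA, 1K samples", "4xa100", 2),
]
for name, instance, hours in runs:
    print(f"{name}: at most ${max_cost(hours, PRICE_CAPS[instance]):.2f}")
```

Treat these figures as ceilings for budgeting rather than as estimates of what a run will cost.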

## Summary

In this notebook, you learned how to:

1. **Fine-tune with LoRA** - Quick and efficient adaptation
2. **Use QLoRA** - Memory-efficient fine-tuning for large models
3. **Prepare custom datasets** - Format data for instruction tuning
4. **Scale with multi-GPU** - Distributed fine-tuning strategies
5. **Deploy fine-tuned models** - Serve your custom models
6. **Optimize costs** - Choose the right approach for your budget

### Key Takeaways

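- LoRA often reaches roughly 90-95% of full fine-tuning performance, though the gap depends on the task and the rank
- QLoRA can bring models of around 70B parameters within reach of a single 80 GB A100, since 4-bit weights need about 0.5 bytes per parameter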
- Start with small datasets to validate your approach
- Always use a validation set to detect overfitting
- Monitor training metrics to catch issues early

### Next Steps

- Try fine-tuning on your own dataset
- Experiment with different LoRA ranks (r=8, 16, 32, 64)
- Explore advanced techniques like DPO and RLHF
- Deploy your model with the [Inference Notebook](inference.ipynb)

### Resources

- [PEFT Documentation](https://huggingface.co/docs/peft)
- [TRL Documentation](https://huggingface.co/docs/trl)
- [Flow SDK Examples](../../examples/)