# Prometheus Prompt Enhancement - LoRA Fine-Tuning

This notebook fine-tunes Mistral-7B using LoRA on the Prometheus training dataset to enhance prompts with model-specific styles (ChatGPT, Claude, Gemini).

**Dataset**: 1,000 training examples with model-specific enhancements  
**Base Model**: Mistral-7B-Instruct-v0.1  
**Method**: QLoRA (4-bit quantization + LoRA)  
**Expected Time**: ~2 hours on T4 GPU

---

## 📋 Prerequisites

Before running:
1. Enable GPU: **Runtime** → **Change runtime type** → **T4 GPU**
2. Upload `training_dataset.jsonl` to Google Drive at: `/MyDrive/Prometheus/training_data/`
3. Run cells sequentially from top to bottom

---

## 1. Environment Setup

Install compatible packages for Google Colab's current environment.

In [None]:
import sys
import subprocess

print("🔧 Prometheus Fine-Tuning Environment Setup")
print("=" * 80)

# Uninstall ALL conflicting packages
print("\n🗑️  Step 1: Removing old packages...\n")
packages_to_remove = [
    "bitsandbytes", "transformers", "peft", "accelerate", 
    "datasets", "trl", "triton", "torch"
]

for pkg in packages_to_remove:
    subprocess.run(
        [sys.executable, "-m", "pip", "uninstall", "-y", "-q", pkg],
        capture_output=True
    )
    print(f"  ✓ Removed {pkg}")

print("\n📦 Step 2: Installing working package set...\n")

# Install PyTorch with CUDA 12.1 support
print("  → Installing PyTorch 2.5.1+cu121...")
subprocess.check_call([
    sys.executable, "-m", "pip", "install", "-q",
    "torch==2.5.1", 
    "torchvision==0.20.1", 
    "torchaudio==2.5.1",
    "--index-url", "https://download.pytorch.org/whl/cu121"
])
print("  ✓ PyTorch 2.5.1+cu121\n")

# Install compatible ML packages
ml_packages = [
    "transformers==4.46.0",
    "peft==0.13.2",
    "bitsandbytes==0.44.1",
    "accelerate==1.1.1",
    "datasets==3.1.0",
    "trl==0.12.1",
    "scipy",
    "sentencepiece",
    "protobuf"
]

for pkg in ml_packages:
    try:
        print(f"  → Installing {pkg}...")
        subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", pkg])
        print(f"  ✓ {pkg}")
    except subprocess.CalledProcessError:
        print(f"  ✗ Failed: {pkg}")
        # Fall back to the latest release; an unpinned version may not be
        # compatible with the other pinned packages
        base_pkg = pkg.split("==")[0]
        try:
            subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", base_pkg])
            print(f"  ✓ {base_pkg} (fallback)")
        except subprocess.CalledProcessError:
            print(f"  ✗ Could not install {base_pkg}")

print("\n" + "=" * 80)
print("✅ Installation complete!")
print("=" * 80)
print("\n⚠️  CRITICAL: RESTART RUNTIME NOW!")
print("\n📋 Next Steps:")
print("  1. Runtime → Restart runtime")
print("  2. Run Section 2 (Mount Drive & Verify)")
print("  3. Continue from Section 3 onwards\n")

## 2. Mount Google Drive & Verify Environment

**⚠️ Run this after restarting runtime**

In [None]:
import os
import sys

print("🔌 Mounting Google Drive...\n")

try:
    from google.colab import drive
    drive.mount('/content/drive', force_remount=False)
    print("✅ Google Drive mounted successfully\n")
except Exception as e:
    print(f"❌ Failed to mount Google Drive: {e}")
    print("   Please run: drive.mount('/content/drive', force_remount=True)")
    sys.exit(1)

# Import required packages
print("📚 Importing packages...\n")

try:
    import torch
    from transformers import (
        AutoTokenizer,
        AutoModelForCausalLM,
        BitsAndBytesConfig,
        TrainingArguments,
        Trainer
    )
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
    from datasets import load_dataset
    import json
    
    print("✅ All packages imported successfully\n")
    
except ImportError as e:
    print(f"❌ Import error: {e}")
    print("\n🔄 Solution: Restart runtime and re-run from Section 2")
    print("   (Runtime → Restart runtime)")
    sys.exit(1)

# Verify GPU availability
print("🖥️  System Information:")
print(f"  Python version: {sys.version.split()[0]}")
print(f"  PyTorch version: {torch.__version__}")
print(f"  CUDA available: {torch.cuda.is_available()}")

if torch.cuda.is_available():
    print(f"  GPU: {torch.cuda.get_device_name(0)}")
    print(f"  CUDA version: {torch.version.cuda}")
    print(f"  Total VRAM: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")
    print("\n✅ GPU is ready for training\n")
else:
    print("\n❌ GPU not available!")
    print("   Enable GPU: Runtime → Change runtime type → T4 GPU → Save")
    sys.exit(1)

## 3. Configuration

Set paths and hyperparameters.

In [None]:
# ===== PATHS (Adjust if needed) =====
DATASET_PATH = "/content/drive/MyDrive/Prometheus/training_data/training_dataset.jsonl"
OUTPUT_DIR = "/content/drive/MyDrive/Prometheus/models/prometheus-lora"
CHECKPOINT_DIR = "/content/drive/MyDrive/Prometheus/checkpoints"

# ===== MODEL CONFIGURATION =====
MODEL_NAME = "mistralai/Mistral-7B-Instruct-v0.1"

# ===== LORA CONFIGURATION =====
LORA_R = 16  # Rank (higher = more capacity but slower)
LORA_ALPHA = 32  # Scaling factor (typically 2× rank)
LORA_DROPOUT = 0.05
TARGET_MODULES = [
    "q_proj", "k_proj", "v_proj", "o_proj",
    "gate_proj", "up_proj", "down_proj"
]

# ===== TRAINING HYPERPARAMETERS =====
BATCH_SIZE = 4
GRADIENT_ACCUMULATION_STEPS = 4  # Effective batch size = 16
LEARNING_RATE = 2e-4
NUM_EPOCHS = 3
MAX_SEQ_LENGTH = 512
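# With ~900 training examples and an effective batch of 16, the run has only
# ~168 optimizer steps, so 100 warmup steps cover about 60% of training
WARMUP_STEPS = 100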

# Print a summary of the configuration
print("✅ Configuration loaded\n")
print("📊 Training Configuration:")
print(f"  Base Model: {MODEL_NAME}")
print(f"  LoRA Rank: {LORA_R}")
print(f"  Batch Size: {BATCH_SIZE}")
print(f"  Effective Batch: {BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS}")
print(f"  Learning Rate: {LEARNING_RATE}")
print(f"  Epochs: {NUM_EPOCHS}")
print(f"  Max Sequence Length: {MAX_SEQ_LENGTH}")

# Check if dataset exists
if not os.path.exists(DATASET_PATH):
    print(f"\n❌ Dataset not found: {DATASET_PATH}")
    print("\n📤 Please upload your training_dataset.jsonl to:")
    print(f"   {os.path.dirname(DATASET_PATH)}/")
    print("\n   In Google Drive web interface, create the folder structure if needed.")
else:
    file_size = os.path.getsize(DATASET_PATH) / (1024 * 1024)
    print(f"\n✅ Dataset found: {file_size:.2f} MB")

# Create output directories
for directory in [OUTPUT_DIR, CHECKPOINT_DIR]:
    os.makedirs(directory, exist_ok=True)
    print(f"✅ Directory ready: {directory}")
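
The hyperparameters above fix the length of the run before any model is loaded. The sketch below works through that arithmetic for the stated 1,000-example dataset and the 90/10 split used in Section 4. The helper name `plan_training` is hypothetical, and it assumes a single GPU and that the optimizer steps once per `GRADIENT_ACCUMULATION_STEPS` micro-batches.

```python
import math


def plan_training(num_examples, batch_size, grad_accum, epochs, val_fraction=0.1):
    """Return (train_examples, steps_per_epoch, total_steps) for one GPU."""
    # Hold out the validation share first (90/10 split by default).
    val_examples = round(num_examples * val_fraction)
    train_examples = num_examples - val_examples
    # One micro-batch per forward/backward pass.
    micro_batches = math.ceil(train_examples / batch_size)
    # One optimizer step per `grad_accum` micro-batches.
    steps_per_epoch = max(micro_batches // grad_accum, 1)
    return train_examples, steps_per_epoch, steps_per_epoch * epochs


print(plan_training(num_examples=1000, batch_size=4, grad_accum=4, epochs=3))
```

With the values from this section the run comes out at 56 optimizer steps per epoch and 168 in total, which is useful context when choosing `WARMUP_STEPS`, `eval_steps`, and `save_steps`.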

## 4. Load and Prepare Dataset

Load the Prometheus training dataset with error handling.

In [None]:
print("📂 Loading dataset...\n")

try:
    # Load dataset
    dataset = load_dataset('json', data_files=DATASET_PATH, split='train')
    print(f"✅ Loaded {len(dataset)} examples\n")
    
    # Verify dataset structure
    required_fields = ['input_prompt', 'enhanced_prompt', 'target_model']
    sample = dataset[0]
    
    missing_fields = [field for field in required_fields if field not in sample]
    if missing_fields:
        raise ValueError(f"Dataset missing required fields: {missing_fields}")
    
    print("✅ Dataset structure validated\n")
    print("📋 Dataset fields:", list(sample.keys()))
    
    # Split into train/validation (90/10)
    dataset = dataset.train_test_split(test_size=0.1, seed=42)
    train_dataset = dataset['train']
    eval_dataset = dataset['test']
    
    print(f"\n📊 Dataset Split:")
    print(f"  Training examples: {len(train_dataset)}")
    print(f"  Validation examples: {len(eval_dataset)}")
    
    # Show sample
    print(f"\n📝 Sample entry:")
    print(json.dumps({
        'input_prompt': train_dataset[0]['input_prompt'][:100] + '...',
        'enhanced_prompt': train_dataset[0]['enhanced_prompt'][:100] + '...',
        'target_model': train_dataset[0]['target_model']
    }, indent=2))
    
    # Count target model distribution
    from collections import Counter
    model_counts = Counter(train_dataset['target_model'])
    print(f"\n📊 Target Model Distribution:")
    for model, count in sorted(model_counts.items()):
        print(f"  {model}: {count} examples ({count/len(train_dataset)*100:.1f}%)")
    
except FileNotFoundError:
    print(f"❌ Dataset file not found: {DATASET_PATH}")
    print("\n   Please upload training_dataset.jsonl to Google Drive")
    sys.exit(1)
    
except Exception as e:
    print(f"❌ Error loading dataset: {e}")
    print(f"\n   Error type: {type(e).__name__}")
    import traceback
    traceback.print_exc()
    sys.exit(1)

## 5. Format Data for Instruction Tuning

Convert dataset to Mistral's instruction format.

In [None]:
print("🔄 Formatting dataset for instruction tuning...\n")

def format_instruction(example):
    """
    Format training example as Mistral instruction-following conversation.
    
    Returns {"text": None} for examples that cannot be formatted so they
    can be filtered out after mapping.
    """
    try:
        target_model = example.get('target_model', 'ChatGPT')
        input_prompt = example.get('input_prompt', '')
        enhanced_prompt = example.get('enhanced_prompt', '')
        
        if not input_prompt or not enhanced_prompt:
            # Mark examples with missing prompts for removal; map() expects a
            # dict with the same keys for every example, not a bare None
            return {"text": None}
        
        # Mistral instruction template
        instruction = f"""<s>[INST] You are Prometheus, an AI assistant specialized in enhancing prompts for {target_model}.

Given a user's initial prompt, enhance it following {target_model}'s best practices while preserving the original intent.

User's prompt: {input_prompt} [/INST]

{enhanced_prompt}</s>"""
        
        return {"text": instruction}
        
    except Exception as e:
        print(f"⚠️  Error formatting example: {e}")
        return {"text": None}

try:
    # Apply formatting
    train_dataset = train_dataset.map(format_instruction, remove_columns=train_dataset.column_names)
    eval_dataset = eval_dataset.map(format_instruction, remove_columns=eval_dataset.column_names)
    
    # Filter out None values (failed formatting)
    train_dataset = train_dataset.filter(lambda x: x['text'] is not None)
    eval_dataset = eval_dataset.filter(lambda x: x['text'] is not None)
    
    print(f"✅ Dataset formatting complete")
    print(f"  Training examples: {len(train_dataset)}")
    print(f"  Validation examples: {len(eval_dataset)}")
    
    # Show formatted sample
    print(f"\n📝 Sample formatted instruction (first 600 chars):")
    print(train_dataset[0]['text'][:600] + "...\n")
    
    # Check average length
    avg_length = sum(len(ex['text']) for ex in train_dataset) / len(train_dataset)
    print(f"📏 Average instruction length: {avg_length:.0f} characters")
    
    if avg_length > MAX_SEQ_LENGTH * 4:  # Rough estimate (1 token ≈ 4 chars)
        print(f"\n⚠️  Warning: Average length may exceed MAX_SEQ_LENGTH={MAX_SEQ_LENGTH}")
        print(f"   Consider increasing MAX_SEQ_LENGTH or truncating prompts")
    
except Exception as e:
    print(f"❌ Error formatting dataset: {e}")
    import traceback
    traceback.print_exc()
    sys.exit(1)
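
The length check above compares only the average against the budget, which can hide a long tail of examples that will be truncated. A minimal sketch of a per-example check, using the same rough heuristic of about 4 characters per token; the helper name `share_over_budget` and the sample strings are illustrative and not part of the notebook.

```python
def share_over_budget(texts, max_tokens, chars_per_token=4):
    """Fraction of texts whose estimated token count exceeds max_tokens."""
    if not texts:
        return 0.0
    # Convert the token budget into a character budget with the heuristic.
    budget_chars = max_tokens * chars_per_token
    over = sum(1 for text in texts if len(text) > budget_chars)
    return over / len(texts)


# Four fake "formatted instructions" of different lengths.
samples = ["a" * 100, "b" * 3000, "c" * 1500, "d" * 2500]
print(share_over_budget(samples, max_tokens=512))
```

In the notebook this could be called as `share_over_budget(train_dataset['text'], MAX_SEQ_LENGTH)`. The exact token counts are only known after Section 8, so treat the result as an early warning rather than a measurement.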

## 6. Load Model with 4-bit Quantization

Load Mistral-7B with QLoRA configuration. **This takes 2-3 minutes.**

In [None]:
print("🤖 Loading Mistral-7B model with 4-bit quantization...\n")
print("⏳ This downloads the full fp16 checkpoint (about 14.5 GB) and quantizes it to 4-bit while loading...\n")

try:
    # Configure 4-bit quantization
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type="nf4",
        # float16 rather than bfloat16: the T4 has no native bf16 support,
        # and training below runs with fp16=True
        bnb_4bit_compute_dtype=torch.float16
    )
    
    print("✅ Quantization config created\n")
    
    # Load base model
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_NAME,
        quantization_config=bnb_config,
        device_map="auto",
        trust_remote_code=True,
        torch_dtype=torch.float16,
        use_cache=False  # Required for gradient checkpointing
    )
    
    print("\n✅ Model loaded successfully\n")
    
    # Enable gradient checkpointing to save memory
    model.gradient_checkpointing_enable()
    print("✅ Gradient checkpointing enabled\n")
    
    # Prepare for k-bit training
    model = prepare_model_for_kbit_training(model)
    print("✅ Model prepared for QLoRA training\n")
    
    # Show model info
    print("📊 Model Information:")
    print(f"  Memory footprint: {model.get_memory_footprint() / 1e9:.2f} GB")
    print(f"  Device map: {model.hf_device_map}")
    
    # Check if model fits in VRAM
    if torch.cuda.is_available():
        allocated = torch.cuda.memory_allocated(0) / 1e9
        reserved = torch.cuda.memory_reserved(0) / 1e9
        total = torch.cuda.get_device_properties(0).total_memory / 1e9
        
        print(f"\n💾 GPU Memory:")
        print(f"  Allocated: {allocated:.2f} GB")
        print(f"  Reserved: {reserved:.2f} GB")
        print(f"  Total: {total:.2f} GB")
        print(f"  Free: {total - reserved:.2f} GB")
        
        if reserved > total * 0.9:
            print(f"\n⚠️  Warning: GPU memory usage is high ({reserved/total*100:.1f}%)")
            print(f"   Training may fail. Consider reducing BATCH_SIZE.")
    
except torch.cuda.OutOfMemoryError:
    print("\n❌ GPU Out of Memory!")
    print("\n🔧 Solutions:")
    print("   1. Restart runtime to clear GPU memory")
    print("   2. Reduce BATCH_SIZE to 2 in Section 3")
    print("   3. Reduce MAX_SEQ_LENGTH to 384 in Section 3")
    sys.exit(1)
    
except Exception as e:
    print(f"\n❌ Error loading model: {e}")
    print(f"   Error type: {type(e).__name__}")
    import traceback
    traceback.print_exc()
    sys.exit(1)
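
A back-of-the-envelope estimate shows why 4-bit loading matters on a 16 GB T4. The sketch below only counts the storage for the weights; it assumes roughly 7.24 billion parameters for Mistral-7B and ignores quantization constants, layers kept in higher precision, activations, and optimizer state, so real usage is higher. The function name is illustrative.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate storage for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


MISTRAL_7B_PARAMS = 7.24e9  # approximate parameter count (assumption)

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: ~{weight_memory_gb(MISTRAL_7B_PARAMS, bits):.2f} GB")
```

The memory footprint printed by the cell above is the number to trust; this estimate is only a sanity check for it.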

## 7. Configure LoRA Adapters

Add LoRA adapters for parameter-efficient fine-tuning.

In [None]:
print("🔧 Configuring LoRA adapters...\n")

try:
    # Create LoRA configuration
    lora_config = LoraConfig(
        r=LORA_R,
        lora_alpha=LORA_ALPHA,
        target_modules=TARGET_MODULES,
        lora_dropout=LORA_DROPOUT,
        bias="none",
        task_type="CAUSAL_LM"
    )
    
    print("✅ LoRA config created\n")
    
    # Add LoRA adapters to model
    model = get_peft_model(model, lora_config)
    
    print("✅ LoRA adapters added to model\n")
    
    # Print trainable parameters
    print("📊 Model Parameters:")
    model.print_trainable_parameters()
    
    # Calculate efficiency with PEFT's own counter: 4-bit weights are stored
    # packed, so summing p.numel() would undercount the frozen parameters
    trainable_params, all_params = model.get_nb_trainable_parameters()
    trainable_pct = 100 * trainable_params / all_params
    
    print(f"\n💡 Training Efficiency:")
    print(f"  Trainable: {trainable_params:,} parameters ({trainable_pct:.2f}%)")
    print(f"  Frozen: {all_params - trainable_params:,} parameters ({100-trainable_pct:.2f}%)")
    print(f"\n  This means only {trainable_pct:.2f}% of parameters will be updated during training!")
    
except Exception as e:
    print(f"❌ Error configuring LoRA: {e}")
    import traceback
    traceback.print_exc()
    sys.exit(1)
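
The trainable-parameter count can be checked by hand. For a linear layer with `d_in` inputs and `d_out` outputs, a rank-`r` LoRA adapter adds `r * (d_in + d_out)` weights. The sketch below applies this to the seven target modules, assuming the published Mistral-7B dimensions (hidden size 4096, MLP size 14336, key/value projection size 1024, 32 layers); these shapes are assumptions taken from the model configuration, not values defined in this notebook.

```python
# (d_in, d_out) of each targeted projection in one transformer layer.
MODULE_SHAPES = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
    "gate_proj": (4096, 14336),
    "up_proj": (4096, 14336),
    "down_proj": (14336, 4096),
}
NUM_LAYERS = 32


def lora_param_count(rank, shapes=MODULE_SHAPES, num_layers=NUM_LAYERS):
    """Total LoRA weights: A is (rank x d_in), B is (d_out x rank)."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * num_layers


print(f"{lora_param_count(16):,} trainable LoRA parameters at rank 16")
```

If the number reported by `print_trainable_parameters()` differs, the target modules or rank actually applied are not the ones configured in Section 3.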

## 8. Load Tokenizer and Tokenize Dataset

Prepare data for training.

In [None]:
print("🔤 Loading tokenizer and tokenizing dataset...\n")

try:
    # Load tokenizer
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.padding_side = "right"
    
    print("✅ Tokenizer loaded\n")
    
    # Tokenization function
    def tokenize_function(examples):
        """Tokenize a batch and build labels that ignore padding."""
        outputs = tokenizer(
            examples["text"],
            truncation=True,
            max_length=MAX_SEQ_LENGTH,
            padding="max_length",
            add_special_tokens=False,  # The template already contains <s> and </s>
            return_tensors=None  # Keep Python lists; the collator builds tensors
        )
        # pad_token == eos_token, so mask padding via the attention mask
        # (-100 is ignored by the loss) instead of comparing token ids
        outputs["labels"] = [
            [tok if keep else -100 for tok, keep in zip(ids, mask)]
            for ids, mask in zip(outputs["input_ids"], outputs["attention_mask"])
        ]
        return outputs
    
    # Tokenize datasets
    print("⏳ Tokenizing training set...")
    tokenized_train = train_dataset.map(
        tokenize_function,
        batched=True,
        remove_columns=train_dataset.column_names,
        desc="Tokenizing train"
    )
    
    print("⏳ Tokenizing validation set...")
    tokenized_eval = eval_dataset.map(
        tokenize_function,
        batched=True,
        remove_columns=eval_dataset.column_names,
        desc="Tokenizing eval"
    )
    
    print("\n✅ Tokenization complete!\n")
    
    print("📊 Tokenized Dataset:")
    print(f"  Training examples: {len(tokenized_train)}")
    print(f"  Validation examples: {len(tokenized_eval)}")
    
    # Calculate token statistics
    sample_lengths = [len([t for t in ex['input_ids'] if t != tokenizer.pad_token_id]) 
                     for ex in tokenized_train.select(range(min(100, len(tokenized_train))))]
    avg_tokens = sum(sample_lengths) / len(sample_lengths)
    max_tokens = max(sample_lengths)
    
    print(f"\n📏 Token Statistics (sample of {len(sample_lengths)}):")
    print(f"  Average tokens: {avg_tokens:.0f}")
    print(f"  Max tokens: {max_tokens}")
    print(f"  Max allowed: {MAX_SEQ_LENGTH}")
    
    if max_tokens >= MAX_SEQ_LENGTH:
        print(f"\n⚠️  Some examples were truncated to {MAX_SEQ_LENGTH} tokens")
    
except Exception as e:
    print(f"❌ Error during tokenization: {e}")
    import traceback
    traceback.print_exc()
    sys.exit(1)
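
Because the pad token is set to the EOS token and every example is padded to `MAX_SEQ_LENGTH`, the labels decide whether the loss is also computed on padding. A minimal, framework-free sketch of masking padded positions with `-100`, the index that the cross-entropy loss in Transformers ignores; the helper name and the token ids are made up for illustration.

```python
IGNORE_INDEX = -100  # label value skipped by the loss


def mask_padding(input_ids, attention_mask):
    """Copy input_ids into labels, replacing padded positions with -100."""
    return [
        tok if keep == 1 else IGNORE_INDEX
        for tok, keep in zip(input_ids, attention_mask)
    ]


# Toy example: id 2 plays the role of both EOS and PAD.
input_ids = [1, 733, 16289, 2, 2, 2]
attention_mask = [1, 1, 1, 1, 0, 0]
print(mask_padding(input_ids, attention_mask))
```

Masking by attention mask rather than by token id keeps the genuine end-of-sequence token in the loss, so the model still learns when to stop.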

## 9. Configure Training Arguments

Set up training configuration.

In [None]:
print("⚙️  Configuring training arguments...\n")

try:
    training_args = TrainingArguments(
        output_dir=CHECKPOINT_DIR,
        num_train_epochs=NUM_EPOCHS,
        per_device_train_batch_size=BATCH_SIZE,
        per_device_eval_batch_size=BATCH_SIZE,
        gradient_accumulation_steps=GRADIENT_ACCUMULATION_STEPS,
        learning_rate=LEARNING_RATE,
        warmup_steps=WARMUP_STEPS,
        logging_steps=10,
        eval_steps=50,
        save_steps=100,
        eval_strategy="steps",  # `evaluation_strategy` is the deprecated name
        save_strategy="steps",
        load_best_model_at_end=True,
        metric_for_best_model="eval_loss",
        greater_is_better=False,
        fp16=True,
        optim="paged_adamw_8bit",
        lr_scheduler_type="cosine",
        report_to="none",
        save_total_limit=3,
        push_to_hub=False,
        dataloader_num_workers=2,
        remove_unused_columns=False,
        label_names=["labels"]
    )
    
    print("✅ Training arguments configured\n")
    
    # Calculate training steps
    steps_per_epoch = len(tokenized_train) // (BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS)
    total_steps = steps_per_epoch * NUM_EPOCHS
    
    print("📊 Training Plan:")
    print(f"  Total epochs: {NUM_EPOCHS}")
    print(f"  Steps per epoch: {steps_per_epoch}")
    print(f"  Total training steps: {total_steps}")
    print(f"  Effective batch size: {BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS}")
    print(f"  Learning rate: {LEARNING_RATE}")
    print(f"  Warmup steps: {WARMUP_STEPS}")
    
    # Estimate time
    # Rough estimate: 3-5 seconds per step on T4
    estimated_seconds = total_steps * 4  # Conservative estimate
    estimated_minutes = estimated_seconds / 60
    
    print(f"\n⏱️  Estimated training time: {estimated_minutes:.0f} minutes ({estimated_minutes/60:.1f} hours)")
    print(f"   (Based on ~4 seconds per step on T4 GPU)")
    
except Exception as e:
    print(f"❌ Error configuring training: {e}")
    import traceback
    traceback.print_exc()
    sys.exit(1)
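
The arguments above combine a linear warmup with a cosine decay. The sketch below reproduces that schedule shape in plain Python so the learning rate at any step can be inspected before training; it is a simplified model of a warmup-plus-cosine schedule, not the library code, and the 168 total steps are an assumption based on 900 training examples.

```python
import math


def lr_at_step(step, base_lr, warmup_steps, total_steps):
    """Learning rate after `step` optimizer steps (linear warmup, cosine decay)."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 50, 100, 134, 168):
    print(step, f"{lr_at_step(step, 2e-4, warmup_steps=100, total_steps=168):.2e}")
```

With these numbers the peak learning rate is only reached at step 100 of 168, so most of the run is spent warming up. Lowering `WARMUP_STEPS` is one option to consider if the loss falls slowly.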

## 10. Train the Model

**⏰ This will take 1-2 hours on T4 GPU**

⚠️ **Do not close this tab or Colab will stop training!**

In [None]:
import time
from datetime import datetime

print("🚀 Starting training...\n")
print(f"⏰ Started at: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print(f"\n⚠️  IMPORTANT: Keep this tab open during training!\n")
print("=" * 80)

try:
    # Initialize trainer
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=tokenized_train,
        eval_dataset=tokenized_eval
    )
    
    print("✅ Trainer initialized\n")
    
    # Start training
    start_time = time.time()
    
    train_result = trainer.train()
    
    end_time = time.time()
    training_time = end_time - start_time
    
    print("\n" + "=" * 80)
    print("\n✅ Training complete!\n")
    print(f"⏰ Finished at: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
    print(f"⏱️  Total training time: {training_time/60:.1f} minutes ({training_time/3600:.2f} hours)")
    
    # Show training metrics
    print(f"\n📊 Final Training Metrics:")
    print(f"  Final loss: {train_result.training_loss:.4f}")
    print(f"  Total steps: {train_result.global_step}")
    
except KeyboardInterrupt:
    print("\n\n⚠️  Training interrupted by user")
    print("   Any checkpoints written so far are in:", CHECKPOINT_DIR)
    print("   You can resume training from the last checkpoint")
    
except torch.cuda.OutOfMemoryError:
    print("\n❌ GPU Out of Memory during training!")
    print("\n🔧 Solutions:")
    print("   1. Restart runtime")
    print("   2. Reduce BATCH_SIZE to 2 in Section 3")
    print("   3. Increase GRADIENT_ACCUMULATION_STEPS to 8 (keeps effective batch size)")
    print("   4. Reduce MAX_SEQ_LENGTH to 384")
    sys.exit(1)
    
except Exception as e:
    print(f"\n❌ Training error: {e}")
    print(f"   Error type: {type(e).__name__}")
    import traceback
    traceback.print_exc()
    
    print(f"\n💾 Any checkpoints written so far are in: {CHECKPOINT_DIR}")
    print("   You may be able to resume from the last checkpoint")
    sys.exit(1)
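
The messages above mention resuming from the last checkpoint. The Trainer writes checkpoints as `checkpoint-<step>` folders inside `output_dir`, and `trainer.train()` accepts a `resume_from_checkpoint` argument. A small sketch for locating the newest folder; the helper name `latest_checkpoint` is hypothetical.

```python
import os
import re


def latest_checkpoint(checkpoint_dir):
    """Return the path of the highest-numbered checkpoint-<step> folder, or None."""
    pattern = re.compile(r"^checkpoint-(\d+)$")
    candidates = []
    if os.path.isdir(checkpoint_dir):
        for name in os.listdir(checkpoint_dir):
            match = pattern.match(name)
            if match and os.path.isdir(os.path.join(checkpoint_dir, name)):
                candidates.append((int(match.group(1)), name))
    if not candidates:
        return None
    # Tuples compare by step number first, so max() picks the newest one.
    return os.path.join(checkpoint_dir, max(candidates)[1])


# Usage in this notebook (after re-running Sections 2 to 9 and recreating `trainer`):
# trainer.train(resume_from_checkpoint=latest_checkpoint(CHECKPOINT_DIR))
```

If no checkpoint exists yet the helper returns `None`, and passing `None` starts training from the beginning.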

## 11. Save the Model

Save LoRA adapters to Google Drive.

In [None]:
print("💾 Saving model...\n")

try:
    # Save LoRA adapters
    model.save_pretrained(OUTPUT_DIR)
    print(f"✅ Model saved to: {OUTPUT_DIR}\n")
    
    # Save tokenizer
    tokenizer.save_pretrained(OUTPUT_DIR)
    print(f"✅ Tokenizer saved\n")
    
    # List saved files
    saved_files = os.listdir(OUTPUT_DIR)
    print("📁 Saved files:")
    for file in sorted(saved_files):
        file_path = os.path.join(OUTPUT_DIR, file)
        if os.path.isfile(file_path):
            size = os.path.getsize(file_path) / (1024 * 1024)
            print(f"  {file} ({size:.2f} MB)")
    
    total_size = sum(os.path.getsize(os.path.join(OUTPUT_DIR, f)) 
                    for f in saved_files if os.path.isfile(os.path.join(OUTPUT_DIR, f)))
    print(f"\n📊 Total model size: {total_size / (1024 * 1024):.2f} MB")
    
    print(f"\n✅ Model ready for download from Google Drive!")
    print(f"   Location: {OUTPUT_DIR}")
    
except Exception as e:
    print(f"❌ Error saving model: {e}")
    import traceback
    traceback.print_exc()

## 12. Test the Fine-tuned Model

Generate enhanced prompts for all three target models.

In [None]:
print("🧪 Testing fine-tuned model...\n")
print("=" * 80)

def test_enhancement(raw_prompt, target_model="ChatGPT"):
    """
    Test prompt enhancement with error handling.
    """
    try:
        test_instruction = f"""<s>[INST] You are Prometheus, an AI assistant specialized in enhancing prompts for {target_model}.

Given a user's initial prompt, enhance it following {target_model}'s best practices while preserving the original intent.

User's prompt: {raw_prompt} [/INST]

"""
        
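        model.eval()  # Disable LoRA dropout for generation
        # The template already starts with <s>, so skip automatic special tokens
        inputs = tokenizer(
            test_instruction, return_tensors="pt", add_special_tokens=False
        ).to(model.device)
        
        with torch.no_grad():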
            outputs = model.generate(
                **inputs,
                max_new_tokens=256,
                temperature=0.7,
                top_p=0.9,
                do_sample=True,
                pad_token_id=tokenizer.eos_token_id,
                eos_token_id=tokenizer.eos_token_id
            )
        
        response = tokenizer.decode(outputs[0], skip_special_tokens=True)
        enhanced = response.split("[/INST]")[-1].strip()
        
        print(f"\n🎯 Target Model: {target_model}")
        print(f"📝 Raw Prompt: {raw_prompt}")
        print(f"✨ Enhanced Prompt:\n{enhanced}")
        print("\n" + "-" * 80)
        
        return enhanced
        
    except Exception as e:
        print(f"❌ Error testing {target_model}: {e}")
        return None

# Test examples for each model
test_cases = [
    ("Write a Python function to sort a list", "ChatGPT"),
    ("Analyze customer feedback data", "Claude"),
    ("Create a meeting agenda", "Gemini")
]

print("\n🔬 Running test cases...\n")

for raw_prompt, target_model in test_cases:
    test_enhancement(raw_prompt, target_model)

print("\n✅ Testing complete!\n")
print("=" * 80)

## 13. Evaluation Metrics

Analyze training and validation loss.

In [None]:
print("📊 Analyzing training metrics...\n")

try:
    # Get training history
    metrics = trainer.state.log_history
    
    # Extract losses
    train_losses = [m['loss'] for m in metrics if 'loss' in m]
    eval_losses = [m['eval_loss'] for m in metrics if 'eval_loss' in m]
    
    if not train_losses:
        print("⚠️  No training metrics found")
    else:
        print("✅ Training Metrics Summary:\n")
        print(f"  Initial training loss: {train_losses[0]:.4f}")
        print(f"  Final training loss: {train_losses[-1]:.4f}")
        print(f"  Loss reduction: {train_losses[0] - train_losses[-1]:.4f}")
        print(f"  Improvement: {(1 - train_losses[-1]/train_losses[0])*100:.1f}%")
        
        if eval_losses:
            print(f"\n  Initial validation loss: {eval_losses[0]:.4f}")
            print(f"  Final validation loss: {eval_losses[-1]:.4f}")
            print(f"  Best validation loss: {min(eval_losses):.4f}")
            
            # Check for overfitting
            gap = eval_losses[-1] - train_losses[-1]
            print(f"\n  Train-Val gap: {gap:.4f}")
            if gap > 0.5:
                print(f"  ⚠️  Large gap suggests possible overfitting")
            else:
                print(f"  ✅ Train-Val gap is acceptable")
        
        # len(train_losses) counts log entries (one per logging_steps), not steps
        print(f"\n  Total training steps: {trainer.state.global_step}")
    
    # Plot losses if matplotlib available
    try:
        import matplotlib.pyplot as plt
        
        if train_losses:
            plt.figure(figsize=(12, 5))
            
            plt.subplot(1, 2, 1)
            plt.plot(train_losses, label='Training Loss', alpha=0.7, linewidth=2)
            plt.xlabel('Steps')
            plt.ylabel('Loss')
            plt.title('Training Loss Over Time')
            plt.legend()
            plt.grid(alpha=0.3)
            
            if eval_losses:
                plt.subplot(1, 2, 2)
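                # Place each evaluation at the index of the training-loss log
                # entry for the same step (one entry every `logging_steps` steps)
                eval_steps = [m['step'] / training_args.logging_steps - 1 for m in metrics if 'eval_loss' in m]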
                plt.plot(train_losses, label='Training Loss', alpha=0.5, linewidth=1)
                plt.plot(eval_steps, eval_losses, label='Validation Loss', 
                        marker='o', linewidth=2, markersize=6)
                plt.xlabel('Steps')
                plt.ylabel('Loss')
                plt.title('Training vs Validation Loss')
                plt.legend()
                plt.grid(alpha=0.3)
            
            plt.tight_layout()
            plt.show()
            
            print("\n✅ Loss curves plotted above")
    
    except ImportError:
        print("\n💡 Install matplotlib to visualize loss curves:")
        print("   !pip install matplotlib")
    
except Exception as e:
    print(f"❌ Error analyzing metrics: {e}")
    import traceback
    traceback.print_exc()

## 14. Next Steps - Integration Guide

### 📥 Download Model from Google Drive

1. **Navigate to Google Drive**
   - Open https://drive.google.com
   - Go to `MyDrive/Prometheus/models/prometheus-lora/`

2. **Download all files** (~80-150 MB total)
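   - `adapter_model.safetensors` (the default adapter format in the pinned PEFT version; older releases wrote `adapter_model.bin`)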
   - `adapter_config.json`
   - `tokenizer.json`
   - `tokenizer_config.json`
   - `special_tokens_map.json`

3. **Move to local project**
   ```bash
   cd /run/media/kabe/Kabe_s\ Personal/Projects/Prometheus
   mkdir -p models/prometheus-lora
   # Move downloaded files to models/prometheus-lora/
   ```

---

### 🔧 Backend Integration

**1. Create model inference module:**

```bash
mkdir -p backend/app/model
touch backend/app/model/__init__.py
touch backend/app/model/inference.py
```

**2. Add dependencies to `backend/requirements.txt`:**

```
transformers>=4.41.0
peft>=0.11.0
bitsandbytes>=0.43.0
accelerate>=0.30.0
torch>=2.0.0
```

**3. Update `backend/app/main.py` to load model on startup**

**4. Implement `/augment` endpoint with model inference**
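
The steps above could be sketched roughly as follows for `backend/app/model/inference.py`. This is a minimal sketch, not the project's implementation: the function names, the `ADAPTER_DIR` location, and the generation settings are assumptions, and `load_model` needs a CUDA GPU plus the packages from `requirements.txt`. The prompt template is copied from Section 5 so that inference matches training.

```python
BASE_MODEL = "mistralai/Mistral-7B-Instruct-v0.1"
ADAPTER_DIR = "models/prometheus-lora"  # assumed location of the downloaded adapter


def build_prompt(raw_prompt, target_model="ChatGPT"):
    """Rebuild the instruction template used during fine-tuning."""
    return (
        f"<s>[INST] You are Prometheus, an AI assistant specialized in "
        f"enhancing prompts for {target_model}.\n\n"
        f"Given a user's initial prompt, enhance it following {target_model}'s "
        f"best practices while preserving the original intent.\n\n"
        f"User's prompt: {raw_prompt} [/INST]\n\n"
    )


def load_model(adapter_dir=ADAPTER_DIR):
    """Load the 4-bit base model and attach the LoRA adapter (call once at startup)."""
    # Heavy imports stay local so that importing this module is cheap.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    quant = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.float16,
    )
    base = AutoModelForCausalLM.from_pretrained(
        BASE_MODEL, quantization_config=quant, device_map="auto"
    )
    model = PeftModel.from_pretrained(base, adapter_dir)
    model.eval()
    tokenizer = AutoTokenizer.from_pretrained(adapter_dir)
    return model, tokenizer


def enhance(model, tokenizer, raw_prompt, target_model="ChatGPT", max_new_tokens=256):
    """Generate an enhanced prompt and return only the newly generated text."""
    import torch

    prompt = build_prompt(raw_prompt, target_model)
    # The template already contains <s>, so no extra special tokens are added.
    inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)
    with torch.no_grad():
        output = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=0.7,
            top_p=0.9,
            pad_token_id=tokenizer.eos_token_id,
        )
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
```

An `/augment` handler would then call `load_model()` once at startup and `enhance(...)` per request. Slicing off the prompt tokens before decoding avoids relying on string splitting at `[/INST]`.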

---

### 🚀 Deployment Options

**Option A: Local GPU (Development)**
- Requires NVIDIA GPU with 8GB+ VRAM
- Fast iteration, free

**Option B: Cloud GPU (Production)**
- AWS SageMaker, Google Cloud AI Platform, Azure ML
- Auto-scaling, managed infrastructure
- Cost: ~$0.50-2.00 per hour (depending on GPU)

**Option C: CPU-only (Low traffic)**
- Slower inference (~10-20 seconds per request)
- No special hardware needed
- Fine for prototyping

---

### ✅ Success Criteria

Your fine-tuned model is ready when:
- ✅ Training loss decreased below 1.5
- ✅ Validation loss decreased and stayed close to training loss
- ✅ Test outputs show model-specific styles (ChatGPT: conversational, Claude: XML tags, Gemini: concise)
- ✅ Enhanced prompts are significantly more detailed than inputs
- ✅ Model files saved successfully to Google Drive

---

### 🎯 Performance Tips

**For faster inference:**
1. Merge LoRA weights into base model (optional)
2. Try `torch.compile()` (PyTorch 2.0+); the speedup depends on the model and hardware
3. Implement batching for multiple requests
4. Cache tokenizer and model on server startup
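
Merging works because a LoRA adapter is just a low-rank update to each weight matrix: the merged weight is `W + (alpha / r) * B @ A`, after which the adapter branch is no longer needed at inference time. The toy sketch below checks this identity with small hand-written matrices in plain Python; the values are made up, and in PEFT the corresponding operation on a real model is `merge_and_unload()`.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def matvec(weight, x):
    """Compute W x for a matrix W and a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return W + (alpha / rank) * B @ A."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # (out x r) @ (r x in) -> (out x in)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


W = [[1, 0, 2], [0, 1, -1]]  # frozen base weight (2 x 3)
A = [[1, 2, 3]]              # LoRA A (rank 1 x 3)
B = [[1], [-1]]              # LoRA B (2 x rank 1)
x = [1, 1, 1]

# Unmerged forward pass: base output plus the scaled adapter branch.
adapter = matvec(B, matvec(A, x))
unmerged = [b + (2 / 1) * a for b, a in zip(matvec(W, x), adapter)]

# Merged forward pass: a single matrix, no adapter branch.
merged = matvec(merge_lora(W, A, B, alpha=2, rank=1), x)
print(unmerged, merged)
```

Both paths give the same output, which is why merging removes the extra matrix multiplications without changing results. Merging into a 4-bit quantized base can introduce rounding differences, so reloading the base model in fp16 before merging is a common precaution.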

**For better quality:**
1. Generate with temperature=0.7-0.9 for variety
2. Use top_p=0.9 for nucleus sampling
3. Set max_new_tokens=256-512 depending on needed length
4. Generate multiple variations and let user choose

---

## 🎉 Congratulations!

You've successfully fine-tuned Prometheus! The model can now:
- Enhance prompts with ChatGPT's conversational style
- Structure prompts with Claude's XML formatting
- Create concise, actionable prompts for Gemini

**Next:** Integrate into backend and test with real user queries!

---

# Prometheus Prompt Enhancement - LoRA Fine-tuning

This notebook fine-tunes a language model using LoRA (Low-Rank Adaptation) on the Prometheus training dataset to enhance prompts with model-specific styles (ChatGPT, Claude, Gemini).

**Dataset**: 1,000 training examples with model-specific enhancements  
**Base Model**: Mistral-7B-Instruct-v0.1  
**Method**: LoRA on an 8-bit quantized base model (see Section 6)  
**Task**: Text-to-text prompt enhancement

---

## 1. Environment Setup

Install required libraries for QLoRA fine-tuning.

In [None]:
!pip install -q transformers==4.36.0 peft==0.7.1 bitsandbytes==0.41.3 accelerate==0.25.0 datasets==2.15.0 trl==0.7.4

## 2. Mount Google Drive & Import Libraries

In [None]:
from google.colab import drive
drive.mount('/content/drive')

import torch
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    BitsAndBytesConfig,
    TrainingArguments,
    Trainer
)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from datasets import load_dataset
import json
import os

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"GPU: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'None'}")

## 3. Configuration

Set paths and hyperparameters.

In [None]:
# Paths (adjust these to your Google Drive structure)
DATASET_PATH = "/content/drive/MyDrive/Prometheus/training_data/training_dataset.jsonl"
OUTPUT_DIR = "/content/drive/MyDrive/Prometheus/models/prometheus-lora"
CHECKPOINT_DIR = "/content/drive/MyDrive/Prometheus/checkpoints"

# Model configuration
MODEL_NAME = "mistralai/Mistral-7B-Instruct-v0.1"

# LoRA configuration
LORA_R = 16  # Rank
LORA_ALPHA = 32  # Scaling factor
LORA_DROPOUT = 0.05
TARGET_MODULES = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]

# Training hyperparameters
BATCH_SIZE = 4
GRADIENT_ACCUMULATION_STEPS = 4  # Effective batch size = 16
LEARNING_RATE = 2e-4
NUM_EPOCHS = 3
MAX_SEQ_LENGTH = 512
WARMUP_STEPS = 100

print("Configuration loaded successfully!")

## 4. Load and Prepare Dataset

Load the Prometheus training dataset and format it for instruction fine-tuning.

In [None]:
# Load dataset
dataset = load_dataset('json', data_files=DATASET_PATH, split='train')

# Split into train/validation (90/10)
dataset = dataset.train_test_split(test_size=0.1, seed=42)
train_dataset = dataset['train']
eval_dataset = dataset['test']

print(f"Training examples: {len(train_dataset)}")
print(f"Validation examples: {len(eval_dataset)}")
print(f"\nSample entry:")
print(json.dumps(train_dataset[0], indent=2))

## 5. Format Data for Instruction Tuning

Create the instruction-following format: system prompt + user input → model output.

In [None]:
def format_instruction(example):
    """Format training example as instruction-following conversation."""
    target_model = example['target_model']
    raw_prompt = example['raw_prompt']
    enhanced_prompt = example['enhanced_prompt']
    
    # Instruction template
    instruction = f"""<s>[INST] You are Prometheus, an AI assistant specialized in enhancing prompts for {target_model}.

Given a user's initial prompt, enhance it following {target_model}'s best practices while preserving the original intent.

User's prompt: {raw_prompt} [/INST]

{enhanced_prompt}</s>"""
    
    return {"text": instruction}

# Apply formatting
train_dataset = train_dataset.map(format_instruction)
eval_dataset = eval_dataset.map(format_instruction)

print("Sample formatted instruction:")
print(train_dataset[0]['text'][:500] + "...")
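
The same template is needed again at inference time (Section 12), so one option is to build it in a single helper, which keeps the training and test strings from drifting apart. The following is a minimal sketch only: `build_prompt` and `build_training_text` are hypothetical names that do not appear in the notebook, and the sample record is made up for illustration.

```python
# Hypothetical helpers (not part of the original notebook): a single source of
# truth for the [INST] template used in both training and inference.

def build_prompt(raw_prompt: str, target_model: str) -> str:
    """Return the instruction part, ending right after [/INST]."""
    return (
        f"<s>[INST] You are Prometheus, an AI assistant specialized in "
        f"enhancing prompts for {target_model}.\n\n"
        f"Given a user's initial prompt, enhance it following {target_model}'s "
        f"best practices while preserving the original intent.\n\n"
        f"User's prompt: {raw_prompt} [/INST]\n\n"
    )


def build_training_text(example: dict) -> str:
    """Append the target completion and the end-of-sequence marker."""
    prompt = build_prompt(example["raw_prompt"], example["target_model"])
    return prompt + example["enhanced_prompt"] + "</s>"


# Made-up record with the three fields the notebook reads from the dataset
sample = {
    "target_model": "Claude",
    "raw_prompt": "Analyze customer feedback data",
    "enhanced_prompt": "<task>Analyze the feedback below.</task>",
}
print(build_training_text(sample))
```

With helpers like these, `format_instruction` could return `{"text": build_training_text(example)}` and the test cell could call `build_prompt(...)` directly.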

## 6. Load Model with 8-bit Quantization

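Load Mistral-7B with 8-bit quantization for stable, memory-efficient training. 8-bit loading is used here because it is more stable than 4-bit; the quantized weights need about 7GB of VRAM, which leaves room on a T4 for activations, LoRA adapters, and optimizer state.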
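
The ~7GB figure follows from simple arithmetic: the number of parameters times the bytes stored per parameter. The sketch below is a rough estimate only. `PARAMS` is an approximate parameter count for Mistral-7B (an assumption, not a value from the notebook), and the estimate ignores activations, LoRA adapters, optimizer state, and any layers that bitsandbytes keeps in higher precision.

```python
# Rough weight-memory estimate per precision.
# PARAMS is an approximate figure for Mistral-7B (assumption).
PARAMS = 7.24e9

BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "4-bit": 0.5}


def weight_gb(params: float, bytes_per_param: float) -> float:
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


for name, nbytes in BYTES_PER_PARAM.items():
    print(f"{name:>6}: ~{weight_gb(PARAMS, nbytes):.1f} GB")
```

The value reported by `model.get_memory_footprint()` in the next cell is the number to trust; this estimate only shows where the order of magnitude comes from.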

In [None]:
print("🤖 Loading Mistral-7B model with 8-bit quantization...")
print("   (8-bit is more stable than 4-bit)\n")
print("⏳ This will take 2-3 minutes to download (~14.5GB of fp16 weights, quantized on load)...\n")

try:
    # Configure 8-bit quantization (more stable than 4-bit)
    bnb_config = BitsAndBytesConfig(
        load_in_8bit=True,
        llm_int8_threshold=6.0,
        llm_int8_has_fp16_weight=False
    )
    
    print("✅ 8-bit quantization config created\n")
    
    # Load base model
    print("📥 Downloading model from HuggingFace...\n")
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_NAME,
        quantization_config=bnb_config,
        device_map="auto",
        trust_remote_code=True,
        use_cache=False  # Required for gradient checkpointing
    )
    
    print("\n✅ Model loaded successfully!\n")
    
    # Verify model is loaded correctly
    print(f"📋 Model type: {type(model).__name__}")
    assert not isinstance(model, str), "ERROR: Model is a string, not a model object!"
    
    # Enable gradient checkpointing to save memory
    model.gradient_checkpointing_enable()
    print("✅ Gradient checkpointing enabled\n")
    
    # Prepare for k-bit training
    model = prepare_model_for_kbit_training(model)
    print("✅ Model prepared for LoRA training\n")
    
    # Show model info
    print("📊 Model Information:")
    print(f"  Memory footprint: {model.get_memory_footprint() / 1e9:.2f} GB")
    print(f"  Device map: {model.hf_device_map}")
    
    # Count parameters
    total_params = sum(p.numel() for p in model.parameters())
    print(f"  Total parameters: {total_params:,}")
    
    # Check GPU memory
    if torch.cuda.is_available():
        allocated = torch.cuda.memory_allocated(0) / 1e9
        reserved = torch.cuda.memory_reserved(0) / 1e9
        total = torch.cuda.get_device_properties(0).total_memory / 1e9
        
        print(f"\n💾 GPU Memory:")
        print(f"  Allocated: {allocated:.2f} GB")
        print(f"  Reserved: {reserved:.2f} GB")
        print(f"  Total: {total:.2f} GB")
        print(f"  Free: {total - reserved:.2f} GB")
        
        if reserved > total * 0.85:
            print(f"\n⚠️  High memory usage ({reserved/total*100:.1f}%)")
            print(f"   Consider reducing BATCH_SIZE to 2")
        else:
            print(f"\n✅ Sufficient memory available for training")
    
    print("\n" + "=" * 80)
    print("✅ MODEL READY - Proceed to Cell 7 (Configure LoRA)")
    print("=" * 80)
    
except torch.cuda.OutOfMemoryError:
    print("\n❌ GPU Out of Memory!")
    print("\n🔧 Solutions:")
    print("   1. Restart runtime to clear GPU memory")
    print("   2. Reduce BATCH_SIZE to 2 in Cell 3")
    print("   3. Reduce MAX_SEQ_LENGTH to 384 in Cell 3")
    raise  # re-raise so the cell fails and later cells do not run without a model
    
except Exception as e:
    print(f"\n❌ Error loading model: {e}")
    print(f"   Error type: {type(e).__name__}")
    
    # Show debug info
    print("\n🔍 Debug info:")
    try:
        print(f"   model variable type: {type(model)}")
        if isinstance(model, str):
            print(f"   ERROR: model is still a string: {model}")
            print("   This means the model didn't load properly")
    except NameError:
        # `model` was never assigned because loading failed before the assignment
        print("   model variable not defined")
    
    raise  # re-raise to show the full traceback and stop execution

## 7. Configure LoRA Adapters

Add LoRA adapters to the model for parameter-efficient fine-tuning.

In [None]:
# LoRA configuration
lora_config = LoraConfig(
    r=LORA_R,
    lora_alpha=LORA_ALPHA,
    target_modules=TARGET_MODULES,
    lora_dropout=LORA_DROPOUT,
    bias="none",
    task_type="CAUSAL_LM"
)

# Add LoRA adapters
model = get_peft_model(model, lora_config)

# Print trainable parameters
model.print_trainable_parameters()
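
The number printed by `print_trainable_parameters()` can be sanity-checked by hand: a LoRA adapter on a linear layer with `in` inputs and `out` outputs adds `r × (in + out)` weights. The sketch below assumes the published Mistral-7B dimensions (hidden size 4096, key/value projection width 1024, MLP size 14336, 32 decoder layers). These values are not stated in the notebook and are used here only for illustration.

```python
LORA_R = 16       # rank, as in the configuration cell
NUM_LAYERS = 32   # assumed number of decoder layers
HIDDEN, KV_DIM, MLP_DIM = 4096, 1024, 14336  # assumed Mistral-7B dimensions

# (in_features, out_features) of each target module in one decoder layer
MODULE_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP_DIM),
    "up_proj": (HIDDEN, MLP_DIM),
    "down_proj": (MLP_DIM, HIDDEN),
}


def lora_params(in_features: int, out_features: int, r: int) -> int:
    """LoRA adds A (r x in) and B (out x r) next to the frozen weight."""
    return r * (in_features + out_features)


per_layer = sum(lora_params(i, o, LORA_R) for i, o in MODULE_SHAPES.values())
total = per_layer * NUM_LAYERS

print(f"LoRA parameters per layer: {per_layer:,}")
print(f"LoRA parameters in total:  {total:,}")
```

If the count printed by PEFT differs noticeably from this estimate, the target modules or the rank are probably not what the configuration cell intended.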

## 8. Load Tokenizer and Tokenize Dataset

In [None]:
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

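# Tokenization function
def tokenize_function(examples):
    outputs = tokenizer(
        examples["text"],
        truncation=True,
        max_length=MAX_SEQ_LENGTH,
        padding="max_length"
    )
    # Use input_ids as labels, but set padded positions to -100 so the loss
    # ignores them. The pad token is the EOS token here, so padding is
    # identified through the attention mask rather than the token id.
    outputs["labels"] = [
        [token if mask == 1 else -100 for token, mask in zip(ids, attention)]
        for ids, attention in zip(outputs["input_ids"], outputs["attention_mask"])
    ]
    return outputs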

# Tokenize datasets
tokenized_train = train_dataset.map(
    tokenize_function,
    batched=True,
    remove_columns=train_dataset.column_names
)

tokenized_eval = eval_dataset.map(
    tokenize_function,
    batched=True,
    remove_columns=eval_dataset.column_names
)

print("Tokenization complete!")
print(f"Tokenized training examples: {len(tokenized_train)}")
print(f"Tokenized validation examples: {len(tokenized_eval)}")

## 9. Configure Training Arguments

In [None]:
training_args = TrainingArguments(
    output_dir=CHECKPOINT_DIR,
    num_train_epochs=NUM_EPOCHS,
    per_device_train_batch_size=BATCH_SIZE,
    per_device_eval_batch_size=BATCH_SIZE,
    gradient_accumulation_steps=GRADIENT_ACCUMULATION_STEPS,
    learning_rate=LEARNING_RATE,
    warmup_steps=WARMUP_STEPS,
    logging_steps=10,
    eval_steps=50,
    save_steps=100,
    evaluation_strategy="steps",
    save_strategy="steps",
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
    fp16=True,
    optim="paged_adamw_8bit",
    lr_scheduler_type="cosine",
    report_to="none",
    save_total_limit=3,
    push_to_hub=False
)

print("Training configuration:")
print(f"  Total epochs: {NUM_EPOCHS}")
print(f"  Effective batch size: {BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS}")
print(f"  Learning rate: {LEARNING_RATE}")
print(f"  Total training steps: ~{len(tokenized_train) // (BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS) * NUM_EPOCHS}")
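
To see how the warmup, evaluation, and checkpoint settings relate to the length of the run, the schedule can be worked out ahead of time. The sketch below approximates how the Trainer counts optimizer steps, and the exact number may differ between library versions. `TRAIN_EXAMPLES = 900` assumes the 90/10 split of the 1,000-example dataset.

```python
import math

# Values from the configuration and training-arguments cells
TRAIN_EXAMPLES = 900  # assumption: 90% of 1,000 examples
BATCH_SIZE = 4
GRADIENT_ACCUMULATION_STEPS = 4
NUM_EPOCHS = 3
WARMUP_STEPS = 100
EVAL_STEPS, SAVE_STEPS = 50, 100

# One optimizer step happens every GRADIENT_ACCUMULATION_STEPS batches
batches_per_epoch = math.ceil(TRAIN_EXAMPLES / BATCH_SIZE)
updates_per_epoch = max(batches_per_epoch // GRADIENT_ACCUMULATION_STEPS, 1)
total_steps = updates_per_epoch * NUM_EPOCHS

# Steps at which periodic evaluation and checkpointing would trigger
eval_at = list(range(EVAL_STEPS, total_steps + 1, EVAL_STEPS))
save_at = list(range(SAVE_STEPS, total_steps + 1, SAVE_STEPS))

print(f"Optimizer steps: {total_steps}")
print(f"Warmup share:    {WARMUP_STEPS / total_steps:.0%}")
print(f"Evaluations at:  {eval_at}")
print(f"Checkpoints at:  {save_at}")
```

Under these assumptions the warmup spans more than half of the run and only one periodic checkpoint is written, so lowering `WARMUP_STEPS` and `save_steps` may be worth considering.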

## 10. Train the Model

Start LoRA fine-tuning. This will take approximately 1-2 hours on a T4 GPU.

In [None]:
# Initialize trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_eval
)

# Start training
print("Starting training...")
trainer.train()

print("\n✅ Training complete!")

## 11. Save the Model

In [None]:
# Save LoRA adapters
model.save_pretrained(OUTPUT_DIR)
tokenizer.save_pretrained(OUTPUT_DIR)

print(f"✅ Model saved to: {OUTPUT_DIR}")
print("\nSaved files:")
print(os.listdir(OUTPUT_DIR))

## 12. Test the Fine-tuned Model

Generate enhanced prompts using the fine-tuned model.

In [None]:
def test_enhancement(raw_prompt, target_model="ChatGPT"):
    """Test prompt enhancement with the fine-tuned model."""
    test_instruction = f"""<s>[INST] You are Prometheus, an AI assistant specialized in enhancing prompts for {target_model}.

Given a user's initial prompt, enhance it following {target_model}'s best practices while preserving the original intent.

User's prompt: {raw_prompt} [/INST]

"""
    
    model.eval()  # switch off dropout (including LoRA dropout) for generation
    inputs = tokenizer(test_instruction, return_tensors="pt").to(model.device)
    
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=256,
            temperature=0.7,
            top_p=0.9,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id
        )
    
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    enhanced = response.split("[/INST]")[-1].strip()
    
    print(f"Target Model: {target_model}")
    print(f"Raw Prompt: {raw_prompt}")
    print(f"Enhanced Prompt:\n{enhanced}")
    print("-" * 80)
    
    return enhanced

# Test examples
test_enhancement("Write a Python function to sort a list", "ChatGPT")
test_enhancement("Analyze customer feedback data", "Claude")
test_enhancement("Create a meeting agenda", "Gemini")

## 13. Evaluation Metrics

Check training and validation loss.

In [None]:
# Get training metrics
metrics = trainer.state.log_history

# Extract losses together with the step at which each one was logged
train_steps = [m['step'] for m in metrics if 'loss' in m]
train_losses = [m['loss'] for m in metrics if 'loss' in m]
eval_steps = [m['step'] for m in metrics if 'eval_loss' in m]
eval_losses = [m['eval_loss'] for m in metrics if 'eval_loss' in m]

print("Training Metrics Summary:")
print(f"  Final training loss: {train_losses[-1]:.4f}")
print(f"  Final validation loss: {eval_losses[-1]:.4f}")
print(f"  Best validation loss: {min(eval_losses):.4f}")
print(f"  Total training steps: {trainer.state.global_step}")

# Optional: Plot losses if matplotlib is available
try:
    import matplotlib.pyplot as plt
    
    plt.figure(figsize=(10, 5))
    # Plot both curves against the logged step so the x-axes line up
    plt.plot(train_steps, train_losses, label='Training Loss', alpha=0.7)
    plt.plot(eval_steps, eval_losses, label='Validation Loss', marker='o')
    plt.xlabel('Steps')
    plt.ylabel('Loss')
    plt.title('Prometheus LoRA Fine-tuning - Loss Curves')
    plt.legend()
    plt.grid(alpha=0.3)
    plt.show()
except ImportError:
    print("\nInstall matplotlib to visualize loss curves: !pip install matplotlib")
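
Cross-entropy loss is easier to interpret as perplexity, which is the exponential of the mean loss. The helper below is a minimal sketch: `perplexity` is a hypothetical name, and the loss values in the loop are illustrative only, not results from this run.

```python
import math


def perplexity(loss: float) -> float:
    """Convert a mean cross-entropy loss (in nats) to perplexity."""
    return math.exp(loss)


# Illustrative values only; in the notebook, pass min(eval_losses) instead.
for loss in (2.0, 1.0, 0.5):
    print(f"loss={loss:.2f} -> perplexity={perplexity(loss):.2f}")
```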

## 14. Next Steps

**To use this model in the Prometheus backend:**

1. **Export the LoRA adapters** from Google Drive to your local machine
2. **Update `backend/app/main.py`** to load the fine-tuned model:
   ```python
   from peft import PeftModel, PeftConfig
   from transformers import AutoModelForCausalLM
   
   config = PeftConfig.from_pretrained("path/to/prometheus-lora")
   base_model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path)
   model = PeftModel.from_pretrained(base_model, "path/to/prometheus-lora")
   ```
3. **Implement the `/augment` endpoint** to use this model for generation
4. **Optional:** Merge LoRA weights into base model for faster inference:
   ```python
   merged_model = model.merge_and_unload()
   merged_model.save_pretrained("prometheus-merged")
   ```

**Performance tips:**
- For production, consider hosting on GPU-enabled cloud (AWS/GCP/Azure)
- Use `torch.compile()` for faster inference (PyTorch 2.0+)
- Implement batching for multiple enhancement requests
- Monitor response quality and iterate on training data if needed
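
The batching tip above can start from something as small as grouping pending requests into fixed-size chunks before they reach the model. This is a sketch under stated assumptions: the `Request` type and the `batched` helper are hypothetical and not part of the Prometheus backend, and the generation call itself is left out.

```python
from typing import Iterator, List, Tuple

# Hypothetical request shape: (raw_prompt, target_model)
Request = Tuple[str, str]


def batched(requests: List[Request], batch_size: int) -> Iterator[List[Request]]:
    """Yield consecutive chunks of at most batch_size requests."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(requests), batch_size):
        yield requests[start:start + batch_size]


pending = [
    ("Write a Python function to sort a list", "ChatGPT"),
    ("Analyze customer feedback data", "Claude"),
    ("Create a meeting agenda", "Gemini"),
]

for batch in batched(pending, batch_size=2):
    # In the backend, each batch would be tokenized and generated together.
    print(len(batch), [target for _, target in batch])
```

When batches are passed to `generate`, decoder-only models usually need left padding (`tokenizer.padding_side = "left"`), whereas the training cells above use right padding.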

---

✅ **Fine-tuning complete! Model ready for integration into Prometheus backend.**