# prompt optimization

How can I use Gemini 2.5 Pro as a prompt optimization agent to fine-tune the prompt that Gemma3:1b and Gemma3:4b use to turn citation strings into XML, and use Gemma3:1b and Gemma3:4b from a local Ollama service for verification?

I'll help you build a prompt optimization system in which Gemini 2.5 Pro analyzes failures and rewrites the prompt that your local Gemma models use to convert citation strings to XML. Let's create a comprehensive notebook that uses the cloud Gemini API for optimization and your local Ollama service for verification.

## Step-by-step Solution:

1. **Set up Gemini 2.5 Pro API connection** for prompt optimization
2. **Create prompt optimization agent** using Gemini 2.5 Pro
3. **Implement local Ollama verification** using Gemma3:1b and Gemma3:4b
4. **Build iterative optimization loop** to improve prompt performance
5. **Add evaluation metrics** for citation XML accuracy
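
Before the steps are wired together, it helps to pin down the input. The notebook reads `ref_markup_optV2.json` as a JSON list and uses only its first element: `title`, `description` (used as the initial prompt) and `examples`, where each example has a `prompt` (the raw citation string) and a `completion` (the expected XML). That layout is inferred from how the code indexes the file rather than from a documented schema, so the minimal sketch below (a hypothetical `load_reference_entry` helper with a made-up placeholder record) only checks those fields before any model is called.

```python
import json
from typing import Any, Dict

def load_reference_entry(raw: str) -> Dict[str, Any]:
    """Parse the reference JSON and check the fields the notebook relies on.

    Assumed layout (inferred from how the notebook indexes the file):
    [{"title": ..., "description": ..., "examples": [{"prompt": ..., "completion": ...}, ...]}]
    """
    data = json.loads(raw)
    if not isinstance(data, list) or not data:
        raise ValueError("expected a non-empty JSON list")
    entry = data[0]
    for key in ("title", "description", "examples"):
        if key not in entry:
            raise ValueError(f"missing top-level key: {key}")
    for i, example in enumerate(entry["examples"]):
        missing = {"prompt", "completion"} - set(example)
        if missing:
            raise ValueError(f"example {i} is missing {sorted(missing)}")
    return entry

# Placeholder record: the citation and XML below are illustrative, not real data.
sample = json.dumps([{
    "title": "Citation markup",
    "description": "Wrap each citation in <ref> tags.",
    "examples": [{"prompt": "EXAMPLE CITATION", "completion": "<ref>EXAMPLE CITATION</ref>"}],
}])
entry = load_reference_entry(sample)
print(entry["title"], len(entry["examples"]))
```

In the notebook you could pass it the file contents, e.g. `load_reference_entry(open('ref_markup_optV2.json', encoding='utf-8').read())`, so a malformed file fails fast instead of partway through an optimization run.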

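# Standard-library imports
import subprocess
import sys
import os
import json
import time
from datetime import datetime
import re
from typing import List, Dict, Tuple, Optional
import statistics
import importlib

# Install third-party packages that are missing.
# The pip distribution name and the importable module name differ for some packages
# (e.g. 'google-generativeai' is imported as 'google.generativeai'), so map them explicitly.
def install_packages():
    packages = {
        'google-generativeai': 'google.generativeai',
        'requests': 'requests',
        'pandas': 'pandas',
        'matplotlib': 'matplotlib',
        'seaborn': 'seaborn',
    }
    for pip_name, module_name in packages.items():
        try:
            importlib.import_module(module_name)
        except ImportError:
            subprocess.check_call([sys.executable, "-m", "pip", "install", pip_name])

install_packages()

# Third-party imports (after installation)
import requests
import google.generativeai as genai
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns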

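# Configuration (environment variables, if set, take precedence over the placeholders)
GEMINI_API_KEY = os.environ.get("GEMINI_API_KEY", "your-gemini-api-key-here")  # Replace with your actual API key
OLLAMA_BASE_URL = "http://localhost:11434"
# Optional: plain Ollama ignores API keys; set one only if your local setup (e.g. a proxy) requires it
OLLAMA_API_KEY = os.environ.get("OLLAMA_API_KEY", "sk-your-api-key-here")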

# Load reference data
with open('ref_markup_optV2.json', 'r', encoding='utf-8') as f:
    reference_data = json.load(f)[0]

class GeminiOptimizer:
    """Gemini 2.5 Pro prompt optimization agent"""
    
    def __init__(self, api_key: str):
        genai.configure(api_key=api_key)
        # Gemini 2.5 Pro, as described above
        self.model = genai.GenerativeModel('gemini-2.5-pro')
        # A faster, cheaper alternative: genai.GenerativeModel('gemini-2.5-flash')
        
    def analyze_prompt_performance(self, prompt: str, test_results: List[Dict]) -> str:
        """Analyze prompt performance and suggest improvements"""
        
        # Calculate success rate
        total_tests = len(test_results)
        successful_tests = sum(1 for result in test_results if result['success'])
        success_rate = successful_tests / total_tests if total_tests > 0 else 0
        
        # Collect error patterns
        errors = [result['error'] for result in test_results if not result['success']]
        
        analysis_prompt = f"""
You are an expert prompt engineer specializing in XML markup tasks for Buddhist text citations. 

CURRENT PROMPT PERFORMANCE:
- Success Rate: {success_rate:.2%} ({successful_tests}/{total_tests})
- Failed Cases: {len(errors)}

CURRENT PROMPT:
{prompt}

FAILED TEST CASES AND ERRORS:
"""
        
        for i, result in enumerate(test_results):
            if not result['success']:
                analysis_prompt += f"""
Case {i+1}:
Input: {result['input']}
Expected: {result['expected']}
Actual: {result['actual']}
Error: {result['error']}
---
"""
        
        analysis_prompt += """

TASK:
Analyze the current prompt and failed cases, then provide:

1. **Error Pattern Analysis**: What are the common patterns in failures?
2. **Prompt Issues**: What specific issues in the current prompt cause these failures?
3. **Improvement Strategy**: How should the prompt be modified to address these issues?
4. **Optimized Prompt**: Provide a complete rewritten prompt that addresses the identified issues.

Focus on:
- Clarity of XML tagging rules
- Handling of OCR corrections
- Proper handling of citation ranges
- Consistent application of tagging rules
- Edge case handling

Provide your analysis and optimized prompt in a structured format.
"""
        
        try:
            response = self.model.generate_content(analysis_prompt)
            return response.text
        except Exception as e:
            return f"Error in Gemini analysis: {e}"
    
    def generate_optimized_prompt(self, base_prompt: str, error_analysis: str) -> str:
        """Generate an optimized prompt based on error analysis"""
        
        optimization_prompt = f"""
Based on the error analysis, create an optimized version of this prompt for XML citation markup:

ORIGINAL PROMPT:
{base_prompt}

ERROR ANALYSIS:
{error_analysis}

REQUIREMENTS:
1. Maintain the core structure and XML tags (<ref>, <canon>, <v>, <w>, <p>, <c>, <l>)
2. Improve clarity on OCR correction rules
3. Add specific guidance for handling citation ranges
4. Include better examples for edge cases
5. Ensure consistent tagging behavior

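Generate an improved prompt that addresses the identified issues while maintaining compatibility with the existing XML schema.

Output format: first a line containing only "OPTIMIZED PROMPT:", then the complete prompt text, then a line containing only "---". Keep any explanation outside that section.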
"""
        
        try:
            response = self.model.generate_content(optimization_prompt)
            return response.text
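        except Exception as e:
            # Keep the current prompt rather than feeding the error message back in as the next prompt
            print(f"⚠️  Error generating optimized prompt: {e}")
            return base_prompt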

class OllamaVerifier:
    """Local Ollama model verification system"""
    
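    def __init__(self, base_url: str, api_key: str = ""):
        self.base_url = base_url.rstrip('/')
        self.api_key = api_key
        # Only send an Authorization header if a key is configured; plain Ollama ignores it anyway
        self.headers = {'Authorization': f'Bearer {api_key}'} if api_key else {}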
    
    def test_model_with_prompt(self, model: str, prompt: str, test_cases: List[Dict]) -> List[Dict]:
        """Test a model with given prompt on test cases"""
        results = []
        
        for test_case in test_cases:
            input_text = test_case['prompt']
            expected_output = test_case['completion']
            
            # Construct full prompt
            full_prompt = f"{prompt}\n\nInput: {input_text}\nOutput:"
            
            try:
                # Call Ollama API
                response = requests.post(
                    # Ollama's OpenAI-compatible chat endpoint; change the path if a proxy sits in front of Ollama
                    f"{self.base_url}/v1/chat/completions",
                    headers=self.headers,
                    json={
                        "model": model,
                        "messages": [
                            {"role": "user", "content": full_prompt}
                        ],
                        "max_tokens": 500,
                        "temperature": 0.1
                    },
                    timeout=120
                )
                
                if response.status_code == 200:
                    result_data = response.json()
                    actual_output = result_data['choices'][0]['message']['content'].strip()
                    
                    # Evaluate result
                    success = self.evaluate_xml_output(expected_output, actual_output)
                    error = None if success else self.get_error_description(expected_output, actual_output)
                    
                    results.append({
                        'input': input_text,
                        'expected': expected_output,
                        'actual': actual_output,
                        'success': success,
                        'error': error,
                        'model': model
                    })
                else:
                    results.append({
                        'input': input_text,
                        'expected': expected_output,
                        'actual': f"API Error: {response.status_code}",
                        'success': False,
                        'error': f"HTTP {response.status_code}: {response.text}",
                        'model': model
                    })
                    
            except Exception as e:
                results.append({
                    'input': input_text,
                    'expected': expected_output,
                    'actual': f"Exception: {str(e)}",
                    'success': False,
                    'error': str(e),
                    'model': model
                })
                
            # Small delay to avoid overwhelming the API
            time.sleep(1)
        
        return results
    
    def evaluate_xml_output(self, expected: str, actual: str) -> bool:
        """Evaluate if XML output is correct"""
        # Normalize whitespace
        expected_clean = re.sub(r'\s+', ' ', expected.strip())
        actual_clean = re.sub(r'\s+', ' ', actual.strip())
        
        # Simple exact match for now
        # Could be enhanced with XML parsing for more sophisticated comparison
        return expected_clean == actual_clean
    
    def get_error_description(self, expected: str, actual: str) -> str:
        """Generate error description for failed cases"""
        if not actual.startswith('<ref>'):
            return "Missing <ref> wrapper"
        elif not actual.endswith('</ref>'):
            return "Missing </ref> closing tag"
        elif '<canon>' not in actual:
            return "Missing <canon> tag"
        else:
            return "XML structure or content mismatch"

class PromptOptimizationSystem:
    """Main prompt optimization system"""
    
    def __init__(self, gemini_api_key: str, ollama_base_url: str, ollama_api_key: str):
        self.gemini_optimizer = GeminiOptimizer(gemini_api_key)
        self.ollama_verifier = OllamaVerifier(ollama_base_url, ollama_api_key)
        self.test_cases = reference_data['examples']
        self.optimization_history = []
    
    def run_optimization_cycle(self, current_prompt: str, target_models: List[str], max_iterations: int = 3) -> Dict:
        """Run complete optimization cycle"""
        
        print(f"🚀 Starting prompt optimization cycle")
        print(f"📊 Target models: {', '.join(target_models)}")
        print(f"🔄 Max iterations: {max_iterations}")
        print(f"📝 Test cases: {len(self.test_cases)}")
        
        # `prompt_under_test` is the candidate evaluated in each iteration;
        # `best_prompt` / `best_performance` remember the highest-scoring candidate so far,
        # so a Gemini rewrite that performs worse never replaces a better earlier prompt.
        prompt_under_test = current_prompt
        best_prompt = current_prompt
        best_performance = -1.0
        
        for iteration in range(max_iterations):
            print(f"\n{'='*60}")
            print(f"🔄 Iteration {iteration + 1}/{max_iterations}")
            print(f"{'='*60}")
            
            # Test the current candidate prompt on all models
            iteration_results = {}
            overall_results = []
            
            for model in target_models:
                print(f"\n🧪 Testing {model}...")
                model_results = self.ollama_verifier.test_model_with_prompt(
                    model, prompt_under_test, self.test_cases
                )
                iteration_results[model] = model_results
                overall_results.extend(model_results)
                
                # Calculate model-specific metrics (guard against an empty test set)
                success_rate = sum(1 for r in model_results if r['success']) / max(len(model_results), 1)
                print(f"   Success rate: {success_rate:.2%}")
            
            # Calculate overall performance
            overall_success_rate = sum(1 for r in overall_results if r['success']) / max(len(overall_results), 1)
            print(f"\n📊 Overall success rate: {overall_success_rate:.2%}")
            
            # Store results
            self.optimization_history.append({
                'iteration': iteration + 1,
                'prompt': prompt_under_test,
                'results': iteration_results,
                'overall_success_rate': overall_success_rate,
                'timestamp': datetime.now().isoformat()
            })
            
            # Keep track of the best prompt so far
            if overall_success_rate > best_performance:
                best_prompt = prompt_under_test
                best_performance = overall_success_rate
            
            # Stop if performance is good enough or this was the last iteration
            if overall_success_rate >= 0.9 or iteration == max_iterations - 1:
                if overall_success_rate >= 0.9:
                    print(f"✅ Target performance achieved: {overall_success_rate:.2%}")
                break
            
            # Use Gemini to analyze the failures and rewrite the prompt
            print(f"\n🤖 Analyzing with Gemini 2.5 Pro...")
            analysis = self.gemini_optimizer.analyze_prompt_performance(prompt_under_test, overall_results)
            
            print(f"\n📋 Analysis complete. Generating optimized prompt...")
            optimized_prompt = self.gemini_optimizer.generate_optimized_prompt(prompt_under_test, analysis)
            
            # Extract the actual prompt from Gemini's response (it may include explanation)
            prompt_under_test = self.extract_prompt_from_response(optimized_prompt)
            
            print(f"✨ Prompt optimized for next iteration")
        
        return {
            'final_prompt': best_prompt,
            'final_performance': max(best_performance, 0.0),
            'optimization_history': self.optimization_history
        }
    
    def extract_prompt_from_response(self, response: str) -> str:
        """Extract the actual prompt from Gemini's response"""
        # Look for prompt markers or return the whole response if no clear markers
        lines = response.split('\n')
        
        # Try to find prompt between markers
        in_prompt = False
        prompt_lines = []
        
        for line in lines:
            if 'OPTIMIZED PROMPT:' in line.upper() or 'IMPROVED PROMPT:' in line.upper():
                in_prompt = True
                continue
            elif in_prompt and line.strip().startswith('---'):
                break
            elif in_prompt:
                prompt_lines.append(line)
        
        if prompt_lines:
            return '\n'.join(prompt_lines).strip()
        else:
            # If no clear markers, return the whole response
            return response.strip()
    
    def generate_report(self) -> str:
        """Generate optimization report"""
        if not self.optimization_history:
            return "No optimization history available"
        
        report = "# Prompt Optimization Report\n\n"
        
        # Performance progression
        report += "## Performance Progression\n\n"
        for entry in self.optimization_history:
            report += f"**Iteration {entry['iteration']}**: {entry['overall_success_rate']:.2%} success rate\n"
        
        # Best performing prompt
        best_entry = max(self.optimization_history, key=lambda x: x['overall_success_rate'])
        report += f"\n## Best Performing Prompt\n\n"
        report += f"**Success Rate**: {best_entry['overall_success_rate']:.2%}\n\n"
        report += f"**Prompt**:\n```\n{best_entry['prompt']}\n```\n\n"
        
        # Model-specific performance
        report += "## Model-Specific Performance\n\n"
        for model in best_entry['results'].keys():
            model_results = best_entry['results'][model]
            success_rate = sum(1 for r in model_results if r['success']) / len(model_results)
            report += f"- **{model}**: {success_rate:.2%}\n"
        
        return report
    
    def visualize_performance(self):
        """Create performance visualization"""
        if not self.optimization_history:
            print("No data to visualize")
            return
        
        # Extract data for plotting
        iterations = [entry['iteration'] for entry in self.optimization_history]
        overall_rates = [entry['overall_success_rate'] for entry in self.optimization_history]
        
        # Model-specific data
        models = list(self.optimization_history[0]['results'].keys())
        model_data = {model: [] for model in models}
        
        for entry in self.optimization_history:
            for model in models:
                model_results = entry['results'][model]
                success_rate = sum(1 for r in model_results if r['success']) / len(model_results)
                model_data[model].append(success_rate)
        
        # Create visualization
        plt.figure(figsize=(12, 6))
        
        # Overall performance
        plt.subplot(1, 2, 1)
        plt.plot(iterations, overall_rates, 'bo-', linewidth=2, markersize=8)
        plt.title('Overall Performance Progress')
        plt.xlabel('Iteration')
        plt.ylabel('Success Rate')
        plt.ylim(0, 1)
        plt.grid(True, alpha=0.3)
        
        # Model-specific performance
        plt.subplot(1, 2, 2)
        for model in models:
            plt.plot(iterations, model_data[model], 'o-', label=model, linewidth=2, markersize=6)
        plt.title('Model-Specific Performance')
        plt.xlabel('Iteration')
        plt.ylabel('Success Rate')
        plt.ylim(0, 1)
        plt.legend()
        plt.grid(True, alpha=0.3)
        
        plt.tight_layout()
        plt.show()

# Initialize the optimization system
def setup_optimization_system():
    """Setup and configure the optimization system"""
    print("🔧 Setting up Prompt Optimization System...")
    
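    # Check if API keys are configured
    if GEMINI_API_KEY == "your-gemini-api-key-here":
        print("⚠️  Please configure GEMINI_API_KEY")
        return None
    
    # OLLAMA_API_KEY is optional: plain Ollama accepts unauthenticated requests
    if not OLLAMA_API_KEY or OLLAMA_API_KEY == "sk-your-api-key-here":
        print("ℹ️  OLLAMA_API_KEY not configured; make sure your Ollama endpoint does not require one")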
    
    # Initialize system
    optimizer = PromptOptimizationSystem(
        gemini_api_key=GEMINI_API_KEY,
        ollama_base_url=OLLAMA_BASE_URL,
        ollama_api_key=OLLAMA_API_KEY
    )
    
    print("✅ Optimization system ready!")
    return optimizer

# Example usage
def run_optimization_example():
    """Run an example optimization cycle"""
    
    # Setup system
    optimizer = setup_optimization_system()
    if not optimizer:
        return
    
    # Initial prompt from reference data
    initial_prompt = reference_data['description']
    
    # Target models
    target_models = ['gemma3:1b', 'gemma3:4b']
    
    print(f"\n🎯 Starting optimization with initial prompt:")
    print(f"📝 Prompt length: {len(initial_prompt)} characters")
    print(f"🔍 Target models: {', '.join(target_models)}")
    
    # Run optimization
    results = optimizer.run_optimization_cycle(
        current_prompt=initial_prompt,
        target_models=target_models,
        max_iterations=3
    )
    
    # Generate and display report
    print(f"\n📊 OPTIMIZATION COMPLETE")
    print(f"🎉 Final performance: {results['final_performance']:.2%}")
    
    report = optimizer.generate_report()
    print(f"\n{report}")
    
    # Visualize results
    optimizer.visualize_performance()
    
    return results

# Quick test function
def quick_test_model(model: str, prompt: str, test_input: str):
    """Quick test of a single model with given prompt and input"""
    verifier = OllamaVerifier(OLLAMA_BASE_URL, OLLAMA_API_KEY)
    
    print(f"🧪 Quick test: {model}")
    print(f"📝 Input: {test_input}")
    
    # Test single case
    test_case = {'prompt': test_input, 'completion': 'N/A'}
    results = verifier.test_model_with_prompt(model, prompt, [test_case])
    
    if results:
        result = results[0]
        print(f"📤 Output: {result['actual']}")
        print(f"✅ Success: {result['success']}")
        if result['error']:
            print(f"❌ Error: {result['error']}")
    
    return results[0] if results else None

# Display current reference data
print("📚 Reference Data Loaded:")
print(f"   Title: {reference_data['title']}")
print(f"   Examples: {len(reference_data['examples'])}")
print(f"   Description length: {len(reference_data['description'])} characters")

print(f"\n🔧 To start optimization, run: run_optimization_example()")
print(f"🧪 For quick testing, run: quick_test_model('gemma3:1b', prompt, 'test input')")
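
`evaluate_xml_output` scores a case as correct only on an exact string match after whitespace normalization, and its comment points to XML parsing as the natural enhancement. A minimal sketch of that idea using the standard library's `xml.etree.ElementTree`: parse both strings and compare tag, attributes, text and tail recursively, so pretty-printing differences no longer count as failures while malformed output still does. The helper names are hypothetical, and collapsing whitespace inside text nodes is an assumption you may want to tighten for citation content.

```python
import xml.etree.ElementTree as ET

def _norm(text):
    # Collapse internal whitespace; treat None and "" the same.
    return " ".join((text or "").split())

def elements_equal(a: ET.Element, b: ET.Element) -> bool:
    """Recursively compare tag, attributes, text, tail and children."""
    if a.tag != b.tag or a.attrib != b.attrib:
        return False
    if _norm(a.text) != _norm(b.text) or _norm(a.tail) != _norm(b.tail):
        return False
    if len(a) != len(b):
        return False
    return all(elements_equal(x, y) for x, y in zip(a, b))

def xml_outputs_match(expected: str, actual: str) -> bool:
    """True if both strings parse and describe the same element tree."""
    try:
        return elements_equal(ET.fromstring(expected.strip()), ET.fromstring(actual.strip()))
    except ET.ParseError:
        # Model output that is not well-formed XML counts as a failure.
        return False

print(xml_outputs_match("<ref><canon>T</canon></ref>", "<ref>\n  <canon>T</canon>\n</ref>"))
print(xml_outputs_match("<ref><canon>T</canon></ref>", "<ref><canon>X</canon></ref>"))
print(xml_outputs_match("<ref><canon>T</canon></ref>", "<ref><canon>T</canon>"))
```

Swapping it in would mean having `evaluate_xml_output` return `xml_outputs_match(expected, actual)`; whether whitespace inside text nodes is significant for your schema is worth deciding explicitly.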



This comprehensive prompt optimization system provides:

1. **Gemini 2.5 Pro Integration**: Uses Gemini for intelligent prompt analysis and optimization
2. **Local Ollama Verification**: Tests optimized prompts on your local Gemma models
3. **Iterative Improvement**: Automatically runs optimization cycles to improve performance
4. **Performance Metrics**: Tracks success rates and identifies error patterns
5. **Visualization**: Creates charts showing optimization progress
6. **Reporting**: Generates detailed reports on optimization results
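
Gemini's reply usually mixes explanation with the rewritten prompt, and the prompt is often inside a Markdown code fence. `extract_prompt_from_response` looks for an `OPTIMIZED PROMPT:` or `IMPROVED PROMPT:` line and otherwise returns the whole reply. A slightly more defensive variant is sketched below as a hypothetical `extract_prompt` helper: it tries the marker first (unwrapping a fence around the marked section), then the longest fenced block, and only then the whole reply.

```python
import re

FENCE = "`" * 3  # a Markdown code fence, built this way to keep the example readable
FENCE_RE = re.compile(FENCE + r"[^\n]*\n(.*?)\n" + FENCE, re.DOTALL)

def extract_prompt(response: str) -> str:
    """Hypothetical helper: pull the rewritten prompt out of an LLM reply."""
    lines = response.splitlines()
    # 1) Text after an "OPTIMIZED PROMPT:" / "IMPROVED PROMPT:" line, up to a line starting with '---'.
    for i, line in enumerate(lines):
        upper = line.upper()
        if "OPTIMIZED PROMPT:" in upper or "IMPROVED PROMPT:" in upper:
            body = []
            for rest in lines[i + 1:]:
                if rest.strip().startswith("---"):
                    break
                body.append(rest)
            candidate = "\n".join(body).strip()
            if candidate.startswith(FENCE) and candidate.endswith(FENCE) and "\n" in candidate:
                # Drop the opening fence line (e.g. a "text" fence) and the closing fence.
                candidate = candidate.split("\n", 1)[1][:-len(FENCE)].strip()
            if candidate:
                return candidate
    # 2) Otherwise take the longest fenced block in the reply.
    blocks = FENCE_RE.findall(response)
    if blocks:
        return max(blocks, key=len).strip()
    # 3) Fall back to the whole reply.
    return response.strip()

reply = f"Analysis...\n\nOPTIMIZED PROMPT:\n{FENCE}text\nTag every citation with <ref>.\n{FENCE}\n---\nNotes"
print(extract_prompt(reply))
```

The longest-block fallback can pick up an embedded example when the reply contains several fenced blocks, which is why asking Gemini for an explicit marker and trying it first is the more reliable route.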

The system will:
- Test your current prompt on both Gemma3:1b and Gemma3:4b
- Use Gemini 2.5 Pro to analyze failures and suggest improvements
- Iteratively refine the prompt based on performance data
- Provide detailed reports and visualizations of the optimization process
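
A refinement loop only helps if a rewrite that scores worse cannot displace a better earlier prompt, and if it stops calling Gemini once the target is met. That keep-the-best logic can be sketched independently of both services by passing the scoring step (Ollama runs on Gemma3:1b and Gemma3:4b) and the rewriting step (Gemini analysis) as plain callables; `optimize`, its parameters and the stub scores are hypothetical.

```python
from typing import Callable, List, Tuple

def optimize(initial_prompt: str,
             score: Callable[[str], float],
             rewrite: Callable[[str, float], str],
             max_iterations: int = 3,
             target: float = 0.9) -> Tuple[str, float, List[float]]:
    """Keep the best-scoring prompt seen so far; stop at `target` or after `max_iterations`."""
    best_prompt, best_score = initial_prompt, float("-inf")
    candidate = initial_prompt
    history: List[float] = []
    for i in range(max_iterations):
        s = score(candidate)            # stand-in for: success rate on Gemma3:1b + Gemma3:4b
        history.append(s)
        if s > best_score:
            best_prompt, best_score = candidate, s
        if s >= target or i == max_iterations - 1:
            break                       # no rewrite needed after the final evaluation
        candidate = rewrite(candidate, s)  # stand-in for: Gemini analysis + rewrite
    return best_prompt, best_score, history

# Stub scorer/rewriter so the loop runs offline: prompt "v1" scores best, "v2" regresses.
scores = {"v0": 0.5, "v1": 0.7, "v2": 0.6}
best, best_score, history = optimize("v0", scores.__getitem__, lambda p, s: f"v{int(p[1:]) + 1}")
print(best, best_score, history)
```

Here the third candidate scores lower than the second, so the second one is kept as the result even though it was not the last prompt tested.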

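To use this system, you'll need to:
1. Obtain a Gemini API key
2. Make sure your local Ollama service is running and both models are available (for example, `ollama pull gemma3:1b` and `ollama pull gemma3:4b`)
3. Configure the API keys in the notebook's configuration section
4. Run the optimization cycle with `run_optimization_example()`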
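
For step 2, it helps to know what a single verification request to Ollama looks like. Ollama serves a native chat endpoint at `POST /api/chat` and an OpenAI-compatible one at `POST /v1/chat/completions`, by default on port 11434, and plain Ollama does not check API keys. The sketch below builds a non-streaming `/api/chat` request body (sampling settings such as `temperature` and `num_predict` go under `options`) and extracts the reply text from a response dictionary. It does no networking so it can be checked offline; the helper names and placeholder strings are hypothetical.

```python
import json
from typing import Any, Dict

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"  # Ollama's default host and port

def build_chat_request(model: str, prompt: str, citation: str,
                       temperature: float = 0.1, num_predict: int = 500) -> Dict[str, Any]:
    """Request body for Ollama's native chat endpoint (single, non-streamed reply)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": f"{prompt}\n\nInput: {citation}\nOutput:"}],
        "stream": False,  # return one JSON object instead of a stream of chunks
        "options": {"temperature": temperature, "num_predict": num_predict},
    }

def reply_text(response_json: Dict[str, Any]) -> str:
    """Extract the assistant message from a non-streamed /api/chat response."""
    return response_json["message"]["content"].strip()

body = build_chat_request("gemma3:1b", "Convert the citation to XML.", "EXAMPLE CITATION")
print(json.dumps(body["options"]))
# Shape of a non-streamed reply (the content is a placeholder):
print(reply_text({"message": {"role": "assistant", "content": " <ref>EXAMPLE</ref>\n"}, "done": True}))
```

Sending it would be `requests.post(OLLAMA_CHAT_URL, json=body, timeout=120).json()`, and the result can be passed straight to `reply_text`.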