# Multilingual Prompt Optimizer - Interactive Demo

This notebook demonstrates the **Multilingual Prompt Optimizer** (MPO) - a system that adapts LLM prompts for cultural and linguistic appropriateness across languages.

## 🎯 What You'll Learn

1. How cultural adaptation improves LLM outputs
2. The difference between translation and cultural transformation
3. Comparing German (low-context) vs. Spanish (high-context) adaptations
4. Metrics for evaluating multilingual prompt quality

## 🚀 Setup

In [None]:
# Import required libraries
import sys
sys.path.append('../src')

from mpo.core.prompt import PromptTemplate, PromptDomain, FormalityLevel
from mpo.adapters import get_adapter, EnglishAdapter, GermanAdapter, SpanishAdapter
from mpo.providers import LocalLLMProvider
from mpo.core.evaluator import PromptEvaluator
from mpo.storage.cache_manager import CacheManager
from mpo.metrics import quantitative, qualitative

import yaml
from pathlib import Path

print("✅ Imports successful!")

## 📝 Part 1: Understanding Cultural Adaptation

### The Problem: Translation ≠ Cultural Appropriateness

Consider a simple business request:

In [None]:
# English baseline prompt
english_prompt = """
I need to request an extension for the project deadline.
The current deadline is next Friday, but I need until next month.
"""

print("🇺🇸 English (baseline):")
print(english_prompt)

### Naive Translation vs. Cultural Adaptation

**Naive Translation** (word-for-word):
- üá©üá™ German: "Ich brauche eine Verl√§ngerung f√ºr die Projektfrist..."
- üá™üá∏ Spanish: "Necesito solicitar una extensi√≥n para la fecha l√≠mite..."

**Problem**: While semantically correct, these translations ignore:
- ‚ùå Cultural communication norms (direct vs. indirect)
- ‚ùå Formality expectations (Sie vs. du, usted vs. t√∫)
- ‚ùå Relationship dynamics (task-focused vs. relational)

**Our Approach**: Apply cultural transformation rules based on linguistic theory.

## 🌍 Part 2: Loading Configuration

Our system uses YAML configuration files with cultural parameters for each language:

In [None]:
# Load language configurations
with open('../config/languages.yaml') as f:
    languages_config = yaml.safe_load(f)

# Inspect German cultural parameters
print("🇩🇪 German Cultural Parameters:")
print(f"Context Level: {languages_config['languages']['de']['cultural_params']['communication_style']['context_level']}")
print(f"Directness: {languages_config['languages']['de']['cultural_params']['communication_style']['directness']}")
print(f"\nFormality Markers (Formal):")
print(languages_config['languages']['de']['cultural_params']['formality_levels']['formal'])

In [None]:
# Inspect Spanish cultural parameters
print("🇪🇸 Spanish Cultural Parameters:")
print(f"Context Level: {languages_config['languages']['es']['cultural_params']['communication_style']['context_level']}")
print(f"Directness: {languages_config['languages']['es']['cultural_params']['communication_style']['directness']}")
print(f"\nFormality Markers (Formal):")
print(languages_config['languages']['es']['cultural_params']['formality_levels']['formal'])
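
The two cells above repeat the same long chain of subscripts for every lookup. A small helper can shorten these lookups and tolerate missing keys. The sketch below is only an illustration: `get_cultural_param` is a hypothetical helper (not part of `mpo`), and `sample_config` is a stand-in dictionary that mirrors the key names used above, with values taken from the cultural descriptions in this notebook rather than from the real `languages.yaml`.

```python
from typing import Any

# Hypothetical stand-in for the parsed languages.yaml. The key names mirror the
# lookups in the cells above; the values are illustrative assumptions.
sample_config = {
    "languages": {
        "de": {
            "cultural_params": {
                "communication_style": {"context_level": "low", "directness": "high"},
                "formality_levels": {"formal": {"pronoun": "Sie"}},
            }
        },
        "es": {
            "cultural_params": {
                "communication_style": {"context_level": "high", "directness": "medium"},
                "formality_levels": {"formal": {"pronoun": "usted"}},
            }
        },
    }
}


def get_cultural_param(config: dict, lang: str, *path: str, default: Any = None) -> Any:
    """Follow `path` below config['languages'][lang]['cultural_params'].

    Returns `default` instead of raising if any key along the way is missing.
    """
    node: Any = config.get("languages", {}).get(lang, {}).get("cultural_params", {})
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


for lang in ("de", "es"):
    context = get_cultural_param(sample_config, lang, "communication_style", "context_level")
    directness = get_cultural_param(sample_config, lang, "communication_style", "directness")
    print(f"{lang}: context={context}, directness={directness}")
```

With the real configuration, the same helper could be called as `get_cultural_param(languages_config, 'de', 'communication_style', 'directness')`, assuming the file has the structure the cells above rely on.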

## 🔄 Part 3: Cultural Adaptation in Action

Let's create a prompt template and adapt it for different languages:

In [None]:
# Create a prompt template
template = PromptTemplate(
    id="business_request",
    content="I need to request an extension for the {project_name} project. The current deadline is {current_deadline}, but due to {reason}, I would like to request moving the deadline to {requested_deadline}. Could you please consider this request and let me know if this adjustment is possible?",
    domain=PromptDomain.BUSINESS,
    placeholders={
        "project_name": "Website Redesign",
        "current_deadline": "Friday, Oct 15",
        "reason": "additional client requirements",
        "requested_deadline": "Friday, Oct 29"
    },
    description="Business email requesting project deadline extension"
)

print("📄 Original Template:")
print(template.content)
print(f"\nDomain: {template.domain.value}")
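
The template above stores its text with `{placeholder}` fields alongside a dictionary of values. How `PromptTemplate` fills those fields internally is not shown in this notebook, so the following is a plain-Python sketch of the idea using `str.format_map`. The names `render_template` and `_KeepMissing` are hypothetical and exist only for this illustration.

```python
class _KeepMissing(dict):
    """Mapping that leaves unknown placeholders in place instead of raising KeyError."""

    def __missing__(self, key: str) -> str:
        return "{" + key + "}"


def render_template(content: str, placeholders: dict) -> str:
    """Substitute {name} fields in `content` with values from `placeholders`."""
    return content.format_map(_KeepMissing(placeholders))


content = (
    "I need to request an extension for the {project_name} project. "
    "The current deadline is {current_deadline}."
)

# Only one value is supplied here, so the other field is left untouched.
rendered = render_template(content, {"project_name": "Website Redesign"})
print(rendered)
```

Leaving unknown fields intact (rather than failing) makes it easy to fill a template in stages, for example adapting the wording first and inserting the concrete dates later.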

### Adaptation 1: German (Formal)

**Cultural Context:**
- **Low-context culture**: Information must be explicit
- **High directness**: Get to the point quickly
- **Formal pronouns**: Use "Sie" in business
- **Structured**: Clear opening, body, closing

In [None]:
# Adapt for German (formal)
de_adapter = get_adapter('de', languages_config['languages']['de'])
de_variant = de_adapter.adapt(template, FormalityLevel.FORMAL)

print("🇩🇪 German Formal Adaptation:")
print("="*60)
print(de_variant.adapted_content)
print("="*60)
print(f"\n📋 Adaptation Notes:")
print(de_variant.adaptation_notes)

### Adaptation 2: Spanish (Formal)

**Cultural Context:**
- **High-context culture**: Relationship matters
- **Medium directness**: Balance task and relationship
- **Formal pronouns**: Use "usted" in business
- **Relational preambles**: Well-being inquiry + purpose statement

In [None]:
# Adapt for Spanish (formal)
es_adapter = get_adapter('es', languages_config['languages']['es'])
es_variant = es_adapter.adapt(template, FormalityLevel.FORMAL)

print("🇪🇸 Spanish Formal Adaptation:")
print("="*60)
print(es_variant.adapted_content)
print("="*60)
print(f"\n📋 Adaptation Notes:")
print(es_variant.adaptation_notes)

### 🔍 Key Differences

Compare the two adaptations:

| Aspect | German (DE) | Spanish (ES) |
|--------|-------------|-------------|
| **Opening** | Direct greeting | Well-being inquiry |
| **Preamble** | Brief context | Relational connection |
| **Body** | Task-focused | Task + relationship |
| **Closing** | Standard formal | Gratitude + formal |
| **Tone** | Professional directness | Warm professionalism |

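Both variants make the same request, but each follows the conventions its readers expect. The adaptation aims for **pragmatic equivalence** (the same communicative effect) rather than **semantic equivalence** (the same literal wording).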
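
To make the table concrete, here is a minimal sketch of how per-language opening and closing rules might wrap the same message body. It is not the logic of `GermanAdapter` or `SpanishAdapter`, whose internals are not shown here. `FormalFrame`, `FRAMES`, and `frame_message` are hypothetical names, and the phrases are common formal formulas chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FormalFrame:
    greeting: str
    preamble: str  # left empty for direct, task-focused styles
    closing: str


# Illustrative phrases only; a real system would load these from configuration.
FRAMES = {
    "de": FormalFrame(
        greeting="Sehr geehrte Damen und Herren,",
        preamble="",
        closing="Mit freundlichen Grüßen",
    ),
    "es": FormalFrame(
        greeting="Estimados señores:",
        preamble="Espero que se encuentren bien.",
        closing="Les agradezco de antemano su atención.\n\nAtentamente,",
    ),
}


def frame_message(body: str, lang: str) -> str:
    """Wrap `body` in the greeting, optional preamble, and closing for `lang`."""
    frame = FRAMES[lang]
    parts = [frame.greeting]
    if frame.preamble:
        parts.append(frame.preamble)
    parts.extend([body, frame.closing])
    return "\n\n".join(parts)


print(frame_message("<request body>", "de"))
print("-" * 40)
print(frame_message("<request body>", "es"))
```

The German frame goes straight from greeting to request, while the Spanish frame adds a well-being inquiry before the request and a note of thanks before the sign-off, matching the Opening and Closing rows of the table.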

## 🤖 Part 4: Generating LLM Responses (Demo Mode)

Now let's retrieve cached LLM responses generated with Gemma 2 9B:

In [None]:
# Initialize cache manager
cache = CacheManager('../data/cache')

# Retrieve cached responses
de_response = cache.get_cached_response('business_email', 'de', 'formal')
es_response = cache.get_cached_response('business_email', 'es', 'formal')
en_response = cache.get_cached_response('business_email', 'en', 'formal')

print("✅ Cached responses loaded")
print(f"English: {len(en_response.content) if en_response else 0} chars")
print(f"German: {len(de_response.content) if de_response else 0} chars")
print(f"Spanish: {len(es_response.content) if es_response else 0} chars")

In [None]:
# Display German response
if de_response:
    print("🇩🇪 German LLM Response:")
    print("="*60)
    print(de_response.content)
    print("="*60)
    print(f"Tokens: {de_response.tokens_input} in / {de_response.tokens_output} out")
    print(f"Model: {de_response.model}")

In [None]:
# Display Spanish response
if es_response:
    print("🇪🇸 Spanish LLM Response:")
    print("="*60)
    print(es_response.content)
    print("="*60)
    print(f"Tokens: {es_response.tokens_input} in / {es_response.tokens_output} out")
    print(f"Model: {es_response.model}")

## 📊 Part 5: Metrics and Evaluation

Let's calculate quantitative and qualitative metrics:

In [None]:
# Calculate metrics for German response
if de_response:
    de_quant = quantitative.calculate_all_quantitative_metrics(
        de_response.content,
        de_response.tokens_output,
        'de'
    )
    
    de_qual = qualitative.calculate_all_qualitative_metrics(
        de_response.content,
        'de',
        'formal',
        'business'
    )
    
    print("🇩🇪 German Metrics:")
    print(f"  Word Count: {de_quant['length_metrics']['word_count']}")
    print(f"  Lexical Diversity: {de_quant['lexical_diversity']['type_token_ratio']:.3f}")
    print(f"  Avg Word Length: {de_quant['length_metrics']['avg_word_length']:.2f}")
    print(f"  Cultural Appropriateness: {de_qual['cultural_appropriateness']['overall_rating']}")

In [None]:
# Calculate metrics for Spanish response
if es_response:
    es_quant = quantitative.calculate_all_quantitative_metrics(
        es_response.content,
        es_response.tokens_output,
        'es'
    )
    
    es_qual = qualitative.calculate_all_qualitative_metrics(
        es_response.content,
        'es',
        'formal',
        'business'
    )
    
    print("🇪🇸 Spanish Metrics:")
    print(f"  Word Count: {es_quant['length_metrics']['word_count']}")
    print(f"  Lexical Diversity: {es_quant['lexical_diversity']['type_token_ratio']:.3f}")
    print(f"  Avg Word Length: {es_quant['length_metrics']['avg_word_length']:.2f}")
    print(f"  Cultural Appropriateness: {es_qual['cultural_appropriateness']['overall_rating']}")
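
For readers unfamiliar with the quantitative metrics printed above, the sketch below shows common textbook definitions of word count, type-token ratio, and average word length. It does not reproduce `mpo.metrics.quantitative`, whose exact tokenization and formulas are not shown in this notebook. `basic_text_metrics` is a hypothetical function name.

```python
import re


def basic_text_metrics(text: str) -> dict:
    """Compute word count, type-token ratio, and average word length."""
    # \w+ matches runs of Unicode word characters (letters, digits, underscore),
    # so words with umlauts or accents stay intact.
    words = re.findall(r"\w+", text.lower())
    if not words:
        return {"word_count": 0, "type_token_ratio": 0.0, "avg_word_length": 0.0}
    return {
        "word_count": len(words),
        # Unique words divided by total words: higher means more varied vocabulary.
        "type_token_ratio": len(set(words)) / len(words),
        "avg_word_length": sum(len(w) for w in words) / len(words),
    }


sample = "Vielen Dank für Ihre Nachricht. Vielen Dank für Ihre Geduld."
metrics = basic_text_metrics(sample)
print(metrics)
```

Type-token ratio falls as texts get longer, so it is most meaningful when comparing responses of similar length.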

## 📈 Part 6: Visualization

Let's create a simple comparison chart:

In [None]:
import plotly.graph_objects as go

# Prepare data
languages = ['English', 'German', 'Spanish']
responses = [en_response, de_response, es_response]

token_counts = [r.tokens_output if r else 0 for r in responses]
word_counts = [
    quantitative.calculate_all_quantitative_metrics(r.content, r.tokens_output, lang)['length_metrics']['word_count']
    if r else 0
    for r, lang in zip(responses, ['en', 'de', 'es'])
]

# Create bar chart
fig = go.Figure(data=[
    go.Bar(name='Tokens', x=languages, y=token_counts),
    go.Bar(name='Words', x=languages, y=word_counts)
])

fig.update_layout(
    title='Response Length Comparison',
    barmode='group',
    yaxis_title='Count',
    template='plotly_white'
)

fig.show()

## 🎯 Part 7: Key Takeaways

### What We've Demonstrated:

1. **Cultural Adaptation ≠ Translation**
   - German: Direct, structured, task-focused
   - Spanish: Relational, warm, context-rich

2. **Linguistic Theory in Practice**
   - Hall's high/low-context framework
   - Brown & Levinson's politeness theory
   - T-V distinction (formal pronouns)

3. **Measurable Quality Metrics**
   - Lexical diversity
   - Token efficiency
   - Cultural appropriateness

4. **Zero-Cost Local Inference**
   - Gemma 2 9B provides excellent multilingual quality
   - No API costs during development
   - Real-time adaptation testing

### Applications:

- 🌍 International business communication
- 🤖 Culturally-aware chatbots
- 📧 Automated email generation
- 🎓 Language learning tools
- 🔬 Cross-cultural NLP research

---

## 🚀 Next Steps

Try experimenting with:
1. Different formality levels (casual, neutral, formal)
2. Other prompt templates (technical, creative, persuasive)
3. Adding new languages (French, Japanese, etc.)
4. Custom cultural parameters

**CLI Commands:**
```bash
# Test different prompts
mpo test business_email --provider local --live -l de -f formal

# Generate HTML report
mpo html-report business_email

# Run full benchmark
mpo benchmark --provider local
```

---

**📚 Learn More:**
- Read `docs/cultural_rationale.md` for linguistic theory details
- Check `docs/architecture.md` for system design
- See `GEMMA_2_9B_RESULTS.md` for model evaluation

### 🎯 Provider Comparison Summary

| Feature | Local (Gemma 2) | OpenAI (GPT-4) |
|---------|----------------|----------------|
| **Quality** | Excellent | Outstanding |
| **Speed** | Fast (local) | Medium (API) |
| **Cost** | Free | ~$0.01-0.03/request |
| **Embeddings** | ❌ Not available | ✅ Native (1536-dim) |
| **Token Counting** | Approximation (~4 chars/token) | Exact (tiktoken) |
| **Languages** | Good multilingual | Excellent multilingual |
| **Context Window** | 8K tokens | 128K tokens |
| **Privacy** | ✅ Fully local | ⚠️ Sent to OpenAI |
| **Setup** | Requires LM Studio | API key only |

### 💡 Recommendation:

- **Development & Testing**: Use Local provider (Gemma 2 9B)
  - No costs
  - Fast iteration
  - Good quality for testing

- **Production & Research**: Use OpenAI or Anthropic
  - Higher quality
  - Native embeddings (OpenAI)
  - Better multilingual support
  - Worth the cost for important applications

### 🚀 Using OpenAI in Production:

```python
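# Production setup (sketch): `lang_config`, `variant`, `config`, and `text`
# are assumed to be defined already, as in the notebook cells
from mpo.providers import OpenAIProvider
from mpo.core.evaluator import PromptEvaluator

# Initialize the provider and evaluator
provider = OpenAIProvider(api_key="sk-...")
evaluator = PromptEvaluator(provider, lang_config)

# Generate a response for an adapted prompt variant
response = evaluator.evaluate_variant(variant, config)

# Get embeddings for RAG
embeddings = provider.get_embeddings(text)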
```

**CLI Usage:**
```bash
# Test with OpenAI
mpo test business_email --provider openai --live -l de -f formal

# Run benchmark with OpenAI
mpo benchmark --provider openai --live
```

In [None]:
# Compare token counting methods
test_text = "I need to request an extension for the Website Redesign project."

# OpenAI (accurate with tiktoken); `openai_provider` is created in the Part 8 setup cell
openai_tokens = openai_provider.count_tokens(test_text)

# Local provider (approximation)
local_provider = LocalLLMProvider()
local_tokens = local_provider.count_tokens(test_text)

print("🔢 Token Counting Comparison:")
print("="*40)
print(f"Text: \"{test_text}\"")
print(f"\nOpenAI (tiktoken):   {openai_tokens} tokens")
print(f"Local (approximation): {local_tokens} tokens")
print(f"Character count:       {len(test_text)} chars")
print(f"\n💡 OpenAI's tiktoken provides exact counts")
print(f"   Local uses ~4 chars/token approximation")
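
The `~4 chars/token` rule used by the local provider can be sketched in a few lines. This is an assumption about the general technique, not the actual `LocalLLMProvider.count_tokens` implementation, which may round or clamp differently. `approx_token_count` is a hypothetical name.

```python
import math


def approx_token_count(text: str, chars_per_token: float = 4.0) -> int:
    """Estimate the token count as ceil(len(text) / chars_per_token)."""
    if not text:
        return 0
    return math.ceil(len(text) / chars_per_token)


for sample in ["abcd", "abcde", "Fristverlängerung"]:
    print(f"{sample!r}: {len(sample)} chars -> ~{approx_token_count(sample)} tokens")
```

A character-based estimate like this is typically calibrated on English text. Long compounds and accented characters, as in German and Spanish, often split into more tokens than the estimate suggests, so exact counts from the model's tokenizer are preferable when budgets are tight.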

### 📊 Token Counting Accuracy

The OpenAI provider uses **tiktoken** for exact token counts, while the local provider uses an approximation:

In [None]:
# Get embeddings for our prompts
import numpy as np

# Sample texts in different languages
texts = {
    'en': "I need to request a project deadline extension.",
    'de': "Ich möchte höflich um eine Fristverlängerung bitten.",
    'es': "Me dirijo a usted para solicitar una prórroga del plazo."
}

print("🎯 Generating Embeddings...")
embeddings = {}

for lang, text in texts.items():
    emb = openai_provider.get_embeddings(text)
    embeddings[lang] = emb
    print(f"✅ {lang.upper()}: {len(emb)}-dimensional vector")

# Calculate semantic similarity (cosine similarity)
def cosine_similarity(v1, v2):
    return np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))

print("\n📊 Semantic Similarity Matrix:")
print("-" * 40)
for lang1 in texts.keys():
    for lang2 in texts.keys():
        if lang1 <= lang2:  # Skip mirrored pairs; self-pairs are kept
            sim = cosine_similarity(embeddings[lang1], embeddings[lang2])
            print(f"{lang1.upper()} ↔ {lang2.upper()}: {sim:.4f}")

print("\n💡 Interpretation:")
print("   Values close to 1.0 = high semantic similarity")
print("   All three express the same intent → high similarity expected")

### 🎯 OpenAI-Specific Feature: Native Embeddings

One advantage specific to the OpenAI provider is **native embeddings support** via `text-embedding-3-small` (1536 dimensions).

This enables:
- Semantic similarity comparison
- Document clustering
- Retrieval-augmented generation (RAG)
- Cross-lingual similarity matching

Let's demonstrate embeddings:

In [None]:
# Use the German formal variant we created earlier
from mpo.providers.base import GenerationConfig

# Create evaluator with OpenAI provider
openai_evaluator = PromptEvaluator(openai_provider, languages_config['languages'])

# Generate response (only if you have API key and want to spend ~$0.02)
# For demo purposes, we'll use the mock provider by default

print("🔄 Generating response with OpenAI provider...")
print(f"   Language: German (de)")
print(f"   Formality: Formal")
print(f"   Provider: {openai_provider.provider_name}\n")

# Configuration for generation
config = GenerationConfig(
    temperature=0.7,
    max_tokens=500,
    top_p=0.95
)

# Generate response
openai_response = openai_evaluator.evaluate_variant(de_variant, config)

print("✅ Response generated!")
print(f"📊 Tokens: {openai_response.tokens_input} in / {openai_response.tokens_output} out")
print(f"⏱️  Timestamp: {openai_response.timestamp}")
print(f"\n🤖 GPT-4 Response:")
print("="*60)
print(openai_response.content)
print("="*60)

### Generating a Response with OpenAI

Let's generate a German formal business email using GPT-4:

In [None]:
# Import OpenAI provider
from mpo.providers import OpenAIProvider, MockOpenAIProvider
import os

# Check if API key is available
has_openai_key = os.getenv("OPENAI_API_KEY") is not None

if has_openai_key:
    print("✅ OPENAI_API_KEY found - will use live OpenAI API")
    print("⚠️  Note: This will make real API calls and cost ~$0.02")
    
    # Initialize OpenAI provider
    openai_provider = OpenAIProvider()
    print(f"🤖 Provider: {openai_provider.provider_name}")
    print(f"📦 Model: {openai_provider.model_name}")
    
    # Show model info
    model_info = openai_provider.get_model_info()
    print(f"💰 Cost: ${model_info['cost_input_per_m']}/M input, ${model_info['cost_output_per_m']}/M output tokens")
else:
    print("⚠️  OPENAI_API_KEY not found - using Mock provider for demo")
    print("💡 To use real OpenAI API: Add OPENAI_API_KEY to your .env file")
    
    # Use mock provider for demonstration
    openai_provider = MockOpenAIProvider()
    print(f"🎭 Using Mock OpenAI Provider")
    print(f"📦 Model: {openai_provider.model_name}")
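
The per-million-token prices printed above can be turned into a rough per-request cost once the token counts are known. The sketch below assumes prices are quoted in US dollars per million tokens, as the `cost_input_per_m` and `cost_output_per_m` keys suggest. `estimate_cost_usd` is a hypothetical helper, and the prices in the example are placeholders, not current quotes.

```python
def estimate_cost_usd(
    tokens_input: int,
    tokens_output: int,
    cost_input_per_m: float,
    cost_output_per_m: float,
) -> float:
    """Estimate request cost from token counts and per-million-token prices."""
    total = tokens_input * cost_input_per_m + tokens_output * cost_output_per_m
    return total / 1_000_000


# Placeholder prices for illustration only; check the provider's pricing page.
example_cost = estimate_cost_usd(
    tokens_input=200,
    tokens_output=500,
    cost_input_per_m=10.0,
    cost_output_per_m=30.0,
)
print(f"Estimated cost: ${example_cost:.4f}")
```

In the notebook, the same calculation could take `tokens_input` and `tokens_output` from a response object and the two prices from `get_model_info()`, assuming those fields hold the values their names imply.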

## 🤖 Part 8: Using OpenAI Provider (GPT-4)

### Why Use OpenAI Provider?

While the local provider (Gemma 2 9B) is excellent for development and testing, the **OpenAI provider** offers:

- ✅ **Superior quality**: GPT-4 Turbo with advanced reasoning
- ✅ **Native embeddings**: Built-in semantic similarity (text-embedding-3-small)
- ✅ **Accurate token counting**: Using tiktoken library
- ✅ **Wider language support**: Better multilingual capabilities
- ⚠️ **Costs money**: ~$0.01-0.03 per request

### When to Use Each Provider:

| Provider | Best For | Cost | Quality |
|----------|----------|------|---------|
| **Local (Gemma 2)** | Development, testing, demos | Free | Excellent |
| **OpenAI (GPT-4)** | Production, research, high-stakes | Paid | Outstanding |
| **Anthropic (Claude)** | Long contexts, nuanced writing | Paid | Outstanding |

Let's demonstrate the OpenAI provider capabilities: