<a href="https://colab.research.google.com/github/mihirahuja1/llmuxer/blob/main/examples/quickstart.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# LLMuxer Quickstart

Find the cheapest LLM that meets your quality requirements for classification tasks.

**This notebook demonstrates:**
- Installing LLMuxer
- Setting up an OpenRouter API key
- Running cost optimization on a simple sentiment classification task
- Understanding the results

## 1. Installation

In [None]:
# Install LLMuxer
!pip install llmuxer

## 2. Set Up Your API Key

Get your OpenRouter API key from: https://openrouter.ai/keys

In [None]:
import os
from getpass import getpass

# Set your OpenRouter API key
api_key = getpass("Enter your OpenRouter API key: ")
os.environ['OPENROUTER_API_KEY'] = api_key

print("✅ API key configured!")
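If the key is already exported in your environment, you may want to skip the prompt. A minimal sketch, assuming only the `OPENROUTER_API_KEY` variable name used above; `resolve_api_key` is a hypothetical helper and not part of LLMuxer:

```python
import os
from getpass import getpass


def resolve_api_key(env=os.environ, prompt=getpass):
    """Return the OpenRouter key from the environment, prompting only if it is missing."""
    key = env.get("OPENROUTER_API_KEY", "").strip()
    if not key:
        # Strip whitespace that often sneaks in when pasting a key
        key = prompt("Enter your OpenRouter API key: ").strip()
    if not key:
        raise ValueError("No OpenRouter API key provided")
    return key
```

You could then set the variable with `os.environ["OPENROUTER_API_KEY"] = resolve_api_key()`.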

## 3. Simple Sentiment Classification Example

Let's find the cheapest model that can classify sentiment with at least 90% accuracy. With the 10 examples below, that means a model may get at most one wrong.

In [None]:
import llmuxer

# Define our test examples
examples = [
    {"input": "This product is amazing! I love it.", "label": "positive"},
    {"input": "Terrible service, very disappointed.", "label": "negative"},
    {"input": "It's okay, nothing special.", "label": "neutral"},
    {"input": "Outstanding quality and fast delivery!", "label": "positive"},
    {"input": "Worst purchase ever, total waste of money.", "label": "negative"},
    {"input": "Average product, meets expectations.", "label": "neutral"},
    {"input": "Fantastic! Exceeded all my expectations.", "label": "positive"},
    {"input": "Poor quality, broke after one day.", "label": "negative"},
    {"input": "It works fine, no complaints.", "label": "neutral"},
    {"input": "Incredible value for money!", "label": "positive"}
]

print(f"📊 Testing with {len(examples)} sentiment examples")
print("🎯 Goal: Find cheapest model with 90% accuracy")
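With a small test set, accuracy can only move in steps of 1/n, so an accuracy threshold translates directly into a maximum number of mistakes. A quick sketch of that arithmetic; `max_errors_allowed` is an illustrative helper, not part of LLMuxer, and it assumes accuracy is simply correct / total:

```python
import math


def max_errors_allowed(n_examples, min_accuracy):
    """Largest number of wrong predictions that still reaches min_accuracy."""
    # Smallest count of correct answers whose ratio meets the threshold;
    # the small epsilon guards against floating-point noise in the product
    required_correct = math.ceil(n_examples * min_accuracy - 1e-9)
    return n_examples - required_correct


print(max_errors_allowed(10, 0.9))   # 10 examples at a 90% threshold
print(max_errors_allowed(24, 0.85))  # 24 examples at an 85% threshold
```

On very small sets a single borderline example can decide whether a model passes, so treat the result as a rough screen.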

In [None]:
# Run the optimization
result = llmuxer.optimize_cost(
    baseline="gpt-4",                    # Compare against GPT-4
    examples=examples,                   # Our test data
    task="classification",               # Classification task
    options=["positive", "negative", "neutral"],  # Valid classes
    min_accuracy=0.9                     # Require 90% accuracy
)

print("\n🎉 Optimization complete!")

## 4. Understanding the Results

In [None]:
# Display the results
if "error" in result:
    print(f"❌ Error: {result['error']}")
    print("💡 Try lowering min_accuracy or using a different baseline")
else:
    print("📋 OPTIMIZATION RESULTS")
    print("=" * 50)
    print(f"🥇 Best Model: {result['model']}")
    print(f"🎯 Accuracy: {result['accuracy']:.1%}")
    print(f"💰 Cost per Million Tokens: ${result['cost_per_million']:.2f}")
    
    if 'cost_savings' in result and result['cost_savings']:
        print(f"💸 Cost Savings: {result['cost_savings']:.1%}")
    
    print(f"⏱️  Processing Time: {result.get('time', 0):.1f} seconds")
    
    print("\n💡 What this means:")
    print(f"   • Switch from your baseline to {result['model']}")
    print(f"   • Maintain {result['accuracy']:.1%} accuracy on your task")
    if 'cost_savings' in result and result['cost_savings']:
        print(f"   • Save {result['cost_savings']:.1%} on LLM costs")
    print(f"   • Pay ${result['cost_per_million']:.2f} per million tokens")
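To turn the per-million-token price into a budget figure, multiply it by your expected token volume. A back-of-the-envelope sketch; the price and volume below are made-up illustration values, not LLMuxer output, and `projected_cost` is a hypothetical helper:

```python
def projected_cost(tokens, cost_per_million):
    """Estimated spend in dollars for a given number of tokens."""
    return tokens / 1_000_000 * cost_per_million


# Illustrative numbers only: 50M tokens per month at $0.50 per million tokens
monthly_tokens = 50_000_000
print(f"Projected monthly cost: ${projected_cost(monthly_tokens, 0.50):.2f}")
```

In practice you would pass `result['cost_per_million']` in place of the hard-coded price.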

## 5. Next Steps

🚀 **Try with your own data:**
- Replace `examples` with your classification dataset
- Adjust `options` to match your classes
- Experiment with different `min_accuracy` thresholds
- Use `sample_size=0.1` to test on 10% of data first
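
If your data lives in a CSV file, it needs to be reshaped into the same list of `{"input": ..., "label": ...}` dictionaries used in this notebook. A minimal sketch using only the standard library; the column names `text` and `sentiment` and the helper `rows_to_examples` are assumptions for illustration:

```python
import csv
import io


def rows_to_examples(rows, text_col="text", label_col="sentiment"):
    """Convert dict-like rows into the {"input", "label"} format used in this notebook."""
    examples = []
    for row in rows:
        text = (row.get(text_col) or "").strip()
        label = (row.get(label_col) or "").strip().lower()
        if text and label:  # skip rows with a missing text or label
            examples.append({"input": text, "label": label})
    return examples


# Stand-in for open("your_file.csv", newline="") so the sketch runs as-is
sample_csv = io.StringIO(
    "text,sentiment\nGreat phone!,Positive\nStopped working,negative\n,neutral\n"
)
my_examples = rows_to_examples(csv.DictReader(sample_csv))
my_options = sorted({ex["label"] for ex in my_examples})  # candidate value for `options`
print(my_examples)
print(my_options)
```

Deriving `options` from the labels actually present helps keep the two in sync, though you may prefer to list the classes explicitly.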

📚 **Learn More:**
- [GitHub Repository](https://github.com/mihirahuja1/llmuxer)
- [Full Documentation](https://github.com/mihirahuja1/llmuxer#readme)
- [Benchmark Results](https://github.com/mihirahuja1/llmuxer/blob/main/docs/benchmarks.md)

🎯 **Current Limitations:**
- Only classification tasks supported (extraction, generation coming in v0.2)
- Sequential model testing (parallel coming in v0.2)
- OpenRouter API required for model access

## 6. Bonus: Larger Dataset Example

Test with more examples for a more reliable accuracy estimate:

In [None]:
# Larger example set
large_examples = [
    # Positive examples
    {"input": "This product is amazing! I love it.", "label": "positive"},
    {"input": "Outstanding quality and fast delivery!", "label": "positive"},
    {"input": "Fantastic! Exceeded all my expectations.", "label": "positive"},
    {"input": "Incredible value for money!", "label": "positive"},
    {"input": "Best purchase I've made this year!", "label": "positive"},
    {"input": "Highly recommend this to everyone.", "label": "positive"},
    {"input": "Perfect product, exactly what I needed.", "label": "positive"},
    {"input": "Five stars! Will definitely buy again.", "label": "positive"},
    
    # Negative examples  
    {"input": "Terrible service, very disappointed.", "label": "negative"},
    {"input": "Worst purchase ever, total waste of money.", "label": "negative"},
    {"input": "Poor quality, broke after one day.", "label": "negative"},
    {"input": "Completely useless, doesn't work at all.", "label": "negative"},
    {"input": "Horrible experience, would not recommend.", "label": "negative"},
    {"input": "Overpriced and low quality.", "label": "negative"},
    {"input": "Save your money, this is garbage.", "label": "negative"},
    {"input": "Worst customer service ever encountered.", "label": "negative"},
    
    # Neutral examples
    {"input": "It's okay, nothing special.", "label": "neutral"},
    {"input": "Average product, meets expectations.", "label": "neutral"},
    {"input": "It works fine, no complaints.", "label": "neutral"},
    {"input": "Standard quality, as expected.", "label": "neutral"},
    {"input": "Does the job, nothing more nothing less.", "label": "neutral"},
    {"input": "Reasonable price for what you get.", "label": "neutral"},
    {"input": "It's functional but not impressive.", "label": "neutral"},
    {"input": "Mediocre quality, could be better.", "label": "neutral"}
]

print(f"📊 Running optimization with {len(large_examples)} examples...")
print("⏱️  This may take 2-3 minutes")

# Run with larger dataset
large_result = llmuxer.optimize_cost(
    baseline="gpt-4",
    examples=large_examples,
    task="classification",
    options=["positive", "negative", "neutral"],
    min_accuracy=0.85  # Slightly lower threshold for more examples
)

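if "error" in large_result:
    print(f"❌ Error: {large_result['error']}")
else:
    print("\n🎉 Large dataset optimization complete!")
    print(f"🥇 Best Model: {large_result.get('model', 'N/A')}")
    print(f"🎯 Accuracy: {large_result.get('accuracy', 0):.1%}")
    # cost_savings may be missing or None, so check its value before formatting
    if large_result.get('cost_savings'):
        print(f"💸 Cost Savings: {large_result['cost_savings']:.1%}")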
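
To see why a larger set gives a steadier estimate, compare the approximate standard error of an accuracy measured on different numbers of examples. This uses the textbook binomial formula sqrt(p(1-p)/n) as an independent illustration; it is not something LLMuxer reports, and `accuracy_standard_error` is a hypothetical helper:

```python
import math


def accuracy_standard_error(accuracy, n_examples):
    """Binomial standard error of an accuracy estimated from n_examples."""
    return math.sqrt(accuracy * (1 - accuracy) / n_examples)


for n in (10, 24, 100):
    se = accuracy_standard_error(0.9, n)
    print(f"n={n:>3}: 90% accuracy ± {se:.1%} (one standard error)")
```

At these sizes the uncertainty is still several percentage points, so small accuracy differences between models should be read with caution.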