Implementation of "Universal Model Routing for Efficient LLM Inference" by Jitkrittum et al. (2025)
Routes queries to different LLMs based on a cost-quality tradeoff, using cluster-based error profiles that generalize to new, unseen models without retraining.
- Setup: 01_unirouter_experiment.ipynb - Dependencies and model configuration
- Characterization: 02_model_characterization.ipynb - Compute error profiles and routing
- Evaluation: 03_evaluation.ipynb - Deferral curves and adding new models
Ψ(m) Vectors: Each model is represented by its error rate on each question cluster
- Enables routing to new models without expensive retraining
- Cost-quality tradeoff via the λ parameter (see the sketch below):
score = error_rate + λ × cost
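A minimal sketch of that routing rule, assuming clusters are fit with scikit-learn KMeans over sentence-transformers embeddings; the PSI/COST tables, model names, and route function here are illustrative placeholders, not the notebooks' actual API:

```python
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

# Hypothetical error profiles Psi(m): one error rate per question cluster.
PSI = {
    'gpt-4o-mini': np.array([0.10, 0.25, 0.40]),
    'llama-3-8b':  np.array([0.20, 0.35, 0.55]),
}
COST = {'gpt-4o-mini': 1.0, 'llama-3-8b': 0.2}  # relative per-query cost

embedder = SentenceTransformer('all-MiniLM-L6-v2')

# Fit question clusters on a (toy) set of training queries.
train_queries = ["What is 2+2?", "Prove Fermat's little theorem.", "Summarize this article."]
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(embedder.encode(train_queries))

def route(query: str, lam: float = 0.5) -> str:
    """Return the model minimizing score = error_rate + lambda * cost for the query's cluster."""
    cluster = int(kmeans.predict(embedder.encode([query]))[0])
    scores = {m: PSI[m][cluster] + lam * COST[m] for m in PSI}
    return min(scores, key=scores.get)

print(route("What is the capital of France?", lam=0.3))
```

Sweeping λ from small to large values trades quality for cost, which is roughly how the deferral curves in the evaluation notebook are produced.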
```bash
pip install openai scikit-learn sentence-transformers datasets groq
```
Add your API keys:
```python
API_KEYS = {
    'openai': 'your-key-here',
    'groq': 'your-key-here',
}
```
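One way to turn these keys into clients with the official openai and groq SDKs (the exact setup in the notebooks may differ; environment variables are a safer place for keys than source code):

```python
import os
from openai import OpenAI
from groq import Groq

# Fall back to environment variables if the dict entries are left as placeholders.
openai_client = OpenAI(api_key=API_KEYS.get('openai') or os.environ.get('OPENAI_API_KEY'))
groq_client = Groq(api_key=API_KEYS.get('groq') or os.environ.get('GROQ_API_KEY'))

# Example call; the model name is a placeholder, not a routing decision.
resp = openai_client.chat.completions.create(
    model='gpt-4o-mini',
    messages=[{'role': 'user', 'content': 'Hello'}],
)
print(resp.choices[0].message.content)
```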
- Universal: Works with any new LLM by computing its error profile (see the sketch after this list)
- Efficient: No retraining required for new models
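A hedged sketch of how a new model's Ψ(m) profile might be computed from labeled validation questions and then dropped into the routing table above; the data format and the exact-match correctness check are assumptions, not the repo's actual interface:

```python
import numpy as np

def compute_error_profile(model_answers, gold_answers, cluster_ids, n_clusters):
    """Psi(m)[k] = fraction of cluster-k validation questions the model got wrong."""
    errors = np.zeros(n_clusters)
    counts = np.zeros(n_clusters)
    for pred, gold, k in zip(model_answers, gold_answers, cluster_ids):
        counts[k] += 1
        errors[k] += (pred.strip().lower() != gold.strip().lower())  # naive exact-match check
    return errors / np.maximum(counts, 1)

# Adding a new model then requires no retraining, only its profile and cost:
# PSI['new-model'] = compute_error_profile(preds, golds, cluster_ids, n_clusters=3)
# COST['new-model'] = 0.4
```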
Paper: https://arxiv.org/pdf/2502.08773
Authors: Jitkrittum et al. (2025)