This is not another LLM benchmark or leaderboard. It is meant to be a simple place to find out which models to use and how they compare with others, via the benchmarks and leaderboards collected below, so practitioners can decide which model fits their needs. If you find this useful, please consider leaving a star.
This repository contains links and information about models, providers, and benchmarks. I built it mostly to keep track of the models I use and the benchmarks relevant to each of them; it also gives me a single page I can refer to for all the models and leaderboards.
## Inference Providers

Provider comparisons here are based primarily on cost, performance, latency, and similar factors. This is a good place to start if you are looking for a provider to serve your model.
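Most inference providers expose an OpenAI-compatible API, so switching providers is often just a matter of changing the base URL and model name. A minimal sketch using the official openai Python client; the base URL, API key, and model name below are placeholders, not any real provider's values:

```python
from openai import OpenAI

# Placeholder endpoint and credentials; substitute your provider's values.
client = OpenAI(
    base_url="https://api.example-provider.com/v1",  # hypothetical provider URL
    api_key="YOUR_API_KEY",
)

# Send a single chat completion request to the provider-hosted model.
response = client.chat.completions.create(
    model="provider-model-name",  # placeholder model identifier
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```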
| Type | Leaderboard & Benchmarks | Notes |
|---|---|---|
| OpenLLM Leaderboard | Hugging Face Open LLM Leaderboard | |
| HELM Leaderboard | Leaderboard | |
| Chat Models | LMSYS Chatbot Arena Leaderboard | Elo-style ratings from pairwise human votes; see the sketch below. |
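Chatbot Arena ranks chat models by aggregating pairwise human votes into Elo-style ratings. A minimal sketch of a single Elo update; the k-factor and starting ratings are illustrative assumptions, not the Arena's actual parameters:

```python
def elo_update(rating_a: float, rating_b: float, a_wins: bool, k: float = 32.0):
    """Update two ratings after one head-to-head vote between models A and B."""
    # Expected score of A under the logistic Elo model.
    expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))
    score_a = 1.0 if a_wins else 0.0
    new_a = rating_a + k * (score_a - expected_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return new_a, new_b

# Both models start at 1000; model A wins one vote.
print(elo_update(1000.0, 1000.0, a_wins=True))  # (1016.0, 984.0)
```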
| Type | Leaderboard & Benchmarks | Notes |
|---|---|---|
| Text Embedding Models | MTEB Leaderboard | See the usage sketch below. |
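Most models on the MTEB leaderboard can be tried directly with the sentence-transformers library. A minimal sketch; the model name is only an example, substitute any model from the leaderboard:

```python
from sentence_transformers import SentenceTransformer

# Example model only; pick any embedding model from the MTEB leaderboard.
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "LLM leaderboards compare models on shared benchmarks.",
    "Embedding models map text to dense vectors.",
]
embeddings = model.encode(sentences)  # ndarray of shape (2, embedding_dim)
print(embeddings.shape)
```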
| Type | Leaderboard & Benchmarks | Notes |
|---|---|---|
| EvalPlus | Leaderboard | |
| HumanEval+ Python | Benchmark Leaderboard | Reports pass@k; see the estimator sketch below. |
| Code Security | CyberSecEval for Code | |
| Code Effectiveness | BigCode AI Benchmark | |
| Code Tasks | Can AI Code Leaderboard | |
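Code leaderboards such as HumanEval+ and EvalPlus report pass@k: the probability that at least one of k sampled generations passes the tests. The standard unbiased estimator from the HumanEval paper (Chen et al., 2021), sketched in Python:

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: n generations sampled, c of them correct."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

# 20 samples per problem, 5 passed the tests: pass@1 is simply 5/20.
print(pass_at_k(n=20, c=5, k=1))   # 0.25
print(pass_at_k(n=20, c=5, k=10))  # chance at least 1 of 10 picks is correct
```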
| Type | Leaderboard & Benchmarks | Notes |
|---|---|---|
| T2I-CompBench | Benchmark | Compositional text-to-image benchmark; see the sketch below. |
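T2I-CompBench evaluates compositional text-to-image generation, e.g. attribute binding and spatial relationships. A minimal sketch of generating a compositional prompt with the diffusers library; the checkpoint is only an example, and a CUDA GPU is assumed:

```python
import torch
from diffusers import StableDiffusionPipeline

# Example checkpoint only; substitute any text-to-image model you want to test.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")  # assumes a CUDA GPU is available

# A compositional prompt of the kind T2I-CompBench scores
# (color binding plus a spatial relation).
image = pipe("a red book on top of a blue table").images[0]
image.save("output.png")
```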
| Type | Leaderboard & Benchmarks | Notes |
|---|---|---|
| Text to Video Models | Leaderboard | |
| Type | Leaderboard & Benchmarks | Notes |
|---|---|---|
| Automatic Speech Recognition (ASR) Models for Speech to Text | Open ASR Leaderboard | See the usage sketch below. |
| Text to Speech | Synthesis | |
| Text to Speech | TTS Arena | |
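Models on the Open ASR Leaderboard can generally be run through the Hugging Face transformers ASR pipeline. A minimal sketch; the model name and audio file path are placeholder assumptions:

```python
from transformers import pipeline

# Example model only; pick any model from the Open ASR Leaderboard.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")

# "sample.wav" is a placeholder path to a local audio file.
result = asr("sample.wav")
print(result["text"])
```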
| Type | Leaderboard & Benchmarks | Notes |
|---|---|---|