LLM Performance Tracker

A production-grade dashboard for evaluating Large Language Models (LLMs) across multiple providers (OpenAI, Anthropic, etc.), with a focus on accuracy, latency, and cost.

This project uses the OpenClaw paradigm to standardize how it interfaces with models.

Features

- Multi-Provider Support: Track OpenAI, Anthropic, and local models via OpenClaw.
- Metric Tracking:
  - Accuracy: Semantic similarity and exact match.
  - Latency: Time to first token and total response time.
  - Cost: Per-token pricing calculations.
- Dashboard: Interactive data visualization of model performance.
- Automated Benchmarking: Run suites of prompts against multiple models simultaneously.
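
As a rough illustration of the metrics listed above, here is a minimal Python sketch. The function names, the normalization rule used for exact match, and the per-1,000-token pricing unit are assumptions made for illustration, not this project's actual code. Semantic similarity is left out because it depends on an embedding model.

```python
import time
from typing import Iterable, Tuple


def _normalize(text: str) -> str:
    # Assumed normalization: lowercase and collapse runs of whitespace.
    return " ".join(text.lower().split())


def exact_match(prediction: str, reference: str) -> bool:
    """Return True when prediction and reference are equal after normalization."""
    return _normalize(prediction) == _normalize(reference)


def token_cost(input_tokens: int, output_tokens: int,
               input_price_per_1k: float, output_price_per_1k: float) -> float:
    """Cost of one call; prices are per 1,000 tokens and supplied by the caller."""
    return ((input_tokens / 1000) * input_price_per_1k
            + (output_tokens / 1000) * output_price_per_1k)


def measure_stream(chunks: Iterable[str]) -> Tuple[str, float, float]:
    """Consume streamed text chunks.

    Returns (full_text, time_to_first_token_s, total_time_s).
    """
    start = time.perf_counter()
    first_token_at = None
    parts = []
    for chunk in chunks:
        if first_token_at is None:
            # The first chunk marks the time to first token.
            first_token_at = time.perf_counter() - start
        parts.append(chunk)
    total = time.perf_counter() - start
    # An empty stream has no first token; fall back to the total time.
    ttft = first_token_at if first_token_at is not None else total
    return "".join(parts), ttft, total
```

Any iterable of strings works as input to `measure_stream`, so a provider's streaming response could be adapted to it by yielding text chunks as they arrive.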

Architecture

- Evaluator Core: Python-based engine that sends prompts and collects telemetry.
- OpenClaw Wrapper: Standardized interface for model interactions.
- Database: SQLite/JSON backend for storing historical run data.
- API: FastAPI backend to serve metrics.
- Frontend: Streamlit-based dashboard for visualization.
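
To make the flow between the first three components concrete, the sketch below wires a standardized model interface to a small evaluator that writes telemetry to SQLite. Everything named here (`ModelClient`, `EchoModel`, `init_db`, `evaluate`, and the `runs` table schema) is hypothetical and chosen for illustration; it is not the OpenClaw API or this project's real schema.

```python
import sqlite3
import time
from dataclasses import dataclass
from typing import Protocol, Sequence


class ModelClient(Protocol):
    """Hypothetical standardized interface; not the actual OpenClaw API."""
    name: str

    def complete(self, prompt: str) -> str: ...


@dataclass
class EchoModel:
    """Stand-in model so the sketch runs without API keys."""
    name: str = "echo"

    def complete(self, prompt: str) -> str:
        return prompt.upper()


def init_db(conn: sqlite3.Connection) -> None:
    # Assumed schema: one row per (model, prompt) call.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS runs ("
        "model TEXT, prompt TEXT, response TEXT, latency_s REAL)"
    )


def evaluate(model: ModelClient, prompts: Sequence[str],
             conn: sqlite3.Connection) -> None:
    """Send each prompt, time the call, and store one row per prompt."""
    for prompt in prompts:
        start = time.perf_counter()
        response = model.complete(prompt)
        latency = time.perf_counter() - start
        conn.execute(
            "INSERT INTO runs VALUES (?, ?, ?, ?)",
            (model.name, prompt, response, latency),
        )
    conn.commit()


# In-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
init_db(conn)
evaluate(EchoModel(), ["hello", "world"], conn)
rows = conn.execute(
    "SELECT model, prompt, response FROM runs ORDER BY rowid"
).fetchall()
```

Because the evaluator only depends on the `complete` method and the `name` attribute, a wrapper for any provider could be swapped in without changing the storage code.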

Installation

# Clone the repository
git clone https://github.com/username/llm-eval-dashboard.git
cd llm-eval-dashboard

# Install dependencies
pip install -r requirements.txt

# Set up environment variables
cp .env.example .env
# Edit .env with your API keys
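
Before running anything, it can help to confirm that the keys are actually visible to Python. The check below is a minimal sketch: the variable names `OPENAI_API_KEY` and `ANTHROPIC_API_KEY` are assumptions, so consult .env.example for the names this project expects. It reads the process environment only, so it assumes the values from .env have already been loaded or exported.

```python
import os
from typing import Iterable, List, Mapping

# Assumed variable names; check .env.example for the names this project uses.
REQUIRED_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def missing_keys(env: Mapping[str, str],
                 required: Iterable[str] = REQUIRED_KEYS) -> List[str]:
    """Return the required variables that are unset or blank."""
    return [key for key in required if not env.get(key, "").strip()]


if __name__ == "__main__":
    missing = missing_keys(os.environ)
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All required environment variables are set.")
```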

Usage

1. Run Benchmarks

Execute a prompt suite against configured models:

python scripts/run_benchmark.py --suite generic_tasks
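
The internals of run_benchmark.py are not shown here. As a rough sketch of what running one suite against several models at once might look like, the example below uses a thread pool with one worker per model. The `run_suite` function and the `fake_fast` and `fake_slow` callables are hypothetical stand-ins for real provider calls.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


# Hypothetical stand-ins for provider calls; real models would hit an API.
def fake_fast(prompt: str) -> str:
    return f"fast:{prompt}"


def fake_slow(prompt: str) -> str:
    time.sleep(0.01)  # simulate network latency
    return f"slow:{prompt}"


def run_suite(models: Dict[str, Callable[[str], str]],
              prompts: List[str]) -> Dict[str, List[dict]]:
    """Run every prompt against every model, one worker thread per model."""

    def run_model(name: str) -> List[dict]:
        results = []
        for prompt in prompts:
            start = time.perf_counter()
            output = models[name](prompt)
            results.append({
                "prompt": prompt,
                "output": output,
                "latency_s": time.perf_counter() - start,
            })
        return results

    with ThreadPoolExecutor(max_workers=max(1, len(models))) as pool:
        futures = {name: pool.submit(run_model, name) for name in models}
        # result() blocks until that model's run has finished.
        return {name: future.result() for name, future in futures.items()}


results = run_suite(
    {"fast": fake_fast, "slow": fake_slow},
    ["2+2?", "Capital of France?"],
)
```

Threads suit this workload because provider calls spend most of their time waiting on the network. Prompts stay sequential within each model here, which keeps per-model latency measurements from competing with each other.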

2. Launch Dashboard

Visualize the results in your browser:

streamlit run app/dashboard.py

Project Structure

- core/: Evaluation logic and metric calculations.
- models/: OpenClaw model definitions.
- data/: Local storage for benchmark results.
- scripts/: Command-line scripts such as run_benchmark.py.
- app/: Streamlit dashboard code.
- tests/: Unit tests for evaluation logic.

Contributing

Please see CONTRIBUTING.md for details on our code of conduct and the process for submitting pull requests.

License

MIT
