Architecture

graph LR
  D[(Dataset<br/>JSONL or list)] --> R[Runner]
  S[Scorers<br/>plain Python] --> R
  R -->|N parallel calls| L[LLM provider]
  R --> B[(Backend<br/>SQLite / DuckDB)]
  B --> V[FastAPI + HTMX viewer]
  B --> CI[CI regression gate]

  classDef ext fill:#a78bfa,stroke:#a78bfa,color:#fff
  class L ext

Components

File	Responsibility
`aieval/cli.py`	Typer CLI: run, list, view, diff, ci
`aieval/core/runner.py`	Parallel execution with retry
`aieval/core/dataset.py`	JSONL file loader and in-process list loader
`aieval/core/scorer.py`	Scorer decorator and protocol
`aieval/scorers/*.py`	Built-in scorers: exact match, JSON validity, ROUGE
`aieval/backends/*.py`	SQLite and DuckDB backends
`aieval/providers/*.py`	LLM provider adapters: SarmaLink, OpenAI
`aieval/viewer/app.py`	FastAPI + HTMX viewer

Run lifecycle

stateDiagram-v2
  [*] --> Pending
  Pending --> Running: created in DB
  Running --> Persisting: all examples scored
  Persisting --> Done: results stored
  Running --> Failed: provider errors after retries
  Done --> [*]
  Failed --> [*]
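
The lifecycle above can be read as a small state machine. The sketch below is illustrative only: `RunState`, `TRANSITIONS`, and `advance` are hypothetical names, not taken from the aieval codebase, and the real runner may track state differently.

```python
from enum import Enum


class RunState(Enum):
    """States from the run lifecycle diagram (names here are illustrative)."""
    PENDING = "Pending"
    RUNNING = "Running"
    PERSISTING = "Persisting"
    DONE = "Done"
    FAILED = "Failed"


# Allowed transitions, copied from the arrows in the diagram.
TRANSITIONS = {
    RunState.PENDING: {RunState.RUNNING},
    RunState.RUNNING: {RunState.PERSISTING, RunState.FAILED},
    RunState.PERSISTING: {RunState.DONE},
    RunState.DONE: set(),      # terminal
    RunState.FAILED: set(),    # terminal
}


def advance(current: RunState, target: RunState) -> RunState:
    """Return the target state, or raise if the diagram has no such arrow."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

Encoding the arrows as data keeps illegal jumps (for example Pending straight to Done) from being recorded silently.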

Why each piece

Plain Python scorers rather than a DSL because Python is already expressive enough for scoring logic. You get full IDE support (completion, type checking, debugging) and no leaky abstraction to work around.
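
To make "plain Python" concrete, here is a minimal sketch of what two scorers could look like. The `scorer` decorator and `SCORERS` registry below are local stand-ins written for this example. The real decorator lives in `aieval/core/scorer.py` and its signature may differ, and the `(output, expected)` argument order is an assumption.

```python
import json
from typing import Callable, Dict

# Hypothetical registry and decorator, defined locally so the example runs
# on its own. The actual aieval decorator may behave differently.
SCORERS: Dict[str, Callable[[str, str], float]] = {}


def scorer(fn: Callable[[str, str], float]) -> Callable[[str, str], float]:
    """Register a scoring function under its own name and return it unchanged."""
    SCORERS[fn.__name__] = fn
    return fn


@scorer
def exact_match(output: str, expected: str) -> float:
    """1.0 if the output equals the expected answer, ignoring outer whitespace."""
    return 1.0 if output.strip() == expected.strip() else 0.0


@scorer
def json_validity(output: str, expected: str) -> float:
    """1.0 if the output parses as JSON; `expected` is unused for this check."""
    try:
        json.loads(output)
    except json.JSONDecodeError:
        return 0.0
    return 1.0
```

Because each scorer is an ordinary function, it can be unit-tested and stepped through in a debugger without any framework setup.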

Async parallel execution because eval runs are I/O-bound on provider calls. Raise the concurrency argument on `run` when your provider's rate limits allow it.
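
A minimal sketch of bounded parallel execution with retry, assuming an `asyncio`-based runner. `run_parallel`, `fake_provider`, and the parameter names `concurrency`, `retries`, and `base_delay` are hypothetical and are not the API of `aieval/core/runner.py`.

```python
import asyncio
from typing import Awaitable, Callable, List


async def run_parallel(
    prompts: List[str],
    call: Callable[[str], Awaitable[str]],
    concurrency: int = 8,
    retries: int = 2,
    base_delay: float = 0.01,
) -> List[str]:
    """Call `call` on every prompt with at most `concurrency` calls in flight."""
    sem = asyncio.Semaphore(concurrency)

    async def one(prompt: str) -> str:
        async with sem:  # limits the number of simultaneous provider calls
            for attempt in range(retries + 1):
                try:
                    return await call(prompt)
                except Exception:
                    if attempt == retries:
                        raise  # out of retries: surface the provider error
                    # exponential backoff before the next attempt
                    await asyncio.sleep(base_delay * 2 ** attempt)
        raise RuntimeError("unreachable")

    # gather returns results in the same order as the input prompts
    return await asyncio.gather(*(one(p) for p in prompts))


async def fake_provider(prompt: str) -> str:
    """Stand-in for a real LLM provider adapter (assumption for this sketch)."""
    await asyncio.sleep(0.01)
    return prompt.upper()
```

Calling `asyncio.run(run_parallel(["a", "b"], fake_provider, concurrency=2))` runs both prompts concurrently. The semaphore is what a concurrency setting would control in a design like this.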

Versioning by git SHA + dataset hash so two runs with the same model and the same dataset are comparable. Different datasets get different hashes; different code (a new prompt) gets a different SHA.
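
One way such a version key could be computed is sketched below. The function names, the 12-character truncation, and the `sha+hash` format are assumptions made for this example and do not describe how aieval actually stores versions.

```python
import hashlib
import json
import subprocess
from typing import Dict, List


def dataset_hash(examples: List[Dict]) -> str:
    """Hash the examples in order; key order inside each example is ignored."""
    h = hashlib.sha256()
    for ex in examples:
        # sort_keys gives a canonical form so equal examples hash equally
        line = json.dumps(ex, sort_keys=True, separators=(",", ":"))
        h.update(line.encode("utf-8"))
        h.update(b"\n")
    return h.hexdigest()[:12]


def git_sha() -> str:
    """Short SHA of HEAD, or "unknown" when git or a repository is unavailable."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "--short", "HEAD"],
            capture_output=True, text=True, check=True,
        )
        return out.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return "unknown"


def run_version(examples: List[Dict]) -> str:
    """Combine code version and data version into one comparable key."""
    return f"{git_sha()}+{dataset_hash(examples)}"
```

Two runs share a key only when both the commit and the dataset content match, which is the condition under which their scores can be compared directly.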

SQLite as the default backend because it needs zero infrastructure. Switch to DuckDB by setting `AIEVAL_BACKEND` when you want columnar analytics over many runs.
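
A sketch of how backend selection from the environment might look. Only the variable name `AIEVAL_BACKEND` comes from this page. The accepted values `sqlite` and `duckdb`, the `backend_name` and `open_sqlite` helpers, and the `results` table schema are assumptions for illustration.

```python
import os
import sqlite3
from typing import Mapping, Optional

# Assumed value names; check the backend modules for the real ones.
KNOWN_BACKENDS = {"sqlite", "duckdb"}


def backend_name(env: Optional[Mapping[str, str]] = None) -> str:
    """Read AIEVAL_BACKEND, falling back to SQLite when it is unset."""
    env = os.environ if env is None else env
    name = env.get("AIEVAL_BACKEND", "sqlite").lower()
    if name not in KNOWN_BACKENDS:
        raise ValueError(f"unknown backend: {name!r}")
    return name


def open_sqlite(path: str = ":memory:") -> sqlite3.Connection:
    """Open a SQLite database and create a hypothetical results table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results ("
        "run_id TEXT, example_id TEXT, scorer TEXT, score REAL)"
    )
    return conn
```

The SQLite path needs nothing beyond the Python standard library, which is the "zero infrastructure" property. A DuckDB branch would need the separate `duckdb` package.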

HTMX viewer rather than React because the viewer is a read-only inspection tool. HTML over the wire is faster to build and easier to embed.
