Skip to content

Roadmap

sarmakska edited this page May 3, 2026 · 2 revisions

Roadmap

Shipped (1.0.0)

  • Parallel async runner with retry + partial resumption
  • JSONL dataset loader
  • Scorer decorator and protocol
  • Built-in scorers: exact match, JSON schema, ROUGE-L
  • SQLite backend
  • SarmaLink-AI and OpenAI providers
  • FastAPI + HTMX viewer
  • Typer CLI

Planned

  • BLEU and BERTScore scorers
  • LLM-as-judge with calibration set
  • Rubric-graded scoring
  • HuggingFace dataset hub integration
  • DuckDB and Postgres backends (stubs exist)
  • CI integration: PR comment with regression delta
  • Streaming view of in-progress runs
  • Cost tracking per run
  • Diff command: compare two runs side by side

Won't ship

  • Vendor-locked eval platform (this is open source on purpose)
  • Real-time online evaluation (use APM tools for that)
  • Auto-fix-the-prompt feature (humans should review prompt changes)

Contribute

PRs welcome. Pick from "Planned", open an issue, fork, branch, push, PR. Small, focused.

I will not merge:

  • Framework swaps (Typer + FastAPI stay)
  • Sync runners (everything is async/await)
  • Adapters for providers without a free tier path

Releases: see GitHub Releases.

Clone this wiki locally