Roadmap
sarmakska edited this page May 3, 2026 · 2 revisions
Shipped
- Parallel async runner with retry + partial resumption
- JSONL dataset loader
- Scorer decorator and protocol
- Built-in scorers: exact match, JSON schema, ROUGE-L
- SQLite backend
- SarmaLink-AI and OpenAI providers
- FastAPI + HTMX viewer
- Typer CLI
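As a rough illustration of how the scorer decorator, the scorer protocol, and the async runner with retry and partial resumption could fit together, here is a minimal sketch. Every name in it (`Scorer`, `SCORERS`, `scorer`, `run_all`, `call_model`, `done`) is hypothetical and not taken from the project's code, and an in-memory dict stands in for whatever the SQLite backend actually stores.

```python
import asyncio
from typing import Awaitable, Callable, Protocol


class Scorer(Protocol):
    """Anything callable as scorer(output, expected) -> float."""

    def __call__(self, output: str, expected: str) -> float: ...


# Hypothetical registry mapping a scorer name to its function.
SCORERS: dict[str, Scorer] = {}


def scorer(name: str) -> Callable[[Scorer], Scorer]:
    """Decorator that registers a plain function under `name`."""

    def register(fn: Scorer) -> Scorer:
        SCORERS[name] = fn
        return fn

    return register


@scorer("exact_match")
def exact_match(output: str, expected: str) -> float:
    return 1.0 if output.strip() == expected.strip() else 0.0


async def run_all(
    rows: list[dict],
    call_model: Callable[[str], Awaitable[str]],
    done: dict[str, float],
    concurrency: int = 4,
    max_attempts: int = 3,
) -> dict[str, float]:
    """Score every row whose id is not already in `done`.

    Rows that still fail after `max_attempts` are left out of `done`,
    so a later call with the same dict retries only those.
    """
    sem = asyncio.Semaphore(concurrency)  # bounds parallel model calls

    async def one(row: dict) -> None:
        async with sem:
            for attempt in range(max_attempts):
                try:
                    output = await call_model(row["input"])
                except Exception:
                    # Exponential backoff before the next attempt.
                    await asyncio.sleep(0.01 * 2**attempt)
                    continue
                done[row["id"]] = exact_match(output, row["expected"])
                return

    pending = [r for r in rows if r["id"] not in done]  # partial resumption
    await asyncio.gather(*(one(r) for r in pending))
    return done
```

Calling `asyncio.run(run_all(rows, call_model, done))` a second time with the same `done` dict skips everything that already has a score, which is the resumption behaviour in miniature.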
Planned
- BLEU and BERTScore scorers
- LLM-as-judge with calibration set
- Rubric-graded scoring
- HuggingFace dataset hub integration
- DuckDB and Postgres backends (stubs exist)
- CI integration: PR comment with regression delta
- Streaming view of in-progress runs
- Cost tracking per run
- Diff command: compare two runs side by side
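To make the "regression delta" and run-diff items concrete, here is one possible shape for the comparison, sketched under the assumption that a run can be reduced to a mapping from example id to score. The function name `diff_runs` and the returned keys are hypothetical; the real command may compare runs quite differently.

```python
from statistics import mean


def diff_runs(baseline: dict[str, float], candidate: dict[str, float]) -> dict:
    """Compare two runs on the example ids they have in common."""
    shared = sorted(baseline.keys() & candidate.keys())
    if not shared:
        raise ValueError("runs share no example ids")
    base_mean = mean(baseline[i] for i in shared)
    cand_mean = mean(candidate[i] for i in shared)
    return {
        "compared": len(shared),
        # Negative means the candidate run scored worse on average.
        "mean_delta": cand_mean - base_mean,
        "regressed": [i for i in shared if candidate[i] < baseline[i]],
        "improved": [i for i in shared if candidate[i] > baseline[i]],
    }
```

A CI job could format `mean_delta` and the `regressed` list into a PR comment; restricting the comparison to shared ids keeps the delta meaningful when the dataset changed between runs.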
Out of scope
- Vendor-locked eval platform (this is open source on purpose)
- Real-time online evaluation (use APM tools for that)
- Auto-fix-the-prompt feature (humans should review prompt changes)
PRs welcome. Pick an item from "Planned", open an issue, then fork, branch, push, and open a PR. Keep PRs small and focused.
I will not merge:
- Framework swaps (Typer + FastAPI stay)
- Sync runners (everything is async/await)
- Adapters for providers without a free-tier path
Releases: see GitHub Releases.