> evalflow
pytest for LLMs
You changed one prompt.
Summarization improved.
Classification silently broke.
Nobody noticed for 4 days.
evalflow catches this in CI before it ships.
pip install evalflow
evalflow init
evalflow eval

What you get on day one:
- local prompt and dataset files
- SQLite-backed run history in .evalflow/
- CI-friendly exit codes
- offline cache support for repeatable checks
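One way to picture the offline cache: key each model response by a hash of everything that determines it, then store it in SQLite so a repeated check never calls the provider again. This is a minimal sketch under stated assumptions, not evalflow's actual schema or API; `ResponseCache`, `cache_key`, `cached_call`, and the `responses` table are hypothetical names.

```python
import hashlib
import json
import sqlite3
from typing import Callable, Optional


def cache_key(provider: str, model: str, prompt: str) -> str:
    """Derive a stable key from the inputs that determine the response."""
    payload = json.dumps(
        {"provider": provider, "model": model, "prompt": prompt},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ResponseCache:
    """Hypothetical SQLite-backed cache (illustrative schema only)."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS responses (key TEXT PRIMARY KEY, body TEXT)"
        )

    def get(self, key: str) -> Optional[str]:
        row = self.conn.execute(
            "SELECT body FROM responses WHERE key = ?", (key,)
        ).fetchone()
        return row[0] if row else None

    def put(self, key: str, body: str) -> None:
        self.conn.execute(
            "INSERT OR REPLACE INTO responses (key, body) VALUES (?, ?)",
            (key, body),
        )
        self.conn.commit()


def cached_call(
    cache: ResponseCache,
    provider: str,
    model: str,
    prompt: str,
    call_model: Callable[[str], str],
) -> str:
    """Return the cached response if present; otherwise call the model once and store it."""
    key = cache_key(provider, model, prompt)
    hit = cache.get(key)
    if hit is not None:
        return hit
    body = call_model(prompt)
    cache.put(key, body)
    return body
```

Because the key covers provider, model, and prompt, changing any of them misses the cache and triggers a fresh call, while an unchanged check replays the stored response.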
> evalflow eval
Running 5 test cases against gpt-4o-mini...
✓ summarize_short_article 0.91
✓ classify_sentiment 1.00
✓ extract_entities 0.87
✗ answer_with_context 0.61
✓ rewrite_formal 0.93
Quality Gate: PASS
Failures: 1
Run ID: 20240315-a3f9c2d81b4e
Traditional unit tests do not tell you when a prompt tweak quietly degrades a task. evalflow gives you a small local quality gate for prompt, model, and dataset changes.
Use it when you need to:
- catch regressions before merge
- compare runs locally
- keep prompt versions in YAML
- run the same gate in CI and on a laptop
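Comparing two runs locally boils down to a per-test score diff. The sketch below assumes each run is available as a plain mapping of test name to score; `compare_runs` is a hypothetical helper for illustration, not part of evalflow.

```python
from typing import Dict, Mapping


def compare_runs(run_a: Mapping[str, float], run_b: Mapping[str, float]) -> Dict[str, float]:
    """Return score_b - score_a for every test present in both runs."""
    shared = sorted(set(run_a) & set(run_b))
    # Round to keep floating-point noise out of the report.
    return {name: round(run_b[name] - run_a[name], 4) for name in shared}


if __name__ == "__main__":
    before = {"summarize_short_article": 0.85, "classify_sentiment": 1.00}
    after = {"summarize_short_article": 0.91, "classify_sentiment": 0.80}
    for name, delta in compare_runs(before, after).items():
        print(f"{name}: {delta:+.2f}")
```

A negative delta marks a test that got worse between the two runs, which is the case the opening scenario describes: one task improves while another quietly degrades.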
# .github/workflows/evalflow.yml
name: LLM Quality Gate
on:
  pull_request:
    paths:
      - "prompts/**"
      - "evals/**"
      - "**.py"
jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install evalflow
      - run: evalflow eval
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}

- pytest-style exit codes: 0=pass, 1=fail, 2=error
- exact match, embedding, consistency, and LLM judge methods
- baseline snapshots catch regressions, not just low scores
- prompt registry keeps prompts versioned in YAML
- works with OpenAI, Anthropic, Groq, Gemini, and Ollama
- local SQLite storage, no account needed
- offline cache for repeated and CI-safe checks
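To illustrate how a baseline snapshot catches a regression that an absolute threshold would miss, here is a minimal sketch of a gate that maps results onto the 0/1/2 exit codes listed above. The function name `gate`, the `min_score` and `max_drop` values, and the rule that any single failing case fails the gate are all assumptions; evalflow's actual gating rules may differ.

```python
from typing import Mapping

EXIT_PASS, EXIT_FAIL, EXIT_ERROR = 0, 1, 2


def gate(
    current: Mapping[str, float],
    baseline: Mapping[str, float],
    min_score: float = 0.7,   # assumed absolute threshold
    max_drop: float = 0.05,   # assumed tolerated drop versus baseline
) -> int:
    """Return an exit code: 0 = pass, 1 = fail, 2 = error."""
    if not current:
        # Nothing was evaluated, so the result cannot be trusted.
        return EXIT_ERROR
    for name, score in current.items():
        if score < min_score:
            return EXIT_FAIL  # low score
        previous = baseline.get(name)
        if previous is not None and previous - score > max_drop:
            return EXIT_FAIL  # regression against the snapshot
    return EXIT_PASS


if __name__ == "__main__":
    baseline = {"classify_sentiment": 0.95}
    current = {"classify_sentiment": 0.80}  # above min_score, but a 0.15 drop
    raise SystemExit(gate(current, baseline))
```

In the usage example the score of 0.80 clears the absolute threshold, yet the gate still fails because it dropped 0.15 from the baseline.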
evalflow init
evalflow eval
evalflow doctor
evalflow runs
evalflow compare RUN_A RUN_B
evalflow prompt list

- Docs hub: emartai.mintlify.app
- Quickstart source: docs/quickstart.mdx
- CLI reference source: docs/cli-reference.mdx
- CI guide source: docs/ci-github-actions.mdx
- Provider docs: docs/providers
- evalflow reads API keys from environment variables, never config files
- evalflow.yaml stores env var names, not secret values
- keep .env and .evalflow/ out of git
- see docs/dev-doc/security.md for the full security model
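The pattern of storing env var names rather than secret values can be sketched as follows. The config field `api_key_env` and the function `resolve_api_key` are hypothetical, chosen for illustration; see the security doc for how evalflow actually does this.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    config: Mapping[str, str],
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Look up the secret at runtime from the env var named in the config."""
    env = os.environ if environ is None else environ
    var_name = config["api_key_env"]  # hypothetical field: holds a name, never a secret
    value = env.get(var_name)
    if not value:
        raise RuntimeError(f"Set the {var_name} environment variable before running")
    return value


if __name__ == "__main__":
    # The config file is safe to commit because it contains no secret values.
    config = {"api_key_env": "OPENAI_API_KEY"}
    key = resolve_api_key(config, {"OPENAI_API_KEY": "sk-example"})
    print(len(key))
```

With this split, the config can live in git while the secret itself only exists in the shell environment or in CI secrets.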
Please do not open public GitHub issues for security vulnerabilities. Open a private GitHub Security Advisory.
See CONTRIBUTING.md for local setup, tests, smoke checks, and performance baselines.
MIT