Finding
examples/generation-eval/run.py runs without secrets and exits successfully, but the first-run output includes a per-fixture failure and warning:
g1: PASS
g2: PASS
g3: FAIL (score: 0.50) - 1 check(s) failed: keyword_presence
...
[WARNING] low_success_rate: Generation success rate is 67%, below threshold 80%
The failure occurs because the expected keyword is "wrapper", while the mock output says "wraps another function". The simple substring checker does not treat "wraps" as satisfying "wrapper".
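The mismatch can be reproduced in isolation. The snippet below is a minimal sketch, not the package's checker: `keyword_present` is a hypothetical helper that assumes a plain case-insensitive substring test, and the three-element result list simply mirrors the g1/g2/g3 outcomes shown above.

```python
# Hypothetical sketch: names are illustrative and not taken from the package.
def keyword_present(output: str, keyword: str) -> bool:
    """Return True if keyword occurs in output as a case-insensitive substring."""
    return keyword.lower() in output.lower()


mock_output = "wraps another function"

# "wrapper" is not a substring of "wraps another function", so the check fails.
print(keyword_present(mock_output, "wrapper"))  # False
# A shared stem such as "wrap" would match.
print(keyword_present(mock_output, "wrap"))     # True

# Two of three fixtures passing gives the rate reported in the warning.
results = [True, True, False]  # g1, g2, g3
success_rate = sum(results) / len(results)
print(f"{success_rate:.0%}")   # 67%
print(success_rate < 0.80)     # True, so the low_success_rate warning fires
```

If the real checker behaves this way, either changing the expected keyword to a stem the mock output contains or changing the mock output to include "wrapper" would make g3 pass.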
Repro
Source checkout: b312868, clean macOS Python 3.11 venv.
python -m pip install -e ".[dev]"
python examples/generation-eval/run.py
Expected
For adoption-readiness, examples should make their intent explicit:
- either all first-run example fixtures pass, or
- the example states that it intentionally demonstrates a failing fixture and the resulting warning.
Impact
The script is runnable, but a new customer may interpret the example's FAIL line as package or adapter breakage rather than an intentional suggestion-engine demo.