v0.3.0
What's New
v0.3.0 makes findings a first-class CI surface. v0.2.0 introduced stable
finding ids (<domain>.<area>.<qualifier>); v0.3.0 builds the tooling around
them so PR review and CI gating actually use them.
bagx diff — compare two eval reports
Compare two bagx eval --json reports by stable finding id. Surfaces
NEW / GONE / WORSE / BETTER (and optionally SAME) findings, plus
evidence drift on numeric metrics when severity is unchanged.
bagx diff baseline.json current.json
bagx diff baseline.json current.json --format markdown --output diff.md
bagx diff baseline.json current.json --exit-on warning # CI gate
Three output formats: text (terminal), markdown (PR comments), and json
(machine processing). --exit-on {info|warning|error|critical} makes the
exit code track regressions.
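The classification above can be sketched in a few lines of Python. This is an illustration only, not bagx's implementation: the report layout (a list of dicts with "id" and "severity" keys), the numeric ranks, and the rule that only NEW and WORSE findings count as regressions are all assumptions, and evidence drift is not modelled.

```python
from typing import Dict, List

# Severity names come from the --exit-on choices; the numeric ranks are an assumption.
SEVERITY_RANK = {"info": 0, "warning": 1, "error": 2, "critical": 3}


def classify(baseline: List[dict], current: List[dict]) -> Dict[str, List[str]]:
    """Bucket finding ids into NEW / GONE / WORSE / BETTER / SAME."""
    before = {f["id"]: f["severity"] for f in baseline}
    after = {f["id"]: f["severity"] for f in current}
    buckets: Dict[str, List[str]] = {
        "NEW": [], "GONE": [], "WORSE": [], "BETTER": [], "SAME": [],
    }
    for fid in sorted(before.keys() | after.keys()):
        if fid not in before:
            buckets["NEW"].append(fid)
        elif fid not in after:
            buckets["GONE"].append(fid)
        else:
            delta = SEVERITY_RANK[after[fid]] - SEVERITY_RANK[before[fid]]
            key = "WORSE" if delta > 0 else "BETTER" if delta < 0 else "SAME"
            buckets[key].append(fid)
    return buckets


def regressed(baseline: List[dict], current: List[dict], exit_on: str) -> bool:
    """Hypothetical gate: True when a NEW or WORSE finding reaches the threshold."""
    buckets = classify(baseline, current)
    after = {f["id"]: f["severity"] for f in current}
    threshold = SEVERITY_RANK[exit_on]
    return any(
        SEVERITY_RANK[after[fid]] >= threshold
        for fid in buckets["NEW"] + buckets["WORSE"]
    )
```

Keying both reports by finding id is what makes the comparison stable across runs: a finding that moves position in the list or changes its message text is still matched to its earlier self.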
bagx eval --findings-only and --severity-min
Render the structured findings list directly in the terminal, optionally
filtered by minimum severity:
bagx eval bag.db3 --findings-only --severity-min warning
--severity-min also filters the --json payload, so downstream consumers
see only the findings they care about.
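For consumers that receive an unfiltered payload, the same minimum-severity filter is easy to approximate downstream. The sketch below assumes each finding is a dict with a "severity" key; that layout and the function name filter_findings are illustrative, not part of bagx.

```python
from typing import List

# Ordered from least to most severe, matching the severity names used by bagx flags.
SEVERITY_ORDER = ["info", "warning", "error", "critical"]


def filter_findings(findings: List[dict], severity_min: str) -> List[dict]:
    """Keep findings whose severity is at or above severity_min."""
    floor = SEVERITY_ORDER.index(severity_min)
    return [f for f in findings if SEVERITY_ORDER.index(f["severity"]) >= floor]
```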
Benchmark severity gate
Manifests now support:
- forbidden_findings: fail when listed ids appear, optionally scoped to a
  minimum severity ({"id": "sync.delay.high", "severity_min": "error"})
- max_severity: per-category ceiling (e.g. {"sensor_quality": "warning"})
bagx benchmark --exit-on warning exits non-zero when the suite's worst
finding severity reaches the threshold, independent of manifest pass/fail
status. The benchmark JSON now includes worst_severity per case and at
suite level.
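As a rough model of how these two manifest keys and worst_severity could be evaluated for one case, consider the sketch below. It is not the bagx implementation. It assumes each finding carries "id", "severity", and a "category" field (the category field is a guess), and that a forbidden_findings entry may be either a bare id string or a dict like the one shown above.

```python
from typing import List, Optional

RANK = {"info": 0, "warning": 1, "error": 2, "critical": 3}


def worst_severity(findings: List[dict]) -> Optional[str]:
    """Return the highest severity present, or None when there are no findings."""
    return max((f["severity"] for f in findings), key=RANK.__getitem__, default=None)


def check_case(findings: List[dict], manifest: dict) -> List[str]:
    """Return human-readable violations for one benchmark case."""
    violations = []
    for rule in manifest.get("forbidden_findings", []):
        if isinstance(rule, str):  # assumed shorthand: a bare id forbids any severity
            rule = {"id": rule}
        floor = RANK[rule.get("severity_min", "info")]
        for f in findings:
            if f["id"] == rule["id"] and RANK[f["severity"]] >= floor:
                violations.append(f"forbidden finding {f['id']} at {f['severity']}")
    for category, ceiling in manifest.get("max_severity", {}).items():
        # "category" on a finding is a hypothetical field used for illustration.
        worst = worst_severity([f for f in findings if f.get("category") == category])
        if worst is not None and RANK[worst] > RANK[ceiling]:
            violations.append(f"{category} reached {worst}, ceiling is {ceiling}")
    return violations
```

Note that the --exit-on threshold is described as independent of manifest pass/fail, so a gate built on worst_severity would run alongside check_case rather than replace it.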
Findings JSON Schema shipped
bagx/schema/findings.schema.json (Draft 2020-12) ships with the wheel.
Locate it from Python:
from bagx.contracts import findings_schema_path, findings_schema
External tools such as Grafana, Slack bots, and GitHub Actions can validate finding
payloads without depending on the bagx Python runtime.
GitHub Actions PR template
A drop-in workflow at examples/github_actions/bagx-pr-check.yml runs
bagx eval on the PR branch, restores the previous baseline from
actions/cache, runs bagx diff, and posts the markdown result as a
sticky PR comment. Fails the check when readiness regresses.
Schema version
- eval JSON: 1.1.0 → 1.2.0 (no payload changes; bumped alongside benchmark)
- benchmark JSON: 1.1.0 → 1.2.0 (adds worst_severity at suite and case level)
- findings.schema.json: newly shipped
Tests
- 424 passing (v0.2.0 had 378)
- New: tests/test_findings_schema.py, tests/test_diff.py
- Expanded: tests/test_benchmark.py, tests/test_cli.py