
v0.3.0


rsasaki0109 released this on 21 May 21:27

What's New

v0.3.0 makes findings a first-class CI surface. v0.2.0 introduced stable
finding ids (<domain>.<area>.<qualifier>); v0.3.0 builds the tooling around
them so PR review and CI gating actually use them.
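To make the id format concrete, here is a minimal Python sketch that splits an id such as sync.delay.high into its three parts. It is not part of the bagx API: FindingId and parse_finding_id are hypothetical names, and splitting on the first two dots only is an assumption about how qualifiers are written.

```python
from typing import NamedTuple


class FindingId(NamedTuple):
    """The three dot-separated parts of a stable finding id (hypothetical type)."""
    domain: str
    area: str
    qualifier: str


def parse_finding_id(finding_id: str) -> FindingId:
    # Split on the first two dots only, so a qualifier containing a dot would
    # stay intact (an assumption; the notes do not say whether that can occur).
    parts = finding_id.split(".", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"not a <domain>.<area>.<qualifier> id: {finding_id!r}")
    return FindingId(*parts)


print(parse_finding_id("sync.delay.high"))
```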

bagx diff — compare two eval reports

Compare two bagx eval --json reports by stable finding id. Surfaces
NEW / GONE / WORSE / BETTER (and optionally SAME) findings, plus
evidence drift on numeric metrics when severity is unchanged.
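The classification can be pictured with the following sketch. It illustrates the idea under stated assumptions and is not bagx's implementation: each report is taken to be a list of dicts with hypothetical "id", "severity", and optional "evidence" keys, and severities are assumed to rank info < warning < error < critical (the same values --exit-on accepts). diff_findings is a hypothetical name.

```python
# Assumed ordering of severities, lowest to highest.
SEVERITY_RANK = {"info": 0, "warning": 1, "error": 2, "critical": 3}


def diff_findings(baseline, current, include_same=False):
    """Classify findings by id as NEW / GONE / WORSE / BETTER / SAME.

    Returns (id, status, drift) tuples; drift maps a numeric evidence key to
    (old, new) and is only filled in when the severity did not change.
    """
    base = {f["id"]: f for f in baseline}
    curr = {f["id"]: f for f in current}
    result = []
    for fid in sorted(base.keys() | curr.keys()):
        old, new = base.get(fid), curr.get(fid)
        if old is None:
            result.append((fid, "NEW", {}))
            continue
        if new is None:
            result.append((fid, "GONE", {}))
            continue
        delta = SEVERITY_RANK[new["severity"]] - SEVERITY_RANK[old["severity"]]
        if delta > 0:
            result.append((fid, "WORSE", {}))
        elif delta < 0:
            result.append((fid, "BETTER", {}))
        else:
            # Same severity: collect numeric evidence values that moved.
            drift = {}
            old_ev, new_ev = old.get("evidence", {}), new.get("evidence", {})
            for key in old_ev.keys() & new_ev.keys():
                a, b = old_ev[key], new_ev[key]
                numeric = all(
                    isinstance(v, (int, float)) and not isinstance(v, bool)
                    for v in (a, b)
                )
                if numeric and a != b:
                    drift[key] = (a, b)
            if drift or include_same:
                result.append((fid, "SAME", drift))
    return result
```

Keying both reports by id is what keeps the comparison stable across runs: a finding is matched by what it is, not by its position in the list.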

bagx diff baseline.json current.json
bagx diff baseline.json current.json --format markdown --output diff.md
bagx diff baseline.json current.json --exit-on warning   # CI gate

Three output formats: text (terminal), markdown (PR comments), and json
(machine processing). --exit-on {info|warning|error|critical} sets the
severity at which a regression turns into a non-zero exit code.
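A possible shape for the --exit-on gate, sketched in Python. These notes do not spell out which statuses count as regressions; treating NEW and WORSE findings at or above the threshold as failures is an assumption, as is the info < warning < error < critical ordering. gate_exit_code is a hypothetical name.

```python
# Assumed ordering of severities, lowest to highest.
SEVERITY_RANK = {"info": 0, "warning": 1, "error": 2, "critical": 3}


def gate_exit_code(changes, exit_on):
    """Return 1 if any regression reaches `exit_on`, else 0.

    `changes` is a list of (status, severity) pairs, where severity is the
    finding's severity in the current report.
    """
    threshold = SEVERITY_RANK[exit_on]
    for status, severity in changes:
        # Only findings that appeared or got worse can fail the gate.
        if status in ("NEW", "WORSE") and SEVERITY_RANK[severity] >= threshold:
            return 1
    return 0
```

A CI wrapper would pass this value to sys.exit, so the job fails exactly when a qualifying regression is present.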

bagx eval --findings-only and --severity-min

Render the structured findings list directly in the terminal, optionally
filtered by minimum severity:

bagx eval bag.db3 --findings-only --severity-min warning

--severity-min also filters the --json payload, so downstream consumers
see only the findings they care about.
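The --severity-min filter amounts to a rank comparison. A minimal sketch, assuming the ordering info < warning < error < critical and findings represented as dicts with a hypothetical "severity" key; filter_by_severity is a hypothetical name, not a bagx function.

```python
# Assumed ordering of severities, lowest to highest.
SEVERITY_RANK = {"info": 0, "warning": 1, "error": 2, "critical": 3}


def filter_by_severity(findings, severity_min):
    """Keep findings whose severity is at least `severity_min`."""
    floor = SEVERITY_RANK[severity_min]
    return [f for f in findings if SEVERITY_RANK[f["severity"]] >= floor]
```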

Benchmark severity gate

Manifests now support:

  • forbidden_findings — fail when listed ids appear, optionally scoped to a
    minimum severity ({"id": "sync.delay.high", "severity_min": "error"})
  • max_severity — per-category ceiling (e.g. {"sensor_quality": "warning"})
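One way to read the two manifest keys, sketched in Python. The finding dict keys ("id", "severity", "category"), accepting bare id strings in forbidden_findings, and treating a max_severity value as an inclusive ceiling are all assumptions; manifest_violations is a hypothetical helper, not part of bagx.

```python
# Assumed ordering of severities, lowest to highest.
SEVERITY_RANK = {"info": 0, "warning": 1, "error": 2, "critical": 3}


def manifest_violations(findings, forbidden_findings=(), max_severity=None):
    """Return human-readable violations; an empty list means the case passes."""
    violations = []
    for rule in forbidden_findings:
        if isinstance(rule, str):
            rule = {"id": rule}  # bare id: forbidden at any severity
        floor = SEVERITY_RANK[rule.get("severity_min", "info")]
        for f in findings:
            if f["id"] == rule["id"] and SEVERITY_RANK[f["severity"]] >= floor:
                violations.append(f"forbidden finding {f['id']} ({f['severity']})")
    for category, ceiling in (max_severity or {}).items():
        for f in findings:
            # The ceiling itself is allowed; anything above it is a violation.
            if f.get("category") == category and SEVERITY_RANK[f["severity"]] > SEVERITY_RANK[ceiling]:
                violations.append(
                    f"{f['id']} exceeds {category} ceiling {ceiling} ({f['severity']})"
                )
    return violations
```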

bagx benchmark --exit-on warning exits non-zero when the suite's worst
finding severity reaches the threshold, independent of manifest pass/fail
status. The benchmark JSON now includes worst_severity per case and at
suite level.
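worst_severity is a maximum over ranked severities. The sketch below computes it per case and for the whole suite, then applies an --exit-on style threshold without looking at pass/fail status. The case-to-findings mapping and the helper names are hypothetical, and the info < warning < error < critical ordering is assumed.

```python
# Assumed ordering of severities, lowest to highest.
SEVERITY_RANK = {"info": 0, "warning": 1, "error": 2, "critical": 3}


def worst_severity(findings):
    """Highest severity among findings, or None if there are none."""
    if not findings:
        return None
    return max((f["severity"] for f in findings), key=SEVERITY_RANK.__getitem__)


def suite_worst(cases):
    """`cases` maps a case name to its list of findings (hypothetical shape)."""
    per_case = {name: worst_severity(fs) for name, fs in cases.items()}
    present = [s for s in per_case.values() if s is not None]
    suite = max(present, key=SEVERITY_RANK.__getitem__) if present else None
    return per_case, suite


def benchmark_exit_code(suite_severity, exit_on):
    """Non-zero when the suite's worst severity reaches the threshold."""
    if suite_severity is None:
        return 0
    return 1 if SEVERITY_RANK[suite_severity] >= SEVERITY_RANK[exit_on] else 0
```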

Findings JSON Schema shipped

bagx/schema/findings.schema.json (Draft 2020-12) ships with the wheel.
Locate it from Python:

from bagx.contracts import findings_schema_path, findings_schema

External tools — Grafana, Slack bots, GitHub Actions — can validate finding
payloads without depending on the bagx Python runtime.

GitHub Actions PR template

A drop-in workflow at examples/github_actions/bagx-pr-check.yml runs
bagx eval on the PR branch, restores the previous baseline from
actions/cache, runs bagx diff, and posts the markdown result as a
sticky PR comment. The check fails when readiness regresses.

Schema version

  • eval JSON: 1.1.0 → 1.2.0 (no payload changes; bumped alongside benchmark)
  • benchmark JSON: 1.1.0 → 1.2.0 (adds worst_severity at suite and case level)
  • findings.schema.json: newly shipped

Tests

  • 424 passing (v0.2.0 had 378)
  • New: tests/test_findings_schema.py, tests/test_diff.py
  • Expanded: tests/test_benchmark.py, tests/test_cli.py

Full Changelog

v0.2.0...v0.3.0