Release v0.3.0 · rsasaki0109/bagx

What's New

v0.3.0 makes findings a first-class CI surface. v0.2.0 introduced stable
finding ids (<domain>.<area>.<qualifier>); v0.3.0 builds the tooling around
them so PR review and CI gating actually use them.

`bagx diff` — compare two eval reports

Compare two bagx eval --json reports by stable finding id. Surfaces
NEW / GONE / WORSE / BETTER (and optionally SAME) findings, plus
evidence drift on numeric metrics when severity is unchanged.

bagx diff baseline.json current.json
bagx diff baseline.json current.json --format markdown --output diff.md
bagx diff baseline.json current.json --exit-on warning   # CI gate

Three output formats: text (terminal), markdown (PR comments), and json
(machine processing). --exit-on {info|warning|error|critical} makes the
exit code track regressions.
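The classification by stable id can be sketched in Python. This is a minimal illustration, not bagx's implementation: the payload shape (a list of dicts with `id` and `severity` keys), the helper name `classify_findings`, and every finding id except `sync.delay.high` are assumptions. Only the severity order is taken from the `--exit-on` choices above.

```python
# Severity order follows the --exit-on choices; the payload shape is assumed.
SEVERITY_RANK = {"info": 0, "warning": 1, "error": 2, "critical": 3}


def classify_findings(baseline, current):
    """Bucket finding ids into NEW / GONE / WORSE / BETTER / SAME.

    Both arguments are lists of dicts with hypothetical "id" and
    "severity" keys; a real bagx report may be shaped differently.
    """
    base = {f["id"]: f["severity"] for f in baseline}
    curr = {f["id"]: f["severity"] for f in current}
    buckets = {"NEW": [], "GONE": [], "WORSE": [], "BETTER": [], "SAME": []}
    for fid in sorted(base.keys() | curr.keys()):
        if fid not in base:
            buckets["NEW"].append(fid)
        elif fid not in curr:
            buckets["GONE"].append(fid)
        else:
            # Positive delta means the finding became more severe.
            delta = SEVERITY_RANK[curr[fid]] - SEVERITY_RANK[base[fid]]
            key = "WORSE" if delta > 0 else "BETTER" if delta < 0 else "SAME"
            buckets[key].append(fid)
    return buckets


# Illustrative data only; ids other than sync.delay.high are made up.
baseline = [
    {"id": "sync.delay.high", "severity": "warning"},
    {"id": "imu.rate.low", "severity": "error"},
    {"id": "gnss.fix.lost", "severity": "info"},
]
current = [
    {"id": "sync.delay.high", "severity": "error"},
    {"id": "imu.rate.low", "severity": "warning"},
    {"id": "lidar.points.sparse", "severity": "warning"},
]
result = classify_findings(baseline, current)
print(result)
```

Keying on the id rather than on message text is what keeps the comparison stable when wording or evidence values change between runs.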

`bagx eval --findings-only` and `--severity-min`

Render the structured findings list directly in the terminal, optionally
filtered by minimum severity:

bagx eval bag.db3 --findings-only --severity-min warning

--severity-min also filters the --json payload, so downstream consumers
see only the findings they care about.
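A downstream consumer that receives an unfiltered payload could apply the same kind of threshold itself. The sketch below assumes each finding is a dict with a `severity` key and uses a hypothetical helper name, `filter_by_min_severity`; it is not taken from the bagx source.

```python
# Severity order follows the --exit-on choices listed for bagx diff.
SEVERITY_RANK = {"info": 0, "warning": 1, "error": 2, "critical": 3}


def filter_by_min_severity(findings, severity_min):
    """Keep findings at or above severity_min (assumed payload shape)."""
    threshold = SEVERITY_RANK[severity_min]
    return [f for f in findings if SEVERITY_RANK[f["severity"]] >= threshold]


# Illustrative findings; the ids are hypothetical except sync.delay.high.
findings = [
    {"id": "gnss.fix.lost", "severity": "info"},
    {"id": "sync.delay.high", "severity": "warning"},
    {"id": "imu.rate.low", "severity": "critical"},
]
kept = filter_by_min_severity(findings, "warning")
print([f["id"] for f in kept])
```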

Benchmark severity gate

Manifests now support:

- forbidden_findings: fail when listed ids appear, optionally scoped to a
  minimum severity ({"id": "sync.delay.high", "severity_min": "error"})
- max_severity: per-category ceiling (e.g. {"sensor_quality": "warning"})

bagx benchmark --exit-on warning exits non-zero when the suite's worst
finding severity reaches the threshold, independent of manifest pass/fail
status. The benchmark JSON now includes worst_severity per case and at
suite level.
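The two manifest rules and the separate `--exit-on` threshold can be pictured as independent checks. The sketch below is an assumption-laden illustration rather than bagx's code: the `category` key on findings, the helper names `worst_severity` and `gate_violations`, and the treatment of a missing `severity_min` as "any severity" are all hypothetical.

```python
SEVERITY_RANK = {"info": 0, "warning": 1, "error": 2, "critical": 3}


def worst_severity(findings):
    """Return the highest severity name present, or None if there are none."""
    if not findings:
        return None
    return max((f["severity"] for f in findings), key=SEVERITY_RANK.__getitem__)


def gate_violations(manifest, findings):
    """List manifest rule violations (assumed finding and manifest shapes)."""
    violations = []
    for rule in manifest.get("forbidden_findings", []):
        if isinstance(rule, str):  # assumed shorthand: a bare id
            rule = {"id": rule}
        floor = SEVERITY_RANK[rule.get("severity_min", "info")]
        for f in findings:
            if f["id"] == rule["id"] and SEVERITY_RANK[f["severity"]] >= floor:
                violations.append(f"forbidden: {f['id']} ({f['severity']})")
    for category, ceiling in manifest.get("max_severity", {}).items():
        for f in findings:
            too_severe = SEVERITY_RANK[f["severity"]] > SEVERITY_RANK[ceiling]
            if f["category"] == category and too_severe:
                violations.append(f"ceiling: {f['id']} ({f['severity']} > {ceiling})")
    return violations


manifest = {
    "forbidden_findings": [{"id": "sync.delay.high", "severity_min": "error"}],
    "max_severity": {"sensor_quality": "warning"},
}
# Illustrative findings; "category" and imu.rate.low are hypothetical.
findings = [
    {"id": "sync.delay.high", "category": "sync", "severity": "warning"},
    {"id": "imu.rate.low", "category": "sensor_quality", "severity": "error"},
]
violations = gate_violations(manifest, findings)
worst = worst_severity(findings)
# Exit-on check: independent of whether the manifest rules passed.
exit_nonzero = worst is not None and SEVERITY_RANK[worst] >= SEVERITY_RANK["warning"]
print(violations, worst, exit_nonzero)
```

In this example the forbidden rule does not fire, because `sync.delay.high` is only a warning, while the `sensor_quality` ceiling does. The exit-on check is computed separately from both.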

Findings JSON Schema shipped

bagx/schema/findings.schema.json (Draft 2020-12) ships with the wheel.
Locate it from Python:

from bagx.contracts import findings_schema_path, findings_schema

External tools — Grafana, Slack bots, GitHub Actions — can validate finding
payloads without depending on the bagx Python runtime.

GitHub Actions PR template

A drop-in workflow at examples/github_actions/bagx-pr-check.yml runs
bagx eval on the PR branch, restores the previous baseline from
actions/cache, runs bagx diff, and posts the markdown result as a
sticky PR comment. The check fails when readiness regresses.

Schema version

eval JSON: 1.1.0 → 1.2.0 (no payload changes; bumped alongside benchmark)
benchmark JSON: 1.1.0 → 1.2.0 (adds worst_severity at suite and case level)
findings.schema.json: newly shipped

Tests

424 passing (v0.2.0 had 378)
New: tests/test_findings_schema.py, tests/test_diff.py
Expanded: tests/test_benchmark.py, tests/test_cli.py

Full Changelog

v0.2.0...v0.3.0
