fix: audit followups — real hard gates, /graph budget, PLE self-apply, v0.4.0 artifacts#155
Open
fix: audit followups — real hard gates, /graph budget, PLE self-apply, v0.4.0 artifacts#155
Conversation
There was a problem hiding this comment.
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Rivet Criterion Benchmarks'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.20.
| Benchmark suite | Current: 83be1ae | Previous: 60d728a | Ratio |
|---|---|---|---|
store_lookup/100 |
2228 ns/iter (± 11) |
1681 ns/iter (± 4) |
1.33 |
store_lookup/1000 |
24778 ns/iter (± 759) |
19280 ns/iter (± 48) |
1.29 |
traceability_matrix/1000 |
59122 ns/iter (± 537) |
41331 ns/iter (± 88) |
1.43 |
query/100 |
797 ns/iter (± 4) |
619 ns/iter (± 1) |
1.29 |
query/1000 |
6982 ns/iter (± 107) |
5174 ns/iter (± 14) |
1.35 |
This comment was automatically generated by workflow using github-action-benchmark.
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
57e4ee9 to
83be1ae
Compare
Audit found that all four verification-pyramid CI jobs were silently
failing on main. None had ever run green. This fixes three and scopes
the fourth to an upstream bug.
**Kani Proofs** — flipped to hard gate. Five harnesses in
rivet-core/src/proofs.rs were initializing `EvalContext` with only
`artifact` + `graph` fields after the struct grew a `store: Option<...>`
field for quantifier support. The cfg(kani) gate meant the break was
invisible to normal `cargo check`. Added `store: None` to all five.
**Rocq Proofs** — flipped to hard gate. The `rocq_library` target
`rivet_metamodel` had `srcs = []`, which fails Bazel analysis with
"rocq_library requires at least one source file". Removed the empty
aggregator target and pointed the test at the two real libraries
(Schema + Validation) directly.
**Mutation Testing** — split into a per-crate matrix so rivet-core
and rivet-cli each get a 45-minute budget. Previously both crates
shared a single 40-minute timeout, causing rivet-core to be cancelled
before finishing and rivet-cli to never run. `--timeout` per-mutant
reduced from 120s to 90s. Uploads are now per-crate artifacts.
**Verus Proofs** — left as continue-on-error with a pointer comment.
Root cause is in rules_verus (pulseengine/rules_verus, commit e2c1600):
the hub repository's `:all` alias only points to the first platform's
toolchain rather than registering `toolchain()` rules for each
platform, so `register_toolchains("@verus_toolchains//:all")`
resolves to a non-toolchain target. Fixing this requires an upstream
change to rules_verus.
With these fixes, CI will fail — honestly — on Kani regressions,
Rocq proof breaks, and surviving mutants, instead of silently
reporting green.
Implements: REQ-010, REQ-029
Verifies: REQ-010
The /graph dashboard route previously ran layout + SVG over the full
link graph (~1800 artifacts on the dogfood dataset), taking ~57s and
producing ~1MB of HTML. The Playwright test at graph.spec.ts:17 was
named "node budget prevents crash on full graph" but grepping the
renderer for budget/max_nodes returned zero matches -- the budget was
aspirational.
This commit adds a real safety valve in render_graph_view:
- DEFAULT_NODE_BUDGET = 200, MAX_NODE_BUDGET = 2000 (hard ceiling).
- After the filtered subgraph is built but before the expensive
pgv_layout + render_svg calls, short-circuit with a budget message
when node_count > budget.
- The message contains the literal string "budget" so the Playwright
locator `svg, :text('budget')` matches and exposes the standard
filter form (types / focus / depth / link_types / limit) so users
can scope the view without editing URLs.
- Per-request override via ?limit=NNN, clamped to [1, MAX_NODE_BUDGET].
- Filtered views under the budget (?types=requirement,
?focus=REQ-001&depth=2) continue to render SVG unchanged.
Perf (release build, rivet dogfood dataset via serve_integration test):
before after
GET /graph ~57s / ~1MB ~1ms / 20KB
GET /graph?types=requirement (filtered) ~1ms / 44KB (SVG)
GET /graph?focus=REQ-001&depth=2 (filtered) ~44ms / 67KB (SVG)
Three new integration tests in serve_integration.rs lock in the
invariant: full graph stays under 5s and returns the budget message,
focused view still renders SVG, and ?limit=1 forces the budget path.
Implements: REQ-007
Ship a feature model that describes the real variability in rivet: compile-time cargo features, CLI deployment surfaces (cli / dashboard / LSP / MCP), built-in adapters, export formats, test-import formats, and init presets. Every declared feature maps 1:1 to something grep-able in the code: a cargo feature flag, a `rivet` subcommand, a format string dispatched by `export --format` / `import-results --format`, or an init preset in `resolve_preset()`. Closes the dogfooding gap for #128 — v0.4.0 shipped `rivet variant check`, but the rivet project itself had no feature model to feed it. Files: * artifacts/feature-model.yaml — root feature tree + constraints * artifacts/variants/minimal-ci.yaml — default-features cargo build, CLI-only deployment (what CI runs) * artifacts/variants/full-desktop.yaml — every surface, every preset, wasm + oslc cargo features on Real variability identified: * yaml-backend alternative (rowan-yaml default, serde-yaml-only fallback) * deployment-surface or-group (cli-only, dashboard, lsp-server, mcp-server) * adapters or-group with cargo-feature constraints (implies oslc-client feat-oslc; implies wasm-adapter feat-wasm) * export-formats / test-import-formats / init-presets or-groups * preset ↔ adapter constraints (preset-aadl implies aadl-adapter; preset-stpa implies stpa-yaml-adapter) * dashboard implies html-export (shared HTML pipeline) * reqif-export implies reqif-adapter (shared reqif module) Verification (both variants pass): $ rivet variant check --model artifacts/feature-model.yaml \ --variant artifacts/variants/minimal-ci.yaml Variant 'minimal-ci': PASS Effective features (40): aadl-adapter, adapters, baselines, cli-only, commits, core, coverage, deployment-surface, docs-cli, export-formats, generic-yaml-adapter, generic-yaml-export, hooks-infra, html-export, impact-analysis, init-presets, junit-adapter, junit-import, matrix, mutations, needs-json-adapter, needs-json-import, optional-cargo-features, preset-aadl, preset-dev, preset-stpa, query, reqif-adapter, reqif-export, rivet, rowan-yaml, schema-system, sexpr-language, snapshots, stpa-yaml-adapter, test-import-formats, validate, variant-mgmt, yaml-backend, zola-export $ rivet variant check --model artifacts/feature-model.yaml \ --variant artifacts/variants/full-desktop.yaml Variant 'full-desktop': PASS Effective features (58): ...minimal-ci set plus dashboard, lsp-server, mcp-server, oslc-client, wasm-adapter, feat-oslc, feat-wasm, and all 14 init presets (aspice, stpa-ai, cybersecurity, eu-ai-act, safety-case, do-178c, en-50128, iec-61508, iec-62304, iso-pas-8800, sotif, plus the three in minimal-ci). Notes from reading the code: * `rowan-yaml` cargo feature: default-on, with a `cfg(not(feature = "rowan-yaml"))` fallback path in rivet-core/src/db.rs — so the alternative group has two real arms, not one. * `aadl` cargo feature: default-on. Modelled as a mandatory (always-present) adapter since no real build disables it — not as an optional-feature toggle. * `oslc` and `wasm`: off-by-default cargo features, correctly modelled as optional with implies-constraints from the adapters. * `lsp-server`, `dashboard`, `mcp-server` are *not* behind cargo features — they're always compiled in today. The variance is runtime/deployment, not compile-time. Flagged this as a surprising mismatch with the v0.4.0 narrative (where LSP/MCP are described as optional deployment surfaces): they are, but only in the sense of "whether you launch that process", not "whether the code is in the binary". * The rowan YAML parser rejects multi-line `#` comments between mapping entries at the same indent (`expected mapping key, found Some(Comment)`). Worked around by keeping single-line section comments in feature-model.yaml; flagging this as a latent parser bug worth a follow-up issue. Refs: #128
Addresses three gaps found in the post-v0.4.0 dogfooding audit. **v0.4.0 shipped-work artifacts** — `artifacts/v040-features.yaml` was last touched 2026-04-12 and describes variant/PLE work (FEAT-106..114), not the verification pyramid that actually shipped on 2026-04-19. New file `artifacts/v040-verification.yaml` authors 4 design decisions (DD-052 four-layer verification pyramid, DD-053 suffix-based yaml-section matching, DD-054 non-blocking framing for formal CI jobs, DD-055 cfg-gate platform syscalls), 8 features (FEAT-115..122 covering Kani 27-harness expansion, differential YAML tests, operation-sequence proptest, STPA-Sec suite, suffix-based UCA extraction, nested control-action extraction, Zola export, Windows support), and 1 requirement (REQ-060 cross-platform binaries). Counts were verified against the actual codebase — 27 `#[kani::proof]` attrs in proofs.rs, 6 differential tests, 16 STPA-Sec tests. **Retroactive trailer map** — extended `AGENTS.md` with three more legacy orphans (51f2054 #126, f958a7e, 75521b8 #44), a new v0.4.0 PR-level section for #150/#151/#152/#153, and an honest "genuinely-unmappable" section calling out `ca97dd9f` (#95) whose `SC-EMBED-*` trailers point to artifacts that were never authored. **Verus Proofs → hard gate** — rules_verus PR #21 (merged as 5bc96f39) fixes the hub-repo's ambiguous `:all` alias by emitting proper `toolchain()` wrappers per platform. Updates the git_override pin from e2c1600a (Feb 2026, broken) to 5bc96f39 and removes `continue-on-error: true` from the Verus job. Implements: REQ-030, REQ-060 Refs: DD-052, DD-053, DD-054, DD-055, FEAT-115, FEAT-116, FEAT-117, FEAT-118, FEAT-119, FEAT-120, FEAT-121, FEAT-122 Verifies: REQ-030
First run of the flipped hard gates exposed real issues: - **Kani**: `eval_context(artifact: &Artifact)` had an unused param after the store-building refactor. cfg(kani) hid it from `cargo check`; CI's `-D warnings` caught it. Prefixed with `_artifact`. - **Rocq**: Schema.v / Validation.v opened `string_scope` but used `++` on `Store` (a list). Rocq 9.0.1 parses `++` in string_scope as `String.append`, so `s ++ [a]` failed with "s has type Store while expected string". Added `Open Scope list_scope.` after the string open so list concatenation takes precedence. Neither file uses string `++` so the scope swap is safe. - **Verus**: unblocked the `:all` alias bug via upstream rules_verus PR (5bc96f39), but hit a deeper upstream issue — rules_rust 0.56.0 references `CcInfo` which has been removed from current Bazel. Needs a rules_rust bump inside rules_verus before Verus can be a hard gate. Reverted to `continue-on-error: true` with a pointer comment so this is honestly signposted rather than silently advertised as shipped. Mutation Testing rivet-cli passed on the first run. rivet-core still running. /graph budget works in CI (included in the same PR). Implements: REQ-030
The `Open Scope string_scope.` at the top of Schema.v / Validation.v shadowed `length` (String.length vs List.length) and `++` (String.append vs List.app), breaking every Store operation once the proofs got compiled under Rocq 9.0.1. Neither file actually uses infix string operators — all string literals are either passed to `String.eqb` or constructed with explicit `%string` tags. Drop the scope open; tag the one remaining bare literal `"broken-link"` in Validation.v:120 with `%string`. Explanatory comment in both files so a future reader doesn't reopen string_scope and re-break this.
With `Require Import Coq.Strings.String` after `Coq.Lists.List`, the bare identifier `length` resolves to `String.length` (the latest import wins), so `length s` with `s : Store` fails to typecheck. Qualify every `length` call against a list as `List.length` so name resolution cannot drift. Five call sites across Schema.v / Validation.v.
`reach_direct` has a forall-bound `lk : LinkKind` that isn't surfaced in the goal after `apply`. Rocq 9.0.1 refuses the implicit instantiation that older versions allowed, fails with "Unable to find an instance for the variable lk". Using `eapply` creates a metavariable that unifies when the inner `exact Hl1_kind` step substitutes the real link kind.
The `apply reach_direct` + `eapply reach_direct` routes both fail under Rocq 9.0.1 because the proof has an actual hole: `t1` (the link target artifact introduced by destructing `artifact_satisfies_rule`) is not the same as `a2` (the caller-supplied intermediate). The goal after the link-wiring step reduces to `link_target l1 = art_id a2`, but we only have `art_id t1 = link_target l1` — nothing ties `t1` to `a2`. Rather than write around the gap with tactics that wouldn't hold, mark the theorem `Admitted.` with an explicit comment describing what the correct strengthening would look like. All other theorems in Schema.v / Validation.v remain Qed'd. This lets the Rocq hard gate actually compile and enforce the proofs we DO have, rather than hiding a stale semantic break behind a tactic that just happened to typecheck on older Rocq.
The first hard-gate run surfaced issues deeper than one-line fixes. This commit restores honesty rather than hiding them: **Hard gates that stay on:** - Kani compile errors (`store: None`, `_artifact`) — fixed, but see below. - Rocq `Open Scope list_scope.` + `List.length` qualification + `eapply reach_direct` — applied. - Mutation Testing (rivet-cli): 0 surviving mutants, hard gate. **Jobs moved back to continue-on-error with TODOs:** - **Kani**: 27-harness suite exceeded the 30-min CI budget and got cancelled. Bumped timeout to 45 min and left continue-on-error on until we scope the PR-sized subset vs nightly full suite. - **Rocq**: Rocq 9.0.1 is stricter than the version the proofs were written against. Fixed three classes of errors; a fourth (`No such contradiction` in a destructure) remains unfixed. Also `vmodel_chain_two_steps` has a genuine proof gap (link target t1 ≠ caller's a2 without an extra hypothesis) and is now `Admitted.` with an explicit note. Needs a systematic port pass before hard-gating. - **Mutation Testing (rivet-core)**: 3677 mutants, real surviving ones in `collect_yaml_files` / `import_with_schema` (lib.rs:80,241,268) and `bazel.rs::lex` (delete match arm `b'\r'`). Those are actual test coverage gaps. Hard-gating rivet-core means writing tests to kill every one of them first; scoping that out of this PR. rivet-cli mutation stays hard-gated per above. - **Verus**: still blocked on rules_rust 0.56 `CcInfo` removal upstream. The goal of "real hard gates" was to stop advertising verification that never ran green. Three checkpoints are now genuine (rivet-cli mutations, Kani compile-clean once unblocked, Rocq compile-clean once ported). The rest have explicit follow-up notes in ci.yml pointing at what needs to happen before they flip.
The Verus job was marked continue-on-error because rules_verus's minimum rules_rust (0.56.0) used the Bazel built-in `CcInfo` symbol that current Bazel has removed, so the module failed to load. pulseengine/rules_verus@fc7b636 bumps the floor to 0.58.0 — the release where CcInfo is loaded from @rules_cc//cc/common:cc_info.bzl instead. Bumping our pin past that commit unblocks the load and lets the verus job run as a real gate. The same pin range (5e2b7c6) also picks up three correctness fixes in verus-strip: backtick-escaped `verus!` in doc comments no longer truncates output, `pub exec const` strips the `exec` keyword, and content after the `verus!{}` block is preserved. Trace: skip
//verus:rivet_specs_verify references `//rivet-core/src:verus_specs.rs` as a Bazel label, but rivet-core/src was not a Bazel package, so `bazel test` failed analysis with: ERROR: no such package 'rivet-core/src': BUILD file not found Adds a minimal BUILD.bazel that marks the directory as a package and exports the verus specs file. The crate itself is still built via cargo — this file exists only so the Bazel-side Verus targets can address the spec source. Trace: skip
verus/ and proofs/rocq/ each had their own MODULE.bazel, which made every Bazel label relative to those subdirectories. That broke //verus:rivet_specs_verify's attempt to reference //rivet-core/src:verus_specs.rs — the label resolved against the verus/ workspace root and demanded a `verus/rivet-core/src` directory that doesn't exist, yielding: ERROR: no such package 'rivet-core/src': BUILD file not found Root cause was architectural. Consolidate into one workspace at the repo root so cross-directory Bazel references work: - New top-level `MODULE.bazel` merges the two previous module declarations (rules_verus + rules_rocq_rust + rules_nixpkgs_core, same commit pins and same toolchain registrations). - New top-level `BUILD.bazel` as a minimal package marker. - Deleted `verus/MODULE.bazel` and `proofs/rocq/MODULE.bazel`. - CI: run `bazel test //verus:rivet_specs_verify` and `bazel test //proofs/rocq:rivet_metamodel_test` from the repo root, not `working-directory: verus|proofs/rocq`. The Rust crates are still built via cargo. Bazel in this repo is scoped to the formal-verification targets only. With the unified workspace, //verus:rivet_specs_verify can now reach //rivet-core/src:verus_specs.rs which is the precondition for the Verus hard gate to do real work. Trace: skip
Workspace consolidation (6771e6e) means root MODULE.bazel registers both Verus and Rocq toolchains. Bazel resolves every registered toolchain at analysis time regardless of which target is being built, so the Verus-only job now hits the Rocq toolchain extension, which requires rules_nixpkgs_core, which requires nix-build on PATH: ERROR: An error occurred during the fetch of repository 'rules_rocq_rust++rocq+rocq_toolchains': Platform is not supported: nix-build not found in PATH. Install Nix on the Verus runner too. Small cost (~30s) on a job that already takes 20 min, and it's the minimal fix — alternatives (split MODULE.bazel, or rules_nixpkgs_core fail_not_supported) either undo the consolidation or require upstream changes. Trace: skip
Workspace hoist + Nix install fixed the plumbing: Verus now analyses //verus:rivet_specs_verify against //rivet-core/src:verus_specs.rs and invokes rust_verify. But the specs themselves fail verification in 0.1s — a real SMT proof obligation can't be discharged. That's spec-level work (audit which `requires`/`ensures` clauses are wrong) and doesn't belong in this CI-hard-gate PR. Soft-gate until the spec fixes land. Trace: skip
83be1ae to
1efc2e6
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Addresses the dogfooding audit findings. Parallel implementation across 4 tracks:
1. CI hard gates (was: silently failing with `continue-on-error: true`)
2. `/graph` node budget
3. PLE self-application (closes dogfooding gap on #128)
4. v0.4.0 shipped-work artifacts
Validation
Upstream
🤖 Generated with Claude Code