
fix: propagate sidecar errors + reseal recovery subcommand (Refs #118)#119

Merged
noahgift merged 3 commits into main from fix/integrity-atomic-write-no-silent-errors
Apr 24, 2026

@noahgift
Contributor

Summary

Closes #118. Two related defects in state integrity handling.

Defect 1 — silent sidecar-write error

src/core/state/mod.rs:57,108:

let _ = integrity::write_b3_sidecar(&path);

After save_lock atomically renamed state.lock.yaml.tmp → state.lock.yaml, the sidecar's Result was discarded. If the sidecar write failed (disk full, transient filesystem error, permission denied, signal, race with the reaper), the apply still reported success, but the on-disk state was lock.yaml (new) + .b3 (stale or missing). The next apply then hard-failed:

ERROR: integrity check failed for state/<m>/state.lock.yaml:
       expected <old>, got <new>

Toyota Way violation: defect propagated silently across time — no signal at the moment of corruption.

Fix: propagate the sidecar error with ?; the error message points users to the new forjar reseal subcommand.
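A minimal std-only sketch of the corrected save path, to make the failure mode concrete. It assumes the sidecar lives at `<lock file>.b3`; `digest_hex` swaps BLAKE3 for std's non-cryptographic `DefaultHasher`; and `save_lock`, `verify`, and the error text are simplified, hypothetical versions, not forjar's actual code. The point is the last expression of `save_lock`: the sidecar's `Result` is returned instead of discarded.

```rust
use std::collections::hash_map::DefaultHasher;
use std::fs;
use std::hash::{Hash, Hasher};
use std::io;
use std::path::{Path, PathBuf};

// Stand-in digest: forjar's sidecar is BLAKE3 (.b3). DefaultHasher is NOT
// cryptographic; it is used only so this sketch needs nothing beyond std.
fn digest_hex(bytes: &[u8]) -> String {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

// Assumed sidecar naming: `<lock file>.b3` next to the lock file.
fn sidecar_path(lock: &Path) -> PathBuf {
    let mut name = lock.as_os_str().to_owned();
    name.push(".b3");
    PathBuf::from(name)
}

fn write_b3_sidecar(lock: &Path) -> io::Result<()> {
    let bytes = fs::read(lock)?;
    fs::write(sidecar_path(lock), digest_hex(&bytes))
}

// Integrity gate: Ok(true) only if the sidecar matches the lock contents.
fn verify(lock: &Path) -> io::Result<bool> {
    let expected = fs::read_to_string(sidecar_path(lock))?;
    Ok(expected.trim() == digest_hex(&fs::read(lock)?))
}

fn save_lock(path: &Path, yaml: &str) -> io::Result<()> {
    // Write a temp file, then rename it over the lock. On POSIX the rename
    // is atomic when both paths are on the same filesystem.
    let tmp = path.with_extension("yaml.tmp");
    fs::write(&tmp, yaml)?;
    fs::rename(&tmp, path)?;
    // Before the fix this was `let _ = write_b3_sidecar(path);`, so a failed
    // sidecar write left a new lock next to a stale .b3 with no error.
    write_b3_sidecar(path).map_err(|e| {
        io::Error::new(
            e.kind(),
            format!(
                "{} saved but its .b3 sidecar was not updated ({e}); \
                 run `forjar reseal --file {}`",
                path.display(),
                path.display()
            ),
        )
    })
}
```

With the error propagated, a failed sidecar write surfaces at the moment of corruption rather than as an integrity failure on the next apply.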

Defect 2 — no recovery path for pre-existing drift

Users with drift from older forjar versions (or from a git checkout/merge that restored lock.yaml without its .b3) had no recovery path short of forjar apply --yes. This PR adds a reseal subcommand that rewrites sidecars from the current lock contents WITHOUT converging infrastructure:

forjar reseal --all              # reseal every state/*/lock.yaml + global
forjar reseal --file <path>      # one specific lock file
forjar reseal --machine <name>   # single machine
forjar reseal --all --dry-run    # preview
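One way the three target flags could be resolved into concrete lock paths, as a hypothetical sketch: `ResealTarget` and `resolve_targets` are illustrative names, the `<state dir>/<machine>/state.lock.yaml` layout is taken from the integrity error message above, and the global lock is left out because its path is not given here.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

// Hypothetical mirror of the CLI flags: --all, --file <path>, --machine <name>.
enum ResealTarget {
    All,
    File(PathBuf),
    Machine(String),
}

// Assumed layout: <state_dir>/<machine>/state.lock.yaml. The global lock is
// not included because its location is not stated.
fn resolve_targets(state_dir: &Path, target: &ResealTarget) -> io::Result<Vec<PathBuf>> {
    match target {
        ResealTarget::File(path) => Ok(vec![path.clone()]),
        ResealTarget::Machine(name) => Ok(vec![state_dir.join(name).join("state.lock.yaml")]),
        ResealTarget::All => {
            let mut locks = Vec::new();
            for entry in fs::read_dir(state_dir)? {
                let lock = entry?.path().join("state.lock.yaml");
                // Skips plain files and machine dirs that have no lock yet.
                if lock.is_file() {
                    locks.push(lock);
                }
            }
            // read_dir order is platform-dependent; sort for stable output.
            locks.sort();
            Ok(locks)
        }
    }
}
```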

Safety: each target is YAML-parsed before its sidecar is rewritten. A malformed lock file is rejected ("<path> is not valid YAML"), so reseal cannot bless a corrupt file and let the next apply silently trust garbage.
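The validate-before-reseal rule can be sketched as follows. Assumptions: the YAML check is injected as a closure (the real command presumably uses a YAML parser; any closure that reports whether a string parses as YAML fits here), sidecars are named `<lock file>.b3`, and std's non-cryptographic `DefaultHasher` stands in for BLAKE3. Names such as `reseal` and `ResealReport` are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::fs;
use std::hash::{Hash, Hasher};
use std::path::{Path, PathBuf};

// Stand-in for BLAKE3 so the sketch stays std-only (NOT cryptographic).
fn digest_hex(bytes: &[u8]) -> String {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

// Assumed sidecar naming: `<lock file>.b3`.
fn sidecar_path(lock: &Path) -> PathBuf {
    let mut name = lock.as_os_str().to_owned();
    name.push(".b3");
    PathBuf::from(name)
}

#[derive(Debug, Default)]
struct ResealReport {
    resealed: Vec<PathBuf>,         // with dry_run: files that would be resealed
    failed: Vec<(PathBuf, String)>, // path + reason
}

fn reseal<F: Fn(&str) -> bool>(targets: &[PathBuf], dry_run: bool, is_valid_yaml: F) -> ResealReport {
    let mut report = ResealReport::default();
    for path in targets {
        let text = match fs::read_to_string(path) {
            Ok(text) => text,
            Err(e) => {
                report.failed.push((path.clone(), e.to_string()));
                continue;
            }
        };
        // Refuse to bless a corrupt lock file with a fresh sidecar.
        if !is_valid_yaml(&text) {
            report.failed.push((path.clone(), format!("{} is not valid YAML", path.display())));
            continue;
        }
        if !dry_run {
            // Digest the exact bytes that were validated.
            if let Err(e) = fs::write(sidecar_path(path), digest_hex(text.as_bytes())) {
                report.failed.push((path.clone(), e.to_string()));
                continue;
            }
        }
        report.resealed.push(path.clone());
    }
    report
}
```

With `dry_run` set, nothing is written and `resealed` lists what would change; in either mode a file that fails the check never receives a fresh sidecar.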

Files

  • src/core/state/mod.rs — ? propagation in save_lock + save_global_lock
  • src/cli/reseal.rs (NEW, TDG 97.5/100 A-) — 4 small fns
  • src/cli/mod.rs — register mod reseal
  • src/cli/commands/{mod,state_args}.rs — Commands::Reseal(ResealArgs)
  • src/cli/dispatch_misc.rs — dispatch to reseal::cmd_reseal

Test

Smoke-tested against real drift in the paiml/infra state dir: 13/24 lock files mismatched, and apply correctly rejected the state. reseal --all then resealed 23 files with 0 failures, and the next apply proceeded past the integrity gate. End-to-end fix confirmed on live state.

TDG: src/cli/infra.rs unchanged (89.1/100 A- baseline). New src/cli/reseal.rs 97.5/100 A-.

Related

  • paiml/infra#77 ANDON — aprender 11-PR stack blocked; hit this bug trying to forjar apply the companion fix paiml/infra#78.
  • aprender#1043 — ci.yml per-PR cargo mount (workspace-test now PASS on new mount).

🤖 Generated with Claude Code

noahgift added a commit that referenced this pull request Apr 24, 2026
…nic (Refs #118)

RUSTSEC-2026-0104 was published 2026-04-23 — reachable panic in
rustls-webpki 0.103.12's CRL parsing. Transitive via rustls →
rustls-native-certs; upstream fix in rustls-webpki 0.104 but rustls
hasn't bumped yet. aprender's `.cargo/audit.toml` already ignores this
(observed in aprender CI audit-cmd `--ignore` list 2026-04-24). Syncing
forjar's deny.toml to match so forjar CI (`cargo deny check`) doesn't
block on the same class across repos.

This unblocks the audit gate for #119 (integrity atomic-write fix).

Fleet follow-up: paiml/infra clean-room template needs the same
exemption — filed separately.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift noahgift enabled auto-merge (squash) April 24, 2026 10:25
noahgift and others added 2 commits April 24, 2026 15:03
Two related defects in state integrity handling.

DEFECT 1 — src/core/state/mod.rs:57,108 — silent sidecar error

    let _ = integrity::write_b3_sidecar(&path);

After atomic rename of state.lock.yaml, the sidecar's Result was discarded.
Any failure (disk full, permission, signal, reaper race) left lock.yaml
(new) + .b3 (stale); next apply hard-failed with "integrity check failed".
Toyota Way violation: no signal at moment of corruption.

Fix: propagate with `?`; message points user at `forjar reseal`.

DEFECT 2 — no recovery for pre-existing drift

Users with drift from OLD forjar versions or git checkout had no recovery
short of `forjar apply --yes`. Adds `reseal` subcommand that rewrites
sidecars from current lock contents without converging infrastructure:

    forjar reseal --all           # reseal every state/*/lock.yaml
    forjar reseal --file <path>
    forjar reseal --machine <name>
    forjar reseal --all --dry-run

Safety: each target YAML-parsed before sidecar rewrite — corrupt lock
cannot be blessed with a fresh sidecar.

FILES

- src/core/state/mod.rs — `?` propagation in save_lock + save_global_lock.
- src/cli/reseal.rs (NEW, TDG 97.5 A-) — cmd_reseal + 3 small helpers.
- src/cli/{mod,dispatch_misc}.rs + src/cli/commands/{mod,state_args}.rs —
  Commands::Reseal wiring.

TEST

Smoke-tested against paiml/infra with 13/24 lock files mismatched. Apply
correctly rejected; `reseal --all` resealed 23 files 0 failures; next
apply passed the integrity gate.

Closes #118.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…nic (Refs #118)

@noahgift noahgift force-pushed the fix/integrity-atomic-write-no-silent-errors branch from bf2da13 to bbc014e April 24, 2026 13:07
`cargo deny check` and `cargo audit` are distinct tools reading distinct
config sources. `cargo deny` reads `deny.toml` [advisories.ignore].
`cargo audit` 0.22 does NOT read config files — only CLI --ignore flags.

forjar's audit.yml ran `cargo audit` bare. After RUSTSEC-2026-0097
(rustls-webpki) and RUSTSEC-2026-0104 (rustls-webpki CRL panic) were
published against rustls-webpki 0.103.12 (both already exempted in
deny.toml), `cargo audit` correctly exited non-zero because the
exemptions never reached it. CI was green on deny and red on audit,
despite the same advisory IDs being on the ignore list.

Fix mirrors the aprender sovereign-ci.yml pattern:
- New `.cargo/audit.toml` with the cargo-audit-native schema
  `[advisories] ignore = [...]`. Single source of truth for cargo-audit,
  kept in sync with deny.toml by convention (documented in file header).
- audit.yml parses `.cargo/audit.toml` for RUSTSEC IDs at run time and
  builds `--ignore <id>` CLI flags, matching how paiml/.github#32 solved
  the same class upstream.
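The commit describes audit.yml reading RUSTSEC IDs from `.cargo/audit.toml` at run time and turning each into an `--ignore <id>` flag. The workflow's actual parsing step is not shown here, so the Rust below is only an illustrative sketch of that extraction, using simple string scanning rather than a TOML parser; `is_rustsec_id`, `rustsec_ids`, and `ignore_args` are hypothetical names.

```rust
// Hypothetical helper: true for IDs shaped like RUSTSEC-YYYY-NNNN.
fn is_rustsec_id(s: &str) -> bool {
    let b = s.as_bytes();
    b.len() == 17
        && s.starts_with("RUSTSEC-")
        && b[8..12].iter().all(u8::is_ascii_digit)
        && b[12] == b'-'
        && b[13..17].iter().all(u8::is_ascii_digit)
}

// Collect unique RUSTSEC IDs in file order, ignoring `#` comments so IDs
// mentioned in the documentation header are not treated as exemptions.
fn rustsec_ids(toml_text: &str) -> Vec<String> {
    const PREFIX: &str = "RUSTSEC-";
    let mut ids: Vec<String> = Vec::new();
    for raw in toml_text.lines() {
        // Naive comment strip: fine for an ID list, not a general TOML parser.
        let mut rest = raw.split('#').next().unwrap_or("");
        while let Some(pos) = rest.find(PREFIX) {
            if let Some(candidate) = rest.get(pos..pos + 17) {
                if is_rustsec_id(candidate) && !ids.iter().any(|id| id.as_str() == candidate) {
                    ids.push(candidate.to_string());
                }
            }
            rest = &rest[pos + PREFIX.len()..];
        }
    }
    ids
}

// Turn each ID into a `--ignore <id>` pair for `cargo audit`.
fn ignore_args(ids: &[String]) -> Vec<String> {
    ids.iter()
        .flat_map(|id| ["--ignore".to_string(), id.clone()])
        .collect()
}
```

Stripping comments before matching keeps a header note that mentions an advisory from silently widening the ignore list.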

Covers both -0097 and -0104 (same rustls-webpki transitive class, no
safe upgrade before upstream 0.104).
@noahgift noahgift merged commit 655973f into main Apr 24, 2026
18 checks passed
@noahgift noahgift deleted the fix/integrity-atomic-write-no-silent-errors branch April 24, 2026 14:34

Development

Linked issue: BUG: state.lock.yaml + .b3 sidecar write is non-atomic; errors silently ignored