Skip to content

fix(sovereign-ci): self-heal broken sibling clone cache#24

Merged
noahgift merged 1 commit into
mainfrom
fix/sibling-clone-robustness
Apr 18, 2026
Merged

fix(sovereign-ci): self-heal broken sibling clone cache#24
noahgift merged 1 commit into
mainfrom
fix/sibling-clone-robustness

Conversation

@noahgift
Copy link
Copy Markdown
Contributor

Summary

  • Validate cached sibling clones via `git rev-parse HEAD`. Nuke + re-clone if HEAD missing or fetch fails.
  • Removes `|| true` that previously swallowed broken-cache errors (bashrs#197 coverage blocker).
  • Drop `--quiet` from clone so failures are visible.

Five whys

  1. Why did bashrs#197 coverage fail? `cargo metadata` couldn't read `/__w/bashrs/provable-contracts/crates/provable-contracts/Cargo.toml`.
  2. Why missing? Sibling dir exists but is empty (broken cache from a 10-day-stale runner workspace).
  3. Why empty? `git fetch` into broken cache silently failed.
  4. Why silent? The `|| true` in the old code swallowed all fetch errors.
  5. Why `|| true`? Defensive against offline/transient failures — but it hid real corruption.

Affected jobs

All 5 of: test, lint, coverage, security, provenance (each had a duplicate copy of the sibling-clone block).

Test plan

Refs paiml/bashrs#197

🤖 Generated with Claude Code

Root cause (bashrs#197 coverage failure):
The sibling checkout step had `|| true` on fetch, silently masking
a broken cached clone. Script flow:
  if [ -d "$repo" ]; then
    git fetch ... && reset --hard ... || true   # swallows errors
  else
    for attempt ...; do git clone ...; done
  fi

When cache dir exists but is empty/broken (runner workspace leftover),
the fetch fails but `|| true` eats the error. Next step (cargo metadata)
then fails with 'No such file or directory' pointing at the missing
Cargo.toml — 20s into coverage, not at the sibling-clone step.

Five-whys:
1. Why did cargo metadata fail? Missing provable-contracts Cargo.toml
2. Why missing? Sibling dir exists but is empty
3. Why empty? Fetch into existing broken cache failed
4. Why silent? `|| true` swallowed the error
5. Why defensive? To tolerate offline/transient failures — but it
   hid real ones too

Fix: validate cache with `git rev-parse HEAD`. If HEAD missing OR
fetch fails, rm -rf + re-clone. Drop --quiet from clone so output
is visible on failure.

Applied to all 5 jobs (test, lint, coverage, security, provenance).
No caller changes needed — `uses: @main` picks up fix automatically.

Refs paiml/bashrs#197 (coverage blocker after 10d cache staleness)
@noahgift noahgift merged commit b073d8a into main Apr 18, 2026
2 checks passed
@noahgift noahgift deleted the fix/sibling-clone-robustness branch April 18, 2026 17:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant