governance-reusable: add codeload-retry resilience (44-PR estate flake 2026-05-26)

# Add codeload-retry resilience to governance-reusable and friends

## Context

During the otpiser#11 estate-wide blocker sweep on 2026-05-26, a single
transient codeload outage caused **44 PRs across the estate** to fail
their `governance/*` checks simultaneously. The failure mode is:

```
##[error]An action could not be found at the URI
  'https://codeload.github.com/trufflesecurity/trufflehog/tar.gz/6c05c4a00b91aa542267d8e32a8254774799d68d'
  (B440:3ED9CE:1CD787:252B63:6A1594AA)
##[error]Failed to download archive ... after 1 attempts.
```

The SHA was valid, the action.yml was present at the SHA, and direct
`curl https://codeload.github.com/.../tar.gz/<sha>` returned 200 ten
minutes after the failure. **The runner only retries once** before
failing the job, and a single codeload glitch propagates to ALL of the
matrix jobs (`Code quality + docs`, `Workflow security linter`,
`Well-Known (RFC 9116 + RSR)`, `Validate Hypatia baseline`,
`Security policy checks`, `Guix primary / Nix fallback policy`,
`Language / package anti-pattern policy`) for every consuming repo.

Today's mitigation was bulk `gh run rerun --failed` across 79 PRs; that
restored 61 of them. But this should not require manual intervention.

## Actions known to be affected today

| Action SHA | Repo | Used in |
|---|---|---|
| `6c05c4a00b91aa542267d8e32a8254774799d68d` | `trufflesecurity/trufflehog` | governance-reusable.yml:523 |
| `fc68ffb90438ef2936bbb3251622353b3dcb2f93` | `erlef/setup-beam` | hypatia-scan.yml (called transitively) |

## Recommendations

1. **Add explicit `continue-on-error: true` + retry shim** for non-blocking
   secret-scanner steps (trufflehog is informational; a one-off skip is
   safer than a one-off red CI).

2. **Cache the action tarballs** via `actions/cache` keyed on
   `${{ runner.os }}-action-${SHA}`. Once cached, subsequent runs use
   the cached copy and bypass codeload entirely.

3. **Prefer self-hosted alternatives** for the heaviest deps: trufflehog
   can be installed via apt/curl directly rather than as an action.

4. **Document the flake-clearing recipe** in standards/CONTRIBUTING.md
   so future maintainers know to run `gh run rerun --failed` on a
   governance cluster red, not to chase per-PR fixes.

## Today's impact

- 79 PRs failed across 79 repos
- 44-PR cluster all on the same codeload glitch
- 61 PRs auto-merged after rerun; 18 still open with non-codeload issues

## Next steps

If accepted: file follow-up PR for recommendation 1 (continue-on-error),
recommendation 2 (action caching), and recommendation 4 (CONTRIBUTING
note). Recommendation 3 (trufflehog→apt) is larger and can go in a
separate issue if pursued.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

governance-reusable: add codeload-retry resilience (44-PR estate flake 2026-05-26) #208

Add codeload-retry resilience to governance-reusable and friends

Context

Actions known to be affected today

Recommendations

Today's impact

Next steps

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Action SHA	Repo	Used in
`6c05c4a00b91aa542267d8e32a8254774799d68d`	`trufflesecurity/trufflehog`	governance-reusable.yml:523
`fc68ffb90438ef2936bbb3251622353b3dcb2f93`	`erlef/setup-beam`	hypatia-scan.yml (called transitively)

Uh oh!

governance-reusable: add codeload-retry resilience (44-PR estate flake 2026-05-26) #208

Description

Add codeload-retry resilience to governance-reusable and friends

Context

Actions known to be affected today

Recommendations

Today's impact

Next steps

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions