feat(cicd_rules): :known_fake_action_sha rule — block partial-prefix-corruption fakes#397

Merged: hyperpolymath merged 1 commit into main from claude/cicd-rule-known-fake-action-shas on May 30, 2026

@hyperpolymath
Owner

Summary

Adds a new Hypatia.Rules.CicdRules entry that flags GitHub Action pins where the SHA is fabricated. Complement to the existing :unpinned_action rule.

Problem

Estate audit on 2026-05-30 found 67 fake action SHA pairs across ~50 repos (11% fabrication rate across 372 unique pins). Verified via gh api repos/<org>/<action>/commits/<sha> returning 422.

Universal pattern: partial-prefix corruption. The first 8-20 hex chars match a real release's SHA; the suffix is fabricated. Examples:

| Fake | Real | Real version |
|---|---|---|
| 7ab2955eb728f5440978d7b4f723a50dea1f3608 | 7ab2955eb728f5440978d5824358023be3a2802d | setup-zig v2.2.0 |
| 49933ea5288caeca8642195f2b846b8bbe245a93 | 49933ea5288caeca8642d1e84afbd3f7d6820020 | setup-node v4.4.0 |
| 909cc5acb0135c37a79510dd77767e217930de55 | 909cc5acb0fdd60627fb858598759246509fa755 | setup-deno v2.0.2 |

Almost certainly a single AI-hallucination event that propagated across the estate via copy-customise of templates. The fakes slip past visual review because of the matching prefix.
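
To make the matching-prefix point concrete, here is a small Python sketch (not part of the rule itself) that measures how many leading hex characters each fake shares with its real counterpart. The pairs are copied from the examples above; the helper name `shared_prefix_len` is hypothetical.

```python
from os.path import commonprefix

# (fake, real) pairs copied from the examples above.
PAIRS = [
    ("7ab2955eb728f5440978d7b4f723a50dea1f3608",
     "7ab2955eb728f5440978d5824358023be3a2802d"),
    ("49933ea5288caeca8642195f2b846b8bbe245a93",
     "49933ea5288caeca8642d1e84afbd3f7d6820020"),
    ("909cc5acb0135c37a79510dd77767e217930de55",
     "909cc5acb0fdd60627fb858598759246509fa755"),
]


def shared_prefix_len(fake: str, real: str) -> int:
    """Number of leading hex characters the two SHAs have in common."""
    return len(commonprefix([fake, real]))


if __name__ == "__main__":
    for fake, real in PAIRS:
        n = shared_prefix_len(fake, real)
        # Show where the fake stops agreeing with the real SHA.
        print(f"{fake[:n]}|{fake[n:]}  ({n} of 40 chars match)")
```

A reviewer who only compares the short prefix shown in most UIs would see no difference, which is consistent with the fakes surviving visual review.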

What this rule does

Adds :known_fake_action_sha to @blocked_patterns enumerating the 25 known fakes from the audit. Caught at scan time; blocks new code from re-introducing them. Reason field links to the substitution map in project_estate_fake_action_sha_punch_list_2026_05_30 memory entry.

applies_to: ["*.yml", "*.yaml"] scopes the rule to workflow files.
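
A minimal sketch of the static-list idea, written in Python rather than Elixir and not taken from the actual `@blocked_patterns` entry. It assumes the rule boils down to a regex alternation over the known fakes plus a filename glob filter; the dictionary shape and the `flags` helper are hypothetical, and only the three fakes shown above are listed (the real rule enumerates 25).

```python
import re
from fnmatch import fnmatch

# Three of the known fakes, copied from the examples above.
KNOWN_FAKE_SHAS = [
    "7ab2955eb728f5440978d7b4f723a50dea1f3608",
    "49933ea5288caeca8642195f2b846b8bbe245a93",
    "909cc5acb0135c37a79510dd77767e217930de55",
]

# Hypothetical rule shape; field names loosely mirror the PR text.
RULE = {
    "id": "known_fake_action_sha",
    "pattern": re.compile("@(" + "|".join(KNOWN_FAKE_SHAS) + r")\b"),
    "applies_to": ["*.yml", "*.yaml"],
}


def flags(filename: str, line: str) -> bool:
    """True when the line pins an action to a SHA on the known-fake list."""
    if not any(fnmatch(filename, glob) for glob in RULE["applies_to"]):
        return False  # rule is scoped to workflow-style files only
    return RULE["pattern"].search(line) is not None
```

Because the match is on the full 40-character SHA, a real pin that shares a prefix with a fake (for example the real setup-node v4.4.0 SHA) is not flagged.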

Tested inline (transcript)

✓ expect=true  got=true   :: known fake setup-beam
✓ expect=true  got=true   :: known fake upload-artifact (partial-prefix)
✓ expect=false got=false  :: REAL upload-artifact v4.6.2 (no false-positive)
✓ expect=false got=false  :: REAL setup-node v4.4.0 (no false-positive)
✓ expect=true  got=true   :: fake setup-node (partial-prefix corruption)
✓ expect=true  got=true   :: fake codeql
✓ expect=false got=false  :: REAL checkout v4 (no false-positive)

Out of scope

For proactive detection of FUTURE fakes (beyond the static 25-entry list), a mix hypatia.verify_action_shas task is the right shape — needs network access to gh api, can't be a static-regex rule. Documented as a follow-up in the punch-list memory.

Provenance

Discovered while wiring hyperpolymath/snifs#30 build-mode CI gate. The static-rule design intent was previously captured in feedback_verify_action_sha_pins memory:

Consider adding a complementary rule that VERIFIES SHA pins resolve upstream — would have caught all four fakes above.

This PR ships the static-list version of that idea.

Test plan

  • Rule compiles + registers (verified locally)
  • No false-positives on real action SHAs in current estate workflows
  • When scanning the 50 repos still carrying these fakes, the rule flags them
  • After round-2 sweep (admin-merge style, in flight as PID 607915) completes, the rule's positives should drop to zero estate-wide

…fix-corruption fakes

Complement to the existing `:unpinned_action` rule. Where `:unpinned_action`
catches non-SHA pins (`@v1`, `@main`, etc.), this catches the opposite
failure mode: SHA-shaped pins that don't actually exist on the upstream
action repo (`gh api commits/<sha> -> 422`).

## How these are getting in

Estate audit on 2026-05-30 found 67 fake action SHA pairs across ~50 repos
(11% fabrication rate across 372 unique pins). Universal pattern:
**partial-prefix corruption** — the first 8-20 hex chars match a real
release's SHA exactly, then the suffix is fabricated. Examples:

  fake: 7ab2955eb728f5440978d7b4f723a50dea1f3608
  real: 7ab2955eb728f5440978d5824358023be3a2802d (setup-zig v2.2.0)

  fake: 49933ea5288caeca8642195f2b846b8bbe245a93
  real: 49933ea5288caeca8642d1e84afbd3f7d6820020 (setup-node v4.4.0)

  fake: 909cc5acb0135c37a79510dd77767e217930de55
  real: 909cc5acb0fdd60627fb858598759246509fa755 (setup-deno v2.0.2)

The corruption pattern is consistent enough that it's almost certainly a
single AI-hallucination event that got copy-pasted across the estate.

## What this rule does

Static-list pattern enumerating the 25 known fake SHAs from the
2026-05-30 audit. Caught at scan time, blocked from new code, and the
reason field directs to the substitution map in
`project_estate_fake_action_sha_punch_list_2026_05_30.md` (memory).

## Tested

Verified via inline Elixir tests (in transcript):
- 5/5 known fakes correctly matched
- 3/3 real SHAs correctly NOT matched (no false-positives on real
  v4.6.2 upload-artifact, real v4.4.0 setup-node, real checkout v4)

## Out of scope

For proactive detection of FUTURE fakes (not in the 25-entry static
list), a `mix hypatia.verify_action_shas` task is the right shape —
requires network access to `gh api` and can't be a static-regex rule.
Documented as follow-up in the punch-list memory.

Provenance: discovered while wiring hyperpolymath/snifs#30 build-mode CI
gate; the static-rule design intent was previously captured in
`feedback_verify_action_sha_pins` memory ("Consider adding a complementary
rule that VERIFIES SHA pins resolve upstream — would have caught all four
fakes above").
@hyperpolymath hyperpolymath enabled auto-merge (squash) May 30, 2026 17:00
@hyperpolymath hyperpolymath merged commit f5e0941 into main May 30, 2026
1 of 31 checks passed
@hyperpolymath hyperpolymath deleted the claude/cicd-rule-known-fake-action-shas branch May 30, 2026 17:09
hyperpolymath added a commit that referenced this pull request May 30, 2026
…via gh api (#399)

## Summary

Companion to the static-list `:known_fake_action_sha` rule (PR #397).
Where that rule blocks the 25 fakes known at audit time, this mix task
catches FUTURE fakes by actually calling `gh api
repos/<org>/<action>/commits/<sha>` against every unique action pin in
the estate.
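
As an illustration of what "every unique action pin" could mean in practice, the following Python sketch collects `(action ref, sha)` pairs from workflow text. The regex and the `unique_pins` helper are assumptions for illustration, not the mix task's actual extraction logic; non-SHA refs such as `@v4` are deliberately ignored here because `:unpinned_action` already covers them.

```python
import re

# Matches `uses: owner/repo[/subpath]@<40-hex sha>`.
USES_RE = re.compile(r"uses:\s*['\"]?([\w.-]+/[\w./-]+)@([0-9a-f]{40})\b")


def unique_pins(workflow_texts):
    """Return the sorted, de-duplicated (action_ref, sha) pairs found."""
    pins = set()
    for text in workflow_texts:
        pins.update(USES_RE.findall(text))
    return sorted(pins)


if __name__ == "__main__":
    sample = """
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@49933ea5288caeca8642d1e84afbd3f7d6820020 # v4.4.0
      - uses: actions/setup-node@49933ea5288caeca8642d1e84afbd3f7d6820020 # v4.4.0
    """
    for ref, sha in unique_pins([sample]):
        print(f"{ref}@{sha}")
```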

## Why this is needed

The static rule covers KNOWN fakes. But the next AI-hallucination event
will introduce DIFFERENT SHAs that don't match the static list. To catch
those, we have to actually verify each pin against upstream, which
requires network access and cannot be done by a pure-pattern scan.

Together, #397 + this PR close the "hypatia is a waste of time if these
issues persist" concern:

- **#397 (static)** — instant detection of the 25 known fakes
- **this task (dynamic)** — periodic audit catches new fabrications

## Design

Per hypatia's CLAUDE.md scanner-hygiene guidance, the task shells out via
`System.cmd("gh", ["api", ...])` instead of introducing an HTTP-client
dep. This reuses the existing `gh` auth, so there is no new auth surface,
and the 5000 req/hr rate limit is plenty for 372 pins.
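
The shell-out approach can be sketched as follows. This is a Python stand-in for the Elixir `System.cmd` call, not the task's code. It assumes `gh api` exits non-zero on an HTTP error and includes the status (for example `HTTP 422`) in its stderr text; the `verify_pin` and `commit_endpoint` names and the three-way result are hypothetical.

```python
import subprocess


def commit_endpoint(action_ref: str, sha: str) -> str:
    """Build the API path; `owner/repo/subpath` refs live in `owner/repo`."""
    owner, repo = action_ref.split("/")[:2]
    return f"repos/{owner}/{repo}/commits/{sha}"


def verify_pin(action_ref: str, sha: str, run=subprocess.run) -> str:
    """Return "real", "fake", or "error" for a single pin.

    `run` is injectable so the logic can be exercised without network access.
    """
    proc = run(
        ["gh", "api", commit_endpoint(action_ref, sha)],
        capture_output=True,
        text=True,
    )
    if proc.returncode == 0:
        return "real"
    # Assumption: gh reports the HTTP status in its error output.
    if "HTTP 422" in proc.stderr:
        return "fake"
    # Anything else (network outage, auth problem) is not proof of a fake.
    return "error"
```

Keeping "error" separate from "fake" avoids reporting a pin as fabricated just because the network or authentication failed. If `gh` is not installed, `subprocess.run` raises `FileNotFoundError`, which a caller would map to the hard-failure path.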

### Cache

`data/verified-action-shas.json` stores `{ref}@{sha}` → `"real"|"fake"`
so subsequent runs only verify NEW pins:
- Cold: ~5 minutes (372 pins)
- Warm: ~5 seconds (deltas only)

### `--paranoid` mode

Re-verifies cached real SHAs. Run periodically to catch the rare case
where an upstream change (such as a history rewrite) makes a
previously-verified SHA stop resolving.
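
A minimal sketch of the cache behaviour described above, in Python and under stated assumptions: the JSON file is a flat object keyed by `{ref}@{sha}`, the `verify` callable returns "real", "fake", or something else on failure, and the `paranoid` flag only re-checks entries cached as real. Function names are hypothetical.

```python
import json
from pathlib import Path


def load_cache(path):
    """Read the cache file, treating a missing file as an empty cache."""
    p = Path(path)
    return json.loads(p.read_text()) if p.exists() else {}


def verify_with_cache(pins, cache_path, verify, paranoid=False):
    """Verify only pins that are new, or cached-real pins in paranoid mode."""
    cache = load_cache(cache_path)
    for ref, sha in pins:
        key = f"{ref}@{sha}"
        cached = cache.get(key)
        if cached is None or (paranoid and cached == "real"):
            result = verify(ref, sha)
            # Only definite verdicts are persisted; transient errors are
            # left uncached so the next run retries them.
            if result in ("real", "fake"):
                cache[key] = result
    # The parent directory is assumed to exist already.
    Path(cache_path).write_text(json.dumps(cache, indent=2, sort_keys=True))
    return cache
```

On a warm run with no new pins, `verify` is never called, which is where the gap between the cold and warm timings comes from.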

### Throttling

A 50ms inter-call sleep caps the request rate at ~20 req/sec. With only
372 pins, a full cold run uses well under a tenth of the 5000 req/hr
budget.

## Usage

```bash
mix hypatia.verify_action_shas                # default text output
mix hypatia.verify_action_shas --paranoid     # re-verify cache
mix hypatia.verify_action_shas --format json  # tooling integration
```

## Exit codes

- 0 — verification complete, zero fakes
- 2 — fakes found (CI can use as hard gate)
- 1 — hard failure (bad args, gh missing, cache unwritable)
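
The exit-code contract above can be summarised in a few lines. This Python sketch is illustrative only; the `exit_code` helper and the convention of passing `None` for a hard failure are assumptions, not the task's internals.

```python
EXIT_OK = 0       # verification complete, zero fakes
EXIT_FAILURE = 1  # hard failure (bad args, gh missing, cache unwritable)
EXIT_FAKES = 2    # at least one fake found


def exit_code(results):
    """Map verification results to the documented exit codes.

    `results` maps "ref@sha" to "real" or "fake"; None signals that the
    run could not complete at all.
    """
    if results is None:
        return EXIT_FAILURE
    return EXIT_FAKES if "fake" in results.values() else EXIT_OK
```

Keeping "fakes found" (2) distinct from "could not run" (1) lets a CI gate fail hard on the former while treating the latter as an infrastructure problem.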

## Out of scope (follow-up)

- Wire into `LearningScheduler` GenServer for periodic auto-audits
- Plumb through `Hypatia.Safety.RateLimiter` for fleet-wide concurrent
runs

## Provenance

Discovered the need during the snifs#30 build-mode arc / 2026-05-30
estate audit. Design intent captured in
`feedback_verify_action_sha_pins` memory ("Consider adding a
complementary rule that VERIFIES SHA pins resolve upstream").

## Test plan

- [ ] `mix hypatia.verify_action_shas --format json` runs to completion
- [ ] Reports the known 25 fakes when run before round-2 sweep merges
- [ ] Reports zero fakes after round-2 sweep + #397 land
- [ ] Cache persists across runs (warm path << cold path)
hyperpolymath added a commit that referenced this pull request May 30, 2026
## Summary

Wires `mix hypatia.verify_action_shas` (#399) into `LearningScheduler`
so it runs as a daily auto-audit, not just on manual invocation.
Completes the auto-defense story alongside `:known_fake_action_sha`
static rule (#397).

## How fake SHAs get caught now

| Mechanism | Latency | Surface |
|---|---|---|
| `:known_fake_action_sha` rule (#397) | instant | static scan; catches the 25 known fakes immediately |
| `mix hypatia.verify_action_shas` (#399) | manual | one-off audit; catches ANY 422-returning SHA |
| **This PR: LearningScheduler integration** | daily | automatic estate-wide audit, no manual run needed |

## Design

### Cadence

24h via mtime check on the verifier's cache file
(`data/verified-action-shas.json`). Subsequent runs are ~5 sec (only
checks new pins) — cheap enough that daily costs nothing meaningful.
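
The mtime-based cadence check can be sketched like this. It is a Python stand-in that borrows the name of the `action_sha_verify_stale?` predicate mentioned in the test plan; the signature, the injectable `now`, and treating a missing cache file as stale are assumptions consistent with that test plan rather than the actual Elixir code.

```python
import os
import time

DAY_SECONDS = 24 * 60 * 60


def action_sha_verify_stale(cache_path, now=None, max_age=DAY_SECONDS):
    """True when the cache file is missing or was last written >= max_age ago."""
    now = time.time() if now is None else now
    try:
        mtime = os.path.getmtime(cache_path)
    except FileNotFoundError:
        return True  # no cache yet, so the first cycle should verify
    return now - mtime >= max_age
```

Using the cache file's own mtime means no separate "last run" state has to be stored; each completed verification refreshes the timestamp as a side effect of writing the cache.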

### Isolation

Spawned as `System.cmd` subprocess via `Task.start`, NOT inline in the
GenServer:

- The mix task's `exit({:shutdown, 2})` on fakes-found would crash
`LearningScheduler` if run in-process. Subprocess isolates that.
- `Task.start` (not `Task.async`) — fire-and-forget; the learning cycle
never waits on the gh-api walk.
- Exceptions in the verification path are logged but never bubble up.

### Logging

- Clean (zero fakes): info-level confirmation
- Fakes found (exit 2): warning with clipped sample of output
- Other exit codes: warning with stderr clip

## What's complete now

The auto-defense story is closed:

- Known fake re-introduced → static rule catches it at next scan
- New fake not on the static list → daily audit catches it within
24h
- Cache keeps the daily audit cheap (~5 sec warm)
- Zero manual intervention required for ongoing protection

## Provenance

Closes the "hypatia is a waste of time" concern raised during the
2026-05-30 snifs#30 / fake-SHA arc.

## Test plan

- [ ] Module compiles + LearningScheduler still starts cleanly
- [ ] First cycle after startup runs the verification (cache absent →
`action_sha_verify_stale?` returns true)
- [ ] Subsequent cycles within 24h skip the verification (cache fresh)
- [ ] After 24h, next cycle runs verification again
- [ ] Verification exit 2 (fakes found) logs warning but doesn't crash
scheduler
- [ ] Verification crash logs warning but doesn't crash scheduler
hyperpolymath added a commit that referenced this pull request May 30, 2026
## Summary

Hypatia carried 3 fake SHA pins in its own workflows. Self-clean so
hypatia passes its own `:known_fake_action_sha` rule (#397) when
scanning itself.

## Substitutions (version-faithful where possible)

| Action | Sites | Fake SHA | Real replacement | Version |
|---|---|---|---|---|
| `Swatinem/rust-cache` | 7 (ci.yml ×4, tests.yml ×3, release.yml ×1) | `ad397744...b8db` | `9d47c6ad4b02e050fd481d890b2ea34778fd09d6` | v2.7.8 (intent preserved) |
| `haskell-actions/hlint-run` | 1 (ci.yml) | `75c62c3b...6ce2` | `0b0024319753ba0c8b2fa21b7018ed252aed8181` | v2.4.9 (intent preserved) |
| `haskell-actions/hlint-setup` | 1 (ci.yml) | `17f0f409...ddfa` | `fe9cd1cd1af94a23900c06738e73f6ddb092966a` | **v2.4.10 (bumped; original `# v2.7.0` was doubly fictional)** |

### Note on hlint-setup

The original
`haskell-actions/hlint-setup@17f0f4093d35cfdbf02aab186d51d0bb8b92ddfa #
v2.7.0` was fake at TWO levels: the SHA doesn't exist, AND the version
`v2.7.0` was never released. `hlint-setup`'s tag history only goes up to
v2.4.10. Bumped to v2.4.10 (current latest) rather than try to preserve
a version that never existed.

## Verification

All three real SHAs return 200 from `gh api
repos/<org>/<action>/commits/<sha>`. Verified pre-commit.

## Why one PR for both action families

The round-2 estate sweep in flight (~46 repos, version-faithful
substitution map) handles `rust-cache@ad397744...` but NOT
`hlint-run`/`hlint-setup` — those aren't in the substitution map. Filing
both fixes in one PR for hypatia means:

1. Hypatia is fully self-clean immediately (passes its own rule)
2. When the in-flight sweep reaches hypatia, it'll see
`corrected_not_emitted` (no-op) — clean handoff
3. No leftover hlint fixes deferred to a separate PR

## Provenance

Estate audit 2026-05-30 found 67 fake action SHA pairs; the in-flight
round-2 sweep handles 26 substitutions across 46 repos; ~25 niche
single-repo fakes including hlint were documented as deferred. This PR
moves the 2 hlint fakes from "deferred" to "done" since they're in
hypatia's own repo and there's symbolic value to hypatia being
self-clean.

See `project_estate_fake_action_sha_punch_list_2026_05_30.md` for the
full substitution map context.

## Test plan

- [ ] `gh api` returns 200 for all 3 new SHAs (verified)
- [ ] No remaining fake-SHA occurrences in hypatia's workflows (verified
via grep)
- [ ] CI passes (rust-cache + hlint behaviour is identical, just real
SHAs)