Scorecard/code-scanning findings recur estate-wide: Hypatia never closes the alert-lifecycle loop

## Summary

OSSF Scorecard / code-scanning findings **recur on every audit, estate-wide, and have for months**. Concretely, `modshells` has 3 open alerts (Pinned-Dependencies #63, SAST #72, Maintained #44) that Hypatia has never closed. Root-causing them exposes a structural defect: **Hypatia ingests findings into its own pipeline but never closes the loop on the authoritative GitHub alert lifecycle.** It re-derives findings from local heuristics, fixes a subset, and never (a) reads the real open alerts, (b) records accepted exceptions, (c) verifies tool *efficacy* vs mere presence, or (d) dismisses non-actionable findings with a reason. So the same three classes leak forever.

This issue tracks the **Targeted fixes + Scorecard↔GitHub reconciliation loop**, plus a no-holds-barred design refinement so Hypatia's logic actually prevails.

`Refs` — joint-close only on explicit agreement (do not auto-close).

---

## What Hypatia is supposed to do — and keeps failing to

Intended loop: **scan → classify (safety triangle: eliminate/control/accept) → auto-fix where safe → learn → keep the estate continuously green.**

Observed loop: scan (local heuristics) → classify → fix-a-subset. **Missing: authoritative sense, verify, and learn-write-back.** That open loop is the disease. Three leak classes:

- **(A) Non-actionable** (Maintained, Contributors): no code fix exists. Must be *dismissed with reason*, not re-discovered every run. → alert #44.
- **(B) Design-correct / accepted exception** (SLSA generator *must* stay on a semver tag — SHA-ref produces invalid provenance): Hypatia's only symbolic remediation (SHA-pin) is **actively harmful**, with no exception channel. → alert #63.
- **(C) Nominal-not-effective** (CodeQL present but pointed at a language the repo doesn't contain → 0 results): presence check passes, tool produces nothing. → alert #72.

## Root-cause defects in the symbolic engine

1. **Dark rule.** `WorkflowAudit.check_codeql_language_matrix_mismatch` already encodes the correct fix (`:switch_codeql_matrix_to_actions`), but it is gated on `Keyword.get(opts, :has_codeql_supported_language, true)`. The option **defaults to `true`, so the rule is a silent no-op** unless a caller computes language detection, and no caller does. (#72)
2. **Open-loop ingestion.** `ScorecardIngestor` maps checks into Hypatia's pipeline but there is **no write path back to GitHub's code-scanning alert state** (no dismiss/accept). Non-fixable + exception findings re-accumulate as "open" every audit. (meta / #44 / #63)
3. **Presence ≠ efficacy.** `ScorecardIngestor.check_sast` greps for the string `"codeql"`. modshells *has* `codeql.yml`, so Hypatia believes SAST is satisfied while Scorecard reports `0/7 commits checked`. (#72)
4. **Harmful canonical remediation, no exception.** `workflow_audit.ex:49` and `security_errors.ex:291` map `slsa-framework/slsa-github-generator@v2.1.0 → <sha>`. Applying it breaks SLSA provenance. No exemption registry exists. (#63)
5. **No suppression memory.** Nothing persists "this finding on this repo is accepted because X," keyed by a stable fingerprint, so anything dismissed re-fires next scan. (the recurrence engine itself)
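
Defect 1 can be reproduced in a few lines. The sketch below is a hypothetical stand-in, not Hypatia's code: the module and function names are invented, and only the option key and the remediation atom come from the text above. The first function mirrors the quoted gate. The second shows one possible alternative in which the caller must supply the detected languages, assuming they are already normalised to CodeQL-style identifiers.

```elixir
defmodule DarkRuleSketch do
  # Hypothetical stand-in for the gated rule; names are illustrative only.

  # Mirrors the gate quoted in defect 1: when the caller passes no option,
  # the flag is treated as `true` and the rule returns no findings.
  def gated_default_true(matrix_languages, opts \\ []) do
    if Keyword.get(opts, :has_codeql_supported_language, true) do
      []
    else
      [{:switch_codeql_matrix_to_actions, matrix_languages}]
    end
  end

  # Possible alternative: derive the flag from the languages actually
  # detected in the repo. ASSUMPTION: `repo_languages` is already expressed
  # in the same identifiers the CodeQL matrix uses.
  def gated_computed(matrix_languages, repo_languages) do
    scannable? =
      Enum.any?(matrix_languages, &(&1 == "actions" or &1 in repo_languages))

    if scannable?, do: [], else: [{:switch_codeql_matrix_to_actions, matrix_languages}]
  end
end
```

With a matrix of `["javascript-typescript"]`, `gated_default_true/2` called without options returns `[]`, while `gated_computed/2` called with `["ada"]` as the repo languages returns the `:switch_codeql_matrix_to_actions` finding.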

---

## Part 1 — Targeted fixes (sub-issues)

- **[ ] CodeQL efficacy + un-dark the matrix rule.** Compute `has_codeql_supported_language` from linguist language detection instead of defaulting it to `true`, so that the absence of a scannable language is what triggers the rule. Add a paired **effective-SAST** rule that raises a finding when the language matrix in `codeql.yml` contains neither a language present in the repo **nor** `actions`. Recommend `language: actions` (always scannable; runs every commit).
- **[ ] SLSA / self-verifying-ref exemption registry.** Remove the harmful SHA mapping for `slsa-framework/slsa-github-generator`. Introduce a first-class `pin_exempt` / `must_track_tag` registry (data, not code constants) for reusable workflows that self-verify `github.ref`. Emit an **accept** finding with rationale, never a *fix*.
- **[ ] `Hypatia.ScorecardReconciler` (the closer).** Pull *live* code-scanning alerts via the GitHub API. For each: actionable → dispatch fix; accepted-exception → dismiss `won't fix` + rationale comment; non-actionable/informational → dismiss `won't fix` + rationale. Idempotent. Every dismissal recorded by stable `(repo, rule, location_fingerprint)` in a persisted exceptions store so re-scans do not re-open.
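
The reconciler's decision step could look roughly like the sketch below. It is a pure planning function under stated assumptions: the module name, the alert map shape (`:repo`, `:number`, `:rule`, `:location_fingerprint`, `:class`, `:rationale`), and the in-memory map standing in for the persisted exceptions store are all hypothetical. No HTTP call is made; the dismissal payload only uses the `state`, `dismissed_reason`, and `dismissed_comment` fields that GitHub's code-scanning alert update endpoint accepts.

```elixir
defmodule ReconcilerSketch do
  # Hypothetical sketch; names and data shapes are illustrative, not Hypatia's API.
  # `store` stands in for the persisted exceptions store, keyed by
  # {repo, rule, location_fingerprint}.

  def plan(alerts, store) do
    {actions, store} =
      Enum.reduce(alerts, {[], store}, fn alert, {actions, store} ->
        key = {alert.repo, alert.rule, alert.location_fingerprint}

        cond do
          # Already decided in an earlier run: emit nothing (idempotent).
          Map.has_key?(store, key) ->
            {actions, store}

          # Actionable findings go to the fixer and are not stored.
          alert.class == :actionable ->
            {[{:dispatch_fix, alert.number} | actions], store}

          # Accepted exceptions and non-actionable findings are dismissed
          # with a rationale, and the decision is remembered.
          true ->
            payload = %{
              state: "dismissed",
              dismissed_reason: "won't fix",
              dismissed_comment: alert.rationale
            }

            {[{:dismiss, alert.number, payload} | actions],
             Map.put(store, key, alert.rationale)}
        end
      end)

    {Enum.reverse(actions), store}
  end
end
```

Running `plan/2` a second time with the returned store yields no new dismissals, which is the idempotence property the bullet asks for. Actionable alerts are dispatched again on each run until the fix lands and the alert closes upstream.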

## Part 2 — Neurosymbolic anti-recurrence (no holds barred)

**Symbolic**
- **Triage taxonomy generalization.** Replace per-check booleans with a 4-axis classifier: `{actionable_by_code?, remediation_is_safe?, effective_vs_nominal?, activity_only?}`. Every Scorecard check *and* code-scanning rule maps to one cell; the cell deterministically selects the lifecycle action (fix / dismiss-accept / dismiss-info / open-and-alert). General (new rules slot in) **and** narrow (one unambiguous action per cell).
- **Detector splitting (sensitivity ⟂ specificity).** For every "tool present" rule, add a paired "tool produced results in last N commits" rule (query analyses / check-runs API). Presence = specificity; efficacy = sensitivity. This is the generalisation of fix #72.
- **Exception/known-good data, not code constants.** Move `@known_good_shas` + exemptions into versioned data the learning loop can write.
- **Fingerprint-stable suppressions.** Key everything on `repo+rule+normalized-path+symbol`, never line number (lines drift; fingerprints don't). This single change is the recurrence killer.
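
As an illustration of the triage taxonomy, the sketch below maps each of the 16 cells to exactly one lifecycle action using ordered function clauses. The specific cell-to-action mapping is an assumption made for this example (the issue does not fix one), the module name is hypothetical, and `:effective` abbreviates `effective_vs_nominal?`. In this particular mapping the `:effective` axis does not change the lifecycle action; it would only influence which fix is chosen.

```elixir
defmodule TriageSketch do
  # Hypothetical mapping from taxonomy cell to lifecycle action.
  # Clause order makes the choice deterministic: the first match wins.

  # Activity-only signals (class A, e.g. Maintained) have no code fix.
  def action(%{activity_only: true}), do: :dismiss_info

  # Code-actionable with a safe remediation (class C, e.g. the CodeQL matrix).
  def action(%{actionable_by_code: true, remediation_is_safe: true}), do: :fix

  # Code-actionable but the remediation is harmful (class B, e.g. SLSA SHA-pin).
  def action(%{actionable_by_code: true, remediation_is_safe: false}), do: :dismiss_accept

  # ASSUMPTION: anything else is left open and escalated to a human.
  def action(%{actionable_by_code: false}), do: :open_and_alert

  # Enumerates all 2^4 cells so the mapping can be checked for totality.
  def all_cells do
    for a <- [true, false], s <- [true, false], e <- [true, false], o <- [true, false] do
      %{actionable_by_code: a, remediation_is_safe: s, effective: e, activity_only: o}
    end
  end
end
```

Calling `Enum.map(TriageSketch.all_cells(), &TriageSketch.action/1)` exercises every cell; a missing clause would surface as a `FunctionClauseError` rather than as a silently skipped finding.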

**Neural**
- **LLM exception adjudicator.** When a finding's symbolic cell is ambiguous (e.g., a new third-party reusable workflow that *might* self-verify its ref), an LLM judges "is SHA-pinning safe or harmful?" using the alert help text + upstream README, proposing a lifecycle action + rationale. **Neural proposes; the decision is crystallised into the symbolic exception registry and is symbolic thereafter.**
- **Remediation-safety predictor.** Small classifier over (action, fix-type, downstream-effect) trained on past outcomes (did the fix break CI? did the alert reopen?) to gate auto-fix vs accept.
- **Recurrence forecaster (the months-long-pain killer).** Cluster findings by (rule, repo-class, root-cause-signature). When a cluster's reopen-rate exceeds threshold, **escalate from per-repo fix to a template fix** (PR into `rsr-template-repo` / `v3-templater`). The meta-signal is reopen-rate; the meta-action is template propagation. Attack the source, not the leaves.
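
The forecaster's meta-signal can be sketched without any learning machinery: group outcomes by cluster key, compute the reopen-rate, and return the clusters above a threshold as candidates for a template fix. The module name, the outcome map shape (`:rule`, `:repo_class`, `:root_cause`, `:reopened`), and the threshold value are assumptions for illustration.

```elixir
defmodule ForecasterSketch do
  # Hypothetical sketch: plain grouping and counting, no trained model.
  # Each outcome is a map with :rule, :repo_class, :root_cause, :reopened.

  # Returns [{cluster_key, reopen_rate}] for clusters whose reopen-rate
  # exceeds `threshold`, highest rate first.
  def escalations(outcomes, threshold) do
    outcomes
    |> Enum.group_by(&{&1.rule, &1.repo_class, &1.root_cause})
    |> Enum.map(fn {cluster, rows} ->
      reopened = Enum.count(rows, & &1.reopened)
      {cluster, reopened / length(rows)}
    end)
    |> Enum.filter(fn {_cluster, rate} -> rate > threshold end)
    |> Enum.sort_by(fn {_cluster, rate} -> rate end, :desc)
  end
end
```

Each returned cluster is a candidate for a PR against the template repositories instead of another per-repo fix.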

## Part 3 — Meta-layers, connections, data model (logic must prevail)

- **Closed-loop control architecture.** `Sense(authoritative = live GitHub alerts API) → Classify(taxonomy + neural adjudicator) → Act(fix | dismiss-with-reason | escalate-to-template) → Verify(re-query alert state next run) → Learn(write exception/outcome; update reopen-rate)`. Today only `Sense(local) → Classify → Act(subset)` exists. Add authoritative-Sense, Verify, Learn-write-back.
- **Normalized data model.**
  - `finding(id, repo, source, rule, location_fingerprint, first_seen, last_seen, state)`
  - `lifecycle_decision(finding_fp, action, rationale, decided_by[symbolic|neural|human], decided_at, confidence)`
  - `exception_registry(scope[repo|estate], rule, predicate, rationale, expiry?)` — versioned, machine+human writable
  - `outcome(finding_fp, fix_pr, ci_result, reopened?, reopened_after_days)` — feeds the forecaster
- **Connections.** gitbot-fleet must carry *lifecycle decisions + exceptions*, not just raw findings, so one adjudicated SLSA exception auto-applies to all ~30 estate repos sharing that workflow fingerprint. Hypatia ↔ rsr-template-repo/v3-templater: forecaster opens template PRs. Event-driven (`repository_dispatch` on a new Scorecard run), not only weekly cron → responsiveness.
- **Logic-prevails invariants.** Every neural decision must (a) reduce to a symbolic registry entry before enforcement, (b) be reversible, (c) carry a rationale string, (d) be confidence-gated with human escalation below threshold. **Neural may propose and prioritise; only symbolic rules + registry may enforce.**
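
Two rows of the normalized data model, and the line-independent fingerprint that keys them, might be expressed as follows. This is a sketch under assumptions: the module names, the path normalisation rules, and the choice of SHA-256 are illustrative and not taken from Hypatia. Field names follow the `finding` and `lifecycle_decision` rows above.

```elixir
defmodule DataModelSketch.Finding do
  # Hypothetical struct mirroring the `finding` row above.
  @enforce_keys [:repo, :source, :rule, :location_fingerprint]
  defstruct [:id, :repo, :source, :rule, :location_fingerprint,
             :first_seen, :last_seen, state: :open]

  # Fingerprint over repo + rule + normalised path + symbol.
  # Line numbers are deliberately excluded so edits that shift lines
  # do not produce a "new" finding.
  # ASSUMPTION: normalisation = forward slashes, no leading "./".
  def fingerprint(repo, rule, path, symbol) do
    normalised = path |> String.replace("\\", "/") |> String.trim_leading("./")
    data = Enum.join([repo, rule, normalised, symbol], "\n")
    :crypto.hash(:sha256, data) |> Base.encode16(case: :lower)
  end

  def new(repo, source, rule, path, symbol) do
    %__MODULE__{
      repo: repo,
      source: source,
      rule: rule,
      location_fingerprint: fingerprint(repo, rule, path, symbol)
    }
  end
end

defmodule DataModelSketch.LifecycleDecision do
  # Hypothetical struct mirroring the `lifecycle_decision` row above.
  # `decided_by` is one of :symbolic, :neural, :human.
  @enforce_keys [:finding_fp, :action, :rationale, :decided_by]
  defstruct [:finding_fp, :action, :rationale, :decided_by, :decided_at, :confidence]
end
```

Because the path is normalised before hashing, the same workflow file referenced as `./path/to/file.yml` or `path/to/file.yml` yields the same `location_fingerprint`, so a stored decision still matches on the next scan.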

---

## Reactive companion (already in flight)

`modshells`: `codeql.yml` matrix `javascript-typescript → actions` (PR, `Refs` this issue); alerts #63 and #44 dismissed with written rationale (SLSA-tag-by-design; informational-activity-signal respectively).

## Part 4 — Autonomy & credit-economics (the real KPI)

The binding constraint is that **the maintainer spends Claude credit every week manually getting Hypatia to (a) do its job and (b) learn anything**. Success is therefore *economic*, not just green checks:

- **North-star KPI:** human/LLM interventions per estate-green-week → ~0; finding **reopen-rate** trending monotonically to 0; credit spend on estate maintenance flat or falling week-over-week. These are dashboardable from the `outcome` table.
- **Learning must be cheap and symbolic-by-default.** Every adjudication (neural or human) crystallises into the `exception_registry` so it is *never re-reasoned*. The expensive LLM path runs only on genuinely novel, ambiguous findings — bounded, logged, and amortised to a one-time cost per pattern. A finding class adjudicated once must never burn credit again.
- **Self-healing without a human in the weekly loop.** The closed loop (Sense→Classify→Act→Verify→Learn) runs on `repository_dispatch`/cron entirely in Actions; humans are paged only on (i) below-confidence novel findings or (ii) a fix that broke CI. "World ends, only Actions survives" ⇒ the loop must need no local machine, no interactive Claude, no `.git-private-farm` online — it degrades to symbolic-registry-only and still keeps every repo green.
- **`.git-private-farm` role:** durable, offline-survivable mirror of the `exception_registry` + `outcome` history so learning is not lost if GitHub state is reset — the "rebuild from DNA" substrate.

## Part 5 — CI/CD adequacy & anti-redundancy audit

Goal: 100% CI success every run, covering corrective + adaptive + perfective maintenance, with no pointless/redundant workflows (genuine triangulation is *kept*).

- **Coverage check:** map the standard workflow set against {build, test, lint/format, SAST, secret-scan, dependency-update, supply-chain/provenance, license/SPDX, compliance/standards-drift, dead-config detection}. Flag any axis with zero effective (not merely nominal) coverage — the #72 lesson generalised: a workflow that *runs but verifies nothing* is a coverage gap, not coverage.
- **Redundancy/irrelevance check:** identify (i) workflows that are dead config for the repo's actual languages, such as a JS/TS CodeQL analysis on an Ada repo; (ii) duplicate scanners with no triangulation value; and (iii) workflows whose failure can never gate anything. npm/bun blockers in a repo with no JS are a separate case: as *guardrails* they are kept, whereas dead *analyzers* are cut. Distinguish **triangulation (keep)** from **redundancy (cut)**: triangulation = independent methods confirming the same property; redundancy = same method, no added assurance.
- **Deliverable:** a per-repo CI adequacy scorecard Hypatia computes itself (new rule class), feeding the same reconciliation loop so "inadequate or wasteful CI" becomes an auto-tracked, auto-PR'd finding like any other.
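
A first cut of the adequacy scorecard could be two small functions: one that lists axes with no *effective* coverage, and one that lists groups of workflows applying the same method to the same axis. The module name, the axis atoms, and the workflow map shape (`:name`, `:axis`, `:method`, `:effective`) are assumptions for illustration; how `:effective` is determined (for example, from recent analysis results) is left out of the sketch.

```elixir
defmodule CoverageSketch do
  # Hypothetical sketch of the CI adequacy checks described above.
  # Axes follow the coverage list in this section.
  @axes ~w(build test lint_format sast secret_scan dependency_update
           provenance license compliance dead_config)a

  # Axes with zero effective coverage. A workflow that runs but verifies
  # nothing (effective: false) does not count as coverage.
  def gaps(workflows) do
    covered =
      workflows
      |> Enum.filter(& &1.effective)
      |> Enum.map(& &1.axis)
      |> MapSet.new()

    Enum.reject(@axes, &MapSet.member?(covered, &1))
  end

  # Redundancy = same axis AND same method more than once.
  # Different methods on the same axis are triangulation and are not listed.
  def redundant(workflows) do
    workflows
    |> Enum.group_by(&{&1.axis, &1.method})
    |> Enum.filter(fn {_key, ws} -> length(ws) > 1 end)
    |> Enum.map(fn {key, ws} -> {key, Enum.map(ws, & &1.name)} end)
  end
end
```

A repo whose only SAST workflow is marked `effective: false` shows `:sast` in `gaps/1`, which is the #72 situation expressed as data.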

