Scorecard/code-scanning findings recur estate-wide: Hypatia never closes the alert-lifecycle loop #260

@hyperpolymath

Summary

OSSF Scorecard / code-scanning findings recur on every audit, estate-wide, and have for months. Concretely, modshells has 3 open alerts (Pinned-Dependencies #63, SAST #72, Maintained #44) that Hypatia has never closed. Root-causing them exposes a structural defect: Hypatia ingests findings into its own pipeline but never closes the loop on the authoritative GitHub alert lifecycle. It re-derives findings from local heuristics, fixes a subset, and never (a) reads the real open alerts, (b) records accepted exceptions, (c) verifies tool efficacy vs mere presence, or (d) dismisses non-actionable findings with a reason. So the same three classes leak forever.

This issue tracks the Targeted fixes + Scorecard↔GitHub reconciliation loop, plus a no-holds-barred design refinement so Hypatia's logic actually prevails.

Refs — joint-close only on explicit agreement (do not auto-close).


What Hypatia is supposed to do — and keeps failing to

Intended loop: scan → classify (safety triangle: eliminate/control/accept) → auto-fix where safe → learn → keep the estate continuously green.

Observed loop: scan (local heuristics) → classify → fix-a-subset. Missing: authoritative sense, verify, and learn-write-back. That open loop is the disease, and it produces the three leak classes seen on modshells (Pinned-Dependencies #63, SAST #72, Maintained #44).

Root-cause defects in the symbolic engine

  1. Dark rule. WorkflowAudit.check_codeql_language_matrix_mismatch already encodes the correct fix (:switch_codeql_matrix_to_actions) but is gated on Keyword.get(opts, :has_codeql_supported_language, true). The gate defaults to true, so the rule is a silent no-op unless a caller computes language detection, and none ever does. (#72)
  2. Open-loop ingestion. ScorecardIngestor maps checks into Hypatia's pipeline but there is no write path back to GitHub's code-scanning alert state (no dismiss/accept). Non-fixable + exception findings re-accumulate as "open" every audit. (meta / #44 / #63)
  3. Presence ≠ efficacy. ScorecardIngestor.check_sast greps for the string "codeql". modshells has codeql.yml, so Hypatia believes SAST is satisfied while Scorecard reports 0/7 commits checked. (#72)
  4. Harmful canonical remediation, no exception. workflow_audit.ex:49 and security_errors.ex:291 map slsa-framework/slsa-github-generator@v2.1.0 → <sha>. Applying it breaks SLSA provenance. No exemption registry exists. (#63)
  5. No suppression memory. Nothing persists "this finding on this repo is accepted because X," keyed by a stable fingerprint, so anything dismissed re-fires next scan. (the recurrence engine itself)
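
Defects 1 and 3 can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not Hypatia's code: the function names, the option dictionary, and the idea that detected languages arrive already mapped to CodeQL identifiers are all assumptions.

```python
FIX = "switch_codeql_matrix_to_actions"


def gated_rule(opts):
    """Mirrors the defect: the flag defaults to True, so a caller that
    passes no flag gets no finding, whatever the repository contains."""
    if opts.get("has_codeql_supported_language", True):
        return None  # silent no-op
    return FIX


def effective_sast_rule(matrix, repo_languages):
    """Gate computed from the repository instead of a default.

    A matrix entry counts as effective if it names a language the repo
    contains, or "actions" (assumed scannable wherever workflows exist).
    """
    effective = set(matrix) & (set(repo_languages) | {"actions"})
    return None if effective else FIX


# Hypothetical repo whose only detected language is Ada.
matrix = ["javascript-typescript"]
print(gated_rule({}))                        # no flag supplied by the caller
print(effective_sast_rule(matrix, ["ada"]))  # gate derived from the repo
```

The second function never needs a caller-supplied flag, so the rule cannot go dark through a missing option.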

Part 1 — Targeted fixes (sub-issues)

  • [ ] CodeQL efficacy + un-dark the matrix rule. Default has_codeql_supported_language to computed-from-linguist, not true. Make absence of a scannable language the trigger. Add a paired effective-SAST rule: raise a finding when codeql.yml's language matrix contains neither a language the repo contains nor actions. Recommend language: actions (always scannable; runs every commit).
  • [ ] SLSA / self-verifying-ref exemption registry. Remove the harmful SHA mapping for slsa-framework/slsa-github-generator. Introduce a first-class pin_exempt / must_track_tag registry (data, not code constants) for reusable workflows that self-verify github.ref. Emit an accept finding with rationale, never a fix.
  • [ ] Hypatia.ScorecardReconciler (the closer). Pull live code-scanning alerts via the GitHub API. For each: actionable → dispatch fix; accepted-exception → dismiss won't fix + rationale comment; non-actionable/informational → dismiss won't fix + rationale. Idempotent. Every dismissal recorded by stable (repo, rule, location_fingerprint) in a persisted exceptions store so re-scans do not re-open.
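
The reconciler's decision step might look like the sketch below, in Python for illustration only. The Alert shape, the triage labels, the rationale strings, and the in-memory dictionary standing in for the persisted exceptions store are assumptions. Fetching alerts and performing the dismissal through the GitHub API are deliberately left out.

```python
from dataclasses import dataclass

# Hypothetical rationale text per non-actionable triage outcome.
RATIONALES = {
    "accepted_exception": "accepted exception; see exception registry",
    "informational": "informational signal; not actionable by code",
}


@dataclass(frozen=True)
class Alert:
    repo: str
    rule: str
    location_fingerprint: str
    triage: str  # "actionable", "accepted_exception" or "informational"


def reconcile(open_alerts, exceptions):
    """Return lifecycle actions for the open alerts.

    `exceptions` maps (repo, rule, location_fingerprint) to a rationale and
    is updated in place, so a stored decision is reused instead of re-made.
    """
    actions = []
    for alert in open_alerts:
        key = (alert.repo, alert.rule, alert.location_fingerprint)
        if key in exceptions:
            # Already decided once: re-apply the stored rationale.
            actions.append(("dismiss_wont_fix", key, exceptions[key]))
        elif alert.triage == "actionable":
            actions.append(("dispatch_fix", key, None))
        else:
            rationale = RATIONALES[alert.triage]
            exceptions[key] = rationale
            actions.append(("dismiss_wont_fix", key, rationale))
    return actions


# The three modshells alerts, with made-up fingerprints.
alerts = [
    Alert("modshells", "Pinned-Dependencies", "fp-63", "accepted_exception"),
    Alert("modshells", "SAST", "fp-72", "actionable"),
    Alert("modshells", "Maintained", "fp-44", "informational"),
]
store = {}
for action in reconcile(alerts, store):
    print(action)
```

Running reconcile a second time against the same store yields the same actions and adds no new entries, which is the idempotence the sub-issue asks for.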

Part 2 — Neurosymbolic anti-recurrence (no holds barred)

Symbolic

  • Triage taxonomy generalization. Replace per-check booleans with a 4-axis classifier: {actionable_by_code?, remediation_is_safe?, effective_vs_nominal?, activity_only?}. Every Scorecard check and code-scanning rule maps to one cell; the cell deterministically selects the lifecycle action (fix / dismiss-accept / dismiss-info / open-and-alert). General (new rules slot in) and narrow (one unambiguous action per cell).
  • Detector splitting (sensitivity ⟂ specificity). For every "tool present" rule, add a paired "tool produced results in last N commits" rule (query analyses / check-runs API). Presence = specificity; efficacy = sensitivity. This is the generalisation of the fix for #72.
  • Exception/known-good data, not code constants. Move @known_good_shas + exemptions into versioned data the learning loop can write.
  • Fingerprint-stable suppressions. Key everything on repo+rule+normalized-path+symbol, never line number (lines drift; fingerprints don't). This single change is the recurrence killer.
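
A sketch of the 4-axis classifier follows, in Python for illustration. The issue names the axes and the four lifecycle actions but not the cell-to-action mapping, so the mapping below is an assumption chosen to match the three modshells cases. In this assumed mapping the effective-vs-nominal axis is carried in the cell but does not change the action.

```python
from itertools import product
from typing import NamedTuple

ACTIONS = {"fix", "dismiss-accept", "dismiss-info", "open-and-alert"}


class Cell(NamedTuple):
    actionable_by_code: bool
    remediation_is_safe: bool
    effective: bool      # False means the tool is present but only nominal
    activity_only: bool


def lifecycle_action(cell):
    """Total, deterministic mapping: every cell selects exactly one action."""
    if cell.activity_only:
        return "dismiss-info"        # e.g. an activity signal such as Maintained
    if not cell.actionable_by_code:
        return "open-and-alert"      # nothing code can do; a human must look
    if not cell.remediation_is_safe:
        return "dismiss-accept"      # e.g. SHA-pinning a self-verifying workflow
    return "fix"                     # e.g. a CodeQL matrix that scans nothing


# Enumerate all 16 cells to show that none is left without an action.
for bits in product([False, True], repeat=4):
    cell = Cell(*bits)
    print(cell, "->", lifecycle_action(cell))
```

A new rule slots in by stating its four axis values; no new branch is needed unless the mapping itself changes.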

Neural

  • LLM exception adjudicator. When a finding's symbolic cell is ambiguous (e.g., a new third-party reusable workflow that might self-verify its ref), an LLM judges "is SHA-pinning safe or harmful?" using the alert help text + upstream README, proposing a lifecycle action + rationale. Neural proposes; the decision is crystallised into the symbolic exception registry and is symbolic thereafter.
  • Remediation-safety predictor. Small classifier over (action, fix-type, downstream-effect) trained on past outcomes (did the fix break CI? did the alert reopen?) to gate auto-fix vs accept.
  • Recurrence forecaster (the months-long-pain killer). Cluster findings by (rule, repo-class, root-cause-signature). When a cluster's reopen-rate exceeds threshold, escalate from per-repo fix to a template fix (PR into rsr-template-repo / v3-templater). The meta-signal is reopen-rate; the meta-action is template propagation. Attack the source, not the leaves.
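
The forecaster's escalation test can be sketched as a reopen-rate per cluster, in Python for illustration. The record fields, the 0.5 threshold, and the minimum sample size are assumptions; the issue only states that a cluster is escalated to a template fix when its reopen-rate exceeds a threshold.

```python
from collections import Counter


def clusters_to_escalate(outcomes, threshold=0.5, min_samples=3):
    """Return clusters whose reopen-rate exceeds `threshold`.

    Each outcome is a dict with keys rule, repo_class, root_cause, reopened.
    Clusters with fewer than `min_samples` outcomes are ignored so that a
    single reopened alert does not trigger a template PR.
    """
    totals, reopened = Counter(), Counter()
    for outcome in outcomes:
        key = (outcome["rule"], outcome["repo_class"], outcome["root_cause"])
        totals[key] += 1
        if outcome["reopened"]:
            reopened[key] += 1
    return sorted(
        key for key, n in totals.items()
        if n >= min_samples and reopened[key] / n > threshold
    )


def row(rule, root_cause, was_reopened):
    # Helper for the made-up sample data below.
    return {"rule": rule, "repo_class": "ada-lib",
            "root_cause": root_cause, "reopened": was_reopened}


sample = (
    [row("SAST", "dead-codeql-matrix", r) for r in (True, True, True, False)]
    + [row("Pinned-Dependencies", "unpinned-action", r) for r in (False, False, True)]
)
print(clusters_to_escalate(sample))
```

Only the returned clusters would be candidates for a template PR; everything else stays a per-repo fix.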

Part 3 — Meta-layers, connections, data model (logic must prevail)

  • Closed-loop control architecture. Sense(authoritative = live GitHub alerts API) → Classify(taxonomy + neural adjudicator) → Act(fix | dismiss-with-reason | escalate-to-template) → Verify(re-query alert state next run) → Learn(write exception/outcome; update reopen-rate). Today only Sense(local) → Classify → Act(subset) exists. Add authoritative-Sense, Verify, Learn-write-back.
  • Normalized data model.
    • finding(id, repo, source, rule, location_fingerprint, first_seen, last_seen, state)
    • lifecycle_decision(finding_fp, action, rationale, decided_by[symbolic|neural|human], decided_at, confidence)
    • exception_registry(scope[repo|estate], rule, predicate, rationale, expiry?) — versioned, machine+human writable
    • outcome(finding_fp, fix_pr, ci_result, reopened?, reopened_after_days) — feeds the forecaster
  • Connections. gitbot-fleet must carry lifecycle decisions + exceptions, not just raw findings, so one adjudicated SLSA exception auto-applies to all ~30 estate repos sharing that workflow fingerprint. Hypatia ↔ rsr-template-repo/v3-templater: forecaster opens template PRs. Event-driven (repository_dispatch on a new Scorecard run), not only weekly cron → responsiveness.
  • Logic-prevails invariants. Every neural decision must (a) reduce to a symbolic registry entry before enforcement, (b) be reversible, (c) carry a rationale string, (d) be confidence-gated with human escalation below threshold. Neural may propose and prioritise; only symbolic rules + registry may enforce.
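
The fingerprint key and the logic-prevails invariants can be sketched together, in Python for illustration. The hashing scheme, the 0.8 confidence threshold, the field subset, and the dictionary used as a registry are assumptions, not a description of Hypatia's storage.

```python
import hashlib
import posixpath
from dataclasses import dataclass


def location_fingerprint(repo, rule, path, symbol):
    """Stable key built from repo + rule + normalized path + symbol.

    Line numbers are deliberately excluded because they drift between scans.
    """
    norm = posixpath.normpath(path.replace("\\", "/")).lstrip("/")
    raw = "\x1f".join([repo, rule, norm, symbol])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:16]


@dataclass(frozen=True)
class LifecycleDecision:
    finding_fp: str
    action: str
    rationale: str
    decided_by: str   # "symbolic", "neural" or "human"
    confidence: float


def crystallise(decision, registry, threshold=0.8):
    """Write a decision into the registry only if the invariants hold."""
    if not decision.rationale.strip():
        return False                      # (c) a rationale is mandatory
    if decision.decided_by == "neural" and decision.confidence < threshold:
        return False                      # (d) below threshold: escalate to a human
    registry[decision.finding_fp] = decision  # (a) becomes a registry entry
    return True


def enforce(finding_fp, registry):
    """Only registry entries are enforced, never a raw neural proposal."""
    entry = registry.get(finding_fp)
    return entry.action if entry else None


def revoke(finding_fp, registry):
    """(b) Every decision is reversible by removing its entry."""
    registry.pop(finding_fp, None)
```

Because enforce reads only the registry, a neural proposal that fails crystallise has no effect on any repository.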

Reactive companion (already in flight)

modshells: codeql.yml matrix javascript-typescript → actions (PR, Refs this issue); alerts #63 and #44 dismissed with written rationale (SLSA-tag-by-design; informational-activity-signal respectively).

Part 4 — Autonomy & credit-economics (the real KPI)

The binding constraint is that the maintainer spends Claude credit every week manually getting Hypatia to (a) do its job and (b) learn anything. Success is therefore economic, not just green checks:

  • North-star KPI: human/LLM interventions per estate-green-week → ~0; finding reopen-rate trending monotonically to 0; credit spend on estate maintenance flat or falling week-over-week. These are dashboardable from the outcome table.
  • Learning must be cheap and symbolic-by-default. Every adjudication (neural or human) crystallises into the exception_registry so it is never re-reasoned. The expensive LLM path runs only on genuinely novel, ambiguous findings — bounded, logged, and amortised to a one-time cost per pattern. A finding class adjudicated once must never burn credit again.
  • Self-healing without a human in the weekly loop. The closed loop (Sense→Classify→Act→Verify→Learn) runs on repository_dispatch/cron entirely in Actions; humans are paged only on (i) below-confidence novel findings or (ii) a fix that broke CI. "World ends, only Actions survives" ⇒ the loop must need no local machine, no interactive Claude, no .git-private-farm online — it degrades to symbolic-registry-only and still keeps every repo green.
  • .git-private-farm role: durable, offline-survivable mirror of the exception_registry + outcome history so learning is not lost if GitHub state is reset — the "rebuild from DNA" substrate.
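
The "adjudicate once, never burn credit again" rule and the degraded symbolic-only mode can be sketched as a registry-backed wrapper, in Python for illustration. The llm_adjudicate callable, the pattern strings, and the counters are hypothetical stand-ins; no real LLM API is called.

```python
def make_adjudicator(registry, llm_adjudicate=None):
    """Wrap an expensive adjudicator so each pattern is reasoned about once.

    With llm_adjudicate=None the wrapper degrades to registry-only mode:
    known patterns are still resolved, unknown ones are paged to a human.
    """
    stats = {"llm_calls": 0, "paged": 0}

    def adjudicate(pattern):
        if pattern in registry:
            return registry[pattern]          # symbolic path: no credit spent
        if llm_adjudicate is None:
            stats["paged"] += 1
            return "open-and-alert"           # novel finding, no neural path
        stats["llm_calls"] += 1
        registry[pattern] = llm_adjudicate(pattern)  # crystallise the result
        return registry[pattern]

    return adjudicate, stats


registry = {}
adjudicate, stats = make_adjudicator(registry, lambda pattern: "dismiss-accept")
for _ in range(3):
    adjudicate("slsa-generator-tag-ref")
print(stats["llm_calls"])  # the hypothetical LLM ran once for three lookups
```

The llm_calls and paged counters are the kind of numbers a dashboard for the north-star KPI could be built from.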

Part 5 — CI/CD adequacy & anti-redundancy audit

Goal: 100% CI success every run, covering corrective + adaptive + perfective maintenance, with no pointless/redundant workflows (genuine triangulation is kept).

  • Coverage check: map the standard workflow set against {build, test, lint/format, SAST, secret-scan, dependency-update, supply-chain/provenance, license/SPDX, compliance/standards-drift, dead-config detection}. Flag any axis with zero effective (not merely nominal) coverage. This is the #72 lesson generalised: a workflow that runs but verifies nothing is a coverage gap, not coverage.
  • Redundancy/irrelevance check: identify workflows that are dead config for the repo's actual languages (e.g., a JS/TS CodeQL on an Ada repo; for npm/bun blockers where no JS exists, distinguish enforcement from guardrail, keeping guardrails and cutting dead analyzers), duplicate scanners with no triangulation value, and workflows whose failure can never gate anything. Distinguish triangulation (keep) from redundancy (cut): triangulation = independent methods confirming the same property; redundancy = same method, no added assurance.
  • Deliverable: a per-repo CI adequacy scorecard Hypatia computes itself (new rule class), feeding the same reconciliation loop so "inadequate or wasteful CI" becomes an auto-tracked, auto-PR'd finding like any other.
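
A per-repo adequacy check along these lines could be sketched as follows, in Python for illustration. The workflow records, the method field used to tell triangulation from redundancy, and the example workflow names are assumptions.

```python
AXES = (
    "build", "test", "lint/format", "SAST", "secret-scan",
    "dependency-update", "supply-chain/provenance", "license/SPDX",
    "compliance/standards-drift", "dead-config detection",
)


def audit(workflows):
    """Return (gaps, redundant, dead) for a list of workflow records.

    Each record is a dict with keys name, axis, method, effective.
    gaps:      axes with no effective workflow (nominal coverage is ignored)
    redundant: effective workflows repeating an (axis, method) already seen
    dead:      workflows that run but verify nothing for this repo
    """
    live = [w for w in workflows if w["effective"]]
    covered = {w["axis"] for w in live}
    gaps = [axis for axis in AXES if axis not in covered]

    seen, redundant = set(), []
    for w in live:
        key = (w["axis"], w["method"])
        if key in seen:
            redundant.append(w["name"])   # same method, no added assurance
        seen.add(key)                     # a different method is triangulation

    dead = [w["name"] for w in workflows if not w["effective"]]
    return gaps, redundant, dead


# Hypothetical workflow set for an Ada repo.
workflows = [
    {"name": "codeql-js", "axis": "SAST", "method": "codeql", "effective": False},
    {"name": "secrets-a", "axis": "secret-scan", "method": "trufflehog", "effective": True},
    {"name": "secrets-b", "axis": "secret-scan", "method": "trufflehog", "effective": True},
    {"name": "secrets-c", "axis": "secret-scan", "method": "other-scanner", "effective": True},
]
gaps, redundant, dead = audit(workflows)
print(gaps, redundant, dead)
```

Here SAST is reported as a gap even though a CodeQL workflow exists, secrets-b is redundant, and secrets-c is kept as triangulation.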

Labels: bug (Something isn't working); major (Load-bearing / requirements-level work); requirements-target (Tracks a requirement; PRs Refs not Closes; joint-close only)
