OSSF Scorecard / code-scanning findings recur on every audit, estate-wide, and have for months. Concretely, modshells has 3 open alerts (Pinned-Dependencies #63, SAST #72, Maintained #44) that Hypatia has never closed. Root-causing them exposes a structural defect: Hypatia ingests findings into its own pipeline but never closes the loop on the authoritative GitHub alert lifecycle. It re-derives findings from local heuristics, fixes a subset, and never (a) reads the real open alerts, (b) records accepted exceptions, (c) verifies tool efficacy vs mere presence, or (d) dismisses non-actionable findings with a reason. So the same three classes leak forever.
This issue tracks the targeted fixes and the Scorecard↔GitHub reconciliation loop, plus a no-holds-barred design refinement so that Hypatia's logic actually prevails.
Refs — joint-close only on explicit agreement (do not auto-close).
What Hypatia is supposed to do — and keeps failing to
Intended loop: scan → classify (safety triangle: eliminate/control/accept) → auto-fix where safe → learn → keep the estate continuously green.
Observed loop: scan (local heuristics) → classify → fix-a-subset. Missing: authoritative sense, verify, and learn-write-back. That open loop is the disease. Three leak classes:
(B) Design-correct / accepted exception (the SLSA generator must stay on a semver tag; a SHA ref produces invalid provenance): Hypatia's only symbolic remediation (SHA-pin) is actively harmful, and there is no exception channel. → alert #63.
Dark rule. WorkflowAudit.check_codeql_language_matrix_mismatch already encodes the correct fix (:switch_codeql_matrix_to_actions) but is gated on Keyword.get(opts, :has_codeql_supported_language, true). The option defaults to true, so the rule is a silent no-op unless a caller computes language detection (none ever does). (#72)
Harmful canonical remediation, no exception. workflow_audit.ex:49 and security_errors.ex:291 map slsa-framework/slsa-github-generator@v2.1.0 → <sha>. Applying that mapping breaks SLSA provenance, and no exemption registry exists. (#63)
No suppression memory. Nothing persists "this finding on this repo is accepted because X," keyed by a stable fingerprint, so anything dismissed re-fires next scan. (the recurrence engine itself)
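The dark-rule defect and its proposed replacement can be sketched as follows. This is an illustrative Python sketch, not Hypatia's Elixir code: `dark_rule` and `effective_rule` are hypothetical names, and `repo_languages` is assumed to be already normalised to CodeQL matrix identifiers (linguist's own names differ).

```python
def dark_rule(matrix, has_codeql_supported_language=True):
    """Mirrors the defect: with the default left at True the rule never fires."""
    if has_codeql_supported_language:
        return None
    return "switch_codeql_matrix_to_actions"


def effective_rule(matrix, repo_languages):
    """Fire when no matrix entry can analyse anything the repo contains."""
    matrix, present = set(matrix), set(repo_languages)
    if "actions" in matrix or matrix & present:
        return None  # at least one entry scans real content
    return "switch_codeql_matrix_to_actions"


# Hypothetical repo with no JS/TS: the dark rule stays silent, the
# effective rule recommends the matrix switch.
silent = dark_rule(["javascript-typescript"])
finding = effective_rule(["javascript-typescript"], ["ada", "shell"])
```

The point of the contrast is that the trigger is derived from the repo's detected languages rather than from a caller-supplied flag that nobody sets.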
Part 1 — Targeted fixes (sub-issues)
[ ] CodeQL efficacy + un-dark the matrix rule. Default has_codeql_supported_language to computed-from-linguist, not true. Make absence of a scannable language the trigger. Add a paired effective-SAST rule: finding when codeql.yml's language matrix ∌ any language the repo contains and ∌ actions. Recommend language: actions (always scannable; runs every commit).
[ ] SLSA / self-verifying-ref exemption registry. Remove the harmful SHA mapping for slsa-framework/slsa-github-generator. Introduce a first-class pin_exempt / must_track_tag registry (data, not code constants) for reusable workflows that self-verify github.ref. Emit an accept finding with rationale, never a fix.
[ ] Hypatia.ScorecardReconciler (the closer). Pull live code-scanning alerts via the GitHub API. For each: actionable → dispatch fix; accepted-exception → dismiss won't fix + rationale comment; non-actionable/informational → dismiss won't fix + rationale. Idempotent. Every dismissal recorded by stable (repo, rule, location_fingerprint) in a persisted exceptions store so re-scans do not re-open.
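A minimal sketch of the reconciler's triage step, in illustrative Python rather than Hypatia's Elixir. It only plans actions and makes no network calls. The `Alert` type, the `triage` labels, and the in-memory `exceptions` dict are hypothetical stand-ins for the persisted store; the payload keys are assumed to match the fields GitHub's code-scanning alert update endpoint accepts and should be checked against the current API reference.

```python
from dataclasses import dataclass

WONT_FIX = "won't fix"


@dataclass(frozen=True)
class Alert:
    repo: str
    number: int
    rule: str
    location_fingerprint: str
    triage: str  # "actionable" | "accepted_exception" | "informational"


def reconcile(open_alerts, exceptions, rationales):
    """Plan one lifecycle action per open alert and record every dismissal."""
    plan = []
    for alert in open_alerts:
        if alert.triage == "actionable":
            plan.append(("dispatch_fix", alert.number, None))
            continue
        key = (alert.repo, alert.rule, alert.location_fingerprint)
        # Reuse a stored rationale if one exists, so a re-fired alert is
        # dismissed again without being re-adjudicated.
        rationale = exceptions.get(key) or rationales[alert.rule]
        exceptions[key] = rationale
        payload = {
            "state": "dismissed",
            "dismissed_reason": WONT_FIX,
            "dismissed_comment": rationale,
        }
        plan.append(("dismiss", alert.number, payload))
    return plan
```

Running `reconcile` twice over the same alerts yields the same plan and leaves the store unchanged, which is the idempotence property the sub-issue asks for.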
Part 2 — Neurosymbolic anti-recurrence (no holds barred)
Symbolic
Triage taxonomy generalization. Replace per-check booleans with a 4-axis classifier: {actionable_by_code?, remediation_is_safe?, effective_vs_nominal?, activity_only?}. Every Scorecard check and code-scanning rule maps to one cell; the cell deterministically selects the lifecycle action (fix / dismiss-accept / dismiss-info / open-and-alert). General (new rules slot in) and narrow (one unambiguous action per cell).
Detector splitting (sensitivity ⟂ specificity). For every "tool present" rule, add a paired "tool produced results in last N commits" rule (query analyses / check-runs API). Presence = specificity; efficacy = sensitivity. This is the generalisation of the fix for #72.
Exception/known-good data, not code constants. Move @known_good_shas + exemptions into versioned data the learning loop can write.
Fingerprint-stable suppressions. Key everything on repo+rule+normalized-path+symbol, never line number (lines drift; fingerprints don't). This single change is the recurrence killer.
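One way to build a line-independent suppression key, sketched in illustrative Python. The function name, the separator byte, and the 16-character truncation are assumptions made for the example; the only part taken from the proposal is the key material (repo, rule, normalized path, symbol) and the absence of any line number.

```python
import hashlib
import posixpath


def location_fingerprint(repo, rule, path, symbol):
    """Hash repo + rule + normalized path + symbol; line numbers never enter."""
    # Normalise separators and collapse "./" and "../" so equivalent paths agree.
    norm = posixpath.normpath(path.replace("\\", "/")).lstrip("/")
    material = "\x1f".join([repo, rule, norm, symbol])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()[:16]


# Two spellings of the same workflow path yield the same key (hypothetical values).
a = location_fingerprint("modshells", "Pinned-Dependencies",
                         "./.github/workflows/release.yml",
                         "slsa-framework/slsa-github-generator")
b = location_fingerprint("modshells", "Pinned-Dependencies",
                         ".github/workflows/../workflows/release.yml",
                         "slsa-framework/slsa-github-generator")
```

Because the key has no line component, inserting or removing lines above the flagged reference cannot change it.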
Neural
LLM exception adjudicator. When a finding's symbolic cell is ambiguous (e.g., a new third-party reusable workflow that might self-verify its ref), an LLM judges "is SHA-pinning safe or harmful?" using the alert help text + upstream README, proposing a lifecycle action + rationale. Neural proposes; the decision is crystallised into the symbolic exception registry and is symbolic thereafter.
Remediation-safety predictor. Small classifier over (action, fix-type, downstream-effect) trained on past outcomes (did the fix break CI? did the alert reopen?) to gate auto-fix vs accept.
Recurrence forecaster (the months-long-pain killer). Cluster findings by (rule, repo-class, root-cause-signature). When a cluster's reopen-rate exceeds threshold, escalate from per-repo fix to a template fix (PR into rsr-template-repo / v3-templater). The meta-signal is reopen-rate; the meta-action is template propagation. Attack the source, not the leaves.
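The forecaster's escalation test can be sketched as a reopen-rate computation per cluster. This is illustrative Python; the record keys, the 0.5 threshold, and the minimum sample size are assumptions, since the proposal does not fix any of them.

```python
from collections import defaultdict


def clusters_to_escalate(outcomes, threshold=0.5, min_samples=3):
    """Return clusters whose reopen-rate exceeds the threshold."""
    stats = defaultdict(lambda: [0, 0])  # cluster -> [reopened, total]
    for o in outcomes:
        key = (o["rule"], o["repo_class"], o["root_cause"])
        stats[key][1] += 1
        if o["reopened"]:
            stats[key][0] += 1
    return sorted(
        key for key, (reopened, total) in stats.items()
        if total >= min_samples and reopened / total > threshold
    )
```

Each returned cluster is a candidate for a template PR instead of another round of per-repo fixes; the minimum sample size keeps a single reopened alert from triggering an estate-wide change.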
Part 3 — Meta-layers, connections, data model (logic must prevail)
Closed-loop control architecture. Sense(authoritative = live GitHub alerts API) → Classify(taxonomy + neural adjudicator) → Act(fix | dismiss-with-reason | escalate-to-template) → Verify(re-query alert state next run) → Learn(write exception/outcome; update reopen-rate). Today only Sense(local) → Classify → Act(subset) exists. Add authoritative-Sense, Verify, Learn-write-back.
outcome(finding_fp, fix_pr, ci_result, reopened?, reopened_after_days) — feeds the forecaster
Connections. gitbot-fleet must carry lifecycle decisions + exceptions, not just raw findings, so one adjudicated SLSA exception auto-applies to all ~30 estate repos sharing that workflow fingerprint. Hypatia ↔ rsr-template-repo/v3-templater: forecaster opens template PRs. Event-driven (repository_dispatch on a new Scorecard run), not only weekly cron → responsiveness.
Logic-prevails invariants. Every neural decision must (a) reduce to a symbolic registry entry before enforcement, (b) be reversible, (c) carry a rationale string, (d) be confidence-gated with human escalation below threshold. Neural may propose and prioritise; only symbolic rules + registry may enforce.
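The four invariants can be read as a gate in front of the registry. The sketch below is illustrative Python: the field names follow the lifecycle_decision record in this issue's data model, while the function names, the append-only history used for reversibility, and the 0.8 confidence floor are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LifecycleDecision:
    finding_fp: str
    action: str
    rationale: str
    decided_by: str  # "symbolic" | "neural" | "human"
    confidence: float


def crystallise(decision, registry, min_confidence=0.8):
    """Admit a decision into the registry, or send it to a human."""
    if not decision.rationale.strip():
        raise ValueError("every decision must carry a rationale")  # (c)
    if decision.decided_by == "neural" and decision.confidence < min_confidence:
        return "escalate_to_human"  # (d)
    # (a) the decision becomes a registry entry; (b) history is append-only,
    # so the previous state can always be restored.
    registry.setdefault(decision.finding_fp, []).append(decision)
    return "recorded"


def enforced_action(finding_fp, registry):
    """Enforcement reads the registry only, never a live neural output."""
    history = registry.get(finding_fp)
    return history[-1].action if history else None


def revert(finding_fp, registry):
    """Undo the most recent decision for a finding, if any."""
    history = registry.get(finding_fp)
    if history:
        history.pop()
```

Keeping `enforced_action` ignorant of where a decision came from is what makes the neural path advisory: nothing is enforced unless it has first passed through `crystallise`.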
Reactive companion (already in flight)
modshells: codeql.yml matrix javascript-typescript → actions (PR, Refs this issue); alerts #63 and #44 dismissed with written rationale (SLSA-tag-by-design; informational-activity-signal respectively).
Part 4 — Autonomy & credit-economics (the real KPI)
The binding constraint is that the maintainer spends Claude credit every week manually getting Hypatia to (a) do its job and (b) learn anything. Success is therefore economic, not just green checks:
North-star KPI: human/LLM interventions per estate-green-week → ~0; finding reopen-rate trending monotonically to 0; credit spend on estate maintenance flat or falling week-over-week. These are dashboardable from the outcome table.
Learning must be cheap and symbolic-by-default. Every adjudication (neural or human) crystallises into the exception_registry so it is never re-reasoned. The expensive LLM path runs only on genuinely novel, ambiguous findings — bounded, logged, and amortised to a one-time cost per pattern. A finding class adjudicated once must never burn credit again.
Self-healing without a human in the weekly loop. The closed loop (Sense→Classify→Act→Verify→Learn) runs on repository_dispatch/cron entirely in Actions; humans are paged only on (i) below-confidence novel findings or (ii) a fix that broke CI. "World ends, only Actions survives" ⇒ the loop must need no local machine, no interactive Claude, no .git-private-farm online — it degrades to symbolic-registry-only and still keeps every repo green.
.git-private-farm role: durable, offline-survivable mirror of the exception_registry + outcome history so learning is not lost if GitHub state is reset — the "rebuild from DNA" substrate.
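The "adjudicate once, never pay again" rule amounts to a registry-first lookup. A minimal sketch in illustrative Python, where `adjudicate`, the `llm` callable, and the plain-dict registry are all hypothetical:

```python
def adjudicate(pattern, registry, llm=None):
    """Registry first; the expensive path runs at most once per pattern."""
    if pattern in registry:
        return registry[pattern]  # symbolic hit: no credit spent
    if llm is None:
        return None  # degraded mode: registry only, leave for a human
    decision = llm(pattern)
    registry[pattern] = decision  # crystallise so it is never re-reasoned
    return decision
```

Passing `llm=None` models the degraded mode in which only Actions and the registry are available: known patterns are still handled, and novel ones are left open rather than guessed at.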
Part 5 — CI/CD adequacy & anti-redundancy audit
Goal: 100% CI success every run, covering corrective + adaptive + perfective maintenance, with no pointless/redundant workflows (genuine triangulation is kept).
Coverage check: map the standard workflow set against {build, test, lint/format, SAST, secret-scan, dependency-update, supply-chain/provenance, license/SPDX, compliance/standards-drift, dead-config detection}. Flag any axis with zero effective (not merely nominal) coverage. This generalises the lesson of #72: a workflow that runs but verifies nothing is a coverage gap, not coverage.
Redundancy/irrelevance check: identify workflows that are dead config for the repo's actual languages (e.g., a JS/TS CodeQL on an Ada repo, npm/bun blockers where no JS exists as enforcement vs as guardrail — keep guardrails, cut dead analyzers), duplicate scanners with no triangulation value, and workflows whose failure can never gate anything. Distinguish triangulation (keep) from redundancy (cut): triangulation = independent methods confirming the same property; redundancy = same method, no added assurance.
Deliverable: a per-repo CI adequacy scorecard Hypatia computes itself (new rule class), feeding the same reconciliation loop so "inadequate or wasteful CI" becomes an auto-tracked, auto-PR'd finding like any other.
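The coverage and redundancy checks could be computed along these lines. This is illustrative Python: the workflow record shape (`name`, `axes`, `method`, `produced_results`) is an assumption, and "same axis, same method" is used as a simple stand-in for redundancy, with different methods on one axis counted as triangulation.

```python
AXES = (
    "build", "test", "lint/format", "SAST", "secret-scan",
    "dependency-update", "supply-chain/provenance", "license/SPDX",
    "compliance/standards-drift", "dead-config detection",
)


def coverage_gaps(workflows):
    """Axes with zero effective coverage; nominal-only workflows do not count."""
    effective = {
        axis
        for wf in workflows if wf["produced_results"]
        for axis in wf["axes"]
    }
    return [axis for axis in AXES if axis not in effective]


def redundant_pairs(workflows):
    """Pairs of workflows covering the same axis with the same method."""
    seen, dupes = {}, []
    for wf in workflows:
        for axis in wf["axes"]:
            key = (axis, wf["method"])
            if key in seen:
                dupes.append((seen[key], wf["name"]))
            else:
                seen[key] = wf["name"]
    return dupes
```

A CodeQL workflow that runs but analyses nothing shows up in `coverage_gaps` as a missing SAST axis, which is the behaviour the coverage check asks for.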
Root-cause defects in the symbolic engine
WorkflowAudit.check_codeql_language_matrix_mismatch already encodes the correct fix (:switch_codeql_matrix_to_actions) but is gated on Keyword.get(opts, :has_codeql_supported_language, true). The option defaults to true, so the rule is a silent no-op unless a caller computes language detection (none ever does). (#72)
ScorecardIngestor maps checks into Hypatia's pipeline, but there is no write path back to GitHub's code-scanning alert state (no dismiss/accept). Non-fixable and exception findings re-accumulate as "open" on every audit. (meta / #44 / #63)
ScorecardIngestor.check_sast greps for the string "codeql". modshells has codeql.yml, so Hypatia believes SAST is satisfied while Scorecard reports 0/7 commits checked. (#72)
workflow_audit.ex:49 and security_errors.ex:291 map slsa-framework/slsa-github-generator@v2.1.0 → <sha>. Applying that mapping breaks SLSA provenance, and no exemption registry exists. (#63)
Data model:
finding(id, repo, source, rule, location_fingerprint, first_seen, last_seen, state)
lifecycle_decision(finding_fp, action, rationale, decided_by[symbolic|neural|human], decided_at, confidence)
exception_registry(scope[repo|estate], rule, predicate, rationale, expiry?) (versioned, machine+human writable)
outcome(finding_fp, fix_pr, ci_result, reopened?, reopened_after_days) (feeds the forecaster)