Claude/charming hamilton 2n l pc #310
Merged
Recipe matcher rejected every scorecard-source finding (~310 ecosystem-
wide), routing them to :control "no safe fix available" advisories.
Root cause: `lib/recipe_matcher.ex` filtered candidate recipes with
`"*" in langs or language in langs`. Two failure modes:
1. 12 recipes declared `languages: ["any"]` — never matched, since
`"any"` is not a sentinel the filter recognises and no repo has
`"any"` as its primary language.
2. 8 scorecard / workflow-file recipes declared `languages: ["yaml"]`
— never matched, since yaml is a workflow-file type, not any
repo's primary language. So `recipe-pin-dependencies`,
`recipe-fix-workflow-permissions`, etc. were unreachable for SC013/
SC018 findings — the exact rule families dominating the daily
remediation sweep.
Fix:
- `langs_match?/2` private helper accepts `"*"` and `"any"` as
synonymous language-agnostic sentinels.
- `effective_language_for/2` remaps the lookup language to `"yaml"`
for patterns whose `source` is `"scorecard"` or whose `category`
names a known workflow-file rule family (DependencyPinning,
TokenPermissions, DangerousWorkflow, etc.). The repo's primary
language is irrelevant for workflow-file findings.
- Applied to `best_recipe/2`, `category_match_recipe/2`, and
`fuzzy_match_recipe/2`.
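
The two helpers described in the fix could look roughly like the sketch below. This is not the code in `lib/recipe_matcher.ex`: the module name `RecipeMatcherSketch`, the atom-keyed pattern maps, and the three-entry category list are assumptions made for illustration, and the helpers are public here only so they can be called directly.

```elixir
defmodule RecipeMatcherSketch do
  # Hypothetical illustration only; not the code in lib/recipe_matcher.ex.

  # Both "*" and "any" are treated as language-agnostic sentinels.
  @agnostic ["*", "any"]

  # Assumed subset of the workflow-file rule families named in the commit message.
  @workflow_categories ["DependencyPinning", "TokenPermissions", "DangerousWorkflow"]

  # True when the recipe is language-agnostic or lists the lookup language.
  def langs_match?(langs, language) do
    Enum.any?(langs, &(&1 in @agnostic)) or language in langs
  end

  # Scorecard findings are looked up as "yaml" whatever the repo's primary language is.
  def effective_language_for(%{source: "scorecard"}, _repo_language), do: "yaml"

  # So are findings whose category names a known workflow-file rule family.
  def effective_language_for(%{category: category}, _repo_language)
      when category in @workflow_categories,
      do: "yaml"

  # Everything else keeps the repo's primary language.
  def effective_language_for(_pattern, repo_language), do: repo_language
end
```

With this shape, a scorecard finding in an Elixir repo is looked up as `"yaml"`, so a recipe declaring `languages: ["yaml"]` becomes reachable, and a recipe declaring `languages: ["any"]` matches every lookup language.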
Tests pin all three invariants. All 22 scorecard recipe `fix_script`
references already exist on disk in `scripts/fix-scripts/` — the bug
was purely in matcher reachability, not missing fix implementations.
Closes the dispatcher half of the "no security stuff being sorted"
symptom. Remaining M7 work (PAT for cross-repo dispatch, push fixes
to remotes) still needs operator action, but the manifests will now
carry populated fix_script fields for scorecard findings.
The baseline had drifted into pure historical risk: 71 accepted findings
(31 critical, 40 high) generated before the #278 stale-escript fix and
the wave of code_safety/security_errors cleanups landed.

A fresh scan against the current tree finds 35 findings, all
medium-or-lower:
- 32 low (code_safety hot-path expects, ncl_docker_not_podman,
  workflow_audit missing-workflow, structural_drift, etc.)
- 3 medium (git_state transient + structural_drift)
- 0 critical, 0 high

Most old baseline entries are either:
- fixed in code (e.g. the believe_me at src/abi/RuleEngine.idr is now
  inline-suppressed with a documented `-- hypatia: allow` directive),
- migrated/refactored (e.g. lib/direct_github_pr.ex no longer exists),
- or were covered by the new total-Python-ban / scanner-soundness wave.

Net effect: every gate threshold of "fail on critical|high above
baseline" now starts from an empty critical/high ledger, so net-new
critical or high findings will stand out, which is what the baseline is
supposed to enable.

Generated with the canonical Elixir escript pipeline against this tree
(no rule changes, just a snapshot refresh). Severity threshold "low" so
the snapshot reflects the full advisory surface, not just gates.
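
A gate of the form "fail on critical|high above baseline" can be sketched as a set difference. The sketch below is hypothetical: the module name, the string-keyed finding maps, and the choice of `{type, file}` as the identity of a finding are all assumptions, not the pipeline's actual comparison logic.

```elixir
defmodule BaselineGateSketch do
  # Hypothetical illustration; the finding shape and the identity key are assumptions.
  @gating ["critical", "high"]

  # Returns the critical/high findings that are not already accepted in the baseline.
  def net_new_gating(current, baseline) do
    accepted = MapSet.new(baseline, &key/1)

    Enum.filter(current, fn finding ->
      finding["severity"] in @gating and not MapSet.member?(accepted, key(finding))
    end)
  end

  # Two findings are treated as the same when type and file both match.
  defp key(finding), do: {finding["type"], finding["file"]}
end
```

With an empty critical/high ledger in the baseline, any critical or high finding in a new scan is returned by `net_new_gating/2`, so a non-empty result can fail the build.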
The HYPATIA_DISPATCH_PAT was provisioned with read access to
secret-scanning alerts, code-scanning alerts, and Dependabot alerts.
Only Dependabot was actually being consumed (lib/rules/dependabot_alerts.ex,
DA001-DA004) — the other two alert surfaces were granted but unused.
Adds two new rule modules mirroring the DependabotAlerts shape:
lib/rules/secret_scanning_alerts.ex (SSA001-SSA004)
SSA001 — Open leaked-secret alerts (always :critical; staleness
surfaced in the reason for triage prioritisation).
SSA002 — Repo-level meta-finding when any open alert exists.
SSA003 — Stale open alerts past the 7-day rotation threshold.
SSA004 — Resolved alerts with no documented resolution vocabulary
(anything outside revoked/used_in_tests/pattern_deleted/
pattern_edited).
lib/rules/code_scanning_alerts.ex (CSA001-CSA004)
CSA001 — Open code-scanning alerts (CodeQL + third-party SARIF
including Hypatia's own `hypatia` category). Severity
mapped from `security_severity_level`/`severity` onto the
canonical four-bucket scale.
CSA002 — Severity summary (any critical, ≥5 high, or ≥10 total).
CSA003 — Stale open alerts (3/7/30/90 days by severity bucket).
CSA004 — Dismissed without documented reason.
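
As a rough illustration of the CSA002 and CSA003 conditions, the sketch below encodes the thresholds quoted above. The module and function names are hypothetical. Pairing 3/7/30/90 days with critical/high/medium/low in that order, and reading "stale" as strictly older than the threshold, are assumptions.

```elixir
defmodule CodeScanningSketch do
  # Hypothetical illustration; the day counts come from the commit message,
  # but their pairing with severity buckets is an assumption.
  @stale_days %{critical: 3, high: 7, medium: 30, low: 90}

  # CSA003-style check: an open alert is stale once its age exceeds its bucket's threshold.
  def stale?(severity, age_days), do: age_days > Map.fetch!(@stale_days, severity)

  # CSA002-style check: any critical, at least 5 high, or at least 10 alerts in total.
  def summary_triggered?(severities) do
    counts = Enum.frequencies(severities)

    Map.get(counts, :critical, 0) > 0 or
      Map.get(counts, :high, 0) >= 5 or
      length(severities) >= 10
  end
end
```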
Wires both into `Hypatia.CLI`:
- registered in `@all_rule_modules` so the default scan includes them,
- scan blocks emit normalised findings alongside the rest,
- `format_module_name/1` gives them display names,
- usage strings updated to list the new --rules tokens.
Workflow comment in `.github/workflows/hypatia-scan.yml` updated to
note that the existing `security-events: write` grant now covers all
three alert APIs, not just Dependabot. No new permissions needed.
Tests pin token-absent behaviour and the non-GitHub-remote error path
for each module's helpers.
PR #278 documented that the deployed escript had been silently dropping
the Elixir/Erlang/Coq/Lean/Agda/Zig/F*/Ada code_safety pattern families
for days because the binary was stale relative to the rule sources. "No
findings" looks identical whether the code is clean or the rule is
broken; that ambiguity is the soundness gap.

Closes it with the simplest possible mechanism: for every rule the
scanner is supposed to detect, keep a known-bad sample on disk, and
assert in CI that the rule fires on its sample at the expected severity.
A rule that goes silent (regex drift, file pruning, packaging
regression, module rename) breaks the build instead of silently
weakening the estate's security posture.

Layout:
  test/soundness/
    manifest.json                   -- rule -> fixture -> severity
    fixtures/code_safety/
      believe_me.idr                -- Idris2
      sorry.lean                    -- Lean
      admitted.v                    -- Coq
      unsafe_coerce.hs              -- Haskell
      obj_magic_ocaml.ml            -- OCaml
      getexn_on_external.res        -- ReScript
      unwrap_without_check.rs       -- Rust
      transmute.rs                  -- Rust unsafe
      elixir_system_shell.ex        -- THE PR#278 false-negative
      elixir_os_cmd.ex              -- Elixir os.cmd
      elixir_code_eval.ex           -- Elixir Code.eval
      shell_download_then_run.sh    -- curl|bash
      agda_postulate.agda           -- Agda
      zig_ptr_cast.zig              -- Zig
    README.adoc                     -- how to add a fixture
  test/soundness_test.exs           -- runner, @moduletag :soundness

Manifest entries cover all the language families PR #278 specifically
called out as having been silently dropped. The runner is data-driven:
adding a rule means dropping a fixture + a manifest entry, no test code
change.

Hand-run against the current tree: 14/14 fixtures fire at the expected
severity. The soundness gate is operational.

Out of scope (next iteration):
- End-to-end escript-build soundness (build the escript, run it against
  the fixture corpus -- exact PR #278 reproduction). The in-process test
  catches rule-definition regressions, but a packaging regression that
  strips a module would still slip through.
- Fixtures for non-code_safety families (workflow_audit, cicd_rules,
  structural_drift, scorecard, dependabot_alerts, ...).
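
The data-driven idea behind the runner can be sketched without the real scanner. Everything below is hypothetical: the module name, the manifest entry shape, the rule names and severities, and `scan_fun`, which stands in for whatever the actual runner calls to scan a fixture. Only the two fixture file names are taken from the layout above.

```elixir
defmodule SoundnessSketch do
  # Hypothetical illustration of a data-driven soundness check.
  # `scan_fun` stands in for the real scanner: it takes a fixture path and
  # returns a list of findings shaped like %{rule: ..., severity: ...}.

  # Returns the manifest entries whose rule did NOT fire at the expected severity.
  def silent_rules(manifest, scan_fun) do
    Enum.reject(manifest, fn %{rule: rule, fixture: fixture, severity: severity} ->
      fixture
      |> scan_fun.()
      |> Enum.any?(&(&1.rule == rule and &1.severity == severity))
    end)
  end
end

# Example manifest; rule names and severities are made up for the sketch.
manifest = [
  %{rule: "believe_me", fixture: "fixtures/code_safety/believe_me.idr", severity: "critical"},
  %{rule: "sorry", fixture: "fixtures/code_safety/sorry.lean", severity: "high"}
]

# Stub scanner that has "lost" every rule except believe_me.
stub_scan = fn
  "fixtures/code_safety/believe_me.idr" -> [%{rule: "believe_me", severity: "critical"}]
  _other -> []
end

# `silent` holds only the sorry.lean entry, which a CI assertion would reject.
silent = SoundnessSketch.silent_rules(manifest, stub_scan)
```

Because the check iterates over manifest data, covering a new rule needs only a new entry and fixture, and a test can simply assert that the list of silent rules is empty.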
The OutcomeTracker.verify_fix/3 re-scan mechanism existed but its result
was discarded on the success path: clean re-scans produced no marker,
unclean re-scans were re-recorded as :false_positive without preserving
the "this was verification, not an organic failure" distinction. The
outcomes log had no way to answer "what fraction of this recipe's
'successes' were actually verified clean by post-fix re-scan?"
That's the closed-loop metric this commit adds.
lib/outcome_tracker.ex
record_outcome/4,5
Optional `metadata` map merges into the record (under the canonical
fields so a caller can't overwrite recipe_id/repo/file/outcome/
timestamp/bot by accident).
record_and_verify/5
Now persists the verification verdict on every branch:
verified -> success record with "verification" = "verified"
still_present -> success record with "verification" = "still_present"
PLUS a follow-up :false_positive record
(caused_by = "post_fix_rescan")
scan_failed -> success record with "verification" = "scan_failed"
verify: false -> outcome record with "verification" = "unverified"
The distinction between "scan_failed" and "unverified" matters: a
recipe is not penalised for being run in environments without
panic-attack.
verification_rate/2
For a recipe_id, returns counts {verified, still_present,
scan_failed, unverified} and a rate = verified / (verified +
still_present). scan_failed and unverified records are excluded
from the denominator so a low-verification-attempt environment
doesn't artificially deflate the rate. Returns :insufficient_data
below min_attempts.
recipe_health/1
Aggregates across every recipe with recorded outcomes. Returns a
list of maps with dispatches / successes / failures / FPs /
success_rate / verification breakdown / status, sorted so the
most actionable rows (quarantine_candidate, degraded) surface
first. Configurable thresholds.
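
The rate calculation described for verification_rate/2 can be sketched as follows. This is not the function in `lib/outcome_tracker.ex`: the sketch takes an already-filtered list of records rather than a recipe_id, and the default `min_attempts` of 3, the reading of "attempts" as verified + still_present, and the treatment of records without a marker as "unverified" are all assumptions.

```elixir
defmodule VerificationRateSketch do
  # Hypothetical illustration; record shape and the default min_attempts are assumptions.

  def verification_rate(records, min_attempts \\ 3) do
    counts =
      records
      # Records written before the marker existed are counted as "unverified" (assumption).
      |> Enum.map(&Map.get(&1, "verification", "unverified"))
      |> Enum.frequencies()

    verified = Map.get(counts, "verified", 0)
    still_present = Map.get(counts, "still_present", 0)

    # Only conclusive re-scans count as attempts; scan_failed and unverified are excluded.
    attempts = verified + still_present

    if attempts == 0 or attempts < min_attempts do
      :insufficient_data
    else
      %{
        verified: verified,
        still_present: still_present,
        scan_failed: Map.get(counts, "scan_failed", 0),
        unverified: Map.get(counts, "unverified", 0),
        rate: verified / attempts
      }
    end
  end
end
```

For example, three verified records, one still_present and two scan_failed give a rate of 3 / 4 = 0.75; the two failed scans are reported in the counts but do not lower the rate.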
lib/mix/tasks/hypatia.recipe_health.ex
mix hypatia.recipe_health [--format json] [--only-actionable]
Prints the report in a human-readable table or JSON.
test/recipe_health_test.exs
Pins the rate calculation (verified/still_present ratio, scan_failed
+ unverified excluded), the insufficient_data threshold, and the
healthy/degraded/quarantine_candidate status mapping.
Hand-run against the current outcomes log: 4 recipes found, all flagged
:insufficient_data because the historical log was written before the
verification marker existed. From the next `record_and_verify`-enabled
dispatch onwards, recipes will accumulate verification data and migrate
to :healthy / :degraded / :quarantine_candidate based on real evidence.
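
The "metadata merges under the canonical fields" behaviour of record_outcome/4,5 amounts to a merge in which the canonical map wins. A minimal sketch, assuming string-keyed records and a hypothetical `build_record/2` helper that does not exist in the source:

```elixir
defmodule OutcomeRecordSketch do
  # Hypothetical illustration; only the canonical field names come from the commit message.
  def build_record(canonical, metadata \\ %{}) do
    # Map.merge/2 lets the second map win on key collisions, so canonical
    # fields always override caller-supplied metadata.
    Map.merge(metadata, canonical)
  end
end
```

A caller passing `%{"outcome" => "oops", "verification" => "verified"}` as metadata therefore adds the `"verification"` key but cannot overwrite `"outcome"`.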
🔍 Hypatia Security Scan
Findings: 2 issues detected
[
{
"reason": "Js.Dict deprecated -- use Dict (2 occurrences)",
"type": "deprecated_api",
"file": "/home/runner/work/hypatia/hypatia/test/soundness/fixtures/code_safety/getexn_on_external.res",
"action": "module_replace",
"rule_module": "migration_rules",
"severity": "high"
},
{
"reason": "Repository has 2 non-main remote branch(es). Policy: single main branch only.",
"type": "GS007",
"file": ".",
"action": "delete_remote_branches",
"rule_module": "git_state",
"severity": "medium"
}
]
Powered by Hypatia Neurosymbolic CI/CD Intelligence