tooling(scripts): add per-template sweep classifiers (#187/#190/#192/#193) #194

Open
hyperpolymath wants to merge 2 commits into main from feat/sweep-classifiers

@hyperpolymath
Owner

Summary

Durable tooling for the wrapper-sweep work that follows each of the four foundational reusable PRs filed today (#187 mirror, #190 secret-scanner, #192 codeql, #193 hypatia-scan).

Adds scripts/sweep-classifiers/:

What each classifier does

  1. Reads a paginated gh api /search/code JSON dump for the template
  2. Fetches each unique blob SHA exactly once (cached in $BLOBS_DIR)
  3. Classifies each blob (job-set match, line-count band, language matrix)
  4. Emits per-repo TSV: <repo>\t<sha>\t<class>\t<reason>\t<lines>\t<details>
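Steps 3 and 4 above can be sketched roughly as follows. This is a minimal illustration, not the code in scripts/sweep-classifiers/: the helper names (`job_names`, `classify`, `tsv_row`), the reason strings, and the line-count band are all hypothetical, and the job extraction is a naive indentation scan rather than a real YAML parse.

```python
import re
from dataclasses import dataclass

# Hypothetical band: the real classifiers' thresholds are not stated in this PR.
CANONICAL_LINES = range(1, 60)


@dataclass(frozen=True)
class Verdict:
    cls: str      # e.g. TRIVIAL or NEEDS_REVIEW
    reason: str
    lines: int
    details: str


def job_names(text):
    """Naively collect the keys directly under the top-level `jobs:` block."""
    names, in_jobs = set(), False
    for line in text.splitlines():
        if line.startswith("jobs:"):
            in_jobs = True
            continue
        if in_jobs:
            if line and not line[0].isspace() and not line.startswith("#"):
                in_jobs = False  # reached the next top-level key
                continue
            match = re.match(r"^  ([A-Za-z0-9_-]+):\s*$", line)
            if match:
                names.add(match.group(1))
    return frozenset(names)


def classify(text, canonical_jobs, band=CANONICAL_LINES):
    """Classify one workflow blob by job-set match, then line-count band."""
    n = len(text.splitlines())
    jobs = job_names(text)
    if jobs != canonical_jobs:
        return Verdict("NEEDS_REVIEW", "job-set-mismatch", n, ",".join(sorted(jobs)))
    if n not in band:
        return Verdict("NEEDS_REVIEW", "line-count-out-of-band", n, "")
    return Verdict("TRIVIAL", "canonical-match", n, "")


def tsv_row(repo, sha, verdict):
    """Format one row as <repo>\\t<sha>\\t<class>\\t<reason>\\t<lines>\\t<details>."""
    return "\t".join(
        [repo, sha, verdict.cls, verdict.reason, str(verdict.lines), verdict.details]
    )
```

Because classification depends only on blob content, a verdict computed once per SHA can be reused for every repo that carries that blob.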

Numbers produced across the four campaign templates

| Template | TRIVIAL / mechanical | NEEDS_REVIEW | Notable |
| --- | --- | --- | --- |
| mirror.yml | 267/289 (92.4%) | 22 | 16 slim 2-3 forge variants |
| secret-scanner | 273/281 (97.2%) MISSING_SHELL_SECRETS | 3 | Only standards repo carries shell-secrets today |
| codeql | 246/263 (93.5%) | 17 | 11 custom 99-114-line workflows |
| hypatia-scan | 249/255 (97.6%) | 6 | Pure propagation lag, no real customisation |
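The percentages follow directly from the counts. A small check, using only the figures quoted above (the `share` and `remainder` helpers are illustrative names, not part of the scripts):

```python
# (mechanical, total) pairs copied from the figures in this PR description.
COUNTS = {
    "mirror.yml": (267, 289),
    "secret-scanner": (273, 281),
    "codeql": (246, 263),
    "hypatia-scan": (249, 255),
}


def share(mechanical, total):
    """Percentage of results in the mechanical bucket, to one decimal place."""
    return round(100 * mechanical / total, 1)


def remainder(mechanical, total):
    """Results left over once the mechanical bucket is removed."""
    return total - mechanical


for template, (mechanical, total) in COUNTS.items():
    print(f"{template}\t{share(mechanical, total)}%\t{remainder(mechanical, total)} remaining")
```

For mirror.yml, codeql, and hypatia-scan the remainder equals the NEEDS_REVIEW count (22, 17, and 6).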

Nested-path caveat (documented in README)

gh api /search/code with path:.github/workflows matches the path
PREFIX — monorepo nested workflow files (e.g.,
a2ml/bindings/deno/.github/workflows/hypatia-scan.yml) are EXCLUDED.
Verified for hypatia-scan: broader query without path: returns 704
results vs 255 path-filtered. The same effect likely applies to the
other three templates; sweep tooling must walk all
**/.github/workflows/<template>.yml paths.
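One way to express the `**/.github/workflows/<template>.yml` rule when post-filtering results from the broader query is sketched below. The function names are hypothetical and this is not taken from the README; it only assumes that result paths are POSIX-style, repo-relative strings.

```python
from pathlib import PurePosixPath


def is_template_workflow(path, template):
    """True if path ends in .github/workflows/<template>, at any depth."""
    parts = PurePosixPath(path).parts
    return len(parts) >= 3 and parts[-3:] == (".github", "workflows", template)


def is_nested(path, template):
    """True for monorepo-nested copies, which the path: prefix filter misses."""
    return is_template_workflow(path, template) and len(PurePosixPath(path).parts) > 3
```

With this check, a2ml/bindings/deno/.github/workflows/hypatia-scan.yml counts as a nested copy, while a root-level .github/workflows/hypatia-scan.yml matches but is not nested.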

Pattern

Same shape as scripts/apply-baseline.sh (paired with
scripts/tests/apply-baseline-test.sh) — committed durable tooling
rather than ephemeral /tmp scripts.

🤖 Generated with Claude Code

…-workflow campaign

Durable tooling for the wrapper-sweep work that follows each of the
foundational reusable PRs (#187 mirror, #190 secret-scanner, #192
codeql, #193 hypatia-scan).

Each classifier:
- reads a paginated `gh api /search/code` JSON dump
- fetches each unique blob SHA exactly once (cached in $BLOBS_DIR)
- emits per-repo TSV: <repo>\t<sha>\t<class>\t<reason>\t<lines>\t<details>
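The fetch-once behaviour can be sketched as below. This is an assumption-laden illustration, not the committed script: it assumes the dump is a list of page objects whose `items` carry `sha` and `repository.full_name` (as the code search API returns them), and it takes the actual download as an injected `fetch` callable so no network call is hard-coded. `repos_by_sha` and `fetch_cached` are hypothetical names.

```python
from pathlib import Path


def repos_by_sha(pages):
    """Group repos by blob SHA across all pages, keeping first-seen order."""
    grouped = {}
    for page in pages:
        for item in page.get("items", []):
            grouped.setdefault(item["sha"], []).append(item["repository"]["full_name"])
    return grouped


def fetch_cached(repo, sha, blobs_dir, fetch):
    """Return blob bytes, calling fetch(repo, sha) only if sha is not cached yet."""
    blobs_dir = Path(blobs_dir)
    blobs_dir.mkdir(parents=True, exist_ok=True)
    cached = blobs_dir / sha
    if cached.exists():
        return cached.read_bytes()
    data = fetch(repo, sha)
    cached.write_bytes(data)
    return data
```

Keying the cache on the SHA alone means an identical workflow file present in hundreds of repos is downloaded once, from whichever repo is seen first.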

Classes vary per template but follow the same shape: TRIVIAL (canonical
match, mechanical wrapper) vs SLIM/MISSING/OLDER (propagation lag,
auto-upgrades on first run after wrapper merge) vs NEEDS_REVIEW
(custom workflow body, requires per-repo diff).

Numbers produced by these classifiers across the four campaign templates:
- mirror.yml      — 267/289 TRIVIAL (92.4%); 22 NEEDS_REVIEW
- secret-scanner  — 273/281 missing shell-secrets (97.2%); 1 TRIVIAL (standards itself)
- codeql          — 246/263 mechanical (93.5%); 17 NEEDS_REVIEW
- hypatia-scan    — 249/255 safe-to-standardize-up (97.6%); 6 NEEDS_REVIEW

README documents the path-filter caveat: `gh api /search/code` with
`path:.github/workflows` excludes monorepo-nested workflow files; the
broader `filename:` query (no path filter) catches them. For
hypatia-scan, the broader query returns 704 vs the 255 path-filtered
count — the ~449 nested copies also need wrappers when sweeps fire.

Same as #192 (codeql-reusable): auto-merge is enabled, but zero workflow
runs fired against the head commit. Pushing an empty commit to re-trigger CI.