ci(workflows): add timeout-minutes to every job + fix changelog YAML#360

Merged
hyperpolymath merged 1 commit into main from claude/workflow-hardening
Jun 3, 2026

@hyperpolymath
Owner

Workflow-hygiene hardening

Follow-up to the Hypatia workflow_audit findings that kept surfacing on #358, handled as a separate PR per owner request (keeps #358 focused on the A2ML/K9 foundational layer).

What changed

1. timeout-minutes on every job that lacked one (37 step-jobs, 21 files)
Matches the existing repo convention (governance-reusable.yml: "every job carries timeout-minutes so a transient hang fails fast"):

  • 10 min — policy / lint / dispatch / summary jobs (doc-format, makefile-blocker, no-js-scan, licence-consistency, instant-sync, lockstep, scorecard-enforcer checks, trust-summary, fast secret greps).
  • 20 min — build / compile / analyze / network / security-scan jobs (affinescript, casket-pages, codeql, deno-ci, elixir, rust-ci ×4, mirror ×7, hypatia-scan, gitleaks/trufflehog, scorecard analysis, spark-theatre-gate).

Reusable-workflow caller jobs (governance.yml, mirror.yml, scorecard.yml, secret-scanner.yml) are intentionally left without timeout-minutes — it is invalid on a job that delegates via uses:; their timeouts live in the called reusable workflows (which this PR fixes).
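
A minimal sketch of the two job shapes described above. The job names, the `make lint` command, and the reusable-workflow path are illustrative assumptions rather than copies from the repository; the checkout pin is the one quoted in the Notes of this PR.

```yaml
jobs:
  # Step-job: it runs on a runner, so it carries its own timeout.
  lint:                       # hypothetical job name
    runs-on: ubuntu-latest
    timeout-minutes: 10       # policy / lint tier
    steps:
      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
      - run: make lint        # hypothetical command

  # Caller job: it delegates via `uses:`, so it takes no timeout-minutes.
  # The timeout lives on the jobs inside the called workflow instead.
  governance:
    uses: ./.github/workflows/governance-reusable.yml   # path assumed
```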

2. changelog-reusable.yml — fixed unparseable YAML (root bug found during the sweep)
The commit-message body (commit-back mode) and PR body (pr-back mode) had continuation lines at column 0, which terminated the run: | block scalar and left the whole file as invalid YAML — GitHub could not load the workflow at all. Indented the continuations back into the block; the file now parses and its generate job gained a timeout too.
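
To illustrate the failure mode, here is a sketch of the repaired shape. The step name and commit text are hypothetical, and the broken form is described only in comments because it would not parse.

```yaml
steps:
  - name: Commit changelog          # hypothetical step name
    run: |
      # Every non-empty line of the block scalar must stay indented deeper
      # than the `run:` key. In the broken file, the continuation lines of
      # the message started at column 0, which ended the scalar early and
      # left the rest of the document unparseable.
      git commit -m "docs: update changelog

      Regenerated by the changelog workflow."
```

The blank line inside the quoted message is harmless; only non-empty text at a shallower indent terminates the block.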

Notes

  • unpinned_action findings are false positives. Every uses: in the tree is already pinned to a full 40-char SHA (e.g. actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2). Hypatia truncates the SHA to @de0f in its own message and mis-flags it — that's a rule bug in the hypatia repo, not a repo problem here. No change needed.
  • boj-build.yml deliberately untouched. It has a separate pre-existing YAML bug — two orphaned steps (K9-SVC Validation, Contractile Check) indented at job level instead of step level, making the file invalid. The fix is ambiguous (the steps can't simply join trigger-boj, which is gated on if: BOJ_SERVER_URL configured), so it's raised with the owner separately rather than guessed at here.

Validation

  • All 27 other workflows, touched or untouched, parse cleanly (boj-build.yml is the one exception, by design).
  • Zero step-jobs remain without a timeout-minutes.

🤖 Draft.

https://claude.ai/code/session_01XZhw6Fq27eoeyEB4LR3a2c


Generated by Claude Code

Workflow-hygiene hardening (follow-up to the Hypatia workflow_audit
findings surfaced on #358, handled as a separate PR per owner request).

* timeout-minutes on all 37 step-jobs that lacked one, matching the repo
  convention (10 for policy/lint/dispatch jobs, 20 for build/compile/
  analyze/scan/network jobs; see governance-reusable.yml). Reusable-
  workflow *caller* jobs (governance/mirror/scorecard/secret-scanner)
  are intentionally left without timeout-minutes — it is invalid on a
  job that delegates via 'uses:'; their timeouts live in the called
  reusable workflows.

* changelog-reusable.yml: fix two invalid block scalars. The commit-
  message body (commit-back mode) and PR body (pr-back mode) had
  continuation lines at column 0, which terminated the 'run: |' block
  and left the file unparseable YAML — GitHub could not load the
  workflow at all. Indented the continuations back into the block;
  the file now parses, and its 'generate' job gained a timeout too.

Note: the Hypatia 'unpinned_action' findings are false positives — every
'uses:' in the tree is already pinned to a full 40-char SHA (Hypatia
truncates the SHA to '@de0f' in its own message). No action needed there.

boj-build.yml is deliberately untouched: it has a separate pre-existing
YAML bug (orphaned steps at job indent) whose fix is ambiguous; raised
with the owner separately.

https://claude.ai/code/session_01XZhw6Fq27eoeyEB4LR3a2c
@github-actions

github-actions Bot commented Jun 3, 2026

🔍 Hypatia Security Scan

Findings: 185 issues detected

Severity Count
🔴 Critical 64
🟠 High 62
🟡 Medium 59

⚠️ Action Required: Critical security issues found!

View findings
[
  {
    "reason": "Action for the check script)\n        uses: actions/checkout@de0f needs attention",
    "type": "unpinned_action",
    "file": "governance-reusable.yml",
    "action": "pin_sha",
    "rule_module": "workflow_audit",
    "severity": "medium"
  },
  {
    "reason": "Action for the check script)\n        uses: actions/checkout@de0f needs attention",
    "type": "unpinned_action",
    "file": "governance-reusable.yml",
    "action": "pin_sha",
    "rule_module": "workflow_audit",
    "severity": "medium"
  },
  {
    "reason": "Issue in boj-build.yml",
    "type": "missing_timeout_minutes",
    "file": "boj-build.yml",
    "action": "flag",
    "rule_module": "workflow_audit",
    "severity": "medium"
  },
  {
    "reason": "Issue in governance.yml",
    "type": "missing_timeout_minutes",
    "file": "governance.yml",
    "action": "flag",
    "rule_module": "workflow_audit",
    "severity": "medium"
  },
  {
    "reason": "Issue in mirror.yml",
    "type": "missing_timeout_minutes",
    "file": "mirror.yml",
    "action": "flag",
    "rule_module": "workflow_audit",
    "severity": "medium"
  },
  {
    "reason": "Issue in scorecard.yml",
    "type": "missing_timeout_minutes",
    "file": "scorecard.yml",
    "action": "flag",
    "rule_module": "workflow_audit",
    "severity": "medium"
  },
  {
    "reason": "Issue in secret-scanner.yml",
    "type": "missing_timeout_minutes",
    "file": "secret-scanner.yml",
    "action": "flag",
    "rule_module": "workflow_audit",
    "severity": "medium"
  },
  {
    "reason": "Issue in scorecard-enforcer.yml",
    "type": "scorecard_publish_with_run_step",
    "file": "scorecard-enforcer.yml",
    "action": "split_scorecard_publish_job",
    "rule_module": "workflow_audit",
    "severity": "high"
  },
  {
    "reason": "Issue in mirror-reusable.yml",
    "type": "secret_action_without_presence_gate",
    "file": "mirror-reusable.yml",
    "action": "webfactory/ssh-agent",
    "rule_module": "workflow_audit",
    "severity": "high"
  },
  {
    "reason": "Issue in mirror-reusable.yml",
    "type": "secret_action_without_presence_gate",
    "file": "mirror-reusable.yml",
    "action": "webfactory/ssh-agent",
    "rule_module": "workflow_audit",
    "severity": "high"
  }
]

Powered by Hypatia Neurosymbolic CI/CD Intelligence

@hyperpolymath hyperpolymath marked this pull request as ready for review June 3, 2026 21:55
@hyperpolymath hyperpolymath merged commit 6cd3772 into main Jun 3, 2026
21 checks passed
@hyperpolymath hyperpolymath deleted the claude/workflow-hardening branch June 3, 2026 21:56
hyperpolymath pushed a commit that referenced this pull request Jun 3, 2026
…j-build.yml

Builds on #360 (which added timeouts to the top-level workflows) by extending coverage to the embedded sub-project workflows and fixing one malformed file #360 left untouched:

- Add timeout-minutes to every runs-on job across all 355 embedded sub-repo workflows (660 jobs / 312 files). Reusable-workflow-call jobs excluded (invalid there).

- boj-build.yml: two steps (K9-SVC Validation, Contractile Check) were outdented to sequence items under jobs:, making it invalid YAML. Removed them (redundant with governance/hypatia; referenced a non-canonical 'lust' contractile) and added timeout-minutes to trigger-boj.

Verified: all 383 workflow files parse; 0 runs-on jobs without a timeout. changelog-reusable.yml left as-is (already fixed by #360).

https://claude.ai/code/session_01AmPXB2dA2wCcabo8BXwS28
hyperpolymath pushed a commit that referenced this pull request Jun 3, 2026
#360 added timeout-minutes estate-wide and fixed changelog-reusable.yml, but
could not touch boj-build.yml because it does not parse: two trailing steps
(K9-SVC Validation, Contractile Check) had dropped to 2-space indentation,
out of the 'steps:' list, so GitHub fails to compile it on every push
(startup_failure, 0 jobs). Re-indent those steps to 6 spaces and add the
job-level timeout-minutes the estate convention expects.

The job stays gated on 'if: vars.BOJ_SERVER_URL != ...', so it skips cleanly
when unconfigured. Validated with PyYAML. Addresses the repair side of
standards#331. No SPDX header or licence content touched.
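
A rough sketch of the re-indented layout, assuming a timeout of 10 minutes and placeholder `run:` commands (neither is stated in this commit message); the existing job-level `if:` gate is left out of the sketch because its full expression is not quoted here.

```yaml
jobs:
  trigger-boj:
    runs-on: ubuntu-latest
    timeout-minutes: 10               # value assumed
    steps:
      - name: Trigger                 # hypothetical name for the existing step
        run: echo "trigger"           # placeholder
      # The two steps below previously sat at 2-space indentation, which
      # placed them outside the `steps:` list. At 6 spaces they are
      # ordinary list items again.
      - name: K9-SVC Validation
        run: echo "validate"          # placeholder
      - name: Contractile Check
        run: echo "check"             # placeholder
```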

https://claude.ai/code/session_011xv3VLrqeXkpjXxUojKz82
hyperpolymath pushed a commit that referenced this pull request Jun 3, 2026
#360/#361 added timeout-minutes and fixed the malformed-YAML workflows, but
the two startup_failure classes that fail BEFORE any job runs remained on
main. Fix them at the source:

secrets-in-if (Unrecognized named-value: 'secrets' at parse time):
- boj-build.yml: drop the job-level `if` (the Trigger step already self-gates
  on an empty BOJ_URL, so the job is a cheap no-op when unconfigured).
- instant-sync.yml, mirror-reusable.yml (mirror-radicle): map the secret to
  job-level `env` and gate the step `if`s on `env.*`. `env` is available in
  step `if:` and secrets are valid in job-level `env`; this is the canonical
  presence-gate pattern and is valid regardless.

hashFiles() in a job-level `if` (evaluated server-side before checkout, so it
always returns '' and silently skips the job on every repo):
- rust-ci-reusable.yml, elixir-ci-reusable.yml: add a `detect` job that
  checks out and exports a real boolean (has_cargo / has_mix); the real jobs
  gate on `needs.detect.outputs.*`. Behaviour preserved; the guard now works.

Rebased onto current main (bec6161). changelog-reusable.yml and boj-build's
malformed YAML are already fixed upstream, so they drop out of this PR.

Consumers pin the reusables by SHA, so callers are unaffected until they bump.
hyperpolymath added a commit that referenced this pull request Jun 3, 2026
## Scan the `actions` language with CodeQL

Adds an `actions` entry to the CodeQL matrix (`build-mode: none`) so
CodeQL scans the repo's GitHub Actions workflow files for security
issues, alongside the existing `javascript-typescript` analysis.

Resolves the Hypatia `workflow_audit` finding
`codeql_missing_actions_language` (surfaced on #362 once the
medium-severity timeout noise was cleared).

```diff
         include:
           - language: javascript-typescript
             build-mode: none
+          - language: actions
+            build-mode: none
```

### Why a separate PR
Per owner request — this enables a **new analysis target**, isolated
from the workflow-hygiene work (#360, #362) so the new scan is easy to
review on its own. It may surface new code-scanning alerts on existing
workflows; those would be triaged normally.

### Also in this PR: registry sync (required to pass `registry-verify`)
This branch was cut before #356 + #361 landed on `main`. After merging
`main` in, the `registry-verify` gate failed because **`main`'s
committed `.machine_readable/REGISTRY.a2ml` is itself stale** — #361's
estate-wide workflow edits changed files under tracked spec-home dirs
without re-running `just registry`, leaving 9 `source_hash` entries out
of date. Regenerated deterministically via `scripts/build-registry.sh`;
`--check` now exits clean. (Heads-up: `main`'s own `registry-verify` is
likely red until this lands or a dedicated regen does.)

PR diff vs `main`: `codeql.yml` (+2) and the regenerated
`REGISTRY.a2ml`.

🤖 Draft.

https://claude.ai/code/session_01XZhw6Fq27eoeyEB4LR3a2c

Co-authored-by: Claude <noreply@anthropic.com>
hyperpolymath added a commit that referenced this pull request Jun 3, 2026
…hFiles-in-job-if) (#363)

## What

Fixes the workflow **`startup_failure`** classes that fail *before any
job runs* and were **left on `main` after #360/#361**. Rebased onto
current `main` (`bec6161`); now scoped to exactly the two remaining root
causes.

## Why this is still needed after #360 / #361

#360 added `timeout-minutes` + fixed changelog YAML; #361 added more
`timeout-minutes` + fixed two malformed workflows. Neither addressed the
two `startup_failure` causes below — I re-verified both are still
present on `bec6161`. (Where they overlapped, the upstream fixes are
kept: `changelog-reusable.yml` and `boj-build.yml`'s malformed YAML are
already fixed upstream, so they drop out of this PR.)

## Fixes

### 1. `secrets` context in `if:` → *"Unrecognized named-value: 'secrets'"* (parse-time startup failure)

| File | Before | Fix |
|---|---|---|
| `boj-build.yml` | job-level `if: …secrets.BOJ_SERVER_URL…` | **Drop the job `if`**: the Trigger step already self-gates (exits 0 on empty `BOJ_URL`), so the job is a cheap no-op when unconfigured. |
| `instant-sync.yml` | step `if: secrets.FARM_DISPATCH_TOKEN …` (×2) | Map secret → job-level `env`, gate steps on `env.FARM_DISPATCH_TOKEN`. |
| `mirror-reusable.yml` (`mirror-radicle`) | step `if: secrets.RADICLE_KEY …` (×4) | Map secret → job-level `env`, gate steps on `env.RADICLE_KEY`. |

`env` **is** available in step `if:` and secrets **are** valid in
job-level `env` — this is the canonical presence-gate pattern and is
correct whether or not `secrets`-in-`if` parses, so it can't regress a
working file.
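
A minimal sketch of the presence-gate pattern, using the `FARM_DISPATCH_TOKEN` secret named above. The job name, step name, and `run:` command are placeholders, not the contents of `instant-sync.yml`.

```yaml
jobs:
  dispatch:                                  # hypothetical job name
    runs-on: ubuntu-latest
    timeout-minutes: 10
    env:
      # Secrets are valid here; an unset secret maps to an empty string.
      FARM_DISPATCH_TOKEN: ${{ secrets.FARM_DISPATCH_TOKEN }}
    steps:
      - name: Dispatch to farm               # hypothetical step name
        # Gate on the env copy, never on `secrets.*` directly.
        if: ${{ env.FARM_DISPATCH_TOKEN != '' }}
        run: echo "token present, dispatching"   # placeholder for the real step
```

Note that the `env` context is available to step-level `if:` only, which is why the gate sits on the step rather than on the job.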

### 2. `hashFiles()` in a **job-level** `if:` (always `''` → job silently skipped on every repo)

Job `if:` is evaluated server-side **before checkout**, so
`hashFiles(...)` sees an empty workspace and returns `''` — meaning
`rust-ci`/`elixir-ci` skipped on **every** consumer, Rust/Elixir or not.

- `rust-ci-reusable.yml` (×4) and `elixir-ci-reusable.yml` (×1): add a
small **`detect`** job that checks out and exports a real boolean
(`has_cargo` / `has_mix`); the real jobs gate on
`needs.detect.outputs.*`. Behaviour preserved; the guard actually works
now. (Step-level `with: hashFiles(mix.lock)` for caching is untouched —
that runs on the runner and is fine.)
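
A sketch of the `detect` pattern for the Rust case. Probing for a root-level `Cargo.toml`, the step id, the downstream job name, and both timeout values are assumptions for illustration; the real reusable workflow may detect differently.

```yaml
jobs:
  detect:
    runs-on: ubuntu-latest
    timeout-minutes: 10                      # value assumed
    outputs:
      has_cargo: ${{ steps.probe.outputs.has_cargo }}
    steps:
      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
      - id: probe                            # hypothetical step id
        run: |
          # Runs on the runner after checkout, so the workspace is populated.
          if [ -f Cargo.toml ]; then
            echo "has_cargo=true" >> "$GITHUB_OUTPUT"
          else
            echo "has_cargo=false" >> "$GITHUB_OUTPUT"
          fi

  build:                                     # hypothetical job name
    needs: detect
    # Job outputs are strings, so compare against 'true'.
    if: ${{ needs.detect.outputs.has_cargo == 'true' }}
    runs-on: ubuntu-latest
    timeout-minutes: 20                      # value assumed
    steps:
      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
      - run: cargo build --locked
```

The extra job costs one short runner start, but it moves the file check to a point where the repository contents actually exist.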

## Verification

- All 5 files parse as YAML.
- `git grep 'if:.*secrets\.'` over all workflows → **none**.
- `git grep 'if:.*hashFiles'` over all workflows → **none** (step-level
`with:` usages remain, correctly).

## Known Hypatia scan findings (no action — see below)

The non-blocking Hypatia scan reports
`secret_action_without_presence_gate` on:

- **`instant-sync.yml` · `peter-evans/repository-dispatch` (high)** —
**false-positive introduced by this PR's correct fix.** The rule
pattern-matches a literal `secrets.X != ''` gate on the step; I replaced
it with `env.FARM_DISPATCH_TOKEN != ''` precisely because
`secrets`-in-`if` is the `startup_failure` being fixed. The presence
gate genuinely still exists (env mapped from the secret) — Hypatia just
doesn't recognise the env-indirection. Satisfying the rule would mean
reintroducing the real startup failure, so **GitHub correctness wins**.
The proper root fix is to teach Hypatia's
`secret_action_without_presence_gate` rule to accept an
`env`-mapped-from-`secret` gate (separate change in the `hypatia` repo).
- **`mirror-reusable.yml` · `webfactory/ssh-agent` (high, ×2)** —
pre-existing (GITEA key), unrelated to this PR; a hygiene flag, not a
`startup_failure`.

## Registry-verify on this PR

The `Registry + topology in sync` check is red here, but **not because
of this PR** — these five edits are all top-level `.github/workflows/*`
files (no spec home), adding zero registry drift. It's the
**pre-existing `main` drift that #366 fixes** (inherited because this
branched off `main`). Once #366 lands, a rebase clears it. Deliberately
not regenerating the registry here, to avoid colliding with #366.

Consumers pin the reusables by SHA, so callers are unaffected until they
bump their pin.

https://claude.ai/code/session_0178nMYCNXgotTeekePkoUjd

Co-authored-by: Claude <noreply@anthropic.com>
hyperpolymath added a commit that referenced this pull request Jun 3, 2026
…362)

## Workflow hardening — round 2

Follow-up to #360, closing the remaining *real* `workflow_audit`
findings (the rest were false positives — see below). Owner decisions:
**new separate job** for boj-build, **harden all 6** mirror jobs.

### `boj-build.yml` — fix invalid YAML + correct the validation
The `K9-SVC Validation` and `Contractile Check` steps were indented at
**job** level instead of **step** level, so the file was **invalid YAML
and GitHub could not load the workflow at all**.

- Moved both steps into a new **ungated** job `validate-contractiles`
that runs on every push. (They validate *this repo's* contractile set,
so they must not sit behind `trigger-boj`'s `if: BOJ_SERVER_URL
configured` guard.)
- Both jobs gained `timeout-minutes: 10`.
- The completeness check was also **wrong**: it hard-coded `must trust
dust lust adjust intend` — but `lust` doesn't exist, `bust` was missing,
and intend's file is `Intentfile.a2ml` (not `Intendfile`). It now
**reads the canonical verb set from the registry**
(`.machine_readable/contractiles/INDEX.a2ml`), exactly as INDEX
instructs consumers to do — so it's correct now and self-maintaining as
verbs change.
- **Verified green locally:** all 6 canonical contractiles (`adjust,
bust, dust, intend, must, trust`) present; script exits 0.

### `mirror-reusable.yml` — secret-presence gate on the 6 SSH mirrors
The gitlab/bitbucket/codeberg/sourcehut/disroot/gitea jobs ran
`webfactory/ssh-agent` with an inherited SSH key gated only on
`vars.<FORGE>_MIRROR_ENABLED`. An *enabled-but-unconfigured* repo fed
ssh-agent an empty key and **failed**. Applied the existing **Radicle
pattern** already in this file:
- gate the `ssh-agent` + push steps on `secrets.<FORGE>_SSH_KEY != ''`
- add a `Skipped (<FORGE>_SSH_KEY not configured)` step on `== ''` that
emits an actionable `::notice::` and ends the job cleanly.
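
A sketch of the gate-plus-skip shape for one forge. It is written with the secret mapped into job-level `env` and the steps gated on `env.*`, since the `secrets` context cannot be referenced directly in `if:`; that differs from the literal `secrets.<FORGE>_SSH_KEY` wording above. The job name, the `GITLAB_SSH_KEY` secret name, and the placeholder push command are assumptions.

```yaml
jobs:
  mirror-gitlab:                             # hypothetical job name
    runs-on: ubuntu-latest
    timeout-minutes: 20
    env:
      # Secret name assumed from the <FORGE>_SSH_KEY pattern.
      SSH_KEY: ${{ secrets.GITLAB_SSH_KEY }}
    steps:
      - name: Skipped (GITLAB_SSH_KEY not configured)
        if: ${{ env.SSH_KEY == '' }}
        run: |
          echo "::notice::GITLAB_SSH_KEY is not set; add it as a secret to enable this mirror."
      - name: Push mirror
        if: ${{ env.SSH_KEY != '' }}
        run: |
          # Placeholder: the ssh-agent setup and push steps would go here.
          echo "pushing mirror"
```

With the key absent, only the notice step runs and the job finishes green instead of failing inside `ssh-agent`.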

### On the findings *not* addressed (deliberately)
- **`unpinned_action`** (governance-reusable.yml) — false positive.
Every `uses:` is already pinned to a full 40-char SHA; Hypatia truncates
it to `@de0f` in its own message.
- **`missing_timeout_minutes`** on governance.yml / mirror.yml /
scorecard.yml / secret-scanner.yml — false positives. These are
reusable-workflow **caller** jobs; `timeout-minutes` is **invalid** on a
`uses:` job. Their timeouts live in the called workflows (fixed in #360
+ here).
- **`scorecard_publish_with_run_step`** (scorecard-enforcer.yml) — false
positive. The `scorecard` job that publishes already contains only
`uses:` steps; the `run:` steps are in separate jobs. Already compliant
(the author documented the split).

### Validation
- All 28 workflows parse cleanly.
- Every step-job has a `timeout-minutes`.
- mirror-reusable: 15 `!= ''` gates (6 ssh-agent + 6 push + 3 radicle)
and 7 skip-notice steps.

🤖 Draft.

https://claude.ai/code/session_01XZhw6Fq27eoeyEB4LR3a2c

---
_Generated by [Claude
Code](https://claude.ai/code/session_01XZhw6Fq27eoeyEB4LR3a2c)_

Co-authored-by: Claude <noreply@anthropic.com>