CI Token Best Practices Sweep by swahtz · Pull Request #672 · openvdb/fvdb-core

swahtz · 2026-06-29T05:53:59Z

This PR hardens our GitHub Actions CI, centered on the admin-scoped token used to register self-hosted EC2 runners. It migrates that token to a fine-grained, purpose-scoped credential; completes a zizmor/actionlint audit pass across all workflows (pinning actions to commit SHAs, scoping id-token permissions, and justifying trigger exemptions); and adds a new Workflow Security gate that automatically enforces, on every PR, that the runner token can only ever be used as the github-token input to machulav/ec2-github-runner.

Motivation

secrets.EC2_RUNNER_TOKEN has Administration: Read/Write on the repository — it has to, so machulav/ec2-github-runner can register and de-register runners. Our CI runs on pull_request_target, which is required to provision runners for PRs (by making the EC2_RUNNER_TOKEN available to PRs). This PR reduces the blast radius of that token and makes its safe usage a checked, enforceable invariant rather than a convention.

The specific threat the new gate stops

A single PR cannot exfiltrate the token: under pull_request_target the workflow definition runs from the base branch (not the PR), and the token only ever lives in the runner start/stop jobs, which run on GitHub-hosted runners and never check out or execute PR code. The real risk is a two-stage, time-of-merge attack: a PR adds the token to a code-running job / env: / run: step (dormant — the base definition runs on the PR, so it looks harmless), a maintainer merges it, and the next run executes the now-poisoned base definition and leaks the token. The new scan catches that dangerous edit against the PR's proposed workflow files before it can land on main.

What's in this PR

1. Token hardening

Replace GH_PERSONAL_ACCESS_TOKEN with the fine-grained EC2_RUNNER_TOKEN for starting/stopping EC2 runners across tests.yml, cu128.yml, cu130.yml, publish.yml, and nightly-publish.yml.
Remove unnecessary token usage from sync-doc-version.yml and only sync doc versions on publish.

2. zizmor / actionlint audit hardening

Pin every third-party action to a commit SHA across all workflows.
General zizmor audit fixes (credential handling, permissions, template-injection-prone patterns).
Scope id-token: write (AWS OIDC) down to only the jobs/steps that actually assume an AWS role, instead of granting it workflow-wide.
Add justified zizmor: ignore[dangerous-triggers] annotations on the pull_request_target triggers that are genuinely required to provision runners.

3. New Workflow Security gate

.github/workflows/workflow-security.yml — runs on every PR (pull_request_target) and on push to main. Three scans: the repo-specific token policy (+ its unit tests), actionlint, and zizmor.
.github/scripts/check_runner_token_policy.py — the repo-specific policy (rules below). Passes cleanly on all current workflows.
.github/scripts/test_check_runner_token_policy.py — unit tests for the policy (compliant baseline, each violation class, the real repo workflows, and a fail-closed test for the leak check). Run by the Workflow Security job on every PR; needs only pyyaml + pytest, no fvdb build.
Added permissions: contents: read to codestyle.yml and docs-build-test.yml so the whole workflow set passes zizmor at full strictness (no severity-threshold weakening).

The token policy (enforced by `check_runner_token_policy.py`)

1. The token name may appear only inside .github/workflows/*.{yml,yaml} (plus the enforcement script and its tests), never in product source, etc.
2. Every textual occurrence in a workflow must be exactly github-token: ${{ secrets.EC2_RUNNER_TOKEN }}. This single rule forbids putting the token in env:, GH_TOKEN/GITHUB_TOKEN, with.token, a run: script, or a reusable-workflow secrets: block.
3. The step that consumes the token must uses: machulav/ec2-github-runner.
4. A job that references the token must not pull untrusted code into its workspace alongside the privileged context: no local actions (uses: ./...) and no actions/checkout.
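
To make the rules above concrete, here is a minimal Python sketch of how the textual rule and the two structural rules might be checked. It is not the contents of check_runner_token_policy.py: the function names, the whitespace-tolerant regex, and the choice to pass an already-parsed job mapping (rather than loading YAML) are assumptions made for illustration.

```python
import re
from typing import Any, Dict, List

TOKEN_NAME = "EC2_RUNNER_TOKEN"
RUNNER_ACTION = "machulav/ec2-github-runner"

# Assumption: the only permitted spelling, allowing flexible whitespace.
ALLOWED_LINE_RE = re.compile(
    r"^\s*github-token:\s*\$\{\{\s*secrets\.EC2_RUNNER_TOKEN\s*\}\}\s*$"
)


def textual_violations(workflow_text: str) -> List[str]:
    """Flag every line that names the token in any other form."""
    violations = []
    for lineno, line in enumerate(workflow_text.splitlines(), start=1):
        if TOKEN_NAME in line and not ALLOWED_LINE_RE.match(line):
            violations.append(f"line {lineno}: {line.strip()}")
    return violations


def job_violations(job: Dict[str, Any]) -> List[str]:
    """Check one parsed job: the consumer must be the runner action, and
    a token-bearing job must not use local actions or actions/checkout."""
    steps = job.get("steps", [])
    token_steps = [s for s in steps if TOKEN_NAME in str(s.get("with", {}))]
    if not token_steps:
        return []  # the job never touches the token
    violations = []
    for step in token_steps:
        action = str(step.get("uses", "")).split("@", 1)[0]
        if action != RUNNER_ACTION:
            violations.append(f"token consumed by {action or 'a run step'}")
    for step in steps:
        uses = str(step.get("uses", ""))
        if uses.startswith("./") or uses.split("@", 1)[0] == "actions/checkout":
            violations.append(f"untrusted code source in token job: {uses}")
    return violations
```

Because the textual check works line by line, even a YAML comment that mentions the token name would be flagged in this sketch, which errs on the side of failing closed.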

Why `pull_request_target` (not `pull_request`) for the gate

Under pull_request_target the workflow definition, the policy script, and the scanner config all come from the base branch, so a malicious PR cannot edit the check to make it pass. The job overlays only the PR head's proposed .github/workflows files as inert data to scan — no PR code is executed and no secrets are exposed (permissions: contents: read).

Required follow-up (repository admin — not code)

The scan only stops the two-stage attack if it blocks the merge. After this merges, make the check Scan workflows + enforce runner-token policy a required status check on main (and any release branches), via a branch ruleset or classic branch protection, and do not allow admins to bypass it. Because the workflow triggers on pull_request_target for every branch, it runs on every PR, so requiring it will not leave PRs stuck "waiting for status". Optionally add a CODEOWNERS entry on .github/ so a human also reviews workflow diffs.

Copilot

Pull request overview

This PR hardens the repository’s GitHub Actions CI around the privileged EC2 runner registration token by tightening token usage patterns, reducing workflow permissions, pinning third-party actions to immutable SHAs, and introducing a dedicated “Workflow Security” gate that scans workflow changes on every PR.

Changes:

Replaced the broad admin PAT usage with a purpose-scoped EC2_RUNNER_TOKEN for EC2 runner lifecycle steps, and reduced token usage elsewhere.
Performed a zizmor/actionlint hardening sweep: pin actions by SHA, narrow id-token: write to only AWS-OIDC jobs, and add justified dangerous-trigger exemptions.
Added a new workflow-security workflow plus a repo-specific policy script to enforce safe runner-token usage patterns.

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| .github/workflows/workflow-security.yml | New PR/merge gate to scan workflows (token-policy + actionlint + zizmor). |
| .github/scripts/check_runner_token_policy.py | New repo-specific enforcement script for runner-token safety constraints. |
| .github/workflows/tests.yml | Uses `EC2_RUNNER_TOKEN`, pins actions, and scopes OIDC permissions to runner lifecycle jobs. |
| .github/workflows/cu128.yml | Same hardening as tests workflow for CUDA 12.8 CI. |
| .github/workflows/cu130.yml | Same hardening as tests workflow for CUDA 13.0 CI. |
| .github/workflows/publish.yml | Pins actions, scopes OIDC permissions, and switches runner token usage to `EC2_RUNNER_TOKEN`. |
| .github/workflows/nightly-publish.yml | Pins actions, scopes OIDC permissions, and switches runner token usage to `EC2_RUNNER_TOKEN`. |
| .github/workflows/nightly.yml | Pins actions and reduces template-injection-prone patterns in shell steps. |
| .github/workflows/sync-doc-version.yml | Changes trigger to release-published; removes PAT usage and pins actions. |
| .github/workflows/load-versions.yml | Pins checkout action and disables credential persistence. |
| .github/workflows/docs-build-test.yml | Adds explicit read-only permissions and pins setup/checkout actions. |
| .github/workflows/codestyle.yml | Adds explicit read-only permissions and pins third-party actions. |
| .github/workflows/docs.yml | Pins Pages deployment-related actions to commit SHAs. |
| .github/workflows/issue-triage.yml | Reduces expression interpolation in shell by passing values via env vars. |
| .github/workflows/check-changes.yml | Pins `dorny/paths-filter` to a commit SHA. |

- Add the standard OpenVDB copyright header to check_runner_token_policy.py
- check_no_leaks_outside_workflows() now fails closed on unexpected `git grep` exit codes (e.g. when run outside a git worktree) instead of silently passing, so Rule 1 cannot be weakened by an environment quirk
- Add test_check_runner_token_policy.py (11 cases: compliant baseline, every violation class, the real repo workflows, and the leak check failing closed) and run it from the Workflow Security job; it needs only pyyaml + pytest
- Broaden the token-name allow-list to cover .github/scripts/ tooling and the security doc (these reference the name, never the value)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Jonathan Swartz <jonathan@jswartz.info>
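
As an illustration of the fail-closed behavior described in this commit, the following is a rough sketch and not the actual check_no_leaks_outside_workflows() implementation. It relies on `git grep` exiting with 0 when matches exist and 1 when there are none, and treats anything else (including a missing `git` binary) as a violation. The function name, the injectable `run` parameter, and the two allowed prefixes are assumptions for the example.

```python
import subprocess
from typing import Callable, List

TOKEN_NAME = "EC2_RUNNER_TOKEN"
# Assumed allow-list; the real script may permit additional paths.
ALLOWED_PATH_PREFIXES = (".github/workflows/", ".github/scripts/")


def leak_violations(run: Callable = subprocess.run) -> List[str]:
    """List files that name the token outside the allowed paths.

    Any failure to run the search is reported as a violation, so the
    check can never pass merely because it could not verify anything.
    """
    try:
        proc = run(
            ["git", "grep", "-l", "-F", TOKEN_NAME],
            capture_output=True,
            text=True,
        )
    except FileNotFoundError:
        return ["git not found; cannot verify token confinement"]
    if proc.returncode == 1:
        return []  # git grep found no matches anywhere
    if proc.returncode != 0:
        return [f"git grep failed with exit code {proc.returncode}"]
    return [
        f"{path}: token name outside allowed paths"
        for path in proc.stdout.splitlines()
        if not path.startswith(ALLOWED_PATH_PREFIXES)
    ]
```

Passing `run` as a parameter keeps the sketch testable without a git worktree: a test can substitute a stub that returns a chosen exit code or raises FileNotFoundError.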

Copilot

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 14 comments.

Fix single-quoted shell env-var interpolations introduced by the template-injection hardening (single quotes prevent shell expansion, so the literal ${NEEDS_VERSIONS_OUTPUTS_*} was being passed through):

- --cuda-arch-list in cu128, cu130, cu130-nightly, publish, nightly-publish
- the gcc-toolset profile.d snippet in publish and nightly-publish

These were correct when they were ${{ ... }} (render-time) but broke once they became ${VAR} (shell-time); switch the affected quotes to double quotes.

Workflow Security gate:

- Fetch the PR head via refs/pull/<n>/head instead of by SHA from origin, so the overlay works for forked PRs too.
- Make the Rule 1 leak check scan the PR head commit tree (new --leak-check-ref, passed as FETCH_HEAD) so it also catches token references the PR adds outside .github/workflows, while the policy script still runs from the trusted base checkout (git grep only reads blobs). Add tests for the ref path (clean repo passes; missing ref fails closed).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Jonathan Swartz <jonathan@jswartz.info>
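
The quoting bug fixed in this commit can be reproduced in isolation. The short Python sketch below runs two one-line `sh` scripts with the same environment variable set; `CUDA_ARCH_LIST` and its value are hypothetical stand-ins for the real workflow variables, and the helper name is invented for the example.

```python
import os
import subprocess


def run_sh(script: str, value: str) -> str:
    """Run a one-line POSIX shell script with CUDA_ARCH_LIST set."""
    env = dict(os.environ, CUDA_ARCH_LIST=value)
    result = subprocess.run(
        ["sh", "-c", script],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Single quotes: the shell passes the text through without expanding it.
single = run_sh("echo '--cuda-arch-list=${CUDA_ARCH_LIST}'", "8.0;9.0")

# Double quotes: the shell expands the variable before echo sees it.
double = run_sh('echo "--cuda-arch-list=${CUDA_ARCH_LIST}"', "8.0;9.0")

print(single)  # --cuda-arch-list=${CUDA_ARCH_LIST}
print(double)  # --cuda-arch-list=8.0;9.0
```

A `${{ ... }}` expression is substituted by GitHub Actions before the shell runs, so it works inside either kind of quote; a `${VAR}` reference is resolved by the shell itself and therefore needs double quotes.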

Copilot

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.

Copilot

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 3 comments.

…e tag

- workflow-security.yml: pin ZIZMOR_VERSION to an exact 1.26.1 instead of the floating "1.*", so the security gate is deterministic and a new zizmor release can't start failing unrelated PRs.
- check_runner_token_policy.py: treat a missing `git` (FileNotFoundError) as a Rule 1 violation rather than a silent skip, so the leak check always fails closed when it cannot verify confinement. Add a monkeypatched regression test.
- docs.yml: correct the actions/deploy-pages SHA annotation from the misleading "# v3.0.2-node.24" to "# v5.0.0". The SHA is unchanged and is the same commit v5/v5.0.0 point to (it carries all three tags); this is not a downgrade, just an accurate annotation matching the prior @v5 usage.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Jonathan Swartz <jonathan@jswartz.info>

Copilot

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 1 comment.

Pin the policy self-test dependencies to exact versions (pyyaml==6.0.3, pytest==9.0.3) via new PYYAML_VERSION/PYTEST_VERSION env vars, alongside the zizmor/actionlint pins, instead of the floating `pyyaml==6.*` / `pytest>=8,<9` ranges. Keeps the security gate deterministic so a new pyyaml/pytest release can't start failing unrelated PRs. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Jonathan Swartz <jonathan@jswartz.info>

Copilot

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 1 comment.

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Signed-off-by: Jonathan Swartz <jonathan@jswartz.info>

Copilot

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated no new comments.

Rules 1-4 are textual and key on the literal name EC2_RUNNER_TOKEN. Dynamic secret indexing such as `secrets[format('EC2_RUNNER_%s', 'TOKEN')]` or `secrets[matrix.name]` could resolve the admin token without ever spelling its name, evading every rule (and the leak grep). Add Rule 5: reject any dynamic `secrets[...]` access in a workflow. It runs on every workflow file (even ones that never name the token), uses a \b anchor so identifiers merely ending in "secrets" aren't matched, and the repo uses no dynamic secret access today. Adds four tests (format(), matrix index, token-name-absent, and a negative case for the \b anchor). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Jonathan Swartz <jonathan@jswartz.info>
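
A minimal sketch of the kind of pattern Rule 5 describes is shown below. The regex and the function name are assumptions, not the pattern used in check_runner_token_policy.py; the sketch only demonstrates how a `\b` anchor separates real `secrets[...]` indexing from identifiers that merely end in "secrets".

```python
import re

# Assumed pattern: the word "secrets" followed by an index bracket.
# \b requires a word boundary, so "mysecrets[" does not match.
DYNAMIC_SECRETS_RE = re.compile(r"\bsecrets\s*\[")


def has_dynamic_secret_access(workflow_text: str) -> bool:
    """Return True if the text indexes the secrets context dynamically."""
    return DYNAMIC_SECRETS_RE.search(workflow_text) is not None


print(has_dynamic_secret_access("${{ secrets[format('EC2_RUNNER_%s', 'TOKEN')] }}"))  # True
print(has_dynamic_secret_access("${{ secrets[matrix.name] }}"))  # True
print(has_dynamic_secret_access("${{ secrets.EC2_RUNNER_TOKEN }}"))  # False
print(has_dynamic_secret_access("echo ${mysecrets[0]}"))  # False
```

Rejecting every bracketed access is deliberately broader than the token itself: the check cannot know what a runtime expression will resolve to, so it forbids the whole construct.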

Copilot

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 1 comment.

The workflow now triggers on `release: published`, where actions/checkout defaults to the tag ref (detached HEAD). peter-evans/create-pull-request then has no branch to base the PR on and would fail or base it on the tag commit rather than current main. Pin the checkout to `ref: main` so the doc-version PR is always created from the default branch. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Jonathan Swartz <jonathan@jswartz.info>

Copilot

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 3 comments.

Rule 1's docstring and the three leak-check messages said the token must be confined to `.github/workflows/`, but the implementation also allows `.github/scripts/` (where this script and its tests live). Introduce a single ALLOWED_PATHS_DESC derived from ALLOWED_PATH_PREFIXES and use it in all messages, and update the docstring, so failure output reflects the real allowed surface. No behavior change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Jonathan Swartz <jonathan@jswartz.info>

Copilot

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated no new comments.

harrism

Really nice. Great hardening. Nice tests.

## Problem

The `Workflow Security` gate (merged in #672) is failing on real PRs (e.g. #673). Its `actionlint` step runs **shellcheck** on `run:` scripts whenever shellcheck is installed. It is installed on GitHub-hosted runners but not in many local setups, so the check passed locally yet fails on every PR with **157 info/style findings** across pre-existing workflow scripts repo-wide:

| code | count | what |
|---|---|---|
| SC2086 | 146 | unquoted `$VAR` (word-splitting) |
| SC2174 | 8 | `mkdir -p -m` mode only on deepest dir |
| SC2012 | 2 | use `find` instead of `ls` |
| SC2129 | 1 | grouped redirect style |

There are **no** actual actionlint errors (syntax/type/injection), purely the bundled shellcheck nits, almost all pre-existing and unrelated to the token work.

## Fix

Set `SHELLCHECK_OPTS: "-e SC2086,SC2174,SC2012,SC2129"` on the actionlint step so the gate fails only on meaningful problems. **Every other shellcheck check stays active**, including `SC2016` (single-quote expansion, the exact bug class fixed during #672), as do actionlint's own expression/injection checks and the token-policy script. The excludes can be dropped later once the scripts are properly quoted.

## Verification

Reproduced locally with shellcheck installed: default `actionlint` → 157 findings / exit 1; with the excludes → **0 findings / exit 0**. `zizmor` and the token policy are unaffected.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Signed-off-by: Jonathan Swartz <jonathan@jswartz.info>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

swahtz requested a review from a team as a code owner June 29, 2026 05:54

swahtz requested review from areidmeyer, Copilot and harrism June 29, 2026 05:54

Copilot started reviewing on behalf of swahtz June 29, 2026 05:57

Copilot AI reviewed Jun 29, 2026

Comment thread .github/scripts/check_runner_token_policy.py

Comment thread .github/scripts/check_runner_token_policy.py Outdated

Comment thread .github/scripts/check_runner_token_policy.py Outdated

swahtz added 7 commits June 29, 2026 18:05

Only sync doc versions on publish

5882082

Remove unnecessary token usage Signed-off-by: Jonathan Swartz <jonathan@jswartz.info>

Replace usage of GH_PERSONAL_ACCESS_TOKEN token with more fine-grained EC2_RUNNER_TOKEN for starting EC2 instances

9b325e0

Signed-off-by: Jonathan Swartz <jonathan@jswartz.info>

Zizmor audit fixes

71c8e10

Signed-off-by: Jonathan Swartz <jonathan@jswartz.info>

zizmor pin versions to hashes

7d618a6

Signed-off-by: Jonathan Swartz <jonathan@jswartz.info>

scope id-token permissions to only steps that need them

45511a6

Signed-off-by: Jonathan Swartz <jonathan@jswartz.info>

zizmor ignore pull_request_target triggers

de4d614

Signed-off-by: Jonathan Swartz <jonathan@jswartz.info>

Add a Workflow Security action which checks that there's been no EC2_RUNNER_TOKEN misuse in a PR

4867997

Signed-off-by: Jonathan Swartz <jonathan@jswartz.info>

swahtz force-pushed the ci_improvements branch from 90efd40 to 4867997 Compare June 29, 2026 06:07

swahtz changed the title ~~Ci improvements~~ CI Token Best Practices Sweep Jun 29, 2026

swahtz requested a review from Copilot June 29, 2026 06:18

Copilot started reviewing on behalf of swahtz June 29, 2026 06:19

Copilot AI reviewed Jun 29, 2026

swahtz requested a review from Copilot June 29, 2026 06:34

Copilot started reviewing on behalf of swahtz June 29, 2026 06:35

swahtz added the CI label (Issues related to the GitHub Actions CI/CD. For build issues use CMake/Build) Jun 29, 2026

Copilot AI reviewed Jun 29, 2026

Comment thread .github/workflows/publish.yml

Comment thread .github/workflows/nightly-publish.yml

swahtz requested a review from Copilot June 29, 2026 06:41

Copilot started reviewing on behalf of swahtz June 29, 2026 06:41

Copilot AI reviewed Jun 29, 2026

Comment thread .github/workflows/workflow-security.yml

Comment thread .github/scripts/check_runner_token_policy.py

Comment thread .github/workflows/docs.yml Outdated

swahtz requested a review from Copilot June 29, 2026 06:50

Copilot started reviewing on behalf of swahtz June 29, 2026 06:51

Copilot AI reviewed Jun 29, 2026

Comment thread .github/workflows/workflow-security.yml Outdated

swahtz requested a review from Copilot June 29, 2026 06:58

Copilot started reviewing on behalf of swahtz June 29, 2026 06:59

Copilot AI reviewed Jun 29, 2026

Comment thread .github/scripts/check_runner_token_policy.py

Potential fix for pull request finding

8a346f8

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Signed-off-by: Jonathan Swartz <jonathan@jswartz.info>

swahtz requested a review from Copilot June 29, 2026 07:05

Copilot started reviewing on behalf of swahtz June 29, 2026 07:05

Copilot AI reviewed Jun 29, 2026

swahtz requested a review from Copilot June 29, 2026 07:15

Copilot started reviewing on behalf of swahtz June 29, 2026 07:16

Copilot AI reviewed Jun 29, 2026

Comment thread .github/workflows/sync-doc-version.yml

swahtz requested a review from Copilot June 29, 2026 07:23

Copilot started reviewing on behalf of swahtz June 29, 2026 07:24

Copilot AI reviewed Jun 29, 2026

Comment thread .github/scripts/check_runner_token_policy.py Outdated

Comment thread .github/scripts/check_runner_token_policy.py

Comment thread .github/scripts/check_runner_token_policy.py Outdated

swahtz requested a review from Copilot June 29, 2026 10:05

Copilot started reviewing on behalf of swahtz June 29, 2026 10:06

Copilot AI reviewed Jun 29, 2026

harrism approved these changes Jun 30, 2026

swahtz merged commit 53ca3f1 into main Jun 30, 2026
40 checks passed

swahtz deleted the ci_improvements branch June 30, 2026 00:39

swahtz mentioned this pull request Jun 30, 2026

Workflow Security: scope bundled shellcheck to real issues #674

Merged
