feat: Phase 4 (retry) - non-root runner + --ephemeral + hardcoded checksum table #26
Merged
kurok merged 1 commit into feat/al2023-support on Apr 21, 2026
Closes #20. Supersedes the reverted #18 / #19 / #21. Implements the full Phase 4 bootstrap hardening from issue #10, with the root-cause fix from #20 baked in. Key differences from the earlier failed attempts:

## The fix for the actual failure

Previous attempts died at:

    curl -fsSL <tarball>.sha256 | awk '{print }'

with a 404: actions/runner doesn't publish per-tarball sidecar files (empirically confirmed via aws ec2 get-console-output on a probe instance; see #20). This PR replaces that with a hardcoded table of expected hashes in src/runner-checksums.js, keyed by 'arch-version'. It holds two entries (x86_64 and arm64) for the currently pinned v2.333.1, sourced from the release body at github.com/actions/runner/releases/tag/v2.333.1. CI enforces table-vs-upstream consistency on every PR (see pr.yml).

## Everything else from Phase 4

- Non-root 'runner' user (useradd -m, sudo -u runner -H bash heredoc). The RUNNER_ALLOW_RUNASROOT=1 escape hatch is removed.
- New 'runner-version' input in action.yml (default '2.333.1'). To override it, add matching x64 + arm64 SHAs to runner-checksums.js in the same PR; the verify-runner-url CI job will reject the change if the hashes don't match upstream.
- --ephemeral --unattended --disableupdate on config.sh. GitHub auto-deregisters the runner after its job; --disableupdate keeps the binary stable during the short ephemeral session.
- set -euo pipefail on both the outer and inner (runner-user) shells. The earlier fatal failure under set -e was the .sha256 404, which this PR removes.
- Parameterized RUNNER_VERSION / TARBALL / BASE bash vars.

## Tests

tests/runner-checksums.test.js: 6 new cases covering table shape, hex format, x64 + arm64 parity per version, and lookup results for known and unknown keys.
tests/config.test.js: 2 new cases for the runner-version input (default fallback + override).
Total: 36 -> 44 tests.

## CI: verify-runner-url overhaul

The job now parses runner-version from action.yml, then:

1. HEADs the Linux x64 release asset (unchanged).
2. Fetches the release body via 'gh api'.
3. Greps the BEGIN SHA linux-x64 / linux-arm64 HTML comments.
4. Cross-checks those hashes against the values lookup() returns from src/runner-checksums.js.

Drift between the hardcoded table and upstream fails CI at code-review time, not at runtime.

## Dogfood plan (MUCH more careful this time)

Rotate the provider SHA pin after merge, same pattern as prior phases. This time I have full EC2 console-output diagnostic capability via the recipe saved in my notes, so any new bootstrap failure should be trivially diagnosable rather than opaque.

Closing #20 on merge.

Signed-off-by: yuriyryabikov <22548029+kurok@users.noreply.github.com>
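The commit message describes the Phase 4 pieces separately. The JavaScript sketch below (matching the language of `src/runner-checksums.js`) shows one way they could fit together. It is a hypothetical illustration, not the merged code: the `'x64-2.333.1'` key format, the `lookup()` / `shq()` / `buildUserData()` signatures, and the all-zero/all-one hashes are assumptions and placeholders. It follows the described shape: a frozen `{arch-version → sha256}` table, a lookup that fails before any instance launches, an outer root shell that only creates the `runner` user behind an `if ! id runner` guard, and a quoted `RUNNER_BOOTSTRAP` heredoc run via `sudo -u runner -H bash` that checks the tarball with `sha256sum -c` before calling `config.sh --ephemeral --unattended --disableupdate`.

```javascript
// Hypothetical sketch, not the code merged in #26. The key format and all
// function names/parameters are assumptions for illustration.
// The hashes are PLACEHOLDERS; real values come from the release body at
// github.com/actions/runner/releases/tag/v2.333.1.
const RUNNER_CHECKSUMS = Object.freeze({
  'x64-2.333.1': '0'.repeat(64), // placeholder, NOT the real hash
  'arm64-2.333.1': '1'.repeat(64), // placeholder, NOT the real hash
});

// Expected sha256 for an arch/version pair, or undefined when not in the table.
function lookup(arch, version) {
  return RUNNER_CHECKSUMS[`${arch}-${version}`];
}

// Single-quote a value for bash: each ' becomes '\'' inside the quotes.
function shq(value) {
  return "'" + String(value).replace(/'/g, "'\\''") + "'";
}

// Builds EC2 user data. The root shell only creates the 'runner' user; the
// rest runs as that user inside a quoted heredoc, so the outer bash passes the
// body through unexpanded and the values below are baked in by JavaScript.
function buildUserData({ arch, version, repoUrl, token, label }) {
  const sha = lookup(arch, version);
  if (!sha) {
    // Fail before launching an instance instead of at bootstrap time.
    // (A successful lookup also means arch/version are known-safe strings.)
    throw new Error(`no checksum for ${arch}-${version}; add it to the table`);
  }
  return [
    '#!/bin/bash',
    'set -euo pipefail',
    // Idempotent: only create the user if it does not exist yet.
    'if ! id runner >/dev/null 2>&1; then useradd -m -s /bin/bash runner; fi',
    "sudo -u runner -H bash <<'RUNNER_BOOTSTRAP'",
    'set -euo pipefail',
    `RUNNER_VERSION=${shq(version)}`,
    `TARBALL="actions-runner-linux-${arch}-$RUNNER_VERSION.tar.gz"`,
    'BASE="https://github.com/actions/runner/releases/download/v$RUNNER_VERSION"',
    // -H sets HOME=/home/runner, so the runner lives under the user's home.
    'mkdir -p "$HOME/actions-runner" && cd "$HOME/actions-runner"',
    'curl -fsSL -o "$TARBALL" "$BASE/$TARBALL"',
    // Verify against the committed hash; no .sha256 sidecar download.
    `echo "${sha}  $TARBALL" | sha256sum -c -`,
    'tar -xzf "$TARBALL"',
    `./config.sh --url ${shq(repoUrl)} --token ${shq(token)} --labels ${shq(label)} \\`,
    '  --ephemeral --unattended --disableupdate',
    './run.sh',
    'RUNNER_BOOTSTRAP',
  ].join('\n');
}
```

Baking values in at build time, rather than exporting them from the outer shell, sidesteps `sudo` typically resetting the environment; a bumped `runner-version` with no table entry throws in the action instead of failing on the instance.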
kurok added a commit to namecheap/terraform-provider-namecheap that referenced this pull request on Apr 21, 2026
…phemeral) (#188)

namecheap/ec2-github-runner#26 merged. The Phase 4 retry lands all requirements from the original issue #10, with the .sha256 sidecar 404 bug fixed by a hardcoded {arch-version → sha256} table kept in sync with upstream by a new CI check.

Rotation: 6bb148b (Phase 6.a, IMDSv2) -> 0fdd401 (Phase 4 retry).

Critical dogfood. If start-runner fails, I have the console-output recipe ready; diagnosis turnaround is minutes, not a day.

Signed-off-by: yuriyryabikov <22548029+kurok@users.noreply.github.com>
Closes #20. Supersedes the reverted #18 / #19 / #21.
The fix for what actually broke previously
#18 and #19 died at `curl -fsSL <tarball>.sha256 | awk '{print }'` with a 404: actions/runner doesn't publish per-tarball sidecars, as empirically confirmed via `aws ec2 get-console-output --latest` on a probe EC2 instance (see #20 for the full console log). This PR replaces that with a hardcoded `{arch-version → sha256}` table in `src/runner-checksums.js`, with two entries today (x86_64 + arm64 for v2.333.1). The table is cross-checked against the live release body on every PR via an overhauled `verify-runner-url` CI job, so drift between the table and upstream fails at code-review time, not at runtime.
Everything from the original Phase 4 issue (#10)
- `runner` user: `useradd -m -s /bin/bash` + `sudo -u runner -H bash <<'RUNNER_BOOTSTRAP'`
- `RUNNER_ALLOW_RUNASROOT=1`: removed
- `runner-version` input, default `2.333.1`
- `--ephemeral` on config.sh
- `--unattended` on config.sh
- `--disableupdate` on config.sh
- `set -euo pipefail`
- `useradd`: `if ! id runner …` guard

CI: verify-runner-url overhaul
Previously the job just HEADed the tarball URL. Now it additionally:

1. Parses the `runner-version` default from `action.yml`.
2. Fetches the release body with `gh api /repos/actions/runner/releases/tags/v<version>`.
3. Greps the `BEGIN SHA linux-x64` / `BEGIN SHA linux-arm64` HTML comments for the upstream hashes.
4. Loads `src/runner-checksums.js` via `node -e 'require(...)...'` for the committed hashes and compares the two.

So bumping `runner-version` in `action.yml` without also updating the table in the same PR → red CI. Bumping the table to incorrect values → red CI.

Tests

- `tests/runner-checksums.test.js`: 6 new cases (table shape, hex format, per-version x64/arm64 parity, known/unknown lookup).
- `tests/config.test.js`: 2 new cases (`runner-version` default + override).

Total: 36 → 44 tests.
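The cross-check behind the overhauled `verify-runner-url` job is small enough to sketch. The JavaScript below is illustrative only: `upstreamSha()` and `findDrift()` are hypothetical names, it reuses the assumed `'arch-version'` key format, and it assumes each upstream hash sits between a `<!-- BEGIN SHA linux-x64 -->` comment and a matching `<!-- END SHA linux-x64 -->` comment (the END marker is an assumption). It also folds in the hex-format and x64/arm64 parity properties the new unit tests cover.

```javascript
// Hypothetical sketch of the table-vs-upstream check; not the actual pr.yml.
// Assumes release-body markers of the form
//   <!-- BEGIN SHA linux-x64 -->HEX<!-- END SHA linux-x64 -->
// and committed keys of the form 'x64-<version>' / 'arm64-<version>'.

// Extract the 64-hex-digit hash for one platform, or undefined if absent.
function upstreamSha(body, platform) {
  const re = new RegExp(
    `<!-- BEGIN SHA ${platform} -->\\s*([0-9a-f]{64})\\s*<!-- END SHA ${platform} -->`,
    'i',
  );
  const m = body.match(re);
  return m ? m[1].toLowerCase() : undefined;
}

// Compare the committed hashes for one version against the release body.
// Returns human-readable problems; an empty array means the table matches.
function findDrift(table, version, body) {
  const problems = [];
  for (const arch of ['x64', 'arm64']) {
    const key = `${arch}-${version}`;
    const committed = table[key];
    const upstream = upstreamSha(body, `linux-${arch}`);
    if (!committed) {
      problems.push(`missing table entry ${key}`); // x64/arm64 parity
    } else if (!/^[0-9a-f]{64}$/.test(committed)) {
      problems.push(`bad hex for ${key}`); // lowercase 64-char hex only
    }
    if (!upstream) problems.push(`no upstream hash for linux-${arch}`);
    if (committed && upstream && committed !== upstream) {
      problems.push(`drift for ${key}: table ${committed} != upstream ${upstream}`);
    }
  }
  return problems;
}
```

In CI, the body could come from `gh api /repos/actions/runner/releases/tags/v<version> --jq .body` and the table from `src/runner-checksums.js`; any non-empty result would fail the job.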
Consumer impact
External contract unchanged.
`mode` / `github-token` / `ec2-image-*` / instance type / subnet / SG / EIP / `iam-role-name` / `aws-resource-tags` inputs all work as before. Two new optional inputs: `runner-version` (default `2.333.1`) and `http-tokens` (already in master from #24).

Provider acctest impact (checked every step):
- `actions/checkout@v6` writes to `$GITHUB_WORKSPACE`: no root needed.
- `curl` of the Go/Terraform tarballs into the workspace: no root.
- `tar -C .go-instance -xzf` / `unzip -o -d .terraform-bin`: workspace, no root.
- `make testacc` = `go test ./namecheap -run TestAcc`: no root.

The workspace absolute path shifts from `/actions-runner/_work/...` to `/home/runner/actions-runner/_work/...`, but `$GITHUB_WORKSPACE`, `$HOME`, and relative paths all resolve consistently.

Dogfood plan
Rotate the provider's SHA pin after this merges (same pattern as every prior phase). This time I have the `aws ec2 get-console-output --latest` recipe documented privately, so any new bootstrap failure is diagnosable in 3 minutes rather than opaque.

If dogfood surfaces an unrelated regression, the fix is small and scoped: we've already isolated the fragile axes.