Skip to content

feat: Phase 4 (retry) — non-root runner + --ephemeral + hardcoded checksum table#26

Merged
kurok merged 1 commit intofeat/al2023-supportfrom
feat/20-phase4-retry
Apr 21, 2026
Merged

feat: Phase 4 (retry) — non-root runner + --ephemeral + hardcoded checksum table#26
kurok merged 1 commit intofeat/al2023-supportfrom
feat/20-phase4-retry

Conversation

@kurok
Copy link
Copy Markdown

@kurok kurok commented Apr 21, 2026

Closes #20. Supersedes the reverted #18 / #19 / #21.

The fix for what actually broke previously

#18 and #19 died at:

curl -fsSL <tarball>.sha256 | awk '{print $1}'

with a 404 — actions/runner doesn't publish per-tarball sidecars, empirically confirmed via aws ec2 get-console-output --latest on a probe EC2 (see #20 for full console log).

This PR replaces that with a hardcoded {arch-version → sha256} table in src/runner-checksums.js. Two entries today (x86_64 + arm64 for v2.333.1). Cross-checked against the live release body on every PR via an overhauled verify-runner-url CI job, so drift between the table and upstream fails at code-review time, not at runtime.

Everything from the original Phase 4 issue (#10)

Requirement Landed
Non-root runner user useradd -m -s /bin/bash + sudo -u runner -H bash <<'RUNNER_BOOTSTRAP'
No RUNNER_ALLOW_RUNASROOT=1 ✅ removed
Configurable runner version ✅ new runner-version input, default 2.333.1
Checksum verification ✅ hardcoded table (not sidecar; sidecars don't exist)
--ephemeral on config.sh
--unattended on config.sh
--disableupdate on config.sh
set -euo pipefail ✅ on both outer and inner shells
Idempotent useradd if ! id runner … guard

CI: verify-runner-url overhaul

Previously just HEADed the tarball URL. Now additionally:

  1. Reads runner-version default from action.yml.
  2. Fetches release body via gh api /repos/actions/runner/releases/tags/v<version>.
  3. Greps BEGIN SHA linux-x64 / BEGIN SHA linux-arm64 HTML comments for the upstream hashes.
  4. Loads src/runner-checksums.js via node -e 'require(...)...' for the committed hashes.
  5. Fails the job if either differs.

So bumping runner-version in action.yml without also updating the table in the same PR → red CI. Bumping the table to incorrect values → red CI.

Tests

  • tests/runner-checksums.test.js — 6 new cases (table shape, hex format, per-version x64/arm64 parity, known/unknown lookup).
  • tests/config.test.js — 2 new cases (runner-version default + override).

Total: 36 → 44 tests.

Consumer impact

External contract unchanged. mode / github-token / ec2-image-* / instance type / subnet / SG / EIP / iam-role-name / aws-resource-tags inputs all work as before. Two new optional inputs (runner-version default 2.333.1, http-tokens already in master from #24).

Provider acctest impact — checked every step:

  • actions/checkout@v6 writes to $GITHUB_WORKSPACE — no root needed.
  • curl Go/Terraform tarballs to workspace — no root.
  • tar -C .go-instance -xzf / unzip -o -d .terraform-bin — workspace, no root.
  • make testacc = go test ./namecheap -run TestAcc — no root.

Workspace absolute path shifts from /actions-runner/_work/... to /home/runner/actions-runner/_work/... but $GITHUB_WORKSPACE / $HOME / relative paths all resolve consistently.

Dogfood plan

Rotate the provider's SHA pin after this merges (same pattern as every prior phase). This time I have the aws ec2 get-console-output --latest recipe documented privately; any new bootstrap failure is diagnosable in 3 minutes rather than opaque.

If dogfood surfaces an unrelated regression, the fix is small and scoped — we've already isolated the fragile axes.

…cksum table

Closes #20. Supersedes the reverted #18 / #19 / #21.

Implements the full Phase 4 bootstrap hardening from issue #10, with
the root-cause fix from #20 baked in. Key differences from the
earlier failed attempts:

## The fix for the actual failure

Previous attempts died at:

    curl -fsSL <tarball>.sha256 | awk '{print }'

with a 404 (actions/runner doesn't publish per-tarball sidecar files,
empirically confirmed via aws ec2 get-console-output on a probe
instance — see #20).

This PR replaces that with a hardcoded table of expected hashes in
src/runner-checksums.js, keyed by 'arch-version'. Two x86_64 / arm64
entries for the currently-pinned v2.333.1, sourced from the release
body at github.com/actions/runner/releases/tag/v2.333.1. CI enforces
table-vs-upstream consistency on every PR (see pr.yml).

## Everything else from Phase 4

- Non-root 'runner' user (useradd -m, sudo -u runner -H bash heredoc).
  RUNNER_ALLOW_RUNASROOT=1 escape hatch removed.

- New 'runner-version' input in action.yml (default '2.333.1'). To
  override, add matching x64+arm64 SHAs to runner-checksums.js in
  the same PR — verify-runner-url CI will reject the change if
  the hashes don't match upstream.

- --ephemeral --unattended --disableupdate on config.sh. GitHub
  auto-deregisters the runner after its job; disableupdate keeps
  the binary stable during the short ephemeral session.

- set -euo pipefail on both the outer and inner (runner-user) shells.
  The earlier fatal failure under set -e was the .sha256 404, which
  no longer exists.

- Paramaterized RUNNER_VERSION / TARBALL / BASE bash vars.

## Tests

tests/runner-checksums.test.js — 6 new cases covering the table
shape, hex format, x64+arm64 parity per version, lookup returns for
known/unknown keys.

tests/config.test.js — 2 new cases for the runner-version input
(default fallback + override).

Total: 36 -> 44 tests.

## CI: verify-runner-url overhaul

The job now parses the runner-version from action.yml, then:
1. HEADs the Linux x64 release asset (unchanged).
2. Fetches the release body via 'gh api'.
3. Greps the BEGIN SHA linux-x64 / linux-arm64 HTML comments.
4. Cross-checks against the values lookup() returns from
   src/runner-checksums.js.

Drift between the hardcoded table and upstream fails CI at code-
review time, not at runtime.

## Dogfood plan (MUCH more careful this time)

Provider SHA-pin rotation after merge, same pattern as prior phases.
This time I have full EC2 console-output diagnostic capability via
the recipe saved in my notes — any new bootstrap failure should be
trivially diagnosable rather than opaque.

Closing #20 on merge.

Signed-off-by: yuriyryabikov <22548029+kurok@users.noreply.github.com>
@kurok kurok merged commit 0fdd401 into feat/al2023-support Apr 21, 2026
4 checks passed
@kurok kurok deleted the feat/20-phase4-retry branch April 21, 2026 13:04
kurok added a commit to namecheap/terraform-provider-namecheap that referenced this pull request Apr 21, 2026
…phemeral) (#188)

namecheap/ec2-github-runner#26 merged. Phase 4 retry lands all
requirements from the original issue #10, with the .sha256 sidecar
404 bug fixed by a hardcoded {arch-version → sha256} table kept in
sync with upstream by a new CI check.

Rotation: 6bb148b (Phase 6.a, IMDSv2) -> 0fdd401 (Phase 4 retry).

Critical dogfood. If start-runner fails, I have the console-output
recipe ready — diagnosis turnaround is minutes, not a day.

Signed-off-by: yuriyryabikov <22548029+kurok@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant