revert: full rollback of Phase 4 bootstrap to Phase 1 known-good#21
Merged
kurok merged 2 commits intofeat/al2023-supportfrom Apr 21, 2026
Merged
revert: full rollback of Phase 4 bootstrap to Phase 1 known-good#21kurok merged 2 commits intofeat/al2023-supportfrom
kurok merged 2 commits intofeat/al2023-supportfrom
Conversation
Phase 4 attempts #18 (with non-root) and #19 (without non-root but keeping --ephemeral + checksum + set -euo pipefail + runner-version input) BOTH failed the provider dogfood with the same 6m15s runner registration timeout (terraform-provider-namecheap#182 and machulav#183). The fix-forward in #19 narrowed the suspect set from 'all Phase 4 changes' to 'one of: set -euo pipefail, --ephemeral flag, --disableupdate flag, checksum verify, parameterized bash vars'. Still not isolated. Full rollback here restores the known-good Phase 1 bootstrap exactly. Everything else from Phase 1 is preserved (aws-sdk v3, ncc 0.38, jest tests, .gitattributes). Phase 4 work is NOT abandoned — it moves to follow-up issues where each change lands on its own with its own dogfood, so the next failure isolates itself to a single axis instead of requiring bisection across five simultaneous changes. Files reverted to match a1bd2f9 (Phase 1 tip): - action.yml (drops runner-version input) - src/aws.js (original 12-line bash array, yum install libicu make, RUNNER_ALLOW_RUNASROOT=1, no --ephemeral, no checksum verify) - src/config.js (drops runnerVersion field) - tests/config.test.js (drops runner-version test block, 23 -> 21 tests) Dist rebuilt against the reverted src (verify-dist will confirm). Signed-off-by: yuriyryabikov <22548029+kurok@users.noreply.github.com>
Paired with the full Phase 4 revert — now that action.yml no longer has a runner-version default, the Phase 4 version of verify-runner-url that reads action.yml can't find the version. Restore the original extractor that greps the literal URL out of src/aws.js. Signed-off-by: yuriyryabikov <22548029+kurok@users.noreply.github.com>
This was referenced Apr 21, 2026
Merged
kurok
added a commit
to namecheap/terraform-provider-namecheap
that referenced
this pull request
Apr 21, 2026
…trap restored) (#184) namecheap/ec2-github-runner#21 merged. Phase 4 bootstrap changes are fully reverted to the Phase 1 known-good state. Everything else from Phase 1 is preserved (aws-sdk v3, ncc 0.38, jest, gitattributes). Expected: this dogfood passes. If it does, the previous two failures are confirmed to be caused by something in Phase 4's bootstrap deltas — most likely 'set -euo pipefail' interacting with an in-flight failure (`mount -o remount,size=1G /tmp` is the prime suspect per the hypothesis in ec2-github-runner#20). Rotation chain: 54459d6 (DEP0169 filter) a1bd2f9 (Phase 1, aws-sdk v3) -- dogfood green 7b949a3 (Phase 4 original, non-root) -- dogfood failed 78f98d1 (Phase 4 safe, no non-root) -- dogfood failed 249efbd (Phase 4 full revert to Phase 1 bootstrap) -- this PR Phase 4 work pauses here until #20 gets debug instrumentation so individual change re-introductions can be bisected. Signed-off-by: yuriyryabikov <22548029+kurok@users.noreply.github.com>
kurok
added a commit
that referenced
this pull request
Apr 21, 2026
…cksum table (#26) Closes #20. Supersedes the reverted #18 / #19 / #21. Implements the full Phase 4 bootstrap hardening from issue #10, with the root-cause fix from #20 baked in. Key differences from the earlier failed attempts: ## The fix for the actual failure Previous attempts died at: curl -fsSL <tarball>.sha256 | awk '{print }' with a 404 (actions/runner doesn't publish per-tarball sidecar files, empirically confirmed via aws ec2 get-console-output on a probe instance — see #20). This PR replaces that with a hardcoded table of expected hashes in src/runner-checksums.js, keyed by 'arch-version'. Two x86_64 / arm64 entries for the currently-pinned v2.333.1, sourced from the release body at github.com/actions/runner/releases/tag/v2.333.1. CI enforces table-vs-upstream consistency on every PR (see pr.yml). ## Everything else from Phase 4 - Non-root 'runner' user (useradd -m, sudo -u runner -H bash heredoc). RUNNER_ALLOW_RUNASROOT=1 escape hatch removed. - New 'runner-version' input in action.yml (default '2.333.1'). To override, add matching x64+arm64 SHAs to runner-checksums.js in the same PR — verify-runner-url CI will reject the change if the hashes don't match upstream. - --ephemeral --unattended --disableupdate on config.sh. GitHub auto-deregisters the runner after its job; disableupdate keeps the binary stable during the short ephemeral session. - set -euo pipefail on both the outer and inner (runner-user) shells. The earlier fatal failure under set -e was the .sha256 404, which no longer exists. - Paramaterized RUNNER_VERSION / TARBALL / BASE bash vars. ## Tests tests/runner-checksums.test.js — 6 new cases covering the table shape, hex format, x64+arm64 parity per version, lookup returns for known/unknown keys. tests/config.test.js — 2 new cases for the runner-version input (default fallback + override). Total: 36 -> 44 tests. ## CI: verify-runner-url overhaul The job now parses the runner-version from action.yml, then: 1. HEADs the Linux x64 release asset (unchanged). 2. Fetches the release body via 'gh api'. 3. Greps the BEGIN SHA linux-x64 / linux-arm64 HTML comments. 4. Cross-checks against the values lookup() returns from src/runner-checksums.js. Drift between the hardcoded table and upstream fails CI at code- review time, not at runtime. ## Dogfood plan (MUCH more careful this time) Provider SHA-pin rotation after merge, same pattern as prior phases. This time I have full EC2 console-output diagnostic capability via the recipe saved in my notes — any new bootstrap failure should be trivially diagnosable rather than opaque. Closing #20 on merge. Signed-off-by: yuriyryabikov <22548029+kurok@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes follow-ups: Phase 4's bootstrap changes need a redo. Full rollback here so the provider's acceptance tests work again.
Why full revert
Two dogfood attempts failed with the identical 6m15s registration timeout:
--ephemeral+--disableupdate+ checksum +set -euo pipefail+ runner-version inputStart self-hosted EC2 runnerBoth failures'
Start self-hosted EC2 runnerstep times out after 5 min of polling for a registered runner, meaning the user-data bootstrap never reached./run.sh. Since #19 removed the non-root transition and still fails, the breaker is NOT the non-root move — it's somewhere in the rest of Phase 4's changes (--ephemeral,--disableupdate, checksum verify,set -euo pipefail, or parameterized bash vars).Rather than another narrow-search iteration, revert the whole block to Phase 1 bytes and restart Phase 4 with one change per PR + one dogfood per PR.
What this PR does
Checkout
a1bd2f9's (Phase 1 tip)src/aws.js/src/config.js/action.yml/tests/into the current tree. Rebuilddist/. Everything else from Phase 1 stays intact:runner-versiontests).gitattributesline-ending normalization fordist/What's lost
All five Phase 4 changes need to re-land via new issues / PRs:
runner-versioninput (additive, zero runtime risk — easy relaunch)set -euo pipefailin user-data (the most likely actual breaker IMO — could be turning a previously-silent non-fatal step into a hard stop)--ephemeral+--unattended+--disableupdateon config.sh (could be flag-combination incompatibility with the installed runner version).sha256URL returns an unexpected body)RUNNER_VERSION/TARBALL/BASEbash vars (lowest risk)Follow-up issue #20 (non-root move) is unchanged — still waiting on debug instrumentation.
Next steps post-merge