Phase 4.b: drop runner to non-root user — second attempt with bootstrap debuggability

Part of plan #15. Follow-up to Phase 4 (#10) — originally intended to be part of that PR but reverted after the dogfood failure.

## Context

Phase 4 (#10 → PR #18) tried to drop the runner from root to a dedicated `runner` user via `sudo -u runner -H bash <<'RUNNER_BOOTSTRAP'` wrapping the download + configure + run sequence. The [terraform-provider-namecheap#182](https://github.com/namecheap/terraform-provider-namecheap/pull/182) dogfood failed at `Start self-hosted EC2 runner` with the 5-minute registration timeout — the EC2 instance came up but the runner never reached `./run.sh` polling GitHub.

PR #19 (the fix-forward) reverted the non-root transition and kept everything else from Phase 4 (runner-version input, `--ephemeral`, `--unattended`, `--disableupdate`, SHA-256 checksum, `set -euo pipefail`). Dogfood with that pin now passes.

## The missing piece

Instrumentation. We can't post-mortem the failed instance — it was ephemeral and got terminated by the stop-runner step before we could SSH in. Any second attempt at non-root needs to leave a breadcrumb trail we can pull up after the fact.

## Suggested approach

### 1. Add an opt-in debug mode (via the Phase 7 `debug` input)

When `debug: true`:
- `set -x` on the bootstrap so every command is echoed.
- Stream stdout+stderr to `/var/log/ec2-github-runner-bootstrap.log` (world-readable).
- Upload the log to S3 (bucket passed via new `debug-log-bucket` input) on exit, pass or fail. Requires an additional `s3:PutObject` permission on the runner's IAM instance profile.
- Or simpler: write the log to the instance metadata service as a `User-Data/Instance-Identity` field — no extra IAM, but size-limited.

### 2. Reproduce outside the ephemeral flow

Launch one test instance with the same AMI + the failing user-data, and watch the EC2 console output via `aws ec2 get-console-output`. No runner registration needed — just look at where cloud-init actually stopped.

### 3. Binary-search the bootstrap

Rather than adding back all the non-root machinery at once, add it in the smallest possible steps:
- Step a: `useradd runner` only. Keep root execution. Verify dogfood passes.
- Step b: `sudo -u runner true` before anything else. Verify dogfood passes.
- Step c: Move only the tarball download + extraction to the runner user. Keep root for `config.sh` + `run.sh`.
- Step d: Move `config.sh` as well.
- Step e: Move `run.sh`.

Any step that fails dogfood is the breaker and can be investigated in isolation.

## Likely suspects

- **`requiretty` or `Defaults env_reset` in sudoers** on the `DEVOPS/hardened-amazon-linux2023` AMI. Cloud-init runs user-data in a non-interactive, tty-less context; `requiretty` would kill `sudo -u runner`.
- **SELinux** enforcing a context on `/home/runner` that prevents `config.sh` from writing its `.runner` and `.credentials` files. Visible in `audit.log` / `journalctl`.
- **Heredoc quoting in my JS** — unlikely, but the quoted delimiter `<<'RUNNER_BOOTSTRAP'` vs unquoted might have interacted weirdly with the token containing special characters.
- **Runner user's PATH** — `config.sh` and `run.sh` inside `/home/runner/actions-runner/` execute via `./` which doesn't need `.` in PATH, but some internal runner code might.

## Acceptance criteria

- [ ] Debug mode exists that makes bootstrap failures diagnosable from outside the instance.
- [ ] Non-root runner user applied in the smallest coherent chunk that passes dogfood.
- [ ] A decision recorded about whether the hardened-AL2023 AMI's sudoers / SELinux config needs changing, or whether the action works around them.
- [ ] The `RUNNER_ALLOW_RUNASROOT=1` escape hatch is gone at the end of this work.

## Severity
Medium — security hardening, not a blocker. Phase 4 shipped the safer pieces; non-root is the outstanding goal.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Phase 4.b: drop runner to non-root user — second attempt with bootstrap debuggability #20

Context

The missing piece

Suggested approach

1. Add an opt-in debug mode (via the Phase 7 `debug` input)

2. Reproduce outside the ephemeral flow

3. Binary-search the bootstrap

Likely suspects

Acceptance criteria

Severity

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Phase 4.b: drop runner to non-root user — second attempt with bootstrap debuggability #20

Description

Context

The missing piece

Suggested approach

1. Add an opt-in debug mode (via the Phase 7 debug input)

2. Reproduce outside the ephemeral flow

3. Binary-search the bootstrap

Likely suspects

Acceptance criteria

Severity

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

1. Add an opt-in debug mode (via the Phase 7 `debug` input)