feat: structured logging + opt-in debug mode (Phase 7)#22
Merged
kurok merged 1 commit intofeat/al2023-supportfrom Apr 21, 2026
Merged
feat: structured logging + opt-in debug mode (Phase 7)#22kurok merged 1 commit intofeat/al2023-supportfrom
kurok merged 1 commit intofeat/al2023-supportfrom
Conversation
Closes #13. Part of plan #15. ## Motivation Phase 4's dogfood failures took three iterations and a full revert to diagnose because the ephemeral EC2 instance terminates before anyone can look at cloud-init logs. Structured action-side logs are the cheap half of the diagnostic story (see #20 for the AWS-side upload- to-S3 trap that completes it). ## New module: src/log.js JSON-shaped events via core.info / core.warning / core.error: log.info('run_instance', { ami_id: 'ami-0x', instance_type: 't3.medium', subnet_id: 'subnet-aaa', sg_id: 'sg-bbb', label: 'abcde', }); // -> ::notice ::{"step":"run_instance","mode":"start",...} Features: - Lazy require of ./config so validation-phase log calls don't crash when config's try/catch hasn't finished running new Config(). - sanitize() redacts values under a small secret-key allowlist before emitting (githubToken, github-token, token, aws-access-key-id, aws-secret-access-key, GPG_PRIVATE_KEY, password). - log.debug(...) is a no-op unless config.input.debug === 'true'. ## New input: debug action.yml declares 'debug: false' as default. Consumers opt in by setting debug: 'true' in their workflow; the action then emits the extra diagnostic traffic (full DescribeImages response, wait_for_runner poll details, input echo) without changing default output. src/config.js reads the input with a 'false' fallback so old callers that don't set it continue to work. ## Call sites instrumented src/aws.js: - wait_for_instance (start + end + error) - describe_images (request + selection + full-list under debug) - run_instance (request + result + error) - associate_address (attempt + warn on failure) - terminate_instance (start + end + error) src/gh.js: - gh_registration_token (attempt + success + error) - remove_runner (attempt + success + skip + error) - wait_for_runner (final outcome + per-poll details under debug) src/index.js: - start/stop bracketed by core.startGroup/endGroup so the Actions log collapses cleanly. - Top-level 'fatal' log on uncaught failure. - start_inputs / stop_inputs logged under debug (sanitized). ## Tests tests/log.test.js — 7 new cases: - info / warn / error emit through the right core.* channels with step + mode + fields baked into the JSON payload. - debug emits nothing when config.input.debug !== 'true'. - debug emits when config.input.debug === 'true'. - sanitize redacts recursively. - Payload-level redaction works (info with a {githubToken: …} field). tests/config.test.js — 2 new cases for the debug input (default + explicit override). Total: 23 -> 30 tests. ## Unblocks Phase 4.b (non-root retry, #20) now has visibility into what the bootstrap is doing. Combined with the S3-upload trap on the AWS side (also scoped in #20), any future bootstrap regression is diagnosable from outside the instance. Signed-off-by: yuriyryabikov <22548029+kurok@users.noreply.github.com>
This was referenced Apr 21, 2026
kurok
added a commit
to namecheap/terraform-provider-namecheap
that referenced
this pull request
Apr 21, 2026
…185) namecheap/ec2-github-runner#22 merged. Phase 7 adds JSON-shaped log events on every lifecycle step, an opt-in 'debug' input, and core.startGroup/endGroup wrapping so the Actions log collapses cleanly. No behavior change to the EC2 bootstrap or runner registration — this is pure action-side observability. Default-mode output grows by one JSON line per notable event (~10 lines total for a full start+stop cycle). Setting debug: 'true' in workflow yaml would add verbose traffic, but that's opt-in. Rotation: 249efbd (Phase 4 revert) -> b1b8d6d (Phase 7). Signed-off-by: yuriyryabikov <22548029+kurok@users.noreply.github.com>
This was referenced Apr 21, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #13. Part of plan #15.
Motivation
Phase 4's dogfood failures (#18, #19) took three iterations and a full revert to diagnose because the ephemeral EC2 instance is terminated by the
stop-runnerstep before anyone can look at cloud-init logs. Structured action-side logs are the cheap half of the diagnostic story (see #20 for the AWS-sideexec > … | tee | aws s3 cp …trap that completes it).What lands
src/log.js— JSON-shaped events viacore.info/core.warning/core.error. Lazy-requires./configso validation-phase log calls don't crash before Config is constructed. Auto-redacts values under a small secret-key allowlist (githubToken,token,aws-access-key-id, …).debuginput (defaultfalse). When set totrue, the action emits extra diagnostic traffic — fullDescribeImagesresponse, per-pollwait_for_runnerdetails, sanitized input echo. No change to default output.src/aws.js/src/gh.js/src/index.js:run_instance,describe_images,wait_for_instance,associate_address,terminate_instance,gh_registration_token,remove_runner,wait_for_runner.core.startGroup/endGrouparoundstart()andstop()so the Actions run log collapses.tests/log.test.js, 2 new intests/config.test.js. Total 23 → 30.Example output
With
debug: false(default) — one JSON line per notable event:With
debug: true— adds:Secrets never appear in logs. The sanitize-then-stringify pass in
emit()runs on every field payload, including recursively through nested objects.Compatibility
debugdefaults tofalse→ no additional log traffic.label,ec2-instance-id) work exactly as before.Unblocks
log.info('retry', {attempt, delay_ms, reason})without more boilerplate.Dogfood
Behavior change is log-output-only — nothing in this PR affects how the EC2 instance boots or how the runner registers. Rotating the provider's pin to this tip should be risk-free; if the pin rotation's CI passes it validates the full action pipeline under the new logging.