## Summary
In v0.69.0, the "Compute current body text" step (`id: sanitized`, which runs `compute_text.cjs`) redacts every URL in the triggering event's title/body whose hostname isn't on a hardcoded default allow-list, because the compiler never passes the workflow's allowed-domain configuration to that step. As a result, any URL a user pastes into an issue body (news articles, market pages, documentation, etc.) reaches the agent as `<domain>/redacted`, even when:
- the domain is listed under `network.allowed` in the workflow frontmatter, and
- the domain is listed under `safe-outputs.allowed-domains`, and
- the domain appears in the `GH_AW_ALLOWED_DOMAINS` env var that the compiler does wire up on the downstream output-ingest step.
This effectively breaks any workflow whose whole point is to ingest a user-supplied URL (triage, research, summarization, "explain this article", etc.).
## Impact
- Any workflow that trusts a URL from an `issues`, `issue_comment`, `pull_request`, `discussion`, or `discussion_comment` event body is affected.
- Users pasting URLs from Reuters, AP, BBC, Bloomberg, or any other legitimate, network-allow-listed source will see the agent fetch `<that-domain>/redacted`, which 404s. The agent gets no signal that the URL was altered, so it often concludes the user submitted a broken link and closes or escalates incorrectly.
- Workarounds require hand-editing the compiled `.lock.yml`, which does not survive recompiles.
## Expected Behavior
The `compute_text` / `sanitized` step should receive the same `GH_AW_ALLOWED_DOMAINS` value that the compiler already computes for the output-ingest step: the union of the engine/network base set, `network.allowed`, and `safe-outputs.allowed-domains`. Incoming-text sanitization and outgoing-content sanitization should apply the same allow-list; otherwise the two sides of the pipeline disagree about which domains are "known good."
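The expected wiring can be sketched as follows. The function and variable names here are illustrative, not the compiler's actual identifiers; only the base domain list comes from this report.

```javascript
// Illustrative sketch: build ONE allow-list from the three sources the
// compiler already knows about, and hand the same list to both the
// incoming-text sanitizer and the outgoing-content sanitizer.
// The base set mirrors the hardcoded default named in this report.
const BASE_DOMAINS = [
  "github.com", "github.io", "githubusercontent.com",
  "githubassets.com", "github.dev", "codespaces.new",
];

function unionAllowedDomains(networkAllowed, safeOutputAllowed) {
  // Deduplicate while keeping a stable order: base set first,
  // then workflow-level additions.
  return [...new Set([...BASE_DOMAINS, ...networkAllowed, ...safeOutputAllowed])];
}

const allowList = unionAllowedDomains(["cnn.com"], ["example.org"]);
// Both sanitizer steps would then receive the same env value:
const envValue = allowList.join(",");
```

With this shape, `cnn.com` from `network.allowed` survives sanitization on both sides of the pipeline instead of only on the output side.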
## Actual Behavior
In `pkg/workflow/compiler_activation_job_builder.go` (the `NeedsTextOutput` branch, around the step titled "Compute current body text"), the only env var emitted is `GH_AW_ALLOWED_BOTS` (when `data.Bots` is populated). `GH_AW_ALLOWED_DOMAINS` is never set, so at runtime `sanitize_content_core.cjs#buildAllowedDomains` falls back to the hardcoded default:

`github.com`, `github.io`, `githubusercontent.com`, `githubassets.com`, `github.dev`, `codespaces.new`
Any URL whose host is not on that list is rewritten to `(<sanitized-domain>/redacted)` by `sanitizeUrlDomains` before the text ever reaches the prompt-construction step. The redaction is logged to `/tmp/gh-aw/redacted-urls.log`, but the agent itself has no awareness of which URLs were altered.
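A minimal reimplementation of the observed effect (a sketch only, not the actual `sanitize_content_core.cjs` code):

```javascript
// Sketch of the redaction behavior described above: any URL whose
// hostname is not on the allow-list (subdomains of allowed entries
// pass) is rewritten to "(<hostname>/redacted)".
const DEFAULT_ALLOWED = [
  "github.com", "github.io", "githubusercontent.com",
  "githubassets.com", "github.dev", "codespaces.new",
];

function isAllowed(hostname, allowList) {
  return allowList.some((d) => hostname === d || hostname.endsWith("." + d));
}

function redactUrls(text, allowList = DEFAULT_ALLOWED) {
  // Rewrite each http(s) URL whose host fails the allow-list check.
  return text.replace(/https?:\/\/[^\s)]+/g, (url) => {
    const host = new URL(url).hostname;
    return isAllowed(host, allowList) ? url : `(${host}/redacted)`;
  });
}
```

For example, `redactUrls("see https://cnn.com/some-article")` yields `"see (cnn.com/redacted)"`, while the same call with an allow-list that includes `cnn.com` leaves the URL intact. That difference is exactly what the missing `GH_AW_ALLOWED_DOMAINS` wiring controls.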
## Reproduction
1. Create a workflow that triggers on `issues: [opened, labeled]`, declares `network.allowed` including some external domain (e.g. `cnn.com`), and instructs the agent to fetch the URL from the issue body.
2. Compile with `gh aw compile`.
3. Open an issue whose body contains `https://cnn.com/some-article`.
4. Inspect the agent transcript: the URL the agent received is `cnn.com/redacted`, and any fetch call against it returns 404.
5. Inspect the compiled `.lock.yml`: the `id: sanitized` step has no `env:` block (or only `GH_AW_ALLOWED_BOTS`), while the downstream `collect_output` step does carry the full `GH_AW_ALLOWED_DOMAINS` value. The two are inconsistent.
## Suggested Fix
In the NeedsTextOutput branch of the activation-job builder, emit the same GH_AW_ALLOWED_DOMAINS env var that generateOutputCollectionStep already emits (the value produced by computeExpandedAllowedDomainsForSanitization / computeAllowedDomainsForSanitization). That single line reuses logic that already exists and brings the incoming-text sanitizer into line with the outgoing-content sanitizer.
Optional follow-ups that would also be nice:
- Surface a warning in `compute_text.cjs` when redactions occur (e.g. a one-line note in the step summary), so authors notice when user-supplied URLs are being stripped.
- Document the incoming-text sanitizer and its relationship to `network.allowed` and `safe-outputs.allowed-domains`. Today the only mention of sanitization in the docs is on the output side, so authors reasonably assume `network.allowed` covers "URLs my agent is allowed to read."
## Workaround
Until this is fixed, workflows that need to accept user-supplied URLs must either:
- post-process `steps.sanitized.outputs.text` in a custom step (fragile), or
- instruct users to put the URL somewhere the sanitizer doesn't touch, e.g. a label, a custom field, or a structured issue-form dropdown, which defeats the point of a free-form URL field.
Neither is a good long-term answer; the right fix is to thread the allow-list through to the sanitize step.
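For completeness, one shape the custom-step workaround can take: extract URLs from the raw event payload (e.g. `github.event.issue.body`) rather than from the already-redacted `steps.sanitized.outputs.text`, since redaction is not reversible. Everything below is illustrative; note that reading the raw body bypasses the sanitizer's protections, which is part of why this is fragile.

```javascript
// Illustrative workaround sketch: pull URLs out of the RAW event body
// before sanitization touches them, keeping only hosts the workflow
// author considers safe. The allow-list stands in for network.allowed.
function extractAllowedUrls(rawBody, allowedHosts) {
  const urls = rawBody.match(/https?:\/\/[^\s)]+/g) || [];
  return urls.filter((u) => {
    try {
      const host = new URL(u).hostname;
      return allowedHosts.some((d) => host === d || host.endsWith("." + d));
    } catch {
      return false; // skip anything that does not parse as a URL
    }
  });
}
```

A custom step could expose `extractAllowedUrls(...).join("\n")` as a step output for the prompt, but as noted above, this has to be re-applied after every recompile decision about step ordering, so it is not a durable fix.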
## Environment
- gh-aw v0.69.0
- Runner: `ubuntu:24.04`
- Engine: Copilot (reproduces regardless of engine; the sanitizer runs before engine dispatch)