compute_text step strips all non-GitHub URLs from issue/PR/discussion bodies before the agent sees them #27638

@corygehr

Description

Summary

In v0.69.0, the Compute current body text step (id: sanitized, runs compute_text.cjs) redacts every URL in the triggering event's title/body whose hostname isn't on a hardcoded default allow-list — because the compiler never passes the workflow's allowed-domain configuration to that step. As a result, any URL a user pastes into an issue body (news articles, market pages, documentation, etc.) arrives at the agent as <domain>/redacted, even when:

  • the domain is listed under network.allowed in the workflow frontmatter, and
  • the domain is listed under safe-outputs.allowed-domains, and
  • the domain appears in the GH_AW_ALLOWED_DOMAINS env var that the compiler does wire up on the downstream output-ingest step.

This effectively breaks any workflow whose whole point is to ingest a user-supplied URL (triage, research, summarization, "explain this article", etc.).

Impact

  • Any workflow that trusts a URL from an issues, issue_comment, pull_request, discussion, or discussion_comment event body.
  • Users pasting URLs from Reuters, AP, BBC, Bloomberg, or any other legitimate, network-allow-listed source will see the agent fetch <that-domain>/redacted, which 404s. The agent has no signal that the URL was altered, so it often concludes the user submitted a broken link and closes/escalates incorrectly.
  • Workarounds require hand-editing the compiled .lock.yml, which is not durable across recompiles.

Expected Behavior

The compute_text / sanitized step should receive the same GH_AW_ALLOWED_DOMAINS value that the compiler already computes for the output-ingest step — i.e. the union of the engine/network base set, network.allowed, and safe-outputs.allowed-domains. Incoming-text sanitization and outgoing-content sanitization should apply the same allow-list; otherwise the two sides of the pipeline disagree about which domains are "known good."
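For concreteness, the expected union can be sketched as follows. The variable names below are illustrative only (the report names `computeExpandedAllowedDomainsForSanitization` / `computeAllowedDomainsForSanitization` as the real helpers); the domain lists come from this report's own examples:

```javascript
// Illustrative sketch of the union the compiler reportedly already computes
// for the output-ingest step. Names here are hypothetical, not the compiler's
// actual identifiers.
const engineBase = [
  "github.com", "github.io", "githubusercontent.com",
  "githubassets.com", "github.dev", "codespaces.new",
];
const networkAllowed = ["cnn.com"];      // from network.allowed in frontmatter
const safeOutputsAllowed = ["cnn.com"];  // from safe-outputs.allowed-domains

// Union, de-duplicated, then emitted on BOTH sanitize steps as
// GH_AW_ALLOWED_DOMAINS=<comma-separated list>.
const allowedDomains = [
  ...new Set([...engineBase, ...networkAllowed, ...safeOutputsAllowed]),
];
const envValue = allowedDomains.join(",");
```

The point is that the same `envValue` should reach both the `id: sanitized` step and the output-collection step, so both sides of the pipeline agree on which domains are "known good."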

Actual Behavior

In pkg/workflow/compiler_activation_job_builder.go (the NeedsTextOutput branch, around the step titled Compute current body text), the only env var emitted is GH_AW_ALLOWED_BOTS (when data.Bots is populated). GH_AW_ALLOWED_DOMAINS is not set, so at runtime sanitize_content_core.cjs#buildAllowedDomains falls back to the hardcoded default:

github.com, github.io, githubusercontent.com, githubassets.com, github.dev, codespaces.new

Any URL whose host is not on that list is rewritten to (<sanitized-domain>/redacted) by sanitizeUrlDomains before the text ever reaches the prompt construction step. The redaction is logged to /tmp/gh-aw/redacted-urls.log, but the agent itself has no awareness of which URLs were altered.
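The effective behavior can be modeled with the sketch below. This is a simplified stand-in for illustration, assuming suffix matching against the allow-list, not the actual `sanitize_content_core.cjs` source:

```javascript
// Simplified model of the redaction behavior described above; NOT the real
// sanitize_content_core.cjs implementation.
const DEFAULT_ALLOWED = [
  "github.com", "github.io", "githubusercontent.com",
  "githubassets.com", "github.dev", "codespaces.new",
];

function buildAllowedDomainsSketch(envValue) {
  // The real helper falls back to the hardcoded defaults when
  // GH_AW_ALLOWED_DOMAINS is absent -- which is the bug's trigger.
  return envValue
    ? envValue.split(",").map((d) => d.trim()).filter(Boolean)
    : DEFAULT_ALLOWED;
}

function redactUrlsSketch(text, allowed) {
  // Rewrite any URL whose hostname is not on (or under) an allowed domain.
  return text.replace(/https?:\/\/[^\s)]+/g, (url) => {
    const host = new URL(url).hostname;
    const ok = allowed.some((d) => host === d || host.endsWith("." + d));
    return ok ? url : `(${host}/redacted)`;
  });
}

const stripped = redactUrlsSketch(
  "see https://cnn.com/some-article",
  buildAllowedDomainsSketch(undefined) // env var not wired up
);
// → "see (cnn.com/redacted)"

const kept = redactUrlsSketch(
  "see https://cnn.com/some-article",
  buildAllowedDomainsSketch("cnn.com") // env var threaded through
);
// → "see https://cnn.com/some-article"
```

Under this model, wiring `GH_AW_ALLOWED_DOMAINS` into the step is sufficient for allow-listed URLs to survive, with no change to the sanitizer itself.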

Reproduction

  1. Create a workflow that triggers on issues: [opened, labeled], declares network.allowed including some external domain (e.g. cnn.com), and instructs the agent to fetch the URL from the issue body.
  2. Compile with gh aw compile.
  3. Open an issue whose body contains https://cnn.com/some-article.
  4. Inspect the agent transcript — the URL the agent received is cnn.com/redacted, and any fetch call against it returns 404.
  5. Inspect the compiled .lock.yml — the id: sanitized step has no env: block (or only GH_AW_ALLOWED_BOTS), while the downstream collect_output step does have the full GH_AW_ALLOWED_DOMAINS value. The two are inconsistent.
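A minimal repro workflow's frontmatter might look like the fragment below. The layout is sketched from the fields named in this report (`network.allowed`, `safe-outputs.allowed-domains`), so treat the exact nesting as an approximation rather than canonical gh-aw syntax:

```yaml
on:
  issues:
    types: [opened, labeled]
network:
  allowed:
    - cnn.com
safe-outputs:
  allowed-domains:
    - cnn.com
```

With this configuration, the author's reasonable expectation is that `https://cnn.com/...` URLs in the issue body reach the agent intact; in v0.69.0 they do not.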

Suggested Fix

In the NeedsTextOutput branch of the activation-job builder, emit the same GH_AW_ALLOWED_DOMAINS env var that generateOutputCollectionStep already emits (the value produced by computeExpandedAllowedDomainsForSanitization / computeAllowedDomainsForSanitization). That one-line change reuses logic that already exists and brings the incoming-text sanitizer into line with the outgoing-content sanitizer.

Optional follow-ups that would also be nice:

  • Surface a warning in compute_text.cjs when redactions occur (e.g. one-line summary to the step summary), so authors notice when user-supplied URLs are being stripped.
  • Document the incoming-text sanitizer and its relationship to network.allowed and safe-outputs.allowed-domains — today the only mention of sanitization in the docs is on the output side, so authors reasonably assume network.allowed covers "URLs my agent is allowed to read."

Workaround

Until this is fixed, workflows that need to accept user-supplied URLs must either:

  • post-process steps.sanitized.outputs.text in a custom step (fragile), or
  • instruct users to put the URL somewhere the sanitizer doesn't touch — e.g. as a label, a custom field, or a structured issue-form dropdown — which defeats the point of a free-form URL field.

Neither is a good long-term answer; the right fix is to thread the allow-list through to the sanitize step.

Environment

  • gh-aw v0.69.0
  • Runner: ubuntu:24.04
  • Engine: Copilot (reproduces regardless of engine — the sanitizer runs before the engine dispatch)
