feat(safeoutputs): add ado-aw-debug.create-issue for dogfood pipelines by jamesadevine · Pull Request #492 · githubnext/ado-aw

jamesadevine · 2026-05-10T06:27:24Z

Summary

Adds a new top-level ado-aw-debug: front-matter section that gates two debug-only knobs intended for dogfooding ado-aw pipelines from Azure DevOps back into githubnext/ado-aw:

skip-integrity: bool — OR-ed with the existing --skip-integrity CLI flag.
create-issue: — files a GitHub issue against an operator-configured target repo when the agent calls the new create-issue MCP tool.

The create-issue tool is not a regular safe output. It is default-deny at three independent layers — the MCP filter, the compiler, and Stage 3 — so it can't be reached unless the operator explicitly opts in via ado-aw-debug.create-issue:.

Front-matter shape

ado-aw-debug:
  skip-integrity: true                    # optional; OR-ed with --skip-integrity
  create-issue:
    target-repo: githubnext/ado-aw        # required; operator-only (agent can't override)
    title-prefix: "[pipeline-failure] "
    labels: [pipeline-failure]
    allowed-labels: ["agent-*"]           # default-DENY (empty = no agent labels)
    assignees: [jamesdevine]
    max: 3

Three-layer default-deny gate

MCP layer — DEBUG_ONLY_TOOLS strips create-issue from the SafeOutputs router unless explicitly enabled. Holds even when --enabled-tools is empty (the regular permissive default).
Compiler layer — --enabled-tools create-issue is emitted only when ado-aw-debug.create-issue: is set; safe-outputs.create-issue: is rejected outright at compile time so the tool can't be smuggled in via the regular safe-outputs surface.
Executor layer — new ExecutionContext.debug_enabled_tools set is populated only from ado-aw-debug:. Stage 3 refuses any create-issue NDJSON entry whose tool name is absent from that set, closing the gap where a forged entry could otherwise bypass the MCP-layer gate.

The third layer was added in response to a rubber-duck review that surfaced the MCP-only gate as insufficient for a write-capable GitHub PAT.

Stage 3 PAT plumbing

New dedicated ADO_AW_DEBUG_GITHUB_TOKEN ADO pipeline variable, separate from the read-only GITHUB_TOKEN the agent sees in Stage 1.
New ExecutionContext.github_token: Option<String> field, sourced from that env var.
Executor uses GitHub REST API: POST /repos/{owner}/{repo}/issues with bearer token, User-Agent: ado-aw/{version}, and X-GitHub-Api-Version: 2022-11-28.
Auto-footer with stable  marker, build ID, run URL, pipeline name, and trigger reason.

Other rubber-duck-driven hardenings

allowed-labels: [] is now default-deny — operators must opt in to unrestricted with ["*"].
Tightened target-repo regex to match GitHub's login spec (no underscores or dots in owner segments).
Final issue title length validated after title-prefix application.
Stage 3 error messages neutralise ##vso[…] sequences in agent-supplied labels so a forged NDJSON entry can't echo a live pipeline command into stdout.

Files

New: src/safeoutputs/create_issue.rs, docs/ado-aw-debug.md, examples/dogfood-failure-reporter.md, tests/fixtures/ado-aw-debug-agent.md.
Modified: 14 files across compiler, MCP server, executor, types, docs, and integration tests.
Diff: 18 files, +1,888/-46.

Out of scope (per planning Q&A)

Pipeline-failure detection workflow that uses create-issue (separate decision).
gh-aw parity beyond the minimal field set (sub-issues, group-by-day, close-older, etc.).
Automated provisioning of the ADO_AW_DEBUG_GITHUB_TOKEN variable via ado-aw configure.

Test plan

Unit tests in src/safeoutputs/create_issue.rs — 23 tests covering deserialisation, validation, sanitisation, the three-layer gate (including a forged-NDJSON scenario), default-deny allowed-labels, post-prefix title-length validation, neutralisation of pipeline-command sequences in error strings, and target-repo regex edge cases.
MCP filter tests in src/mcp.rs — 3 new tests asserting create-issue is stripped when enabled_tools is None, present when explicitly enabled, and stripped when other tools are enabled but create-issue isn't. Coverage test (test_all_known_safe_outputs_covers_router) updated to exempt DEBUG_ONLY_TOOLS.
Compiler tests in src/compile/common.rs — coverage for generate_executor_ado_env (both tokens, each in isolation, neither), generate_enabled_tools_args (debug alone, debug + safe-outputs, no debug), validate_ado_aw_debug_config (missing target-repo, malformed target-repo, pipeline-injection in labels and title-prefix, redirection from safe-outputs.create-issue), and integrity OR-ing.
Front-matter tests in src/compile/types.rs — round-trip of ado-aw-debug:, deny_unknown_fields enforcement.
Integration test in tests/compiler_tests.rs — test_compile_ado_aw_debug_fixture compiles tests/fixtures/ado-aw-debug-agent.md and asserts (a) integrity step omitted, (b) executor env: block exposes ADO_AW_DEBUG_GITHUB_TOKEN, (c) --enabled-tools create-issue is wired in, (d) output is valid YAML. Plus a structural smoke test for the new example file.

cargo test          → 1,356 passed, 0 failed (added ~30 targeted tests)
cargo build         → clean
cargo clippy --all-targets -- -D warnings   → 106 errors (baseline; 0 net-new)

Release-mode test failures (11) are pre-existing: --skip-integrity and --debug-pipeline are debug-only CLI flags by design.

Adds a new top-level `ado-aw-debug:` front-matter section that gates two debug-only knobs intended for dogfooding ado-aw pipelines from Azure DevOps back into `githubnext/ado-aw`: * `skip-integrity: bool` — OR-ed with the existing `--skip-integrity` CLI flag. * `create-issue:` — files a GitHub issue against an operator-configured target repository when the agent calls the `create-issue` MCP tool. The `create-issue` tool is **not** a regular safe output. It is default-deny at three independent layers: 1. The SafeOutputs MCP filter strips it from the tool router via a new `DEBUG_ONLY_TOOLS` constant unless explicitly enabled. 2. The compiler only emits `--enabled-tools create-issue` when `ado-aw-debug.create-issue:` is set, and rejects `safe-outputs.create-issue:` outright so the tool can't be smuggled in via the regular safe-outputs surface. 3. Stage 3 maintains an `ExecutionContext.debug_enabled_tools` set populated only from `ado-aw-debug:`. The executor refuses any `create-issue` NDJSON entry whose tool name is absent from the set, closing the gap where a forged entry could otherwise bypass the MCP-layer gate. Stage 3 authenticates against GitHub using a dedicated `ADO_AW_DEBUG_GITHUB_TOKEN` ADO pipeline variable surfaced through a new `github_token` field on `ExecutionContext`. The token is separate from the read-only `GITHUB_TOKEN` the agent sees in Stage 1. Other notable design choices: * `target-repo` is operator-only; the agent has no parameter to redirect issues elsewhere. * `allowed-labels` is **default-deny** — empty/absent rejects every agent-supplied label. Operators must opt in to unrestricted with `["*"]`. * The `target-repo` validator follows GitHub's login spec (no underscores or dots in owner segments). * Final issue title length is validated **after** `title-prefix` application. * Stage 3 error messages neutralise `##vso[…]` sequences in agent-supplied labels so a forged NDJSON entry can't echo a live pipeline command into stdout. Adds 30+ targeted tests across `safeoutputs::create_issue`, `mcp::tests`, `compile::common`, `compile::types`, and a fixture-based integration test asserting the YAML wiring. All 1356 dev-mode tests pass; clippy net-new errors: 0. See `docs/ado-aw-debug.md` for the full schema, security framing, and PAT setup instructions. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions · 2026-05-10T06:33:03Z

🔍 Rust PR Review

Summary: Looks good overall — the three-layer default-deny design is sound and the security thinking is thorough. One logic bug worth fixing before merge.

Findings

🐛 Bugs / Logic Issues

src/main.rs:316 — ctx.debug_enabled_tools.insert("create-issue") runs unconditionally after the match serde_json::to_value(ci) block, meaning it fires even when the Err arm is taken. If serialization ever fails, Stage 3 would see create-issue in debug_enabled_tools but find no matching entry in tool_configs, causing get_tool_config to return Default::default() → validate_target_repo("") → a confusing "target-repo is required" error instead of a serialization failure. serde_json::to_value on a String/Vec<String> struct will never fail in practice, but the placement is wrong structurally. The insert belongs inside the Ok(v) arm:
```
match serde_json::to_value(ci) {
    Ok(v) => {
        ctx.tool_configs.insert("create-issue".to_string(), v);
        ctx.debug_enabled_tools.insert("create-issue".to_string()); // ← move here
    }
    Err(e) => log::warn!("Failed to serialize ado-aw-debug.create-issue config: {e}"),
}
// remove the dangling insert
```

⚠️ Suggestions

src/safeoutputs/create_issue.rs:~200 — merge_labels is called for both labels and assignees (merge_labels(&config.assignees, &self.assignees)). The logic is identical so it works, but the name is misleading for a future reader. A rename to merge_dedup_strings (or a private merge_string_lists) would make the intent clearer.
src/safeoutputs/create_issue.rs:~50 — title.len() and body.len() are byte-lengths, not Unicode scalar counts. Unlikely to matter for ADO pipeline titles (almost always ASCII), and GitHub's own character counting is char-based, not byte-based. Not a blocker, but worth a comment that the limit is in bytes, not chars.

✅ What Looks Good

Three-layer gate is architecturally correct: MCP filter (DEBUG_ONLY_TOOLS), compiler (--enabled-tools injection gated on front-matter), and executor (debug_enabled_tools check before any network call). The forged-NDJSON scenario is explicitly covered.
PAT separation: ADO_AW_DEBUG_GITHUB_TOKEN deliberately kept separate from SYSTEM_ACCESSTOKEN and the read-only GITHUB_TOKEN the agent sees. The comment in result.rs makes this invariant clear.
reject_pipeline_injection applied exhaustively across all operator-supplied config fields (target-repo, title-prefix, labels, allowed-labels, assignees) at compile time.
##vso[...] neutralized in error paths: agent-supplied labels are passed through neutralize_pipeline_commands before being echoed into the failure message, preventing ADO command injection even through the error surface.
target-repo regex correctly rejects underscores and dots in the owner segment (matching GitHub's actual login rules), and the extra . / .. guard after the regex is correctly belt-and-suspenders.
deny_unknown_fields on AdoAwDebugConfig prevents typos from silently passing as no-ops.
Test coverage is comprehensive — the forged-NDJSON scenario, default-deny allowed-labels, post-prefix title-length, and all three MCP filter scenarios are all exercised.

Generated by Rust PR Reviewer for issue #492 · ● 504.3K · ◷

* ci(test): lint compiled bash bodies with shellcheck Adds tests/bash_lint_tests.rs, an integration test that compiles a representative set of fixtures and runs shellcheck against every literal bash: body in the generated YAML. The lint catches the actual silent-failure patterns ADO's "fail on last command" default lets through (SC2164 cd-without-||, SC2155 masked-return, SC2086/2046 unquoted variables, SC2154 unset refs, SC2088 tilde-in-quotes). This replaces the previously proposed approach of sprinkling `set -eo pipefail` across every bash step (PR #492). That approach added boilerplate to ~27 sites without enforcement, drifted as new steps were added, and in two spots actually masked errors more than the original code (`grep ... | tail -1 || true`). Real bugs surfaced and fixed by the new lint: * `src/engine.rs` — `Engine::Copilot::log_dir()` returned `~/.copilot/logs`. Tilde does not expand inside the double-quoted `[ -d "..." ]` test that consumes this value, so the directory check always failed and Copilot logs were silently never collected to the pipeline artifact. Replaced with `$HOME/.copilot/logs`. * `src/runtimes/node/mod.rs` and `src/runtimes/dotnet/mod.rs` — the ensure-`.npmrc` and ensure-`nuget.config` step generators used Rust `\<newline>` line continuations in their format strings, which strip leading whitespace. The emitted YAML had body lines flush-left against `- bash: |`, producing invalid YAML. Replaced with raw string literals so indentation is preserved. * Multiple `cd "$DOWNLOAD_DIR"` in `base.yml` / `1es-base.yml` had no `|| exit` guard. Added. * `exit $AGENT_EXIT_CODE` (multiple sites) — quoted. * `mkdir -p {{ working_directory }}/safe_outputs` and the matching `cp -a ...` — quoted the substitution. * `JSON_CONTENT=$(echo "$RESULT_LINE" | sed 's/.*PFX://')` rewritten to `${RESULT_LINE##*PFX:}` (avoids forking sed and removes a shellcheck SC2001 finding). Targeted `set -eo pipefail` additions (only where masked-pipeline exit codes matter): * `base.yml` / `1es-base.yml` ado-aw download steps (3 stages × 2 templates): `grep "ado-aw-linux-x64" checksums.txt | sha256sum -c -` silently passes when grep matches nothing because sha256sum returns 0 on empty stdin. Without pipefail, the unverified binary would install successfully. * `src/compile/extensions/trigger_filters.rs` script-download step: same `grep | sha256sum` pattern. * `src/runtimes/lean/mod.rs` install step: `curl ... | sh` would silently install nothing on curl failure. The two pre-existing `set -eo pipefail` instances on the AWF download + docker pull steps (introduced in PR #439) and on the `tee`-piped agent / threat-analysis runs are preserved — those were correct. Skip vs. enforce: * Locally, the test prints a notice and returns early when shellcheck is missing. * CI installs shellcheck and sets `ENFORCE_BASH_LINT=1` so a missing shellcheck becomes a hard failure rather than a silent skip. A new `tests/fixtures/runtime-coverage-agent.md` exercises the Lean, Node-with-feed-url, and .NET-with-feed-url runtimes plus the cache-memory tool, ensuring every code-generated bash step is reached. The lint enforces a `REQUIRED_STEP_DISPLAY_NAMES` coverage list to catch fixture/generator drift. Documented in AGENTS.md and docs/extending.md. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix(bash-lint): close 1ES coverage gap and fix shellcheck 0.9 SC2002 Two changes the CI surfaced after PR #496 landed locally: 1. **shellcheck 0.9.0 (Ubuntu's pinned) flags SC2002 ("Useless cat") on `cat file | sed ...` patterns that 0.11.0 does not.** Fixed by rewriting the two offending sites in the MCPG start step: * `MCPG_CONFIG=$(cat … | sed | sed | sed)` → `MCPG_CONFIG=$(sed -e … -e … -e … file)`. Semantically equivalent because the three substitutions are over independent placeholder patterns. * `cat … | python3 -m json.tool` → `python3 -m json.tool < …`. Avoids forking `cat` for nothing and is stable across shellcheck versions. 2. **Add a `runtime-coverage-1es-agent.md` fixture and assert that every known compile target is exercised by at least one fixture.** Previously only `1es-test-agent.md` compiled to the 1ES target, and it had no `runtimes:` or `tools.cache-memory`. The code-generated bash bodies from those extensions (Lean install, `.npmrc`, `nuget.config`, cache-memory restore/init) were being linted only on the standalone target. Today their bodies are byte-identical across targets, but a future target-specific divergence would slip past the lint without a 1ES variant. `compile_fixture()` now parses `Generated <target> pipeline:` from stdout, accumulates targets seen, and the test asserts every entry in `REQUIRED_TARGETS = ["standalone", "1es"]` is covered. Sanity- checked that removing the 1ES fixtures causes the test to fail with `no fixture compiles to the following target(s): ["1es"]`. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…498) * ci(test): lint compiled bash bodies with shellcheck Adds tests/bash_lint_tests.rs, an integration test that compiles a representative set of fixtures and runs shellcheck against every literal bash: body in the generated YAML. The lint catches the actual silent-failure patterns ADO's "fail on last command" default lets through (SC2164 cd-without-||, SC2155 masked-return, SC2086/2046 unquoted variables, SC2154 unset refs, SC2088 tilde-in-quotes). This replaces the previously proposed approach of sprinkling `set -eo pipefail` across every bash step (PR #492). That approach added boilerplate to ~27 sites without enforcement, drifted as new steps were added, and in two spots actually masked errors more than the original code (`grep ... | tail -1 || true`). Real bugs surfaced and fixed by the new lint: * `src/engine.rs` — `Engine::Copilot::log_dir()` returned `~/.copilot/logs`. Tilde does not expand inside the double-quoted `[ -d "..." ]` test that consumes this value, so the directory check always failed and Copilot logs were silently never collected to the pipeline artifact. Replaced with `$HOME/.copilot/logs`. * `src/runtimes/node/mod.rs` and `src/runtimes/dotnet/mod.rs` — the ensure-`.npmrc` and ensure-`nuget.config` step generators used Rust `\<newline>` line continuations in their format strings, which strip leading whitespace. The emitted YAML had body lines flush-left against `- bash: |`, producing invalid YAML. Replaced with raw string literals so indentation is preserved. * Multiple `cd "$DOWNLOAD_DIR"` in `base.yml` / `1es-base.yml` had no `|| exit` guard. Added. * `exit $AGENT_EXIT_CODE` (multiple sites) — quoted. * `mkdir -p {{ working_directory }}/safe_outputs` and the matching `cp -a ...` — quoted the substitution. * `JSON_CONTENT=$(echo "$RESULT_LINE" | sed 's/.*PFX://')` rewritten to `${RESULT_LINE##*PFX:}` (avoids forking sed and removes a shellcheck SC2001 finding). Targeted `set -eo pipefail` additions (only where masked-pipeline exit codes matter): * `base.yml` / `1es-base.yml` ado-aw download steps (3 stages × 2 templates): `grep "ado-aw-linux-x64" checksums.txt | sha256sum -c -` silently passes when grep matches nothing because sha256sum returns 0 on empty stdin. Without pipefail, the unverified binary would install successfully. * `src/compile/extensions/trigger_filters.rs` script-download step: same `grep | sha256sum` pattern. * `src/runtimes/lean/mod.rs` install step: `curl ... | sh` would silently install nothing on curl failure. The two pre-existing `set -eo pipefail` instances on the AWF download + docker pull steps (introduced in PR #439) and on the `tee`-piped agent / threat-analysis runs are preserved — those were correct. Skip vs. enforce: * Locally, the test prints a notice and returns early when shellcheck is missing. * CI installs shellcheck and sets `ENFORCE_BASH_LINT=1` so a missing shellcheck becomes a hard failure rather than a silent skip. A new `tests/fixtures/runtime-coverage-agent.md` exercises the Lean, Node-with-feed-url, and .NET-with-feed-url runtimes plus the cache-memory tool, ensuring every code-generated bash step is reached. The lint enforces a `REQUIRED_STEP_DISPLAY_NAMES` coverage list to catch fixture/generator drift. Documented in AGENTS.md and docs/extending.md. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * ci(workflows): add daily bash-step hygiene auditor agentic workflow Adds `.github/workflows/bash-lint-auditor.md`, a daily agentic workflow that complements the PR-gate lint added in PR #496. The PR gate gives fast feedback on every PR; this workflow runs once a day and lands small, mechanical improvements that the gate can't: * When a finding does slip onto main (e.g. via merge conflict), the auditor fixes it the next morning instead of waiting for the next contributor PR. * Audits stale `# shellcheck disable=` directives — removes ones that no longer fire (i.e. the underlying code has been cleaned up but the suppression was forgotten). * Audits whether the lint's exclude list could be tightened. * Verifies fixture coverage of every bash-step generator and proposes fixture additions when a new generator appears. When the auditor finds something actionable, it opens a focused PR (one concern per PR) with the structured "what was found / how it was fixed / verification" body. When the lint is green and no proactive improvement is feasible, it exits cleanly with `noop`. Configuration notes: * `schedule: daily around 09:00` — fuzzy schedule scattering across the hour, matching the convention of other daily workflows in this repo (e.g. `cyclomatic-complexity-reducer.md`). * `allowed-files` restricts the auditor to bash-generator code paths plus the tests/fixtures it depends on. `protected-files: fallback-to-issue` ensures that if it tries to edit anything else, the change falls back to an issue rather than a PR. * `cache-memory: true` persists state across runs so the auditor doesn't loop on the same suggestion if a maintainer rejects it. * `bash: ["*"]` + `network.allowed: [defaults, rust]` gives the agent what it needs to install shellcheck (via apt with a static- binary fallback) and run cargo against the rust ecosystem. Compiled with `gh aw compile bash-lint-auditor`; the matching `.lock.yml` is included along with new SHAs in `.github/aw/actions-lock.json` (cache, checkout, download-artifact registered for the first time by this workflow's setup steps). Stacked on top of branch `lint-bash-steps` (PR #496) because the auditor relies on `tests/bash_lint_tests.rs` and `tests/fixtures/runtime-coverage-agent.md`, which are introduced there. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>

jamesadevine mentioned this pull request May 10, 2026

ci(test): lint compiled bash bodies with shellcheck #496

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(safeoutputs): add ado-aw-debug.create-issue for dogfood pipelines#492

feat(safeoutputs): add ado-aw-debug.create-issue for dogfood pipelines#492
jamesadevine wants to merge 1 commit intomainfrom
feat/ado-aw-debug-create-issue

jamesadevine commented May 10, 2026

Uh oh!

github-actions Bot commented May 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

jamesadevine commented May 10, 2026

Summary

Front-matter shape

Three-layer default-deny gate

Stage 3 PAT plumbing

Other rubber-duck-driven hardenings

Files

Out of scope (per planning Q&A)

Test plan

Uh oh!

github-actions Bot commented May 10, 2026

🔍 Rust PR Review

Findings

🐛 Bugs / Logic Issues

⚠️ Suggestions

✅ What Looks Good

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant