feat(safeoutputs): add ado-aw-debug.create-issue for dogfood pipelines #492
jamesadevine wants to merge 1 commit into main
Adds a new top-level `ado-aw-debug:` front-matter section that gates two debug-only knobs intended for dogfooding ado-aw pipelines from Azure DevOps back into `githubnext/ado-aw`:

* `skip-integrity: bool` is OR-ed with the existing `--skip-integrity` CLI flag.
* `create-issue:` files a GitHub issue against an operator-configured target repository when the agent calls the `create-issue` MCP tool.

The `create-issue` tool is **not** a regular safe output. It is default-deny at three independent layers:

1. The SafeOutputs MCP filter strips it from the tool router via a new `DEBUG_ONLY_TOOLS` constant unless explicitly enabled.
2. The compiler only emits `--enabled-tools create-issue` when `ado-aw-debug.create-issue:` is set, and rejects `safe-outputs.create-issue:` outright so the tool can't be smuggled in via the regular safe-outputs surface.
3. Stage 3 maintains an `ExecutionContext.debug_enabled_tools` set populated only from `ado-aw-debug:`. The executor refuses any `create-issue` NDJSON entry whose tool name is absent from the set, closing the gap where a forged entry could otherwise bypass the MCP-layer gate.

Stage 3 authenticates against GitHub using a dedicated `ADO_AW_DEBUG_GITHUB_TOKEN` ADO pipeline variable surfaced through a new `github_token` field on `ExecutionContext`. The token is separate from the read-only `GITHUB_TOKEN` the agent sees in Stage 1.

Other notable design choices:

* `target-repo` is operator-only; the agent has no parameter to redirect issues elsewhere.
* `allowed-labels` is **default-deny**: an empty or absent list rejects every agent-supplied label. Operators must opt in to unrestricted with `["*"]`.
* The `target-repo` validator follows GitHub's login spec (no underscores or dots in owner segments).
* Final issue title length is validated **after** `title-prefix` application.
* Stage 3 error messages neutralise `##vso[…]` sequences in agent-supplied labels so a forged NDJSON entry can't echo a live pipeline command into stdout.
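Two of the design choices above (default-deny `allowed-labels` and post-prefix title validation) can be sketched in a few lines. This is a minimal illustration, not the code in `src/safeoutputs/create_issue.rs`: the `CreateIssueConfig` struct, both function names, and the `MAX_TITLE_CHARS` value of 256 are assumptions made for the example.

```rust
/// Hypothetical operator-side config; field names mirror the front-matter keys.
pub struct CreateIssueConfig {
    pub allowed_labels: Vec<String>,
    pub title_prefix: Option<String>,
}

/// Assumed limit for illustration only; the PR text does not state the number.
pub const MAX_TITLE_CHARS: usize = 256;

/// Default-deny: an empty allow-list matches nothing, so every label is
/// rejected. A literal "*" entry opts in to unrestricted labels.
pub fn label_allowed(cfg: &CreateIssueConfig, label: &str) -> bool {
    cfg.allowed_labels.iter().any(|a| a == "*" || a == label)
}

/// Apply the operator's prefix first, then validate the length of the
/// title that would actually be sent to GitHub.
pub fn final_title(cfg: &CreateIssueConfig, agent_title: &str) -> Result<String, String> {
    let title = match &cfg.title_prefix {
        Some(prefix) => format!("{}{}", prefix, agent_title),
        None => agent_title.to_string(),
    };
    if title.chars().count() > MAX_TITLE_CHARS {
        return Err(format!("final title exceeds {} characters", MAX_TITLE_CHARS));
    }
    Ok(title)
}
```

Validating after the prefix is applied matters because a title that fits on its own can still exceed the limit once the operator's prefix is prepended.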
Adds 30+ targeted tests across `safeoutputs::create_issue`, `mcp::tests`, `compile::common`, `compile::types`, and a fixture-based integration test asserting the YAML wiring. All 1356 dev-mode tests pass; clippy net-new errors: 0.

See `docs/ado-aw-debug.md` for the full schema, security framing, and PAT setup instructions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🔍 Rust PR Review

Summary: Looks good overall; the three-layer default-deny design is sound and the security thinking is thorough. One logic bug worth fixing before merge.

Findings

🐛 Bugs / Logic Issues
* ci(test): lint compiled bash bodies with shellcheck

Adds tests/bash_lint_tests.rs, an integration test that compiles a representative set of fixtures and runs shellcheck against every literal bash: body in the generated YAML. The lint catches the actual silent-failure patterns ADO's "fail on last command" default lets through (SC2164 cd-without-||, SC2155 masked-return, SC2086/2046 unquoted variables, SC2154 unset refs, SC2088 tilde-in-quotes).

This replaces the previously proposed approach of sprinkling `set -eo pipefail` across every bash step (PR #492). That approach added boilerplate to ~27 sites without enforcement, drifted as new steps were added, and in two spots actually masked errors more than the original code (`grep ... | tail -1 || true`).

Real bugs surfaced and fixed by the new lint:

* `src/engine.rs`: `Engine::Copilot::log_dir()` returned `~/.copilot/logs`. Tilde does not expand inside the double-quoted `[ -d "..." ]` test that consumes this value, so the directory check always failed and Copilot logs were silently never collected to the pipeline artifact. Replaced with `$HOME/.copilot/logs`.
* `src/runtimes/node/mod.rs` and `src/runtimes/dotnet/mod.rs`: the ensure-`.npmrc` and ensure-`nuget.config` step generators used Rust `\<newline>` line continuations in their format strings, which strip leading whitespace. The emitted YAML had body lines flush-left against `- bash: |`, producing invalid YAML. Replaced with raw string literals so indentation is preserved.
* Multiple `cd "$DOWNLOAD_DIR"` in `base.yml` / `1es-base.yml` had no `|| exit` guard. Added.
* `exit $AGENT_EXIT_CODE` (multiple sites): quoted.
* `mkdir -p {{ working_directory }}/safe_outputs` and the matching `cp -a ...`: quoted the substitution.
* `JSON_CONTENT=$(echo "$RESULT_LINE" | sed 's/.*PFX://')` rewritten to `${RESULT_LINE##*PFX:}` (avoids forking sed and removes a shellcheck SC2001 finding).
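The `\<newline>` indentation bug described in the list above is easy to reproduce in isolation. The following is a minimal sketch with hypothetical function names and a made-up two-line step body; it is not the generator code from `src/runtimes/`, and it uses plain string literals rather than format strings for brevity.

```rust
/// Buggy shape: a trailing `\` inside a normal string literal drops the
/// newline AND all leading whitespace on the next source line, so the
/// intended YAML indentation never reaches the output.
pub fn step_with_continuations() -> String {
    "- bash: |\n\
        echo one\n\
        echo two\n"
        .to_string()
}

/// Fixed shape: a raw string literal keeps every character as written,
/// including the four spaces in front of each body line.
pub fn step_with_raw_string() -> String {
    r#"- bash: |
    echo one
    echo two
"#
    .to_string()
}
```

In the first function the body lines come out flush-left under `- bash: |`; in the second they keep their indentation, which is what a YAML block scalar requires.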
Targeted `set -eo pipefail` additions (only where masked-pipeline exit codes matter):

* `base.yml` / `1es-base.yml` ado-aw download steps (3 stages × 2 templates): `grep "ado-aw-linux-x64" checksums.txt | sha256sum -c -` silently passes when grep matches nothing because sha256sum returns 0 on empty stdin. Without pipefail, the unverified binary would install successfully.
* `src/compile/extensions/trigger_filters.rs` script-download step: same `grep | sha256sum` pattern.
* `src/runtimes/lean/mod.rs` install step: `curl ... | sh` would silently install nothing on curl failure.

The two pre-existing `set -eo pipefail` instances on the AWF download + docker pull steps (introduced in PR #439) and on the `tee`-piped agent / threat-analysis runs are preserved; those were correct.

Skip vs. enforce:

* Locally, the test prints a notice and returns early when shellcheck is missing.
* CI installs shellcheck and sets `ENFORCE_BASH_LINT=1` so a missing shellcheck becomes a hard failure rather than a silent skip.

A new `tests/fixtures/runtime-coverage-agent.md` exercises the Lean, Node-with-feed-url, and .NET-with-feed-url runtimes plus the cache-memory tool, ensuring every code-generated bash step is reached. The lint enforces a `REQUIRED_STEP_DISPLAY_NAMES` coverage list to catch fixture/generator drift.

Documented in AGENTS.md and docs/extending.md.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(bash-lint): close 1ES coverage gap and fix shellcheck 0.9 SC2002

Two changes the CI surfaced after PR #496 landed locally:

1. **shellcheck 0.9.0 (Ubuntu's pinned) flags SC2002 ("Useless cat") on `cat file | sed ...` patterns that 0.11.0 does not.** Fixed by rewriting the two offending sites in the MCPG start step:
   * `MCPG_CONFIG=$(cat … | sed | sed | sed)` → `MCPG_CONFIG=$(sed -e … -e … -e … file)`. Semantically equivalent because the three substitutions are over independent placeholder patterns.
   * `cat … | python3 -m json.tool` → `python3 -m json.tool < …`. Avoids forking `cat` for nothing and is stable across shellcheck versions.
2. **Add a `runtime-coverage-1es-agent.md` fixture and assert that every known compile target is exercised by at least one fixture.** Previously only `1es-test-agent.md` compiled to the 1ES target, and it had no `runtimes:` or `tools.cache-memory`. The code-generated bash bodies from those extensions (Lean install, `.npmrc`, `nuget.config`, cache-memory restore/init) were being linted only on the standalone target. Today their bodies are byte-identical across targets, but a future target-specific divergence would slip past the lint without a 1ES variant. `compile_fixture()` now parses `Generated <target> pipeline:` from stdout, accumulates targets seen, and the test asserts every entry in `REQUIRED_TARGETS = ["standalone", "1es"]` is covered. Sanity-checked that removing the 1ES fixtures causes the test to fail with `no fixture compiles to the following target(s): ["1es"]`.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
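The target-coverage assertion described in the second change can be sketched roughly as follows. Only `REQUIRED_TARGETS` and the `Generated <target> pipeline:` marker come from the commit message; the helper names `parse_target` and `missing_targets`, and the shape of the surrounding stdout, are assumptions for illustration.

```rust
use std::collections::BTreeSet;

/// Every compile target that must be reached by at least one fixture.
pub const REQUIRED_TARGETS: &[&str] = &["standalone", "1es"];

/// Extracts `<target>` from the first stdout line shaped like
/// `Generated <target> pipeline: ...` (hypothetical helper).
pub fn parse_target(stdout: &str) -> Option<&str> {
    stdout.lines().find_map(|line| {
        let rest = line.trim().strip_prefix("Generated ")?;
        rest.split_once(" pipeline:").map(|(target, _)| target)
    })
}

/// Returns the required targets that no fixture has compiled to yet.
pub fn missing_targets(seen: &BTreeSet<String>) -> Vec<&'static str> {
    REQUIRED_TARGETS
        .iter()
        .copied()
        .filter(|target| !seen.contains(*target))
        .collect()
}
```

A test built on these helpers would accumulate `parse_target` results across all fixtures and fail when `missing_targets` is non-empty, which is the behaviour the sanity check in the commit message describes.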
…498)

* ci(test): lint compiled bash bodies with shellcheck

* ci(workflows): add daily bash-step hygiene auditor agentic workflow

Adds `.github/workflows/bash-lint-auditor.md`, a daily agentic workflow that complements the PR-gate lint added in PR #496. The PR gate gives fast feedback on every PR; this workflow runs once a day and lands small, mechanical improvements that the gate can't:

* When a finding does slip onto main (e.g. via merge conflict), the auditor fixes it the next morning instead of waiting for the next contributor PR.
* Audits stale `# shellcheck disable=` directives and removes ones that no longer fire (i.e. the underlying code has been cleaned up but the suppression was forgotten).
* Audits whether the lint's exclude list could be tightened.
* Verifies fixture coverage of every bash-step generator and proposes fixture additions when a new generator appears.

When the auditor finds something actionable, it opens a focused PR (one concern per PR) with the structured "what was found / how it was fixed / verification" body. When the lint is green and no proactive improvement is feasible, it exits cleanly with `noop`.

Configuration notes:

* `schedule: daily around 09:00` is a fuzzy schedule scattering across the hour, matching the convention of other daily workflows in this repo (e.g. `cyclomatic-complexity-reducer.md`).
* `allowed-files` restricts the auditor to bash-generator code paths plus the tests/fixtures it depends on. `protected-files: fallback-to-issue` ensures that if it tries to edit anything else, the change falls back to an issue rather than a PR.
* `cache-memory: true` persists state across runs so the auditor doesn't loop on the same suggestion if a maintainer rejects it.
* `bash: ["*"]` + `network.allowed: [defaults, rust]` gives the agent what it needs to install shellcheck (via apt with a static-binary fallback) and run cargo against the rust ecosystem.

Compiled with `gh aw compile bash-lint-auditor`; the matching `.lock.yml` is included along with new SHAs in `.github/aw/actions-lock.json` (cache, checkout, download-artifact registered for the first time by this workflow's setup steps).

Stacked on top of branch `lint-bash-steps` (PR #496) because the auditor relies on `tests/bash_lint_tests.rs` and `tests/fixtures/runtime-coverage-agent.md`, which are introduced there.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Summary

Adds a new top-level `ado-aw-debug:` front-matter section that gates two debug-only knobs intended for dogfooding ado-aw pipelines from Azure DevOps back into `githubnext/ado-aw`:

* `skip-integrity: bool` is OR-ed with the existing `--skip-integrity` CLI flag.
* `create-issue:` files a GitHub issue against an operator-configured target repo when the agent calls the new `create-issue` MCP tool.

The `create-issue` tool is not a regular safe output. It is default-deny at three independent layers (the MCP filter, the compiler, and Stage 3), so it can't be reached unless the operator explicitly opts in via `ado-aw-debug.create-issue:`.

Front-matter shape

Three-layer default-deny gate

1. MCP filter: `DEBUG_ONLY_TOOLS` strips `create-issue` from the SafeOutputs router unless explicitly enabled. This holds even when `--enabled-tools` is empty (the regular permissive default).
2. Compiler: `--enabled-tools create-issue` is emitted only when `ado-aw-debug.create-issue:` is set; `safe-outputs.create-issue:` is rejected outright at compile time so the tool can't be smuggled in via the regular safe-outputs surface.
3. Stage 3: the `ExecutionContext.debug_enabled_tools` set is populated only from `ado-aw-debug:`. Stage 3 refuses any `create-issue` NDJSON entry whose tool name is absent from that set, closing the gap where a forged entry could otherwise bypass the MCP-layer gate.

The third layer was added in response to a rubber-duck review that surfaced the MCP-only gate as insufficient for a write-capable GitHub PAT.
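To make the first and third layers concrete, here is a minimal sketch of how the two checks could fit together. It is a simplified model under stated assumptions: `visible_tools`, `check_entry`, the trimmed-down `ExecutionContext`, and the tool name `some-regular-tool` are hypothetical, and the real filter and executor in the PR may be structured differently.

```rust
use std::collections::HashSet;

/// Debug-only tool names; mirrors the role of the PR's `DEBUG_ONLY_TOOLS`.
pub const DEBUG_ONLY_TOOLS: &[&str] = &["create-issue"];

fn is_debug_only(tool: &str) -> bool {
    DEBUG_ONLY_TOOLS.iter().any(|t| *t == tool)
}

/// Layer 1 (MCP filter). An empty `enabled` list is the permissive default
/// for regular tools, but debug-only tools still need an explicit entry.
pub fn visible_tools<'a>(all: &[&'a str], enabled: &[&str]) -> Vec<&'a str> {
    all.iter()
        .copied()
        .filter(|tool| {
            let listed = enabled.iter().any(|e| e == tool);
            if is_debug_only(tool) {
                listed
            } else {
                enabled.is_empty() || listed
            }
        })
        .collect()
}

/// Simplified stand-in for the Stage 3 context (assumption: one field only).
pub struct ExecutionContext {
    /// Populated only from the `ado-aw-debug:` front-matter section.
    pub debug_enabled_tools: HashSet<String>,
}

/// Layer 3 (executor). Refuses a debug-only NDJSON entry unless the tool
/// was enabled through `ado-aw-debug:`, even if the entry was forged.
pub fn check_entry(ctx: &ExecutionContext, tool: &str) -> Result<(), String> {
    if is_debug_only(tool) && !ctx.debug_enabled_tools.contains(tool) {
        return Err(format!("refusing '{}': not enabled via ado-aw-debug", tool));
    }
    Ok(())
}
```

The point of keeping `check_entry` independent of `visible_tools` is that an NDJSON entry written without going through the MCP server never passes the first filter, so the executor has to repeat the decision from its own trusted input.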
Stage 3 PAT plumbing

* `ADO_AW_DEBUG_GITHUB_TOKEN` ADO pipeline variable, separate from the read-only `GITHUB_TOKEN` the agent sees in Stage 1.
* `ExecutionContext.github_token: Option<String>` field, sourced from that env var.
* `POST /repos/{owner}/{repo}/issues` with bearer token, `User-Agent: ado-aw/{version}`, and `X-GitHub-Api-Version: 2022-11-28`.
* `<!-- ado-aw -->` marker, build ID, run URL, pipeline name, and trigger reason.

Other rubber-duck-driven hardenings

* `allowed-labels: []` is now default-deny; operators must opt in to unrestricted with `["*"]`.
* The `target-repo` regex matches GitHub's login spec (no underscores or dots in owner segments).
* Final issue title length is validated after `title-prefix` application.
* Stage 3 error messages neutralise `##vso[…]` sequences in agent-supplied labels so a forged NDJSON entry can't echo a live pipeline command into stdout.

Files

* `src/safeoutputs/create_issue.rs`, `docs/ado-aw-debug.md`, `examples/dogfood-failure-reporter.md`, `tests/fixtures/ado-aw-debug-agent.md`.

Out of scope (per planning Q&A)

* `create-issue` (separate decision).
* `ADO_AW_DEBUG_GITHUB_TOKEN` variable via `ado-aw configure`.

Test plan

* `src/safeoutputs/create_issue.rs`: 23 tests covering deserialisation, validation, sanitisation, the three-layer gate (including a forged-NDJSON scenario), default-deny `allowed-labels`, post-prefix title-length validation, neutralisation of pipeline-command sequences in error strings, and `target-repo` regex edge cases.
* `src/mcp.rs`: 3 new tests asserting `create-issue` is stripped when `enabled_tools` is `None`, present when explicitly enabled, and stripped when other tools are enabled but `create-issue` isn't. Coverage test (`test_all_known_safe_outputs_covers_router`) updated to exempt `DEBUG_ONLY_TOOLS`.
* `src/compile/common.rs`: coverage for `generate_executor_ado_env` (both tokens, each in isolation, neither), `generate_enabled_tools_args` (debug alone, debug + safe-outputs, no debug), `validate_ado_aw_debug_config` (missing `target-repo`, malformed `target-repo`, pipeline-injection in labels and `title-prefix`, redirection from `safe-outputs.create-issue`), and integrity OR-ing.
* `src/compile/types.rs`: round-trip of `ado-aw-debug:`, `deny_unknown_fields` enforcement.
* `tests/compiler_tests.rs`: `test_compile_ado_aw_debug_fixture` compiles `tests/fixtures/ado-aw-debug-agent.md` and asserts (a) integrity step omitted, (b) executor `env:` block exposes `ADO_AW_DEBUG_GITHUB_TOKEN`, (c) `--enabled-tools create-issue` is wired in, (d) output is valid YAML. Plus a structural smoke test for the new example file.

Release-mode test failures (11) are pre-existing:

* `--skip-integrity` and `--debug-pipeline` are debug-only CLI flags by design.
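The `target-repo` edge cases mentioned in the test plan can be illustrated without a regex. This is a sketch under stated assumptions rather than the PR's validator: the function names are hypothetical, and the concrete limits (39 characters for an owner, 100 for a repository name, no leading, trailing, or doubled hyphens in an owner) reflect my understanding of GitHub's naming rules and are not taken from the PR.

```rust
/// Owner segment: ASCII alphanumerics and single hyphens only, so
/// underscores and dots are rejected (limits are assumptions).
fn valid_owner(owner: &str) -> bool {
    !owner.is_empty()
        && owner.len() <= 39
        && owner.chars().all(|c| c.is_ascii_alphanumeric() || c == '-')
        && !owner.starts_with('-')
        && !owner.ends_with('-')
        && !owner.contains("--")
}

/// Repository segment: also allows underscores and dots (assumption).
fn valid_repo(repo: &str) -> bool {
    !repo.is_empty()
        && repo.len() <= 100
        && repo != "."
        && repo != ".."
        && repo
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || matches!(c, '-' | '_' | '.'))
}

/// Accepts exactly one `owner/repo` pair; a second slash ends up in the
/// repo segment and fails the character check.
pub fn valid_target_repo(target: &str) -> bool {
    match target.split_once('/') {
        Some((owner, repo)) => valid_owner(owner) && valid_repo(repo),
        None => false,
    }
}
```

Validating the two segments separately keeps the asymmetry explicit: characters that are fine in a repository name are still rejected in the owner segment.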