Skip to content

feat(safeoutputs): add ado-aw-debug.create-issue for dogfood pipelines#492

Open
jamesadevine wants to merge 1 commit intomainfrom
feat/ado-aw-debug-create-issue
Open

feat(safeoutputs): add ado-aw-debug.create-issue for dogfood pipelines#492
jamesadevine wants to merge 1 commit intomainfrom
feat/ado-aw-debug-create-issue

Conversation

@jamesadevine
Copy link
Copy Markdown
Collaborator

Summary

Adds a new top-level ado-aw-debug: front-matter section that gates two debug-only knobs intended for dogfooding ado-aw pipelines from Azure DevOps back into githubnext/ado-aw:

  • skip-integrity: bool — OR-ed with the existing --skip-integrity CLI flag.
  • create-issue: — files a GitHub issue against an operator-configured target repo when the agent calls the new create-issue MCP tool.

The create-issue tool is not a regular safe output. It is default-deny at three independent layers — the MCP filter, the compiler, and Stage 3 — so it can't be reached unless the operator explicitly opts in via ado-aw-debug.create-issue:.

Front-matter shape

ado-aw-debug:
  skip-integrity: true                    # optional; OR-ed with --skip-integrity
  create-issue:
    target-repo: githubnext/ado-aw        # required; operator-only (agent can't override)
    title-prefix: "[pipeline-failure] "
    labels: [pipeline-failure]
    allowed-labels: ["agent-*"]           # default-DENY (empty = no agent labels)
    assignees: [jamesdevine]
    max: 3

Three-layer default-deny gate

  1. MCP layerDEBUG_ONLY_TOOLS strips create-issue from the SafeOutputs router unless explicitly enabled. Holds even when --enabled-tools is empty (the regular permissive default).
  2. Compiler layer--enabled-tools create-issue is emitted only when ado-aw-debug.create-issue: is set; safe-outputs.create-issue: is rejected outright at compile time so the tool can't be smuggled in via the regular safe-outputs surface.
  3. Executor layer — new ExecutionContext.debug_enabled_tools set is populated only from ado-aw-debug:. Stage 3 refuses any create-issue NDJSON entry whose tool name is absent from that set, closing the gap where a forged entry could otherwise bypass the MCP-layer gate.

The third layer was added in response to a rubber-duck review that surfaced the MCP-only gate as insufficient for a write-capable GitHub PAT.

Stage 3 PAT plumbing

  • New dedicated ADO_AW_DEBUG_GITHUB_TOKEN ADO pipeline variable, separate from the read-only GITHUB_TOKEN the agent sees in Stage 1.
  • New ExecutionContext.github_token: Option<String> field, sourced from that env var.
  • Executor uses GitHub REST API: POST /repos/{owner}/{repo}/issues with bearer token, User-Agent: ado-aw/{version}, and X-GitHub-Api-Version: 2022-11-28.
  • Auto-footer with stable <!-- ado-aw --> marker, build ID, run URL, pipeline name, and trigger reason.

Other rubber-duck-driven hardenings

  • allowed-labels: [] is now default-deny — operators must opt in to unrestricted with ["*"].
  • Tightened target-repo regex to match GitHub's login spec (no underscores or dots in owner segments).
  • Final issue title length validated after title-prefix application.
  • Stage 3 error messages neutralise ##vso[…] sequences in agent-supplied labels so a forged NDJSON entry can't echo a live pipeline command into stdout.

Files

  • New: src/safeoutputs/create_issue.rs, docs/ado-aw-debug.md, examples/dogfood-failure-reporter.md, tests/fixtures/ado-aw-debug-agent.md.
  • Modified: 14 files across compiler, MCP server, executor, types, docs, and integration tests.
  • Diff: 18 files, +1,888/-46.

Out of scope (per planning Q&A)

  • Pipeline-failure detection workflow that uses create-issue (separate decision).
  • gh-aw parity beyond the minimal field set (sub-issues, group-by-day, close-older, etc.).
  • Automated provisioning of the ADO_AW_DEBUG_GITHUB_TOKEN variable via ado-aw configure.

Test plan

  • Unit tests in src/safeoutputs/create_issue.rs — 23 tests covering deserialisation, validation, sanitisation, the three-layer gate (including a forged-NDJSON scenario), default-deny allowed-labels, post-prefix title-length validation, neutralisation of pipeline-command sequences in error strings, and target-repo regex edge cases.
  • MCP filter tests in src/mcp.rs — 3 new tests asserting create-issue is stripped when enabled_tools is None, present when explicitly enabled, and stripped when other tools are enabled but create-issue isn't. Coverage test (test_all_known_safe_outputs_covers_router) updated to exempt DEBUG_ONLY_TOOLS.
  • Compiler tests in src/compile/common.rs — coverage for generate_executor_ado_env (both tokens, each in isolation, neither), generate_enabled_tools_args (debug alone, debug + safe-outputs, no debug), validate_ado_aw_debug_config (missing target-repo, malformed target-repo, pipeline-injection in labels and title-prefix, redirection from safe-outputs.create-issue), and integrity OR-ing.
  • Front-matter tests in src/compile/types.rs — round-trip of ado-aw-debug:, deny_unknown_fields enforcement.
  • Integration test in tests/compiler_tests.rstest_compile_ado_aw_debug_fixture compiles tests/fixtures/ado-aw-debug-agent.md and asserts (a) integrity step omitted, (b) executor env: block exposes ADO_AW_DEBUG_GITHUB_TOKEN, (c) --enabled-tools create-issue is wired in, (d) output is valid YAML. Plus a structural smoke test for the new example file.
cargo test          → 1,356 passed, 0 failed (added ~30 targeted tests)
cargo build         → clean
cargo clippy --all-targets -- -D warnings   → 106 errors (baseline; 0 net-new)

Release-mode test failures (11) are pre-existing: --skip-integrity and --debug-pipeline are debug-only CLI flags by design.

Adds a new top-level `ado-aw-debug:` front-matter section that gates
two debug-only knobs intended for dogfooding ado-aw pipelines from
Azure DevOps back into `githubnext/ado-aw`:

* `skip-integrity: bool` — OR-ed with the existing `--skip-integrity`
  CLI flag.
* `create-issue:` — files a GitHub issue against an operator-configured
  target repository when the agent calls the `create-issue` MCP tool.

The `create-issue` tool is **not** a regular safe output. It is
default-deny at three independent layers:

1. The SafeOutputs MCP filter strips it from the tool router via a new
   `DEBUG_ONLY_TOOLS` constant unless explicitly enabled.
2. The compiler only emits `--enabled-tools create-issue` when
   `ado-aw-debug.create-issue:` is set, and rejects
   `safe-outputs.create-issue:` outright so the tool can't be smuggled
   in via the regular safe-outputs surface.
3. Stage 3 maintains an `ExecutionContext.debug_enabled_tools` set
   populated only from `ado-aw-debug:`. The executor refuses any
   `create-issue` NDJSON entry whose tool name is absent from the set,
   closing the gap where a forged entry could otherwise bypass the
   MCP-layer gate.

Stage 3 authenticates against GitHub using a dedicated
`ADO_AW_DEBUG_GITHUB_TOKEN` ADO pipeline variable surfaced through a
new `github_token` field on `ExecutionContext`. The token is separate
from the read-only `GITHUB_TOKEN` the agent sees in Stage 1.

Other notable design choices:

* `target-repo` is operator-only; the agent has no parameter to
  redirect issues elsewhere.
* `allowed-labels` is **default-deny** — empty/absent rejects every
  agent-supplied label. Operators must opt in to unrestricted with
  `["*"]`.
* The `target-repo` validator follows GitHub's login spec (no
  underscores or dots in owner segments).
* Final issue title length is validated **after** `title-prefix`
  application.
* Stage 3 error messages neutralise `##vso[…]` sequences in
  agent-supplied labels so a forged NDJSON entry can't echo a live
  pipeline command into stdout.

Adds 30+ targeted tests across `safeoutputs::create_issue`,
`mcp::tests`, `compile::common`, `compile::types`, and a fixture-based
integration test asserting the YAML wiring. All 1356 dev-mode tests
pass; clippy net-new errors: 0.

See `docs/ado-aw-debug.md` for the full schema, security framing,
and PAT setup instructions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Copy link
Copy Markdown
Contributor

🔍 Rust PR Review

Summary: Looks good overall — the three-layer default-deny design is sound and the security thinking is thorough. One logic bug worth fixing before merge.

Findings

🐛 Bugs / Logic Issues

  • src/main.rs:316ctx.debug_enabled_tools.insert("create-issue") runs unconditionally after the match serde_json::to_value(ci) block, meaning it fires even when the Err arm is taken. If serialization ever fails, Stage 3 would see create-issue in debug_enabled_tools but find no matching entry in tool_configs, causing get_tool_config to return Default::default()validate_target_repo("") → a confusing "target-repo is required" error instead of a serialization failure. serde_json::to_value on a String/Vec<String> struct will never fail in practice, but the placement is wrong structurally. The insert belongs inside the Ok(v) arm:

    match serde_json::to_value(ci) {
        Ok(v) => {
            ctx.tool_configs.insert("create-issue".to_string(), v);
            ctx.debug_enabled_tools.insert("create-issue".to_string()); // ← move here
        }
        Err(e) => log::warn!("Failed to serialize ado-aw-debug.create-issue config: {e}"),
    }
    // remove the dangling insert

⚠️ Suggestions

  • src/safeoutputs/create_issue.rs:~200merge_labels is called for both labels and assignees (merge_labels(&config.assignees, &self.assignees)). The logic is identical so it works, but the name is misleading for a future reader. A rename to merge_dedup_strings (or a private merge_string_lists) would make the intent clearer.

  • src/safeoutputs/create_issue.rs:~50title.len() and body.len() are byte-lengths, not Unicode scalar counts. Unlikely to matter for ADO pipeline titles (almost always ASCII), and GitHub's own character counting is char-based, not byte-based. Not a blocker, but worth a comment that the limit is in bytes, not chars.

✅ What Looks Good

  • Three-layer gate is architecturally correct: MCP filter (DEBUG_ONLY_TOOLS), compiler (--enabled-tools injection gated on front-matter), and executor (debug_enabled_tools check before any network call). The forged-NDJSON scenario is explicitly covered.
  • PAT separation: ADO_AW_DEBUG_GITHUB_TOKEN deliberately kept separate from SYSTEM_ACCESSTOKEN and the read-only GITHUB_TOKEN the agent sees. The comment in result.rs makes this invariant clear.
  • reject_pipeline_injection applied exhaustively across all operator-supplied config fields (target-repo, title-prefix, labels, allowed-labels, assignees) at compile time.
  • ##vso[...] neutralized in error paths: agent-supplied labels are passed through neutralize_pipeline_commands before being echoed into the failure message, preventing ADO command injection even through the error surface.
  • target-repo regex correctly rejects underscores and dots in the owner segment (matching GitHub's actual login rules), and the extra . / .. guard after the regex is correctly belt-and-suspenders.
  • deny_unknown_fields on AdoAwDebugConfig prevents typos from silently passing as no-ops.
  • Test coverage is comprehensive — the forged-NDJSON scenario, default-deny allowed-labels, post-prefix title-length, and all three MCP filter scenarios are all exercised.

Generated by Rust PR Reviewer for issue #492 · ● 504.3K ·

jamesadevine added a commit that referenced this pull request May 10, 2026
* ci(test): lint compiled bash bodies with shellcheck

Adds tests/bash_lint_tests.rs, an integration test that compiles a
representative set of fixtures and runs shellcheck against every
literal bash: body in the generated YAML. The lint catches the actual
silent-failure patterns ADO's "fail on last command" default lets
through (SC2164 cd-without-||, SC2155 masked-return, SC2086/2046
unquoted variables, SC2154 unset refs, SC2088 tilde-in-quotes).

This replaces the previously proposed approach of sprinkling
`set -eo pipefail` across every bash step (PR #492). That approach
added boilerplate to ~27 sites without enforcement, drifted as new
steps were added, and in two spots actually masked errors more than
the original code (`grep ... | tail -1 || true`).

Real bugs surfaced and fixed by the new lint:

* `src/engine.rs` — `Engine::Copilot::log_dir()` returned
  `~/.copilot/logs`. Tilde does not expand inside the double-quoted
  `[ -d "..." ]` test that consumes this value, so the directory check
  always failed and Copilot logs were silently never collected to the
  pipeline artifact. Replaced with `$HOME/.copilot/logs`.
* `src/runtimes/node/mod.rs` and `src/runtimes/dotnet/mod.rs` — the
  ensure-`.npmrc` and ensure-`nuget.config` step generators used Rust
  `\<newline>` line continuations in their format strings, which strip
  leading whitespace. The emitted YAML had body lines flush-left
  against `- bash: |`, producing invalid YAML. Replaced with raw
  string literals so indentation is preserved.
* Multiple `cd "$DOWNLOAD_DIR"` in `base.yml` / `1es-base.yml` had no
  `|| exit` guard. Added.
* `exit $AGENT_EXIT_CODE` (multiple sites) — quoted.
* `mkdir -p {{ working_directory }}/safe_outputs` and the matching
  `cp -a ...` — quoted the substitution.
* `JSON_CONTENT=$(echo "$RESULT_LINE" | sed 's/.*PFX://')` rewritten
  to `${RESULT_LINE##*PFX:}` (avoids forking sed and removes a
  shellcheck SC2001 finding).

Targeted `set -eo pipefail` additions (only where masked-pipeline
exit codes matter):

* `base.yml` / `1es-base.yml` ado-aw download steps (3 stages × 2
  templates): `grep "ado-aw-linux-x64" checksums.txt | sha256sum -c -`
  silently passes when grep matches nothing because sha256sum returns
  0 on empty stdin. Without pipefail, the unverified binary would
  install successfully.
* `src/compile/extensions/trigger_filters.rs` script-download step:
  same `grep | sha256sum` pattern.
* `src/runtimes/lean/mod.rs` install step: `curl ... | sh` would
  silently install nothing on curl failure.

The two pre-existing `set -eo pipefail` instances on the AWF download
+ docker pull steps (introduced in PR #439) and on the `tee`-piped
agent / threat-analysis runs are preserved — those were correct.

Skip vs. enforce:
* Locally, the test prints a notice and returns early when shellcheck
  is missing.
* CI installs shellcheck and sets `ENFORCE_BASH_LINT=1` so a missing
  shellcheck becomes a hard failure rather than a silent skip.

A new `tests/fixtures/runtime-coverage-agent.md` exercises the Lean,
Node-with-feed-url, and .NET-with-feed-url runtimes plus the
cache-memory tool, ensuring every code-generated bash step is reached.
The lint enforces a `REQUIRED_STEP_DISPLAY_NAMES` coverage list to
catch fixture/generator drift.

Documented in AGENTS.md and docs/extending.md.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(bash-lint): close 1ES coverage gap and fix shellcheck 0.9 SC2002

Two changes the CI surfaced after PR #496 landed locally:

1. **shellcheck 0.9.0 (Ubuntu's pinned) flags SC2002 ("Useless cat")
   on `cat file | sed ...` patterns that 0.11.0 does not.** Fixed by
   rewriting the two offending sites in the MCPG start step:
   * `MCPG_CONFIG=$(cat … | sed | sed | sed)` →
     `MCPG_CONFIG=$(sed -e … -e … -e … file)`. Semantically equivalent
     because the three substitutions are over independent placeholder
     patterns.
   * `cat … | python3 -m json.tool` → `python3 -m json.tool < …`.

   Avoids forking `cat` for nothing and is stable across shellcheck
   versions.

2. **Add a `runtime-coverage-1es-agent.md` fixture and assert that
   every known compile target is exercised by at least one fixture.**
   Previously only `1es-test-agent.md` compiled to the 1ES target, and
   it had no `runtimes:` or `tools.cache-memory`. The code-generated
   bash bodies from those extensions (Lean install, `.npmrc`,
   `nuget.config`, cache-memory restore/init) were being linted only
   on the standalone target. Today their bodies are byte-identical
   across targets, but a future target-specific divergence would slip
   past the lint without a 1ES variant.

   `compile_fixture()` now parses `Generated <target> pipeline:` from
   stdout, accumulates targets seen, and the test asserts every entry
   in `REQUIRED_TARGETS = ["standalone", "1es"]` is covered. Sanity-
   checked that removing the 1ES fixtures causes the test to fail with
   `no fixture compiles to the following target(s): ["1es"]`.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
jamesadevine added a commit that referenced this pull request May 10, 2026
…498)

* ci(test): lint compiled bash bodies with shellcheck

Adds tests/bash_lint_tests.rs, an integration test that compiles a
representative set of fixtures and runs shellcheck against every
literal bash: body in the generated YAML. The lint catches the actual
silent-failure patterns ADO's "fail on last command" default lets
through (SC2164 cd-without-||, SC2155 masked-return, SC2086/2046
unquoted variables, SC2154 unset refs, SC2088 tilde-in-quotes).

This replaces the previously proposed approach of sprinkling
`set -eo pipefail` across every bash step (PR #492). That approach
added boilerplate to ~27 sites without enforcement, drifted as new
steps were added, and in two spots actually masked errors more than
the original code (`grep ... | tail -1 || true`).

Real bugs surfaced and fixed by the new lint:

* `src/engine.rs` — `Engine::Copilot::log_dir()` returned
  `~/.copilot/logs`. Tilde does not expand inside the double-quoted
  `[ -d "..." ]` test that consumes this value, so the directory check
  always failed and Copilot logs were silently never collected to the
  pipeline artifact. Replaced with `$HOME/.copilot/logs`.
* `src/runtimes/node/mod.rs` and `src/runtimes/dotnet/mod.rs` — the
  ensure-`.npmrc` and ensure-`nuget.config` step generators used Rust
  `\<newline>` line continuations in their format strings, which strip
  leading whitespace. The emitted YAML had body lines flush-left
  against `- bash: |`, producing invalid YAML. Replaced with raw
  string literals so indentation is preserved.
* Multiple `cd "$DOWNLOAD_DIR"` in `base.yml` / `1es-base.yml` had no
  `|| exit` guard. Added.
* `exit $AGENT_EXIT_CODE` (multiple sites) — quoted.
* `mkdir -p {{ working_directory }}/safe_outputs` and the matching
  `cp -a ...` — quoted the substitution.
* `JSON_CONTENT=$(echo "$RESULT_LINE" | sed 's/.*PFX://')` rewritten
  to `${RESULT_LINE##*PFX:}` (avoids forking sed and removes a
  shellcheck SC2001 finding).

Targeted `set -eo pipefail` additions (only where masked-pipeline
exit codes matter):

* `base.yml` / `1es-base.yml` ado-aw download steps (3 stages × 2
  templates): `grep "ado-aw-linux-x64" checksums.txt | sha256sum -c -`
  silently passes when grep matches nothing because sha256sum returns
  0 on empty stdin. Without pipefail, the unverified binary would
  install successfully.
* `src/compile/extensions/trigger_filters.rs` script-download step:
  same `grep | sha256sum` pattern.
* `src/runtimes/lean/mod.rs` install step: `curl ... | sh` would
  silently install nothing on curl failure.

The two pre-existing `set -eo pipefail` instances on the AWF download
+ docker pull steps (introduced in PR #439) and on the `tee`-piped
agent / threat-analysis runs are preserved — those were correct.

Skip vs. enforce:
* Locally, the test prints a notice and returns early when shellcheck
  is missing.
* CI installs shellcheck and sets `ENFORCE_BASH_LINT=1` so a missing
  shellcheck becomes a hard failure rather than a silent skip.

A new `tests/fixtures/runtime-coverage-agent.md` exercises the Lean,
Node-with-feed-url, and .NET-with-feed-url runtimes plus the
cache-memory tool, ensuring every code-generated bash step is reached.
The lint enforces a `REQUIRED_STEP_DISPLAY_NAMES` coverage list to
catch fixture/generator drift.

Documented in AGENTS.md and docs/extending.md.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* ci(workflows): add daily bash-step hygiene auditor agentic workflow

Adds `.github/workflows/bash-lint-auditor.md`, a daily agentic workflow
that complements the PR-gate lint added in PR #496. The PR gate gives
fast feedback on every PR; this workflow runs once a day and lands
small, mechanical improvements that the gate can't:

* When a finding does slip onto main (e.g. via merge conflict), the
  auditor fixes it the next morning instead of waiting for the next
  contributor PR.
* Audits stale `# shellcheck disable=` directives — removes ones that
  no longer fire (i.e. the underlying code has been cleaned up but
  the suppression was forgotten).
* Audits whether the lint's exclude list could be tightened.
* Verifies fixture coverage of every bash-step generator and proposes
  fixture additions when a new generator appears.

When the auditor finds something actionable, it opens a focused PR
(one concern per PR) with the structured "what was found / how it was
fixed / verification" body. When the lint is green and no proactive
improvement is feasible, it exits cleanly with `noop`.

Configuration notes:

* `schedule: daily around 09:00` — fuzzy schedule scattering across
  the hour, matching the convention of other daily workflows in this
  repo (e.g. `cyclomatic-complexity-reducer.md`).
* `allowed-files` restricts the auditor to bash-generator code paths
  plus the tests/fixtures it depends on. `protected-files:
  fallback-to-issue` ensures that if it tries to edit anything else,
  the change falls back to an issue rather than a PR.
* `cache-memory: true` persists state across runs so the auditor
  doesn't loop on the same suggestion if a maintainer rejects it.
* `bash: ["*"]` + `network.allowed: [defaults, rust]` gives the
  agent what it needs to install shellcheck (via apt with a static-
  binary fallback) and run cargo against the rust ecosystem.

Compiled with `gh aw compile bash-lint-auditor`; the matching
`.lock.yml` is included along with new SHAs in
`.github/aw/actions-lock.json` (cache, checkout, download-artifact
registered for the first time by this workflow's setup steps).

Stacked on top of branch `lint-bash-steps` (PR #496) because the
auditor relies on `tests/bash_lint_tests.rs` and
`tests/fixtures/runtime-coverage-agent.md`, which are introduced
there.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant