feat(safeoutputs): add daily smoke suite and reject unknown safe-output keys by jamesadevine · Pull Request #563 · githubnext/ado-aw

jamesadevine · 2026-05-15T21:10:24Z

Summary

Adds an end-to-end daily smoke suite for every production safe output and tightens the compiler so silently-dropped safe-outputs: keys can no longer hide.

26 daily smoke fixtures under `tests/safe-outputs/`

Each fixture is a minimal agentic-pipeline .md that calls exactly one safe-output tool with predictable literal values, plus a committed .lock.yml. Schedule baked into every fixture; after the operator registers them once in the AgentPlayground ADO sandbox the suite runs unattended.

22 production safe outputs (daily around 03:00) — noop, missing-data, missing-tool, report-incomplete, create-work-item, comment-on-work-item, update-work-item, link-work-items, create-branch, create-git-tag, create-wiki-page, update-wiki-page, add-build-tag, queue-build, create-pull-request, add-pr-comment, reply-to-pr-comment, resolve-pr-thread, submit-pr-review, update-pr, upload-build-attachment, upload-workitem-attachment, upload-pipeline-artifact.
noop-target.md — trivial target pipeline that the queue-build smoke fires at.
janitor.md (weekly on monday around 02:00) — prunes ado-aw-smoke-* artifacts older than 30 days.
smoke-failure-reporter.md (daily around 04:30) — queries the ADO REST builds endpoint, finds failed smoke runs, and uses ado-aw-debug.create-issue (feat(safeoutputs): add ado-aw-debug.create-issue for dogfood pipelines #492) to file [smoke-failure] ... issues against githubnext/ado-aw. allowed-labels: [], max: 5, title-prefix dedupes against open issues. The ADO_AW_DEBUG_GITHUB_TOKEN PAT must live only on this pipeline.

README.md documents the suite layout; REGISTERED.md is a contributor-maintained fixture → ADO pipeline ID table plus the manual-handoff checklist (service connections, perma-PR, variable group, token provisioning).

Compiler: fail loud on unknown safe-output keys

generate_enabled_tools_args used to skip unknown keys with only a console warning. Authoring this PR surfaced two real-world consequences:

The fixture filename rename create_pr → create_pull_request (see below) initially used create-pr: in fixtures and compiled cleanly with the tool dropped at runtime.
The long-standing reply-to-pr-comment MCP-name discrepancy (see below) is exactly this class of silent failure.

The new validate_safe_outputs_keys in src/compile/common.rs runs in the validation block before pipeline emission. Unknown keys now bail with a structured error listing all known tools that share the typo's first hyphen-segment, e.g.:

safe-outputs contains unrecognised tool name(s):
  - create-pr (similar known tools: create-branch, create-git-tag, create-pull-request, create-wiki-page, create-work-item)

Valid safe-output keys are listed in docs/safe-outputs.md. Each key must
match exactly the kebab-case name registered by a tool in src/safeoutputs/
(e.g. `create-pull-request`, not `create-pr`).

The memory migration key remains a soft warning (back-compat); DEBUG_ONLY_TOOLS keys keep their more specific error from validate_ado_aw_debug_config. The plain Levenshtein "did you mean X?" heuristic was rejected — it returned misleading suggestions like update-pr for create-pr because both end in -pr. Listing every same-prefix candidate is honest and lets the operator pick.

File ↔ YAML-key alignment

src/safeoutputs/create_pr.rs → src/safeoutputs/create_pull_request.rs (now matches YAML key create-pull-request). git mv preserves history.
Fix real bug: reply_to_pr_comment.rs declared MCP tool name reply-to-pr-review-comment, while docs/safe-outputs.md, README.md, prompts/*, and site/ all said reply-to-pr-comment. Updated Rust to match docs across tool_result!, mcp.rs registration, execute.rs dispatch, four unit tests, and internal log strings. Technically a behavioural rename of the wire-level MCP name, but no user could have invoked the previously-registered name successfully (it wasn't documented anywhere), so this is a fix rather than a breaking change.

Test plan

cargo build — clean (warnings unchanged from baseline).
cargo test — every existing test passes plus 9 new unit tests for validate_safe_outputs_keys and related_safe_output_names:
- known-keys accept,
- empty safe-outputs: accept,
- typo reject + create-pull-request appears in the similar-tools hint for create-pr,
- distant typo reject without misleading hint,
- create-issue not double-flagged (deferred to validate_ado_aw_debug_config),
- memory migration soft-path,
- invalid characters reject with a structural error,
- related_safe_output_names("create-pr") includes every create-* and excludes update-pr,
- related_safe_output_names("fabricated-tool-name") returns empty.
cargo clippy --all-targets --all-features -- -D warnings — 100+ baseline errors, 0 net-new from changed files (src/compile/common.rs, src/safeoutputs/create_pull_request.rs, src/safeoutputs/reply_to_pr_comment.rs, src/safeoutputs/mod.rs, src/mcp.rs, src/execute.rs).
tests/bash_lint_tests.rs with ENFORCE_BASH_LINT=1 and shellcheck on PATH — passes (existing fixtures only; the new smoke fixtures are validated separately in the sweep below).
Fixture sweep — all 26 fixtures pass ado-aw compile and ado-aw check; the resulting bash bodies (mostly set -euo pipefail + printf setup steps) pass shellcheck.

Follow-ups (out of scope, manual)

Register one ADO pipeline per *.lock.yml in https://dev.azure.com/msazuresphere/AgentPlayground, fill in tests/safe-outputs/REGISTERED.md in a follow-up docs PR.
Provision ADO_AW_DEBUG_GITHUB_TOKEN (secret, scoped Issues: Read and write on githubnext/ado-aw only) on the smoke-failure-reporter pipeline.
Confirm engine.model: gpt-5-mini is available to copilot in the tenant before enabling the smokes; if not, update the model field across the fixture set.
Flesh out setup: placeholders in janitor.md and resolve-pr-thread.md with real az / REST commands.

…ut keys * Add tests/safe-outputs/ — 26 daily-scheduled agentic-pipeline fixtures covering all 22 production safe outputs plus a janitor, a queue-build target, and a smoke-failure-reporter that uses ado-aw-debug.create-issue (PR #492). * Add validate_safe_outputs_keys to fail compile on unrecognised safe-outputs keys (previously silently dropped with only a console warning). Includes a related_safe_output_names helper that lists known tools sharing the typo's first hyphen-segment as a hint. * Rename create_pr.rs -> create_pull_request.rs so the file name matches the YAML key. Fix the long-standing reply_to_pr_comment.rs name mismatch: the MCP tool was registered as reply-to-pr-review-comment while every user-facing doc, README, prompt, and site page said reply-to-pr-comment. Tool name now matches the docs across mcp.rs tool registration, execute.rs dispatch, and unit tests. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions · 2026-05-15T21:21:41Z

🔍 Rust PR Review

Summary: Looks good — correctness solid, tests comprehensive. A few minor points worth addressing.

Findings

⚠️ Suggestions

src/compile/common.rs:1636-1642 — The warning-and-skip path for unknown safe-output tool names in generate_enabled_tools_args is now dead code on the normal compile path. validate_safe_outputs_keys runs at line 3000 (validation phase) and bails hard before generate_enabled_tools_args can ever see an unknown key. The comment at line 1657 ("Every user-specified key was either invalid or unrecognized") is no longer accurate for the unknown case. Since generate_enabled_tools_args is pub, the defensive check isn't harmful, but it creates a misleading impression that warnings are the error surface here. Consider adding a comment like // unreachable in practice: validate_safe_outputs_keys bails earlier or removing the unrecognized-key branch now that the validator owns that responsibility.
src/compile/common.rs:1904-1910 — validate_safe_outputs_keys bails on the first invalid-character key but collects all unknown-but-syntactically-valid keys before reporting. If a user has two bad-character keys they'll only see the first. Minor UX inconsistency — consider collecting character-invalid keys into the same unknown vec (with a different message) so the report is complete in one pass.
src/compile/common.rs:1964 — key.split('-').next().unwrap_or(key) — the .unwrap_or(key) fallback is unreachable: str::split().next() always returns Some(...) (even for "", it yields Some("")). The empty-string case is already handled by the if head.is_empty() guard below. The unwrap_or is harmless but a reader might wonder why it's there. unwrap_or_default() or an inline comment would be cleaner.

✅ What Looks Good

The validate_safe_outputs_keys / validate_ado_aw_debug_config split is well-designed: debug-only tools get a purpose-specific error, unknowns get the typo-hint error, memory gets the soft migration warning. The ordering (line 3000 then 3006) makes the sequencing explicit.
The reply-to-pr-comment rename is thorough — all five sites updated (mcp.rs #[tool(name=...)], execute.rs dispatch, tool_result! macro, log strings, four unit test assertions).
9 unit tests cover all meaningful branches of validate_safe_outputs_keys and related_safe_output_names, including the dedup-for-debug-only and soft-migration cases.
Rejecting create-pr-style typos with a create-* candidate list (rather than a Levenshtein nearest-neighbor) is the right call — the PR description's reasoning about update-pr false-matches is solid.

Generated by Rust PR Reviewer for issue #563 · ● 1.1M · ◷

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(safeoutputs): add daily smoke suite and reject unknown safe-output keys#563

feat(safeoutputs): add daily smoke suite and reject unknown safe-output keys#563
jamesadevine wants to merge 1 commit into
mainfrom
feat/safe-output-smoke-suite

jamesadevine commented May 15, 2026

Uh oh!

github-actions Bot commented May 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

jamesadevine commented May 15, 2026

Summary

26 daily smoke fixtures under tests/safe-outputs/

Compiler: fail loud on unknown safe-output keys

File ↔ YAML-key alignment

Test plan

Follow-ups (out of scope, manual)

Uh oh!

github-actions Bot commented May 15, 2026

🔍 Rust PR Review

Findings

⚠️ Suggestions

✅ What Looks Good

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

26 daily smoke fixtures under `tests/safe-outputs/`