Skip to content

feat(safeoutputs): add daily smoke suite and reject unknown safe-output keys#563

Open
jamesadevine wants to merge 1 commit into
mainfrom
feat/safe-output-smoke-suite
Open

feat(safeoutputs): add daily smoke suite and reject unknown safe-output keys#563
jamesadevine wants to merge 1 commit into
mainfrom
feat/safe-output-smoke-suite

Conversation

@jamesadevine
Copy link
Copy Markdown
Collaborator

Summary

Adds an end-to-end daily smoke suite for every production safe output and tightens the compiler so silently-dropped safe-outputs: keys can no longer hide.

26 daily smoke fixtures under tests/safe-outputs/

Each fixture is a minimal agentic-pipeline .md that calls exactly one safe-output tool with predictable literal values, plus a committed .lock.yml. Schedule baked into every fixture; after the operator registers them once in the AgentPlayground ADO sandbox the suite runs unattended.

  • 22 production safe outputs (daily around 03:00) — noop, missing-data, missing-tool, report-incomplete, create-work-item, comment-on-work-item, update-work-item, link-work-items, create-branch, create-git-tag, create-wiki-page, update-wiki-page, add-build-tag, queue-build, create-pull-request, add-pr-comment, reply-to-pr-comment, resolve-pr-thread, submit-pr-review, update-pr, upload-build-attachment, upload-workitem-attachment, upload-pipeline-artifact.
  • noop-target.md — trivial target pipeline that the queue-build smoke fires at.
  • janitor.md (weekly on monday around 02:00) — prunes ado-aw-smoke-* artifacts older than 30 days.
  • smoke-failure-reporter.md (daily around 04:30) — queries the ADO REST builds endpoint, finds failed smoke runs, and uses ado-aw-debug.create-issue (feat(safeoutputs): add ado-aw-debug.create-issue for dogfood pipelines #492) to file [smoke-failure] ... issues against githubnext/ado-aw. allowed-labels: [], max: 5, title-prefix dedupes against open issues. The ADO_AW_DEBUG_GITHUB_TOKEN PAT must live only on this pipeline.

README.md documents the suite layout; REGISTERED.md is a contributor-maintained fixture → ADO pipeline ID table plus the manual-handoff checklist (service connections, perma-PR, variable group, token provisioning).

Compiler: fail loud on unknown safe-output keys

generate_enabled_tools_args used to skip unknown keys with only a console warning. Authoring this PR surfaced two real-world consequences:

  1. The fixture filename rename create_prcreate_pull_request (see below) initially used create-pr: in fixtures and compiled cleanly with the tool dropped at runtime.
  2. The long-standing reply-to-pr-comment MCP-name discrepancy (see below) is exactly this class of silent failure.

The new validate_safe_outputs_keys in src/compile/common.rs runs in the validation block before pipeline emission. Unknown keys now bail with a structured error listing all known tools that share the typo's first hyphen-segment, e.g.:

safe-outputs contains unrecognised tool name(s):
  - create-pr (similar known tools: create-branch, create-git-tag, create-pull-request, create-wiki-page, create-work-item)

Valid safe-output keys are listed in docs/safe-outputs.md. Each key must
match exactly the kebab-case name registered by a tool in src/safeoutputs/
(e.g. `create-pull-request`, not `create-pr`).

The memory migration key remains a soft warning (back-compat); DEBUG_ONLY_TOOLS keys keep their more specific error from validate_ado_aw_debug_config. The plain Levenshtein "did you mean X?" heuristic was rejected — it returned misleading suggestions like update-pr for create-pr because both end in -pr. Listing every same-prefix candidate is honest and lets the operator pick.

File ↔ YAML-key alignment

  • src/safeoutputs/create_pr.rssrc/safeoutputs/create_pull_request.rs (now matches YAML key create-pull-request). git mv preserves history.
  • Fix real bug: reply_to_pr_comment.rs declared MCP tool name reply-to-pr-review-comment, while docs/safe-outputs.md, README.md, prompts/*, and site/ all said reply-to-pr-comment. Updated Rust to match docs across tool_result!, mcp.rs registration, execute.rs dispatch, four unit tests, and internal log strings. Technically a behavioural rename of the wire-level MCP name, but no user could have invoked the previously-registered name successfully (it wasn't documented anywhere), so this is a fix rather than a breaking change.

Test plan

  • cargo build — clean (warnings unchanged from baseline).
  • cargo test — every existing test passes plus 9 new unit tests for validate_safe_outputs_keys and related_safe_output_names:
    • known-keys accept,
    • empty safe-outputs: accept,
    • typo reject + create-pull-request appears in the similar-tools hint for create-pr,
    • distant typo reject without misleading hint,
    • create-issue not double-flagged (deferred to validate_ado_aw_debug_config),
    • memory migration soft-path,
    • invalid characters reject with a structural error,
    • related_safe_output_names("create-pr") includes every create-* and excludes update-pr,
    • related_safe_output_names("fabricated-tool-name") returns empty.
  • cargo clippy --all-targets --all-features -- -D warnings — 100+ baseline errors, 0 net-new from changed files (src/compile/common.rs, src/safeoutputs/create_pull_request.rs, src/safeoutputs/reply_to_pr_comment.rs, src/safeoutputs/mod.rs, src/mcp.rs, src/execute.rs).
  • tests/bash_lint_tests.rs with ENFORCE_BASH_LINT=1 and shellcheck on PATH — passes (existing fixtures only; the new smoke fixtures are validated separately in the sweep below).
  • Fixture sweep — all 26 fixtures pass ado-aw compile and ado-aw check; the resulting bash bodies (mostly set -euo pipefail + printf setup steps) pass shellcheck.

Follow-ups (out of scope, manual)

  1. Register one ADO pipeline per *.lock.yml in https://dev.azure.com/msazuresphere/AgentPlayground, fill in tests/safe-outputs/REGISTERED.md in a follow-up docs PR.
  2. Provision ADO_AW_DEBUG_GITHUB_TOKEN (secret, scoped Issues: Read and write on githubnext/ado-aw only) on the smoke-failure-reporter pipeline.
  3. Confirm engine.model: gpt-5-mini is available to copilot in the tenant before enabling the smokes; if not, update the model field across the fixture set.
  4. Flesh out setup: placeholders in janitor.md and resolve-pr-thread.md with real az / REST commands.

…ut keys

* Add tests/safe-outputs/ — 26 daily-scheduled agentic-pipeline fixtures covering all 22 production safe outputs plus a janitor, a queue-build target, and a smoke-failure-reporter that uses ado-aw-debug.create-issue (PR #492).

* Add validate_safe_outputs_keys to fail compile on unrecognised safe-outputs keys (previously silently dropped with only a console warning). Includes a related_safe_output_names helper that lists known tools sharing the typo's first hyphen-segment as a hint.

* Rename create_pr.rs -> create_pull_request.rs so the file name matches the YAML key. Fix the long-standing reply_to_pr_comment.rs name mismatch: the MCP tool was registered as reply-to-pr-review-comment while every user-facing doc, README, prompt, and site page said reply-to-pr-comment. Tool name now matches the docs across mcp.rs tool registration, execute.rs dispatch, and unit tests.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Copy link
Copy Markdown
Contributor

🔍 Rust PR Review

Summary: Looks good — correctness solid, tests comprehensive. A few minor points worth addressing.

Findings

⚠️ Suggestions

  • src/compile/common.rs:1636-1642 — The warning-and-skip path for unknown safe-output tool names in generate_enabled_tools_args is now dead code on the normal compile path. validate_safe_outputs_keys runs at line 3000 (validation phase) and bails hard before generate_enabled_tools_args can ever see an unknown key. The comment at line 1657 ("Every user-specified key was either invalid or unrecognized") is no longer accurate for the unknown case. Since generate_enabled_tools_args is pub, the defensive check isn't harmful, but it creates a misleading impression that warnings are the error surface here. Consider adding a comment like // unreachable in practice: validate_safe_outputs_keys bails earlier or removing the unrecognized-key branch now that the validator owns that responsibility.

  • src/compile/common.rs:1904-1910validate_safe_outputs_keys bails on the first invalid-character key but collects all unknown-but-syntactically-valid keys before reporting. If a user has two bad-character keys they'll only see the first. Minor UX inconsistency — consider collecting character-invalid keys into the same unknown vec (with a different message) so the report is complete in one pass.

  • src/compile/common.rs:1964key.split('-').next().unwrap_or(key) — the .unwrap_or(key) fallback is unreachable: str::split().next() always returns Some(...) (even for "", it yields Some("")). The empty-string case is already handled by the if head.is_empty() guard below. The unwrap_or is harmless but a reader might wonder why it's there. unwrap_or_default() or an inline comment would be cleaner.

✅ What Looks Good

  • The validate_safe_outputs_keys / validate_ado_aw_debug_config split is well-designed: debug-only tools get a purpose-specific error, unknowns get the typo-hint error, memory gets the soft migration warning. The ordering (line 3000 then 3006) makes the sequencing explicit.
  • The reply-to-pr-comment rename is thorough — all five sites updated (mcp.rs #[tool(name=...)], execute.rs dispatch, tool_result! macro, log strings, four unit test assertions).
  • 9 unit tests cover all meaningful branches of validate_safe_outputs_keys and related_safe_output_names, including the dedup-for-debug-only and soft-migration cases.
  • Rejecting create-pr-style typos with a create-* candidate list (rather than a Levenshtein nearest-neighbor) is the right call — the PR description's reasoning about update-pr false-matches is solid.

Generated by Rust PR Reviewer for issue #563 · ● 1.1M ·

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant