Safe-output tool 'create_pull_request_review_comment' not found at runtime despite correct declaration #25656

@JanKrivanek

Summary

At runtime, the Copilot CLI agent reports "Tool 'create_pull_request_review_comment' does not exist", even though the tool is correctly declared in the workflow's safe-outputs configuration and appears in the compiled lock file's prompt, tools metadata, and validation config.

Reproduction

Repository: dotnet/msbuild
Workflow: review-on-open.agent.lock.yml / review.agent.lock.yml
gh-aw version: v0.67.1
AWF version: v0.25.13

Workflow configuration

The safe-outputs section in shared/review-shared.md correctly declares all three tools:

safe-outputs:
  create-pull-request-review-comment:
    max: 30
  submit-pull-request-review:
    max: 1
  add-comment:
    max: 5

The compiled lock file lists these tools in the prompt (Tools: add_comment(max:5), create_pull_request_review_comment(max:30), submit_pull_request_review), in tools_meta.json, in validation.json, and in the safe-outputs handler config.

Observed behavior

The agent attempts to call create_pull_request_review_comment and receives:

✗ create_pull_request_review_comment src/Tasks/GetReferenceAssemblyPaths.cs · pr_number: "13495", p…
  └ Tool 'create_pull_request_review_comment' does not exist.

This causes the agent to fall back to add_comment for everything, losing inline review context. In some runs, the agent produces no safe outputs at all, which then cascades: detection is skipped (no output_types), safe_outputs is skipped (needs.detection.result != 'success'), and no review is posted despite the agent having done the analysis.

Affected runs

| Run | Workflow | Outcome | Issue |
|---|---|---|---|
| #13 (attempt 1) | on-open | Agent exited code 1, review posted via fallback | #13519 |
| #13 (attempt 2) | on-open | Agent succeeded but detection+safe_outputs skipped, no review posted | |
| #25 | command | No safe outputs generated | #13521 |
| #22 | command | Completed but tool errors in logs | |
| #12 | on-open | Completed OK (intermittent) | |

Additional observations

  1. Intermittent: Some runs succeed with the same workflow config (e.g., runs #12, #10, #9). The tool declaration hasn't changed between runs.

  2. Cascading skip: When create_pull_request_review_comment fails and the agent produces no structured output, the detection job is skipped (output_types == ''). Because safe_outputs is gated on needs.detection.result == 'success', it is also skipped, so even add_comment fallback output is never posted. Consider changing the condition to needs.detection.result != 'failure' so that a skipped detection doesn't block output posting.

  3. Long execution times: Runs that encounter the tool error tend to take 18-25 minutes for a 2-file review, likely due to retry/fallback loops in the agent.
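
The gating change suggested in item 2 could look roughly like the fragment below. This is a sketch, not the compiled lock file: the job names detection and safe_outputs come from the description above, while the agent job name, the needs list, runs-on, and the placeholder step are assumptions. One caveat that is standard GitHub Actions behavior: a job whose needs dependency was skipped is itself skipped unless its if expression contains a status check function such as always() or !cancelled(), so comparing needs.detection.result alone may not be enough if the generated condition lacks one.

```yaml
# Hypothetical excerpt; the real job layout in review.agent.lock.yml may differ.
jobs:
  safe_outputs:
    # Assumed dependency list; only `detection` is named in this report.
    needs: [agent, detection]
    # !cancelled() replaces the implicit success() check, so a skipped
    # `detection` job no longer forces this job to be skipped.
    # The second clause still blocks posting when detection actually failed.
    if: ${{ !cancelled() && needs.detection.result != 'failure' }}
    runs-on: ubuntu-latest
    steps:
      - name: Post safe outputs (placeholder step)
        run: echo "detection result was ${{ needs.detection.result }}"
```

With a condition of this shape, detection results of success and skipped both allow posting. Whether output should be posted when detection never ran is a policy decision for the maintainers.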

Expected behavior

The create_pull_request_review_comment safe-output tool should be available to the agent at runtime when declared in safe-outputs and compiled into the lock file.
