safe-outputs.call-workflow workers cancel each other under parallel fan-out due to gh-aw-copilot-${{ github.workflow }} job concurrency group #35161

@DeagleGross

TL;DR

The compiler injects a job-level concurrency block on the agent job of every workflow:

concurrency:
  group: "gh-aw-copilot-${{ github.workflow }}"

In a reusable worker invoked via workflow_call (e.g. by a safe-outputs.call-workflow orchestrator), ${{ github.workflow }} resolves to the caller's workflow name, not the worker's. As a result, every parallel invocation of the worker from the same caller evaluates the concurrency group to the same string. GitHub Actions only keeps one waiting request per group; the rest are cancelled with:

Canceling since a higher priority waiting request for gh-aw-copilot-<caller-workflow-name> exists

This makes safe-outputs.call-workflow lossy as a fan-out primitive whenever ≥3 worker invocations land close together in time. In a recent test of mine, 3 out of 5 worker runs were cancelled.

Environment

  • gh aw version: v0.74.8
  • Engine: copilot (default)
  • Runner: ubuntu-latest on github.com
  • Behavior reproduces consistently

Reproduction

Two workflows, an orchestrator that fans out to a worker via safe-outputs.call-workflow.

orchestrator.md (caller)

---
on:
  issues:
    types: [opened]
permissions:
  contents: read
  issues: read
safe-outputs:
  call-workflow:
    workflows: [reviewer-worker]
    max: 1
---

# Orchestrator

Always call the `reviewer_worker` MCP tool with:
- `issue_number`: `${{ github.event.issue.number }}`
- `proposed_comment`: a one-line placeholder string

reviewer-worker.md (worker)

---
on:
  workflow_call:
    inputs:
      issue_number:
        type: string
        required: true
      proposed_comment:
        type: string
        required: true
      payload:
        type: string
        required: false

permissions:
  contents: read
  issues: read

safe-outputs:
  add-comment:
    max: 1
    target: "*"
---

# Worker

Echo `${{ inputs.proposed_comment }}` to a comment on issue `${{ inputs.issue_number }}` via `add_comment`. No reasoning.

Trigger

Open 5 GitHub issues in quick succession (e.g. via a small bash loop with gh issue create). The orchestrator fires 5 times, each invoking the worker once.
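
As a rough sketch of that trigger step (written here in Python rather than bash), the snippet below builds five `gh issue create` commands and prints them in dry-run mode. The title prefix and body text are placeholders, not values from the original run, and the commands assume `gh` is authenticated and run from inside the target repository.

```python
import shlex
import subprocess


def build_issue_commands(count, title_prefix="concurrency repro"):
    """Build one `gh issue create` command per issue (placeholder title/body)."""
    return [
        [
            "gh", "issue", "create",
            "--title", f"{title_prefix} {i}",
            "--body", "Placeholder body for the fan-out reproduction.",
        ]
        for i in range(1, count + 1)
    ]


def open_issues(commands, dry_run=True):
    """Print the commands, or actually run them when dry_run is False."""
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            # Runs back to back so the orchestrator fires in quick succession.
            subprocess.run(cmd, check=True)


commands = build_issue_commands(5)
open_issues(commands)  # dry run: only prints the five commands
```

Calling `open_issues(commands, dry_run=False)` would create the issues for real.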

Expected

5 worker runs complete; 5 comments posted (one per issue). Worker invocations are independent — different issue_number inputs, distinct github.run_ids, distinct logical work units.

Actual (~60% loss rate)

Several worker runs are cancelled mid-flight with the message above. The workflow run graph for a cancelled invocation shows:

  • call-{worker} / pre_activation
  • call-{worker} / activation
  • call-{worker} / agent ❌ cancelled
  • call-{worker} / detection — skipped
  • call-{worker} / safe_outputs — skipped (so no comment posted)
  • call-{worker} / conclusion

In one of my real runs, 3 of 5 parallel worker invocations got cancelled this way. The job-level annotation on the cancelled jobs reads:

Canceling since a higher priority waiting request for gh-aw-copilot-Issue Triage Agent for dotnet/aspnetcore exists

Note that Issue Triage Agent for dotnet/aspnetcore is the orchestrator's name, not the worker's.

Root cause

Inside the generated reviewer-worker.lock.yml, the agent job carries:

  agent:
    ...
    concurrency:
      group: "gh-aw-copilot-${{ github.workflow }}"
    ...

GitHub's documented behavior for ${{ github.workflow }} inside a reusable workflow run: it resolves to the caller's workflow name. So when caller Issue Triage Agent invokes worker Reviewer Worker five times in parallel, all five worker runs evaluate the group as:

gh-aw-copilot-Issue Triage Agent

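Even without cancel-in-progress: true, GitHub Actions keeps only one pending run per concurrency group. When a third run arrives while one is in progress and one is pending, the older pending run is cancelled and the newcomer takes its place, which produces the observed cancellation pattern.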
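
To make the slot behavior concrete, here is a toy model of a single concurrency group with one running slot and one pending slot. It is a simplification of the behavior described above, not GitHub's implementation, and it assumes all five worker runs arrive while the first one is still in progress; the `worker-N` names are illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ConcurrencyGroup:
    """Toy model: one running slot, one pending slot, cancel-in-progress off."""
    running: Optional[str] = None
    pending: Optional[str] = None
    completed: List[str] = field(default_factory=list)
    cancelled: List[str] = field(default_factory=list)

    def enqueue(self, run: str) -> None:
        if self.running is None:
            self.running = run
            return
        # A newer request displaces whatever was already pending.
        if self.pending is not None:
            self.cancelled.append(self.pending)
        self.pending = run

    def finish_running(self) -> None:
        if self.running is not None:
            self.completed.append(self.running)
        # Promote the pending run (if any) into the running slot.
        self.running, self.pending = self.pending, None


def simulate(runs: List[str]) -> ConcurrencyGroup:
    """Enqueue every run before the first one finishes, then drain the group."""
    group = ConcurrencyGroup()
    for run in runs:
        group.enqueue(run)
    while group.running is not None:
        group.finish_running()
    return group


result = simulate([f"worker-{i}" for i in range(1, 6)])
print("completed:", result.completed)
print("cancelled:", result.cancelled)
```

Under that assumption the model cancels three of the five runs, in line with the loss observed in the report; with only two runs, nothing is cancelled.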

I verified the workflow-level concurrency at the top of the worker behaves the same way — but that one was easy to override from frontmatter:

concurrency:
  group: gh-aw-reviewer-worker-${{ inputs.issue_number || github.run_id }}
  cancel-in-progress: false

The agent-job-level gh-aw-copilot-... block, however, is injected by the compiler and, as far as I can tell from the v0.74.8 codegen, is not user-overridable from frontmatter.

Proposed fix

The agent-job concurrency group should be unique per invocation when the workflow is a workflow_call target. Two options:

Option A — caller-side identity (simplest)

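Use github.run_id instead of, or as a suffix to, github.workflow. Inside workflow_call the github context belongs to the caller, so run_id is the caller's run ID; it is still unique per worker invocation as long as each caller run invokes the worker once, as in the reproduction above: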

concurrency:
  group: "gh-aw-copilot-${{ github.workflow }}-${{ github.run_id }}"

Pro: no behavior change for non-workflow_call workflows (run_id is still unique).
Con: each run gets its own slot, so the gh-aw-copilot serialization at the agent level is effectively disabled. If that serialization exists to throttle Copilot CLI rate-limit pressure, this option removes the throttle.

Option B — Worker-aware identity

Suffix with a compile-time string identifying the workflow file itself (e.g. the GH_AW_WORKFLOW_ID_SANITIZED env var that the compiler already emits):

concurrency:
  group: "gh-aw-copilot-reviewer-worker"  # compile-time hardcoded

Pro: workers retain a serialization slot but in their own namespace, distinct from caller's. Different workers across the repo still each get a slot, matching the original intent.
Con: still serializes all invocations of the same worker, even when they're for different inputs. In the fan-out case above, all 5 worker invocations for 5 different issues still serialize (slow, but correct — no cancellations).
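
A small sketch of how the three group keys partition the five invocations from the reproduction. The run IDs are made-up placeholders, and `worker_id` stands in for whatever compile-time identifier the compiler would emit; neither is taken from real output.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Invocation:
    caller_workflow: str  # what github.workflow resolves to inside the worker
    worker_id: str        # hypothetical compile-time ID of the worker file
    run_id: int           # placeholder for the github.run_id of the hosting run


def current_group(inv: Invocation) -> str:
    return f"gh-aw-copilot-{inv.caller_workflow}"


def option_a_group(inv: Invocation) -> str:
    return f"gh-aw-copilot-{inv.caller_workflow}-{inv.run_id}"


def option_b_group(inv: Invocation) -> str:
    return f"gh-aw-copilot-{inv.worker_id}"


# Five orchestrator runs, each invoking the worker once (run IDs are invented).
invocations = [
    Invocation("Issue Triage Agent", "reviewer-worker", 1000 + i)
    for i in range(5)
]


def group_sizes(render) -> Counter:
    """Count how many invocations land in each rendered group key."""
    return Counter(render(inv) for inv in invocations)


for name, render in [
    ("current", current_group),
    ("option A", option_a_group),
    ("option B", option_b_group),
]:
    print(name, dict(group_sizes(render)))
```

The current key and Option B both put all five invocations in one group, but Option B's group is named after the worker rather than the caller. Option A gives each invocation its own group.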

Workarounds (for anyone hitting this today)

  1. Don't fire more than 2 worker invocations in close succession. With queue size 1, two concurrent runs work (one runs, one queues); three is where the cancellations start. If using safe-outputs.call-workflow from an issues.opened-triggered orchestrator on a slow-traffic repo, you may never hit this in practice.

  2. Space dispatches by ~90s (the time it takes one worker's agent step to release the slot). This worked for me when retrying 3 cancelled runs sequentially.

  3. Accept workflow-level serialization in the worker — don't override the default workflow-level concurrency on the worker. Workers will serialize end-to-end (one at a time, ~5 min each), but no cancellations. Slow but correct.
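
The first two workarounds can be sanity-checked with a timed variant of the one-running, one-pending model. This is a simplified sketch: the 90-second agent duration is the rough figure quoted in the list, and the arrival times and `wN` names are illustrative.

```python
def simulate_timed(arrivals, agent_seconds):
    """Simulate one concurrency group; arrivals is a list of (name, time) pairs."""
    running = None   # (name, end_time) of the run holding the slot
    pending = None   # name of the single pending run, if any
    completed, cancelled = [], []

    def advance(now):
        nonlocal running, pending
        # Finish every run whose agent job has ended by `now`.
        while running is not None and running[1] <= now:
            completed.append(running[0])
            if pending is not None:
                running = (pending, running[1] + agent_seconds)
                pending = None
            else:
                running = None

    for name, t in sorted(arrivals, key=lambda a: a[1]):
        advance(t)
        if running is None:
            running = (name, t + agent_seconds)
        else:
            if pending is not None:
                cancelled.append(pending)  # displaced by the newer request
            pending = name

    advance(float("inf"))  # drain whatever is left
    return completed, cancelled


burst = [(f"w{i}", i) for i in range(5)]        # five arrivals within seconds
spaced = [(f"w{i}", i * 90) for i in range(5)]  # one arrival every 90 s
pair = [("w0", 0), ("w1", 1)]                   # only two close arrivals

for label, arrivals in [("burst", burst), ("spaced", spaced), ("pair", pair)]:
    done, lost = simulate_timed(arrivals, agent_seconds=90)
    print(label, "completed:", done, "cancelled:", lost)
```

In this model the burst loses three runs, while the spaced schedule and the two-run case lose none.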

Related observation (separate footgun)

When manually adding a concurrency: block to a workflow_call worker, do NOT use ${{ github.workflow }} in the group key — it resolves to the caller, not the worker, and will deadlock with the caller's own concurrency group when they share an issue number suffix. Error:

Canceling since a deadlock was detected for concurrency group: gh-aw-<caller-name>-<issue#> between a top level workflow and <caller-job-name>

The fix is to hardcode the worker's identity:

concurrency:
  group: gh-aw-my-worker-${{ github.run_id }}
  cancel-in-progress: false
