safe-outputs.call-workflow workers cancel each other under parallel fan-out due to gh-aw-copilot-${{ github.workflow }} job concurrency group
TL;DR
The compiler injects a job-level concurrency block on the agent job of every workflow:
concurrency:
  group: "gh-aw-copilot-${{ github.workflow }}"
In a reusable worker invoked via workflow_call (e.g. by a safe-outputs.call-workflow orchestrator), ${{ github.workflow }} resolves to the caller's workflow name, not the worker's. As a result, every parallel invocation of the worker from the same caller evaluates the concurrency group to the same string. GitHub Actions only keeps one waiting request per group; the rest are cancelled with:
Canceling since a higher priority waiting request for gh-aw-copilot-<caller-workflow-name> exists
This makes safe-outputs.call-workflow lossy as a fan-out primitive whenever ≥3 worker invocations land close together in time. In a recent test of mine, 3 out of 5 worker runs were cancelled.
Environment
- gh aw version: v0.74.8
- Engine: copilot (default)
- Runner: ubuntu-latest on github.com
- Behavior reproduces consistently
Reproduction
Two workflows, an orchestrator that fans out to a worker via safe-outputs.call-workflow.
orchestrator.md (caller)
---
on:
  issues:
    types: [opened]
permissions:
  contents: read
  issues: read
safe-outputs:
  call-workflow:
    workflows: [reviewer-worker]
    max: 1
---
# Orchestrator
Always call the `reviewer_worker` MCP tool with:
- `issue_number`: `${{ github.event.issue.number }}`
- `proposed_comment`: a one-line placeholder string
reviewer-worker.md (worker)
---
on:
  workflow_call:
    inputs:
      issue_number:
        type: string
        required: true
      proposed_comment:
        type: string
        required: true
      payload:
        type: string
        required: false
permissions:
  contents: read
  issues: read
safe-outputs:
  add-comment:
    max: 1
    target: "*"
---
# Worker
Echo `${{ inputs.proposed_comment }}` to a comment on issue `${{ inputs.issue_number }}` via `add_comment`. No reasoning.
Trigger
Open 5 GitHub issues in quick succession (e.g. via a small bash loop with gh issue create). The orchestrator fires 5 times, each invoking the worker once.
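For anyone who wants to script the trigger, here is a minimal sketch that builds and optionally runs the gh issue create calls. The issue titles and body text are placeholders I made up, and it assumes gh is already authenticated against the repo in the current directory. It only prints the commands unless --run is passed.

```python
import subprocess
import sys
from typing import List


def build_issue_commands(count: int) -> List[List[str]]:
    """Return one `gh issue create` argv per issue (titles are placeholders)."""
    return [
        [
            "gh", "issue", "create",
            "--title", f"concurrency repro {i}",
            "--body", "Placeholder body for the fan-out reproduction.",
        ]
        for i in range(1, count + 1)
    ]


def fire(count: int, dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them back to back."""
    for argv in build_issue_commands(count):
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Only touches GitHub when --run is passed explicitly.
    fire(5, dry_run="--run" not in sys.argv)
```

Running the commands back to back with no delay is what puts the five orchestrator runs close enough together to collide.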
Expected
5 worker runs complete; 5 comments posted (one per issue). Worker invocations are independent — different issue_number inputs, distinct github.run_ids, distinct logical work units.
Actual (~60% loss rate)
Several worker runs are cancelled mid-flight with the message above. Workflow run graph shows:
call-{worker} / pre_activation ✅
call-{worker} / activation ✅
call-{worker} / agent ❌ cancelled
call-{worker} / detection — skipped
call-{worker} / safe_outputs — skipped (so no comment posted)
call-{worker} / conclusion ✅
In one of my real runs, 3 of 5 parallel worker invocations got cancelled this way. The job-level annotation on the cancelled jobs reads:
Canceling since a higher priority waiting request for gh-aw-copilot-Issue Triage Agent for dotnet/aspnetcore exists
Note that Issue Triage Agent for dotnet/aspnetcore is the orchestrator's name, not the worker's.
Root cause
Inside the generated reviewer-worker.lock.yml, the agent job carries:
agent:
  ...
  concurrency:
    group: "gh-aw-copilot-${{ github.workflow }}"
  ...
GitHub's documented behavior for ${{ github.workflow }} inside a reusable workflow run: it resolves to the caller's workflow name. So when caller Issue Triage Agent invokes worker Reviewer Worker five times in parallel, all five worker runs evaluate the group as:
gh-aw-copilot-Issue Triage Agent
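Even without cancel-in-progress: true, GitHub Actions still cancels queued runs: each concurrency group holds at most one running and one pending entry, and a newly queued run replaces (and cancels) the one already pending. With one run in progress, cancellations start as soon as a third arrives, producing the observed pattern.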
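The one-running, one-pending behavior can be sketched as a toy model. This is a simplification for illustration, not GitHub's implementation: the class and function names are mine, and it assumes all five worker jobs queue while the first is still running.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ConcurrencyGroup:
    """Toy model of one group: a running slot and a single pending slot."""
    running: Optional[str] = None
    pending: Optional[str] = None
    cancelled: List[str] = field(default_factory=list)
    completed: List[str] = field(default_factory=list)

    def enqueue(self, run: str) -> None:
        if self.running is None:
            self.running = run
            return
        # A newer request displaces whatever is already pending.
        if self.pending is not None:
            self.cancelled.append(self.pending)
        self.pending = run

    def finish_running(self) -> None:
        if self.running is not None:
            self.completed.append(self.running)
        # Promote the pending run (if any) into the running slot.
        self.running, self.pending = self.pending, None


def simulate_burst(runs: List[str]) -> ConcurrencyGroup:
    """Queue every run before the first one finishes, then drain the group."""
    group = ConcurrencyGroup()
    for run in runs:
        group.enqueue(run)
    while group.running is not None:
        group.finish_running()
    return group


result = simulate_burst([f"worker-{i}" for i in range(1, 6)])
print("completed:", result.completed)  # ['worker-1', 'worker-5']
print("cancelled:", result.cancelled)  # ['worker-2', 'worker-3', 'worker-4']
```

Under that assumption the model cancels 3 of 5 runs, which matches the count I saw. With only two runs in the burst, nothing is cancelled.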
I verified the workflow-level concurrency at the top of the worker behaves the same way — but that one was easy to override from frontmatter:
concurrency:
  group: gh-aw-reviewer-worker-${{ inputs.issue_number || github.run_id }}
  cancel-in-progress: false
The agent-job-level gh-aw-copilot-... block, however, is injected by the compiler and, as far as I can tell from the v0.74.8 codegen, cannot be overridden from frontmatter.
Proposed fix
The agent-job concurrency group should be unique per invocation when the workflow is a workflow_call target. Two options:
Option A — caller-side identity (simplest)
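Use github.run_id instead of, or as a suffix to, github.workflow. Inside a workflow_call worker the github context belongs to the caller's run, so run_id is unique per worker invocation only as long as each orchestrator run calls the worker once (as with max: 1 in the reproduction above):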
concurrency:
  group: "gh-aw-copilot-${{ github.workflow }}-${{ github.run_id }}"
Pro: no behavior change for non-workflow_call workflows (run_id is still unique).
Con: each run gets its own slot, so the gh-aw-copilot serialization at the agent level is effectively disabled. If that serialization exists to throttle Copilot CLI rate-limit pressure, this option removes the throttle.
Option B — Worker-aware identity
Suffix with a compile-time string identifying the workflow file itself (e.g. the GH_AW_WORKFLOW_ID_SANITIZED env var that the compiler already emits):
concurrency:
  group: "gh-aw-copilot-reviewer-worker" # compile-time hardcoded
Pro: workers retain a serialization slot, but in their own namespace, distinct from the caller's. Different workers across the repo still each get a slot, matching the original intent.
Con: still serializes all invocations of the same worker, even when they're for different inputs. In the fan-out case above, all 5 worker invocations for 5 different issues still serialize (slow, but correct — no cancellations).
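To compare the schemes side by side, here is a small sketch that renders the group key each one would produce for five invocations and counts the distinct groups. The run IDs and the second worker name (other-worker) are made up for illustration. It assumes each orchestrator run calls the worker once, so every invocation sees a different run_id.

```python
from typing import Callable, List, Tuple

# (caller workflow name, worker id, run id); run ids are hypothetical values.
Invocation = Tuple[str, str, int]


def current_key(inv: Invocation) -> str:
    # github.workflow resolves to the caller's name inside the worker.
    return f"gh-aw-copilot-{inv[0]}"


def option_a_key(inv: Invocation) -> str:
    # Option A: suffix the group with github.run_id.
    return f"gh-aw-copilot-{inv[0]}-{inv[2]}"


def option_b_key(inv: Invocation) -> str:
    # Option B: compile-time worker identity instead of github.workflow.
    return f"gh-aw-copilot-{inv[1]}"


def distinct_groups(key: Callable[[Invocation], str],
                    invocations: List[Invocation]) -> int:
    """Number of distinct concurrency groups the invocations fall into."""
    return len({key(inv) for inv in invocations})


fan_out = [("Issue Triage Agent", "reviewer-worker", 1000 + i) for i in range(5)]
for name, key in [("current", current_key),
                  ("option A", option_a_key),
                  ("option B", option_b_key)]:
    print(name, distinct_groups(key, fan_out))
# current 1, option A 5, option B 1

# Two different workers called from the same orchestrator:
mixed = [("Issue Triage Agent", "reviewer-worker", 2000),
         ("Issue Triage Agent", "other-worker", 2001)]
print(distinct_groups(current_key, mixed), distinct_groups(option_b_key, mixed))
# 1 2
```

Option B still puts all five fan-out invocations in one group, so they compete for the same running and pending slots. The difference from today is that unrelated workers called from the same orchestrator no longer share that group.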
Workarounds (for anyone hitting this today)
- Don't fire more than 2 worker invocations in close succession. With a queue size of 1, two concurrent runs work (one runs, one queues); three is where the cancellations start. If you use safe-outputs.call-workflow from an issues.opened-triggered orchestrator on a low-traffic repo, you may never hit this in practice.
- Space dispatches by ~90s (the time it takes one worker's agent step to release the slot). This worked for me when retrying the 3 cancelled runs sequentially.
- Accept workflow-level serialization in the worker: don't override the default workflow-level concurrency on the worker. Workers will serialize end-to-end (one at a time, ~5 min each), but there are no cancellations. Slow but correct.
Related observation (separate footgun)
When manually adding a concurrency: block to a workflow_call worker, do NOT use ${{ github.workflow }} in the group key — it resolves to the caller, not the worker, and will deadlock with the caller's own concurrency group when they share an issue number suffix. Error:
Canceling since a deadlock was detected for concurrency group: gh-aw-<caller-name>-<issue#> between a top level workflow and <caller-job-name>
The fix is to hardcode the worker's identity:
concurrency:
  group: gh-aw-my-worker-${{ github.run_id }}
  cancel-in-progress: false
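A quick way to spot this pattern in existing workers is a text scan. The sketch below is a heuristic of my own, not a gh-aw feature: it flags group: lines that reference ${{ github.workflow }} in any file containing a workflow_call: key. It does not parse YAML, so it will miss other spellings of the trigger, such as on: [workflow_call].

```python
import re
from typing import List

# Matches a concurrency `group:` line that interpolates github.workflow.
GROUP_LINE = re.compile(r"^\s*group:.*\$\{\{\s*github\.workflow\s*\}\}")
CALL_TRIGGER = re.compile(r"^\s*workflow_call:", re.MULTILINE)


def risky_group_lines(workflow_text: str) -> List[int]:
    """Return 1-based line numbers of group keys built from github.workflow.

    Only files that declare a `workflow_call:` trigger are considered.
    """
    if not CALL_TRIGGER.search(workflow_text):
        return []
    return [
        number
        for number, line in enumerate(workflow_text.splitlines(), start=1)
        if GROUP_LINE.search(line)
    ]


sample = """\
on:
  workflow_call:
jobs:
  agent:
    concurrency:
      group: "gh-aw-copilot-${{ github.workflow }}"
"""
print(risky_group_lines(sample))  # [6]
```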