Prevent workflow_call worker fan-out cancellations by namespacing agent concurrency + enabling queued pending runs#35173
Conversation
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
…queue Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
workflow_call worker fan-out cancellations by namespacing agent concurrency + enabling queued pending runs
There was a problem hiding this comment.
Pull request overview
Fixes lossy safe-outputs.call-workflow fan-out by namespacing the compiler-generated agent job concurrency group per worker (using the worker's compile-time workflow ID instead of ${{ github.workflow }}, which resolves to the caller in reusable workflows) and by emitting queue: max so pending invocations are queued instead of dropped.
Changes:
- In
GenerateJobConcurrencyConfig, switch the default group togh-aw-{engine}-{workflow-id}whenon.workflow_callis present and a workflow ID is known. - Append
queue: maxto compiler-generated agent job concurrency, gated by the existinggroup-concurrency-queuefeature flag. - Update concurrency test expectations and add cases covering the workflow_call namespace and the queue-max opt-out.
Show a summary per file
| File | Description |
|---|---|
| pkg/workflow/concurrency.go | Adds workflow_call-aware group namespacing and queue: max emission (with feature-flag opt-out) for default agent job concurrency. |
| pkg/workflow/concurrency_test.go | Updates existing expected outputs to include queue: max and adds workflow_call namespace + queue-max opt-out test cases. |
| .github/workflows/daily-compiler-threat-spec-optimizer.lock.yml | Recompiled lock file (schedule cron change, heredoc hash changes, allowDomains update); agent concurrency block does not appear to have been regenerated to include queue: max. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 3/3 changed files
- Comments generated: 1
| - cron: "6 3 * * 1" | ||
| # Friendly format: weekly on monday around 03:00 (scattered) |
|
🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅ |
|
✅ PR Code Quality Reviewer completed the code quality review. |
|
✅ Design Decision Gate 🏗️ completed the design decision gate check. No ADR enforcement needed: PR #35173 does not have the 'implementation' label and has only 48 new lines of code in default business logic directories (threshold: 100). |
|
🧪 Test Quality Sentinel completed test quality analysis. |
There was a problem hiding this comment.
Skills-Based Review 🧠
Applied /diagnose and /tdd — clean fix with good coverage. Approving with two minor test-gap suggestions.
📋 Key Themes & Highlights
Root cause correctly addressed
The fix is precise: ${{ github.workflow }} is a late-binding GitHub Actions expression that resolves to the caller name inside a reusable worker, causing all fan-out workers to share one concurrency group. Using the compile-time WorkflowID instead removes the dependency on runtime context entirely.
queue: max default
Adding queue: max by default is the right call for fan-out workflows — GitHub's single-pending-run default silently drops queued runs under bursty load. The existing features.group-concurrency-queue: false opt-out is correctly plumbed and tested.
Test gaps (inline comments)
- No pure
workflow_call-only trigger test — the new case pairsworkflow_callwithschedule; aworkflow_call-only variant would nail the canonical scenario. - No empty
WorkflowID+workflow_calltest — the silent fallback to${{ github.workflow }}should be documented as a known edge case or eliminated with an earlier invariant check.
Positive highlights
- ✅ Surgical change — non-
workflow_callpaths are untouched - ✅ Feature-flag opt-out works symmetrically in both
concurrency.goandnotify_comment.go - ✅ Lock file diff is a clean downstream artefact (hash rotation + cron change + domain addition)
🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · sonnet46 1.5M
| workflowData: &WorkflowData{ | ||
| On: "on:\n workflow_call:\n inputs:\n item:\n required: false\n type: string\n schedule:\n - cron: '0 0 * * *'", | ||
| WorkflowID: "reviewer-worker", | ||
| EngineConfig: &EngineConfig{ |
There was a problem hiding this comment.
[/tdd] The new workflow_call test pairs the trigger with schedule, but there is no case for a pure workflow_call-only trigger. Because hasWorkflowCallTrigger is the gating condition, a workflow_call-only workflow is the canonical scenario — worth an explicit case.
💡 Suggested additional case
{
name: "workflow_call-only trigger uses workflow-id namespace",
workflowData: &WorkflowData{
On: "on:\n workflow_call:\n inputs:\n item:\n type: string",
WorkflowID: "reviewer-worker",
EngineConfig: &EngineConfig{ID: "copilot"},
},
expected: `concurrency:
group: "gh-aw-copilot-reviewer-worker"
queue: max`,
description: "pure workflow_call trigger must also use compile-time WorkflowID",
},| // For workflow_call workers, github.workflow resolves to the caller workflow | ||
| // name, so use the compile-time workflow ID to avoid cross-worker collisions. | ||
| groupValue := fmt.Sprintf("gh-aw-%s-${{ github.workflow }}", engineID) | ||
| if hasWorkflowCallTrigger(workflowData.On) && workflowData.WorkflowID != "" { |
There was a problem hiding this comment.
[/diagnose] The silent fallback when WorkflowID is empty deserves a test. If a workflow_call workflow somehow compiles with an empty WorkflowID, the group silently reverts to the buggy ${{ github.workflow }} form — the original problem returns with no warning. Consider adding a test case (empty WorkflowID + workflow_call → still gets ${{ github.workflow }} group) so the behaviour is documented and any future regression is caught.
💡 Suggested test case
{
name: "workflow_call with empty WorkflowID falls back to github.workflow group",
workflowData: &WorkflowData{
On: "on:\n workflow_call:\n",
WorkflowID: "", // empty — should fall back
EngineConfig: &EngineConfig{ID: "copilot"},
},
expected: `concurrency:
group: "gh-aw-copilot-${{ github.workflow }}"
queue: max`,
description: "empty WorkflowID falls back gracefully; documents the known limitation",
},This also serves as a living contract: if WorkflowID is guaranteed non-empty for workflow_call workflows, a require.NotEmpty assertion in the compilation path would be safer than a silent fallback.
🧪 Test Quality Sentinel Report✅ Test Quality Score: 80/100 — Excellent
📊 Metrics & Test Classification (2 tests analyzed)
Note on inflation: The 39 added test lines include ~20 lines updating expected values in existing table rows (adding Test Classification Details
Language SupportTests analyzed:
Verdict
📖 Understanding Test ClassificationsDesign Tests (High Value) verify what the system does:
Implementation Tests (Low Value) verify how the system does it:
Goal: Shift toward tests that describe the system's behavioral contract — the promises it makes to its users and collaborators. References:
|
There was a problem hiding this comment.
Non-blocking observations
The core fix (stable workflow_call group + queue: max for fan-out) is correct and well-motivated. Two non-blocking concerns worth addressing:
Findings
queue: max default scope is wider than the problem it fixes. The fan-out cancellation problem only affects workflow_call workers (and discriminated fan-outs), but queue: max is now emitted for all schedule/workflow_call/mixed workflows. Slow scheduled workflows with no fan-out will now accumulate an unbounded queue instead of keeping the GitHub default of 1 pending run. The group-concurrency-queue: false opt-out exists but isn't documented prominently. Commented inline on concurrency.go:85.
Test gap: workflow_call + ConcurrencyJobDiscriminator together. The two new code paths are tested independently but the combined case (worker with a discriminator) has no test. Commented on concurrency_test.go.
The stale lock file (missing queue: max in the concurrency block) was already flagged in a previous review comment — recompiling after the Go change is merged will resolve it.
🔎 Code quality review by PR Code Quality Reviewer · sonnet46 1.6M
| } | ||
| concurrencyConfig := fmt.Sprintf("concurrency:\n group: \"%s\"", groupValue) | ||
| if isGroupConcurrencyQueueEnabled(workflowData) { | ||
| concurrencyConfig += "\n queue: max" |
There was a problem hiding this comment.
queue: max as the default for all schedule/workflow_call workflows may cause unbounded run pile-up for long-running workflows. If a scheduled workflow takes longer than its cadence, or a workflow\_dispatch workflow is triggered in rapid bursts, runs queue indefinitely — blocking until they finish or are manually cancelled. The old GitHub default (1 pending run) was safer for non-fan-out cases.
💡 Detail and suggestion
The fix for fan-out worker cancellations is correct, but applying queue: max globally (rather than only when workflow_call is present or a discriminator is set) imposes a surprising default on ordinary scheduled workflows with no fan-out.
The group-concurrency-queue: false feature flag opt-out exists but isn't prominently documented. Worth noting in AGENTS.md or the authoring guide:
# For non-fan-out scheduled workflows, opt out of unbounded queuing:
features:
group-concurrency-queue: falseAlternatively, only emit queue: max when workflow_call is detected or when a discriminator is present (the actual fan-out cases), leaving plain schedule-only workflows at the default single-pending-run behaviour.
| }, | ||
| { | ||
| name: "Job discriminator ignored when push trigger (special trigger, no default group)", | ||
| workflowData: &WorkflowData{ |
There was a problem hiding this comment.
Missing test for workflow\_call + ConcurrencyJobDiscriminator combination. The new branch that substitutes WorkflowID for ${{ github.workflow }} and the discriminator-appending logic are tested independently but never together — a regression in their interaction would go undetected.
💡 Suggested test case to add
{
name: "workflow_call with discriminator uses workflow-id-based group",
workflowData: &WorkflowData{
On: "on:\n workflow_call:\n inputs:\n item_id:\n required: false\n type: string",
WorkflowID: "reviewer-worker",
EngineConfig: &EngineConfig{ID: "copilot"},
ConcurrencyJobDiscriminator: "${{ inputs.item_id }}",
},
expected: `concurrency:
group: "gh-aw-copilot-reviewer-worker-${{ inputs.item_id }}"
queue: max`,
description: "workflow_call with discriminator should append discriminator to workflow-id group",
},Also worth adding a case with workflow_call trigger and empty WorkflowID to cover the fallback to ${{ github.workflow }}.
|
@copilot review all comments and review comments |
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
All 157 lock files have been recompiled and now include |
…uts-concurrency-issue
safe-outputs.call-workflowfan-out was lossy because compiler-injected agent job concurrency usedgh-aw-{engine}-${{ github.workflow }}; in reusable workers,${{ github.workflow }}resolves to the caller, causing parallel worker invocations to collide in one group and cancel each other. This change makes default worker grouping stable per worker identity and retains queued invocations instead of dropping pending runs.Agent concurrency group: workflow-call aware
GenerateJobConcurrencyConfig, default generated group now uses workflow ID namespace for reusable workflows:gh-aw-{engine}-${{ github.workflow }}gh-aw-{engine}-{workflow-id}whenon.workflow_callis present.workflow_callbehavior remains unchanged.Generated job concurrency now preserves pending fan-out
queue: max(respecting existingfeatures.group-concurrency-queue: falseopt-out).Coverage updates
queue: maxwhere defaults are generated.workflow_callworkers using workflow-ID-based group namespace.queue: max.