fix: hard-bake workflow ID in workflow_call concurrency group to prevent deadlock #43078

Merged
pelikhan merged 3 commits into main from copilot/call-workflow-worker-lockfile-concurrency-deadlock
Jul 3, 2026

Conversation

Copilot AI commented Jul 3, 2026

When a gateway workflow invokes a worker via call_workflow, GitHub Actions evaluates github.workflow in the reusable workflow to the caller's name, not the callee's. Both lockfiles ended up with the same concurrency group (e.g. gh-aw-Test Copilot Call Workflow), causing GitHub to detect a deadlock and cancel the worker before any job records were created, marking the run as failed. Every call_workflow test had failed (a 100% failure rate) since the feature shipped.
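
The collision can be reproduced on paper with a small sketch. It assumes a toy `resolveGroup` function (hypothetical, not part of the compiler) that stands in for the runner's expression expansion, and it reuses the group name quoted above:

```go
package main

import (
	"fmt"
	"strings"
)

// resolveGroup mimics, in a very simplified way, how the github.workflow
// expression is expanded inside a concurrency group template.
// callerName stands in for the value supplied at runtime.
func resolveGroup(template, callerName string) string {
	return strings.ReplaceAll(template, "${{ github.workflow }}", callerName)
}

func main() {
	const caller = "Test Copilot Call Workflow"
	const template = "gh-aw-${{ github.workflow }}"

	// Gateway and worker lockfiles carry the same template, and inside the
	// reusable workflow github.workflow still names the caller.
	gateway := resolveGroup(template, caller)
	worker := resolveGroup(template, caller)

	fmt.Println(gateway)
	fmt.Println(worker)
	fmt.Println("collision:", gateway == worker)
}
```

Because both sides expand the same template with the same caller name, the two strings cannot differ, which is the condition described above.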

Changes

  • pkg/workflow/concurrency.go, buildConcurrencyGroupKeys: When the workflow declares a workflow_call trigger and has a known compile-time WorkflowID, substitute the hard-baked filename-based ID instead of ${{ github.workflow }}. Falls back to ${{ github.workflow }} when WorkflowID is absent (e.g. programmatic compilation without a path).

    // Before — both gateway and worker collapse to the same runtime value:
    keys := []string{"gh-aw", "${{ github.workflow }}"}
    
    // After — worker uses its own stable, compile-time ID:
    workflowKey := "${{ github.workflow }}"
    if hasWorkflowCallTrigger(workflowData.On) && workflowData.WorkflowID != "" {
        workflowKey = workflowData.WorkflowID  // e.g. "call-test-copilot-call-worker"
    }
    keys := []string{"gh-aw", workflowKey}

    This mirrors the identical fix already present in GenerateJobConcurrencyConfig for job-level concurrency.

  • pkg/workflow/concurrency_test.go: Two new cases in TestGenerateConcurrencyConfig — one asserting the hard-baked ID when WorkflowID is set, one asserting the github.workflow fallback when it is not.

  • All 258 .lock.yml lockfiles recompiled to pick up the updated concurrency groups for any existing workflow_call workers.


Generated by 👨‍🍳 PR Sous Chef · 6.07 AIC · ⌖ 7.49 AIC · ⊞ 6.4K ·

…ent deadlock

When a gateway workflow calls a worker via `call_workflow`, GitHub Actions
resolves `github.workflow` to the *calling* workflow's name in reusable
workflows. This caused both the gateway lockfile and the worker lockfile to
emit the same concurrency group (`gh-aw-${{ github.workflow }}`), resulting
in a deadlock that GitHub Actions detected and cancelled.

Fix `buildConcurrencyGroupKeys` to mirror the existing logic in
`GenerateJobConcurrencyConfig`: when the workflow has a `workflow_call`
trigger and a known compile-time `WorkflowID`, substitute the hard-baked
filename-based ID instead of the `${{ github.workflow }}` expression.

Recompile all 258 workflow lockfiles so existing worker lockfiles pick up
the updated concurrency group immediately.

Closes #42846

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix concurrency deadlock in call_workflow with reusable workflows" to "fix: hard-bake workflow ID in workflow_call concurrency group to prevent deadlock" Jul 3, 2026
Copilot AI requested a review from pelikhan July 3, 2026 03:31
@pelikhan pelikhan marked this pull request as ready for review July 3, 2026 03:33
Copilot AI review requested due to automatic review settings July 3, 2026 03:33

github-actions Bot commented Jul 3, 2026

Design Decision Gate 🏗️ completed the design decision gate check.

No ADR enforcement needed: PR #43078 does not have the 'implementation' label and has only 39 new lines of code in business logic directories (threshold: 100).

github-actions Bot commented Jul 3, 2026

Test Quality Sentinel completed test quality analysis.

github-actions Bot commented Jul 3, 2026

🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅

github-actions Bot commented Jul 3, 2026

PR Code Quality Reviewer completed the code quality review.

Copilot AI left a comment

Pull request overview

Fixes a GitHub Actions workflow_call deadlock caused by github.workflow resolving to the caller’s workflow name inside reusable workflows, making gateway + worker share the same concurrency group.

Changes:

  • Update workflow-level concurrency key generation to prefer the compile-time WorkflowID for workflow_call workflows (falling back to ${{ github.workflow }} when WorkflowID is unavailable).
  • Add unit tests covering both the WorkflowID hard-bake behavior and the fallback behavior.
  • Recompile generated workflow lockfiles so workflow_call workers use the new concurrency group value.

Summary per file:

| File | Description |
| --- | --- |
| pkg/workflow/concurrency.go | Uses compile-time WorkflowID for workflow_call workflow-level concurrency groups to avoid gateway/worker collisions. |
| pkg/workflow/concurrency_test.go | Adds regression tests for workflow_call concurrency group behavior with and without WorkflowID. |
| .github/workflows/smoke-workflow-call.lock.yml | Updates generated concurrency group to the hard-baked workflow ID. |
| .github/workflows/smoke-workflow-call-with-inputs.lock.yml | Updates generated concurrency group to the hard-baked workflow ID. |

Review details

  • Files reviewed: 4/4 changed files
  • Comments generated: 0
  • Review effort level: Low

@github-actions github-actions Bot mentioned this pull request Jul 3, 2026

@github-actions github-actions Bot left a comment

Review: fix concurrency deadlock in workflow_call workers

The fix is correct, well-scoped, and consistent with the existing job-level concurrency fix in GenerateJobConcurrencyConfig.

Root cause analysis ✅

GitHub Actions resolves github.workflow to the caller's name when a reusable workflow is invoked via workflow_call. Both gateway and worker therefore produced the same concurrency group string, and with cancel-in-progress semantics GitHub detects a deadlock and cancels the child run before any job records are created.

Implementation ✅

  • The fix in buildConcurrencyGroupKeys (lines 236–238) mirrors the identical fix already present in GenerateJobConcurrencyConfig (lines 74–76), keeping both levels consistent.
  • Fallback to ${{ github.workflow }} when WorkflowID == "" preserves backward compatibility for programmatic compilation paths that don't carry a file path.
  • The two affected lock files (smoke-workflow-call and smoke-workflow-call-with-inputs) are the only workflow_call workers in the repository, which is consistent with the diff showing only 4 changed files despite 258 total lock files.

Tests ✅

Two new test cases cover both branches of the condition — hard-baked ID when WorkflowID is set, and ${{ github.workflow }} fallback when it is not.

Minor observations (non-blocking)

  • Mixed-trigger workflows (workflow_call + others): the hard-baked ID is applied even when the workflow might be triggered via workflow_dispatch. This is safe because in those cases ${{ github.workflow }} would resolve to the workflow's own name anyway — the same value as WorkflowID.
  • cancel-in-progress is not enabled for pure workflow_call workers (gated on isPullRequestWorkflow), so these groups function as serialization queues rather than cancellation groups. This is the intended behavior for worker workflows.

🧵 Reviewed using Impeccable skills by Impeccable Skills Reviewer · 27.5 AIC · ⌖ 6.04 AIC · ⊞ 4.9K

github-actions Bot commented Jul 3, 2026

🧪 Test Quality Sentinel Report

Test Quality Score: 90/100 — Excellent

Analyzed 2 test(s): 2 design, 0 implementation, 0 violation(s).

📊 Metrics (2 tests)

| Metric | Value |
| --- | --- |
| Analyzed | 2 (Go: 2, JS: 0) |
| ✅ Design | 2 (100%) |
| ⚠️ Implementation | 0 (0%) |
| Edge/error coverage | 2 (100%) |
| Duplicate clusters | 0 |
| Inflation | YES (30 test lines / 9 prod lines = 3.3:1) |
| 🚨 Violations | 0 |

| Test | File | Classification | Issues |
| --- | --- | --- | --- |
| workflow_call worker uses hard-baked workflow ID to avoid caller-name deadlock | pkg/workflow/concurrency_test.go:354 | ✅ design_test / behavioral_contract / high_value | Inflation (3.3:1, justified by YAML format) |
| workflow_call without WorkflowID falls back to github.workflow expression | pkg/workflow/concurrency_test.go:372 | ✅ design_test / behavioral_contract / high_value | Inflation (shared with above) |

⚠️ Flagged Tests (1: inflation note)

Both new table rows (pkg/workflow/concurrency_test.go) — Test inflation ratio is 3.3:1 (30 added test lines vs 9 added production lines). This exceeds the 2:1 threshold, penalising 10 points. However, the ratio is structurally expected for table-driven YAML comparison tests — each row must define a full WorkflowData struct, expected multi-line YAML, and a description. The inflation is not indicative of low-quality or padded tests.
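
The ratio quoted above is plain division. As a minimal sketch, with `inflationRatio` being a hypothetical helper name and the 2:1 threshold taken from the report:

```go
package main

import "fmt"

// inflationRatio returns added test lines per added production line.
func inflationRatio(testLines, prodLines int) float64 {
	return float64(testLines) / float64(prodLines)
}

func main() {
	const threshold = 2.0 // the 2:1 limit quoted in the report

	ratio := inflationRatio(30, 9)
	fmt.Printf("ratio %.1f:1, exceeds threshold: %t\n", ratio, ratio > threshold)
}
```

This only checks the arithmetic; how the sentinel turns the ratio into a point penalty is not specified here.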

Verdict

Passed. 0% implementation tests (threshold: 30%). No coding-guideline violations. Both new tests directly cover the behavioral invariant introduced by the fix: workflow_call workers hard-bake the compile-time workflow ID to prevent concurrency group collisions with their calling gateway.

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

  • awmgmcpg

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "awmgmcpg"

See Network Configuration for more information.

🧪 Test quality analysis by Test Quality Sentinel · 29.1 AIC · ⌖ 12 AIC · ⊞ 6.8K ·
Comment /review to run again

@github-actions github-actions Bot left a comment

✅ Test Quality Sentinel: 90/100. 0% implementation tests (threshold: 30%). Both new table-driven tests are high-value behavioral contracts directly covering the concurrency deadlock fix.

@github-actions github-actions Bot left a comment

Skills-Based Review 🧠

Applied /diagnosing-bugs and /codebase-design — this is a well-targeted fix for a gnarly GitHub Actions behaviour.

📋 Key Themes & Highlights

Key Themes

  • Pattern duplication: the hasWorkflowCallTrigger && WorkflowID != "" guard is now present at two independent sites in concurrency.go. Extracting a small helper (e.g. workflowConcurrencyID) would make the logic single-source-of-truth and easier to evolve.
  • Test completeness: the two new test cases cover the primary happy path and the empty-ID fallback well. A mixed-trigger case (workflow_call + workflow_dispatch) would close the last gap.

Positive Highlights

  • ✅ Root cause properly identified and explained — the PR description traces the exact GitHub Actions evaluation rule at fault.
  • ✅ Fix mirrors the identical pattern already present in GenerateJobConcurrencyConfig, so the codebase stays consistent.
  • ✅ Regression tests included in the same commit — both cases are meaningful specifications, not just coverage padding.
  • ✅ Comment in the changed lines is clear and explains the why, not just the what.

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · 55.4 AIC · ⌖ 6.79 AIC · ⊞ 6.6K
Comment /matt to run again

Comment thread pkg/workflow/concurrency.go Outdated
// share the same concurrency group, causing a deadlock. Hard-bake the compile-time
// workflow ID (filename without extension) to give each worker a unique group.
workflowKey := "${{ github.workflow }}"
if hasWorkflowCallTrigger(workflowData.On) && workflowData.WorkflowID != "" {

[/codebase-design] The guard hasWorkflowCallTrigger(workflowData.On) && workflowData.WorkflowID != "" is now duplicated verbatim at line 74 (GenerateJobConcurrencyConfig) and here. If the rule evolves, both sites must be updated in sync.

💡 Suggested refactor

Extract a small helper that returns the correct workflow-level key:

// workflowConcurrencyID returns the stable key for a workflow.
// For workflow_call workers, github.workflow resolves to the caller
// name at runtime, so we return the compile-time WorkflowID when available.
func workflowConcurrencyID(data *WorkflowData) string {
    if hasWorkflowCallTrigger(data.On) && data.WorkflowID != "" {
        return data.WorkflowID
    }
    return "${{ github.workflow }}"
}

Both call sites become a single readable expression and the reasoning lives in one place.

@copilot please address this.

Addressed in 91ceef3: extracted workflowCallBaseKey(...) and reused it from both GenerateJobConcurrencyConfig and buildConcurrencyGroupKeys so workflow_call key selection is single-source-of-truth.

Comment thread pkg/workflow/concurrency_test.go Outdated
},
isAliasTrigger: false,
expected: `concurrency:
  group: "gh-aw-call-test-copilot-call-worker"`,

[/tdd] The test covers the happy path and fallback, but is missing a case where workflow_call is combined with other triggers (e.g., workflow_call + workflow_dispatch). GitHub Actions allows mixed triggers, and it is worth confirming the fix still kicks in under that scenario.

💡 Suggested additional test case
{
    name: "workflow_call mixed with workflow_dispatch still uses hard-baked ID",
    workflowData: &WorkflowData{
        On: `on:
  workflow_call:
  workflow_dispatch:`,
        WorkflowID:  "mixed-call-worker",
        Concurrency: "",
    },
    isAliasTrigger: false,
    expected: `concurrency:
  group: "gh-aw-mixed-call-worker"`,
    description: "workflow_call + workflow_dispatch still prefers compile-time ID",
},

This guards against future trigger-detection changes inadvertently flipping the logic on mixed workflows.

@copilot please address this.

Addressed in 91ceef3: added a mixed-trigger regression in TestGenerateConcurrencyConfig (workflow_call + workflow_dispatch) to verify workflow_call detection still applies.
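
To make the mixed-trigger expectation concrete, here is a table-driven sketch of trigger detection. `declaresWorkflowCall` is a hypothetical line-scanning stand-in; the repository's `hasWorkflowCallTrigger` is not shown in this thread and may work differently:

```go
package main

import (
	"fmt"
	"strings"
)

// declaresWorkflowCall reports whether a rendered "on:" section lists
// workflow_call as a trigger key. It scans lines instead of parsing YAML,
// which is a simplification for this sketch.
func declaresWorkflowCall(on string) bool {
	for _, line := range strings.Split(on, "\n") {
		if strings.HasPrefix(strings.TrimSpace(line), "workflow_call:") {
			return true
		}
	}
	return false
}

func main() {
	cases := []struct {
		name string
		on   string
	}{
		{"pure workflow_call", "on:\n  workflow_call:"},
		{"mixed with workflow_dispatch", "on:\n  workflow_call:\n  workflow_dispatch:"},
		{"workflow_dispatch only", "on:\n  workflow_dispatch:"},
	}
	for _, c := range cases {
		fmt.Printf("%s: %t\n", c.name, declaresWorkflowCall(c.on))
	}
}
```

The first two cases report true and the third false, so a worker that also accepts manual dispatch would still take the workflow_call branch.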

pelikhan commented Jul 3, 2026

@copilot run pr-finisher skill

@github-actions github-actions Bot left a comment

The deadlock fix is correct in intent but introduces a new concurrency starvation risk for fan-out callers. Three issues need to be addressed before merge.

Findings summary

High — Static group starves parallel worker invocations (line 239)

The generated group "gh-aw-<WorkflowID>" is fully static. GitHub Actions allows at most 1 running + 1 queued run per group (no cancel-in-progress, no queue: max at workflow level). A third concurrent call to the same worker drops the second pending run silently. For fan-out patterns this is a reliability regression. GenerateJobConcurrencyConfig already handles the analogous job-level case with queue: max — the workflow-level group needs similar treatment or an explicit run-isolating token.

Medium — Fallback to ${{ github.workflow }} is silent (line 236)

When WorkflowID == "", the code silently reverts to the original deadlock-prone value. No log line, no compile warning. Any programmatic compile without a path will reproduce the 100%-failure bug with no diagnostic signal.

Low — Test suite misses the mixed workflow_call + workflow_dispatch trigger case (concurrency_test.go)

The new tests cover pure workflow_call only. workflow_call + workflow_dispatch workers (a common real-world pattern) are untested.

🔎 Code quality review by PR Code Quality Reviewer · 187.2 AIC · ⌖ 5.39 AIC · ⊞ 5.4K
Comment /review to run again

if hasWorkflowCallTrigger(workflowData.On) && workflowData.WorkflowID != "" {
    workflowKey = workflowData.WorkflowID
}
keys := []string{"gh-aw", workflowKey}

Static concurrency group will serialise (or starve) parallel invocations of the same reusable worker.

The group "gh-aw-<WorkflowID>" is now fully static. GitHub Actions allows at most 1 running + 1 pending run per concurrency group when cancel-in-progress is absent. A third concurrent invocation of the same worker silently cancels the second pending one — dropped work with no error surfaced to the caller.

call_workflow is designed for fan-out: multiple jobs in the same (or different) gateway workflows can invoke the same worker simultaneously. Under the new static group those runs queue behind each other and the third drops the second.
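
The queueing behaviour described here can be modelled with a toy simulation. The `group` type below is purely illustrative (it is not a GitHub API) and encodes the rule of one running slot plus one pending slot, where a newcomer replaces whatever is already pending:

```go
package main

import "fmt"

// group models one concurrency group without cancel-in-progress:
// one running slot and one pending slot.
type group struct {
	running   string
	pending   string
	cancelled []string
}

// enqueue adds a run. If the pending slot is already taken, the older
// pending run is cancelled and replaced by the newcomer.
func (g *group) enqueue(run string) {
	switch {
	case g.running == "":
		g.running = run
	case g.pending == "":
		g.pending = run
	default:
		g.cancelled = append(g.cancelled, g.pending)
		g.pending = run
	}
}

func main() {
	g := &group{}
	for _, run := range []string{"call-1", "call-2", "call-3"} {
		g.enqueue(run)
	}
	fmt.Println("running:", g.running)
	fmt.Println("pending:", g.pending)
	fmt.Println("cancelled:", g.cancelled)
}
```

With three calls into one static group, the first runs, the third waits, and the second ends up in the cancelled list.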

💡 Suggested fix

Include a caller-unique token so parallel invocations can coexist:

if hasWorkflowCallTrigger(workflowData.On) && workflowData.WorkflowID != "" {
    // github.run_id is unique per invocation; prevents queue collisions for fan-out callers
    workflowKey = fmt.Sprintf("%s-${{ github.run_id }}", workflowData.WorkflowID)
}

Alternatively, if sequential serialisation is intentional for all workflow_call workers, document it explicitly and note that the third concurrent caller will lose its run silently.

Note: GenerateJobConcurrencyConfig already handles this at job level by emitting queue: max (line 85). The workflow-level group created here has no equivalent protection.

Addressed in 91ceef3: workflow-level workflow_call concurrency now appends ${{ github.run_id }} (<workflow-id>-${{ github.run_id }}) to prevent fan-out invocations from contending on a single static group.
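
A rough sketch of the resulting group shape follows. `workerGroup` and `resolve` are hypothetical helpers, and the run IDs are made up; the sketch only covers invocations whose run IDs differ:

```go
package main

import (
	"fmt"
	"strings"
)

// workerGroup builds the group template described in the reply:
// gh-aw-<workflow-id>-${{ github.run_id }}.
func workerGroup(workflowID string) string {
	return strings.Join([]string{"gh-aw", workflowID + "-${{ github.run_id }}"}, "-")
}

// resolve substitutes a concrete run ID for the expression, standing in for
// what happens at runtime.
func resolve(template, runID string) string {
	return strings.ReplaceAll(template, "${{ github.run_id }}", runID)
}

func main() {
	tmpl := workerGroup("call-test-copilot-call-worker")
	fmt.Println(tmpl)

	// Two invocations with different (hypothetical) run IDs.
	fmt.Println(resolve(tmpl, "1001"))
	fmt.Println(resolve(tmpl, "1002"))
}
```

Invocations that resolve to the same run ID would still land in the same group, so this shape separates runs rather than individual calls.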

Comment thread pkg/workflow/concurrency.go Outdated
// share the same concurrency group, causing a deadlock. Hard-bake the compile-time
// workflow ID (filename without extension) to give each worker a unique group.
workflowKey := "${{ github.workflow }}"
if hasWorkflowCallTrigger(workflowData.On) && workflowData.WorkflowID != "" {

Silent regression: fallback to ${{ github.workflow }} for a workflow_call worker re-introduces the original deadlock with no warning emitted.

When WorkflowID == "" the code falls back to the broken pre-fix value. The PR description acknowledges this fallback exists ("programmatic compilation without a path"), but nothing in the logs or output surfaces it. A compile invoked without a path will silently produce the same deadlock-prone group that caused the 100% failure rate described in the PR.

💡 Suggested fix

At minimum, emit a warning when the fallback is taken so the misconfiguration is visible:

workflowKey := "${{ github.workflow }}"
if hasWorkflowCallTrigger(workflowData.On) {
    if workflowData.WorkflowID != "" {
        workflowKey = workflowData.WorkflowID
    } else {
        concurrencyLog.Print("WARNING: workflow_call worker has no WorkflowID; " +
            "falling back to github.workflow which will cause deadlock " +
            "if gateway and worker share the same caller name")
    }
}

Addressed in 91ceef3: when workflow_call is detected and WorkflowID is empty, we now emit a warning log from workflowCallBaseKey before falling back to ${{ github.workflow }}.

group: "gh-aw-${{ github.workflow }}-${{ github.event.issue.number || github.run_id }}"`,
description: "Rendered slash_command YAML (issue_comment + workflow_dispatch) uses issue number via isIssueWorkflow",
},
{

Missing test coverage for the mixed workflow_call + workflow_dispatch trigger case.

Both smoke-workflow-call.lock.yml and smoke-workflow-call-with-inputs.lock.yml declare only workflow_call in their on: section (verified in the lock files), so the fix is correct for those specific workers. However, there is no test asserting the behaviour when On contains both workflow_call and workflow_dispatch (a common real-world pattern for workers that can also be manually triggered). If a future worker uses that combination, the fix applies — but no test validates the group produced or the absence of cancel-in-progress.

💡 Suggested addition
{
    name: "workflow_call + workflow_dispatch worker still uses hard-baked ID",
    workflowData: &WorkflowData{
        On: `on:
  workflow_call:
  workflow_dispatch:`,
        WorkflowID:  "my-worker",
        Concurrency: "",
    },
    isAliasTrigger: false,
    expected: `concurrency:
  group: "gh-aw-my-worker"`,
    description: "Mixed triggers: workflow_call detection still wins and hard-bakes the ID",
},

Addressed in 91ceef3: added explicit mixed-trigger test coverage (workflow_call + workflow_dispatch) and asserted the generated group keeps workflow_call behavior.

…rage

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI commented Jul 3, 2026

@copilot run pr-finisher skill

Ran pr-finisher flow and addressed in-scope review feedback in 91ceef3. Local validation passed (make fmt, make lint, make test-unit, make test, make recompile, make agent-report-progress); CI on this new head needs a maintainer re-trigger.

@gh-aw-bot

@copilot please run the pr-finisher skill, address unresolved review comments, and rerun checks once the branch is up to date.

Generated by 👨‍🍳 PR Sous Chef · 6.07 AIC · ⌖ 7.49 AIC · ⊞ 6.4K ·

Copilot AI commented Jul 3, 2026

@copilot please run the pr-finisher skill, address unresolved review comments, and rerun checks once the branch is up to date.

...

Ran the pr-finisher pass on PR #43078. All in-scope unresolved review feedback is already addressed in 91ceef3, and local validation is green (make fmt, make lint, make test-unit, make test, make recompile, make agent-report-progress).

I also inspected the latest PR workflow runs for this branch: they are action_required with no jobs started (stale/needs human rerun context for Copilot agent pushes), so CI on the current head remains unverified until a maintainer re-triggers checks.

Copilot AI requested a review from gh-aw-bot July 3, 2026 04:33
@pelikhan pelikhan merged commit d65e6d7 into main Jul 3, 2026
29 checks passed
@pelikhan pelikhan deleted the copilot/call-workflow-worker-lockfile-concurrency-deadlock branch July 3, 2026 05:12

Development

Successfully merging this pull request may close these issues.

call_workflow: worker lockfile concurrency deadlock — github.workflow resolves to calling workflow name in reusable workflows

4 participants