Fix dependency cycle in GroupBasedPartitioner._can_merge_partitions#18397

Merged
GregoryComer merged 4 commits into pytorch:main from Hyungkeun-Park-Nota:fix/group-partitioner-cycle-detection
Apr 16, 2026

Conversation

@Hyungkeun-Park-Nota
Contributor

@Hyungkeun-Park-Nota Hyungkeun-Park-Nota commented Mar 23, 2026

Summary

GroupBasedPartitioner._can_merge_partitions() checks downstream dependencies only from p2, on the assumption that p2 always precedes p1 in topological order. That assumption fails when partition groups contain nodes spanning wide topological ranges, producing false-negative cycle detection and, ultimately, AssertionError: Invalid partition, found dependency cycles when fuse_as_graphmodule runs.

Root cause: Dynamic quantization inserts choose_qparams nodes that are shared across multiple GEMM ops consuming the same activation. The DSJ (Disjoint Set Join) phase merges these ops into groups whose nodes interleave in topological order. When _merge_partitions later tries to combine two such interleaved groups, the single-direction check (p2 only) misses the cycle path from p1 → external → p2.

Fix:

  1. Check external users from both p1 and p2 (combined_nodes) instead of only p2.
  2. Add a validate_partition() safety net (BFS on live graph edges) to catch any cycle the pre-computed _DependencyViewer might miss.
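The direction-sensitivity of the old check can be illustrated on a toy dependency graph. The sketch below is not the actual ExecuTorch implementation; `can_merge` and the plain-dict `users` representation are simplifications for illustration:

```python
# Illustrative sketch, NOT the real _can_merge_partitions: a merge of p1
# and p2 is unsafe if any path leaves the combined partition and re-enters
# it, since fusing would then create a dependency cycle.
def can_merge(p1, p2, users):
    """users maps each node name to the nodes consuming its output."""
    combined = p1 | p2
    # Seed a search with every *external* user of the combined partition,
    # i.e. users of nodes from BOTH p1 and p2, not only p2.
    frontier = [u for n in combined for u in users.get(n, ()) if u not in combined]
    seen = set(frontier)
    while frontier:
        node = frontier.pop()
        if node in combined:
            return False  # an external path re-enters the partition: cycle
        for user in users.get(node, ()):
            if user not in seen:
                seen.add(user)
                frontier.append(user)
    return True

# Chain a -> b -> c, with a and c grouped together and b left outside:
# walking downstream from {"c"} alone finds nothing, but walking from both
# partitions reveals the path a -> b -> c back into the merged group.
users = {"a": ["b"], "b": ["c"]}
assert can_merge({"a"}, {"c"}, users) is False  # merging would trap b in a cycle
assert can_merge({"a"}, {"b"}, users) is True   # safe: nothing re-enters
```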

Reproduction

The issue is triggered when lowering a cross-attention transformer decoder with XnnpackDynamicallyQuantizedPartitioner. Multiple decoder layers share the same encoder output for K/V projections, causing choose_qparams sharing → DSJ group interleaving → false merge → dependency cycle.

Minimal reproduction (no external dependencies beyond PyTorch + ExecuTorch):

import math, torch, torch.nn as nn

class DecoderLayer(nn.Module):
    def __init__(self, d=256):
        super().__init__()
        self.q_proj = nn.Linear(d, d, bias=False)
        self.k_proj = nn.Linear(d, d, bias=False)
        self.v_proj = nn.Linear(d, d, bias=False)
        self.out_proj = nn.Linear(d, d, bias=False)
        self.ffn1 = nn.Linear(d, d * 2, bias=False)
        self.ffn2 = nn.Linear(d * 2, d, bias=False)
        self.norm1 = nn.LayerNorm(d)
        self.norm2 = nn.LayerNorm(d)

    def forward(self, x, mem):
        q, k, v = self.q_proj(x), self.k_proj(mem), self.v_proj(mem)
        attn = torch.softmax(torch.bmm(q, k.transpose(-2, -1)) / math.sqrt(q.size(-1)), dim=-1)
        x = self.norm1(x + self.out_proj(torch.bmm(attn, v)))
        return self.norm2(x + self.ffn2(torch.relu(self.ffn1(x))))

class TwoLayerDecoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer0 = DecoderLayer()
        self.layer1 = DecoderLayer()

    def forward(self, query, memory):
        return self.layer1(self.layer0(query, memory), memory)

# Export → dynamic quant → lower
from torch.ao.quantization.quantizer.xnnpack_quantizer import XNNPACKQuantizer, get_symmetric_quantization_config
from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackDynamicallyQuantizedPartitioner
from executorch.exir import to_edge_transform_and_lower

model = TwoLayerDecoder().eval()
q, m = torch.randn(1, 10, 256), torch.randn(1, 20, 256)

exported = torch.export.export(model, (q, m), strict=False)
quantizer = XNNPACKQuantizer().set_global(get_symmetric_quantization_config(is_per_channel=True, is_dynamic=True))
prepared = prepare_pt2e(exported.module(), quantizer)
with torch.no_grad():
    prepared(q, m)  # calibrate observers
converted = convert_pt2e(prepared)
re_exported = torch.export.export(converted, (q, m), strict=False)

# Before fix: AssertionError: Invalid partition, found dependency cycles
to_edge_transform_and_lower(re_exported, partitioner=[XnnpackDynamicallyQuantizedPartitioner()])

Test plan

  • Added test_interleaved_groups_no_false_merge in exir/backend/test/test_group_partitioner.py
  • Verified the test fails without the fix and passes with the fix
  • Existing test_group_partitioner.py tests pass

cc @JacobSzwejbka @angelayi @GregoryComer @digantdesai @cbilgin

@pytorch-bot

pytorch-bot Bot commented Mar 23, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18397

Note: Links to docs will display an error until the docs builds have been completed.

❌ 3 New Failures, 2 Unrelated Failures

As of commit 50a2447 with merge base 520566c:


👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla

meta-cla Bot commented Mar 23, 2026

Hi @Hyungkeun-Park-Nota!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

@Hyungkeun-Park-Nota
Contributor Author

@pytorchbot label "release notes: xnnpack"

@pytorch-bot Bot added the release notes: xnnpack label on Mar 23, 2026
@meta-cla

meta-cla Bot commented Mar 23, 2026

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

@meta-cla Bot added the CLA Signed label on Mar 23, 2026
@nil-is-all added the module: exir label on Mar 23, 2026
@Hyungkeun-Park-Nota force-pushed the fix/group-partitioner-cycle-detection branch from bc3bec6 to 73425f1 on March 24, 2026 03:40
@nil-is-all
Contributor

@JacobSzwejbka bringing this to your attention

@Hyungkeun-Park-Nota force-pushed the fix/group-partitioner-cycle-detection branch from 73425f1 to 4971367 on March 28, 2026 07:20
…erge_partitions

The previous implementation only checked downstream dependencies from
p2, assuming p2 always precedes p1 in topological order.  This
assumption breaks when partition groups contain nodes spanning wide
topological ranges — for example, when dynamic quantization inserts a
shared `choose_qparams` node consumed by GEMM ops in different
sequential transformer decoder layers.  In that case the two groups
*interleave* in topological order, and the single-direction check
misses cycles flowing from p1 through external nodes back into p2.

This change:
1. Collects external users from *both* p1 and p2 (combined_nodes)
   instead of only p2.
2. Adds a `validate_partition` safety net that performs a direct BFS
   on the live graph edges, catching any cycle the pre-computed
   `_DependencyViewer` might miss.

Fixes `AssertionError: Invalid partition, found dependency cycles`
when lowering cross-attention transformer decoders (e.g. DETR) with
`XnnpackDynamicallyQuantizedPartitioner`.
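The safety net described in point 2 amounts to a breadth-first reachability check over the graph's live user edges. A minimal sketch follows; the name `validate_partition` mirrors the commit message, but the plain-dict graph and signature here are hypothetical, unlike the real helper, which walks fx graph nodes:

```python
from collections import deque

# Hypothetical sketch of a validate_partition-style safety net: re-check a
# tentatively merged partition against live edges, so a stale pre-computed
# dependency view (like _DependencyViewer) cannot hide a cycle.
def validate_partition(partition, users):
    """Return False if some path exits `partition` and later re-enters it."""
    # Start from every edge that leaves the partition.
    queue = deque(
        user
        for node in partition
        for user in users.get(node, ())
        if user not in partition
    )
    visited = set(queue)
    while queue:
        node = queue.popleft()
        if node in partition:
            return False  # the outside world can reach back in: invalid merge
        for user in users.get(node, ()):
            if user not in visited:
                visited.add(user)
                queue.append(user)
    return True

# Run as a last line of defense after the fast check accepts a merge:
users = {"a": ["b"], "b": ["c"]}
assert validate_partition({"a", "c"}, users) is False  # a -> b -> c re-enters
assert validate_partition({"a", "b"}, users) is True
```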
@Hyungkeun-Park-Nota force-pushed the fix/group-partitioner-cycle-detection branch from 4971367 to cbc1d31 on March 30, 2026 06:59
@nil-is-all added the module: xnnpack label on Apr 1, 2026
- Replace torch.ao.quantization imports with torchao.quantization.pt2e.quantize_pt2e
  and executorch.backends.xnnpack.quantizer.xnnpack_quantizer
- Fix UFMT formatting issues in forward() signatures and torch.bmm/norm2 calls
@Hyungkeun-Park-Nota force-pushed the fix/group-partitioner-cycle-detection branch from b695175 to 10ea9c1 on April 14, 2026 05:47
@GregoryComer
Member

@Hyungkeun-Park-Nota Thanks for the contribution. Can you fix the lints? Once CI is green, I can go ahead and merge.

@meta-codesync
Contributor

meta-codesync Bot commented Apr 14, 2026

@GregoryComer has imported this pull request. If you are a Meta employee, you can view this in D100871526.

@GregoryComer
Member

I just re-triggered CI. @Hyungkeun-Park-Nota can you make one other small change? We'll need to add a few deps to the buck build. We should be able to merge after that.

In exir/backend/test/BUCK, can you add these dependencies to the test_group_partitioner target?

  • //executorch/backends/xnnpack/partition:xnnpack_partitioner
  • //executorch/backends/xnnpack/quantizer:xnnpack_quantizer
  • //executorch/exir:lib
  • //pytorch/ao:torchao

Thanks!

@GregoryComer
Member

The mypy lint failure is pre-existing.

@GregoryComer GregoryComer merged commit 3998693 into pytorch:main Apr 16, 2026
159 of 164 checks passed