Fix dependency cycle in GroupBasedPartitioner._can_merge_partitions#18397
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18397
Note: Links to docs will display an error until the docs builds have been completed. ❗ 1 Active SEVsThere are 1 currently active SEVs. If your PR is affected, please view them below: ❌ 3 New Failures, 2 Unrelated FailuresAs of commit 50a2447 with merge base 520566c ( NEW FAILURES - The following jobs have failed:
BROKEN TRUNK - The following jobs failed but were present on the merge base:👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
|
Thank you for your pull request and welcome to our community. Action RequiredIn order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you. ProcessIn order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks! |
|
@pytorchbot label "release notes: xnnpack" |
|
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks! |
bc3bec6 to
73425f1
Compare
|
@JacobSzwejbka bringing this to your attention |
73425f1 to
4971367
Compare
…erge_partitions The previous implementation only checked downstream dependencies from p2, assuming p2 always precedes p1 in topological order. This assumption breaks when partition groups contain nodes spanning wide topological ranges — for example, when dynamic quantization inserts a shared `choose_qparams` node consumed by GEMM ops in different sequential transformer decoder layers. In that case the two groups *interleave* in topological order, and the single-direction check misses cycles flowing from p1 through external nodes back into p2. This change: 1. Collects external users from *both* p1 and p2 (combined_nodes) instead of only p2. 2. Adds a `validate_partition` safety net that performs a direct BFS on the live graph edges, catching any cycle the pre-computed `_DependencyViewer` might miss. Fixes `AssertionError: Invalid partition, found dependency cycles` when lowering cross-attention transformer decoders (e.g. DETR) with `XnnpackDynamicallyQuantizedPartitioner`.
4971367 to
cbc1d31
Compare
- Replace torch.ao.quantization imports with torchao.quantization.pt2e.quantize_pt2e and executorch.backends.xnnpack.quantizer.xnnpack_quantizer - Fix UFMT formatting issues in forward() signatures and torch.bmm/norm2 calls
b695175 to
10ea9c1
Compare
|
@Hyungkeun-Park-Nota Thanks for the contribution. Can you fix the lints? Once CI is green, I can go ahead and merge. |
|
@GregoryComer has imported this pull request. If you are a Meta employee, you can view this in D100871526. |
|
I just re-triggered CI. @Hyungkeun-Park-Nota can you make one other small change? We'll need to add a few deps to the buck build. We should be able to merge after that. In exir/backend/test/BUCK, can you add these dependencies to the test_group_partitioner target?
Thanks! |
|
Mypy lint is pre-existing |
Summary
GroupBasedPartitioner._can_merge_partitions()only checks downstream dependencies fromp2, assumingp2is always topologically beforep1. This assumption fails when partition groups contain nodes spanning wide topological ranges, causing false-negative cycle detection and ultimatelyAssertionError: Invalid partition, found dependency cyclesatfuse_as_graphmoduletime.Root cause: Dynamic quantization inserts
choose_qparamsnodes that are shared across multiple GEMM ops consuming the same activation. The DSJ (Disjoint Set Join) phase merges these ops into groups whose nodes interleave in topological order. When_merge_partitionslater tries to combine two such interleaved groups, the single-direction check (p2 only) misses the cycle path from p1 → external → p2.Fix:
p1andp2(combined_nodes) instead of onlyp2.validate_partition()safety net (BFS on live graph edges) to catch any cycle the pre-computed_DependencyViewermight miss.Reproduction
The issue is triggered when lowering a cross-attention transformer decoder with
XnnpackDynamicallyQuantizedPartitioner. Multiple decoder layers share the same encoder output for K/V projections, causingchoose_qparamssharing → DSJ group interleaving → false merge → dependency cycle.Minimal reproduction (no external dependencies beyond PyTorch + ExecuTorch):
Test plan
test_interleaved_groups_no_false_mergeinexir/backend/test/test_group_partitioner.pytest_group_partitioner.pytests passcc @JacobSzwejbka @angelayi @GregoryComer @digantdesai @cbilgin