
@YufengShi-dudu (Collaborator) commented Sep 26, 2025

  • ConvertMmToBmmPass converts an MM node to a BMM node. It unsqueezes the rank-2 inputs to rank 3, squeezes the rank-3 BMM output back to rank 2, and inserts Q-DQ pairs before and after the BMM node when necessary.

  • After ConvertMmToBmmPass, the graph looks like this:

```
  x -> q   -> dq   -> unsqueeze -> q_2 -> dq_2 ->
                                                 \
                                                bmm -> q_4 -> dq_4
                                                 /
  y -> q_1 -> dq_1 -> unsqueeze -> q_3 -> dq_3 ->
```

  • Therefore, if the original matmul was 2D, the BMM already has DQ nodes on its inputs and a Q node on its output. If AnnotateDecomposedMatmulPass (Arm backend: Add support for single input matmul #10654) is still applied in this case, it produces illegal sequences such as x -> q -> unsqueeze -> q_2, where an already-quantized tensor is quantized again with no DQ in between.

  • Fix this by checking whether the BMM is already surrounded by DQ nodes on its inputs and Q nodes on its output before annotating it.
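
The check described above can be sketched as follows. This is a minimal, self-contained illustration, not the code from this PR: it assumes a simplified node model instead of `torch.fx.Node`, and the names `Node`, `make`, `bmm_already_quantized` and `requantizes`, as well as the string op names `"q"`, `"dq"`, `"unsqueeze"` and `"bmm"`, are hypothetical placeholders for the real graph nodes and quantize/dequantize targets.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical stand-in for a torch.fx.Node. The real pass works on FX nodes
# whose targets are quantize/dequantize, unsqueeze and bmm ops; here ops are strings.
@dataclass(eq=False)
class Node:
    name: str
    op: str  # "placeholder", "q", "dq", "unsqueeze" or "bmm"
    args: List["Node"] = field(default_factory=list, repr=False)
    users: List["Node"] = field(default_factory=list, repr=False)


def make(name: str, op: str, *args: Node) -> Node:
    """Create a node and register it as a user of each of its inputs."""
    node = Node(name, op, list(args))
    for arg in args:
        arg.users.append(node)
    return node


def bmm_already_quantized(bmm: Node) -> bool:
    """True if every BMM input is a DQ node and every BMM user is a Q node.

    In that case Q/DQ nodes are already in place around the BMM, so it
    should not be annotated again.
    """
    inputs_are_dq = bool(bmm.args) and all(a.op == "dq" for a in bmm.args)
    users_are_q = bool(bmm.users) and all(u.op == "q" for u in bmm.users)
    return inputs_are_dq and users_are_q


def requantizes(q: Node) -> bool:
    """Flag the illegal pattern q -> unsqueeze -> q_2.

    Walks upstream from a Q node through shape-only unsqueeze ops and reports
    whether it reaches another Q node without passing through a DQ node.
    """
    src = q.args[0]
    while src.op == "unsqueeze":
        src = src.args[0]
    return src.op == "q"


# Rebuild the graph produced by ConvertMmToBmmPass, following the diagram above.
x = make("x", "placeholder")
dq_x = make("dq", "dq", make("q", "q", x))
q_2 = make("q_2", "q", make("unsqueeze", "unsqueeze", dq_x))
dq_2 = make("dq_2", "dq", q_2)

y = make("y", "placeholder")
dq_y = make("dq_1", "dq", make("q_1", "q", y))
q_3 = make("q_3", "q", make("unsqueeze_1", "unsqueeze", dq_y))
dq_3 = make("dq_3", "dq", q_3)

bmm = make("bmm", "bmm", dq_2, dq_3)
q_4 = make("q_4", "q", bmm)
dq_4 = make("dq_4", "dq", q_4)

print(bmm_already_quantized(bmm))  # True: Q/DQ already in place, skip annotation
print(requantizes(q_2))            # False: q_2 is fed by unsqueeze <- dq

# The illegal sequence from the description: x -> q -> unsqueeze -> q_2.
bad_q = make("bad_q_2", "q", make("unsqueeze_2", "unsqueeze", make("q_5", "q", x)))
print(requantizes(bad_q))          # True
```

The guard requires both conditions (DQ on every input and Q on every user) so that a BMM that only partially matches the pattern is still annotated as before.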

Change-Id: I9949d59b0b4a96fa34a88b0734014567ea6f24cc

cc @digantdesai @freddan80 @per @zingo @oscarandersson8218

Signed-off-by: Yufeng Shi <yufeng.shi@arm.com>
Co-authored-by: Oscar Andersson <oscar.andersson@arm.com>
@YufengShi-dudu added the labels `partner: arm`, `ciflow/trunk` and `release notes: none` Sep 26, 2025

pytorch-bot bot commented Sep 26, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/14624

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 2 Unrelated Failures

As of commit bb2fbb9 with merge base dcc3978:

NEW FAILURE - The following job has failed:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla bot added the `CLA Signed` label Sep 26, 2025
@YufengShi-dudu (Collaborator, Author) commented:

If possible, I suggest we get this fix into the 1.0 release branch.

@digantdesai (Contributor) left a comment:

Thanks @YufengShi-dudu for the bug fix. Please mark the PR for cherry-picking into 1.0. Thanks again.

@zingo zingo merged commit 9a7fb42 into pytorch:main Oct 6, 2025
276 of 279 checks passed
@YufengShi-dudu (Collaborator, Author) commented:
@pytorchbot cherry-pick --onto release/1.0 -c regression

pytorchbot pushed a commit that referenced this pull request Oct 7, 2025
(cherry picked from commit 9a7fb42)
@pytorchbot (Collaborator) commented:

Cherry picking #14624

The cherry pick PR is at #14845 and it is recommended to link a regression cherry pick PR with an issue. The following tracker issues are updated:

