[cortex_m] Fix linear weight layout: transpose in AOT pass, align meta/ref impl #16782
Helpful links: see artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/16782.
Note: Links to docs will display an error until the docs builds have been completed. This comment was automatically generated by Dr. CI and updates every 15 minutes.
Pull request overview
Fixes shape inference for the Cortex-M quantized_linear fake/meta function.
Changes:
- Updates `quantized_linear_meta` to compute the output shape using a different weight dimension (`weights.shape[1]` after the transpose).
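As a hedged sketch (not the actual ExecuTorch code; the function name and signature are illustrative), the corrected shape rule looks like this once weights are stored in CMSIS-NN [in_features, out_features] layout:

```python
# Illustrative shape rule for a meta/fake function after the AOT pass.
# Assumption: weights are already in CMSIS-NN layout
# [in_features, out_features], so the output feature count comes from
# weight_shape[1], not weight_shape[0].
def quantized_linear_out_shape(input_shape, weight_shape):
    in_features, out_features = weight_shape
    assert input_shape[-1] == in_features, "inner dimensions must match"
    return list(input_shape[:-1]) + [out_features]
```

With MV2's classifier weights stored as [1280, 1000], this yields the expected [1, 1000] output instead of [1, 1280].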
Copilot raised valid points; converting to draft, need to dig into this issue further.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
```python
kernel_sum_tensor = self._compute_kernel_sum(
    weights_tensor, bias_tensor, -input_zp, -weight_zp
)

# Transpose weights from PyTorch format [out_features, in_features]
# to CMSIS-NN format [in_features, out_features]
weights_transposed = weights_tensor.T.contiguous()
```
Copilot AI, Jan 23, 2026:
_compute_kernel_sum() already transposes weights internally (weights.T), and this block transposes weights_tensor again to materialize weights_transposed. This duplicates the layout conversion logic in two places and does an extra transpose at AOT time.
Consider refactoring so the transpose is computed once (e.g., compute weights_transposed first and pass it into _compute_kernel_sum, or have _compute_kernel_sum accept already-transposed weights).
```diff
  # Transpose weights once from PyTorch format [out_features, in_features]
  # to CMSIS-NN format [in_features, out_features]
  weights_transposed = weights_tensor.T.contiguous()
  # Pass already-transposed weights to kernel_sum computation
  kernel_sum_tensor = self._compute_kernel_sum(
-     weights_tensor, bias_tensor, -input_zp, -weight_zp
+     weights_transposed, bias_tensor, -input_zp, -weight_zp
  )
```
Copilot AI, Jan 23, 2026:
This pass now changes the expected weight layout for cortex_m.quantized_linear to [in_features, out_features]. Please add a regression test that inspects the post-pass graph/parameters (not just numerical output) to ensure weights are actually stored/transmitted in the transposed layout; otherwise the Python reference path can still pass while the CMSIS-NN runtime path regresses.
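A minimal sketch of such a layout check (the helper name and shapes are assumptions, not the real test):

```python
import torch

# Hypothetical post-pass layout assertion: inspect the stored weight
# tensor itself rather than only comparing numerical outputs, so a
# CMSIS-NN runtime regression cannot hide behind a passing reference path.
def assert_cmsis_layout(weight, in_features, out_features):
    assert tuple(weight.shape) == (in_features, out_features), (
        f"expected [in, out] = ({in_features}, {out_features}), "
        f"got {tuple(weight.shape)}"
    )

w_pytorch = torch.zeros(1000, 1280)        # PyTorch layout [out, in]
w_after_pass = w_pytorch.T.contiguous()    # what the pass should produce
assert_cmsis_layout(w_after_pass, 1280, 1000)
```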
Good point, will do in a follow-up PR.
[cortex_m] Fix linear weight layout: transpose in AOT pass, align meta/ref impl

Summary: The linear path in ConvertToCortexMPass was not transposing weights, unlike conv2d, causing an inconsistency with the C++ runtime, which expects weights in [in_features, out_features] format per CMSIS-NN.

Changes:
- convert_to_cortex_m_pass.py: Transpose linear weights [out, in] -> [in, out]
- operators.py: Update meta to use weights.shape[1] for the output dimension
- operators.py: Remove .T from the ref impl (weights are pre-transposed by the pass)
- operators.py: Transpose once and pass the result to _compute_kernel_sum

Fixes MV2 output shape mismatch: [1, 1280] -> [1, 1000]

Test plan: MV2 on Corstone-300/E8 with CMSIS-NN kernels.

This fix ensures the AOT-compiled .pte file has correctly shaped output tensors for any model using quantized_linear (MV2, ResNet, MV3, etc.).
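The coupling between the pass-side transpose and the reference-impl change can be sanity-checked in isolation. This is an illustrative sketch with MV2-like shapes, not the pass itself:

```python
import torch

# MV2-like classifier shapes (illustrative, not taken from the pass).
out_features, in_features = 1000, 1280
# Small integer values keep the float matmul exact, so both orderings
# of the computation must agree bit-for-bit.
w = torch.randint(-8, 8, (out_features, in_features)).float()
x = torch.randint(-8, 8, (1, in_features)).float()

# Before: weights kept in PyTorch layout [out, in]; ref impl applies .T.
y_before = x @ w.T

# After: the AOT pass pre-transposes to CMSIS-NN layout [in, out];
# the ref impl multiplies directly, with no .T.
w_transposed = w.T.contiguous()
y_after = x @ w_transposed

assert torch.equal(y_before, y_after)
assert list(y_after.shape) == [1, 1000]   # the corrected MV2 output shape
```

If only one of the two changes lands, either the reference path double-transposes or the output shape reverts to [1, 1280], which is exactly the mismatch the PR fixes.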
```python
    kernel_sum_tensor,
)

weights_transposed_node = create_constant_placeholder(
```
Delete old weights here if they have no users left?
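In torch.fx terms, dropping a now-unused node could look like the sketch below. This is illustrative: the real pass works on an ExportedProgram via create_constant_placeholder, so the exact cleanup call may differ.

```python
import torch
from torch import fx

class M(torch.nn.Module):
    def forward(self, x):
        unused = x + 1   # stands in for the orphaned old-weights node
        return x + 2

gm = fx.symbolic_trace(M())
n_before = len(list(gm.graph.nodes))

# eliminate_dead_code() erases side-effect-free nodes with no remaining
# users, e.g. a get_attr on weights whose consumers were all rewired to
# the transposed copy.
gm.graph.eliminate_dead_code()
gm.recompile()
n_after = len(list(gm.graph.nodes))

assert n_after < n_before
assert gm(torch.tensor([1.0])).item() == 3.0
```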
Good point, will thoroughly test and provide a fix in follow-up PR #16866.
Looks reasonable and nicely commented, which makes it easy to understand! Any ideas why this was not caught in the unit tests?
From a quick glance, no tests actually seem to cover this particular use case. Will add test cases in the follow-up.
Hey, CI is failing after this update: https://hud.pytorch.org/hud/pytorch/executorch/main/1?per_page=50&name_filter=cortex&useRegexFilter=true — can you have another look at this?
Yes, I am looking into it and will provide a forward fix.
#16910: Revert "…lign meta/ref impl (#16782)"

This reverts commit 06f10b9.

Co-authored-by: Github Executorch <github_executorch@arm.com>
Test plan: Run MV2 lowered to CMSIS-NN ops on the E8 Alif board.