
Fix nightly operator test failures across nxp_rt600 and DLA_V130 backends #18951

Merged
meta-codesync[bot] merged 1 commit into pytorch:main from ethansfng:export-D100873709 on Apr 17, 2026

Conversation

@ethansfng
Contributor

Summary:
Fix 6 categories of nightly operator test failures by addressing FACTO test generation constraints and kernel bugs:

FACTO constraint fixes (facto_util.py):

  • div.Tensor_mode: Remove int64 from dtype constraints — nxp_rt600 lacks native int64 support, causing off-by-1 rounding errors via _to_copy fallback
  • permute_copy.default: Restrict dtypes to float32/int32 — int8/uint8 cause ISS crashes since xa_nn_transpose doesn't handle sub-word integer types
  • pow.Tensor_Scalar: Add Value.Ge(0) constraint — a negative base raised to a fractional exponent yields NaN per IEEE 754, a special case the DSP backends don't implement
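
To see why the pow constraint matters, here is a minimal, self-contained C++ check (illustrative only, not code from this PR): per IEEE 754 / C99 pow semantics, a finite negative base with a non-integer exponent is a domain error, so the reference implementation returns NaN while a DSP path that skips this special case returns something else and the comparison fails.

```cpp
#include <cmath>
#include <cstdio>

int main() {
  // IEEE 754 / C99 pow semantics: a finite negative base with a
  // non-integer exponent is a domain error and yields NaN.
  float r = std::pow(-2.0f, 0.5f);
  std::printf("pow(-2, 0.5) = %f, isnan = %d\n", r, (int)std::isnan(r));
  // Constraining generated inputs to base >= 0 (Value.Ge(0)) keeps the
  // test cases inside the domain both implementations agree on.
  return 0;
}
```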

Kernel bug fixes (op_add.cpp, op_sub.cpp):

  • Fix || vs && logic error in broadcast type dispatch that caused int32 data to be reinterpreted as float32, producing garbage output on DLA_V130
  • Add missing broadcast dispatch cases for Int+Int, Long+Long, Int+Long, Long+Int
  • Change static_cast<float> to static_cast<double> to avoid precision loss for large int32 values
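
A simplified reconstruction of the dispatch bug (names are illustrative, not the actual op_add.cpp code): with `||`, a mixed int32/float32 pair satisfies the "float" check, so the int32 tensor's raw buffer is reinterpreted as float32.

```cpp
#include <iostream>

// Simplified reconstruction of the dispatch bug (illustrative only).
enum class ScalarType { Int, Long, Float };

// Buggy: a mixed Int/Float pair passes this check, so the int32
// tensor's buffer gets reinterpreted as float32 -> garbage output.
bool both_float_buggy(ScalarType a, ScalarType b) {
  return a == ScalarType::Float || b == ScalarType::Float;
}

// Fixed: take the all-float fast path only when both inputs are float;
// Int+Int, Long+Long, Int+Long, Long+Int get their own dispatch cases.
bool both_float_fixed(ScalarType a, ScalarType b) {
  return a == ScalarType::Float && b == ScalarType::Float;
}

int main() {
  std::cout << both_float_buggy(ScalarType::Int, ScalarType::Float)   // 1 (wrong)
            << both_float_fixed(ScalarType::Int, ScalarType::Float);  // 0
}
```

The related change from static_cast<float> to static_cast<double> matters because float's 24-bit significand cannot represent every int32 exactly (integers above 2^24 may round), while double's 53-bit significand can hold any int32 value.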

HiFi kernel guard fixes:

  • op_where.cpp: Disable optimized nnlib path when condition tensor needs broadcasting — xa_nn_elm_select_broadcast_4D only computes strides for inp1/inp2, not the condition tensor
  • op_permute_copy.cpp: Disable xa_nn_transpose_32_32 for Float — the nnlib function crashes the ISS for certain tensor shapes; fall back to correct generic implementation
  • op_softmax.cpp: Disable optimized nnlib path when softmax dim is not the last dimension — the permuted path allocates temp memory exceeding the budget on resource-constrained targets
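
All three guards follow the same pattern: test the optimized path's preconditions up front and fall back to the portable generic kernel when they do not hold. A schematic sketch with placeholder predicate names (these helpers are illustrative, not the actual ExecuTorch or nnlib API):

```cpp
// Schematic of the guard pattern (placeholder names, not the real
// ExecuTorch/nnlib signatures).

// op_where: xa_nn_elm_select_broadcast_4D only computes broadcast
// strides for inp1/inp2, so a broadcasting condition must go generic.
bool can_use_nnlib_where(bool cond_needs_broadcast) {
  return !cond_needs_broadcast;
}

// op_permute_copy: xa_nn_transpose_32_32 crashes the ISS for certain
// float tensor shapes, so Float always takes the generic path.
bool can_use_nnlib_transpose(bool is_float) {
  return !is_float;
}

// op_softmax: the permuted nnlib path allocates temporary memory that
// can exceed the budget on resource-constrained targets, so only take
// it when the softmax dim is already the last dimension.
bool can_use_nnlib_softmax(int dim, int last_dim) {
  return dim == last_dim;
}
```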

Differential Revision: D100873709

@pytorch-bot

pytorch-bot Bot commented Apr 16, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18951

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

✅ You can merge normally! (3 Unrelated Failures)

As of commit 450cdbf with merge base a489707:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

BROKEN TRUNK - The following jobs failed, but the failures were already present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla bot added the CLA Signed label (managed by the Facebook bot; authors must sign the CLA before a PR can be reviewed) on Apr 16, 2026
@meta-codesync
Contributor

meta-codesync Bot commented Apr 16, 2026

@ethansfng has exported this pull request. If you are a Meta employee, you can view the originating Diff in D100873709.

@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

meta-codesync bot merged commit 4618b80 into pytorch:main on Apr 17, 2026
161 of 173 checks passed

Labels

CLA Signed, fb-exported, meta-exported
