Arm backend: Lower MXFP Linear to TOSA by martinlsm · Pull Request #19969 · pytorch/executorch

martinlsm · 2026-06-03T06:33:07Z

cc @digantdesai @freddan80 @per @zingo @oscarandersson8218 @mansnils @Sebastian-Larsson @robell @rascani

Add fake TOSA dialect support and serializer lowering for CAST_TO_BLOCK_SCALED. Co-authored-by: Sebastian Larsson <sebastian.larsson@arm.com> Signed-off-by: Martin Lindström <Martin.Lindstroem@arm.com> Change-Id: Ic7cdab5134f0fb9502f5985563f0662286ef5fb7

Signed-off-by: Martin Lindström <Martin.Lindstroem@arm.com> Co-authored-by: Sebastian Larsson <sebastian.larsson@arm.com> Change-Id: Iab2e1cf2ed21047bbc2a7a51604b9230fe2f2819

pytorch-bot · 2026-06-03T06:33:11Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19969

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 1 Unrelated Failure

As of commit 0603d37 with merge base f0d9991:

NEW FAILURE - The following job has failed:

trunk / unittest-release / macos / macos-job (gh)
export/tests/test_target_recipes.py::TestTargetRecipes::test_resnet50_model

BROKEN TRUNK - The following job failed but was also failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

trunk / test-arm-backend-ethos-u (test_smaller_stories_llama) / linux-job (gh) (trunk failure)
##[error]The operation was canceled.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

martinlsm · 2026-06-03T06:33:24Z

@pytorchbot label ciflow/trunk

martinlsm · 2026-06-03T06:33:32Z

@pytorchbot label "partner: arm"

martinlsm · 2026-06-03T06:33:44Z

@pytorchbot label "release notes: arm"

Copilot

Pull request overview

This PR adds end-to-end support in the Arm backend to lower MXFP Linear into explicit TOSA MXFP operators by introducing new TOSA dialect ops (cast-to-block-scaled + block-scaled matmul), wiring up serialization visitors, and inserting a rewrite pass in the Arm TOSA pipeline. It also expands dtype mapping/serialization to cover MXFP-related FP8 types and updates/extends the test suite accordingly.

Changes:

Add FP8 dtype mappings and broaden spec checks to recognize MXFP-enabled FP8 usage.
Introduce TOSA dialect ops + Arm operator visitors for CAST_TO_BLOCK_SCALED and MATMUL_T_BLOCK_SCALED.
Add RewriteMXFPLinearPass and update Arm tests/pipelines to validate the new lowering path.
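
For readers unfamiliar with block scaling, the idea behind a cast-to-block-scaled operation can be sketched in plain Python. This is an illustration only, not the code added by this PR: the function name `block_scale`, the block size of 32, and the element exponent bound `elem_emax=8` (the FP8 E4M3 value) are assumptions taken from the OCP Microscaling format rather than from this thread, and the sketch stops short of rounding the scaled values to FP8.

```python
import math


def block_scale(values, block_size=32, elem_emax=8):
    """Split `values` into blocks and give each block a shared power-of-two scale.

    Returns (exponents, scaled), where exponents[i] is the shared exponent of
    block i and scaled holds each value divided by 2**exponent of its block.
    Rounding to an FP8 element type is deliberately left out.
    """
    if len(values) % block_size != 0:
        raise ValueError("length must be a multiple of block_size")

    exponents, scaled = [], []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        max_abs = max(abs(v) for v in block)
        if max_abs == 0.0:
            # Assumption: an all-zero block gets the smallest E8M0 exponent.
            exp = -127
        else:
            # frexp gives max_abs = m * 2**e with 0.5 <= m < 1,
            # so floor(log2(max_abs)) == e - 1.
            exp = math.frexp(max_abs)[1] - 1 - elem_emax
            exp = max(-127, min(127, exp))  # clamp to the E8M0 range
        exponents.append(exp)
        scaled.extend(v / 2.0 ** exp for v in block)
    return exponents, scaled


exponents, scaled = block_scale([1.0] * 32 + [0.5] * 32)
print(exponents)  # one shared exponent per block of 32 values
```

Each block ends up with its largest magnitude moved near the top of the element type's range, which is why the scale is stored separately as a bare exponent.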

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 6 comments.


File	Description
backends/arm/tosa/mapping.py	Map additional FP8 dtypes and validate FP8 support against TOSA extensions.
backends/arm/tosa/dialect/ops/matmul_t_block_scaled.py	Define a fake TOSA op for MXFP block-scaled matmul with shape/dtype validation.
backends/arm/tosa/dialect/ops/cast_to_block_scaled.py	Define a fake TOSA op for MXFP block-scaled casting with shape/dtype validation.
backends/arm/tosa/dialect/__init__.py	Ensure newly added dialect ops modules are imported/registered.
backends/arm/test/targets.bzl	Rehome MXFP linear op test and add new TOSA dialect tests to Bazel targets.
backends/arm/test/passes/test_rewrite_mxfp_linear_pass.py	Add pass-level tests asserting custom MXFP linear op is rewritten into TOSA MXFP ops.
backends/arm/test/ops/mxfp/test_mxfp_linear.py	Refactor and expand MXFP linear tests using new pipelines; add channels-last case; add VGF xfails.
backends/arm/test/ops/mxfp/common.py	Add shared MXFP pipeline helpers/stages for TOSA/VGF test execution.
backends/arm/test/ops/mxfp/__init__.py	Add package marker for MXFP op tests.
backends/arm/test/misc/tosa_dialect/test_tosa_dialect_mxfp_linear.py	Add fake-op level tests for `MATMUL_T_BLOCK_SCALED`.
backends/arm/test/misc/tosa_dialect/test_tosa_dialect_cast_to_block_scaled.py	Add fake-op level tests for `CAST_TO_BLOCK_SCALED`.
backends/arm/process_node.py	Extend tensor serialization path to support `torch.float8_e8m0fnu` via `ml_dtypes`.
backends/arm/operators/op_tosa_matmul_t_block_scaled.py	Add serializer visitor for `MATMUL_T_BLOCK_SCALED`.
backends/arm/operators/op_tosa_cast_to_block_scaled.py	Add serializer visitor for multi-output `CAST_TO_BLOCK_SCALED`.
backends/arm/operators/__init__.py	Import/register the new operator visitor modules.
backends/arm/operator_support/tosa_supported_operators.py	Allow MX custom op partitioning under mxfp; adjust dtype disallow list for FP8 under mxfp.
backends/arm/_passes/rewrite_mxfp_linear.py	Implement the rewrite of `tosa_mxfp.linear` into explicit TOSA MXFP ops.
backends/arm/_passes/arm_pass_manager.py	Insert the new rewrite pass into the TOSA lowering pipeline.
backends/arm/_passes/__init__.py	Export the new rewrite pass from the passes package.
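
The `torch.float8_e8m0fnu` type mentioned for `backends/arm/process_node.py` stores only an exponent. As a rough sketch (not the PR's serialization code, which goes through `ml_dtypes`), the byte layout can be decoded by hand. The helper names `e8m0_decode` and `e8m0_encode` are hypothetical, and the bias of 127 with `0xFF` reserved for NaN follows the OCP Microscaling definition of E8M0.

```python
import math

E8M0_BIAS = 127
E8M0_NAN = 0xFF


def e8m0_decode(byte):
    """Decode one E8M0 byte into a Python float (a power of two, or NaN)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a value in 0..255")
    if byte == E8M0_NAN:
        return math.nan
    # No sign bit and no mantissa: the byte is just a biased exponent.
    return math.ldexp(1.0, byte - E8M0_BIAS)


def e8m0_encode(exponent):
    """Encode an unbiased exponent in -127..127 as an E8M0 byte."""
    if not -127 <= exponent <= 127:
        raise ValueError("exponent out of E8M0 range")
    return exponent + E8M0_BIAS
```

For example, `e8m0_decode(e8m0_encode(-8))` gives `2**-8`, the kind of per-block scale a block-scaled cast would emit alongside its FP8 data.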


+                ):
                    return False

        return True


+        tosa_spec.support_extension("fp8e5m2") or tosa_spec.support_extension("mxfp")
+    ):
        disallowed_dtypes.append(torch.float8_e5m2)
    if tosa_spec.is_U55_subset:


+        inputs: List[TosaArg],
+        output: TosaArg,
+    ) -> None:
+        validate_num_inputs(self.target, inputs, 2)


+        # TODO(MLETORCH-2018): This is a local workaround for multi-output TOSA ops.
+        # Remove it once we can handle multiple outputs generally.
+        output_names = _ordered_getitem_output_names(node)


+from executorch.backends.arm.operators.operator_validation_utils import (
+    validate_num_inputs,
+)
+from executorch.backends.arm.tosa.mapping import TosaArg


+            f"{CastToBlockScaledVisitor.target}: Expected exactly two getitem outputs, got {len(ordered_users)}"
+        )
+
+    return [user.name for user in ordered_users]
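
The fragments above refer to a helper that orders the `getitem` users of a multi-output node. Its body is not shown in this thread, so the following is only a guess at the general shape, written against a stand-in `FakeNode` class rather than `torch.fx.Node`. The function name `ordered_getitem_output_names`, the node names, and the index check are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class FakeNode:
    """Stand-in for a graph node; this is not torch.fx.Node."""
    name: str
    args: tuple = ()
    users: list = field(default_factory=list)


def ordered_getitem_output_names(node, expected=2):
    """Return the names of node's getitem users, sorted by output index."""
    # Each getitem user is assumed to carry (producer, index) in its args.
    ordered_users = sorted(node.users, key=lambda user: user.args[1])
    indices = [user.args[1] for user in ordered_users]
    if indices != list(range(expected)):
        raise ValueError(
            f"Expected getitem outputs with indices 0..{expected - 1}, got {indices}"
        )
    return [user.name for user in ordered_users]


cast = FakeNode("cast_to_block_scaled")
# Users are attached out of order on purpose.
cast.users.append(FakeNode("getitem_1", args=(cast, 1)))
cast.users.append(FakeNode("getitem_0", args=(cast, 0)))
print(ordered_getitem_output_names(cast))
```

Sorting by the `getitem` index rather than by user order keeps the serialized output names stable even if graph passes reorder the users.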


zingo

OK to merge if tests are OK and a best effort has been made to update buck2 files.

zingo · 2026-06-03T14:05:14Z

The timeout failure in test_smaller_stories is known and is being worked on separately.

martinlsm · 2026-06-03T14:15:18Z

> OK to merge if tests are OK and a best effort has been made to update buck2 files.

Yes, I believe things look all right and buck2 files are (hopefully) updated correctly.

kirklandsign · 2026-06-04T17:59:20Z

Hi @martinlsm @zingo, I probably need to revert this PR. It's breaking our internal CI. Would you mind re-landing this from internal first with our engineer?

digantdesai · 2026-06-04T19:40:15Z

Example error from internal CI -

executorch/backends/arm/test/__rewrite_conv_pass__/rewrite_conv_pass#link-tree/torch/_ops.py:1385: in __getattr__
    raise AttributeError(
E   AttributeError: '_OpNamespace' 'tosa_mxfp' object has no attribute 'linear'
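
The traceback shows attribute lookup on an op namespace failing, which is what happens when an operator is requested before anything has registered it. The toy classes below imitate that behaviour without using PyTorch. `ToyOpNamespace` and `register_mxfp_ops` are hypothetical names, and whether a missing import or build dependency was the actual cause of the internal failure is not stated in this thread.

```python
class ToyOpNamespace:
    """Minimal imitation of a lazily populated operator namespace."""

    def __init__(self, name):
        self._name = name
        self._ops = {}

    def register(self, op_name, fn):
        self._ops[op_name] = fn

    def __getattr__(self, op_name):
        # Only reached when normal attribute lookup has already failed.
        ops = self.__dict__.get("_ops", {})
        if op_name in ops:
            return ops[op_name]
        raise AttributeError(
            f"'ToyOpNamespace' '{self._name}' object has no attribute '{op_name}'"
        )


def register_mxfp_ops(namespace):
    # Stands in for the import side effect that would define the custom op.
    def linear(x, weight):
        return [sum(a * b for a, b in zip(x, row)) for row in weight]

    namespace.register("linear", linear)


tosa_mxfp = ToyOpNamespace("tosa_mxfp")

try:
    tosa_mxfp.linear
except AttributeError as err:
    print(err)  # nothing has registered "linear" yet

register_mxfp_ops(tosa_mxfp)
print(tosa_mxfp.linear([1.0, 2.0], [[3.0, 4.0]]))
```

In this toy model the fix is simply to run the registration before the first lookup; in a build system that typically means the target must depend on whichever module performs the registration.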

zingo · 2026-06-04T21:40:30Z

> Hi @martinlsm @zingo, I probably need to revert this PR. It's breaking our internal CI. Would you mind re-landing this from internal first with our engineer?

Of course, sorry for the problem. I created a revert of this PR here; if you need it, feel free to approve and merge it directly. I can't self-approve, and due to time zone differences we won't be back here for about 8 hours.

kirklandsign · 2026-06-05T06:08:22Z

Thank you @zingo

Also I made some modifications to make buck work:
backends/arm/test/BUCK

load("@fbcode_macros//build_defs:build_file_migration.bzl", "fbcode_target", "non_fbcode_target")
# Copyright 2025-2026 Arm Limited and/or its affiliates.
#
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.

load("@fbsource//xplat/executorch/build:runtime_wrapper.bzl", "runtime")
load(":targets.bzl", "define_arm_tests")


oncall("executorch")

fbcode_target(_kind = runtime.python_library,
    name = "conftest",
    srcs = ["conftest.py"],
    deps = [
        "//executorch/exir:lib",
        "//executorch/exir/backend:compile_spec_schema",
        "fbsource//third-party/pypi/pytest:pytest",
    ]
)

fbcode_target(_kind = runtime.python_library,
    name = "runner_utils",
    srcs = ["runner_utils.py"],
    resources = {
        "fbsource//third-party/flatbuffers:flatc-host": "flatbuffers-flatc",
    },
    deps = [
        ":conftest",
        "//executorch/backends/arm:arm_compile_spec",
        "//executorch/backends/arm:ethosu",
        "//executorch/backends/arm/tosa:compile_spec",
        "//executorch/backends/arm/tosa:schemas",
        "//executorch/backends/arm:vgf",
        "//executorch/backends/arm/tosa:specification",
        "//executorch/exir:lib",
        "//executorch/exir/backend:compile_spec_schema",
    ]
)

fbcode_target(_kind = runtime.python_library,
    name = "common",
    srcs = ["common.py"],
    deps = [
        ":runner_utils",
        "//executorch/backends/arm/tosa:tosa",
        "fbsource//third-party/pypi/pytest:pytest",
    ]
)

fbcode_target(_kind = runtime.python_library,
    name = "arm_tester_serialize",
    srcs = ["tester/serialize.py"],
    deps = [
        "//executorch/backends/test/harness:tester",
        "//executorch/devtools/backend_debug:delegation_info",
    ]
)

fbcode_target(_kind = runtime.python_library,
    name = "arm_tester_lib",
    srcs = glob(["tester/*.py"], exclude = ["tester/serialize.py"]),
    deps = [
        ":common",
        "//executorch/backends/test/harness:tester",
        "//executorch/backends/arm:ethosu",
        "//executorch/backends/arm/quantizer:lib",
        "//executorch/backends/arm/tosa:mapping",
        "//executorch/backends/arm:vgf",
        "//executorch/backends/arm:_factory",
        "//executorch/devtools/backend_debug:delegation_info",
        "//executorch/exir/backend:operator_support",
        "fbsource//third-party/pypi/tabulate:tabulate",
    ]
)


fbcode_target(_kind = runtime.python_library,
    name = "arm_tester",
    deps = [
        "//executorch/backends/arm/test:arm_tester_lib",
        "//executorch/backends/arm/test:arm_tester_serialize",
    ]
)

fbcode_target(_kind = runtime.python_library,
    name = "mxfp_test_common",
    srcs = [
        "ops/mxfp/__init__.py",
        "ops/mxfp/common.py",
    ],
    deps = [
        ":arm_tester" if runtime.is_oss else "//executorch/backends/arm/test/tester/fb:arm_tester_fb",
        "//executorch/backends/arm:ao_ext",
        "//executorch/backends/test/harness:tester",
    ],
)

fbcode_target(_kind = define_arm_tests,)

backends/arm/test/targets.bzl

# load("//caffe2/test/fb:defs.bzl", "define_tests")
load("@fbcode_macros//build_defs:python_pytest.bzl", "python_pytest")
load("@bazel_skylib//lib:paths.bzl", "paths")
load("@fbsource//xplat/executorch/build:runtime_wrapper.bzl", "runtime")

_ENABLE_VGF = True

def define_arm_tests():
    # TODO [fbonly] Add more tests
    test_files = []

    # Passes
    test_files += native.glob(["passes/test_*.py"])

    # Operators
    test_files += [
        "ops/test_add.py",
        "ops/test_addmm.py",
        "ops/test_avg_pool2d.py",
        "ops/test_cat.py",
        "ops/test_conv2d.py",
        "ops/test_linear.py",
        "ops/test_log10.py",
        "ops/test_max_pool1d.py",
        "ops/test_mul.py",
        "ops/mxfp/test_mxfp_linear.py",
        "ops/test_permute.py",
        "ops/test_rsqrt.py",
        "ops/test_slice.py",
        "ops/test_sigmoid.py",
        "ops/test_softmax.py",
        "ops/test_sub.py",
        "ops/test_sum.py",
        "ops/test_tanh.py",
        "ops/test_view.py",
        "ops/test_cos.py",
        "ops/test_to_copy.py",
        "ops/test_exp.py",
        "ops/test_reciprocal.py",
        "ops/test_mean_dim.py",
        "ops/test_var.py",
        "ops/test_conv1d.py",
        "ops/test_gelu.py",
        "ops/test_bmm.py",
        "ops/test_split.py",
    ]

    # Quantization
    test_files += [
        "quantizer/test_generic_annotater.py",
        "quantizer/test_uint8_io_quantization.py",
    ]

    # Misc tests
    test_files += [
        "misc/test_compile_spec.py",
        # "misc/test_evaluate_model.py",
        "misc/test_pass_pipeline_config.py",
        "misc/tosa_dialect/test_tosa_dialect_cast_to_block_scaled.py",
        "misc/tosa_dialect/test_tosa_dialect_mxfp_linear.py",
        "misc/tosa_dialect/test_tosa_resize.py",
        "misc/test_tosa_spec.py",
        "misc/test_bn_relu_folding_qat.py",
        "misc/test_custom_partition.py",
        "misc/test_debug_hook.py",
        "misc/test_mxfp_linear_ao.py",
        "misc/test_post_quant_device_switch.py",
        # "misc/test_dim_order.py", (TODO - T238390249)
    ]

    # Deprecation tests
    test_files += [
        "deprecation/test_arm_compile_spec_deprecation.py",
    ]

    TESTS = {}

    for test_file in test_files:
        test_file_name = paths.basename(test_file)
        test_name = test_file_name.replace("test_", "").replace(".py", "")

        python_pytest(
            name = test_name,
            srcs = [test_file],
            pytest_config = "pytest.ini",
            resources = ["conftest.py"],
            compile = "with-source",
            typing = False,
            skip_on_mode_mac = True,
            env = {} if runtime.is_oss else ({
                "MODEL_CONVERTER_PATH": "$(location fbsource//third-party/pypi/ai-ml-sdk-model-converter/0.8.0:model-converter-bin)",
                "MODEL_CONVERTER_LIB_DIR": "$(location fbsource//third-party/nvidia-nsight-systems:linux-x86_64)/host-linux-x64",
                "LAVAPIPE_LIB_PATH": "$(location fbsource//third-party/mesa:vulkan_lvp)",
                "EMULATION_LAYER_TENSOR_SO": "$(location fbsource//third-party/arm-ml-emulation-layer/v0.9.0/src:libVkLayer_Tensor)",
                "EMULATION_LAYER_GRAPH_SO": "$(location fbsource//third-party/arm-ml-emulation-layer/v0.9.0/src:libVkLayer_Graph)",
                "EMULATION_LAYER_TENSOR_JSON": "$(location fbsource//third-party/arm-ml-emulation-layer/v0.9.0/src:VkLayer_Tensor_json)",
                "EMULATION_LAYER_GRAPH_JSON": "$(location fbsource//third-party/arm-ml-emulation-layer/v0.9.0/src:VkLayer_Graph_json)",
            } if _ENABLE_VGF else {}),
            preload_deps = [
                "//executorch/kernels/quantized:custom_ops_generated_lib",
            ] + ([] if runtime.is_oss or not _ENABLE_VGF else [
                "fbsource//third-party/khronos:vulkan",
                "//executorch/backends/arm/runtime:vgf_backend",
            ]),
            deps = [
                "//executorch/backends/arm/test:arm_tester" if runtime.is_oss else "//executorch/backends/arm/test/tester/fb:arm_tester_fb",
                "//executorch/backends/arm/test:conftest",
                "//executorch/backends/arm/test:mxfp_test_common",
                "//executorch/backends/arm/test/misc:dw_convs_shared_weights_module",
                "//executorch/backends/arm:ao_ext",
                "//executorch/backends/arm:ethosu",
                "//executorch/backends/arm/tosa:compile_spec",
                "//executorch/backends/arm/tosa:partitioner",
                "//executorch/backends/arm:vgf",
                "//executorch/backends/test:graph_builder",
                "//executorch/backends/test:program_builder",
                "//executorch/exir:lib",
                "fbsource//third-party/pypi/pytest:pytest",
                "fbsource//third-party/pypi/parameterized:parameterized",
                "fbsource//third-party/tosa_tools:tosa_reference_model",
            ],
        )
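
Because each target name in `define_arm_tests()` comes only from the file's base name, two test files with the same name in different directories would map to the same target. The loop above can be replayed in plain Python to check a list of files for such clashes. `derive_test_name` is a hypothetical helper, and `os.path.basename` stands in for Skylib's `paths.basename`.

```python
import os.path
from collections import Counter


def derive_test_name(test_file):
    """Mirror the name derivation used in define_arm_tests()."""
    file_name = os.path.basename(test_file)
    return file_name.replace("test_", "").replace(".py", "")


# A few of the MXFP-related entries from the list above.
test_files = [
    "ops/mxfp/test_mxfp_linear.py",
    "misc/test_mxfp_linear_ao.py",
    "misc/tosa_dialect/test_tosa_dialect_mxfp_linear.py",
    "misc/tosa_dialect/test_tosa_dialect_cast_to_block_scaled.py",
]

names = [derive_test_name(f) for f in test_files]
duplicates = [name for name, count in Counter(names).items() if count > 1]

for path, name in zip(test_files, names):
    print(f"{path} -> {name}")
print("duplicates:", duplicates)
```

The same derivation explains the `rewrite_conv_pass` target name that appears in the internal CI traceback path.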

Reverts #19969

martinlsm · 2026-06-05T10:57:55Z

Thanks @kirklandsign for providing the correct files. We reverted the PR, but here is a resubmission that contains your fixed buck2 files: #20065

martinlsm added 2 commits June 3, 2026 08:28

Arm backend: Lower MXFP Linear to TOSA

0603d37

Signed-off-by: Martin Lindström <Martin.Lindstroem@arm.com> Co-authored-by: Sebastian Larsson <sebastian.larsson@arm.com> Change-Id: Iab2e1cf2ed21047bbc2a7a51604b9230fe2f2819

Copilot AI review requested due to automatic review settings June 3, 2026 06:33

martinlsm requested a review from digantdesai as a code owner June 3, 2026 06:33

meta-cla Bot added the CLA Signed label Jun 3, 2026

Copilot started reviewing on behalf of martinlsm June 3, 2026 06:33

github-actions Bot added the ciflow/trunk and module: arm labels Jun 3, 2026

pytorch-bot Bot added the partner: arm label Jun 3, 2026

pytorch-bot Bot added the release notes: arm label Jun 3, 2026

Copilot AI reviewed Jun 3, 2026

zingo approved these changes Jun 3, 2026

martinlsm merged commit b63adec into pytorch:main Jun 3, 2026
471 of 478 checks passed

martinlsm deleted the lower-linear branch June 3, 2026 14:15

zingo mentioned this pull request Jun 4, 2026

Revert "Arm backend: Lower MXFP Linear to TOSA" #20047

Merged

zingo added a commit that referenced this pull request Jun 5, 2026

Revert "Arm backend: Lower MXFP Linear to TOSA" (#20047)

7f19a2e

Reverts #19969

martinlsm mentioned this pull request Jun 5, 2026

Arm backend: Lower MXFP Linear to TOSA #20065

Open

5 participants
