152 changes: 71 additions & 81 deletions backends/arm/README.md
@@ -1,47 +1,74 @@
# ExecuTorch Arm/TOSA Delegate
# ExecuTorch Arm® Delegate for TOSA devices

This subtree contains the Arm(R) Delegate implementation for ExecuTorch.

This delegate is structured to, over time, support a number of different Arm devices
through an AoT flow which targets multiple Arm IP using the TOSA standard.

The expected flow is:
* torch.nn.module -> TOSA -> command_stream for fully AoT flows e.g. embedded.
* torch.nn.module -> TOSA for flows supporting a JiT compilation step.

Current backend support is being developed for TOSA to Ethos(TM)-U55/65/85 via the
ethos-u-vela compilation stack, which follows the fully AoT flow.

## Layout
For more information on TOSA see https://www.mlplatform.org/tosa/tosa_spec.html

**The expected flows are:**
* torch.nn.module -> TOSA for development and validation of model export
* torch.nn.module -> TOSA/VGF for flows supporting a JiT compilation step.
* torch.nn.module -> TOSA -> command_stream for fully AoT flows e.g. embedded.
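The three flows above share the TOSA stage and diverge afterwards. A rough sketch (every function below is an illustrative placeholder, not a real ExecuTorch API):

```python
# Schematic of the three export flows; each function is a stand-in for the
# real AoT tooling, shown only to illustrate how the flows relate.

def module_to_tosa(module: str) -> dict:
    """Shared stage: torch.nn.Module -> TOSA graph."""
    return {"ir": "tosa", "module": module}

def tosa_to_vgf(tosa: dict) -> dict:
    """JiT-capable flow: TOSA -> VGF (SPIR-V form of TOSA)."""
    return {"ir": "vgf", "inner": tosa}

def tosa_to_command_stream(tosa: dict) -> dict:
    """Fully AoT flow: TOSA -> Ethos-U command stream."""
    return {"ir": "command_stream", "inner": tosa}

tosa = module_to_tosa("my_model")   # development/validation can stop here
vgf = tosa_to_vgf(tosa)             # targets with a runtime compiler
cmd = tosa_to_command_stream(tosa)  # embedded targets, e.g. Ethos-U
print(vgf["ir"], cmd["ir"])         # -> vgf command_stream
```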

**Currently device support is for:**
* TOSA to Ethos™-U55/65/85 via the ethos-u-vela compilation stack.
  * This is cross-compiled to the appropriate target CPU
  * There is a separate arm_executor_runner for bare-metal platforms
* TOSA to VGF via the model-converter for devices supporting the ML SDK for Vulkan®
  * The VGF graph represents TOSA directly in a SPIR-V™ standardized form.
  * As the VGF delegate runs on Vulkan, it must be built with the Vulkan delegate also present.

**Currently supported development platforms are:**
* For ahead of time tooling
  * Linux aarch64
  * Linux x86_64
  * macOS with Apple silicon
* Bare metal builds for the Ethos-U and Cortex-M targets
  * Full testing is available in tree for the Corstone™ FVPs
  * This is a reference implementation for porting to silicon targets
* Linux target support for VGF-capable targets
  * This flow re-uses the common executor_runner

## Layout of key components

Export:
- `ethosu_backend.py` - Main entrypoint for the EthosUBackend. For more information see the section on
[Arm Backend Architecture](#arm-backend-architecture). For examples of use see `executorch/examples/arm`.
- `tosa_mapping.py` - utilities for mapping edge dialect to TOSA
- `tosa_quant_utils.py` - utilities for mapping quantization information to TOSA encoding
* `tosa_backend.py` - The TOSA conversion flow all other backends rely on.
* `ethosu/backend.py` - Main entrypoint for the EthosUBackend.
* `vgf_backend.py` - Main entrypoint for VgfBackend.
* For more information see the section on [Arm Backend Architecture](#arm-backend-architecture).
* `scripts` - For the core scripts which prepare AoT dependencies such as backend compilers.

Operators:
- `node_visitor.py` - Base class for edge operator lowering
- `op_*.py` - Edge operator lowering/serialization to TOSA
Passes (which prepare the partitioned graphs for TOSA conversion):
* `_passes/arm_pass_manager.py` - Pass manager. Will decide which passes need to be applied depending on the compile_spec.
* `_passes/*_pass.py` - Compiler passes derived from ExportPass

Passes:
- `arm_pass_manager.py` - Pass manager. Will decide which passes need to be applied depending on the compile_spec.
- `*_pass.py` - Compiler passes derived from ExportPass
Operators (which handle mapping of operators to TOSA):
* `operators/node_visitor.py` - Base class for edge operator lowering
* `operators/op_*.py` - Edge operator lowering/serialization to TOSA

Quantization:
- `arm_quantizer.py` - Quantizers for Arm backend. Contains the EthosUQuantizer which inherits from the TOSAQuantizer
- `arm_quantizer_utils.py` - Utilities for quantization
* `quantizer/arm_quantizer.py` - Quantizers for Arm backend.
  * Contains the EthosUQuantizer which inherits from the TOSAQuantizer
  * Contains the VgfQuantizer which inherits from the TOSAQuantizer
* `arm_quantizer_utils.py` - Utilities for quantization

Runtime:
- `runtime/ArmEthosUBackend.cpp` - The Arm backend implementation of the ExecuTorch runtime backend (BackendInterface) for Ethos-U
- `runtime/ArmEthosUBackend.cpp` - The Arm delegate for Ethos-U targets
- `runtime/VGFBackend.cpp` - The Arm delegate for VGF capable targets
- `CMakeLists.txt` - the build configuration for both targets

Other:
- `third-party/` - Dependencies on other code - in particular the TOSA serialization_lib for compiling to TOSA and the ethos-u-core-driver for the bare-metal backend supporting Ethos-U
- `third-party/` - Dependencies for runtime builds
- `test/` - Unit test and test support functions


## Testing

After a setup you can run unit tests with the test_arm_baremetal.sh script.
The tests and related support scripts will test TOSA, Ethos-U and VGF behaviour based on the installed tools. It is expected that the relevant environment preparation has been performed as outlined in ./examples/arm/README.md.

After setup you can run unit tests with the test_arm_baremetal.sh script.

To run the pytest suite, run

@@ -62,6 +89,7 @@ backends/arm/test/test_arm_baremetal.sh test_full_ethosu_fvp
```

## Unit tests

This is the structure of the test directory

```
@@ -112,89 +140,51 @@ Please note that installing model test dependencies is a standalone process. Whe
List of models with specific dependencies:
- Stable Diffusion: [diffusers](https://github.com/huggingface/diffusers/tree/main)

## Passes

With the default passes in the Arm Ethos-U backend, assuming the model lowers fully to the
Ethos-U, the exported program is composed of a Quantize node, the Ethos-U custom delegate
and a Dequantize node. In some circumstances, you may want to feed quantized input to the neural
network straight away, e.g. if you have a camera sensor outputting (u)int8 data and want to keep all the
arithmetic of the application in the int8 domain. For these cases, you can apply
`exir/passes/quantize_io_pass.py`. See the unit test in
`executorch/backends/arm/test/passes/test_ioquantization_pass.py` for an example of how to feed
quantized inputs and obtain quantized outputs.
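The idea of keeping IO in the int8 domain can be illustrated with plain affine quantization arithmetic (a minimal sketch, independent of the actual `quantize_io_pass` implementation; scale and zero-point values are made up):

```python
# Minimal affine quantization sketch illustrating why quantized IO helps.

def quantize(x: float, scale: float, zero_point: int) -> int:
    """float -> int8, as an inserted Quantize node would do."""
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))  # clamp to the int8 range

def dequantize(q: int, scale: float, zero_point: int) -> float:
    """int8 -> float, as an inserted Dequantize node would do."""
    return (q - zero_point) * scale

scale, zp = 0.05, 0
# Default flow: float in -> Quantize -> int8 delegate -> Dequantize -> float out
q_in = quantize(1.0, scale, zp)    # 20
# With quantize_io_pass applied, e.g. a camera already producing int8 data
# can be fed straight to the delegate, skipping the Quantize node entirely.
camera_sample = 20                 # pretend int8 sensor output
assert camera_sample == q_in
print(dequantize(q_in, scale, zp))  # -> 1.0
```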


### Code coverage

To get code coverage:

```
coverage run --source=<SRC> --rcfile=backends/arm/test/.coveragerc -m pytest \
--config-file=/dev/null backends/arm/test/
```

All files in `SRC` and its child directories will be analysed for code coverage,
unless explicitly excluded in the .coveragerc file. If using venv this might be
under `env/lib/python<VERSION_NUMBER>/site-packages/executorch/`. To get the
absolute path, run:

```
python -c "import executorch; print(executorch.__path__)"
```

This contains a list of paths where the source directory is located. Pick the
one that is located in `env/lib`. If that does not work try the others. Add
`backends/arm` to the path in `--source` to only get code coverage for the Arm
backend.

### A note on unit tests

There are currently 3 ways we unit test our code.
1. TOSA main inference. These tests use non-quantized data and ops. The Edge IR representation of the module is lowered to a TOSA flatbuffer, which is tested for numerical correctness using the ```tosa_reference_model``` tool.
2. TOSA base inference. Same as above, but data and ops are quantized.
3. Ethos-U55. These tests use quantized data and ops (aka TOSA base inference). Edge IR is lowered to a TOSA flatbuffer, which is fed into the Vela compiler. These tests are functional tests and do not test numerical correctness, since that should be guaranteed by TOSA.
There are currently a number of ways we unit test our code:
1. TOSA FP. These tests use non-quantized data and ops. The Edge IR representation of the module is lowered to a TOSA flatbuffer, which is tested for numerical correctness using the ```tosa_reference_model``` tool.
2. TOSA INT. Same as above, but data and ops are integer, representing a quantized domain.
3. Ethos-U. These tests use quantized data and ops (aka TOSA base inference). Edge IR is lowered to a TOSA flatbuffer, which is fed into the Vela compiler. These tests are functional tests and do not test numerical correctness, since that should be guaranteed by TOSA.
4. VGF. These tests enable both FP and INT testing for the VGF/SPIR-V representation of TOSA.
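The distinction between the FP and INT flavours can be sketched as a numerical-correctness check (a toy illustration only; the real tests compare against the `tosa_reference_model`):

```python
# Illustrative check: an INT (quantized) result is compared against the
# FP reference within a tolerance derived from the quantization scale.

def fp_reference(xs):
    """Pretend FP network: a simple scaling op."""
    return [2.0 * x for x in xs]

def int_inference(xs, scale=0.1):
    """Pretend INT network: quantize, compute in integers, dequantize."""
    qs = [round(x / scale) for x in xs]   # quantize inputs
    q_out = [2 * q for q in qs]           # integer arithmetic only
    return [q * scale for q in q_out]     # dequantize outputs

xs = [0.13, 0.52, 0.99]
ref = fp_reference(xs)
out = int_inference(xs)
# Quantization error is bounded by roughly one output quantization step.
assert all(abs(a - b) <= 0.1 for a, b in zip(ref, out))
```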

In order to distinguish between the different tests, the following suffixes have been added to the respective test case.
* ```_MI``` for main inference
* ```_BI``` for base inference
* ```_U55_BI``` for base inference on U55
In order to distinguish between general and more targeted tests, you will find suffixes such as FP, INT, U55, VGF, etc.

## Help & Improvements
If you have problems or questions, or have suggestions for ways to make
implementation and testing better, please reach out to the Arm team developing this delegate, or
create an issue on [github](https://www.github.com/pytorch/executorch/issues).
create an issue on [github](https://www.github.com/pytorch/executorch/issues) and add the "Partner: Arm" label.

# Arm Backend Architecture

The broad principle with the Arm backend implementation for ExecuTorch is to support multiple Arm devices and device configurations through a largely homogeneous flow with maximal sharing of class logic.
The EthosUBackend is currently the one user-facing API that targets the Ethos-U55 and Ethos-U85 hardware IP. It uses the TOSABackend under the hood to share code and functionality, but also to separate testing possibilities for the TOSA flow itself.
The EthosUBackend and VgfBackend are the user-facing targets available for the Ethos-U55 and Ethos-U85 hardware IP, and for VGF targets. Both use the TOSABackend under the hood to share compiler passes and legalisation, along with other code and functionality, and to enable separate testing of the TOSA flow itself.

In practice for compilation, this means that the flow goes via [Arm TOSA](https://www.mlplatform.org/tosa/tosa_spec.html) to produce a common IR and quantization behaviour compatible with our various IP, and typically, device-specific backends to further lower to a device specific binary which can happen ahead of time (within the Python development flow) or at runtime (during a JIT compilation stage).

In practice for the runtime, this means we will share common runtime backend functionality, with the aim for features like debugging to be available through common tooling.


## Arm Backend Status and Maturity

The Arm EthosU Backend should be considered prototype quality at this point, likely subject to significant change and improvement, and with limited coverage of functionality. We are actively developing this codebase.
The Arm EthosU Backend should be considered reasonable quality at this point, supporting a large number of operators and major networks.
The Arm VGF Backend should be considered Alpha quality, likely subject to significant change and improvement, and with limited coverage of functionality.
We are actively developing the codebase for both targets.

## Current flows

The EthosUBackend has a two stage process,
- Compile to TOSA to rationalise the graph into known hardware support profiles. Currently this is to v1.0 TOSA INT with specific concern to a subset which gives support on Ethos-U55 and Ethos-U85, the target of the initial prototype efforts. This calls into the TOSABackend.
- Lower via the ethos-u-vela compilation flow which takes TOSA v1.0 as an input and produces a low level commandstream for the hardware which is then passed via the delegate to the ethos-u-core-driver for direct execution.
The Arm backends have a two-stage process:
1. Compile to TOSA by applying FX passes and legalizing the graph into supported TOSA profiles. Currently this targets TOSA v1.0 INT/FP, via calls into the TOSABackend.
2. Lower via the target compilation flow, which takes TOSA v1.0 as an input and produces a lower-level format for the hardware:
   * For Ethos-U this is a hardware command stream that can be executed directly on the hardware.
   * For VGF this is a SPIR-V representation of TOSA, enabling JiT compilation on the target platform.

The EthosUPartitioner is currently used to ensure the operations converted are Ethos-U compatible, but will be extended to offer spec-correct TOSA Base Inference and TOSA Main Inference generation in future.
All targets provide a partitioner to enable the standard partially delegated flow offered by ExecuTorch.

There is also a generic TOSABackend with accompanying TOSAPartitioner and TOSAQuantizer, which are used by the EthosUBackend and friends. The Arm TOSA Backend can be used on its own to verify the lowering to the TOSA representation of the model (refer to the unit tests in backends/arm/test which use the TOSA backend in the test suites).
There is also a generic TOSABackend with accompanying TOSAPartitioner and TOSAQuantizer; these can be used directly to verify the lowering to the TOSA representation of the model (refer to the unit tests in backends/arm/test which use the TOSA backend in the test suites).

### Controlling compilation

It is possible to control the compilation flow to aid in development and debug of both networks and the code itself.

Configuration of the EthosUBackend export flow is controlled by CompileSpec information (essentially used as compilation flags) to determine which of these outputs is produced. In particular this allows use of the tosa_reference_model to run intermediate outputs to check for correctness and quantization accuracy without a full round trip via a hardware implementation.

As this is in active development see the EthosUBackend for accurate information on [compilation flags](https://github.com/pytorch/executorch/blob/29f6dc9353e90951ed3fae3c57ae416de0520067/backends/arm/arm_backend.py#L319-L324)
Configuration of the export flow is controlled by CompileSpec information (essentially used as compilation flags) to determine which of these outputs is produced. In particular this allows setting compilation flags, capturing intermediate forms during lowering, and using the tosa_reference_model to run intermediate outputs to check for correctness and quantization accuracy without a full round trip via a hardware implementation.
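As a toy illustration of how compile-spec style flags can select which outputs are produced (all names below are invented for illustration, not the real CompileSpec API):

```python
# Toy model of compile-spec driven output selection; the real flags live in
# the Arm backend and differ from these invented names.
from dataclasses import dataclass

@dataclass
class ToyCompileSpec:
    target: str = "tosa"              # "tosa" | "ethos-u" | "vgf"
    dump_intermediates: bool = False  # capture forms during lowering

def compile_model(spec: ToyCompileSpec) -> list:
    outputs = ["tosa_flatbuffer"]     # always produced by the shared stage
    if spec.dump_intermediates:
        outputs.append("intermediate_graphs")
    if spec.target == "ethos-u":
        outputs.append("vela_command_stream")
    elif spec.target == "vgf":
        outputs.append("vgf_spirv")
    return outputs

print(compile_model(ToyCompileSpec(target="ethos-u", dump_intermediates=True)))
# -> ['tosa_flatbuffer', 'intermediate_graphs', 'vela_command_stream']
```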

## Model specific and optional passes
The current TOSA version does not support int64. However, int64 is commonly used in many models. In order to lower the operators with int64 inputs and/or outputs to TOSA, a few passes have been developed to handle the int64-related issues. The main idea behind these passes is to replace the uses of int64 with int32 where feasible.
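The narrowing idea behind these passes can be sketched on a toy graph representation (not the real pass infrastructure):

```python
# Sketch of the int64 -> int32 narrowing idea: replace int64 values with
# int32 where they provably fit. Toy node dicts, not real FX/Edge IR.

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def narrow_constants(graph):
    """Rewrite int64 constant nodes to int32 when the value fits."""
    out = []
    for node in graph:
        if node["dtype"] == "int64" and INT32_MIN <= node["value"] <= INT32_MAX:
            node = {**node, "dtype": "int32"}
        out.append(node)
    return out

graph = [
    {"op": "const", "dtype": "int64", "value": 7},      # fits -> narrowed
    {"op": "const", "dtype": "int64", "value": 2**40},  # too big -> kept
]
print([n["dtype"] for n in narrow_constants(graph)])    # -> ['int32', 'int64']
```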
90 changes: 90 additions & 0 deletions backends/arm/scripts/run_vkml.sh
@@ -0,0 +1,90 @@
#!/usr/bin/env bash
# Copyright 2025 Arm Limited and/or its affiliates.
#
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.

# Optional parameter:
# --build_type= "Release" | "Debug" | "RelWithDebInfo"
# --etdump build with devtools-etdump support

set -eu
set -o pipefail

script_dir=$(cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd)
et_root_dir=$(cd "${script_dir}/../../.." && pwd)
et_root_dir=$(realpath "${et_root_dir}")
setup_path_script=${et_root_dir}/examples/arm/ethos-u-scratch/setup_path.sh
_setup_msg="please refer to ${et_root_dir}/examples/arm/setup.sh to properly install necessary tools."


model=""
build_path="cmake-out"
converter="model-converter"

help() {
echo "Usage: $(basename $0) [options]"
echo "Options:"
echo " --model=<MODEL_FILE> .pte model file to run"
echo "  --build_path=<BUILD_PATH>         Build path containing the executor_runner. Default: ${build_path}"
exit 0
}

for arg in "$@"; do
case $arg in
-h|--help) help ;;
--model=*) model="${arg#*=}";;
--build_path=*) build_path="${arg#*=}";;
*)
;;
esac
done

if [[ -z ${model} ]]; then echo "Model name needs to be provided"; exit 1; fi


# Source the tools
# This should be prepared by the setup.sh
[[ -f ${setup_path_script} ]] \
|| { echo "Missing ${setup_path_script}. ${_setup_msg}"; exit 1; }

source ${setup_path_script}

# basic checks before we get started
hash ${converter} \
|| { echo "Could not find ${converter} on PATH, ${_setup_msg}"; exit 1; }



runner="${build_path}/executor_runner"

echo "--------------------------------------------------------------------------------"
echo "Running ${model} with ${runner}"
echo "WARNING: The VK_ML layer driver will not provide accurate performance information"
echo "--------------------------------------------------------------------------------"

# Check if stdbuf is installed and use stdbuf -oL together with tee below to make the output
# go all the way to the console more directly and not be buffered

if hash stdbuf 2>/dev/null; then
nobuf="stdbuf -oL"
else
nobuf=""
fi

log_file=$(mktemp)


${nobuf} ${runner} -model_path "${model}" | tee "${log_file}"
echo "[${BASH_SOURCE[0]}] execution complete"

# Most of these can happen for bare metal or Linux executor_runner runs.
echo "Checking for problems in log:"
if grep -E "^(F|E|\\[critical\\]|Hard fault.|Info: Simulation is stopping. Reason: CPU time has been exceeded.).*$" "${log_file}"; then
    echo "Found ERROR"
    rm "${log_file}"
    exit 1
fi
echo "No problems found!"
rm "${log_file}"
2 changes: 1 addition & 1 deletion docs/source/backends-arm-ethos-u.md
@@ -95,4 +95,4 @@ Finally, run the elf file on FVP using the script
`executorch/backends/arm/scripts/run_fvp.sh --elf=executorch/mv2_arm_ethos_u55/cmake-out/arm_executor_runner --target=ethos-u55-128`.

## See Also
- [Arm Ethos-U Backend Tutorial](tutorial-arm-ethos-u.md)
- [Arm Ethos-U Backend Tutorial](tutorial-arm.md)
2 changes: 1 addition & 1 deletion docs/source/index.md
@@ -148,7 +148,7 @@ using-executorch-faqs

Building an ExecuTorch Android Demo App <https://github.com/pytorch-labs/executorch-examples/tree/main/dl3/android/DeepLabV3Demo#executorch-android-demo-app>
Building an ExecuTorch iOS Demo App <https://github.com/pytorch-labs/executorch-examples/tree/main/mv3/apple/ExecuTorchDemo>
tutorial-arm-ethos-u.md
tutorial-arm.md
```

```{toctree}