From 52bb1a413d863b3acad0d54dbcec19629a2bfaa2 Mon Sep 17 00:00:00 2001
From: Agrima Khare
Date: Thu, 28 Aug 2025 09:34:40 +0100
Subject: [PATCH 1/3] Arm Backend: Create backends-arm-vgf.md

Signed-off-by: Agrima Khare
Change-Id: I9293be92b69f9ffd8c2060c4b1bb586e0756a2ac
---
 docs/source/backends-arm-vgf.md | 204 ++++++++++++++++++++++++++++++++
 1 file changed, 204 insertions(+)
 create mode 100644 docs/source/backends-arm-vgf.md

diff --git a/docs/source/backends-arm-vgf.md b/docs/source/backends-arm-vgf.md
new file mode 100644
index 00000000000..97d7bf193e3
--- /dev/null
+++ b/docs/source/backends-arm-vgf.md
@@ -0,0 +1,204 @@
# Arm® VGF Backend

The Arm VGF backend is the ExecuTorch solution for lowering PyTorch models to VGF-compatible hardware.
It leverages the TOSA operator set and the [ML SDK for Vulkan®](https://github.com/arm/ai-ml-sdk-for-vulkan?tab=readme-ov-file) to produce a `.pte` file.
The VGF backend also supports execution from a `.pte` file and provides functionality to extract the corresponding VGF file for integration into various applications.

## Features

- Wide operator support for delegating large parts of models to the VGF target.
- A quantizer that optimizes quantization for the VGF target.

## Target Requirements
The target system must include the ML SDK for Vulkan and a Vulkan driver supporting Vulkan API 1.3 or later.

## Development Requirements

```{tip}
All requirements can be downloaded using `examples/arm/setup.sh --enable-mlsdk-deps --disable-ethos-u-deps` and added to the path using
`source examples/arm/ethos-u-scratch/setup_path.sh`
```

For the ahead-of-time (AOT) flow, which compiles a model to the `.pte` format using the VGF backend, the requirements are:
- [TOSA Serialization Library](https://www.mlplatform.org/tosa/software.html) for serializing the EXIR graph into TOSA IR.
- [ML SDK Model Converter](https://github.com/arm/ai-ml-sdk-model-converter) for converting TOSA flatbuffers to VGF files.

For building and running your application using the generic `executor_runner`, you also need:
- The [Vulkan API](https://www.vulkan.org), set up locally for GPU execution support.
- The [ML Emulation Layer for Vulkan](https://github.com/arm/ai-ml-emulation-layer-for-vulkan) for testing against the Vulkan API.

## Using the Arm VGF Backend
The [VGF Minimal Example](https://github.com/pytorch/executorch/blob/main/examples/arm/vgf_minimal_example.ipynb) demonstrates how to lower a module using the VGF backend.

The main configuration point for the lowering is the `VgfCompileSpec` consumed by the partitioner and the quantizer.
The full user-facing API is documented below.

```python
class VgfCompileSpec(tosa_spec: executorch.backends.arm.tosa.specification.TosaSpecification | str | None = None, compiler_flags: list[str] | None = None)
```
Compile spec for VGF-compatible targets.

Attributes:
- **tosa_spec**: A `TosaSpecification`, or a string specifying one.
- **compiler_flags**: Extra compiler flags passed to the converter backend.

```python
def VgfCompileSpec.dump_debug_info(self, debug_mode: executorch.backends.arm.common.arm_compile_spec.ArmCompileSpec.DebugMode | None):
```
Dumps debugging information into the intermediates path.

```python
def VgfCompileSpec.dump_intermediate_artifacts_to(self, output_path: str | None):
```
Sets a path for dumping intermediate results produced during lowering, such as TOSA and `.pte` artifacts.

```python
def VgfCompileSpec.get_intermediate_path(self) -> str | None:
```
Returns the path for dumping intermediate results produced during lowering, such as TOSA and `.pte` artifacts.

```python
def VgfCompileSpec.get_output_format() -> str:
```
Returns a constant string identifying the output format of the class.
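
As a minimal sketch of how these pieces fit together, the snippet below constructs a `VgfCompileSpec` and configures it to dump intermediate artifacts. The import path follows the module paths shown in the signatures above; the TOSA specification string and the output directory are illustrative assumptions, not fixed values:

```python
from executorch.backends.arm.vgf.compile_spec import VgfCompileSpec

# Build a compile spec. The TOSA specification string below is an assumed
# example; a TosaSpecification object can be passed instead, or None.
compile_spec = VgfCompileSpec(tosa_spec="TOSA-1.0+FP")

# Dump intermediate artifacts (TOSA, .pte) to an assumed local directory
# for inspection while debugging the lowering.
compile_spec.dump_intermediate_artifacts_to("./vgf_intermediates")

print(compile_spec.get_intermediate_path())   # "./vgf_intermediates"
print(VgfCompileSpec.get_output_format())     # constant output-format string
```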

### Partitioner API
```python
class VgfPartitioner(compile_spec: executorch.backends.arm.vgf.compile_spec.VgfCompileSpec, additional_checks: Optional[Sequence[torch.fx.passes.operator_support.OperatorSupportBase]] = None) -> None
```
Partitions subgraphs supported by the Arm VGF backend.

Attributes:
- **compile_spec**: The `VgfCompileSpec` used to configure the VGF backend.
- **additional_checks**: Optional sequence of additional operator support checks.

```python
def VgfPartitioner.ops_to_not_decompose(self, ep: torch.export.exported_program.ExportedProgram) -> Tuple[List[torch._ops.OpOverload], Optional[Callable[[torch.fx.node.Node], bool]]]:
```
Returns a list of operators that should not be decomposed. When these ops are
registered and `to_backend` is invoked through `to_edge_transform_and_lower`, the
program that the backend receives is guaranteed not to have any of these ops
decomposed.

Returns:
- **List[torch._ops.OpOverload]**: a list of operators that should not be decomposed.
- **Optional[Callable[[torch.fx.Node], bool]]**: an optional callable, invoked for each node in the graph, that users can provide to mark certain nodes to still be decomposed even though their op appears in the list returned by `ops_to_not_decompose`.

```python
def VgfPartitioner.partition(self, exported_program: torch.export.exported_program.ExportedProgram) -> executorch.exir.backend.partitioner.PartitionResult:
```
Returns the input exported program with newly created sub-modules encapsulating
the specific portions of the input "tagged" for delegation.

The specific implementation is free to decide how existing computation in the
input exported program should be delegated to one or more specific backends.

The contract is strict in that:
* Each node that is intended to be delegated must be tagged.
* No change to the original input exported program (`ExportedProgram`) representation may take
place other than adding sub-modules that encapsulate existing portions of the
input exported program, together with the associated metadata for tagging.

Args:
- **exported_program**: An `ExportedProgram` in Edge dialect to be partitioned for backend delegation.

Returns:
- **PartitionResult**: includes the tagged graph and the delegation spec indicating which `backend_id` and `compile_spec` are used for each node, along with the tag created by the backend developers.
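
The following is a hedged sketch of how the partitioner is typically wired into the lowering flow. The `VgfPartitioner` import path, the example module, and the output file name are assumptions; only the documented constructors above are taken from this page:

```python
import torch
from executorch.backends.arm.vgf.compile_spec import VgfCompileSpec
from executorch.backends.arm.vgf.partitioner import VgfPartitioner  # assumed module path
from executorch.exir import to_edge_transform_and_lower

class SimpleModule(torch.nn.Module):
    def forward(self, x):
        return torch.nn.functional.relu(x + 1.0)

example_inputs = (torch.randn(1, 8),)
exported = torch.export.export(SimpleModule(), example_inputs)

compile_spec = VgfCompileSpec()
partitioner = VgfPartitioner(compile_spec)

# Delegate supported subgraphs to the VGF backend and serialize a .pte file.
edge = to_edge_transform_and_lower(exported, partitioner=[partitioner])
executorch_program = edge.to_executorch()
with open("simple_module.pte", "wb") as f:
    f.write(executorch_program.buffer)
```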

### Quantizer
The VGF quantizer supports both [Post-Training Quantization (PTQ)](https://docs.pytorch.org/ao/main/tutorials_source/pt2e_quant_ptq.html)
and [Quantization-Aware Training (QAT)](https://docs.pytorch.org/ao/main/tutorials_source/pt2e_quant_qat.html) through the PT2E flow.

Currently, the symmetric `int8` configuration defined by `executorch.backends.arm.quantizer.arm_quantizer.get_symmetric_quantization_config` is
the main configuration available for use with the VGF quantizer.

```python
class VgfQuantizer(compile_spec: 'VgfCompileSpec') -> 'None'
```
Quantizer supported by the Arm VGF backend.

Attributes:
- **compile_spec**: The `VgfCompileSpec` specifying the compilation configuration.

```python
def VgfQuantizer.set_global(self, quantization_config: 'QuantizationConfig') -> 'TOSAQuantizer':
```
Sets the `quantization_config` for submodules that are not already annotated by name or type filters.

Args:
- **quantization_config**: Specifies the quantization scheme for the weights and activations.

```python
def VgfQuantizer.set_io(self, quantization_config):
```
Sets the `quantization_config` for input and output nodes.

Args:
- **quantization_config**: Specifies the quantization scheme for the weights and activations.

```python
def VgfQuantizer.set_module_name(self, module_name: 'str', quantization_config: 'Optional[QuantizationConfig]') -> 'TOSAQuantizer':
```
Sets the `quantization_config` for the submodule with name `module_name`. For example,
`quantizer.set_module_name("blocks.sub")` quantizes all supported operators and operator
patterns in that submodule with the given `quantization_config`.

Args:
- **module_name**: Name of the module to which the `quantization_config` is applied.
- **quantization_config**: Specifies the quantization scheme for the weights and activations.

Returns:
- **TOSAQuantizer**: The quantizer instance with the updated module name configuration.

```python
def VgfQuantizer.set_module_type(self, module_type: 'Callable', quantization_config: 'QuantizationConfig') -> 'TOSAQuantizer':
```
Sets the `quantization_config` for submodules of type `module_type`. For example,
`quantizer.set_module_type(Sub)` or `quantizer.set_module_type(nn.Linear)` quantizes all supported operators and operator
patterns in submodules of that type with the given `quantization_config`.

Args:
- **module_type**: Type of module to which the `quantization_config` is applied.
- **quantization_config**: Specifies the quantization scheme for the weights and activations.

Returns:
- **TOSAQuantizer**: The quantizer instance with the updated module type configuration.

```python
def VgfQuantizer.transform_for_annotation(self, model: 'GraphModule') -> 'GraphModule':
```
An initial pass that transforms the graph to prepare it for annotation.
Currently transforms scalar values into tensor attributes.

Args:
- **model**: Module to be transformed.

Returns:
- The transformed model.
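
Below is a hedged sketch of a PTQ pass with the VGF quantizer, following the linked PT2E tutorials. The `VgfQuantizer` import path and the location of the `prepare_pt2e`/`convert_pt2e` helpers are assumptions that may differ between releases, and the export step may vary with the PyTorch version:

```python
import torch
from executorch.backends.arm.quantizer.arm_quantizer import get_symmetric_quantization_config
from executorch.backends.arm.vgf.compile_spec import VgfCompileSpec
from executorch.backends.arm.vgf.quantizer import VgfQuantizer  # assumed module path
from torchao.quantization.pt2e.quantize_pt2e import prepare_pt2e, convert_pt2e  # assumed location

model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.ReLU())
example_inputs = (torch.randn(1, 8),)
exported = torch.export.export(model, example_inputs).module()

compile_spec = VgfCompileSpec()
quantizer = VgfQuantizer(compile_spec)
quantizer.set_global(get_symmetric_quantization_config())

# PTQ: insert observers, calibrate with representative data, then convert.
prepared = prepare_pt2e(exported, quantizer)
prepared(*example_inputs)  # calibration with representative inputs
quantized = convert_pt2e(prepared)
# The quantized module can then be re-exported and lowered with the VgfPartitioner.
```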

### Supported Quantization Schemes
The quantization schemes supported by the VGF backend are:
- 8-bit symmetric weights with 8-bit asymmetric activations (via the PT2E quantization flow).
  - Supports both static and dynamic activations.
  - Supports per-channel and per-tensor schemes.

Weight-only quantization is not currently supported on VGF.

## Runtime Integration

The VGF backend can use the default ExecuTorch runner. The steps required to build and run it are explained in the [Arm Ethos-U Backend Tutorial](https://docs.pytorch.org/executorch/stable/tutorial-arm-ethos-u.html).
The example application is recommended for testing the basic functionality of your lowered models, as well as a starting point for developing runtime integrations for your own targets.

### VGF Adapter for Model Explorer

The [VGF Adapter for Model Explorer](https://github.com/arm/vgf-adapter-model-explorer) enables visualization of
VGF files and can be useful for debugging.

From 5daff4730f58ebc9dcdac83f15416aa92d460e47 Mon Sep 17 00:00:00 2001
From: Agrima Khare
Date: Fri, 12 Sep 2025 20:52:52 +0100
Subject: [PATCH 2/3] Arm Backend: Add vgf docs to overview page

Signed-off-by: Agrima Khare
Change-Id: Ic6cc8a18b110c7414da32d31177473aca1e706f2
---
 docs/source/backends-overview.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/source/backends-overview.md b/docs/source/backends-overview.md
index dd3aa0354bc..6481fab87c7 100644
--- a/docs/source/backends-overview.md
+++ b/docs/source/backends-overview.md
@@ -17,4 +17,5 @@ Commonly used hardware backends are listed below. For mobile, consider using XNN
 - [Qualcomm NPU](backends-qualcomm.md)
 - [MediaTek NPU](backends-mediatek.md)
 - [Arm Ethos-U NPU](backends-arm-ethos-u.md)
+- [Arm VGF](backends-arm-vgf.md)
 - [Cadence DSP](backends-cadence.md)

From 3acd431ba98ca4dd917f9b7c0aff979a0fab27a0 Mon Sep 17 00:00:00 2001
From: Agrima Khare
Date: Fri, 12 Sep 2025 21:14:05 +0100
Subject: [PATCH 3/3] Arm Backend: Add vgf docs to index.md

Signed-off-by: Agrima Khare
Change-Id: I33c79c21c0b566d90bb03a7f85b5d9711fd42d9f
---
 docs/source/backends-overview.md | 4 ++--
 docs/source/index.md             | 1 +
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/docs/source/backends-overview.md b/docs/source/backends-overview.md
index 6481fab87c7..c83ace26853 100644
--- a/docs/source/backends-overview.md
+++ b/docs/source/backends-overview.md
@@ -16,6 +16,6 @@ Commonly used hardware backends are listed below. For mobile, consider using XNN
 - [Vulkan (Android GPU)](backends-vulkan.md)
 - [Qualcomm NPU](backends-qualcomm.md)
 - [MediaTek NPU](backends-mediatek.md)
-- [Arm Ethos-U NPU](backends-arm-ethos-u.md)
-- [Arm VGF](backends-arm-vgf.md)
+- [ARM Ethos-U NPU](backends-arm-ethos-u.md)
+- [ARM VGF](backends-arm-vgf.md)
 - [Cadence DSP](backends-cadence.md)
diff --git a/docs/source/index.md b/docs/source/index.md
index d0c9142cf4a..8afe4e85d78 100644
--- a/docs/source/index.md
+++ b/docs/source/index.md
@@ -51,6 +51,7 @@ ExecuTorch provides support for:
 - [MPS](backends-mps)
 - [Vulkan](backends-vulkan)
 - [ARM Ethos-U](backends-arm-ethos-u)
+- [ARM VGF](backends-arm-vgf)
 - [Qualcomm](backends-qualcomm)
 - [MediaTek](backends-mediatek)
 - [Cadence](backends-cadence)