diff --git a/backends/arm/scripts/TOSA_minimal_example.ipynb b/backends/arm/scripts/TOSA_minimal_example.ipynb index b79780c6a07..77f1c782792 100644 --- a/backends/arm/scripts/TOSA_minimal_example.ipynb +++ b/backends/arm/scripts/TOSA_minimal_example.ipynb @@ -1,266 +1,271 @@ { - "cells": [ - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Copyright 2025 Arm Limited and/or its affiliates.\n", - "#\n", - "# This source code is licensed under the BSD-style license found in the\n", - "# LICENSE file in the root directory of this source tree." - ] + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Copyright 2025 Arm Limited and/or its affiliates.\n", + "#\n", + "# This source code is licensed under the BSD-style license found in the\n", + "# LICENSE file in the root directory of this source tree." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# TOSA delegate flow example\n", + "\n", + "This guide walks through the complete process of running a module on Arm TOSA using ExecuTorch, with a focus on TOSA lowering exploration. \n", + "This workflow is intended for validating and experimenting with model lowering to TOSA, and is aimed at contributors and developers rather than at production deployment.\n", + "It’s important to note that the compilation flow and passes applied can vary based on the target, so this flow does not necessarily produce TOSA flatbuffers and PTE files that are optimal (or even compatible) with any one target.\n", + "If something is not working for you, please raise a GitHub issue and tag Arm.\n", + "\n", + "Before you begin:\n", + "1. (In a clean virtual environment with a compatible Python version) Install executorch using `./install_executorch.sh`\n", + "2. Install Arm TOSA dependencies using `examples/arm/setup.sh --disable-ethos-u-deps`\n", + "\n", + "All commands are executed from the base `executorch` folder.\n", + "\n", + "\n", + "\n", + "*Some scripts in this notebook produce long output logs: Configuring the 'Customizing Notebook Layout' settings to enable 'Output:scrolling' and setting 'Output:Text Line Limit' makes this more manageable*" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## AOT Flow\n", + "\n", + "The first step is creating the PyTorch module and exporting it. Exporting converts the Python code in the module into a graph structure. The result is still runnable Python code, which can be displayed by printing the `graph_module` of the exported program. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import torch\n", + "\n", + "print(torch.__version__)\n", + "\n", + "class Add(torch.nn.Module):\n", + " def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:\n", + " return x + y\n", + "\n", + "example_inputs = (torch.ones(1,1,1,1),torch.ones(1,1,1,1))\n", + "\n", + "model = Add()\n", + "model = model.eval()\n", + "exported_program = torch.export.export(model, example_inputs)\n", + "# Use check_guards=False to avoid creating _guards_fn modules that contain call_module nodes\n", + "# which are not supported by ExecuTorch ARM passes during quantization\n", + "graph_module = exported_program.module(check_guards=False)\n", + "\n", + "_ = graph_module.print_readable()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## TOSA backend supports both INT and FP targets.\n", + "\n", + "To lower the graph_module for FP targets using the TOSA backend, we run it through the default FP lowering pipeline.\n", + "\n", + "FP lowering can be customized for different subgraphs; the sequence shown here is the recommended workflow for TOSA. Because we are staying in floating-point precision, no calibration with example inputs is required.\n", + "\n", + "If you print the module again, you will see that nodes are left in FP form (or annotated with any necessary casts) without any quantize/dequantize wrappers." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from executorch.backends.arm.tosa.compile_spec import TosaCompileSpec\n", + "from torchao.quantization.pt2e.quantize_pt2e import convert_pt2e, prepare_pt2e\n", + "from pathlib import Path\n", + "\n", + "target = \"TOSA-1.0+FP\"\n", + "base_name = \"tosa_simple_example\"\n", + "cwd_dir = Path.cwd()\n", + "\n", + "# Create a compilation spec describing the target for configuring the quantizer\n", + "# Dump intermediate artifacts (in this case TOSA flat buffers) to specified location\n", + "compile_spec = TosaCompileSpec(target).dump_intermediate_artifacts_to(str(cwd_dir / base_name))\n", + "\n", + "_ = graph_module.print_readable()\n", + "\n", + "# Create a new exported program using the quantized_graph_module\n", + "lowered_exported_program = torch.export.export(graph_module, example_inputs)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To lower the graph_module for INT targets using the TOSA backend, we apply the arm_quantizer.\n", + "\n", + "Quantization can be performed in various ways and tailored to different subgraphs; the sequence shown here represents the recommended workflow for TOSA.\n", + "\n", + "This step also requires calibrating the module with representative inputs.\n", + "\n", + "If you print the module again, you’ll see that each node is now wrapped in quantization/dequantization nodes that embed the calculated quantization parameters." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from executorch.backends.arm.tosa.compile_spec import TosaCompileSpec\n", + "from executorch.backends.arm.quantizer import (\n", + " TOSAQuantizer,\n", + " get_symmetric_quantization_config,\n", + ")\n", + "from torchao.quantization.pt2e.quantize_pt2e import convert_pt2e, prepare_pt2e\n", + "from pathlib import Path\n", + "\n", + "target = \"TOSA-1.0+INT\"\n", + "base_name = \"tosa_simple_example\"\n", + "cwd_dir = Path.cwd()\n", + "\n", + "# Create a compilation spec describing the target for configuring the quantizer\n", + "# Dump intermediate artifacts (in this case TOSA flatbuffers) to specified location\n", + "compile_spec = TosaCompileSpec(target).dump_intermediate_artifacts_to(str(cwd_dir / base_name))\n", + "\n", + "# Create and configure quantizer to use a symmetric quantization config globally on all nodes\n", + "quantizer = TOSAQuantizer(compile_spec)\n", + "operator_config = get_symmetric_quantization_config()\n", + "quantizer.set_global(operator_config)\n", + "\n", + "# Post training quantization\n", + "quantized_graph_module = prepare_pt2e(graph_module, quantizer)\n", + "quantized_graph_module(*example_inputs) # Calibrate the graph module with the example input\n", + "quantized_graph_module = convert_pt2e(quantized_graph_module)\n", + "\n", + "_ = quantized_graph_module.print_readable()\n", + "\n", + "# Create a new exported program using the quantized_graph_module\n", + "lowered_exported_program = torch.export.export(quantized_graph_module, example_inputs)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The lowering in the TOSABackend happens in four steps:\n", + "\n", + "1. **Lowering to core ATen operator set**: Transform module to use a subset of operators applicable to edge devices. \n", + "2. **Partitioning**: Find subgraphs which are supported for running on TOSA.\n", + "3. **Lowering to TOSA compatible operator set**: Perform transforms to make the TOSA subgraph(s) compatible with the TOSA operator set.\n", + "4. **Serialization to TOSA**: Compiles the graph module into a TOSA graph. \n", + "Step 4 also prints a Network summary for each processed subgraph.\n", + "\n", + "All of this happens behind the scenes in `to_edge_transform_and_lower`. Printing the graph module shows that what is left in the graph is two quantization nodes for `x` and `y` going into an `executorch_call_delegate` node, followed by a dequantization node.",
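+ "\n", + "Because the compile spec was created with `dump_intermediate_artifacts_to`, step 4 also writes the serialized TOSA flatbuffer(s) to disk. As a small sketch (the exact file names are generated by the backend, so this just lists whatever was dumped), you can inspect them once the next cell has run:\n", + "\n", + "```python\n", + "# List the intermediate artifacts dumped during lowering\n", + "for artifact in sorted((cwd_dir / base_name).iterdir()):\n", + "    print(artifact.name)\n", + "```"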
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from executorch.backends.arm.tosa.partitioner import TOSAPartitioner\n", + "from executorch.exir import (\n", + " EdgeCompileConfig,\n", + " ExecutorchBackendConfig,\n", + " to_edge_transform_and_lower,\n", + ")\n", + "from executorch.extension.export_util.utils import save_pte_program\n", + "\n", + "# Create partitioner from compile spec\n", + "partitioner = TOSAPartitioner(compile_spec)\n", + "\n", + "# Lower the exported program to the TOSA backend\n", + "edge_program_manager = to_edge_transform_and_lower(\n", + " lowered_exported_program,\n", + " partitioner=[partitioner],\n", + " compile_config=EdgeCompileConfig(\n", + " _check_ir_validity=False,\n", + " ),\n", + " )\n", + "\n", + "# Convert edge program to executorch\n", + "executorch_program_manager = edge_program_manager.to_executorch(\n", + " config=ExecutorchBackendConfig(extract_delegate_segments=False)\n", + " )\n", + "\n", + "_ = executorch_program_manager.exported_program().module().print_readable()\n", + "\n", + "# Save pte file\n", + "pte_name = base_name + \".pte\"\n", + "pte_path = cwd_dir / base_name / pte_name\n", + "save_pte_program(executorch_program_manager, str(pte_path))\n", + "assert pte_path.exists(), \"Build failed; no .pte-file found\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Use TOSA reference model to verify TOSA graph\n", + "\n", + "After the AOT compilation flow is done, the resulting lowered TOSA graph can be verified using the TOSA reference model tool." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from executorch.backends.arm.test.runner_utils import TosaReferenceModelDispatch\n", + "\n", + "# Run TOSA graph through reference model using sample inputs\n", + "with TosaReferenceModelDispatch():\n", + " executorch_program_manager.exported_program().module()(*example_inputs)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "fileHeader": "", + "fileUid": "d42c90db-849f-43d0-b1ae-a7fc42fe222b", + "isAdHoc": false, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.16" + } }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# TOSA delegate flow example\n", - "\n", - "This guide walks through the complete process of running a module on Arm TOSA using ExecuTorch, with a focus on TOSA lowering exploration. \n", - "This workflow is intended for validating and experimenting with model lowering to TOSA, and is aimed at contributors and developers, rather than production deployment.\n", - "It’s important to note that the compilation flow and passes applied can vary based on the target, so this flow does not necessarily produce TOSA flatbuffers and PTE files which are optimal (or even compatible) with any one target.\n", - "If something is not working for you, please raise a GitHub issue and tag Arm.\n", - "\n", - "Before you begin:\n", - "1. 
(In a clean virtual environment with a compatible Python version) Install executorch using `./install_executorch.sh`\n", - "2. Install Arm TOSA dependencies using `examples/arm/setup.sh --disable-ethos-u-deps`\n", - "\n", - "With all commands executed from the base `executorch` folder.\n", - "\n", - "\n", - "\n", - "*Some scripts in this notebook produces long output logs: Configuring the 'Customizing Notebook Layout' settings to enable 'Output:scrolling' and setting 'Output:Text Line Limit' makes this more manageable*" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## AOT Flow\n", - "\n", - "The first step is creating the PyTorch module and exporting it. Exporting converts the python code in the module into a graph structure. The result is still runnable python code, which can be displayed by printing the `graph_module` of the exported program. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import torch\n", - "\n", - "print(torch.__version__)\n", - "\n", - "class Add(torch.nn.Module):\n", - " def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:\n", - " return x + y\n", - "\n", - "example_inputs = (torch.ones(1,1,1,1),torch.ones(1,1,1,1))\n", - "\n", - "model = Add()\n", - "model = model.eval()\n", - "exported_program = torch.export.export(model, example_inputs)\n", - "graph_module = exported_program.module()\n", - "\n", - "_ = graph_module.print_readable()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## TOSA backend supports both INT and FP targets.\n", - "\n", - "To lower the graph_module for FP targets using the TOSA backend, we run it through the default FP lowering pipeline.\n", - "\n", - "FP lowering can be customized for different subgraphs; the sequence shown here is the recommended workflow for TOSA. Because we are staying in floating-point precision, no calibration with example inputs is required.\n", - "\n", - "If you print the module again, you will see that nodes are left in FP form (or annotated with any necessary casts) without any quantize/dequantize wrappers." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from executorch.backends.arm.tosa.compile_spec import TosaCompileSpec\n", - "from torchao.quantization.pt2e.quantize_pt2e import convert_pt2e, prepare_pt2e\n", - "from pathlib import Path\n", - "\n", - "target = \"TOSA-1.0+FP\"\n", - "base_name = \"tosa_simple_example\"\n", - "cwd_dir = Path.cwd()\n", - "\n", - "# Create a compilation spec describing the target for configuring the quantizer\n", - "# Dump intermediate artifacts (in this case TOSA flat buffers) to specified location\n", - "compile_spec = TosaCompileSpec(target).dump_intermediate_artifacts_to(str(cwd_dir / base_name))\n", - "\n", - "_ = graph_module.print_readable()\n", - "\n", - "# Create a new exported program using the quantized_graph_module\n", - "lowered_exported_program = torch.export.export(graph_module, example_inputs)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "To lower the graph_module for INT targets using the TOSA backend, we apply the arm_quantizer.\n", - "\n", - "Quantization can be performed in various ways and tailored to different subgraphs; the sequence shown here represents the recommended workflow for TOSA.\n", - "\n", - "This step also requires calibrating the module with representative inputs.\n", - "\n", - "If you print the module again, you’ll see that each node is now wrapped in quantization/dequantization nodes that embed the calculated quantization parameters." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from executorch.backends.arm.tosa.compile_spec import TosaCompileSpec\n", - "from executorch.backends.arm.quantizer import (\n", - " TOSAQuantizer,\n", - " get_symmetric_quantization_config,\n", - ")\n", - "from torchao.quantization.pt2e.quantize_pt2e import convert_pt2e, prepare_pt2e\n", - "from pathlib import Path\n", - "\n", - "target = \"TOSA-1.0+INT\"\n", - "base_name = \"tosa_simple_example\"\n", - "cwd_dir = Path.cwd()\n", - "\n", - "# Create a compilation spec describing the target for configuring the quantizer\n", - "# Dump intermediate artifacts (in this case TOSA flat buffers) to specified location\n", - "compile_spec = TosaCompileSpec(target).dump_intermediate_artifacts_to(str(cwd_dir / base_name))\n", - "\n", - "# Create and configure quantizer to use a symmetric quantization config globally on all nodes\n", - "quantizer = TOSAQuantizer(compile_spec)\n", - "operator_config = get_symmetric_quantization_config()\n", - "quantizer.set_global(operator_config)\n", - "\n", - "# Post training quantization\n", - "quantized_graph_module = prepare_pt2e(graph_module, quantizer)\n", - "quantized_graph_module(*example_inputs) # Calibrate the graph module with the example input\n", - "quantized_graph_module = convert_pt2e(quantized_graph_module)\n", - "\n", - "_ = quantized_graph_module.print_readable()\n", - "\n", - "# Create a new exported program using the quantized_graph_module\n", - "lowered_exported_program = torch.export.export(quantized_graph_module, example_inputs)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The lowering in the TOSABackend happens in four steps:\n", - "\n", - "1. **Lowering to core Aten operator set**: Transform module to use a subset of operators applicable to edge devices. \n", - "2. **Partitioning**: Find subgraphs which are supported for running on TOSA\n", - "3. 
**Lowering to TOSA compatible operator set**: Perform transforms to make the TOSA subgraph(s) compatible with TOSA operator set\n", - "4. **Serialization to TOSA**: Compiles the graph module into a TOSA graph \n", - "Step 4 also prints a Network summary for each processed subgraph.\n", - "\n", - "All of this happens behind the scenes in `to_edge_transform_and_lower`. Printing the graph module shows that what is left in the graph is two quantization nodes for `x` and `y` going into an `executorch_call_delegate` node, followed by a dequantization node." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from executorch.backends.arm.tosa.partitioner import TOSAPartitioner\n", - "from executorch.exir import (\n", - " EdgeCompileConfig,\n", - " ExecutorchBackendConfig,\n", - " to_edge_transform_and_lower,\n", - ")\n", - "from executorch.extension.export_util.utils import save_pte_program\n", - "\n", - "# Create partitioner from compile spec\n", - "partitioner = TOSAPartitioner(compile_spec)\n", - "\n", - "# Lower the exported program to the TOSA backend\n", - "edge_program_manager = to_edge_transform_and_lower(\n", - " lowered_exported_program,\n", - " partitioner=[partitioner],\n", - " compile_config=EdgeCompileConfig(\n", - " _check_ir_validity=False,\n", - " ),\n", - " )\n", - "\n", - "# Convert edge program to executorch\n", - "executorch_program_manager = edge_program_manager.to_executorch(\n", - " config=ExecutorchBackendConfig(extract_delegate_segments=False)\n", - " )\n", - "\n", - "executorch_program_manager.exported_program().module().print_readable()\n", - "\n", - "# Save pte file\n", - "pte_name = base_name + \".pte\"\n", - "pte_path = cwd_dir / base_name / pte_name\n", - "save_pte_program(executorch_program_manager, str(pte_path))\n", - "assert pte_path.exists(), \"Build failed; no .pte-file found\"" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Use TOSA reference model to verify TOSA graph\n", - "\n", - "After the AOT compilation flow is done, the resulting lowered TOSA graph can be verified using the TOSA reference model tool." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import subprocess\n", - "import tosa_reference_model as reference_model\n", - "from executorch.backends.arm.test.runner_utils import TosaReferenceModelDispatch\n", - "\n", - "# Run TOSA graph through reference model using sample inputs\n", - "with TosaReferenceModelDispatch():\n", - " executorch_program_manager.exported_program().module()(*example_inputs)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.10.16" - } - }, - "nbformat": 4, - "nbformat_minor": 4 + "nbformat": 4, + "nbformat_minor": 2 } diff --git a/docs/source/tutorial-arm-ethos-u.md b/docs/source/tutorial-arm-ethos-u.md index b856e7ade75..5418fbda5c8 100644 --- a/docs/source/tutorial-arm-ethos-u.md +++ b/docs/source/tutorial-arm-ethos-u.md @@ -85,7 +85,9 @@ example_inputs = (torch.ones(1,1,1,1),torch.ones(1,1,1,1)) model = Add() model = model.eval() exported_program = torch.export.export(model, example_inputs) -graph_module = exported_program.module() +# Use check_guards=False to avoid creating _guards_fn modules that contain call_module nodes +# which are not supported by ExecuTorch ARM passes during quantization +graph_module = exported_program.module(check_guards=False) from executorch.backends.arm.ethosu import EthosUCompileSpec @@ -217,4 +219,4 @@ If you encountered any bugs or issues following this tutorial please file a bug/ ``` Arm is a registered trademark of Arm Limited (or its subsidiaries or affiliates). -``` \ No newline at end of file +``` diff --git a/docs/source/tutorial-arm-vgf.md b/docs/source/tutorial-arm-vgf.md index 5c723053e63..17fadb8c980 100644 --- a/docs/source/tutorial-arm-vgf.md +++ b/docs/source/tutorial-arm-vgf.md @@ -89,7 +89,9 @@ example_inputs = (torch.ones(1,1,1,1),torch.ones(1,1,1,1)) model = Add() model = model.eval() exported_program = torch.export.export_for_training(model, example_inputs) -graph_module = exported_program.module() +# Use check_guards=False to avoid creating _guards_fn modules that contain call_module nodes +# which are not supported by ExecuTorch ARM passes during quantization +graph_module = exported_program.module(check_guards=False) from executorch.backends.arm.vgf import VgfCompileSpec @@ -217,4 +219,4 @@ If you encountered any bugs or issues following this tutorial please file a bug/ ``` Arm is a registered trademark of Arm Limited (or its subsidiaries or affiliates). -``` \ No newline at end of file +``` diff --git a/examples/arm/ethos_u_minimal_example.ipynb b/examples/arm/ethos_u_minimal_example.ipynb index dc8ea7193aa..f2cc06706f4 100644 --- a/examples/arm/ethos_u_minimal_example.ipynb +++ b/examples/arm/ethos_u_minimal_example.ipynb @@ -1,263 +1,268 @@ { - "cells": [ - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Copyright 2025 Arm Limited and/or its affiliates.\n", - "#\n", - "# This source code is licensed under the BSD-style license found in the\n", - "# LICENSE file in the root directory of this source tree." 
- ] + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Copyright 2025 Arm Limited and/or its affiliates.\n", + "#\n", + "# This source code is licensed under the BSD-style license found in the\n", + "# LICENSE file in the root directory of this source tree." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Ethos-U delegate flow example\n", + "\n", + "This guide demonstrates the full flow for running a module on Arm Ethos-U55 using ExecuTorch.\n", + "Tested on Linux x86_64 and macOS aarch64. If something is not working for you, please raise a GitHub issue and tag Arm.\n", + "\n", + "Before you begin:\n", + "1. (In a clean virtual environment with a compatible Python version) Install executorch using `./install_executorch.sh`\n", + "2. Install Arm cross-compilation toolchain and simulators using `./examples/arm/setup.sh --i-agree-to-the-contained-eula`\n", + "\n", + "All commands are executed from the base `executorch` folder.\n", + "\n", + "\n", + "\n", + "*Some scripts in this notebook produce long output logs: Configuring the 'Customizing Notebook Layout' settings to enable 'Output:scrolling' and setting 'Output:Text Line Limit' makes this more manageable*" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## AOT Flow\n", + "\n", + "The first step is creating the PyTorch module and exporting it. Exporting converts the Python code in the module into a graph structure. The result is still runnable Python code, which can be displayed by printing the `graph_module` of the exported program. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import torch\n", + "\n", + "class Add(torch.nn.Module):\n", + " def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:\n", + " return x + y\n", + "\n", + "example_inputs = (torch.ones(1,1,1,1),torch.ones(1,1,1,1))\n", + "\n", + "model = Add()\n", + "model = model.eval()\n", + "exported_program = torch.export.export(model, example_inputs)\n", + "# Use check_guards=False to avoid creating _guards_fn modules that contain call_module nodes\n", + "# which are not supported by ExecuTorch ARM passes during quantization\n", + "graph_module = exported_program.module(check_guards=False)\n", + "\n", + "_ = graph_module.print_readable()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To run on Ethos-U the `graph_module` must be quantized using the `arm_quantizer`. Quantization can be done in multiple ways and it can be customized for different parts of the graph; shown here is the recommended path for the EthosUBackend. Quantization also requires calibrating the module with example inputs.\n", + "\n", + "Printing the module again, it can be seen that the quantization wraps the nodes in quantization/dequantization nodes which contain the computed quantization parameters.\n", + "\n", + "With the default passes for the Arm Ethos-U backend, assuming the model lowers fully to the Ethos-U, the exported program is composed of a Quantize node, Ethos-U custom delegate and a Dequantize node. In some circumstances, you may want to feed quantized input to the Neural Network straight away, e.g. if you have a camera sensor outputting (u)int8 data and want to keep all the arithmetic of the application in the int8 domain. For these cases, you can apply the `exir/passes/quantize_io_pass.py`. 
See the unit test in `backends/arm/test/passes/test_ioquantization_pass.py` for an example of how to feed quantized inputs and obtain quantized outputs; a rough sketch is also shown after the lowering-step summary below.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from executorch.backends.arm.ethosu import EthosUCompileSpec\n", + "from executorch.backends.arm.quantizer import (\n", + " EthosUQuantizer,\n", + " get_symmetric_quantization_config,\n", + ")\n", + "from torchao.quantization.pt2e.quantize_pt2e import convert_pt2e, prepare_pt2e\n", + "\n", + "# Create a compilation spec describing the target for configuring the quantizer\n", + "# Some args are used by the Arm Vela graph compiler later in the example. Refer to Arm Vela documentation for an\n", + "# explanation of its flags: https://gitlab.arm.com/artificial-intelligence/ethos-u/ethos-u-vela/-/blob/main/OPTIONS.md\n", + "compile_spec = EthosUCompileSpec(\n", + " target=\"ethos-u55-128\",\n", + " system_config=\"Ethos_U55_High_End_Embedded\",\n", + " memory_mode=\"Shared_Sram\",\n", + " extra_flags=[\"--output-format=raw\", \"--debug-force-regor\"]\n", + " )\n", + "\n", + "# Create and configure quantizer to use a symmetric quantization config globally on all nodes\n", + "quantizer = EthosUQuantizer(compile_spec)\n", + "operator_config = get_symmetric_quantization_config()\n", + "quantizer.set_global(operator_config)\n", + "\n", + "# Post training quantization\n", + "quantized_graph_module = prepare_pt2e(graph_module, quantizer)\n", + "quantized_graph_module(*example_inputs) # Calibrate the graph module with the example input\n", + "quantized_graph_module = convert_pt2e(quantized_graph_module)\n", + "\n", + "_ = quantized_graph_module.print_readable()\n", + "\n", + "# Create a new exported program using the quantized_graph_module\n", + "quantized_exported_program = torch.export.export(quantized_graph_module, example_inputs)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The lowering in the EthosUBackend happens in five steps:\n", + "\n", + "1. **Lowering to core ATen operator set**: Transform module to use a subset of operators applicable to edge devices. \n", + "2. **Partitioning**: Find subgraphs which are supported for running on Ethos-U.\n", + "3. **Lowering to TOSA compatible operator set**: Perform transforms to make the Ethos-U subgraph(s) compatible with TOSA. \n", + "4. **Serialization to TOSA**: Compiles the graph module into a TOSA graph. \n", + "5. **Compilation to NPU**: Compiles the TOSA graph into an Ethos-U command stream using the Arm Vela graph compiler. This makes use of the `compile_spec` created earlier.\n", + "Step 5 also prints a Network summary for each processed subgraph.\n", + "\n", + "All of this happens behind the scenes in `to_edge_transform_and_lower`. Printing the graph module shows that what is left in the graph is two quantization nodes for `x` and `y` going into an `executorch_call_delegate` node, followed by a dequantization node.",
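+ "\n", + "As mentioned above, IO quantization can be folded into the program after lowering. A rough sketch follows (the pass classes live in `exir/passes/quantize_io_pass.py`, but treat the exact signatures as assumptions and defer to `backends/arm/test/passes/test_ioquantization_pass.py` for authoritative usage):\n", + "\n", + "```python\n", + "from executorch.exir.passes.quantize_io_pass import QuantizeInputs, QuantizeOutputs\n", + "\n", + "# After to_edge_transform_and_lower (next cell), fold IO quantization into the\n", + "# program so the runtime can accept and return int8 data directly.\n", + "edge_program_manager = edge_program_manager.transform(\n", + "    passes=[\n", + "        QuantizeInputs(edge_program_manager, [0, 1]),  # both inputs, x and y\n", + "        QuantizeOutputs(edge_program_manager, [0]),    # the single output\n", + "    ]\n", + ")\n", + "```"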
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from executorch.backends.arm.ethosu import EthosUPartitioner\n", + "from executorch.exir import (\n", + " EdgeCompileConfig,\n", + " ExecutorchBackendConfig,\n", + " to_edge_transform_and_lower,\n", + ")\n", + "from executorch.extension.export_util.utils import save_pte_program\n", + "\n", + "# Create partitioner from compile spec\n", + "partitioner = EthosUPartitioner(compile_spec)\n", + "\n", + "# Lower the exported program to the Ethos-U backend\n", + "edge_program_manager = to_edge_transform_and_lower(\n", + " quantized_exported_program,\n", + " partitioner=[partitioner],\n", + " compile_config=EdgeCompileConfig(\n", + " _check_ir_validity=False,\n", + " ),\n", + " )\n", + "\n", + "# Convert edge program to executorch\n", + "executorch_program_manager = edge_program_manager.to_executorch(\n", + " config=ExecutorchBackendConfig(extract_delegate_segments=False)\n", + " )\n", + "\n", + "_ = executorch_program_manager.exported_program().module().print_readable()\n", + "\n", + "# Save pte file\n", + "save_pte_program(executorch_program_manager, \"ethos_u_minimal_example.pte\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Build executor runtime\n", + "\n", + "After the AOT compilation flow is done, the runtime can be cross-compiled and linked to the produced .pte-file using the Arm cross-compilation toolchain. This is done in two steps:\n", + "1. Build and install the executorch libraries and EthosUDelegate.\n", + "2. Build and link the `arm_executor_runner` and generate kernel bindings for any non-delegated ops." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%bash\n", + "# Ensure the arm-none-eabi-gcc toolchain and FVPs are available on $PATH\n", + "source ethos-u-scratch/setup_path.sh\n", + "\n", + "# Build executorch libraries cross-compiled for arm baremetal to executorch/cmake-out-arm\n", + "cmake --preset arm-baremetal \\\n", + "-DCMAKE_BUILD_TYPE=Release \\\n", + "-B../../cmake-out-arm ../..\n", + "cmake --build ../../cmake-out-arm --target install -j$(nproc)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%bash\n", + "source ethos-u-scratch/setup_path.sh\n", + "\n", + "# Build example executor runner application to examples/arm/ethos_u_minimal_example\n", + "cmake -DCMAKE_TOOLCHAIN_FILE=$(pwd)/ethos-u-setup/arm-none-eabi-gcc.cmake \\\n", + " -DCMAKE_BUILD_TYPE=Release \\\n", + " -DET_PTE_FILE_PATH=ethos_u_minimal_example.pte \\\n", + " -DTARGET_CPU=cortex-m55 \\\n", + " -DETHOSU_TARGET_NPU_CONFIG=ethos-u55-128 \\\n", + " -DMEMORY_MODE=Shared_Sram \\\n", + " -DSYSTEM_CONFIG=Ethos_U55_High_End_Embedded \\\n", + " -Bethos_u_minimal_example \\\n", + " executor_runner\n", + "cmake --build ethos_u_minimal_example -j$(nproc) -- arm_executor_runner" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Run on simulated model\n", + "\n", + "We can finally use the `backends/arm/scripts/run_fvp.sh` utility script to run the .elf-file on simulated Arm hardware. The example application is built by default with an input of ones, so the expected result of the quantized addition should be close to 2.",
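+ "\n", + "For reference, the eager-mode result of the same addition is exactly 2.0, so any deviation you observe on the FVP comes from the int8 quantization error:\n", + "\n", + "```python\n", + "# Eager-mode reference output for the same example input\n", + "print(Add()(*example_inputs))\n", + "```"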
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%bash\n", + "source ethos-u-scratch/setup_path.sh\n", + "\n", + "# Run the example\n", + "../../backends/arm/scripts/run_fvp.sh --elf=ethos_u_minimal_example/arm_executor_runner --target=ethos-u55-128" + ] + } + ], + "metadata": { + "fileHeader": "", + "fileUid": "b66167f2-8f54-41d2-971d-f8356b6332a9", + "isAdHoc": false, + "kernelspec": { + "display_name": "et_env", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.12" + } }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Ethos-U delegate flow example\n", - "\n", - "This guide demonstrates the full flow for running a module on Arm Ethos-U55 using ExecuTorch.\n", - "Tested on Linux x86_64 and macOS aarch64. If something is not working for you, please raise a GitHub issue and tag Arm.\n", - "\n", - "Before you begin:\n", - "1. (In a clean virtual environment with a compatible Python version) Install executorch using `./install_executorch.sh`\n", - "2. Install Arm cross-compilation toolchain and simulators using `./examples/arm/setup.sh --i-agree-to-the-contained-eula`\n", - "\n", - "With all commands executed from the base `executorch` folder.\n", - "\n", - "\n", - "\n", - "*Some scripts in this notebook produces long output logs: Configuring the 'Customizing Notebook Layout' settings to enable 'Output:scrolling' and setting 'Output:Text Line Limit' makes this more manageable*" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## AOT Flow\n", - "\n", - "The first step is creating the PyTorch module and exporting it. Exporting converts the python code in the module into a graph structure. The result is still runnable python code, which can be displayed by printing the `graph_module` of the exported program. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import torch\n", - "\n", - "class Add(torch.nn.Module):\n", - " def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:\n", - " return x + y\n", - "\n", - "example_inputs = (torch.ones(1,1,1,1),torch.ones(1,1,1,1))\n", - "\n", - "model = Add()\n", - "model = model.eval()\n", - "exported_program = torch.export.export(model, example_inputs)\n", - "graph_module = exported_program.module()\n", - "\n", - "_ = graph_module.print_readable()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "To run on Ethos-U the `graph_module` must be quantized using the `arm_quantizer`. Quantization can be done in multiple ways and it can be customized for different parts of the graph; shown here is the recommended path for the EthosUBackend. Quantization also requires calibrating the module with example inputs.\n", - "\n", - "Again printing the module, it can be seen that the quantization wraps the node in quantization/dequantization nodes which contain the computed quanitzation parameters.\n", - "\n", - "With the default passes for the Arm Ethos-U backend, assuming the model lowers fully to the Ethos-U, the exported program is composed of a Quantize node, Ethos-U custom delegate and a Dequantize node. In some circumstances, you may want to feed quantized input to the Neural Network straight away, e.g. 
if you have a camera sensor outputting (u)int8 data and keep all the arithmetic of the application in the int8 domain. For these cases, you can apply the `exir/passes/quantize_io_pass.py`. See the unit test in `backends/arm/test/passes/test_ioquantization_pass.py`for an example how to feed quantized inputs and obtain quantized outputs.\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from executorch.backends.arm.ethosu import EthosUCompileSpec\n", - "from executorch.backends.arm.quantizer import (\n", - " EthosUQuantizer,\n", - " get_symmetric_quantization_config,\n", - ")\n", - "from torchao.quantization.pt2e.quantize_pt2e import convert_pt2e, prepare_pt2e\n", - "\n", - "# Create a compilation spec describing the target for configuring the quantizer\n", - "# Some args are used by the Arm Vela graph compiler later in the example. Refer to Arm Vela documentation for an\n", - "# explanation of its flags: https://gitlab.arm.com/artificial-intelligence/ethos-u/ethos-u-vela/-/blob/main/OPTIONS.md\n", - "compile_spec = EthosUCompileSpec(\n", - " target=\"ethos-u55-128\",\n", - " system_config=\"Ethos_U55_High_End_Embedded\",\n", - " memory_mode=\"Shared_Sram\",\n", - " extra_flags=[\"--output-format=raw\", \"--debug-force-regor\"]\n", - " )\n", - "\n", - "# Create and configure quantizer to use a symmetric quantization config globally on all nodes\n", - "quantizer = EthosUQuantizer(compile_spec)\n", - "operator_config = get_symmetric_quantization_config()\n", - "quantizer.set_global(operator_config)\n", - "\n", - "# Post training quantization\n", - "quantized_graph_module = prepare_pt2e(graph_module, quantizer)\n", - "quantized_graph_module(*example_inputs) # Calibrate the graph module with the example input\n", - "quantized_graph_module = convert_pt2e(quantized_graph_module)\n", - "\n", - "_ = quantized_graph_module.print_readable()\n", - "\n", - "# Create a new exported program using the quantized_graph_module\n", - "quantized_exported_program = torch.export.export(quantized_graph_module, example_inputs)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The lowering in the EthosUBackend happens in five steps:\n", - "\n", - "1. **Lowering to core Aten operator set**: Transform module to use a subset of operators applicable to edge devices. \n", - "2. **Partitioning**: Find subgraphs which are supported for running on Ethos-U\n", - "3. **Lowering to TOSA compatible operator set**: Perform transforms to make the Ethos-U subgraph(s) compatible with TOSA \n", - "4. **Serialization to TOSA**: Compiles the graph module into a TOSA graph \n", - "5. **Compilation to NPU**: Compiles the TOSA graph into an EthosU command stream using the Arm Vela graph compiler. This makes use of the `compile_spec` created earlier.\n", - "Step 5 also prints a Network summary for each processed subgraph.\n", - "\n", - "All of this happens behind the scenes in `to_edge_transform_and_lower`. Printing the graph module shows that what is left in the graph is two quantization nodes for `x` and `y` going into an `executorch_call_delegate` node, followed by a dequantization node." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from executorch.backends.arm.ethosu import EthosUPartitioner\n", - "from executorch.exir import (\n", - " EdgeCompileConfig,\n", - " ExecutorchBackendConfig,\n", - " to_edge_transform_and_lower,\n", - ")\n", - "from executorch.extension.export_util.utils import save_pte_program\n", - "\n", - "# Create partitioner from compile spec\n", - "partitioner = EthosUPartitioner(compile_spec)\n", - "\n", - "# Lower the exported program to the Ethos-U backend\n", - "edge_program_manager = to_edge_transform_and_lower(\n", - " quantized_exported_program,\n", - " partitioner=[partitioner],\n", - " compile_config=EdgeCompileConfig(\n", - " _check_ir_validity=False,\n", - " ),\n", - " )\n", - "\n", - "# Convert edge program to executorch\n", - "executorch_program_manager = edge_program_manager.to_executorch(\n", - " config=ExecutorchBackendConfig(extract_delegate_segments=False)\n", - " )\n", - "\n", - "_ = executorch_program_manager.exported_program().module().print_readable()\n", - "\n", - "# Save pte file\n", - "save_pte_program(executorch_program_manager, \"ethos_u_minimal_example.pte\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Build executor runtime\n", - "\n", - "After the AOT compilation flow is done, the runtime can be cross compiled and linked to the produced .pte-file using the Arm cross-compilation toolchain. This is done in two steps:\n", - "1. Build and install the executorch libraries and EthosUDelegate.\n", - "2. Build and link the `arm_executor_runner` and generate kernel bindings for any non delegated ops." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%bash\n", - "# Ensure the arm-none-eabi-gcc toolchain and FVP:s are available on $PATH\n", - "source ethos-u-scratch/setup_path.sh\n", - "\n", - "# Build executorch libraries cross-compiled for arm baremetal to executorch/cmake-out-arm\n", - "cmake --preset arm-baremetal \\\n", - "-DCMAKE_BUILD_TYPE=Release \\\n", - "-B../../cmake-out-arm ../..\n", - "cmake --build ../../cmake-out-arm --target install -j$(nproc) " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%bash \n", - "source ethos-u-scratch/setup_path.sh\n", - "\n", - "# Build example executor runner application to examples/arm/ethos_u_minimal_example\n", - "cmake -DCMAKE_TOOLCHAIN_FILE=$(pwd)/ethos-u-setup/arm-none-eabi-gcc.cmake \\\n", - " -DCMAKE_BUILD_TYPE=Release \\\n", - " -DET_PTE_FILE_PATH=ethos_u_minimal_example.pte \\\n", - " -DTARGET_CPU=cortex-m55 \\\n", - " -DETHOSU_TARGET_NPU_CONFIG=ethos-u55-128 \\\n", - " -DMEMORY_MODE=Shared_Sram \\\n", - " -DSYSTEM_CONFIG=Ethos_U55_High_End_Embedded \\\n", - " -Bethos_u_minimal_example \\\n", - " executor_runner\n", - "cmake --build ethos_u_minimal_example -j$(nproc) -- arm_executor_runner" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Run on simulated model\n", - "\n", - "We can finally use the `backends/arm/scripts/run_fvp.sh` utility script to run the .elf-file on simulated Arm hardware. The example application is by default built with an input of ones, so the expected result of the quantized addition should be close to 2." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%bash \n", - "source ethos-u-scratch/setup_path.sh\n", - "\n", - "# Run the example\n", - "../../backends/arm/scripts/run_fvp.sh --elf=ethos_u_minimal_example/arm_executor_runner --target=ethos-u55-128" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "et_env", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.10.12" - } - }, - "nbformat": 4, - "nbformat_minor": 4 + "nbformat": 4, + "nbformat_minor": 2 } diff --git a/examples/arm/vgf_minimal_example.ipynb b/examples/arm/vgf_minimal_example.ipynb index 36004f2c7cd..b5f869c400b 100644 --- a/examples/arm/vgf_minimal_example.ipynb +++ b/examples/arm/vgf_minimal_example.ipynb @@ -1,311 +1,316 @@ { - "cells": [ - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Copyright 2025 Arm Limited and/or its affiliates.\n", - "#\n", - "# This source code is licensed under the BSD-style license found in the\n", - "# LICENSE file in the root directory of this source tree." - ] + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Copyright 2025 Arm Limited and/or its affiliates.\n", + "#\n", + "# This source code is licensed under the BSD-style license found in the\n", + "# LICENSE file in the root directory of this source tree." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# VGF Backend flow example\n", + "\n", + "This guide demonstrates the full flow for lowering a module with the VGF backend in ExecuTorch. \n", + "Tested on Linux x86_64. If something is not working for you, please raise a GitHub issue and tag Arm.\n", + "\n", + "Before you begin:\n", + "1. (In a clean virtual environment with a compatible Python version) Install executorch using `./install_executorch.sh`\n", + "2. Install the MLSDK and TOSA dependencies using `examples/arm/setup.sh --disable-ethos-u-deps --enable-mlsdk-deps` (For further guidance, refer to https://docs.pytorch.org/executorch/main/tutorial-arm.html)\n", + "3. Export Vulkan environment variables and add MLSDK components to PATH and LD_LIBRARY_PATH using `examples/arm/ethos-u-scratch/setup_path.sh`\n", + "\n", + "All commands are executed from the base `executorch` folder.\n", + "\n", + "*Some scripts in this notebook produce long output logs: Configuring the 'Customizing Notebook Layout' settings to enable 'Output:scrolling' and setting 'Output:Text Line Limit' makes this more manageable*" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## AOT Flow\n", + "\n", + "The first step is creating the PyTorch module and exporting it. Exporting converts the Python code in the module into a graph structure. The result is still runnable Python code, which can be displayed by printing the `graph_module` of the exported program. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import torch\n", + "\n", + "class Add(torch.nn.Module):\n", + " def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:\n", + " return x + y\n", + "\n", + "example_inputs = (torch.ones(1,1,1,1),torch.ones(1,1,1,1))\n", + "\n", + "model = Add()\n", + "model = model.eval()\n", + "exported_program = torch.export.export_for_training(model, example_inputs)\n", + "# Use check_guards=False to avoid creating _guards_fn modules that contain call_module nodes\n", + "# which are not supported by ExecuTorch ARM passes during quantization\n", + "graph_module = exported_program.module(check_guards=False)\n", + "\n", + "_ = graph_module.print_readable()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# VGF backend supports both INT and FP targets. \n", + "\n", + "To lower the graph_module for FP targets using the VGF backend, we run it through the default FP lowering pipeline. \n", + "\n", + "FP lowering can be customized for different subgraphs; the sequence shown here is the recommended workflow for VGF.\n", + "Because we are staying in floating-point precision, no calibration with example inputs is required. \n", + "\n", + "If you print the module again, you will see that nodes are left in FP form (or annotated with any necessary casts) without any quantize/dequantize wrappers.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from executorch.backends.arm.vgf import VgfCompileSpec\n", + "\n", + "# Create a compilation spec describing the floating point target.\n", + "compile_spec = VgfCompileSpec(\"TOSA-1.0+FP\")\n", + "\n", + "_ = graph_module.print_readable()\n", + "\n", + "# Create a new exported program using the graph_module\n", + "exported_program = torch.export.export(graph_module, example_inputs)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To lower the graph_module for INT targets using the VGF backend, we apply the arm_quantizer. \n", + "\n", + "Quantization can be performed in various ways and tailored to different subgraphs; the sequence shown here represents the recommended workflow for VGF. \n", + "\n", + "This step also requires calibrating the module with representative inputs. \n", + "\n", + "If you print the module again, you’ll see that each node is now wrapped in quantization/dequantization nodes that embed the calculated quantization parameters." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from executorch.backends.arm.quantizer import (\n", + " VgfQuantizer,\n", + " get_symmetric_quantization_config,\n", + ")\n", + "from torchao.quantization.pt2e.quantize_pt2e import convert_pt2e, prepare_pt2e\n", + "\n", + "# Create a compilation spec describing the target for configuring the quantizer\n", + "compile_spec = VgfCompileSpec(\"TOSA-1.0+INT\")\n", + "\n", + "# Create and configure quantizer to use a symmetric quantization config globally on all nodes\n", + "quantizer = VgfQuantizer(compile_spec)\n", + "operator_config = get_symmetric_quantization_config(is_per_channel=False)\n", + "quantizer.set_global(operator_config)\n", + "\n", + "# Post training quantization\n", + "quantized_graph_module = prepare_pt2e(graph_module, quantizer)\n", + "quantized_graph_module(*example_inputs) # Calibrate the graph module with the example input\n", + "quantized_graph_module = convert_pt2e(quantized_graph_module)\n", + "\n", + "_ = quantized_graph_module.print_readable()\n", + "\n", + "# Create a new exported program using the quantized_graph_module\n", + "quantized_exported_program = torch.export.export(quantized_graph_module, example_inputs)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In the example below, we will make use of the quantized graph module.\n", + "\n", + "The lowering in the VGFBackend happens in five steps:\n", + "\n", + "1. **Lowering to core ATen operator set**: Transform module to use a subset of operators applicable to edge devices. \n", + "2. **Partitioning**: Find subgraphs that will be lowered by the VGF backend.\n", + "3. **Lowering to TOSA compatible operator set**: Perform transforms to make the VGF subgraph(s) compatible with TOSA. \n", + "4. **Serialization to TOSA**: Compiles the graph module into a TOSA graph. \n", + "5. **Compilation to VGF**: Compiles the FX GraphModule into a VGF representation using the model_converter and the previously created compile_spec. It also prints a network summary for each processed VGF partition.\n", + "\n", + "All of this happens behind the scenes in `to_edge_transform_and_lower`. Printing the graph module shows that what is left in the graph is two quantization nodes for `x` and `y` going into an `executorch_call_delegate` node, followed by a dequantization node.",
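+ "\n", + "Once the lowering cell below has run, you can sanity-check the partitioning by counting the delegate calls left in the graph. A small sketch (the string match on the node target is an assumption that holds for the default delegate naming):\n", + "\n", + "```python\n", + "gm = edge_program_manager.exported_program().graph_module\n", + "delegates = [n for n in gm.graph.nodes if \"executorch_call_delegate\" in str(n.target)]\n", + "print(f\"{len(delegates)} delegated subgraph(s)\")\n", + "```"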
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%bash\n", + "# Ensure the Vulkan environment variables and MLSDK components are available on $PATH\n", + "source ethos-u-scratch/setup_path.sh" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "from executorch.backends.arm.vgf import VgfPartitioner\n", + "from executorch.exir import (\n", + " EdgeCompileConfig,\n", + " ExecutorchBackendConfig,\n", + " to_edge_transform_and_lower,\n", + ")\n", + "from executorch.extension.export_util.utils import save_pte_program\n", + "\n", + "# Create partitioner from compile spec\n", + "partitioner = VgfPartitioner(compile_spec)\n", + "\n", + "# Lower the exported program to the VGF backend\n", + "edge_program_manager = to_edge_transform_and_lower(\n", + " quantized_exported_program,\n", + " partitioner=[partitioner],\n", + " compile_config=EdgeCompileConfig(\n", + " _check_ir_validity=False,\n", + " ),\n", + ")\n", + "\n", + "# Convert edge program to executorch\n", + "executorch_program_manager = edge_program_manager.to_executorch(\n", + " config=ExecutorchBackendConfig(extract_delegate_segments=False)\n", + ")\n", + "\n", + "_ = executorch_program_manager.exported_program().module().print_readable()\n", + "\n", + "# Save pte file\n", + "cwd_dir = os.getcwd()\n", + "pte_base_name = \"simple_example\"\n", + "pte_name = pte_base_name + \".pte\"\n", + "pte_path = os.path.join(cwd_dir, pte_name)\n", + "save_pte_program(executorch_program_manager, pte_name)\n", + "assert os.path.exists(pte_path), \"Build failed; no .pte-file found\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Build executor runtime\n", + "\n", + "### Prerequisite\n", + "With the VGF graph inside our PTE, we now need to set up the runtime. To do this we will use the previously built MLSDK dependencies, but we will also need to set up a Vulkan environment external to ExecuTorch.\n", + "Please follow https://vulkan.lunarg.com/sdk/home to set it up. \n", + "\n", + "\n", + "After the AOT compilation flow is done, we need to build the executor_runner target. For this example the generic version will be used.\n", + "To do this, please ensure the following commands are executed before moving on to the next step."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%bash\n", + "# Ensure the Vulkan environment variables and MLSDK components are available on $PATH\n", + "source ethos-u-scratch/setup_path.sh\n", + "\n", + "# Compiled programs will appear in the executorch/cmake-out directory we create here.\n", + "# Build example executor runner application to examples/arm/vgf_minimal_example\n", + "cmake \\\n", + " -DCMAKE_INSTALL_PREFIX=cmake-out \\\n", + " -DCMAKE_BUILD_TYPE=Debug \\\n", + " -DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON \\\n", + " -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \\\n", + " -DEXECUTORCH_BUILD_EXTENSION_FLAT_TENSOR=ON \\\n", + " -DEXECUTORCH_BUILD_EXTENSION_TENSOR=ON \\\n", + " -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON \\\n", + " -DEXECUTORCH_BUILD_XNNPACK=OFF \\\n", + " -DEXECUTORCH_BUILD_VULKAN=ON \\\n", + " -DEXECUTORCH_BUILD_VGF=ON \\\n", + " -DEXECUTORCH_ENABLE_LOGGING=ON \\\n", + " -DPYTHON_EXECUTABLE=python \\\n", + " -B../../cmake-out-vkml ../..\n", + "\n", + "cmake --build ../../cmake-out-vkml --target executor_runner" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Run on VKML Emulator\n", + "\n", + "We can finally use the `backends/arm/scripts/run_vkml.sh` utility script to run the .pte end-to-end and prove out the backend’s kernel implementation. The script runs the model with an input of ones, so the expected result of the addition should be close to 2." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import subprocess\n", + "\n", + "# Setup paths\n", + "et_dir = os.path.join(cwd_dir, \"..\", \"..\")\n", + "et_dir = os.path.abspath(et_dir)\n", + "script_dir = os.path.join(et_dir, \"backends\", \"arm\", \"scripts\")\n", + "\n", + "args = f\"--model={pte_path}\"\n", + "subprocess.run(os.path.join(script_dir, \"run_vkml.sh\") + \" \" + args, shell=True, cwd=et_dir)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "fileHeader": "", + "fileUid": "03cece58-b80a-42d2-99b5-73f66077d2de", + "isAdHoc": false, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.12" + } }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# VGF Backend flow example\n", - "\n", - "This guide demonstrates the full flow for lowering a module using the VGF backend using ExecuTorch. \n", - "Tested on Linux x86_64. If something is not working for you, please raise a GitHub issue and tag Arm.\n", - "\n", - "Before you begin:\n", - "1. (In a clean virtual environment with a compatible Python version) Install executorch using `./install_executorch.sh`\n", - "2. Install MLSDK and Tosa using `examples/arm/setup.sh --disable-ethos-u-deps --enable-mlsdk-deps` (For further guidance, refer to https://docs.pytorch.org/executorch/main/tutorial-arm.html)\n", - "3. 
Export vulkan environment variables and add MLSDK components to PATH and LD_LIBRARY_PATH using `examples/arm/ethos-u-scratch/setup_path.sh`\n", - "\n", - "With all commands executed from the base `executorch` folder.\n", - "\n", - "*Some scripts in this notebook produce long output logs: Configuring the 'Customizing Notebook Layout' settings to enable 'Output:scrolling' and setting 'Output:Text Line Limit' makes this more manageable*" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## AOT Flow\n", - "\n", - "The first step is creating the PyTorch module and exporting it. Exporting converts the python code in the module into a graph structure. The result is still runnable python code, which can be displayed by printing the `graph_module` of the exported program. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import torch\n", - "\n", - "class Add(torch.nn.Module):\n", - " def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:\n", - " return x + y\n", - "\n", - "example_inputs = (torch.ones(1,1,1,1),torch.ones(1,1,1,1))\n", - "\n", - "model = Add()\n", - "model = model.eval()\n", - "exported_program = torch.export.export_for_training(model, example_inputs)\n", - "graph_module = exported_program.module()\n", - "\n", - "_ = graph_module.print_readable()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# VGF backend supports both INT and FP targets. \n", - "\n", - "To lower the graph_module for FP targets using the VGF backend, we run it through the default FP lowering pipeline. \n", - "\n", - "FP lowering can be customized for different subgraphs; the sequence shown here is the recommended workflow for VGF.\n", - "Because we are staying in floating-point precision, no calibration with example inputs is required. \n", - "\n", - "If you print the module again, you will see that nodes are left in FP form (or annotated with any necessary casts) without any quantize/dequantize wrappers.\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from executorch.backends.arm.vgf import VgfCompileSpec\n", - "\n", - "# Create a compilation spec describing the floating point target.\n", - "compile_spec = VgfCompileSpec(\"TOSA-1.0+FP\")\n", - "\n", - "_ = graph_module.print_readable()\n", - "\n", - "# Create a new exported program using the graph_module\n", - "exported_program = torch.export.export(graph_module, example_inputs)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "To lower the graph_module for INT targets using the VGF backend, we apply the arm_quantizer. \n", - "\n", - "Quantization can be performed in various ways and tailored to different subgraphs; the sequence shown here represents the recommended workflow for VGF. \n", - "\n", - "This step also requires calibrating the module with representative inputs. \n", - "\n", - "If you print the module again, you’ll see that each node is now wrapped in quantization/dequantization nodes that embed the calculated quantization parameters." 
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "from executorch.backends.arm.quantizer import (\n",
- "    VgfQuantizer,\n",
- "    get_symmetric_quantization_config,\n",
- ")\n",
- "from torchao.quantization.pt2e.quantize_pt2e import convert_pt2e, prepare_pt2e\n",
- "\n",
- "# Create a compilation spec describing the target for configuring the quantizer\n",
- "compile_spec = VgfCompileSpec(\"TOSA-1.0+INT\")\n",
- "\n",
- "# Create and configure quantizer to use a symmetric quantization config globally on all nodes\n",
- "quantizer = VgfQuantizer(compile_spec)\n",
- "operator_config = get_symmetric_quantization_config(is_per_channel=False)\n",
- "quantizer.set_global(operator_config)\n",
- "\n",
- "# Post-training quantization: prepare, calibrate, convert\n",
- "quantized_graph_module = prepare_pt2e(graph_module, quantizer)\n",
- "quantized_graph_module(*example_inputs)  # Calibrate the graph module with the example input\n",
- "quantized_graph_module = convert_pt2e(quantized_graph_module)\n",
- "\n",
- "_ = quantized_graph_module.print_readable()\n",
- "\n",
- "# Create a new exported program using the quantized_graph_module\n",
- "quantized_exported_program = torch.export.export(quantized_graph_module, example_inputs)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Lowering to VGF\n",
- "\n",
- "In the example below, we will make use of the quantized graph module.\n",
- "\n",
- "The lowering in the VGF backend happens in five steps:\n",
- "\n",
- "1. **Lowering to the core ATen operator set**: Transform the module to use a subset of operators applicable to edge devices.\n",
- "2. **Partitioning**: Find the subgraphs that will be lowered by the VGF backend.\n",
- "3. **Lowering to a TOSA-compatible operator set**: Perform transforms to make the VGF subgraph(s) compatible with TOSA.\n",
- "4. **Serialization to TOSA**: Compile the graph module into a TOSA graph.\n",
- "5. **Compilation to VGF**: Compile the FX GraphModule into a VGF representation using the model_converter and the previously created compile_spec. This step also prints a network summary for each processed VGF partition.\n",
- "\n",
- "All of this happens behind the scenes in `to_edge_transform_and_lower`. Printing the graph module shows that what is left in the graph is two quantization nodes for `x` and `y` going into an `executorch_call_delegate` node, followed by a dequantization node.",
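- "\n",
- "To see a summary of how much of the graph was delegated, you can run a check like the following after the lowering cell below has executed. This is a sketch, assuming the `executorch.devtools` utilities are available in your install:\n",
- "\n",
- "```python\n",
- "from executorch.devtools.backend_debug import get_delegation_info\n",
- "\n",
- "# Summarize delegated vs. non-delegated ops after to_edge_transform_and_lower\n",
- "info = get_delegation_info(edge_program_manager.exported_program().graph_module)\n",
- "print(info.get_summary())\n",
- "```"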
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "%%bash\n",
- "# Ensure the Vulkan environment variables and MLSDK components are available on $PATH\n",
- "source ethos-u-scratch/setup_path.sh"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "import os\n",
- "from executorch.backends.arm.vgf import VgfPartitioner\n",
- "from executorch.exir import (\n",
- "    EdgeCompileConfig,\n",
- "    ExecutorchBackendConfig,\n",
- "    to_edge_transform_and_lower,\n",
- ")\n",
- "from executorch.extension.export_util.utils import save_pte_program\n",
- "\n",
- "# Create partitioner from compile spec\n",
- "partitioner = VgfPartitioner(compile_spec)\n",
- "\n",
- "# Lower the exported program to the VGF backend\n",
- "edge_program_manager = to_edge_transform_and_lower(\n",
- "    quantized_exported_program,\n",
- "    partitioner=[partitioner],\n",
- "    compile_config=EdgeCompileConfig(\n",
- "        _check_ir_validity=False,\n",
- "    ),\n",
- ")\n",
- "\n",
- "# Convert the edge program to an ExecuTorch program\n",
- "executorch_program_manager = edge_program_manager.to_executorch(\n",
- "    config=ExecutorchBackendConfig(extract_delegate_segments=False)\n",
- ")\n",
- "\n",
- "executorch_program_manager.exported_program().module().print_readable()\n",
- "\n",
- "# Save the .pte file\n",
- "cwd_dir = os.getcwd()\n",
- "pte_base_name = \"simple_example\"\n",
- "pte_name = pte_base_name + \".pte\"\n",
- "pte_path = os.path.join(cwd_dir, pte_name)\n",
- "save_pte_program(executorch_program_manager, pte_name)\n",
- "assert os.path.exists(pte_path), \"Build failed; no .pte file found\""
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Build the executor runtime\n",
- "\n",
- "### Prerequisite\n",
- "With our VGF payload inside the PTE, we now need to set up the runtime. To do this we will use the previously built MLSDK dependencies, but we also need to set up a Vulkan environment external to ExecuTorch.\n",
- "Please follow https://vulkan.lunarg.com/sdk/home to set this up.\n",
- "\n",
- "After the AOT compilation flow is done, we need to build the executor_runner target. For this example the generic version will be used.\n",
- "To do this, please ensure the following commands are executed before moving on to the next step.",
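- "\n",
- "Before starting the build, a quick preflight check can save a confusing CMake failure. This is a sketch; the tool names assume the MLSDK `model_converter` and the Vulkan SDK's `glslc` should be on `$PATH`. Note that variables exported inside `%%bash` cells do not persist to the Python kernel, so run this from the same shell environment in which Jupyter was launched:\n",
- "\n",
- "```python\n",
- "import shutil\n",
- "\n",
- "# Hypothetical preflight: confirm the required tools are discoverable on PATH\n",
- "for tool in [\"cmake\", \"glslc\", \"model_converter\"]:\n",
- "    print(f\"{tool}: {shutil.which(tool) or 'NOT FOUND - re-run setup_path.sh'}\")\n",
- "```"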
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "%%bash\n",
- "# Ensure the Vulkan environment variables and MLSDK components are available on $PATH\n",
- "source ethos-u-scratch/setup_path.sh\n",
- "\n",
- "# Compiled programs will appear in the executorch/cmake-out-vkml directory we create here.\n",
- "# Build the example executor_runner application\n",
- "cmake \\\n",
- "  -DCMAKE_INSTALL_PREFIX=cmake-out \\\n",
- "  -DCMAKE_BUILD_TYPE=Debug \\\n",
- "  -DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON \\\n",
- "  -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \\\n",
- "  -DEXECUTORCH_BUILD_EXTENSION_FLAT_TENSOR=ON \\\n",
- "  -DEXECUTORCH_BUILD_EXTENSION_TENSOR=ON \\\n",
- "  -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON \\\n",
- "  -DEXECUTORCH_BUILD_XNNPACK=OFF \\\n",
- "  -DEXECUTORCH_BUILD_VULKAN=ON \\\n",
- "  -DEXECUTORCH_BUILD_VGF=ON \\\n",
- "  -DEXECUTORCH_ENABLE_LOGGING=ON \\\n",
- "  -DPYTHON_EXECUTABLE=python \\\n",
- "  -B../../cmake-out-vkml ../..\n",
- "\n",
- "cmake --build ../../cmake-out-vkml --target executor_runner"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Run on VKML Emulator\n",
- "\n",
- "We can finally use the `backends/arm/scripts/run_vkml.sh` utility script to run the .pte end-to-end and prove out a backend’s kernel implementation. This script runs the model with an input of ones, so the expected result of the addition should be close to 2."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "import subprocess\n",
- "\n",
- "# Set up paths\n",
- "et_dir = os.path.join(cwd_dir, \"..\", \"..\")\n",
- "et_dir = os.path.abspath(et_dir)\n",
- "script_dir = os.path.join(et_dir, \"backends\", \"arm\", \"scripts\")\n",
- "\n",
- "args = f\"--model={pte_path}\"\n",
- "subprocess.run(os.path.join(script_dir, \"run_vkml.sh\") + \" \" + args, shell=True, cwd=et_dir)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": []
- }
- ],
- "metadata": {
- "kernelspec": {
- "display_name": "Python 3 (ipykernel)",
- "language": "python",
- "name": "python3"
- },
- "language_info": {
- "codemirror_mode": {
- "name": "ipython",
- "version": 3
- },
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.10.12"
- }
- },
- "nbformat": 4,
- "nbformat_minor": 4
+ "nbformat": 4,
+ "nbformat_minor": 2
 }