112 changes: 112 additions & 0 deletions docs/source/advanced-topics-section.md
@@ -0,0 +1,112 @@
(advanced-topics-section)=

# Advanced

Deep dive into ExecuTorch's advanced features for optimization, customization, and integration.

This section covers advanced concepts for developers who need to customize ExecuTorch for specific use cases, optimize performance, or integrate with custom hardware backends.

## Quantization & Optimization

Techniques for model compression and performance optimization.

**→ {doc}`quantization-optimization` — Quantization strategies and performance optimization**

Key topics:

- Quantization strategies and techniques
- Performance profiling and optimization
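
For orientation before the full guide, the following is a minimal PT2E quantization sketch of the kind the linked page walks through. It assumes the XNNPACK quantizer; import paths (for example `executorch.backends.xnnpack.quantizer.xnnpack_quantizer`) and the graph-capture API have shifted between releases, so treat this as illustrative rather than authoritative.

```python
import torch
from executorch.backends.xnnpack.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e

model = torch.nn.Sequential(torch.nn.Linear(16, 32), torch.nn.ReLU()).eval()
example_inputs = (torch.randn(1, 16),)

# Capture the graph for quantization (older releases used capture_pre_autograd_graph).
graph = torch.export.export_for_training(model, example_inputs).module()

# Annotate the graph with the XNNPACK quantizer: 8-bit symmetric weights,
# 8-bit asymmetric activations.
quantizer = XNNPACKQuantizer().set_global(get_symmetric_quantization_config())
prepared = prepare_pt2e(graph, quantizer)

# Calibrate with representative inputs so the observers can record activation ranges.
prepared(*example_inputs)

# Produce the quantized graph, which is then exported and lowered to .pte as usual.
quantized = convert_pt2e(prepared)
```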

## Model Export

Learn the core ExecuTorch workflow: exporting PyTorch models to the `.pte` format for edge deployment.

**→ {doc}`using-executorch-export` — Model Export & Lowering**

Key topics:

- Export and Lowering Workflow
- Hardware Backend Selection & Optimization
- Dynamic Shapes & Advanced Model Features
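
As a quick taste of that workflow, the sketch below captures a toy module with `torch.export`, lowers it with `to_edge_transform_and_lower`, and serializes a `.pte` file. It assumes the XNNPACK backend; the module and file names are placeholders.

```python
import torch
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
from executorch.exir import to_edge_transform_and_lower

class Net(torch.nn.Module):
    def forward(self, x):
        return torch.nn.functional.relu(x)

model = Net().eval()
example_inputs = (torch.randn(1, 16),)

# 1. Capture the PyTorch module as an ExportedProgram (dynamic shapes can be
#    declared here via the dynamic_shapes argument).
exported = torch.export.export(model, example_inputs)

# 2. Convert to the edge dialect and delegate supported ops to XNNPACK.
edge = to_edge_transform_and_lower(exported, partitioner=[XnnpackPartitioner()])

# 3. Serialize the program to a .pte file for on-device execution.
with open("model.pte", "wb") as f:
    f.write(edge.to_executorch().buffer)
```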


## Kernel Library

Deep dive into ExecuTorch's kernel implementation and customization.

**→ {doc}`kernel-library-advanced` — Kernel library deep dive and customization**

Key topics:

- Kernel library architecture
- Custom kernel implementation
- Selective build and optimization

## Backend & Delegates

**→ {doc}`backend-delegate-advanced` — Backend delegate integration**

Key topics:

- Backend delegate integration
- XNNPACK Delegate Internals
- Debugging Delegation


## Runtime & Integration

Advanced runtime features and backend integration.

**→ {doc}`runtime-integration-advanced` — Runtime customization and backend integration**

Key topics:

- Backend delegate implementation
- Platform abstraction layer
- Custom runtime integration

## Compiler & IR

Advanced compiler features and intermediate representation details.

**→ {doc}`compiler-ir-advanced` — Compiler passes and IR specification**

Key topics:

- Custom compiler passes
- Memory planning strategies
- Backend dialect and EXIR
- Operator set definition
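
To make the custom-pass topic concrete, here is a toy sketch of an `ExportPass` subclass. It assumes the `executorch.exir.pass_base.ExportPass` interface and the edge-dialect op handles under `executorch.exir.dialects._ops`; see the linked pages for the supported way to register and run passes.

```python
from executorch.exir.dialects._ops import ops as exir_ops
from executorch.exir.pass_base import ExportPass

class ReplaceAddWithSub(ExportPass):
    """Toy pass for illustration: rewrite edge-dialect add ops into sub ops."""

    def call_operator(self, op, args, kwargs, meta):
        if op == exir_ops.edge.aten.add.Tensor:
            return super().call_operator(exir_ops.edge.aten.sub.Tensor, args, kwargs, meta)
        return super().call_operator(op, args, kwargs, meta)

# A pass list can then be applied to an edge program, for example:
# edge = edge.transform([ReplaceAddWithSub()])
```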


## File Formats

ExecuTorch file format specifications and internals.

**→ {doc}`file-formats-advanced` — PTE and PTD file format specifications**

Key topics:

- PTE file format internals
- PTD file format specification
- Custom file format handling

## Next Steps

After exploring advanced topics:

- **{doc}`tools-sdk-section`** - Developer tools for debugging and profiling
- **{doc}`api-section`** - Complete API reference documentation

```{toctree}
:hidden:
:maxdepth: 2
:caption: Advanced Topics

quantization-optimization
using-executorch-export
kernel-library-advanced
backend-delegate-advanced
runtime-integration-advanced
compiler-ir-advanced
file-formats-advanced
1 change: 1 addition & 0 deletions docs/source/android-arm-vgf.md
@@ -0,0 +1 @@
```{include} backends-arm-vgf.md
28 changes: 28 additions & 0 deletions docs/source/android-backends.md
@@ -0,0 +1,28 @@
(android-backends)=
# Backends

Available hardware acceleration backends for Android deployment.

## CPU Acceleration

- {doc}`android-xnnpack` — XNNPACK CPU acceleration

## GPU Acceleration

- {doc}`android-vulkan` — Vulkan GPU acceleration

## NPU/Accelerator Backends

- {doc}`android-qualcomm` — Qualcomm AI Engine (NPU)
- {doc}`android-mediatek` — MediaTek NPU acceleration
- {doc}`android-arm-vgf` — ARM VGF Backend
- {doc}`android-samsung-exynos` — Samsung Exynos NPU

```{toctree}
:hidden:
android-xnnpack
android-vulkan
android-qualcomm
android-mediatek
android-arm-vgf
android-samsung-exynos
9 changes: 9 additions & 0 deletions docs/source/android-examples.md
@@ -0,0 +1,9 @@
# Examples & Demos

- [Working with LLMs - Android Examples](https://github.com/meta-pytorch/executorch-examples/tree/main/llm/android)
- [Demo Apps](https://github.com/meta-pytorch/executorch-examples/tree/main/dl3/android/DeepLabV3Demo#executorch-android-demo-app)
- {doc}`tutorial-arm-vgf` — Export a simple PyTorch model for the ExecuTorch VGF backend

```{toctree}
:hidden:
tutorial-arm-vgf
1 change: 1 addition & 0 deletions docs/source/android-mediatek.md
@@ -0,0 +1 @@
```{include} backends-mediatek.md
1 change: 1 addition & 0 deletions docs/source/android-qualcomm.md
@@ -0,0 +1 @@
```{include} backends-qualcomm.md
1 change: 1 addition & 0 deletions docs/source/android-samsung-exynos.md
@@ -0,0 +1 @@
```{include} backends-samsung-exynos.md
23 changes: 23 additions & 0 deletions docs/source/android-section.md
@@ -0,0 +1,23 @@
(android-section)=

# Android

Deploy ExecuTorch on Android devices with hardware acceleration support.

## Quick Start & Integration

- {doc}`using-executorch-android` — Complete Android integration guide

## Backends

- {doc}`android-backends` — Available Android backends and acceleration options

## Examples & Demos

- {doc}`android-examples` — Explore Android Examples & Demos

```{toctree}
:hidden:
using-executorch-android
android-backends
android-examples
1 change: 1 addition & 0 deletions docs/source/android-vulkan.md
@@ -0,0 +1 @@
```{include} backends-vulkan.md
1 change: 1 addition & 0 deletions docs/source/android-xnnpack.md
@@ -0,0 +1 @@
```{include} backends-xnnpack.md
26 changes: 26 additions & 0 deletions docs/source/api-section.md
@@ -0,0 +1,26 @@
(api-section)=
# API

This section provides complete API documentation for ExecuTorch's export, runtime, and extension interfaces, including references for the Python, C++, and Java APIs across all supported platforms.

- {doc}`export-to-executorch-api-reference` — Export to ExecuTorch API Reference
- {doc}`executorch-runtime-api-reference` — ExecuTorch Runtime API Reference
- {doc}`runtime-python-api-reference` — Runtime Python API Reference
- {doc}`api-life-cycle` — API Life Cycle
- [Android doc →](https://pytorch.org/executorch/main/javadoc/) — Android API Documentation
- {doc}`extension-module` — Extension Module
- {doc}`extension-tensor` — Extension Tensor
- {doc}`running-a-model-cpp-tutorial` — Detailed C++ Runtime APIs Tutorial
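
For a flavor of the runtime Python API listed above, a minimal sketch might look like the following. The class and method names (`Runtime.get`, `load_program`, `load_method`, `execute`) are taken from the runtime Python API reference, but check that page for the exact signatures in your release; `model.pte` is a placeholder path.

```python
import torch
from executorch.runtime import Runtime

runtime = Runtime.get()                        # process-wide runtime handle
program = runtime.load_program("model.pte")    # placeholder path to an exported program
method = program.load_method("forward")
outputs = method.execute([torch.randn(1, 16)])
print(outputs)
```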

```{toctree}
:hidden:
:maxdepth: 1
:caption: API Reference

export-to-executorch-api-reference
executorch-runtime-api-reference
runtime-python-api-reference
api-life-cycle
extension-module
extension-tensor
running-a-model-cpp-tutorial
11 changes: 0 additions & 11 deletions docs/source/api.md

This file was deleted.

33 changes: 33 additions & 0 deletions docs/source/backend-delegate-advanced.md
@@ -0,0 +1,33 @@
(backend-delegate-advanced)=

# Backend & Delegates

## Integration

- {doc}`backend-delegates-integration` — Learn how to integrate a backend delegate into ExecuTorch

## XNNPACK Reference

- {doc}`backend-delegates-xnnpack-reference` — Deep dive into XNNPACK delegate internals and implementation details

## Dependency Management

- {doc}`backend-delegates-dependencies` — Manage third-party dependencies for backend delegates

## Overview

- {doc}`compiler-delegate-and-partitioner` — Understanding backends, delegates, and the partitioner system

## Debugging

- {doc}`debug-backend-delegate` — Tools and techniques for debugging delegation issues (a short sketch follows below)
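
As a taste of what the debugging guide covers, the devtools can report how much of a lowered model was actually delegated. The sketch below assumes `get_delegation_info` from `executorch.devtools.backend_debug` and an `edge` program produced by `to_edge_transform_and_lower`; see the linked page for the exact API.

```python
from executorch.devtools.backend_debug import get_delegation_info

# `edge` is the EdgeProgramManager returned by to_edge_transform_and_lower(...).
delegation_info = get_delegation_info(edge.exported_program().graph_module)
print(delegation_info.get_summary())  # counts of delegated vs. non-delegated ops
```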

```{toctree}
:hidden:
:maxdepth: 1

backend-delegates-integration
backend-delegates-xnnpack-reference
backend-delegates-dependencies
compiler-delegate-and-partitioner
debug-backend-delegate
73 changes: 58 additions & 15 deletions docs/source/backends-overview.md
@@ -1,21 +1,64 @@
# Backend Overview
# Backends

ExecuTorch backends provide hardware acceleration for a specific hardware target. In order to achieve maximum performance on target hardware, ExecuTorch optimizes the model for a specific backend during the export and lowering process. This means that the resulting .pte file is specialized for the specific hardware. In order to deploy to multiple backends, such as Core ML on iOS and Arm CPU on Android, it is common to generate a dedicated .pte file for each.
## Backend Overview

The choice of hardware backend is informed by the hardware that the model is intended to be deployed on. Each backend has specific hardware requires and level of model support. See the documentation for each hardware backend for more details.
ExecuTorch backends provide hardware acceleration for specific hardware targets, enabling models to run efficiently on devices ranging from mobile phones to embedded systems and DSPs. During the export and lowering process, ExecuTorch optimizes your model for the chosen backend, resulting in a `.pte` file specialized for that hardware. To support multiple platforms (e.g., Core ML on iOS, Arm CPU on Android), you typically generate a dedicated `.pte` file for each backend.

As part of the .pte file creation process, ExecuTorch identifies portions of the model (partitions) that are supported for the given backend. These sections are processed by the backend ahead of time to support efficient execution. Portions of the model that are not supported on the delegate, if any, are executed using the portable fallback implementation on CPU. This allows for partial model acceleration when not all model operators are supported on the backend, but may have negative performance implications. In addition, multiple partitioners can be specified in order of priority. This allows for operators not supported on GPU to run on CPU via XNNPACK, for example.
The choice of backend is informed by the hardware your model will run on. Each backend has its own hardware requirements and level of model/operator support. See the documentation for each backend for details.

### Available Backends
As part of `.pte` file creation, ExecuTorch identifies model partitions supported by the backend. These are processed ahead of time for efficient execution. Operators not supported by the delegate are executed using the portable CPU fallback (e.g., XNNPACK), allowing for partial acceleration. You can also specify multiple partitioners in order of priority, so unsupported GPU ops can fall back to CPU, for example.

Commonly used hardware backends are listed below. For mobile, consider using XNNPACK for Android and XNNPACK or Core ML for iOS. To create a .pte file for a specific backend, pass the appropriate partitioner class to `to_edge_transform_and_lower`. See the appropriate backend documentation for more information.
---

- [XNNPACK (Mobile CPU)](backends-xnnpack.md)
- [Core ML (iOS)](backends-coreml.md)
- [Metal Performance Shaders (iOS GPU)](backends-mps.md)
- [Vulkan (Android GPU)](backends-vulkan.md)
- [Qualcomm NPU](backends-qualcomm.md)
- [MediaTek NPU](backends-mediatek.md)
- [ARM Ethos-U NPU](backends-arm-ethos-u.md)
- [ARM VGF](backends-arm-vgf.md)
- [Cadence DSP](backends-cadence.md)
## Why Backends Matter

Backends are the bridge between your exported model and the hardware it runs on. Choosing the right backend ensures your model takes full advantage of device-specific acceleration, balancing performance, compatibility, and resource usage.

---

## Choosing a Backend

| Backend | Platform(s) | Hardware Type | Typical Use Case |
|------------------------------------------|---------------------|---------------|---------------------------------|
| [XNNPACK](backends-xnnpack) | All | CPU | General-purpose, fallback |
| [Core ML](backends-coreml) | iOS, macOS | NPU/GPU | Apple devices, high performance |
| [Metal Performance Shaders](backends-mps)| iOS, macOS | GPU | Apple GPU acceleration |
| [Vulkan](backends-vulkan)                 | Android              | GPU           | Android GPU acceleration        |
| [Qualcomm](backends-qualcomm) | Android | NPU | Qualcomm SoCs |
| [MediaTek](backends-mediatek) | Android | NPU | MediaTek SoCs |
| [ARM Ethos-U](backends-arm-ethos-u)       | Embedded             | NPU           | ARM MCUs                        |
| [ARM VGF](backends-arm-vgf) | Android | NPU | ARM platforms |
| [OpenVINO](build-run-openvino) | Embedded | CPU/GPU/NPU | Intel SoCs |
| [NXP](backends-nxp) | Embedded | NPU | NXP SoCs |
| [Cadence](backends-cadence) | Embedded | DSP | DSP-optimized workloads |
| [Samsung Exynos](backends-samsung-exynos) | Android              | NPU           | Samsung SoCs                    |

**Tip:** For best performance, export a `.pte` file for each backend you plan to support.

---

## Best Practices

- **Test on all target devices:** Operator support may vary by backend.
- **Use fallback wisely:** if a backend doesn't support an operator, ExecuTorch will run it on CPU (see the sketch after this list).
- **Consult backend docs:** Each backend has unique setup and tuning options.
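
To illustrate the fallback point above, `to_edge_transform_and_lower` accepts a list of partitioners tried in priority order. The sketch below assumes the Vulkan and XNNPACK partitioners (import paths may vary by release) and an `exported` program obtained from `torch.export.export`.

```python
from executorch.backends.vulkan.partitioner.vulkan_partitioner import VulkanPartitioner
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
from executorch.exir import to_edge_transform_and_lower

# Delegate to the GPU first; anything Vulkan cannot handle falls through to
# XNNPACK, and whatever remains runs on the portable CPU kernels.
edge = to_edge_transform_and_lower(
    exported,  # ExportedProgram from torch.export.export(...)
    partitioner=[VulkanPartitioner(), XnnpackPartitioner()],
)
with open("model.pte", "wb") as f:
    f.write(edge.to_executorch().buffer)
```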

---

```{toctree}
:maxdepth: 1
:hidden:
:caption: Backend Overview

backends-xnnpack
backends-coreml
backends-mps
backends-vulkan
backends-qualcomm
backends-mediatek
backends-arm-ethos-u
backends-arm-vgf
build-run-openvino
backends-nxp
backends-cadence
backends-samsung-exynos
1 change: 1 addition & 0 deletions docs/source/backends-samsung-exynos.md
@@ -0,0 +1 @@
# Samsung Exynos Backend (TBD)
1 change: 1 addition & 0 deletions docs/source/backends-section.md
@@ -0,0 +1 @@
```{include} backends-overview.md
7 changes: 4 additions & 3 deletions docs/source/backends-xnnpack.md
@@ -67,10 +67,11 @@ The XNNPACK delegate can also be used as a backend to execute symmetrically quantized models.

### Supported Quantization Schemes
The XNNPACK delegate supports the following quantization schemes:

- 8-bit symmetric weights with 8-bit asymmetric activations (via the PT2E quantization flow).
- Supports both static and dynamic activations.
- Supports per-channel and per-tensor schemes.
- Supports linear, convolution, add, mul, cat, and adaptive avg pool 2d operators.
  - Supports both static and dynamic activations.
  - Supports per-channel and per-tensor schemes.
  - Supports linear, convolution, add, mul, cat, and adaptive avg pool 2d operators.

Weight-only quantization is not currently supported on XNNPACK.

17 changes: 0 additions & 17 deletions docs/source/backends.md

This file was deleted.

2 changes: 1 addition & 1 deletion docs/source/compiler-delegate-and-partitioner.md
@@ -1,4 +1,4 @@
# Backends and Delegates
# Understanding Backends and Delegates

Audience: Vendors and backend delegate developers who are interested in integrating their own compilers and hardware into ExecuTorch
