From 90ebac2539ae431160d7f3e9fca58a8385379062 Mon Sep 17 00:00:00 2001
From: Scott Roy <161522778+metascroy@users.noreply.github.com>
Date: Fri, 17 Oct 2025 15:04:51 -0700
Subject: [PATCH] Update mps docs and fix coreml/mps doc references (#15179)

(cherry picked from commit c016f298ef4f4b29b47ebe747a1dba92c0d4c8b7)
---
 CONTRIBUTING.md                              |  4 +-
 README-wheel.md                              |  2 +-
 backends/apple/coreml/README.md              |  2 +-
 docs/source/backends-overview.md             | 30 ++++-----
 .../mps/mps-overview.md}                     | 63 +++++-------------
 docs/source/ios-coreml.md                    |  2 +-
 docs/source/ios-mps.md                       |  2 +-
 docs/source/quantization-overview.md         |  2 +-
 .../using-executorch-building-from-source.md |  2 +-
 docs/source/using-executorch-export.md       |  4 +-
 docs/source/using-executorch-ios.md          |  2 +-
 11 files changed, 41 insertions(+), 74 deletions(-)
 rename docs/source/{backends-mps.md => backends/mps/mps-overview.md} (60%)

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 71e097042d7..40d3a206f5b 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -24,8 +24,8 @@ For Apple, please refer to the [iOS documentation](docs/source/using-executorch-
 executorch
 ├── backends - Backend delegate implementations for various hardware targets. Each backend uses partitioner to split the graph into subgraphs that can be executed on specific hardware, quantizer to optimize model precision, and runtime components to execute the graph on target hardware. For details refer to the backend documentation and the Export and Lowering tutorial for more information.
 │   ├── apple - Apple-specific backends.
-│   │   ├── coreml - CoreML backend for Apple devices. See [doc](docs/source/backends-coreml.md).
-│   │   └── mps - Metal Performance Shaders backend for Apple devices. See [doc](docs/source/backends-mps.md).
+│   │   ├── coreml - CoreML backend for Apple devices. See [doc](docs/source/backends/coreml/coreml-overview.md).
+│   │   └── mps - Metal Performance Shaders backend for Apple devices. See [doc](docs/source/backends/mps/mps-overview.md).
 │   ├── arm - ARM architecture backends. See doc.
 │   ├── cadence - Cadence-specific backends. See doc.
 │   ├── example - Example backend implementations.
diff --git a/README-wheel.md b/README-wheel.md
index 7ae9b0aa2e0..e20b447f96a 100644
--- a/README-wheel.md
+++ b/README-wheel.md
@@ -12,7 +12,7 @@ The prebuilt `executorch.runtime` module included in this package provides a way
 to run ExecuTorch `.pte` files, with some restrictions:
 * Only [core ATen operators](docs/source/ir-ops-set-definition.md) are linked into the prebuilt module
 * Only the [XNNPACK backend delegate](docs/source/backends-xnnpack.md) is linked into the prebuilt module.
-* \[macOS only] [Core ML](docs/source/backends-coreml.md) and [MPS](docs/source/backends-mps.md) backend
+* \[macOS only] [Core ML](docs/source/backends/coreml/coreml-overview.md) and [MPS](docs/source/backends/mps/mps-overview.md) backend
 are also linked into the prebuilt module.
 
 Please visit the [ExecuTorch website](https://pytorch.org/executorch) for
diff --git a/backends/apple/coreml/README.md b/backends/apple/coreml/README.md
index d063dfc8b71..d72f04da1a1 100644
--- a/backends/apple/coreml/README.md
+++ b/backends/apple/coreml/README.md
@@ -1,7 +1,7 @@
 # ExecuTorch Core ML Delegate
 
 This subtree contains the Core ML Delegate implementation for ExecuTorch.
-Core ML is an optimized framework for running machine learning models on Apple devices. The delegate is the mechanism for leveraging the Core ML framework to accelerate operators when running on Apple devices. To learn how to use the CoreML delegate, see the [documentation](https://github.com/pytorch/executorch/blob/main/docs/source/backends-coreml.md).
+Core ML is an optimized framework for running machine learning models on Apple devices. The delegate is the mechanism for leveraging the Core ML framework to accelerate operators when running on Apple devices. To learn how to use the CoreML delegate, see the [documentation](https://github.com/pytorch/executorch/blob/main/docs/source/backends/coreml/coreml-overview.md). ## Layout - `compiler/` : Lowers a module to Core ML backend. diff --git a/docs/source/backends-overview.md b/docs/source/backends-overview.md index bfa17bc9a9c..dfeb6243d37 100644 --- a/docs/source/backends-overview.md +++ b/docs/source/backends-overview.md @@ -18,20 +18,20 @@ Backends are the bridge between your exported model and the hardware it runs on. ## Choosing a Backend -| Backend | Platform(s) | Hardware Type | Typical Use Case | -|------------------------------------------------|---------------------|---------------|---------------------------------| -| [XNNPACK](backends-xnnpack) | All | CPU | General-purpose, fallback | -| [Core ML](/backends/coreml/coreml-overview.md) | iOS, macOS | NPU/GPU/CPU | Apple devices, high performance | -| [Metal Performance Shaders](backends-mps) | iOS, macOS | GPU | Apple GPU acceleration | -| [Vulkan ](backends-vulkan) | Android | GPU | Android GPU acceleration | -| [Qualcomm](backends-qualcomm) | Android | NPU | Qualcomm SoCs | -| [MediaTek](backends-mediatek) | Android | NPU | MediaTek SoCs | -| [ARM EthosU](backends-arm-ethos-u) | Embedded | NPU | ARM MCUs | -| [ARM VGF](backends-arm-vgf) | Android | NPU | ARM platforms | -| [OpenVINO](build-run-openvino) | Embedded | CPU/GPU/NPU | Intel SoCs | -| [NXP](backends-nxp) | Embedded | NPU | NXP SoCs | -| [Cadence](backends-cadence) | Embedded | DSP | DSP-optimized workloads | -| [Samsung Exynos](backends-samsung-exynos) | Android | NPU | Samsung SoCs | +| Backend | Platform(s) | Hardware Type | Typical Use Case | +|-----------------------------------------------------------------|---------------------|---------------|---------------------------------| +| [XNNPACK](backends-xnnpack) | All | CPU | General-purpose, fallback | +| [Core ML](/backends/coreml/coreml-overview.md) | iOS, macOS | NPU/GPU/CPU | Apple devices, high performance | +| [Metal Performance Shaders](/backends/mps/mps-overview.md) | iOS, macOS | GPU | Apple GPU acceleration | +| [Vulkan ](backends-vulkan) | Android | GPU | Android GPU acceleration | +| [Qualcomm](backends-qualcomm) | Android | NPU | Qualcomm SoCs | +| [MediaTek](backends-mediatek) | Android | NPU | MediaTek SoCs | +| [ARM EthosU](backends-arm-ethos-u) | Embedded | NPU | ARM MCUs | +| [ARM VGF](backends-arm-vgf) | Android | NPU | ARM platforms | +| [OpenVINO](build-run-openvino) | Embedded | CPU/GPU/NPU | Intel SoCs | +| [NXP](backends-nxp) | Embedded | NPU | NXP SoCs | +| [Cadence](backends-cadence) | Embedded | DSP | DSP-optimized workloads | +| [Samsung Exynos](backends-samsung-exynos) | Android | NPU | Samsung SoCs | **Tip:** For best performance, export a `.pte` file for each backend you plan to support. @@ -52,7 +52,7 @@ Backends are the bridge between your exported model and the hardware it runs on. 
 backends-xnnpack
 backends/coreml/coreml-overview
-backends-mps
+backends/mps/mps-overview
 backends-vulkan
 backends-qualcomm
 backends-mediatek
diff --git a/docs/source/backends-mps.md b/docs/source/backends/mps/mps-overview.md
similarity index 60%
rename from docs/source/backends-mps.md
rename to docs/source/backends/mps/mps-overview.md
index 184bd88e3a7..a2280defad5 100644
--- a/docs/source/backends-mps.md
+++ b/docs/source/backends/mps/mps-overview.md
@@ -1,55 +1,27 @@
 # MPS Backend
 
-In this tutorial we will walk you through the process of getting setup to build the MPS backend for ExecuTorch and running a simple model on it.
+The MPS delegate is the ExecuTorch solution for taking advantage of Apple's GPU for on-device ML, using the [MPS Graph](https://developer.apple.com/documentation/metalperformanceshadersgraph/mpsgraph?language=objc) framework and tuned kernels provided by [MPS](https://developer.apple.com/documentation/metalperformanceshaders?language=objc).
 
-The MPS backend device maps machine learning computational graphs and primitives on the [MPS Graph](https://developer.apple.com/documentation/metalperformanceshadersgraph/mpsgraph?language=objc) framework and tuned kernels provided by [MPS](https://developer.apple.com/documentation/metalperformanceshaders?language=objc).
+## Target Requirements
 
-::::{grid} 2
-:::{grid-item-card} What you will learn in this tutorial:
-:class-card: card-prerequisites
-* In this tutorial you will learn how to export [MobileNet V3](https://pytorch.org/vision/main/models/mobilenetv3.html) model to the MPS delegate.
-* You will also learn how to compile and deploy the ExecuTorch runtime with the MPS delegate on macOS and iOS.
-:::
-:::{grid-item-card} Tutorials we recommend you complete before this:
-:class-card: card-prerequisites
-* [Introduction to ExecuTorch](intro-how-it-works.md)
-* [Getting Started](getting-started.md)
-* [Building ExecuTorch with CMake](using-executorch-building-from-source.md)
-* [ExecuTorch iOS Demo App](https://github.com/meta-pytorch/executorch-examples/tree/main/mv3/apple/ExecuTorchDemo)
-* [ExecuTorch LLM iOS Demo App](https://github.com/meta-pytorch/executorch-examples/tree/main/llm/apple)
-:::
-::::
+Below are the minimum OS requirements on various hardware for running an MPS-delegated ExecuTorch model:
+- [macOS](https://developer.apple.com/macos) >= 12.4
+- [iOS](https://www.apple.com/ios) >= 15.4
 
+## Development Requirements
 
+To develop, you need:
 
-## Prerequisites (Hardware and Software)
 
+- [Xcode](https://developer.apple.com/xcode/) >= 14.1
 
-In order to be able to successfully build and run a model using the MPS backend for ExecuTorch, you'll need the following hardware and software components:
+Before starting, make sure you install the Xcode Command Line Tools:
 
-### Hardware:
- - A [mac](https://www.apple.com/mac/) for tracing the model
-
-### Software:
-
-  - **Ahead of time** tracing:
-    - [macOS](https://www.apple.com/macos/) 12
-
-  - **Runtime**:
-    - [macOS](https://www.apple.com/macos/) >= 12.4
-    - [iOS](https://www.apple.com/ios) >= 15.4
-    - [Xcode](https://developer.apple.com/xcode/) >= 14.1
-
-## Setting up Developer Environment
-
-***Step 1.*** Complete the steps in [Getting Started](getting-started.md) to set up the ExecuTorch development environment.
-
-You will also need a local clone of the ExecuTorch repository. See [Building ExecuTorch from Source](using-executorch-building-from-source.html) for instructions. All commands in this document should be run from the executorch repository.
-
-## Build
+```bash
+xcode-select --install
+```
 
-### AOT (Ahead-of-time) Components
+## Using the MPS Backend
 
-**Compiling model for MPS delegate**:
-- In this step, you will generate a simple ExecuTorch program that lowers MobileNetV3 model to the MPS delegate. You'll then pass this Program (the `.pte` file) during the runtime to run it using the MPS backend.
+In this step, you will generate a simple ExecuTorch program that lowers the MobileNetV3 model to the MPS delegate. You'll then pass this program (the `.pte` file) to the runtime, which runs it using the MPS backend.
 
 ```bash
 cd executorch
@@ -121,7 +93,7 @@ python3 -m examples.apple.mps.scripts.mps_example --model_name="mv3" --generate_
 python3 -m devtools.inspector.inspector_cli --etdump_path etdump.etdp --etrecord_path etrecord.bin
 ```
 
-## Deploying and Running on Device
+## Runtime Integration
 
 ***Step 1***. Create the ExecuTorch core and MPS delegate frameworks to link on iOS
 ```bash
@@ -146,8 +118,3 @@ From the same page, include the needed libraries for the MPS delegate:
 - `Metal.framework`
 
 In this tutorial, you have learned how to lower a model to the MPS delegate, build the mps_executor_runner and run a lowered model through the MPS delegate, or directly on device using the MPS delegate static library.
-
-
-## Frequently encountered errors and resolution.
-
-If you encountered any bugs or issues following this tutorial please file a bug/issue on the [ExecuTorch repository](https://github.com/pytorch/executorch/issues), with hashtag **#mps**.
diff --git a/docs/source/ios-coreml.md b/docs/source/ios-coreml.md
index 48271326d87..ff6551aa0c2 100644
--- a/docs/source/ios-coreml.md
+++ b/docs/source/ios-coreml.md
@@ -1 +1 @@
-```{include} backends-coreml.md
+```{include} backends/coreml/coreml-overview.md
diff --git a/docs/source/ios-mps.md b/docs/source/ios-mps.md
index d6f305d33aa..13717675ba5 100644
--- a/docs/source/ios-mps.md
+++ b/docs/source/ios-mps.md
@@ -1 +1 @@
-```{include} backends-mps.md
+```{include} backends/mps/mps-overview.md
diff --git a/docs/source/quantization-overview.md b/docs/source/quantization-overview.md
index 4ff8d34a4a8..4ac886b9ed2 100644
--- a/docs/source/quantization-overview.md
+++ b/docs/source/quantization-overview.md
@@ -29,7 +29,7 @@ These quantizers usually support configs that allow users to specify quantizatio
 Not all quantization options are supported by all backends. Consult backend-specific guides for supported quantization modes and configuration, and how to initialize the backend-specific PT2E quantizer:
 
 * [XNNPACK quantization](backends-xnnpack.md#quantization)
-* [CoreML quantization](backends-coreml.md#quantization)
+* [CoreML quantization](backends/coreml/coreml-quantization.md)
 * [QNN quantization](backends-qualcomm.md#step-2-optional-quantize-your-model)
 
 
diff --git a/docs/source/using-executorch-building-from-source.md b/docs/source/using-executorch-building-from-source.md
index 48901f62a76..36f8f5fefac 100644
--- a/docs/source/using-executorch-building-from-source.md
+++ b/docs/source/using-executorch-building-from-source.md
@@ -385,7 +385,7 @@
 xcode-select --install
 ```
 
 Run the above command with `--help` flag to learn more on how to build additional backends
-(like [Core ML](backends-coreml.md), [MPS](backends-mps.md) or XNNPACK), etc.
+(like [Core ML](backends/coreml/coreml-overview.md), [MPS](backends/mps/mps-overview.md) or XNNPACK), etc.
 Note that some backends may require additional dependencies and certain versions of Xcode and iOS.
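
The `mps_example` script invoked in the MPS overview above wraps ExecuTorch's standard export pipeline. The following is a minimal Python sketch of the flow it automates; the `MPSPartitioner` import path and the `use_fp16` compile spec are assumptions based on the current `backends/apple/mps` layout, so verify them against the source tree:

```python
import torch
import torchvision.models as models

from executorch.exir import to_edge_transform_and_lower

# Assumed import paths: check backends/apple/mps if these have moved.
from executorch.backends.apple.mps.partition import MPSPartitioner
from executorch.exir.backend.compile_spec_schema import CompileSpec

# Trace MobileNetV3 with example inputs, as the mps_example script does.
model = models.mobilenet_v3_small(weights="DEFAULT").eval()
example_inputs = (torch.randn(1, 3, 224, 224),)
exported = torch.export.export(model, example_inputs)

# Lower every subgraph the partitioner claims to the MPS delegate;
# "use_fp16" requests half-precision execution on the GPU.
program = to_edge_transform_and_lower(
    exported,
    partitioner=[MPSPartitioner([CompileSpec("use_fp16", bytes([True]))])],
).to_executorch()

with open("mv3_mps.pte", "wb") as f:
    f.write(program.buffer)
```

The resulting `mv3_mps.pte` is what `mps_executor_runner` and the iOS frameworks described above consume.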
See backend-specific documentation for more details. diff --git a/docs/source/using-executorch-export.md b/docs/source/using-executorch-export.md index 7abf5cbd30a..f0ad7c18467 100644 --- a/docs/source/using-executorch-export.md +++ b/docs/source/using-executorch-export.md @@ -33,8 +33,8 @@ As part of the .pte file creation process, ExecuTorch identifies portions of the Commonly used hardware backends are listed below. For mobile, consider using XNNPACK for Android and XNNPACK or Core ML for iOS. To create a .pte file for a specific backend, pass the appropriate partitioner class to `to_edge_transform_and_lower`. See the appropriate backend documentation and the [Export and Lowering](#export-and-lowering) section below for more information. - [XNNPACK (CPU)](backends-xnnpack.md) -- [Core ML (iOS)](backends-coreml.md) -- [Metal Performance Shaders (iOS GPU)](backends-mps.md) +- [Core ML (iOS)](backends/coreml/coreml-overview.md) +- [Metal Performance Shaders (iOS GPU)](backends/mps/mps-overview.md) - [Vulkan (Android GPU)](backends-vulkan.md) - [Qualcomm NPU](backends-qualcomm.md) - [MediaTek NPU](backends-mediatek.md) diff --git a/docs/source/using-executorch-ios.md b/docs/source/using-executorch-ios.md index 15ccef8d8a1..8e075853161 100644 --- a/docs/source/using-executorch-ios.md +++ b/docs/source/using-executorch-ios.md @@ -107,7 +107,7 @@ git clone -b release/1.0 https://github.com/pytorch/executorch.git --depth 1 --r python3 -m venv .venv && source .venv/bin/activate && pip install --upgrade pip ``` -4. Install the required dependencies, including those needed for the backends like [Core ML](backends-coreml.md) or [MPS](backends-mps.md), if you plan to build them later: +4. Install the required dependencies, including those needed for the backends like [Core ML](backends/coreml/coreml-overview.md) or [MPS](backends/mps/mps-overview.md), if you plan to build them later: ```bash ./install_requirements.sh
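
As `using-executorch-export.md` notes above, the partitioner list passed to `to_edge_transform_and_lower` is what selects the backend, so producing one `.pte` per backend amounts to swapping that list. Below is a self-contained sketch using the XNNPACK partitioner together with the prebuilt `executorch.runtime` Python module from `README-wheel.md`; treat the exact import paths and runtime calls as assumptions to verify against the installed ExecuTorch version:

```python
import torch

from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
from executorch.exir import to_edge_transform_and_lower
from executorch.runtime import Runtime


class MulAdd(torch.nn.Module):
    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        return x * y + y


example_inputs = (torch.randn(4), torch.randn(4))
exported = torch.export.export(MulAdd().eval(), example_inputs)

# Swap XnnpackPartitioner for a Core ML or MPS partitioner to retarget the
# same exported model; ops no partitioner claims fall back to portable CPU kernels.
program = to_edge_transform_and_lower(
    exported,
    partitioner=[XnnpackPartitioner()],
).to_executorch()

with open("mul_add_xnnpack.pte", "wb") as f:
    f.write(program.buffer)

# Smoke-test the file with the prebuilt Python runtime.
runtime = Runtime.get()
method = runtime.load_program("mul_add_xnnpack.pte").load_method("forward")
print(method.execute(list(example_inputs)))
```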