From 8d9b821e179a7170fcff0be18cf2121bb25d950f Mon Sep 17 00:00:00 2001
From: Gregory James Comer
Date: Tue, 21 Oct 2025 11:15:23 -0700
Subject: [PATCH] Update build from source and getting started docs

---
 docs/source/getting-started.md                |  31 +-
 .../using-executorch-building-from-source.md  | 352 +++++++-----------
 2 files changed, 155 insertions(+), 228 deletions(-)

diff --git a/docs/source/getting-started.md b/docs/source/getting-started.md
index 51c59f5e021..767e80b1183 100644
--- a/docs/source/getting-started.md
+++ b/docs/source/getting-started.md
@@ -10,9 +10,9 @@ The following are required to install the ExecuTorch host libraries, needed to e
- Python 3.10 - 3.12
- g++ version 7 or higher, clang++ version 5 or higher, or another C++17-compatible toolchain.
-- Linux (x86_64 or ARM64) or macOS (ARM64).
+- Linux (x86_64 or ARM64), macOS (ARM64), or Windows (x86_64).
  - Intel-based macOS systems require building PyTorch from source (see [Building From Source](using-executorch-building-from-source.md) for instructions).
-  - Windows is supported via WSL.
+- On Windows, Visual Studio 2022 or later. The Clang build tools are required to build from source.

## Installation
To use ExecuTorch, you will need to install both the Python package and the appropriate platform-specific runtime libraries. Pip is the recommended way to install the ExecuTorch Python package.

@@ -25,6 +25,7 @@ pip install executorch

To build the framework from source, see [Building From Source](using-executorch-building-from-source.md).
Backend delegates may require additional dependencies. See the appropriate backend documentation for more information.
+> **_NOTE:_** On Windows, ExecuTorch requires a [Visual Studio Developer PowerShell](https://learn.microsoft.com/en-us/visualstudio/ide/reference/command-prompt-powershell?view=vs-2022). Running outside of a developer prompt will result in errors related to CL.exe.
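To confirm the installation before moving on, a quick sanity check can be run. This assumes only that the `executorch` package installed above is visible to the current Python interpreter:

```bash
# Verify the installed package metadata and confirm it imports cleanly.
pip show executorch
python -c "import executorch; print('ExecuTorch import OK')"
```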
@@ -44,7 +45,7 @@ ExecuTorch provides hardware acceleration for a wide variety of hardware. The mo
For mobile use cases, consider using XNNPACK for Android and Core ML or XNNPACK for iOS as a first step. See [Hardware Backends](backends-overview.md) for more information.

### Exporting
-Exporting is done using Python APIs. ExecuTorch provides a high degree of customization during the export process, but the typical flow is as follows. This example uses the MobileNet V2 image classification model implementation in torchvision, but the process supports any [export-compliant](https://pytorch.org/docs/stable/export.html) PyTorch model. For users working with Hugging Face models,
+Exporting is done using Python APIs. ExecuTorch provides a high degree of customization during the export process, but the typical flow is as follows. This example uses the MobileNet V2 image classification model implementation in torchvision, but the process supports any [export-compliant](https://pytorch.org/docs/stable/export.html) PyTorch model. For Hugging Face models,
you can find a list of supported models in the [*huggingface/optimum-executorch*](https://github.com/huggingface/optimum-executorch) repo.

```python
@@ -103,7 +104,7 @@ print(torch.allclose(output[0], eager_reference_output, rtol=1e-3, atol=1e-5))

For complete examples of exporting and running the model, please refer to our [examples GitHub repository](https://github.com/meta-pytorch/executorch-examples/tree/main/mv2/python).

-Additionally, if you work with Hugging Face models, the [*huggingface/optimum-executorch*](https://github.com/huggingface/optimum-executorch) library simplifies running these models end-to-end with ExecuTorch, using familiar Hugging Face APIs. Visit the repository for specific examples and supported models.
+Additionally, for Hugging Face models, the [*huggingface/optimum-executorch*](https://github.com/huggingface/optimum-executorch) library simplifies running these models end-to-end with ExecuTorch using familiar Hugging Face APIs. Visit the repository for specific examples and supported models.
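For Hugging Face models specifically, optimum-executorch also exposes a CLI that wraps this export flow. Below is a sketch of its usage; the model ID, task, and recipe are illustrative, and the exact flags may differ across versions, so consult the repository:

```bash
# Illustrative example: export a Hugging Face model to a PTE using the XNNPACK recipe.
optimum-cli export executorch \
  --model "HuggingFaceTB/SmolLM2-135M" \
  --task text-generation \
  --recipe xnnpack \
  --output_dir smollm2_executorch
```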
@@ -131,7 +132,7 @@ dependencies {
```

#### Runtime APIs
-Models can be loaded and run using the `Module` class:
+Models can be loaded and run from Java or Kotlin using the `Module` class.
```java
import org.pytorch.executorch.EValue;
import org.pytorch.executorch.Module;
@@ -147,8 +148,11 @@ EValue[] output = model.forward(input_evalue);
float[] scores = output[0].toTensor().getDataAsFloatArray();
```

+Note that the [C++](#c) APIs can be used when targeting native Android code.
+
For a full example of running a model on Android, see the [DeepLabV3AndroidDemo](https://github.com/meta-pytorch/executorch-examples/tree/main/dl3/android/DeepLabV3Demo). For more information on Android development, including building from source, a full description of the Java APIs, and information on using ExecuTorch from Android native code, see [Using ExecuTorch on Android](using-executorch-android.md).

+
### iOS

#### Installation
@@ -165,22 +169,27 @@ For more information on iOS integration, including an API reference, logging set
ExecuTorch provides C++ APIs, which can be used to target embedded or mobile devices. The C++ APIs provide a greater level of control compared to other language bindings, allowing for advanced memory management, data loading, and platform integration.

#### Installation
-CMake is the preferred build system for the ExecuTorch C++ runtime. To use with CMake, clone the ExecuTorch repository as a subdirectory of your project, and use CMake's `add_subdirectory("executorch")` to include the dependency. The `executorch` target, as well as kernel and backend targets will be made available to link against. The runtime can also be built standalone to support diverse toolchains. See [Using ExecuTorch with C++](using-executorch-cpp.md) for a detailed description of build integration, targets, and cross compilation.
+CMake is the preferred build system for the ExecuTorch C++ runtime. To use with CMake, clone the ExecuTorch repository as a subdirectory of your project, and use CMake's `add_subdirectory("executorch")` to include the dependency. The `executorch` target, as well as kernel and backend targets, will be made available to link against. The runtime can also be built standalone to support diverse toolchains. See [Using ExecuTorch with C++](using-executorch-cpp.md) and [Building from Source](using-executorch-building-from-source.md) for a detailed description of build integration, targets, and cross compilation.

```
git clone -b viable/strict https://github.com/pytorch/executorch.git
```

-```python
+```cmake
+# Set CMAKE_CXX_STANDARD to 17 or above.
+set(CMAKE_CXX_STANDARD 17)
+
# CMakeLists.txt
+set(EXECUTORCH_BUILD_PRESET_FILE ${CMAKE_SOURCE_DIR}/executorch/tools/cmake/preset/llm.cmake)
+# Set other ExecuTorch options here.
+
add_subdirectory("executorch")
...
target_link_libraries(
    my_target
    PRIVATE executorch
-            extension_module_static
-            extension_tensor
-            optimized_native_cpu_ops_lib
-            xnnpack_backend)
+            executorch::backends
+            executorch::extensions
+            executorch::kernels)
```

diff --git a/docs/source/using-executorch-building-from-source.md b/docs/source/using-executorch-building-from-source.md
index 66ee7ef7a24..56792ef1f42 100644
--- a/docs/source/using-executorch-building-from-source.md
+++ b/docs/source/using-executorch-building-from-source.md
@@ -17,8 +17,8 @@ ExecuTorch is tested on the following systems, although it should also work in s
* macOS (x86_64/ARM64)
  * Big Sur (11.0)+
* Windows (x86_64)
+  * Windows 10+ with Visual Studio 2022+ and [Clang-CL](https://learn.microsoft.com/en-us/cpp/build/clang-support-msbuild?view=msvc-170)
  * Windows Subsystem for Linux (WSL) with any of the Linux options
-  * Windows 10+ with Visual Studio 2022+ (experimental)

### Software Requirements

@@ -29,16 +29,19 @@ ExecuTorch is tested on the following systems, although it should also work in s

* `g++` version 7 or higher, `clang++` version 5 or higher, or another C++17-compatible toolchain.
* `python` version 3.10-3.12
-* `Xcode Command Line Tools` (macOS only)
* `ccache` (optional) - A compiler cache that speeds up recompilation
+* **macOS**
+  - `Xcode Command Line Tools`
+* **Windows**
+  - `Visual Studio Clang Tools` - See [Clang/LLVM support in Visual Studio](https://learn.microsoft.com/en-us/cpp/build/clang-support-msbuild?view=msvc-170).

-Additional dependencies will be installed automatically when running the [Python installation](#building-the-python-package).
+Additional dependencies will be automatically installed when running the [Python installation](#building-the-python-package).

Note that the cross-compilable core runtime code supports a wider range of
-toolchains, down to C++17. See the [Runtime Overview](runtime-overview.md) for
+toolchains, down to C++17. See [Runtime Overview](runtime-overview.md) for
portability details.

## Environment Setup
- Clone the ExecuTorch repository from GitHub and create a conda environment as follows. Venv can be used in place on conda.
+ Clone the ExecuTorch repository from GitHub and create a conda environment. Venv can be used in place of conda.
```bash
git clone -b viable/strict https://github.com/pytorch/executorch.git
cd executorch
@@ -46,6 +49,13 @@ portability details.
conda activate executorch
```

+> **_NOTE:_** Additional Windows Setup
+>
+> ExecuTorch requires symlinks to be enabled to build the Python components. To enable symlinks, run the following command before cloning the repository. Missing symlinks will manifest as an error related to `version.py` when running `pip install .`. See [src/README.md](https://github.com/pytorch/executorch/blob/main/src/README.md) for more information.
+> ```bash
+> git config --system core.symlinks true
+> ```
+
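For reference, the venv alternative mentioned above looks like the following. This is standard Python tooling with nothing ExecuTorch-specific assumed; the environment directory name is arbitrary:

```bash
# Create and activate a virtual environment instead of conda.
python3 -m venv .venv
source .venv/bin/activate    # On Windows PowerShell: .venv\Scripts\Activate.ps1
```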
## Building the Python package

@@ -62,7 +72,7 @@ portability details.

 * `--clean`: Removes build artifacts.
 * `--editable`: Install the ExecuTorch Python package in editable mode (see [Editable Install](#editable-install)).
 * `--minimal`: Install only the minimal set of dependencies required to run ExecuTorch. Do not install dependencies for examples.
- * `--use-pt-pinned-commit`: Install the pinned PyTorch commit. When not specified, the latest PyTorch nightly build is installed.
+ * `--use-pt-pinned-commit`: Install the pinned PyTorch commit or release version. When not specified, the latest PyTorch nightly build is installed.

 For Intel-based macOS systems, use `--use-pt-pinned-commit --minimal`. As PyTorch does not provide pre-built binaries for Intel Mac, installation requires building PyTorch from source. Instructions can be found in [PyTorch Installation](https://github.com/pytorch/pytorch#installation).

@@ -73,6 +83,13 @@ portability details.
CMAKE_ARGS="-DEXECUTORCH_BUILD_MPS=ON" ./install_executorch.sh
```

+## Verify the Build
+
+To verify that the Python components are installed correctly, run the following command. This will create a file named `mv2_xnnpack_fp32.pte` in the current directory for the MobileNet V2 model with the XNNPACK backend. If it completes without error, the ExecuTorch Python components are installed successfully.
+```bash
+python -m executorch.examples.xnnpack.aot_compiler --model_name="mv2" --delegate
+```
+
### Editable Install
For development, include the `--editable` flag, which allows for local changes to ExecuTorch Python code to be reflected without a re-install. Note that when C++ files are modified, you will need to re-run the full installation to reflect the changes.
```bash
@@ -114,47 +131,38 @@ portability details.

## Building the C++ Runtime

-The ExecuTorch C++ runtime is built using CMake. It can be compiled standalone to run examples, added as a CMake dependency, or cross-compiled for Android, iOS, or embedded platforms.
+The ExecuTorch runtime uses CMake as the build system. When using ExecuTorch from C++ user code with CMake, adding ExecuTorch as a submodule and referencing via CMake `add_subdirectory` will build the runtime as part of the user build.

-### Configuring
+When user code is not using CMake, the runtime can be built standalone and linked. The CMake options described below apply in both cases. Scripts are also provided for [Android AAR](#cross-compiling-for-android) and [iOS framework](#cross-compiling-for-ios) builds.

-Configuration should be done after cloning, pulling the upstream repo, or changing build options. Once this is done, you won't need to do it again until you pull from the upstream repo or modify any CMake-related files.
+| Use Case                   | How to Build                                                                        |
+| :------------------------- | :---------------------------------------------------------------------------------- |
+| C++ with user CMake        | Use CMake `add_subdirectory`.                                                       |
+| C++ without user CMake     | Build ExecuTorch standalone with CMake. Link the libraries into the user build.     |
+| Android with Java/Kotlin   | Use [scripts/build_android_library.sh](#cross-compiling-for-android).               |
+| Android with C++           | Follow C++ build steps, [cross-compile for Android](#cross-compiling-for-android).  |
+| iOS                        | Use [scripts/build_ios_frameworks.sh](#cross-compiling-for-ios).                    |

-```bash
-# cd to the root of the executorch repo
-cd executorch
-
-# Clean and configure the CMake build system. It's good practice to do this
-# whenever cloning or pulling the upstream repo.
-./install_executorch.sh --clean
-(mkdir cmake-out && cd cmake-out && cmake ..)
-```

### Configuring

-### Building

Configuration should be done after cloning, pulling the upstream repo, or changing build options. Once this is done, you won't need to do it again until you pull from the upstream repo or modify any CMake-related files.

-Build all targets with `cmake --build`.
+When building as a submodule as part of a user CMake build, ExecuTorch CMake options can be specified either as part of the user CMake configuration or in user CMake code.

+CMake configuration for standalone runtime build:
```bash
-# cd to the root of the executorch repo
-cd executorch
-
-# Build using the configuration that you previously generated under the
-# `cmake-out` directory.
-#
-# NOTE: The `-j` argument specifies how many jobs/processes to use when
-# building, and tends to speed up the build significantly. It's typical to use
-# "core count + 1" as the `-j` value.
-cmake --build cmake-out -j9
+mkdir cmake-out
+cmake -B cmake-out --preset [preset] [options]
+cmake --build cmake-out -j10
```

-> **_TIP:_** For faster rebuilds, consider installing ccache (see [Compiler Cache section](#compiler-cache-ccache) above). On first builds, ccache populates its cache. Subsequent builds with the same compiler flags can be significantly faster.
-
-### Build Presets
+#### Build Presets

-ExecuTorch provides fine-grained control over what is built, as described in [Build Options](#build-options). These options are grouped into CMake presets to cover common scenarios, while providing the ability to override individual options. Presets can be specified when configuring CMake by specifying `--preset [name]` when configuring.
+ExecuTorch provides fine-grained control over what is built, as described in [Build Options](#build-options). These options are grouped into CMake presets to cover common scenarios while preserving the ability to override individual options. Presets can be specified by passing `--preset [name]` when configuring CMake.

Preset values for common scenarios are listed below. Using a platform preset is recommended to avoid needing to specify many fine-grained build options.

+ * `android` - Build features and backends common for Android targets.
 * `arm-baremetal` - Build for bare-metal ARM targets.
 * `ios` - Build features and backends common for iOS targets.
 * `macos` - Build features and backends common for Mac targets.
@@ -163,77 +171,34 @@ Preset values for common scenarios are listed below. Using a platform preset is
 * `profiling` - Build the ExecuTorch runtime with profiling enabled.
 * `zephyr` - Build for Zephyr RTOS.

+User CMake:
+```cmake
+set(EXECUTORCH_BUILD_PRESET_FILE ${CMAKE_SOURCE_DIR}/executorch/tools/cmake/preset/llm.cmake)
+```
+
+Standalone build:
```bash
# Configure the build with the ios preset.
cmake .. --preset ios
```

-### CMake Targets and Libraries
-
-To link against the ExecuTorch framework from CMake, the following top-level targets are exposed:
-
- * `executorch::backends`: Contains all configured backends.
- * `executorch::extensions`: Contains all configured extensions.
- * `executorch::kernels`: Contains all configured kernel libraries.
-
-The backends, extensions, and kernels included in these targets are controlled by the various `EXECUTORCH_` CMake options specified by the build. Using these targets will automatically pull in the required dependencies to use the configured features.
-
-### Running an Example Model
-
-The example `executor_runner` binary can be used to run a model and sanity-check the build. Run the following commands to generate and run a simple model.
-You should see the message "Model executed successfully" followed by the output values.
+CMake options can be used for fine-grained control of the build type, which features are built, and runtime functionality such as logging. Options are typically specified during CMake configuration. Default values of each option are set by the active preset, but can be overridden by specifying the option when configuring.

-``` bash
-python -m examples.portable.scripts.export --model_name="add"
-./cmake-out/executor_runner --model_path add.pte
-```
+Note that many build options require other options to be enabled. This may require enabling multiple options to enable a given feature. The CMake build output will provide an error message when a required option is not enabled.

-```
-I 00:00:00.000526 executorch:executor_runner.cpp:82] Model file add.pte is loaded.
-I 00:00:00.000595 executorch:executor_runner.cpp:91] Using method forward
-I 00:00:00.000612 executorch:executor_runner.cpp:138] Setting up planned buffer 0, size 48.
-I 00:00:00.000669 executorch:executor_runner.cpp:161] Method loaded.
-I 00:00:00.000685 executorch:executor_runner.cpp:171] Inputs prepared.
-I 00:00:00.000764 executorch:executor_runner.cpp:180] Model executed successfully.
-I 00:00:00.000770 executorch:executor_runner.cpp:184] 1 outputs:
-Output 0: tensor(sizes=[1], [2.])
+User CMake:
+```cmake
+set(EXECUTORCH_BUILD_XNNPACK ON)
```
-
-### Compiler Cache (ccache)
-
-ExecuTorch automatically detects and enables [ccache](https://ccache.dev/) if it's installed. This significantly speeds up recompilation by caching previously compiled objects:
-
-- If ccache is detected, you'll see: `ccache found and enabled for faster builds`
-- If ccache is not installed, you'll see: `ccache not found, builds will not be cached`
-
-To install ccache:
+Standalone build:
```bash
-# Ubuntu/Debian
-sudo apt install ccache
-
-# macOS
-brew install ccache
-
-# CentOS/RHEL
-sudo yum install ccache
-# or
-sudo dnf install ccache
+cmake .. -DEXECUTORCH_BUILD_XNNPACK=ON
```

-No additional configuration is needed - the build system will automatically use ccache when available.
-
-See [CMakeLists.txt](https://github.com/pytorch/executorch/blob/main/CMakeLists.txt)
-
-
-
-## Build Options
-
-CMake options can be used to for fine-grained control of build type, control which features are built, and configure functionality, such as logging. Options are typically specified during CMake configuration. Default values of each option are set by the active preset, but can be overridden by specifying the option when configuring.
-
-Note that many build options require other options to be enabled. This may require enabling multiple options to enable a given feature. The CMake build output will provide an error message when a required option is not enabled.
-
-#### Build Type
+##### Build Type

The CMake build is typically set to `Debug` or `Release`. For production use or profiling, release mode should be used to improve performance and reduce binary size. It disables program verification and ExecuTorch logging and adds optimization flags. The `EXECUTORCH_OPTIMIZE_SIZE` flag can be used to further optimize for size with a small performance tradeoff.

@@ -242,7 +207,7 @@ The CMake build is typically set to `Debug` or `Release`. For production use or
cmake .. -DCMAKE_BUILD_TYPE=Release
```

-#### Backends
+##### Backends

Typically, each hardware backend exposes a CMake option to control whether the backend is built. See backend-specific documentation for more details.

@@ -262,7 +227,7 @@ Typically, each hardware backend exposes a CMake option to control whether the b
cmake .. -DEXECUTORCH_BUILD_XNNPACK=ON -DEXECUTORCH_BUILD_VULKAN=ON
```

-#### Extensions
+##### Extensions

ExecuTorch extensions provide optional functionality outside of the core runtime. As the core runtime is designed to run in constrained environments, these features are typically disabled by default. Extensions include higher-level APIs (Module and Tensor), multi-threading support (Threadpool), training, and more.

@@ -283,7 +248,7 @@ ExecuTorch extensions provide optional functionality outside of the core runtime
cmake .. -DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON
```

-#### Logging
+##### Logging

Logging is enabled by default in debug builds and disabled in release. When enabled, the default log level is Info. Both the log enable flag and the log level can be overridden with options. See [Logging](using-executorch-runtime-integration.md#logging). Disabling logging and decreasing log verbosity will reduce binary size by stripping unused strings from the build.

@@ -295,7 +260,39 @@ Logging is enabled by default in debug builds and disabled in release. When enab
cmake .. -DEXECUTORCH_ENABLE_LOGGING=ON -DEXECUTORCH_LOG_LEVEL=debug
```

-#### Output Libraries
+### Building
+
+Build all targets with `cmake --build`.
+
+```bash
+# cd to the root of the executorch repo
+cd executorch
+
+# Build using the configuration that you previously generated under the
+# `cmake-out` directory.
+#
+# NOTE: The `-j` argument specifies how many jobs/processes to use when
+# building, and tends to speed up the build significantly. It's typical to use
+# "core count + 1" as the `-j` value.
+cmake --build cmake-out -j9
+```
+
+> **_TIP:_** For faster rebuilds, consider installing ccache (see [Compiler Cache (ccache)](#compiler-cache-ccache) below). On first builds, ccache populates its cache. Subsequent builds with the same compiler flags can be significantly faster.
+
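Putting configuration and build together, an illustrative standalone sequence follows. The preset and option values here are examples only; substitute the ones appropriate for your target:

```bash
# Configure a release build from the macos preset, overriding two feature options.
cmake -B cmake-out --preset macos \
  -DCMAKE_BUILD_TYPE=Release \
  -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \
  -DEXECUTORCH_BUILD_XNNPACK=ON

# Build with parallel jobs.
cmake --build cmake-out -j9
```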
## CMake Targets and Output Libraries

To link against the ExecuTorch framework from CMake, the following top-level targets are exposed:

 * `executorch::backends`: Contains all configured backends.
 * `executorch::extensions`: Contains all configured extensions.
 * `executorch::kernels`: Contains all configured kernel libraries.

The backends, extensions, and kernels included in these targets are controlled by the various `EXECUTORCH_` CMake options specified by the build. Using these targets will automatically pull in the required dependencies to use the configured features.

### Linking Without CMake

To link against the runtime from outside of the CMake ecosystem, the runtime can first be built with CMake and then linked directly. A few of the relevant top-level targets are described below. Note that this is a more involved process than using CMake and is only recommended when using CMake is not viable.

@@ -314,6 +311,26 @@ To link against the runtime from outside of the CMake ecosystem, the runtime can

Backends typically introduce additional targets. See backend-specific documentation for more details.

+### Verify the Build
+
+To verify the build, ExecuTorch optionally compiles a simple, stand-alone model runner to run PTE files with all-one input tensors. It is not enabled by default in most presets, but can be enabled by configuring with `-DEXECUTORCH_BUILD_EXECUTOR_RUNNER=ON -DEXECUTORCH_BUILD_EXTENSION_EVALUE_UTIL=ON`.
+
+Once compiled, invoke the runner with a sample PTE (such as the one generated by [verifying the Python build](#verify-the-build)).
+```bash
+cmake-out/executor_runner --model_path=mv2_xnnpack_fp32.pte
+```
+
+If the runner runs successfully, you should see output similar to the following:
+```
+I 00:00:00.043703 executorch:executor_runner.cpp:379] Model executed successfully 1 time(s) in 15.013292 ms.
+I 00:00:00.043720 executorch:executor_runner.cpp:383] 1 outputs:
+Output 0: tensor(sizes=[1, 1000], [
+  -0.509859, 0.300644, 0.0953884, 0.147724, 0.231202, 0.338554, 0.206888, -0.0575762, -0.389273, -0.0606864,
+  ...,
+  0.421219, 0.100447, -0.506771, -0.115824, -0.693017, -0.183262, 0.154781, -0.410684, 0.0119296, 0.449713,
+])
+```
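Beyond the bundled runner, the same PTE can be loaded from user C++ code. Below is a minimal sketch using the Module extension API; it assumes a model with a single float input of shape [1, 3, 224, 224], such as the MobileNet V2 example above. See [Using ExecuTorch with C++](using-executorch-cpp.md) for the full API:

```cpp
#include <executorch/extension/module/module.h>
#include <executorch/extension/tensor/tensor.h>

#include <cstdio>
#include <vector>

using namespace ::executorch::extension;

int main() {
  // Load the program generated during the Python verification step.
  Module module("mv2_xnnpack_fp32.pte");

  // All-zero input tensor; executor_runner uses all-ones in the same spirit.
  std::vector<float> input(1 * 3 * 224 * 224, 0.0f);
  auto tensor = from_blob(input.data(), {1, 3, 224, 224});

  // Run inference; on success, the result holds the output EValues.
  const auto result = module.forward(tensor);
  std::printf(result.ok() ? "Model ran successfully.\n" : "Model failed to run.\n");
  return result.ok() ? 0 : 1;
}
```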
## Cross-Compiling for Android @@ -327,8 +344,7 @@ Backends typically introduce additional targets. See backend-specific documentat ### Building the AAR -With the NDK installed, the `build_android_library.sh` script will build the ExecuTorch Java AAR. This file contains the ExecuTorch Java bindings -and native code. See [Using the AAR File](using-executorch-android.md#using-aar-file) for usage. +With the NDK installed, the `build_android_library.sh` script will build the ExecuTorch Java AAR, which contains ExecuTorch Java bindings. See [Using the AAR File](using-executorch-android.md#using-aar-file) for usage. ```bash export ANDROID_ABIS=arm64-v8a @@ -337,36 +353,21 @@ mkdir -p $BUILD_AAR_DIR sh scripts/build_android_library.sh ``` -### Building the Example Runner +### Android Native -The native executor runner can be cross-compiled for android and deployed via ADB. This step is intended as -an example of CMake cross compilation and is not necessary for integration into an app. +To use the ExecuTorch runtime from native Android C++ code, the runtime can be cross-compiled for Android. The recommended approach is to add ExecuTorch as a submodule of the user project and use [CMake](https://developer.android.com/ndk/guides/cmake) for the native build. The above steps for C++ with CMake can be followed. +For direct cross-compilation, the ExecuTorch runtime can be configured to build with the NDK toolchain: ```bash -# Run the following lines from the `executorch/` folder -./install_executorch.sh --clean -mkdir cmake-android-out && cd cmake-android-out - # point -DCMAKE_TOOLCHAIN_FILE to the location where ndk is installed -cmake -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake -DANDROID_ABI=arm64-v8a .. - -cd .. -cmake --build cmake-android-out -j9 - -adb shell mkdir -p /data/local/tmp/executorch -# push the binary to an Android device -adb push cmake-android-out/executor_runner /data/local/tmp/executorch -# push the model file -adb push add.pte /data/local/tmp/executorch - -adb shell "/data/local/tmp/executorch/executor_runner --model_path /data/local/tmp/executorch/add.pte" +cmake -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake -DANDROID_ABI=arm64-v8a .. ```
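As a sketch, a complete out-of-source configure-and-build with the NDK toolchain and the `android` preset described earlier might look like this (assuming `$ANDROID_NDK` points at the NDK installation):

```bash
# Configure with the Android toolchain file, then build.
cmake -B cmake-android-out --preset android \
  -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
  -DANDROID_ABI=arm64-v8a
cmake --build cmake-android-out -j9
```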
## Cross-Compiling for iOS -For iOS, we'll build [frameworks](https://developer.apple.com/documentation/xcode/creating-a-multi-platform-binary-framework-bundle) instead of static libraries. The frameworks contain the compiled ExecuTorch runtime and public headers. +iOS binaries are built as [frameworks](https://developer.apple.com/documentation/xcode/creating-a-multi-platform-binary-framework-bundle) instead of static libraries. The frameworks contain the compiled ExecuTorch runtime and public headers. ### Pre-requisites @@ -394,112 +395,29 @@ See backend-specific documentation for more details. 2. Copy over the generated `.xcframework` bundles to your Xcode project, link them against your targets and don't forget to add an extra linker flag `-all_load`. -Check out the [iOS Demo App](https://github.com/meta-pytorch/executorch-examples/tree/main/mv3/apple/ExecuTorchDemo) tutorial for more info. - -
- -## Building on Windows - -ExecuTorch provides experimental support for native Windows builds. - -> **_NOTE:_** All commands should be executed on Windows powershell in administrator mode. - -### Environment Setup - -#### Pre-requisites +See the [iOS Demo App](https://github.com/meta-pytorch/executorch-examples/tree/main/mv3/apple/ExecuTorchDemo) tutorial for example usage of the ExecuTorch frameworks. -1. Install miniconda for Windows from the [official website](https://docs.conda.io/en/latest/miniconda.html). -2. Install Git for Windows from the [official website](https://git-scm.com/download/win). -3. Install ClangCL for Windows from the [official website](https://learn.microsoft.com/en-us/cpp/build/clang-support-msbuild?view=msvc-170) or through a [Visual Studio](https://learn.microsoft.com/en-us/cpp/build/clang-support-msbuild?view=msvc-170) or [Visual Studio Code](https://code.visualstudio.com/docs/cpp/config-clang-mac) installation. +## Compiler Cache (ccache) -#### Clone and Configure Environment - -```bash -git config --global core.symlinks true -git clone --recurse -submodules https://github.com/pytorch/executorch.git -cd executorch -conda create -yn et python=3.12 -conda activate et -``` - -If Conda is not available, run conda-hook.ps1, where `$miniconda_dir` is the directory where miniconda is installed. -This is `“C:\Users\\AppData\Local”` by default. - -```bash -$miniconda_dir\\shell\\condabin\\conda-hook.ps1 -``` - -### Build the Python Package - -Run `install_executorch.bat` to build and install the ExecuTorch Python package and runtime bindings. - -```bash -cd executorch -./install_executorch.bat -``` - -> **_NOTE_** Many components are not currently buildable on Windows. These instructions install a very minimal ExecuTorch which can be used as a sanity check. +ExecuTorch automatically detects and enables [ccache](https://ccache.dev/) if it's installed. This significantly speeds up recompilation by caching previously compiled objects: -### Build the C++ Runtime +- If ccache is detected, you'll see: `ccache found and enabled for faster builds` +- If ccache is not installed, you'll see: `ccache not found, builds will not be cached` +To install ccache: ```bash -del -Recurse -Force cmake-out; ` -cmake . ` - -DCMAKE_INSTALL_PREFIX=cmake-out ` - -DPYTHON_EXECUTABLE=$miniconda_dir\\envs\\et\\python.exe ` - -DCMAKE_PREFIX_PATH=$miniconda_dir\\envs\\et\\Lib\\site-packages ` - -DCMAKE_BUILD_TYPE=Release ` - -DEXECUTORCH_BUILD_EXTENSION_TENSOR=ON ` - -DEXECUTORCH_BUILD_FLATC=ON ` - -DEXECUTORCH_BUILD_PYBIND=OFF ` - -DEXECUTORCH_BUILD_XNNPACK=ON ` - -DEXECUTORCH_BUILD_KERNELS_LLM=ON ` - -DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON ` - -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON ` - -DEXECUTORCH_ENABLE_LOGGING=ON ` - -T ClangCL ` - -Bcmake-out; ` -cmake --build cmake-out -j64 --target install --config Release -``` - -> **_NOTE_** `$miniconda_dir` is the directory where you installed miniconda. This is `“C:\Users\\AppData\Local”` by default. - -### Running an Example Model - -To validate the installation by running a model, create a file named export_mv2.py. Then, run the powershell commands to export and run the model. -The expected output is a tensor of size 1x1000, containing class scores. 
- -```py -# export_mv2.py -import torch -from executorch.exir import to_edge_transform_and_lower -from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner -from torchvision.models import mobilenet_v2 -from torchvision.models.mobilenetv2 import MobileNet_V2_Weights - -mv2 = mobilenet_v2(weights=MobileNet_V2_Weights.DEFAULT).eval() -example_inputs = (torch.randn((1, 3, 224, 224)),) - -program = to_edge_transform_and_lower( - torch.export.export(model, example_inputs) -).to_executorch() - -with open("mv2_xnnpack.pte", "wb") as file: - executorch_program.write_to_file(file) -``` +# Ubuntu/Debian +sudo apt install ccache -```bash -python .\\export_mv2.py -.\\cmake-out\\backends\\xnnpack\\Release\\xnn_executor_runner.exe --model_path=.\\mv2_xnnpack.pte -``` +# macOS +brew install ccache -```bash -Output 0: tensor(sizes=[1, 1000], [ - -0.50986, 0.30064, 0.0953904, 0.147726, 0.231205, 0.338555, 0.206892, -0.0575775, … ]) +# CentOS/RHEL +sudo yum install ccache +# or +sudo dnf install ccache ``` -## Next Steps +No additional configuration is needed - the build system will automatically use ccache when available. -* [Selective Build](kernel-library-selective-build.md) to link only kernels used by the program. This can provide significant binary size savings. -* Tutorials on building [Android](https://github.com/meta-pytorch/executorch-examples/tree/main/dl3/android/DeepLabV3Demo#executorch-android-demo-app) and [iOS](https://github.com/meta-pytorch/executorch-examples/tree/main/mv3/apple/ExecuTorchDemo) demo apps. -* Tutorials on deploying applications to embedded devices such as [ARM Cortex-M/Ethos-U](backends-arm-ethos-u.md) and [XTensa HiFi DSP](backends-cadence.md). +See [CMakeLists.txt](https://github.com/pytorch/executorch/blob/main/CMakeLists.txt)
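To confirm the cache is actually being hit on subsequent builds, ccache's standard statistics flags are useful:

```bash
ccache -z                     # Zero the statistics counters.
cmake --build cmake-out -j9   # Rebuild; compilations should now hit the cache.
ccache -s                     # Show cache hit/miss statistics.
```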