From 14f3ec6d8f40cd5c8e9dcde7b620bb02014dcc30 Mon Sep 17 00:00:00 2001
From: Github Executorch
Date: Fri, 6 Mar 2026 16:50:50 -0800
Subject: [PATCH 1/2] binary size skill

---
 .claude/skills/binary-size/SKILL.md | 59 +++++++++++++++++++++++++++++
 CLAUDE.md                           |  1 +
 2 files changed, 60 insertions(+)
 create mode 100644 .claude/skills/binary-size/SKILL.md

diff --git a/.claude/skills/binary-size/SKILL.md b/.claude/skills/binary-size/SKILL.md
new file mode 100644
index 00000000000..1ce84d6eb63
--- /dev/null
+++ b/.claude/skills/binary-size/SKILL.md
@@ -0,0 +1,59 @@
+---
+name: binary-size
+description: Analyze and reduce ExecuTorch binary size. Use when investigating binary size, running size tests, or optimizing the runtime for size-constrained deployments.
+---
+
+# Binary Size
+
+## Build and measure
+```bash
+conda activate executorch
+bash test/build_size_test.sh
+strip -o /tmp/size_test_stripped cmake-out/test/size_test
+strip -o /tmp/size_test_all_ops_stripped cmake-out/test/size_test_all_ops
+ls -la /tmp/size_test_stripped /tmp/size_test_all_ops_stripped
+```
+
+Produces two binaries:
+- `cmake-out/test/size_test` — ExecuTorch runtime without operator implementations
+- `cmake-out/test/size_test_all_ops` — ExecuTorch runtime with portable ops
+
+## Analyze with bloaty
+```bash
+bloaty cmake-out/test/size_test -d compileunits # by source file
+bloaty cmake-out/test/size_test -d symbols -n 30 # by symbol
+bloaty cmake-out/test/size_test -d sections # by ELF section
+bloaty -- # diff two builds
+```
+
+Also useful: `nm -S | sort -k2 -rn | head -30` for symbol sizes.
+
+## Key build flags
+Set by `test/build_size_test.sh`:
+- `CMAKE_BUILD_TYPE=Release`
+- `EXECUTORCH_OPTIMIZE_SIZE=ON` — enables `-Os`, `-fno-exceptions`, `-fno-rtti`, unwind table suppression
+- `CXXFLAGS="-fno-exceptions -fno-rtti -Wall -Werror"`
+
+## Constraints
+- Use **CMake** to build (not Buck)
+- **C++17 minimum** language standard
+- Must build on **GCC 9** (CI uses `executorch-ubuntu-22.04-gcc9-nopytorch`) and **Clang 12** — avoid compiler-specific flags or pragmas without version guards
+- Do not regress existing functionality — run tests for modified files
+- Do not change build flags in `build_size_test.sh` for size reductions
+- Do not increase latency in the core runtime
+
+## Where to look for size reductions
+- `.text`: `bloaty -d symbols` — look for large functions, template bloat, duplicate code
+- `.rodata`: `strings ` — look for verbose error messages, format strings, file paths
+- `.eh_frame`: should be suppressed when `EXECUTORCH_OPTIMIZE_SIZE=ON`
+- Static init functions: `nm -S | grep GLOBAL__sub_I` — constexpr constructors can eliminate these
+- Logging: `ET_LOG_ENABLED=0` in Release builds eliminates format strings; ensure it propagates to consumers via `PUBLIC` compile definitions
+- Inline header functions: watch for define mismatches between library and consumer TUs
+
+## Document each change
+Create `binary-size.md` with:
+
+| Binary | This change (N vs N-1) | Cumulative (N vs main) |
+|---|---|---|
+| `size_test` (stripped) | -X | -Y |
+| `size_test_all_ops` (stripped) | -X | -Y |
diff --git a/CLAUDE.md b/CLAUDE.md
index a4b7aad0252..8cb29af5d4d 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -6,6 +6,7 @@
 - `/building` - Build runners or C++ libs
 - `/profile` - Profile execution
 - `/cortex-m` - Build, test, or develop the Cortex-M backend
+- `/binary-size` - Analyze and reduce binary size
 
 Reference docs in `.claude/`: backends, runtime-api, quantization, llm-export, faq, tokenizers
 

From 997ff1695880bba5f51b6a0fb60f15f2d384913d Mon Sep 17 00:00:00 2001
From: Github Executorch
Date: Fri, 6 Mar 2026 17:06:48 -0800
Subject: [PATCH 2/2] add binary size skill

---
 .claude/skills/binary-size/SKILL.md | 35 ++++++++++++++++++++---------
 .github/workflows/pull.yml          |  9 ++++----
 2 files changed, 28 insertions(+), 16 deletions(-)

diff --git a/.claude/skills/binary-size/SKILL.md b/.claude/skills/binary-size/SKILL.md
index 1ce84d6eb63..bbc9c03d668 100644
--- a/.claude/skills/binary-size/SKILL.md
+++ b/.claude/skills/binary-size/SKILL.md
@@ -5,7 +5,14 @@ description: Analyze and reduce ExecuTorch binary size. Use when investigating b
 
 # Binary Size
 
-## Build and measure
+## Start from the `main` branch of executorch
+Ask the user where the executorch repo is.
+
+```bash
+git checkout main && git pull
+```
+
+## Build and measure baseline
 ```bash
 conda activate executorch
 bash test/build_size_test.sh
@@ -20,13 +27,14 @@ Produces two binaries:
 
 ## Analyze with bloaty
 ```bash
-bloaty cmake-out/test/size_test -d compileunits # by source file
 bloaty cmake-out/test/size_test -d symbols -n 30 # by symbol
 bloaty cmake-out/test/size_test -d sections # by ELF section
 bloaty -- # diff two builds
+nm -S | sort -k2 -rn | head -30 # symbol sizes
+strings | less # string literals in .rodata
 ```
 
-Also useful: `nm -S | sort -k2 -rn | head -30` for symbol sizes.
+Note: `bloaty -d compileunits` requires debug info (`-g`). The Release build does not include it.
 
 ## Key build flags
 Set by `test/build_size_test.sh`:
@@ -43,17 +51,22 @@ Set by `test/build_size_test.sh`:
 - Do not increase latency in the core runtime
 
 ## Where to look for size reductions
-- `.text`: `bloaty -d symbols` — look for large functions, template bloat, duplicate code
-- `.rodata`: `strings ` — look for verbose error messages, format strings, file paths
-- `.eh_frame`: should be suppressed when `EXECUTORCH_OPTIMIZE_SIZE=ON`
-- Static init functions: `nm -S | grep GLOBAL__sub_I` — constexpr constructors can eliminate these
-- Logging: `ET_LOG_ENABLED=0` in Release builds eliminates format strings; ensure it propagates to consumers via `PUBLIC` compile definitions
-- Inline header functions: watch for define mismatches between library and consumer TUs
+- `.text`: look for large functions, template bloat, duplicate instantiations
+- `.rodata`: verbose error messages, format strings, embedded file paths (`__FILE__`)
+- `.eh_frame`: should already be suppressed when `EXECUTORCH_OPTIMIZE_SIZE=ON`
+- Static init functions (`nm -S | grep GLOBAL__sub_I`): use `constexpr` constructors to constant-initialize static arrays
+- Logging strings: `ET_LOG_ENABLED=0` in Release eliminates format strings; ensure it propagates to consumers via `PUBLIC` compile definitions on cmake targets
+- Inline header functions: watch for compile-define mismatches between library and consumer TUs (e.g. `ET_LOG_ENABLED` set in library but not in consumer)
 
-## Document each change
-Create `binary-size.md` with:
+## For each change
+1. Create a branch: `git checkout -b binary-size-`
+2. Implement, rebuild, measure stripped sizes
+3. Create a separate PR — one logical change per PR
+4. Record results in `binary-size-.md`:
 
 | Binary | This change (N vs N-1) | Cumulative (N vs main) |
 |---|---|---|
 | `size_test` (stripped) | -X | -Y |
 | `size_test_all_ops` (stripped) | -X | -Y |
+
+5. Update the CI size threshold in `.github/workflows/pull.yml` if sizes decrease
diff --git a/.github/workflows/pull.yml b/.github/workflows/pull.yml
index 045659bc779..d88996ff8cb 100644
--- a/.github/workflows/pull.yml
+++ b/.github/workflows/pull.yml
@@ -475,10 +475,8 @@ jobs:
 output=$(ls -la cmake-out/test/size_test)
 arr=($output)
 size=${arr[4]}
-# threshold=48120 on devserver with gcc9
-# todo(lfq): update once binary size is below 50kb.
-# Note: using gcc9-nopytorch container with pinned nightly PyTorch
-threshold="63785"
+# Current CI size: 48008 (gcc9-nopytorch, 2026-03-06)
+threshold="48500"
 if [[ "$size" -le "$threshold" ]]; then
 echo "Success $size <= $threshold"
 else
@@ -513,7 +511,8 @@ jobs:
 output=$(ls -la cmake-out/test/size_test)
 arr=($output)
 size=${arr[4]}
-threshold="51752"
+# Current CI size: 44160 (clang12, 2026-03-06)
+threshold="45000"
 if [[ "$size" -le "$threshold" ]]; then
 echo "Success $size <= $threshold"
 else
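
The "measure stripped sizes" step and the `-X` / `-Y` cells of the results table could be filled in with a small helper along the following lines. This is only a sketch: `file_size` and `size_delta` are hypothetical names that appear in neither the patch nor the ExecuTorch repository, and the `/tmp/...` paths in the comments are the ones the skill's build-and-measure step writes to.

```bash
#!/usr/bin/env bash
# Sketch only: file_size and size_delta are hypothetical helper names,
# not part of this patch or of the ExecuTorch repository.

# Print the size of a file in bytes (wc -c avoids parsing `ls -la` columns).
file_size() {
  wc -c < "$1" | tr -d ' '
}

# Print the signed byte difference between a baseline size and a new size.
# A negative value means the binary shrank.
size_delta() {
  local before=$1 after=$2
  printf '%+d\n' "$((after - before))"
}

# Example use, assuming the stripped binary from the skill's build step:
#   base=$(file_size /tmp/size_test_stripped)   # measured on main
#   ... apply the change, rebuild, strip again ...
#   new=$(file_size /tmp/size_test_stripped)
#   size_delta "$base" "$new"
```

Byte counts from `wc -c` match the size column that the CI check reads from `ls -la`, so the two measurements should be directly comparable.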