Merged
72 changes: 72 additions & 0 deletions .claude/skills/binary-size/SKILL.md
@@ -0,0 +1,72 @@
---
name: binary-size
description: Analyze and reduce ExecuTorch binary size. Use when investigating binary size, running size tests, or optimizing the runtime for size-constrained deployments.
---

# Binary Size

## Start from the `main` branch of executorch
Ask the user where the executorch repo is.

```bash
git checkout main && git pull
```

## Build and measure baseline
```bash
conda activate executorch
bash test/build_size_test.sh
strip -o /tmp/size_test_stripped cmake-out/test/size_test
strip -o /tmp/size_test_all_ops_stripped cmake-out/test/size_test_all_ops
ls -la /tmp/size_test_stripped /tmp/size_test_all_ops_stripped
```

Produces two binaries:
- `cmake-out/test/size_test` — ExecuTorch runtime without operator implementations
- `cmake-out/test/size_test_all_ops` — ExecuTorch runtime with portable ops

## Analyze with bloaty
```bash
bloaty cmake-out/test/size_test -d symbols -n 30 # by symbol
bloaty cmake-out/test/size_test -d sections # by ELF section
bloaty <after> -- <before> # diff two builds
nm -S --size-sort -r <binary> | head -30 # symbol sizes, largest first (nm prints sizes in hex, so `sort -n` misorders them)
strings <binary> | less # string literals in .rodata
```

Note: `bloaty -d compileunits` requires debug info (`-g`). The Release build does not include it.

## Key build flags
Set by `test/build_size_test.sh`:
- `CMAKE_BUILD_TYPE=Release`
- `EXECUTORCH_OPTIMIZE_SIZE=ON` — enables `-Os`, `-fno-exceptions`, `-fno-rtti`, unwind table suppression
- `CXXFLAGS="-fno-exceptions -fno-rtti -Wall -Werror"`
Comment on lines +42 to +43
Copilot AI Mar 7, 2026

This “Key build flags” list is incomplete compared with `test/build_size_test.sh`, which also sets `-Wno-int-in-bool-context` and `-DET_HAVE_PREAD=0` via `COMMON_CXXFLAGS`. Consider updating this section to list every flag the script actually uses, and clarify which flags come from `EXECUTORCH_OPTIMIZE_SIZE` and which from the explicit `CXXFLAGS`.

Suggested change (the first two bullets are the current text; the remaining bullets are the proposed replacement):
- `EXECUTORCH_OPTIMIZE_SIZE=ON` — enables `-Os`, `-fno-exceptions`, `-fno-rtti`, unwind table suppression
- `CXXFLAGS="-fno-exceptions -fno-rtti -Wall -Werror"`
- `EXECUTORCH_OPTIMIZE_SIZE=ON`
  - Typically enables: `-Os`, `-fno-exceptions`, `-fno-rtti`, and unwind table suppression
- Explicit compiler flags (`CXXFLAGS`/`COMMON_CXXFLAGS`):
  - `-fno-exceptions -fno-rtti -Wall -Werror`
  - `-Wno-int-in-bool-context`
  - `-DET_HAVE_PREAD=0`

## Constraints
- Use **CMake** to build (not Buck)
- **C++17 minimum** language standard
- Must build on **GCC 9** (CI uses `executorch-ubuntu-22.04-gcc9-nopytorch`) and **Clang 12** — avoid compiler-specific flags or pragmas without version guards
- Do not regress existing functionality — run tests for modified files
- Do not change build flags in `build_size_test.sh` for size reductions
- Do not increase latency in the core runtime

## Where to look for size reductions
- `.text`: look for large functions, template bloat, duplicate instantiations
- `.rodata`: verbose error messages, format strings, embedded file paths (`__FILE__`)
- `.eh_frame`: should already be suppressed when `EXECUTORCH_OPTIMIZE_SIZE=ON`
- Static init functions (`nm -S <binary> | grep GLOBAL__sub_I`): use `constexpr` constructors to constant-initialize static arrays
- Logging strings: `ET_LOG_ENABLED=0` in Release eliminates format strings; ensure it propagates to consumers via `PUBLIC` compile definitions on cmake targets
- Inline header functions: watch for compile-define mismatches between library and consumer TUs (e.g. `ET_LOG_ENABLED` set in library but not in consumer)
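
The static-initializer and `.rodata` checks listed above can be scripted. The following is a minimal sketch, not part of the repository: the helper names `count_static_inits` and `find_source_paths` are hypothetical, and the extension filter is only a heuristic for strings left behind by `__FILE__`.

```bash
#!/usr/bin/env bash
# Sketch only: both helper names are hypothetical, not ExecuTorch scripts.

# Count static-initializer functions in `nm -S` output read from stdin.
count_static_inits() {
  # grep -c exits non-zero when the count is 0, so mask that exit status.
  grep -c 'GLOBAL__sub_I' || true
}

# Print unique strings that end in a C/C++ source extension, a heuristic
# for paths embedded via __FILE__. Reads `strings` output from stdin.
find_source_paths() {
  grep -E '\.(c|cc|cpp|h|hpp)$' | sort -u
}

# Usage against a real build (binary path taken from the section above):
#   nm -S cmake-out/test/size_test | count_static_inits
#   strings cmake-out/test/size_test | find_source_paths
```

Both helpers read from stdin so they can be run against either size-test binary, or against saved `nm` and `strings` output from an earlier build.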

## For each change
1. Create a branch: `git checkout -b binary-size-<N>`
2. Implement, rebuild, measure stripped sizes
3. Create a separate PR — one logical change per PR
4. Record results in `binary-size-<N>.md`:

| Binary | This change (N vs N-1) | Cumulative (N vs main) |
|---|---|---|
| `size_test` (stripped) | -X | -Y |
| `size_test_all_ops` (stripped) | -X | -Y |

5. Update the CI size threshold in `.github/workflows/pull.yml` if sizes decrease
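
As a hedged illustration of step 4, the two deltas in each table row can be computed from three byte counts. `size_row` is a hypothetical helper, and the sizes in the usage comment are placeholders, not measurements.

```bash
#!/usr/bin/env bash
# Sketch only: `size_row` is a hypothetical helper, not an ExecuTorch script.

# Print one Markdown table row for binary-size-<N>.md.
#   $1 binary name
#   $2 stripped size on main (bytes)
#   $3 stripped size at change N-1 (bytes)
#   $4 stripped size at change N (bytes)
size_row() {
  local name="$1" main_size="$2" prev_size="$3" cur_size="$4"
  # %+d always prints a sign, so reductions show up as negative deltas.
  printf '| `%s` (stripped) | %+d | %+d |\n' \
    "$name" "$(( cur_size - prev_size ))" "$(( cur_size - main_size ))"
}

# Usage with placeholder sizes (not real measurements):
#   size_row size_test 1000 900 850
#   prints: | `size_test` (stripped) | -50 | -150 |
```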
9 changes: 4 additions & 5 deletions .github/workflows/pull.yml
@@ -475,10 +475,8 @@ jobs:
output=$(ls -la cmake-out/test/size_test)
arr=($output)
size=${arr[4]}
Comment on lines 475 to 477
Copilot AI Mar 7, 2026

The workflow parses the file size from `ls -la` output (`arr=($output); size=${arr[4]}`), which is brittle (the format can vary with locale or ACL markers) and can misread the size. Prefer `stat` (e.g., bytes via `stat -c%s <file>`) to make this check robust.
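
A minimal sketch of the `stat`-based check this comment describes. It assumes GNU coreutils `stat` (BSD/macOS `stat` uses `-f%z` instead); the helper names and the failure message are hypothetical and are not taken from the workflow.

```bash
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical; assumes GNU coreutils `stat`.

# Print the size of file "$1" in bytes without parsing `ls` output.
file_size_bytes() {
  stat -c%s "$1"
}

# Succeed if file "$1" is at most "$2" bytes; return 1 if it is larger,
# or 2 if the size could not be read at all.
check_size_threshold() {
  local file="$1" threshold="$2" size
  size=$(file_size_bytes "$file") || return 2
  if [[ "$size" -le "$threshold" ]]; then
    echo "Success $size <= $threshold"
  else
    echo "Size check failed: $size > $threshold"
    return 1
  fi
}

# Usage in the workflow step (threshold value taken from this diff):
#   check_size_threshold cmake-out/test/size_test 48500
```

Returning a distinct status when `stat` fails keeps a missing binary from being mistaken for a size of zero.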

- # threshold=48120 on devserver with gcc9
- # todo(lfq): update once binary size is below 50kb.
- # Note: using gcc9-nopytorch container with pinned nightly PyTorch
- threshold="63785"
+ # Current CI size: 48008 (gcc9-nopytorch, 2026-03-06)
+ threshold="48500"
Comment on lines +478 to +479
Copilot AI Mar 7, 2026

The new gcc9 threshold (48,500) is very close to the observed size (48,008). This small buffer risks CI flakiness when toolchain/image deps change slightly; consider leaving a larger margin (or documenting why ~500 bytes is sufficient).

Suggested change (the first two lines are the current text; the last two are the proposed replacement):
# Current CI size: 48008 (gcc9-nopytorch, 2026-03-06)
threshold="48500"
# Current CI size: 48008 (gcc9-nopytorch, 2026-03-06); leave ~2KB headroom to avoid CI flakiness
threshold="50000"
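
To put numbers on the margin under discussion, the headroom can be computed from the sizes quoted in this diff. This is a small sketch; `headroom` is a hypothetical helper, and it assumes the threshold is at least as large as the measured size.

```bash
#!/usr/bin/env bash
# Sketch only: `headroom` is a hypothetical helper, not part of the workflow.

# Print the gap between a measured size ($1) and a threshold ($2),
# in bytes and as a percentage of the measured size (one decimal, truncated).
headroom() {
  local size="$1" threshold="$2"
  local bytes=$(( threshold - size ))
  local tenths=$(( bytes * 1000 / size ))  # tenths of a percent
  printf '%d bytes (%d.%d%%)\n' "$bytes" "$(( tenths / 10 ))" "$(( tenths % 10 ))"
}

headroom 48008 48500  # gcc9, threshold in this PR: 492 bytes (1.0%)
headroom 48008 50000  # gcc9, suggested threshold:  1992 bytes (4.1%)
headroom 44160 45000  # clang12, threshold in this PR: 840 bytes (1.9%)
```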

if [[ "$size" -le "$threshold" ]]; then
echo "Success $size <= $threshold"
else
@@ -513,7 +511,8 @@ jobs:
output=$(ls -la cmake-out/test/size_test)
arr=($output)
size=${arr[4]}
- threshold="51752"
+ # Current CI size: 44160 (clang12, 2026-03-06)
+ threshold="45000"
if [[ "$size" -le "$threshold" ]]; then
echo "Success $size <= $threshold"
else
1 change: 1 addition & 0 deletions CLAUDE.md
@@ -6,6 +6,7 @@
- `/building` - Build runners or C++ libs
- `/profile` - Profile execution
- `/cortex-m` - Build, test, or develop the Cortex-M backend
+ - `/binary-size` - Analyze and reduce binary size

Reference docs in `.claude/`: backends, runtime-api, quantization, llm-export, faq, tokenizers
