Cache flatc binary and schema extraction to fix 3x fbpkg export slowdown by navsud · Pull Request #19104 · pytorch/executorch

navsud · 2026-04-24T03:30:24Z

Summary:
When running from a standalone PAR file (e.g. via fbpkg), the flatc binary
used for XNNPACK flatbuffer serialization is extracted from the PAR zip archive
on every invocation via importlib.resources.as_file(). For one of llama
transformer XNNPack exports, this happened ~225 times (once per XNNPACK
partition), adding ~8.5 seconds per extraction from the 3.4 GB PAR archive
— a total of ~32 minutes of pure I/O overhead.

Changes

`executorch/exir/_serialize/_flatbuffer.py`

Added _get_flatc_path() which caches the extracted flatc binary path
using a module-level contextlib.ExitStack. The ExitStack keeps the
importlib.resources.as_file() context manager alive for the process
lifetime, preventing the temp file from being cleaned up between calls.
Simplified _run_flatc() to use the cached path directly.

`executorch/backends/xnnpack/serialization/xnnpack_graph_serialize.py`

Added _cached_schema_bytes module-level cache for schema.fbs content,
avoiding repeated importlib.resources.read_bytes() calls from the PAR
archive.

Differential Revision: D102214303

pytorch-bot · 2026-04-24T03:30:28Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19104

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

Rolling out OSDC (ARC) runners on pull & trunk workflows in PyTorch main

✅ You can merge normally! (3 Unrelated Failures)

As of commit ad040eb with merge base c3f3d12 ():

FLAKY - The following job failed but was likely due to flakiness present on trunk:

pull / test-coreml-bc-macos (macos-m2-stable) / macos-job (gh) (matched macos rule in flaky-rules.json)
File doesn't exist

BROKEN TRUNK - The following jobs failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

pull / unittest / windows / windows-job (gh) (trunk failure)
##[error]The operation was canceled.
pull / unittest-editable / windows / windows-job (gh) (trunk failure)
##[error]The operation was canceled.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-codesync · 2026-04-24T03:30:32Z

@navsud has exported this pull request. If you are a Meta employee, you can view the originating Diff in D102214303.

Summary: When running from a standalone PAR file (e.g. via fbpkg), the `flatc` binary used for XNNPACK flatbuffer serialization is extracted from the PAR zip archive on every invocation via `importlib.resources.as_file()`. For one of llama transformer XNNPack exports, this happened ~225 times (once per XNNPACK partition), adding ~8.5 seconds per extraction from the 3.4 GB PAR archive — a total of **~32 minutes** of pure I/O overhead. ## Changes ### `executorch/exir/_serialize/_flatbuffer.py` - Added `_get_flatc_path()` which caches the extracted `flatc` binary path using a module-level `contextlib.ExitStack`. The `ExitStack` keeps the `importlib.resources.as_file()` context manager alive for the process lifetime, preventing the temp file from being cleaned up between calls. - Simplified `_run_flatc()` to use the cached path directly. ### `executorch/backends/xnnpack/serialization/xnnpack_graph_serialize.py` - Added `_cached_schema_bytes` module-level cache for `schema.fbs` content, avoiding repeated `importlib.resources.read_bytes()` calls from the PAR archive. Differential Revision: D102214303

mergennachin · 2026-04-24T19:54:37Z

@claude Review this PR

claude · 2026-04-24T19:55:19Z

Claude finished @mergennachin's task in 1m 17s —— View job

Review of PR #19104: Cache flatc binary and schema extraction

Read changed files and understand the diff
Analyze caching approach for correctness and safety
Check for edge cases and potential issues
Post review feedback

Summary

This PR addresses a real and significant performance problem: extracting the flatc binary from a 3.4 GB PAR archive ~225 times adds ~32 minutes of I/O overhead. The caching approach is straightforward and the improvement is substantial. Two caches are introduced:

_flatbuffer.py: Caches the extracted flatc binary path using a module-level ExitStack to keep the temp file alive.
xnnpack_graph_serialize.py: Caches the schema.fbs bytes in a module-level variable.

Issues

1. Thread safety — `_get_flatc_path()` has a race condition

_get_flatc_path() (exir/_serialize/_flatbuffer.py:251-277) checks _flatc_cached_path is not None without any locking. If two threads call this concurrently before the cache is populated, both will create an ExitStack and extract the binary. The second thread will overwrite _flatc_exit_stack, orphaning the first ExitStack (which means its temp file may be cleaned up while the first thread's _flatc_cached_path still points to it).

This may not be a problem today if export is single-threaded, but it's worth noting. If thread safety isn't a concern for this code path, a brief comment saying so would be helpful.

Fix this →

2. Same race in `_cached_schema_bytes`

xnnpack_graph_serialize.py:322-327 — same pattern: two threads could both see _cached_schema_bytes is None and both call read_bytes(). This one is benign (both will read the same bytes, no resource leak), but it's the same class of issue.

3. `_flatc_exit_stack` is never closed

The ExitStack at _flatbuffer.py:248 keeps the temp file alive for the process lifetime, which is the intended behavior. However, the temp file is never explicitly cleaned up — it relies on OS-level cleanup at process exit. This is fine for normal use, but consider registering an atexit handler to close the ExitStack for cleanliness. Not blocking, just a suggestion.

Nits

_flatbuffer.py:286: [_get_flatc_path()] + list(args) — the list(args) conversion is only needed because args is typed as Sequence[str]. This is fine, just noting it's pre-existing.
xnnpack_graph_serialize.py:303: _cached_schema_bytes: Optional[bytes] = None is separated from the related _delegate_instance_id by a blank line, which is good for readability. The variable placement at module scope is clean.

Looks Good

The ExitStack approach for keeping the as_file() context alive is the right pattern — it's the documented way to extend the lifetime of extracted resources.
The schema bytes cache in XNNPACK is simple and effective — read_bytes() is idempotent so even the theoretical race is harmless.
The _run_flatc simplification is a nice cleanup — separating path resolution from command execution.
The performance improvement (eliminating ~32 min of overhead) is significant.

Overall this is a clean, well-motivated change. The thread safety point is the main thing to consider — whether it matters depends on whether this code path can be hit concurrently.

navsud requested review from JacobSzwejbka, digantdesai and larryliu0820 as code owners April 24, 2026 03:30

meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 24, 2026

meta-codesync Bot added fb-exported meta-exported labels Apr 24, 2026

navsud added the release notes: none Do not include this in the release notes label Apr 24, 2026

navsud force-pushed the export-D102214303 branch from fc68795 to ad040eb Compare April 24, 2026 03:37

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cache flatc binary and schema extraction to fix 3x fbpkg export slowdown#19104

Cache flatc binary and schema extraction to fix 3x fbpkg export slowdown#19104
navsud wants to merge 1 commit intopytorch:mainfrom
navsud:export-D102214303

navsud commented Apr 24, 2026 •

edited by meta-codesync Bot

Loading

Uh oh!

pytorch-bot Bot commented Apr 24, 2026 •

edited

Loading

Uh oh!

meta-codesync Bot commented Apr 24, 2026

Uh oh!

mergennachin commented Apr 24, 2026

Uh oh!

claude Bot commented Apr 24, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

navsud commented Apr 24, 2026 • edited by meta-codesync Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changes

executorch/exir/_serialize/_flatbuffer.py

executorch/backends/xnnpack/serialization/xnnpack_graph_serialize.py

Uh oh!

pytorch-bot Bot commented Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19104

❗ 1 Active SEVs

✅ You can merge normally! (3 Unrelated Failures)

Uh oh!

meta-codesync Bot commented Apr 24, 2026

Uh oh!

mergennachin commented Apr 24, 2026

Uh oh!

claude Bot commented Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review of PR #19104: Cache flatc binary and schema extraction

Summary

Issues

1. Thread safety — _get_flatc_path() has a race condition

2. Same race in _cached_schema_bytes

3. _flatc_exit_stack is never closed

Nits

Looks Good

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

navsud commented Apr 24, 2026 •

edited by meta-codesync Bot

Loading

`executorch/exir/_serialize/_flatbuffer.py`

`executorch/backends/xnnpack/serialization/xnnpack_graph_serialize.py`

pytorch-bot Bot commented Apr 24, 2026 •

edited

Loading

claude Bot commented Apr 24, 2026 •

edited

Loading

1. Thread safety — `_get_flatc_path()` has a race condition

2. Same race in `_cached_schema_bytes`

3. `_flatc_exit_stack` is never closed