pytorchbot

Summary

We are seeing pthreadpool-related crashes on Mac when running with pybindings. This appears to be due to XNNPACK using the Google fork of pthreadpool and extension/threadpool using the pthreadpool in libtorch_cpu. See #14321 for more details.

Beyond the obvious One Definition Rule (ODR) issues, the specific failure happens because the pthreadpool functions in the copy of pthreadpool built with ET are marked as weak on Apple platforms. The functions are not marked as weak in the source code or in the build, and the behavior appears to be specific to Apple's toolchain.

Weak symbols are compiled as indirect calls and can be overridden at runtime by strong symbols in another dylib. For reasons that I don't fully understand, the pthreadpool symbols in libtorch_cpu are strong. Also, the calls in XNNPACK prefer the symbols from the local pthreadpool.

This PR works around the issue by building pthreadpool with -fvisibility=hidden, which keeps the symbols from being exposed in the final dylib, so they do not end up in the symbol table as indirect symbols. Instead, the call to pthreadpool_create in extension_threadpool is compiled as a direct call to the pthreadpool_create in the pthreadpool built by executorch.
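To make the difference concrete, here is a toy Python model of the behavior described above. It is only a sketch of this description, not of dyld's actual lookup algorithm: the `Image` class, the `resolve` function, and the image name `et_pybindings` are all hypothetical and exist purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Image:
    """A loaded binary and the binding of each symbol it defines.

    Bindings are "weak", "strong", or "hidden" (toy categories).
    """
    name: str
    defs: dict = field(default_factory=dict)


def resolve(symbol, caller, loaded):
    """Return the name of the image whose definition a call would reach.

    Toy rules taken from the PR description, not from dyld itself:
    - a weak definition is called indirectly, so a strong definition of
      the same symbol in another loaded image overrides it;
    - a hidden (or strong) definition is called directly and stays local.
    """
    if caller.defs.get(symbol) == "weak":
        for image in loaded:
            if image is not caller and image.defs.get(symbol) == "strong":
                return image.name
    return caller.name


torch = Image("libtorch_cpu", {"pthreadpool_create": "strong"})

# Before the change: the ET copy is weak, so the call is redirected.
et_weak = Image("et_pybindings", {"pthreadpool_create": "weak"})
print(resolve("pthreadpool_create", et_weak, [torch, et_weak]))
# prints "libtorch_cpu"

# After the change: the ET copy is hidden, so the call stays local.
et_hidden = Image("et_pybindings", {"pthreadpool_create": "hidden"})
print(resolve("pthreadpool_create", et_hidden, [torch, et_hidden]))
# prints "et_pybindings"
```

Note that both images still define the symbol in the second case; the model only changes which definition the call reaches.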

This isn't a proper fix for the issue, as there are still two pthreadpool implementations in the process whenever we link libtorch_cpu. However, it does appear to mitigate the symptoms and thus prevent crashes. Long-term, we'll need to find a proper solution, such as namespacing the pthreadpool fork.

Test plan

In addition to validating this change on CI (including trunk CI), I manually verified the fix by testing the repro in #14321 before and after the change. I verified that ASan does not trip upon resetting the threadpool. I also verified with nm and otool that pthreadpool_create does not show up in the indirect symbol table, and thus cannot (to my knowledge) be overridden at runtime by the implementation in libtorch_cpu.
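A check along these lines can be scripted. The sketch below parses nm-style text and reports which defined symbols are externally visible, relying on nm's convention that an uppercase type letter marks an external symbol and a lowercase one marks a local symbol. It is a related check on symbol visibility, not the indirect symbol table inspection done with otool. The helper name `exported_symbols` and the sample listing are made up for illustration and are not output from the actual build.

```python
def exported_symbols(nm_output):
    """Return the set of defined, externally visible symbols in nm-style text."""
    exported = set()
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) == 3:
            # Defined symbol: address, type letter, name.
            _, kind, name = parts
        elif len(parts) == 2:
            # Undefined symbol: type letter, name (no address column).
            kind, name = parts
        else:
            continue
        # Uppercase means external; "U" means undefined in this binary.
        if kind.isupper() and kind != "U":
            exported.add(name)
    return exported


# Illustrative listing only: a hidden symbol shows up with a lowercase "t".
SAMPLE = """\
0000000000003f50 T _exported_function
0000000000003f70 t _pthreadpool_create
                 U _malloc
"""

print("_pthreadpool_create" in exported_symbols(SAMPLE))
# prints "False"
```

To use it on a real build, pass in the text produced by running nm on the library in question.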

(cherry picked from commit 1da530d)

pytorch-bot bot commented Oct 9, 2025

⏳ No Failures, 4 Pending

As of commit 13eae38 with merge base e0dda90 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed label Oct 9, 2025
@GregoryComer GregoryComer merged commit ac5de26 into release/1.0 Oct 9, 2025
202 of 214 checks passed
@GregoryComer GregoryComer deleted the cherry-pick-14838-by-pytorch_bot_bot_ branch October 9, 2025 21:23