fix(build): work around MSVC std::mutex ABI mismatch on Windows wheels#279
Merged
XuehaiPan merged 3 commits intometaopt:mainfrom May 5, 2026
Merged
Conversation
std::mutex ABI mismatch on Windows wheels
ff43af3 to
866a308
Compare
2 tasks
3 tasks
866a308 to
e1e0628
Compare
e1e0628 to
c56f319
Compare
…GLIBCXX_USE_CXX11_ABI` style
Move the macro definition from a per-target `target_compile_definitions` in
`src/CMakeLists.txt` to a project-wide `add_definitions` in the root
`CMakeLists.txt`, parallel to `_GLIBCXX_USE_CXX11_ABI`. Add the macro to CI
workflow env, the cibuildwheel `environment-pass` list, and the Makefile
defaults so the same name flows through every input channel.
Also gain a documented disable channel: setting either
`_DISABLE_CONSTEXPR_MUTEX_CONSTRUCTOR=` (empty env var) or
`cmake -D_DISABLE_CONSTEXPR_MUTEX_CONSTRUCTOR=` skips the `add_definitions`
call entirely, leaving the macro undefined so MSVC's `#ifdef` resolves to
false and the constexpr mutex layout is used. The dispatcher uses
`DEFINED ENV{X} OR DEFINED X` so both input channels behave symmetrically.
Useful for reproducing issue metaopt#278 once upstream pybind11 hardens
`internals_pp_manager::pp_set_mutex_`.
Switch the `add_definitions` form to bare `-D_DISABLE_CONSTEXPR_MUTEX_CONSTRUCTOR`
(no `=value`) since MSVC's check is `#ifdef`, not `#if`. The status message
similarly reports just `set` or `not set` rather than echoing a value that
has no effect on behavior.
c56f319 to
eb2cb43
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
Define Microsoft's documented
_DISABLE_CONSTEXPR_MUTEX_CONSTRUCTORmacro for the_Ctarget on MSVC builds so thatstd::mutex's constructor and_Mtx_lockbody bind to the same loadedmsvcp140.dllregardless of whether another wheel preloaded an older copy of the MSVC C++ runtime.This is a build-system-only change scoped to the
_Ctarget on MSVC; no source changes, and no effect on non-Windows wheels or on Windows toolchains other than MSVC (e.g.,clang-cl + libc++,mingw-w64).Motivation and Context
Fixes #278.
The user-reported crash is a null-vptr dereference at
msvcp140!mtx_do_lock+0x74, fired duringoptree._C'sPyInit__Cwhen imported afterpyarrow. Full forensic trail (cdb dump, import-table inspection, address-range arithmetic) is in #278; the short version:std::mutexhas a constexpr zero-init layout and the constructor emits no runtime init call. The wheel's import table reflects this: it imports only_Mtx_lockand_Mtx_unlockfrommsvcp140.dll, not_Mtx_init_in_situ.pyarrowships its ownmsvcp140.dll14.28.29334.0 (Nov 2020) under the unmangled name. Whenpyarrowis imported first, the Windows loader caches that copy under the namemsvcp140.dll. optree's_C.pyd, loaded later, binds itsmsvcp140.dll!_Mtx_lockIAT entry to pyarrow's old copy (confirmed: crash address0x7ff8df0c2eb0falls inside pyarrow'smsvcp140.dllimage range, and it's the only copy in the dump with PDB symbols loaded)._Mtx_lockinterprets the new mutex layout as the old ConcRT-backed object, virtual-dispatches through what it expects to be a critical-section vptr atmtx + 8, finds null, and dies.internals_pp_manager::create_pp_content_once()(lockingpp_set_mutex_), called frompybind11::detail::ensure_internals()at the top ofPYBIND11_MODULE_PYINIT— before any optree code inBuildModuleruns. So the crash cannot be avoided by deferring optree-side initialization._DISABLE_CONSTEXPR_MUTEX_CONSTRUCTORis Microsoft's documented escape hatch for newer-toolset / older-redistributable mismatches of exactly this shape (MS STL changelog). With it set,std::mutex's constructor reverts to a non-constexpr body that calls_Mtx_init_in_situat runtime, so both the constructor and_Mtx_locknecessarily bind to the same loadedmsvcp140.dll— the layout written by the constructor matches the layout read by the lock.Verification
Build the Windows wheel with this change and confirm the import table now includes
_Mtx_init_in_situ:Expected:
_Mtx_init_in_situappears alongside_Mtx_lockand_Mtx_unlock. (_Mtx_destroy_in_situwill not appear:_DISABLE_CONSTEXPR_MUTEX_CONSTRUCTORonly reverts the constructor; MSVC STL keeps~mutex()as= defaultregardless. That's fine here becausepp_set_mutex_lives in a Meyers singleton and is never destructed in practice.) If only_Mtx_lock/_Mtx_unlockappear, the macro did not take effect.End-to-end repro (run on the reporter's environment, or any Windows env with
pyarrow+torch):python -X dev -X faulthandler -c "import pyarrow, torch; import optree"Expected before this PR: access violation. Expected after: clean import.
Tradeoffs
std::mutexctor now does a cross-DLL_Mtx_init_in_situcall instead of being a constexpr no-op. optree constructs a handful of mutexes total over a process's lifetime; the cost is unmeasurable.constinit std::mutexpatterns inside the_Ctarget. None of optree's existingPYBIND11_CONSTINITuses depend on this — they go through pybind11'sgil_safe_call_once_and_storeandstd::once_flag, neither of which is aconstinit std::mutex.Removal Criterion
Documented in the inline CMake comment: remove once optree no longer ships wheels that may be loaded alongside an
msvcp140.dllolder than the toolset used to build the wheel.Types of changes
Checklist
make format. (required)make lint. (required)make testpass. (required)