Fix: enable EBO for simple_ilist on MSVC via LLVM_DECLARE_EMPTY_BASES#2012
Fix: enable EBO for simple_ilist on MSVC via LLVM_DECLARE_EMPTY_BASES#2012YevheniiKotyrlo wants to merge 3 commits intofacebook:static_hfrom
Conversation
…t via build-time patch file Mirror the same change being upstreamed at facebook/hermes#2012: `__declspec(empty_bases)` on `simple_ilist` so MSVC actually applies Empty Base Optimization across the class's two empty bases. Without the attribute, MSVC pads each empty base with one byte and pointer-aligns the result, shifting the embedded `Sentinel` member from offset 0 to offset 8. The Rust FFI in `crates/hermes/src/parser/node.rs` is one of the callers that depends on the coincident layout: it wraps a `*const NodeList` (`= *const simple_ilist<Node>`) and iterates by reading `next` pointers, comparing each one against the head pointer it was given. On Linux/Clang and macOS the comparison succeeds — `Sentinel` is at offset 0, so `&list == &list.Sentinel` — and iteration terminates. On Windows MSVC the C++ side stores `&list.Sentinel` (offset 8) in the linked-list pointers while Rust received `&list` (offset 0); the comparison never matches, the iterator walks past the end of the list, dereferences the sentinel as if it were a real `Node`, and trips a `STATUS_ACCESS_VIOLATION 0xC0000005` on the first field load. Reproducible in `cargo test --release -p fft --lib hparser::tests::test1` on Windows (35-character JS suffices because it forces the FFI to iterate the function's params `simple_ilist`). Approach: a unified-diff patch file lives under `patches/` and the two cmake-driven build scripts apply it to the bundled Hermes submodule via `git apply` before configuring. The hook is: - **Target-gated.** Skipped entirely when `is_msvc_target()` is false. GCC/Clang already collapse the empty bases without the attribute, so Linux and macOS builds are unaffected. - **Idempotent.** Returns early when `__declspec(empty_bases) simple_ilist` is already present in the file — true after the first run in a single build, and also true once the Hermes submodule is bumped past facebook/hermes#2012 so the patch becomes unnecessary and can be removed at the maintainer's leisure. - **Loud on conflict.** If `git apply` fails (Hermes refactored `simple_ilist.h` and the patch's context lines no longer match), the build script panics with a message that names the patch file, surfaces git's stderr, and points at the two resolution paths: drop the patch when upstream lands, or regenerate it. Silently skipping would produce an unpatched binary that AVs at runtime — worse than a build error. Validated end-to-end on Windows (host) and on Linux WSL Ubuntu 22.04 with Rust 1.88.0: - `cargo test --release -p fft --lib hparser::tests::test1` passes on Windows (was `STATUS_ACCESS_VIOLATION` before). - `cargo test --workspace --release` clean on Windows. - Full workspace test clean on Linux (the patcher is gated and correctly skipped — verified by greping the submodule file after the build). - `cargo fmt --all --check` clean. - Re-running the build after the patch already landed exits the hook in <1ms (idempotency check). - Simulated Hermes-bump-with-conflict (inserted a comment ahead of the `simple_ilist` template) panics with the documented message and exit code 101, never producing a binary. Closes-by: should make jbroma#17's `cvt_smloc` graceful fallback unnecessary once this lands; the original `assert!(loc.is_valid(), ...)` is correct because Hermes does not produce null `SMLoc`s in this code path. Resolves the runtime AV reported in jbroma#16.
…t via build-time patch file Mirror the same change being upstreamed at facebook/hermes#2012: `__declspec(empty_bases)` on `simple_ilist` so MSVC actually applies Empty Base Optimization across the class's two empty bases. Without the attribute, MSVC pads each empty base with one byte and pointer-aligns the result, shifting the embedded `Sentinel` member from offset 0 to offset 8. The Rust FFI in `crates/hermes/src/parser/node.rs` is one of the callers that depends on the coincident layout: it wraps a `*const NodeList` (`= *const simple_ilist<Node>`) and iterates by reading `next` pointers, comparing each one against the head pointer it was given. On Linux/Clang and macOS the comparison succeeds — `Sentinel` is at offset 0, so `&list == &list.Sentinel` — and iteration terminates. On Windows MSVC the C++ side stores `&list.Sentinel` (offset 8) in the linked-list pointers while Rust received `&list` (offset 0); the comparison never matches, the iterator walks past the end of the list, dereferences the sentinel as if it were a real `Node`, and trips a `STATUS_ACCESS_VIOLATION 0xC0000005` on the first field load. Reproducible in `cargo test --release -p fft --lib hparser::tests::test1` on Windows (35-character JS suffices because it forces the FFI to iterate the function's params `simple_ilist`). Approach: a unified-diff patch file lives under `patches/` and the two cmake-driven build scripts apply it to the bundled Hermes submodule via `git apply` before configuring. The hook is: - **Target-gated.** Skipped entirely when `is_msvc_target()` is false. GCC/Clang already collapse the empty bases without the attribute, so Linux and macOS builds are unaffected. - **Idempotent.** Returns early when `__declspec(empty_bases) simple_ilist` is already present in the file — true after the first run in a single build, and also true once the Hermes submodule is bumped past facebook/hermes#2012 so the patch becomes unnecessary and can be removed at the maintainer's leisure. - **Race-safe.** Cargo runs `crates/hermes/build.rs` and `crates/fft_support/build.rs` in parallel, so both can pass the idempotency check while a sibling is still mid-`git apply`. If our own `git apply` then fails, we re-read the file: if the attribute is now present, we skip silently (sibling won the race); only a real conflict still panics. - **Loud on conflict.** If `git apply` fails for a non-race reason (Hermes refactored `simple_ilist.h` and the patch's context lines no longer match), the build script panics with a message that names the patch file, surfaces git's stderr, and points at the two resolution paths: drop the patch when upstream lands, or regenerate it. Silently skipping would produce an unpatched binary that AVs at runtime — worse than a build error. Validated end-to-end on Windows MSVC (host) and Linux WSL Ubuntu 22.04 with Rust 1.88.0, plus eight edge-case scenarios: - Clean apply on Windows: passes. - Idempotent re-run: passes in <10s, no re-application. - Linux: workspace passes; patcher correctly skips (verified by inspecting the submodule file post-build). - Hermes-bump-with-upstream-fix scenario (pre-applied attribute): builds clean, no re-application. - Patch file missing: clean panic with "was patches/ pruned?". - Submodule target file missing: clean panic with "reading ... failed". - Patch file corrupt (not a valid diff): clean panic surfacing git's "No valid patches in input". - Partial pre-application (comment present but attribute absent): clean panic with "patch does not apply". - Three back-to-back parallel clean builds: all pass; race-safety holds under the parallel invocation. `cargo fmt --all --check` clean. End-to-end: rebuilt napi binding plugged into onejs/one's Vite-native bundler returns a complete ~5 MB RN bundle in ~1.2 s on Windows where it previously segfaulted on the first request. Closes-by: should make jbroma#17's `cvt_smloc` graceful fallback unnecessary once this lands; the original `assert!(loc.is_valid(), ...)` is correct because Hermes does not produce null `SMLoc`s in this code path. Resolves the runtime AV reported in jbroma#16.
|
Hi, this is a nice catch. hermes/include/hermes/Support/Compiler.h Line 44 in bcbb8c0 I think it would be slightly cleaner to add LLVH is a permanent partial vendored fork, so we have no plans to ever upgrade it en-masse. This gives us some freedom to tweak it (although we do try to keep the tweaks to a minimum and record them in https://github.com/facebook/hermes/tree/bcbb8c040abcf315fa284e798f52ae8252d0b1f1/external/llvh/patches). |
1e89d1c to
ba1a251
Compare
|
Thanks for the pointer to I've force-pushed a revised commit that adopts your suggestion exactly:
Patch reverses cleanly via Two optional follow-ups I considered but kept out of this PR per your "minimum tweaks" preference; happy to do either if you'd prefer them in:
Either or both, or neither — happy to defer to your call. |
Makes sense. Please add it to this PR.
Seems like a good idea. Depending on the results, we may split it into a separate PR or keep it in this one. |
Tier B donePushed as a fast-forward commit ( // External/llvh/include/llvh/ADT/simple_ilist.h, namespace ilist_detail:
struct ebo_check_node : ilist_node<ebo_check_node> {};
static_assert(
sizeof(simple_ilist<ebo_check_node>) ==
sizeof(ilist_sentinel<compute_node_options<ebo_check_node>::type>),
"simple_ilist must use Empty Base Optimization. If this fires on MSVC, "
"ensure LLVM_DECLARE_EMPTY_BASES is applied to the class template "
"(see external/llvh/patches/simple-ilist-msvc-empty-bases.patch).");
Tier C sweep resultsSearched the 75 headers under One additional candidate found:
template <typename NodeTy>
struct ilist_node_traits : ilist_alloc_traits<NodeTy>,
ilist_callback_traits<NodeTy> {};Both bases are empty by the same logic as The downstream effect is subtler than Fix is the same one-line decoration: template <typename NodeTy>
struct LLVM_DECLARE_EMPTY_BASES ilist_node_traits
: ilist_alloc_traits<NodeTy>, ilist_callback_traits<NodeTy> {};( Cleared (false positives from the grep):
No multi-base classes outside the ilist family in DispositionTwo paths, your call:
Both work for me. Lean on whichever you prefer. |
simple_ilist inherits from two empty bases (the configured list_base_type and SpecificNodeAccess). MSVC's ABI does not collapse multiple empty bases by default — it pads each empty base with one byte and pointer-aligns the result, shifting the embedded Sentinel from offset 0 to offset 8. The shift is invisible through pure C++ list usage (iterators hand out node pointers and account for the sentinel offset), but it breaks any caller that compares an iterator's stored node pointer against `&list` rather than against an iterator obtained from `list.end()` — most visibly FFI consumers walking the linked-list `next` pointers from another language. A real-world manifestation: fast-flow-transform's Rust FFI in crates/hermes/src/parser/node.rs wraps a *const NodeList and iterates by reading the embedded `next` pointer, comparing each one against the head pointer it was given. On Linux/Clang and macOS the comparison succeeds (Sentinel at offset 0). On Windows MSVC it never matches because C++ stores `&list.Sentinel` (offset 8) in the linked-list pointers while Rust received `&list` (offset 0); the iterator walks past the end, dereferences the sentinel as a real Node, and trips STATUS_ACCESS_VIOLATION 0xC0000005 during AST traversal. This patch adds an LLVM_DECLARE_EMPTY_BASES macro to llvh's Compiler.h mirroring the existing HERMES_EMPTY_BASES one layer up at include/hermes/Support/Compiler.h:44 (llvh cannot depend on Hermes, so the same idiom is needed in both layers), and tags simple_ilist with the macro. Per the external/llvh/patches/ directory convention the divergence is also recorded as a patch file. GCC and Clang on Linux/macOS resolve the macro to nothing — their Itanium ABI collapses empty bases unconditionally, so there is nothing to opt into. MSVC and clang-cl on Windows (which emulates MSVC's ABI for binary compat) resolve it to __declspec(empty_bases) and produce the correct layout. Test plan: - Builds cleanly on MSVC and on GCC/Clang (the macro is gated on _MSC_VER and is a no-op elsewhere). - fast-flow-transform's Rust workspace tests (36 across fft, fft_support, hermes parser/FFI crates) pass on Windows MSVC with this patch applied; before the fix the FFI iteration tests crashed with STATUS_ACCESS_VIOLATION. - Linux/macOS: unchanged, no measurable layout difference.
Per review feedback on this PR, add a `static_assert` at namespace scope that pins `sizeof(simple_ilist<ebo_check_node>)` to `sizeof(ilist_sentinel<OptionsT>)`. The assertion fires at compile time if the two empty bases stop collapsing — for example if LLVM_DECLARE_EMPTY_BASES is removed from the class template, MSVC's ABI changes, or build-flag drift disables the attribute. Also refresh external/llvh/patches/simple-ilist-msvc-empty-bases.patch so the recorded divergence stays byte-aligned with the in-tree change (the directory's existing convention — every llvh tweak is recorded there). Backstops the runtime FFI bug: simple_ilist's two empty bases (list_base_type, SpecificNodeAccess) must collapse so the embedded Sentinel sits at offset 0; without that, FFI consumers like fast-flow-transform that compare an iterator's stored next pointer against `&list` walk past end-of-list and dereference Sentinel as if it were a real Node.
e14e37f to
a9d1706
Compare
|
I think it makes sense to combine all fixes in a single PR. They are all related after all. Thanks! |
Refresh the build-time patch and idempotency markers to byte-match the post-review form of facebook/hermes#2012. After tmikov's review, the upstream PR was revised from an inline `#if defined(_MSC_VER)` block to a portable LLVM_DECLARE_EMPTY_BASES macro defined in llvh's own Compiler.h (mirroring the existing HERMES_EMPTY_BASES one layer up). The fft patch is updated to match exactly so the post-merge idempotency check still no-ops cleanly when fft eventually bumps its Hermes pin past the merged commit. The idempotency check now accepts both markers: - `LLVM_DECLARE_EMPTY_BASES simple_ilist` (the new upstream form) - `__declspec(empty_bases) simple_ilist` (the legacy inline form) Recognizing both keeps already-patched checkouts building without a forced submodule re-init: developers who applied the legacy form from a previous build see the legacy marker and skip; new checkouts apply the macro form and see the new marker. Validation: - `cargo build --release` on Windows MSVC: 39.83s incremental, clean - `cargo test --release --workspace` on Windows MSVC: 36/36 pass across fft, fft_support, and hermes parser/FFI crates (the FFI iteration tests that AV without the attribute all green) - `git apply --check --reverse` on the refreshed patch: clean
|
@tmikov has imported this pull request. If you are a Meta employee, you can view this in D104498746. |
ilist_node_traits inherits from ilist_alloc_traits and ilist_callback_traits — both empty. On MSVC without LLVM_DECLARE_EMPTY_BASES sizeof grows from 1 to 2, which then prevents EBO when iplist_impl uses ilist_node_traits as its TraitsT base — iplist/ilist end up carrying needless trait padding on Windows. Symmetric fix to the simple_ilist change. Recorded in external/llvh/patches/ilist-node-traits-msvc-empty-bases.patch per the directory's convention.
|
@YevheniiKotyrlo has updated the pull request. You must reimport the pull request before landing. |
|
Pushed Tier C as a fast-forward commit ( Both compile-time invariants are now empirically verified, not just analytically asserted. Two isolated MSVC test programs ( Static_assert (Tier B) — fires when the macro is removed. Test stubs Tier C Confirms the analytical claim: without the Tier C decoration,
|
Summary
Add a portable
LLVM_DECLARE_EMPTY_BASESmacro to llvh'sCompiler.hand apply it tosimple_ilistso MSVC actually performs Empty Base Optimization across the class's two empty bases (list_base_typeandSpecificNodeAccess). Without this attribute, MSVC pads each empty base with one byte and pointer-aligns the result, shifting the embeddedSentinelmember from offset 0 to offset 8 of thesimple_ilist.The macro mirrors the existing
HERMES_EMPTY_BASESone architectural layer up atinclude/hermes/Support/Compiler.h. llvh cannot depend on Hermes, so the same idiom is needed in both layers — that's the whole reason this surfaced inside llvh after Meta had already worked around it forPointerBaseunderHERMESVM_CONTIGUOUS_HEAP.This change is purely layout-sensitive: it does not affect codegen, ABI, or behavior on any other compiler, and on MSVC it produces the same single-allocation layout that GCC and Clang already produce.
Diff shape
external/llvh/include/llvh/Support/Compiler.hLLVM_DECLARE_EMPTY_BASESnext to the other class-decoration macros (betweenLLVM_LIBRARY_VISIBILITYandLLVM_PREFETCH). Resolves to__declspec(empty_bases)on MSVC, empty elsewhere.external/llvh/include/llvh/ADT/simple_ilist.hstatic_assertinvariant innamespace ilist_detailthat pinssizeof(simple_ilist<T>) == sizeof(ilist_sentinel<OptionsT>), locking the EBO collapse as a compile-time guarantee (catches macro removal, MSVC ABI changes, build-flag drift).external/llvh/patches/simple-ilist-msvc-empty-bases.patchsimple_ilisttweaks already follow this pattern:ilist-erase-between.patchandilist-nondefault-destructor.patch.When the shift is observable
Through ordinary C++ list usage, the shift is invisible — the iterators hand out node pointers, and
&list.end()already accounts for the sentinel offset. It becomes load-bearing only for code that takes the address of asimple_ilistand treats it as the address of the embedded sentinel:simple_ilistfrom another language and walk the linked list manually.&listrather than against an iterator obtained fromlist.end().A real-world manifestation:
jbroma/fast-flow-transform's Rust FFI (crates/hermes/src/parser/node.rs) takes a*const NodeListfromhermes_get_FunctionDeclaration_params(...)and iterates by readingnextpointers, comparing each against the head pointer it was given. On Linux/Clang and macOS the comparison succeeds —Sentinelis at offset 0, so&list == &list.Sentinel— and iteration terminates. On Windows MSVC, the C++ side stores&list.Sentinel(offset 8) in the linked-list pointers while Rust received&list(offset 0), so the comparison never matches; iteration walks past the end of the list, dereferences the sentinel as if it were a realNode, and trips aSTATUS_ACCESS_VIOLATION 0xC0000005on the first field load.Repro and validation
Quickest repro (downstream, but isolates the bug to this exact attribute):
The trivial input
parse("function foo(p1) { var x = (10 + p1); }")is enough — it forces the FFI to iterate the function's paramssimple_ilist. Layout traces (eprintln!inNodeIterator::next) on the failing build show the off-by-8 signature directly:After the patch the same trace shows
read next == headand iteration terminates correctly.Test plan
cargo build --releaseon Windows MSVC: clean build of fast-flow-transform's full Hermes static-lib link, 36.13s from clean state (build.rs applies the patch viagit apply, cmake compiles all 10 Hermes static libs).cargo test --release --workspaceon Windows MSVC: 136 tests pass acrossfft,fft_ast,fft_pass,fft_node,fft_support,hermesparser/FFI crates. The FFI iteration tests (parser::hermes_parser::tests::good_parse,comments,magic_comments,parse_error) all green; before the patch they crashed withSTATUS_ACCESS_VIOLATION.namespace ilist_detailtriggers per-translation-unit. Compiles cleanly under MSVC with the macro in place; would fail at compile time without the macro (sizeof grows by 8 bytes — the assertion would fire).libLLVHSupport.a,libhermesParser.a,libhermesAST.a, etc.). Macro resolves to empty on non-MSVC; layout is identical to before the patch.fast-flow-transform#18build-time hook,curl http://localhost:8081/index.bundle?platform=iosreturns a complete 5.01 MB bundle in 1.31 s (and 5.03 MB android bundle in 1.04 s). Before the patch the bundler segfaulted on the first request.cargo test --release --workspace: not re-run end-to-end against the macro form. rustc 1.93.1 hits an internal compiler error infft_ast's lint pass (pre-existing rustc bug;fft_astis a Rust-only crate that doesn't include the C++ headers I changed). The C++ Linux build (above) covers the relevant correctness; full Rust workspace verification would need a rustc version that doesn't ICE.Alternatives considered
#if defined(_MSC_VER)block onsimple_ilistonly. This was the first cut. Replaced with the portable macro per reviewer feedback — the macro version aligns with three precedents already in the codebase (theLLVM_*convention in llvh's ownCompiler.h; theHERMES_EMPTY_BASESmirror one layer up; theexternal/llvh/patches/recording rule).ilist_node_traitsatilist.h:84, which inherits fromilist_alloc_traitsandilist_callback_traits, both empty). Reported the sweep in this thread; pre-built the Tier C patch locally on a wip branch (c6dbd850) and ready to fold in here or split into a follow-up PR per the maintainer's preference.list.end()and use the returned pointer instead of&list. But that requires either exposing a public accessor for the sentinel pointer (a Hermes-side change) or pulling the sentinel address out of an iterator's private field (fragile). Taggingsimple_ilistwith the EBO macro is the actual structural fix and matches the LLVM-upstream convention for layout-sensitive containers (seeD146190, which introduced__declspec(empty_bases)onstd::optional/std::variantfor the same MSVC ABI quirk).Notes
iplist,iplist_impl) are not affected because they single-inherit fromsimple_ilist(one empty + one non-empty base; the multi-empty-base failure mode doesn't apply). The Tier C sweep result above documentsilist_node_traitsseparately.simple_ilist.h. The trigger node and its sentinel type are zero-runtime-cost compile-time only.simple_ilistexplains the motivation in-source so future maintainers understand why the attribute is required.