libpas: memory-accounting fixes + embedder thread-suspension hook for Linux TLC decommit #182

Merged
Jarred-Sumner merged 4 commits into main from jarred/libpas-memory-fixes
Apr 13, 2026

@Jarred-Sumner Jarred-Sumner commented Apr 13, 2026

What this fixes

Bun's RSS on Linux servers sits noticeably higher than the equivalent Node.js process and doesn't fully recover after load drops. Two independent causes, both in libpas:

  1. Several memory-accounting bugs that make the allocator commit fresh pages when free ones exist, or strand free pages so the scavenger never returns them.
  2. TLC allocator-page decommit is Darwin-only. On Linux, when a thread goes idle its local allocators stay live and their pages stay committed for the life of the thread. Darwin force-stops them via Mach thread_suspend; Linux had no equivalent path.

This PR fixes (1) directly and fixes (2) by letting libpas borrow WTF's existing thread-suspension mechanism instead of owning a second one.


Commit 1 — eight libpas accounting fixes

The two that affect production memory:

  • pas_fast_large_free_heap.c — cartesian-tree max-heap invariant inverted. The tree is a treap keyed on (address, size) used to find a free block ≥ N bytes. fast_write_cursor and fast_merge had the re-balance check backwards: when a node shrinks they checked the parent (should be children), when it grows they checked children (should be parent). When the root shrank below a child, or a leaf grew past its parent, the tree silently broke, find_first pruned the subtree, and the allocator pulled fresh pages from aligned_allocator while a satisfying block sat hidden. This is the path every large allocation and every TLC allocation takes.

  • pas_bitfit_directory.c: take_last_empty race strands pages. It read last_empty_plus_one with pas_versioned_field_read instead of read_to_watch. A concurrent view_did_become_empty between the read and the final try_write was a no-op (no version bump), so try_write zeroed the field with empty_bits[K] still set, leaving a permanently stranded page. The segregated directory already does this correctly.
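The direction of the re-balance check in the first fix can be illustrated with a minimal sketch. This is not libpas code: `Node`, `violates_after_shrink`, `violates_after_grow`, and `TwoNodeTree` are hypothetical names, and the real tree carries more state. The sketch only shows that, in a max-heap on size, a shrinking node can fall below its children while a growing node can rise above its parent.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical node of a Cartesian tree: in-order by address,
// max-heap on size (each parent's size >= each child's size).
struct Node {
    std::size_t size;
    Node* parent = nullptr;
    Node* left = nullptr;
    Node* right = nullptr;
};

// After a node SHRINKS, only the relation with its children can break.
bool violates_after_shrink(const Node& n)
{
    return (n.left && n.left->size > n.size)
        || (n.right && n.right->size > n.size);
}

// After a node GROWS, only the relation with its parent can break.
bool violates_after_grow(const Node& n)
{
    return n.parent && n.size > n.parent->size;
}

// Small fixture: root(64) with a single left leaf(32); the invariant holds initially.
struct TwoNodeTree {
    Node root{64};
    Node leaf{32};
    TwoNodeTree()
    {
        root.left = &leaf;
        leaf.parent = &root;
    }
};
```

With the checks swapped, a shrinking root would be tested against its (non-existent) parent and the violation would go unnoticed, which matches the symptom described above.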

The other six are correctness/hygiene with little production impact (PGM UAF, a missed flag copy on TLC realloc, a missing mutex unlock, two test-config-only accounting symmetries, and bumping the standalone xcodeproj to gnu++23 so test_pas actually compiles on the current macOS SDK).

All have new regression tests in LargeFreeHeapTests.cpp / BitfitTests.cpp that fail on main and pass after.


Commit 2 — pas_thread_suspender callback API

Adds an install-once embedder callback table:

struct pas_thread_suspender {
    void* (*current_thread)(void);          // called at TLC creation
    bool  (*begin_suspend)(void* handle);   // called by scavenger, holding pas_thread_suspend_lock + heap_lock
    void  (*end_suspend)(void* handle);
    void  (*release_handle)(void* handle);  // optional, called at TLC destroy after dropping locks
};

pas_thread_local_cache stores the handle. The scavenger's force-stop path becomes:

  • Darwin: unchanged — direct Mach thread_suspend (kernel-counted, already nests with GC). Disassembly of the production dylib confirms the embedder branch is dead-code-eliminated.
  • Non-Darwin with a suspender installed: route through begin_suspend/end_suspend.
  • Non-Darwin without one: request-only, same as before (except it now reports work-pending instead of falling through silently).
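The non-Darwin dispatch above can be sketched roughly as follows. Only the struct layout comes from the PR description; `force_stop`, the mock callbacks, and the counters are hypothetical, and the real scavenger holds pas_thread_suspend_lock and heap_lock around this path, which the sketch omits.

```cpp
#include <cassert>

// Field layout copied from the PR description; everything else is hypothetical.
struct pas_thread_suspender {
    void* (*current_thread)(void);
    bool  (*begin_suspend)(void* handle);
    void  (*end_suspend)(void* handle);
    void  (*release_handle)(void* handle); // optional, may be null
};

// Mock embedder state used only by this sketch.
struct MockCounts {
    int begins = 0;
    int ends = 0;
};
static MockCounts g_counts;
static int g_fake_thread;

static void* mock_current_thread() { return &g_fake_thread; }
static bool mock_begin_suspend(void*) { ++g_counts.begins; return true; }
static void mock_end_suspend(void*) { ++g_counts.ends; }

static const pas_thread_suspender g_mock = {
    mock_current_thread, mock_begin_suspend, mock_end_suspend, nullptr
};

// Returns true if the target was suspended and `work` ran while it was stopped.
// Returns false when only a stop request is possible: no suspender installed,
// no handle for this thread, or the embedder declined to suspend.
template<typename Work>
bool force_stop(const pas_thread_suspender* suspender, void* handle, Work work)
{
    if (!suspender || !handle)
        return false; // request-only fallback
    if (!suspender->begin_suspend(handle))
        return false;
    work();
    suspender->end_suspend(handle); // paired with every successful begin
    return true;
}
```

A caller would treat a `false` return as "work still pending" and retry on a later scavenger tick, in line with the request-only behavior described above.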

decommit_allocator_range is enabled on PAS_OS(DARWIN) || PAS_OS(LINUX).

A testing-only pas_thread_suspender_override_native flag lets Darwin route through the embedder so the Linux code path can be exercised locally — ThreadSuspenderTests.cpp does this with a Mach-backed mock.


Commit 3 — WTF adapter

// Threading.cpp, inside WTF::initialize(), Linux only
pas_install_thread_suspender(&s_pasThreadSuspender);

The adapter wraps WTF::Thread::suspend/resume. Three things make this safe:

  1. Same lock, no new mechanism. ThreadSuspendLocker already wraps pas_thread_suspend_lock under BENABLE(LIBPAS), so GC suspension and the libpas scavenger were already serialized by the same lock. The adapter uses a new ThreadSuspendLocker(AdoptLock) constructor since the scavenger holds the lock when begin_suspend is called. Thread::suspend only takes the locker as a const-ref witness, so this is purely a token.

  2. Same signal, no new handler. Thread::suspend on Linux uses g_wtfConfig.sigThreadSuspendResume — whatever the embedder configures (Bun uses SIGPWR). libpas installs no handlers and reserves no signals.

  3. Handle lifetime. current_thread() takes a ref on the WTF::Thread; release_handle() (fired from TLC destroy after libpas drops its locks) drops it. Threads created outside WTF return a null handle and stay request-only — graceful degradation.

The "what if GC suspends a thread the scavenger also wants" case: on Darwin the kernel counts suspensions; on Linux the shared lock serializes them — the scavenger blocks on pas_thread_suspend_lock until GC's ThreadSuspendLocker destructs, then proceeds. No nesting, no deadlock.
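The adopt-lock idea from point 1 can be sketched with a toy lock. `ToyLock`, `SuspendLocker`, and `suspend_thread` are hypothetical stand-ins for pas_thread_suspend_lock, ThreadSuspendLocker, and Thread::suspend; only the `AdoptLockTag` and `m_shouldUnlock` names come from the review summary of this PR, and the real classes may differ in detail.

```cpp
#include <cassert>

// Single-threaded toy lock that asserts on misuse; the real lock is a libpas lock.
struct ToyLock {
    bool held = false;
    void lock() { assert(!held); held = true; }
    void unlock() { assert(held); held = false; }
};
static ToyLock g_suspend_lock;

struct AdoptLockTag { };

// Hypothetical analogue of ThreadSuspendLocker.
class SuspendLocker {
public:
    // Normal path (for example GC): acquire on construction, release on destruction.
    SuspendLocker()
        : m_shouldUnlock(true)
    {
        g_suspend_lock.lock();
    }

    // Adopt path (scavenger callback): the caller already holds the lock and keeps owning it.
    explicit SuspendLocker(AdoptLockTag)
        : m_shouldUnlock(false)
    {
        assert(g_suspend_lock.held);
    }

    ~SuspendLocker()
    {
        if (m_shouldUnlock)
            g_suspend_lock.unlock();
    }

    SuspendLocker(const SuspendLocker&) = delete;
    SuspendLocker& operator=(const SuspendLocker&) = delete;

    bool ownsLock() const { return m_shouldUnlock; }

private:
    bool m_shouldUnlock;
};

// Takes the locker only as proof that the lock is held (the "witness" role).
inline bool suspend_thread(const SuspendLocker&) { return g_suspend_lock.held; }
```

The adopting constructor never touches the lock, so destroying the witness leaves the scavenger's ownership intact.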


Testing

Standalone libpas (./build.sh -s macosx -a arm64 -t test_pas -v testing):

| Suite | Result |
| --- | --- |
| LargeFreeHeap | 49/49 (47/49 on main, 2 new tests) |
| Bitfit | 3/3 (2/2 on main, 1 new test) |
| ThreadSuspender | 2/2 (new) |
| CartesianTree | 75/75 |
| IsoHeapPageSharing | 182/182 |
| IsoHeapChaos | 296/296 |
| PGM | 15/15 |
| RaceTests | 4/4 |
| TLCDecommit | 16/19 fail (pre-existing on main, identical) |

Adversarial: built clean main in a worktree with only the new tests applied — they crash with the expected assertions; pass after the fixes. testForceStopUsesEmbedder 20/20 under repeat. Production dylib: override_native symbol absent, _stop_allocator disassembly shows direct _thread_suspend with no _pas_thread_suspender_instance load.

pas_fast_large_free_heap.c: fast_write_cursor and fast_merge had inverted
max-heap invariant checks (shrink checked parent, grow checked children;
should be the reverse). When the root shrank or a leaf grew via coalescing,
the node stayed in place with parent.y < child.y, hiding free blocks from
find_first and forcing redundant aligned_allocator pulls. Affects every
pas_large_heap and pas_large_utility_free_heap (TLC allocations).

pas_bitfit_directory.c: take_last_empty used pas_versioned_field_read
instead of read_to_watch, so a concurrent view_did_become_empty_at_index
between the read and the final try_write was a no-op (no version bump),
letting try_write zero last_empty_plus_one with empty_bits[K] still set:
a permanently stranded page. Segregated directory already does this right.
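The race can be replayed in a single-threaded toy model. This is not the libpas versioned-field implementation: `VersionedField`, `Snapshot`, `read_plain`, and `maximize` are hypothetical, and the model assumes only what the message above states, namely that an update which changes nothing bumps the version only when the field is being watched.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of a versioned field (assumed semantics, see lead-in).
struct VersionedField {
    std::uintptr_t value = 0;
    std::uint64_t version = 0;
    bool watched = false;
};

struct Snapshot {
    std::uintptr_t value;
    std::uint64_t version;
};

// Plain read: does not arm the field, so a later no-op update leaves the version alone.
Snapshot read_plain(const VersionedField& f) { return { f.value, f.version }; }

// Watching read: arms the field so that ANY later update bumps the version.
Snapshot read_to_watch(VersionedField& f)
{
    f.watched = true;
    return { f.value, f.version };
}

// Models view_did_become_empty raising the field to at least `candidate`.
void maximize(VersionedField& f, std::uintptr_t candidate)
{
    bool changed = candidate > f.value;
    if (changed)
        f.value = candidate;
    if (changed || f.watched) {
        ++f.version;
        f.watched = false;
    }
}

// Compare-and-swap on (value, version): fails if anything bumped the version.
bool try_write(VersionedField& f, Snapshot expected, std::uintptr_t new_value)
{
    if (f.value != expected.value || f.version != expected.version)
        return false;
    f.value = new_value;
    ++f.version;
    f.watched = false;
    return true;
}
```

In this model a plain read followed by a no-op `maximize` lets the stale `try_write` succeed, while the watching read makes it fail so the caller can rescan.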

pas_probabilistic_guard_malloc_allocator.c: deallocate dereferenced *entry
after pas_ptr_hash_map_remove may have rehashed/freed the table.
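The general shape of this fix is "copy what you need, then remove". A minimal sketch using std::unordered_map in place of pas_ptr_hash_map; `Metadata` and `remove_and_get_size` are hypothetical names, not the PGM code:

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

struct Metadata { std::size_t size; }; // hypothetical per-allocation record

// Returns the size recorded for `key` and removes the entry, or 0 if absent.
// Copy first, erase second: once erase() runs, any pointer or reference
// to that element is invalid.
std::size_t remove_and_get_size(std::unordered_map<const void*, Metadata>& table, const void* key)
{
    auto it = table.find(key);
    if (it == table.end())
        return 0;
    Metadata snapshot = it->second; // taken before the table is mutated
    table.erase(it);
    return snapshot.size;
}
```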

pas_thread_local_cache.c: TLC realloc copied should_stop_bitvector but not
its should_stop_some gate, dropping pending stop requests for one cycle.

pas_scavenger.c: try_install_foreign_work_callback returned false without
unlocking foreign_work.lock when the descriptor table was full.
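A sketch of the corrected control flow, assuming a fixed-capacity table guarded by a lock. `ToyLock`, `CallbackTable`, `kMaxCallbacks`, and `try_install` are hypothetical; the real function and its descriptor table are in pas_scavenger.c.

```cpp
#include <cassert>
#include <cstddef>

// Single-threaded toy lock that asserts on double lock or double unlock.
struct ToyLock {
    bool held = false;
    void lock() { assert(!held); held = true; }
    void unlock() { assert(held); held = false; }
};

constexpr std::size_t kMaxCallbacks = 2; // hypothetical capacity
using Callback = bool (*)();

struct CallbackTable {
    ToyLock lock;
    Callback entries[kMaxCallbacks] = {};
    std::size_t count = 0;
};

bool try_install(CallbackTable& table, Callback callback)
{
    table.lock.lock();
    if (table.count == kMaxCallbacks) {
        table.lock.unlock(); // the early return must release the lock too
        return false;
    }
    table.entries[table.count++] = callback;
    table.lock.unlock();
    return true;
}
```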

pas_large_free_heap_helpers.c: large_utility_aligned_allocator's failure
path called give_back without the talks_to_large_sharing_pool guard that
wraps the matching take_later.

pas_deferred_decommit_log.c: decommit_all coalesced adjacent ranges without
comparing mmap_capability, decommitting the second range with the first's
capability.
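The coalescing rule after the fix can be sketched as "merge only when adjacent and same capability". `Range`, `MmapCapability`, and `coalesce` are hypothetical names for illustration; the input is assumed to be sorted by address.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class MmapCapability { MayMmap, MayNotMmap }; // hypothetical stand-in

struct Range {
    std::uintptr_t begin;
    std::uintptr_t end;
    MmapCapability capability;
};

// Merges adjacent ranges, but never across a capability boundary.
std::vector<Range> coalesce(const std::vector<Range>& sorted)
{
    std::vector<Range> out;
    for (const Range& r : sorted) {
        if (!out.empty()
            && out.back().end == r.begin
            && out.back().capability == r.capability) {
            out.back().end = r.end;
            continue;
        }
        out.push_back(r);
    }
    return out;
}
```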

Tests: two LargeFreeHeapTests cases (root-shrink, leaf-grow) and one
BitfitTests race-hook case; all fail before, pass after. New
race-test-hook kind compiles to a no-op when PAS_ENABLE_TESTING is off.

Build: bump libpas.xcodeproj to gnu++23 so test_pas links against the
current libc++ (<atomic> vs <stdatomic.h> interop hard-errors before C++23).
…decommit

On non-Darwin platforms the scavenger could only request that a thread stop
its allocators (honored on the next allocation slow path), so allocators on
threads that go idle without exiting were never stopped and their TLC pages
stayed committed. Darwin uses Mach thread_suspend to force-stop them; this
adds an embedder-provided suspender so the same path works on Linux without
libpas owning a second signal-based mechanism that would deadlock against
the embedder's GC suspension.

pas_thread_suspender: install-once table of {current_thread, begin_suspend,
end_suspend}. Contract: callbacks must not allocate or take libpas locks;
begin_suspend serializes against the embedder's other suspenders.

pas_thread_local_cache: store embedder_thread_handle at TLC creation and
copy it on realloc. can_force_stop_allocators() is compile-time true on
Darwin, runtime (suspender_instance && handle) elsewhere. Non-Darwin
suspend()/resume() route through the installed callbacks; Darwin path is
unchanged. decommit_allocator_range() is now PAS_OS(DARWIN) || PAS_OS(LINUX).

Without an installed suspender behavior is identical to before (request-only
on non-Darwin). The JSC/WTF adapter is a follow-up; this commit is the
libpas hook.

Tests: ThreadSuspenderTests verifies install + handle storage on all
platforms, and balanced begin/end calls on force-stop on non-Darwin.
…resume

ThreadSuspendLocker already wraps pas_thread_suspend_lock under
BENABLE(LIBPAS), so the libpas scavenger and JSC's GC are already serialized
by the same lock. The adapter therefore needs an adopt-lock constructor
(the scavenger holds the lock when begin_suspend is called) — Thread::suspend
takes the locker by const ref purely as a witness, so no behavior change.

current_thread() takes a ref on the WTF::Thread to keep it alive across the
TLC's lifetime; release_handle() (new optional callback, fired from TLC
destroy after dropping libpas locks) drops it.

Linux only: Darwin keeps direct Mach in libpas; Windows decommit isn't
enabled. Uses g_wtfConfig.sigThreadSuspendResume — whatever signal Bun
already configures (SIGPWR), no second handler.

Also: pas_store_store_fence before publishing pas_thread_suspender_instance.

coderabbitai bot commented Apr 13, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

📥 Commits

Reviewing files that changed from the base of the PR and between 98dc8d6 and d8d42dd.

📒 Files selected for processing (1)
  • Source/WTF/wtf/posix/ThreadingPOSIX.cpp

Walkthrough

Adds an embedder-facing thread suspender API for libpas, integrates it into WTF Thread suspend/resume paths, augments ThreadSuspendLocker with adopt-lock semantics, updates bmalloc build/test inputs, and includes multiple libpas bug fixes and new tests exercising the suspender and allocator behaviors.

Changes

| Cohort | File(s) | Summary |
| --- | --- | --- |
| Thread Suspender API | Source/bmalloc/libpas/src/libpas/pas_thread_suspender.h, Source/bmalloc/libpas/src/libpas/pas_thread_suspender.c | New public embedder interface and installer for process-wide pas_thread_suspender_instance; runtime checks and optional test override added. |
| WTF Threading Integration | Source/WTF/wtf/Threading.h, Source/WTF/wtf/Threading.cpp, Source/WTF/wtf/posix/ThreadingPOSIX.cpp | Added AdoptLockTag constructor and m_shouldUnlock flag to ThreadSuspendLocker; mapped pas_thread_suspender callbacks to WTF Thread suspend/resume and adjusted self-suspension assertion. |
| libpas Thread-Local Cache & Suspend Flow | Source/bmalloc/libpas/src/libpas/pas_thread_local_cache.h, Source/bmalloc/libpas/src/libpas/pas_thread_local_cache.c | Track embedder_thread_handle on thread-local caches; use embedder suspender begin_suspend/end_suspend when available; add helpers/flags controlling force-stop and broaden decommit platform guard to include Linux. |
| Build & Project Files | Source/bmalloc/CMakeLists.txt, Source/bmalloc/libpas/libpas.xcodeproj/project.pbxproj | Added pas_thread_suspender.c/.h to bmalloc build and public headers; added ThreadSuspenderTests.cpp to Xcode project; updated C++ standard entries to gnu++23. |
| ThreadSuspender Tests | Source/bmalloc/libpas/src/test/ThreadSuspenderTests.cpp, Source/bmalloc/libpas/src/test/TestHarness.cpp | New tests exercising mock and (on Darwin) real embedder suspender behavior; registered ThreadSuspender test suite. |
| Bitfit Race Test | Source/bmalloc/libpas/src/test/BitfitTests.cpp, Source/bmalloc/libpas/src/libpas/pas_bitfit_directory.c, Source/bmalloc/libpas/src/libpas/pas_race_test_hooks.h | Added a race-hook test and hook enum; changed pas_bitfit_directory_take_last_empty() to use versioned-field watch read and invoke the new race hook after the empty-scan loop. |
| Large Free Heap & Related Fixes | Source/bmalloc/libpas/src/libpas/pas_fast_large_free_heap.c, Source/bmalloc/libpas/src/libpas/pas_large_free_heap_helpers.c, Source/bmalloc/libpas/src/test/LargeFreeHeapTests.cpp | Adjusted child-comparison logic and re-add decisions in fast large-free-heap code; conditionalized sharing-pool return on allocator failure; added regression tests. |
| Deferred Decommit & Scavenger Fixes | Source/bmalloc/libpas/src/libpas/pas_deferred_decommit_log.c, Source/bmalloc/libpas/src/libpas/pas_scavenger.c | Prevent merging decommit ranges across differing mmap_capability; ensure scavenger early-return releases its mutex. |
| Probabilistic Guard Malloc Fix | Source/bmalloc/libpas/src/libpas/pas_probabilistic_guard_malloc_allocator.c | Snapshot metadata entry before overwriting to preserve pre-removal state when storing into pgm_metadata_vector. |
| Thread-Local Cache Layout Change | Source/bmalloc/libpas/src/libpas/pas_thread_local_cache.h | Added void* embedder_thread_handle; to pas_thread_local_cache struct. |
🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The PR title clearly and specifically describes the two main objectives: libpas memory-accounting fixes and the new embedder thread-suspension hook for Linux TLC decommit. |
| Description check | ✅ Passed | The PR description is comprehensive, exceeding template requirements with detailed technical explanations, commit breakdowns, testing results, and design rationale. It follows WebKit conventions by referencing bugs and explaining the changes thoroughly. |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

ℹ️ Review info

📥 Commits

Reviewing files that changed from the base of the PR and between 42f80a6 and 98dc8d6.

📒 Files selected for processing (19)
  • Source/WTF/wtf/Threading.cpp
  • Source/WTF/wtf/Threading.h
  • Source/bmalloc/CMakeLists.txt
  • Source/bmalloc/libpas/libpas.xcodeproj/project.pbxproj
  • Source/bmalloc/libpas/src/libpas/pas_bitfit_directory.c
  • Source/bmalloc/libpas/src/libpas/pas_deferred_decommit_log.c
  • Source/bmalloc/libpas/src/libpas/pas_fast_large_free_heap.c
  • Source/bmalloc/libpas/src/libpas/pas_large_free_heap_helpers.c
  • Source/bmalloc/libpas/src/libpas/pas_probabilistic_guard_malloc_allocator.c
  • Source/bmalloc/libpas/src/libpas/pas_race_test_hooks.h
  • Source/bmalloc/libpas/src/libpas/pas_scavenger.c
  • Source/bmalloc/libpas/src/libpas/pas_thread_local_cache.c
  • Source/bmalloc/libpas/src/libpas/pas_thread_local_cache.h
  • Source/bmalloc/libpas/src/libpas/pas_thread_suspender.c
  • Source/bmalloc/libpas/src/libpas/pas_thread_suspender.h
  • Source/bmalloc/libpas/src/test/BitfitTests.cpp
  • Source/bmalloc/libpas/src/test/LargeFreeHeapTests.cpp
  • Source/bmalloc/libpas/src/test/TestHarness.cpp
  • Source/bmalloc/libpas/src/test/ThreadSuspenderTests.cpp

Comment on lines +53 to +60
struct pas_thread_suspender {
    pas_embedder_thread_handle (*current_thread)(void);
    bool (*begin_suspend)(pas_embedder_thread_handle);
    void (*end_suspend)(pas_embedder_thread_handle);
    /* Optional; may be NULL. Called once when the TLC that owns the handle is destroyed,
       from the owning thread, with no libpas locks held. */
    void (*release_handle)(pas_embedder_thread_handle);
};

🧹 Nitpick | 🔵 Trivial

Function pointer typedefs would improve readability.

Consider using typedefs for the function pointer types to improve readability and enable reuse:

typedef pas_embedder_thread_handle (*pas_current_thread_callback)(void);
typedef bool (*pas_begin_suspend_callback)(pas_embedder_thread_handle);
// etc.

This is a minor style suggestion.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Source/bmalloc/libpas/src/libpas/pas_thread_suspender.h` around lines 53 -
60, The struct pas_thread_suspender uses raw function pointer types which
reduces readability; define typedefs like pas_current_thread_callback (for
pas_embedder_thread_handle (*)(void)), pas_begin_suspend_callback (for bool
(*)(pas_embedder_thread_handle)), pas_end_suspend_callback (for void
(*)(pas_embedder_thread_handle)), and pas_release_handle_callback (for void
(*)(pas_embedder_thread_handle)); then update the struct fields current_thread,
begin_suspend, end_suspend, and release_handle to use these typedef names
instead of raw pointer signatures so the code is clearer and the types are
reusable.

github-actions bot commented Apr 13, 2026

Preview Builds

| Commit | Release | Date |
| --- | --- | --- |
| d8d42dd3 | autobuild-preview-pr-182-d8d42dd3 | 2026-04-13 09:32:38 UTC |
| 98dc8d6e | autobuild-preview-pr-182-98dc8d6e | 2026-04-13 08:12:24 UTC |

…rtion

The libpas scavenger calls Thread::suspend (via pas_thread_suspender) while
holding the heap lock. The scavenger is a raw pthread with no WTF::Thread,
so currentSingleton() lazy-allocates one -> fastMalloc -> heap lock ->
deadlock. currentMayBeNull() reads TLS without allocating; if it returns
null the caller can't be suspending itself anyway.
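The difference between the two accessors can be sketched with a thread-local pointer. `ThreadRecord`, `current_singleton`, `current_may_be_null`, and `is_suspending_self` are hypothetical analogues of WTF::Thread, currentSingleton(), currentMayBeNull(), and the self-suspension assertion; the allocation counter stands in for the fastMalloc call that would re-enter the heap lock.

```cpp
#include <cassert>

struct ThreadRecord { int id; }; // hypothetical stand-in for WTF::Thread

static thread_local ThreadRecord* t_current = nullptr;
static int g_lazy_allocations = 0;

// Allocating accessor: creates the record on first use.
// The record is intentionally leaked in this sketch.
ThreadRecord& current_singleton()
{
    if (!t_current) {
        ++g_lazy_allocations; // in the real code this is where the allocator would be entered
        t_current = new ThreadRecord{0};
    }
    return *t_current;
}

// Non-allocating accessor: just reads thread-local storage.
ThreadRecord* current_may_be_null() { return t_current; }

// Self-suspension check that never allocates, so it is safe under an allocator lock.
// A thread with no record cannot be the target, so null means "not self".
bool is_suspending_self(const ThreadRecord& target)
{
    ThreadRecord* self = current_may_be_null();
    return self && self == &target;
}
```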
@Jarred-Sumner Jarred-Sumner merged commit 4939a8a into main Apr 13, 2026
35 checks passed