Switch to tcmalloc #5101

SirTyson · 2026-01-16T00:41:24Z

Description

Switches core from the glibc allocator to tcmalloc.

After adding the state snapshot invariant, we've seen nodes with it enabled get OOM killed. This was due to an issue with glibc memory arenas. TLDR: we'd make a large allocation on thread A (the in-memory snapshot copy), pass it over to thread B, then have thread B free the allocation. In glibc, there is a per-thread memory arena. Each allocation made by a thread takes capacity from this arena. The issue is, whatever thread frees an allocation gets the capacity added back to its own arena regardless of the original allocating thread. This means that the snapshot invariant copy allocation caused us to "drain" memory from the apply thread, resulting in the apply thread continually requesting more heap until we get OOM killed. For more info, check out this blog.
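For readers who want to see the shape of the problematic pattern, here is a minimal sketch of it: a large buffer is allocated on one thread, ownership moves to a second thread, and the second thread frees it. `Snapshot` and `handOffAndFree` are hypothetical names used only for illustration and are not taken from stellar-core. The sketch shows the handoff itself, not the resulting arena behavior.

```cpp
#include <cstddef>
#include <future>
#include <memory>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the in-memory snapshot copy.
using Snapshot = std::vector<unsigned char>;

// Allocates a buffer on the calling thread ("thread A"), moves ownership to a
// worker ("thread B"), and lets the worker free it. Returns the size the
// worker observed just before releasing the buffer.
std::size_t
handOffAndFree(std::size_t bytes)
{
    auto snapshot = std::make_unique<Snapshot>(bytes, 0); // allocated on thread A
    std::promise<std::size_t> seen;
    std::future<std::size_t> result = seen.get_future();

    std::thread worker([snap = std::move(snapshot), &seen]() mutable {
        std::size_t n = snap->size();
        snap.reset(); // freed on thread B
        seen.set_value(n);
    });
    worker.join();
    return result.get();
}
```

Calling `handOffAndFree` in a loop while watching the process's memory usage under different allocators is one way to reproduce the pattern in isolation.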

The solution is to switch to an allocator more friendly to our multithreaded allocation pattern. After testing mimalloc, jemalloc, glibc, and tcmalloc, I've landed on tcmalloc as our winner.

tcmalloc is developed by Google, packaged in default Ubuntu repos, and is quite stable. Its memory overhead and runtime performance are both excellent for our use case. In order to test Resident Set Size (RSS, or allocator memory overhead), I've been running different allocator settings over the last couple of days on test-core-003. After tuning tcmalloc, I've seen a memory decrease of about 40%, from 7.5 GB to 5 GB. Surprisingly, I've also seen a very significant runtime improvement.
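The description does not say how RSS was sampled. As one possible approach on Linux, a process can read its own RSS from the `VmRSS:` line of `/proc/self/status`. The helper names below (`parseVmRssKb`, `currentRssKb`) are hypothetical and this is only a sketch of that approach.

```cpp
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Scans /proc/<pid>/status-style text for the "VmRSS:" line and returns its
// value in kB. Returns -1 if no such line is present.
long
parseVmRssKb(std::istream& in)
{
    std::string line;
    while (std::getline(in, line))
    {
        if (line.rfind("VmRSS:", 0) == 0) // line starts with "VmRSS:"
        {
            std::istringstream fields(line.substr(6));
            long kb = -1;
            fields >> kb;
            return kb;
        }
    }
    return -1;
}

// Linux-only convenience wrapper for the current process.
long
currentRssKb()
{
    std::ifstream status("/proc/self/status");
    return parseVmRssKb(status);
}
```

Sampling `currentRssKb()` periodically under each allocator would give comparable numbers, though external tools report the same value.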

In a multithreaded, high throughput environment (measured by indexing the BucketList in parallel), we see the following allocation measurements:

glibc:

median: 69 ns
mean: 680 ns
total allocation time during Bucket Indexing: 313 ms

tcmalloc (with settings from this PR):

median: 22 ns
mean: 178 ns
total allocation time during Bucket Indexing: 82.4 ms

Deallocation time is omitted, as it is basically the same for both. The key takeaway is that tcmalloc is faster in the median case, but also much more consistent due to better handling of thread contention.
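To make the median versus mean comparison concrete, here is a rough sketch of how per-allocation latencies could be collected and summarized. It is not the harness used for the numbers above. `LatencyStats`, `summarize`, and `timeMallocs` are hypothetical names, and timing individual `malloc` calls with `steady_clock` includes clock overhead, so absolute values will differ from the reported ones.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <numeric>
#include <vector>

struct LatencyStats
{
    double medianNs;
    double meanNs;
    double totalMs;
};

// Summarizes per-call latencies given in nanoseconds. Expects a non-empty
// sample set. A mean far above the median indicates a few very slow calls.
LatencyStats
summarize(std::vector<std::uint64_t> samples)
{
    std::sort(samples.begin(), samples.end());
    std::size_t n = samples.size();
    double median = (n % 2 == 1)
                        ? static_cast<double>(samples[n / 2])
                        : (samples[n / 2 - 1] + samples[n / 2]) / 2.0;
    double totalNs = std::accumulate(samples.begin(), samples.end(), 0.0);
    return {median, totalNs / static_cast<double>(n), totalNs / 1e6};
}

// Times `count` malloc calls of `size` bytes each on the calling thread and
// frees everything afterwards, so only allocation time is sampled.
std::vector<std::uint64_t>
timeMallocs(std::size_t count, std::size_t size)
{
    using Clock = std::chrono::steady_clock;
    std::vector<std::uint64_t> samples;
    std::vector<void*> live;
    samples.reserve(count);
    live.reserve(count);
    for (std::size_t i = 0; i < count; ++i)
    {
        auto start = Clock::now();
        void* p = std::malloc(size);
        auto end = Clock::now();
        auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(end - start);
        samples.push_back(static_cast<std::uint64_t>(ns.count()));
        live.push_back(p);
    }
    for (void* p : live)
    {
        std::free(p);
    }
    return samples;
}
```

For example, the samples 10, 20, 30 and 1000 ns have a median of 25 ns but a mean of 265 ns, which is the kind of gap that contention-induced outliers produce.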

To test more realistic patterns, I ran catchup on 200 ledgers starting from 60757320 locally on my laptop. tcmalloc resulted in an 11% reduction of total apply time compared to glibc.

I also ran the max-sac-tps test on user-dev-001. I tested with 4 threads, a batch size of 1, and disk writes counting towards overall apply time. tcmalloc consistently showed ~18% improvement in apply time and overall max SAC TPS (from 6400 -> 7680 TPS). This seems like a pretty significant and easy win!

I don't think CI will pass, as we need to add a new dependency to our build. It looks like we have to create the runner with the new dependency inside namespace directly, so maybe @graydon can help out with this. I'm also not sure if I correctly changed the build system, especially with the way I've set up the config options. In my testing I was just linking everything directly, and the actual compile time changes are AI generated, so some help would be appreciated.

Checklist

Reviewed the contributing document
Rebased on top of master (no merge commits)
Ran clang-format v8.0.0 (via make format or the Visual Studio extension)
Compiles
Ran all tests
If change impacts performance, include supporting evidence per the performance document

dmkozh

The memory/perf numbers look great, thanks for the research!

src/util/TcmallocConfig.cpp

Copilot

Pull request overview

This PR switches stellar-core from the glibc allocator to tcmalloc to address memory arena issues that were causing OOM kills when the state snapshot invariant was enabled. The PR adds tcmalloc as a required dependency on Linux, configures it with optimized parameters for stellar-core's allocation patterns, and updates documentation and build configuration accordingly.

Changes:

Adds TcmallocConfig.cpp to configure tcmalloc parameters at startup
Updates build system (configure.ac, Makefile.am) to detect and link tcmalloc on Linux
Updates INSTALL.md to document the new libgoogle-perftools-dev dependency

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| src/util/TcmallocConfig.cpp | New file that configures tcmalloc parameters (memory release rate, thread cache size, allocation thresholds) using a constructor attribute |
| configure.ac | Adds pkg-config check for libtcmalloc_minimal >= 2.0 on Linux, makes it mandatory with clear error message |
| src/Makefile.am | Adds libtcmalloc_LIBS to stellar_core_LDADD for linking |
| INSTALL.md | Documents new libgoogle-perftools-dev dependency for Linux builds |
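The "constructor attribute" mentioned for TcmallocConfig.cpp is a GCC/Clang extension that runs a function when the binary is loaded, before `main()`. The sketch below shows only that mechanism. `AllocatorTuning`, `allocatorTuning`, and `configureAllocator` are hypothetical names, the numeric values are placeholders rather than the PR's settings, and the actual tcmalloc tuning calls are deliberately left out.

```cpp
#include <cstddef>

// Hypothetical settings holder. A real config file would pass values like
// these to the allocator's tuning interface instead of storing them.
struct AllocatorTuning
{
    double releaseRate = 0.0;
    std::size_t maxThreadCacheBytes = 0;
    bool applied = false;
};

// Function-local static, so it is safe to touch from a load-time constructor
// regardless of the initialization order of other globals.
AllocatorTuning&
allocatorTuning()
{
    static AllocatorTuning tuning;
    return tuning;
}

// GCC/Clang extension: runs once when the binary is loaded, before main().
__attribute__((constructor)) static void
configureAllocator()
{
    AllocatorTuning& t = allocatorTuning();
    t.releaseRate = 10.0;              // placeholder value, not from the PR
    t.maxThreadCacheBytes = 64u << 20; // placeholder value (64 MiB), not from the PR
    t.applied = true;
}
```

The appeal of this approach is that the settings take effect before any application code allocates, without requiring a call from `main()`.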

anupsdf · 2026-01-16T18:54:08Z

Nice work indeed! We might have to add this in a few other places like Dockerfiles etc. I thought of asking copilot to do it but we can just search manually and include them in this PR.

SirTyson · 2026-01-23T19:26:45Z

> Nice work indeed! We might have to add this in a few other places like Dockerfiles etc. I thought of asking copilot to do it but we can just search manually and include them in this PR.

I've updated the testing Dockerfiles in the core repo and opened a PR for quickstart: stellar/quickstart#890. I believe the last failing GitHub Action should pass after the quickstart PR merges.

src/util/TcmallocConfig.cpp

anupsdf

LGTM

SirTyson requested review from Copilot, dmkozh and graydon and removed request for Copilot January 16, 2026 00:41

Copilot started reviewing on behalf of SirTyson January 16, 2026 00:41

dmkozh previously approved these changes Jan 16, 2026

Copilot AI review requested due to automatic review settings January 16, 2026 18:30

SirTyson dismissed dmkozh’s stale review via 7a3762b January 16, 2026 18:30

SirTyson force-pushed the tcmalloc branch from 1c9136f to 7a3762b Compare January 16, 2026 18:30

Copilot started reviewing on behalf of SirTyson January 16, 2026 18:31

Copilot AI reviewed Jan 16, 2026

SirTyson force-pushed the tcmalloc branch 2 times, most recently from ab092df to eb688a9 Compare January 23, 2026 18:29

SirTyson mentioned this pull request Jan 23, 2026

Add tmalloc dependency to core docker file stellar/quickstart#890

Merged

SirTyson force-pushed the tcmalloc branch 2 times, most recently from ab092df to 01bd2f4 Compare January 23, 2026 19:38

Switch to tcmalloc

ab44234

SirTyson force-pushed the tcmalloc branch from 01bd2f4 to ab44234 Compare January 23, 2026 22:33

anupsdf reviewed Jan 23, 2026

SirTyson enabled auto-merge January 23, 2026 22:48

anupsdf approved these changes Jan 23, 2026

SirTyson added this pull request to the merge queue Jan 23, 2026

Merged via the queue into stellar:master with commit 1d2f58f Jan 23, 2026
31 of 45 checks passed

SirTyson deleted the tcmalloc branch January 23, 2026 23:57
