Skip to content

Conversation

@SirTyson
Copy link
Contributor

Description

Switches core grom glibc allocator to tcmalloc.

After adding the state snapshot invariant, we've seen nodes with it enabled get OOM killed. This was due to an issue with glibc memory arenas. TLDR: we'd make a large allocation on thread A (the in-memory snapshot copy), pass it over to thread B, then have thread B free the allocation. In glibc, there is a per-thread memory arena. Each allocation made by a thread takes capacity from this arena. The issue is, whatever thread frees an allocation gets the capacity added back to its own arena regardless of the original allocating thread. This means that the snapshot invariant copy allocation caused us to "drain" memory from the apply thread, resulting in the apply thread continually requesting more heap until we get OOM killed. For more info, check out this blog.

The solution is to switch to an allocator more friendly to our multithreaded allocation pattern. After testing mimalloc, jemalloc, glibc, and tcmalloc, I've landed on tcmalloc as our winner.

tcmalloc is developed by Google, packaged in default Ubuntu repos, and is quite stable. It's memory overhead and runtime performance are both excellent for our use case. In order to test Resident Set Size (RSS, or allocator memory overhead), I've been running different allocator setting over the last couple days on test-core-003. After tuning tcmalloc, I've seen a memory decrease of about 40%, from 7.5 GB to 5 GB. Surprisingly, I've also seen a very significant runtime improvement.

In a multithreaded, high throughput environment (measured by indexing the BucketList in parallel), we see the following allocation measurements:

glibc:

median: 69 ns
mean: 680 ns
total allocation time during Bucket Indexing: 313 ms
tcmalloc (with settings from this PR)

median: 22 ns
mean: 178 ns
total allocation time during Bucket Indexing: 82.4 ms

Deallocation time is omitted, as it is basically the same for both. The key takeway is that tcmalloc is faster on the mdeian case, but also much more consistent due to better handling of thread contention.

To test more realistic patterns, I ran catchup on 200 ledgers starting from 60757320 locally on my laptop. tcmalloc resulted in an 11% reduction of total apply time compared to glibc:

image

I also tested max-sac-tps test on user-dev-001. I tested with 4 threads, batch size of 1, with disk writes counting towards overall apply time. tcmalloc consistently showed ~18% improvement in apply time and overall max sac TPS (from 6400 -> 7680 TPS). This seems like a pretty significant and easy win!

I don't think CI will pass, as we need to add a new dependency to our build. It looks like we have to create the runner with the new dependency inside namespace directly, so maybe @graydon can help out with this. I'm also not sure if I correctly changed the build system, especially with the way I've set up the config options. In my testing I was just linking everything directly, and the actual compile time changes are AI generated, so some help would be appreciated.

Checklist

  • Reviewed the contributing document
  • Rebased on top of master (no merge commits)
  • Ran clang-format v8.0.0 (via make format or the Visual Studio extension)
  • Compiles
  • Ran all tests
  • If change impacts performance, include supporting evidence per the performance document

@SirTyson SirTyson requested review from Copilot, dmkozh and graydon and removed request for Copilot January 16, 2026 00:41
dmkozh
dmkozh previously approved these changes Jan 16, 2026
Copy link
Contributor

@dmkozh dmkozh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The memory/perf numbers looks great, thanks for the research!

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR switches stellar-core from the glibc allocator to tcmalloc to address memory arena issues that were causing OOM kills when the state snapshot invariant was enabled. The PR adds tcmalloc as a required dependency on Linux, configures it with optimized parameters for stellar-core's allocation patterns, and updates documentation and build configuration accordingly.

Changes:

  • Adds TcmallocConfig.cpp to configure tcmalloc parameters at startup
  • Updates build system (configure.ac, Makefile.am) to detect and link tcmalloc on Linux
  • Updates INSTALL.md to document the new libgoogle-perftools-dev dependency

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

File Description
src/util/TcmallocConfig.cpp New file that configures tcmalloc parameters (memory release rate, thread cache size, allocation thresholds) using a constructor attribute
configure.ac Adds pkg-config check for libtcmalloc_minimal >= 2.0 on Linux, makes it mandatory with clear error message
src/Makefile.am Adds libtcmalloc_LIBS to stellar_core_LDADD for linking
INSTALL.md Documents new libgoogle-perftools-dev dependency for Linux builds

@anupsdf
Copy link
Contributor

anupsdf commented Jan 16, 2026

Nice work indeed! We might have to add this in a few other places like Dockerfiles etc. I thought of asking copilot to do it but we can just search manually and include them in this PR.

@SirTyson
Copy link
Contributor Author

Nice work indeed! We might have to add this in a few other places like Dockerfiles etc. I thought of asking copilot to do it but we can just search manually and include them in this PR.

I've updated the testing dockerfiles in the core repo and opened a PR for quickstart: stellar/quickstart#890. I believe the last failing github action should pass after the quickstart PR merges.

@SirTyson SirTyson force-pushed the tcmalloc branch 2 times, most recently from ab092df to 01bd2f4 Compare January 23, 2026 19:38
@SirTyson SirTyson enabled auto-merge January 23, 2026 22:48
Copy link
Contributor

@anupsdf anupsdf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@SirTyson SirTyson added this pull request to the merge queue Jan 23, 2026
Merged via the queue into stellar:master with commit 1d2f58f Jan 23, 2026
31 of 45 checks passed
@SirTyson SirTyson deleted the tcmalloc branch January 23, 2026 23:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants