
chore: Use jemalloc global allocator on supported platforms #4367

Merged
merged 1 commit into stacks-network:next from chore/use-jemalloc on Feb 20, 2024

Conversation

@jbencin (Contributor) commented Feb 11, 2024

Description

Use the jemalloc allocator on supported platforms (i.e. non-Windows). jemalloc is the default allocator on FreeBSD and used to be the default in Rust, so it should be perfectly safe to use.
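For context, here is a minimal sketch of how a jemalloc global-allocator switch like this is typically wired up in Rust. It assumes the tikv-jemallocator crate; the exact crate, feature flags, and cfg guard used in this PR may differ:

```rust
// Sketch only: assumes tikv-jemallocator is added as a dependency in Cargo.toml.
#[cfg(not(target_env = "msvc"))]
use tikv_jemallocator::Jemalloc;

// On non-Windows (non-MSVC) targets, replace the default system allocator
// with jemalloc for the entire binary.
#[cfg(not(target_env = "msvc"))]
#[global_allocator]
static GLOBAL: Jemalloc = Jemalloc;

fn main() {
    // Every heap allocation below now goes through jemalloc on supported platforms.
    let blocks: Vec<u64> = (0..1_000).collect();
    println!("allocated {} elements", blocks.len());
}
```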

In the replay-block benchmark, this results in a 3.3% performance improvement:

hyperfine -w 3 -r 10 "stacks-core.next/target/release/stacks-inspect.x86-64 replay-block /home/jbencin/data/next/ first 3000" "stacks-core.next/target/release/stacks-inspect.jemalloc replay-block /home/jbencin/data/next/ first 3000"
Benchmark 1: stacks-core.next/target/release/stacks-inspect.x86-64 replay-block /home/jbencin/data/next/ first 3000
  Time (mean ± σ):     31.078 s ±  0.053 s    [User: 27.011 s, System: 3.903 s]
  Range (min … max):   31.000 s … 31.173 s    10 runs
 
Benchmark 2: stacks-core.next/target/release/stacks-inspect.jemalloc replay-block /home/jbencin/data/next/ first 3000
  Time (mean ± σ):     30.044 s ±  0.046 s    [User: 26.014 s, System: 3.864 s]
  Range (min … max):   29.978 s … 30.112 s    10 runs
 
Summary
  stacks-core.next/target/release/stacks-inspect.jemalloc replay-block /home/jbencin/data/next/ first 3000 ran
    1.03 ± 0.00 times faster than stacks-core.next/target/release/stacks-inspect.x86-64 replay-block /home/jbencin/data/next/ first 3000

Applicable issues

Additional info (benefits, drawbacks, caveats)

  • I applied this change to stacks-inspect and stacks-node. Should I apply it to the other binaries as well?
  • I should also benchmark memory usage.

Checklist

  • Test coverage for new or modified code paths
  • Changelog is updated
  • Required documentation changes (e.g., docs/rpc/openapi.yaml and rpc-endpoints.md for v2 endpoints, event-dispatcher.md for new events)
  • New clarity functions have corresponding PR in clarity-benchmarking repo
  • New integration test(s) added to bitcoin-tests.yml

codecov bot commented Feb 11, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison: base (f8c6760) 83.22% vs. head (40939de) 67.77%.

Additional details and impacted files
@@             Coverage Diff             @@
##             next    #4367       +/-   ##
===========================================
- Coverage   83.22%   67.77%   -15.46%     
===========================================
  Files         448      448               
  Lines      321363   321365        +2     
===========================================
- Hits       267453   217794    -49659     
- Misses      53910   103571    +49661     


@jcnelson (Member) left a comment

LGTM; looking forward to seeing how this shapes up

@jbencin (Contributor, Author) commented Feb 15, 2024

I tested this on a different block range (one that makes far heavier use of the Clarity VM), and the performance improvement is much larger, around 12%:

hyperfine -w 3 -r 10 "stacks-core/target/release/stacks-inspect.next replay-block /home/jbencin/data/next/ range 99990 100000" "stacks-core/target/release/stacks-inspect.jemalloc replay-block /home/jbencin/data/next/ range 99990 100000"
Benchmark 1: stacks-core/target/release/stacks-inspect.next replay-block /home/jbencin/data/next/ range 99990 100000
  Time (mean ± σ):     12.140 s ±  0.031 s    [User: 11.607 s, System: 0.489 s]
  Range (min … max):   12.091 s … 12.207 s    10 runs
 
Benchmark 2: stacks-core/target/release/stacks-inspect.jemalloc replay-block /home/jbencin/data/next/ range 99990 100000
  Time (mean ± σ):     10.842 s ±  0.015 s    [User: 10.346 s, System: 0.442 s]
  Range (min … max):   10.820 s … 10.863 s    10 runs
 
Summary
  stacks-core/target/release/stacks-inspect.jemalloc replay-block /home/jbencin/data/next/ range 99990 100000 ran
    1.12 ± 0.00 times faster than stacks-core/target/release/stacks-inspect.next replay-block /home/jbencin/data/next/ range 99990 100000

@jbencin (Contributor, Author) commented Feb 15, 2024

I tried using heaptrack to measure the change in memory usage with jemalloc. It reports about 1/3 the memory usage of the default allocator, which can't be right. I've heard jemalloc doesn't work correctly with Valgrind, so I'm guessing heaptrack has the same issue.
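If heaptrack turns out to be unreliable here, one way to sanity-check memory usage is to ask jemalloc itself for its statistics. A rough sketch, assuming the tikv-jemalloc-ctl crate alongside the jemalloc global allocator (not something this PR adds):

```rust
// Sketch only: assumes tikv-jemalloc-ctl is a dependency and jemalloc is the
// global allocator, so these statistics describe the process's own heap.
use tikv_jemalloc_ctl::{epoch, stats};

fn print_jemalloc_stats() {
    // jemalloc caches its statistics; advancing the epoch refreshes them.
    epoch::advance().unwrap();
    let allocated = stats::allocated::read().unwrap(); // bytes handed out to the application
    let resident = stats::resident::read().unwrap(); // bytes of physically resident memory
    println!("jemalloc: allocated = {allocated} B, resident = {resident} B");
}

fn main() {
    // Allocate something noticeable, then report what jemalloc thinks it holds.
    let data: Vec<u8> = vec![0; 16 * 1024 * 1024];
    print_jemalloc_stats();
    drop(data);
}
```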

@obycode (Contributor) left a comment

👍

@jbencin jbencin merged commit 059ae88 into stacks-network:next Feb 20, 2024
1 of 2 checks passed
@jbencin jbencin deleted the chore/use-jemalloc branch March 8, 2024 22:08