
chore: Use jemalloc global allocator on supported platforms #4367

Merged
merged 1 commit into stacks-network:next from chore/use-jemalloc on Feb 20, 2024

Conversation

@jbencin (Contributor) commented Feb 11, 2024

Description

Use the jemalloc allocator on supported platforms (i.e. non-Windows). jemalloc is the default allocator on FreeBSD and used to be the default in Rust, so it should be perfectly safe to use.
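For context, here is a minimal sketch of how a jemalloc global-allocator switch like this is typically wired up in Rust. It assumes the tikv-jemallocator crate; the exact crate, feature flags, and cfg guard used in this PR may differ:

```rust
// Sketch only: assumes tikv-jemallocator is added as a dependency in Cargo.toml.
#[cfg(not(target_env = "msvc"))]
use tikv_jemallocator::Jemalloc;

// On non-Windows (non-MSVC) targets, replace the default system allocator
// with jemalloc for the entire binary.
#[cfg(not(target_env = "msvc"))]
#[global_allocator]
static GLOBAL: Jemalloc = Jemalloc;

fn main() {
    // Every heap allocation below now goes through jemalloc on supported platforms.
    let blocks: Vec<u64> = (0..1_000).collect();
    println!("allocated {} elements", blocks.len());
}
```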

In the replay-block benchmark, this results in a 3.3% performance improvement:

hyperfine -w 3 -r 10 "stacks-core.next/target/release/stacks-inspect.x86-64 replay-block /home/jbencin/data/next/ first 3000" "stacks-core.next/target/release/stacks-inspect.jemalloc replay-block /home/jbencin/data/next/ first 3000"
Benchmark 1: stacks-core.next/target/release/stacks-inspect.x86-64 replay-block /home/jbencin/data/next/ first 3000
  Time (mean ± σ):     31.078 s ±  0.053 s    [User: 27.011 s, System: 3.903 s]
  Range (min … max):   31.000 s … 31.173 s    10 runs
 
Benchmark 2: stacks-core.next/target/release/stacks-inspect.jemalloc replay-block /home/jbencin/data/next/ first 3000
  Time (mean ± σ):     30.044 s ±  0.046 s    [User: 26.014 s, System: 3.864 s]
  Range (min … max):   29.978 s … 30.112 s    10 runs
 
Summary
  stacks-core.next/target/release/stacks-inspect.jemalloc replay-block /home/jbencin/data/next/ first 3000 ran
    1.03 ± 0.00 times faster than stacks-core.next/target/release/stacks-inspect.x86-64 replay-block /home/jbencin/data/next/ first 3000

Applicable issues

Additional info (benefits, drawbacks, caveats)

  • I applied this change to stacks-inspect and stacks-node. Should I apply it to the other binaries as well?
  • I should also benchmark memory usage.

Checklist

  • Test coverage for new or modified code paths
  • Changelog is updated
  • Required documentation changes (e.g., docs/rpc/openapi.yaml and rpc-endpoints.md for v2 endpoints, event-dispatcher.md for new events)
  • New clarity functions have corresponding PR in clarity-benchmarking repo
  • New integration test(s) added to bitcoin-tests.yml

codecov bot commented Feb 11, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison: base (f8c6760) 83.22% vs. head (40939de) 67.77%.

Additional details and impacted files
@@             Coverage Diff             @@
##             next    #4367       +/-   ##
===========================================
- Coverage   83.22%   67.77%   -15.46%     
===========================================
  Files         448      448               
  Lines      321363   321365        +2     
===========================================
- Hits       267453   217794    -49659     
- Misses      53910   103571    +49661     


@jcnelson (Member) left a comment

LGTM; looking forward to seeing how this shapes up

@jbencin (Contributor, Author) commented Feb 15, 2024

I tested this on a different block range (one that makes far heavier use of the Clarity VM), and the performance improvement is much larger, around 12%:

hyperfine -w 3 -r 10 "stacks-core/target/release/stacks-inspect.next replay-block /home/jbencin/data/next/ range 99990 100000" "stacks-core/target/release/stacks-inspect.jemalloc replay-block /home/jbencin/data/next/ range 99990 100000"
Benchmark 1: stacks-core/target/release/stacks-inspect.next replay-block /home/jbencin/data/next/ range 99990 100000
  Time (mean ± σ):     12.140 s ±  0.031 s    [User: 11.607 s, System: 0.489 s]
  Range (min … max):   12.091 s … 12.207 s    10 runs
 
Benchmark 2: stacks-core/target/release/stacks-inspect.jemalloc replay-block /home/jbencin/data/next/ range 99990 100000
  Time (mean ± σ):     10.842 s ±  0.015 s    [User: 10.346 s, System: 0.442 s]
  Range (min … max):   10.820 s … 10.863 s    10 runs
 
Summary
  stacks-core/target/release/stacks-inspect.jemalloc replay-block /home/jbencin/data/next/ range 99990 100000 ran
    1.12 ± 0.00 times faster than stacks-core/target/release/stacks-inspect.next replay-block /home/jbencin/data/next/ range 99990 100000

@jbencin (Contributor, Author) commented Feb 15, 2024

I tried using heaptrack to measure the change in memory usage with jemalloc. It reports about 1/3 the memory usage of the default allocator, which can't be right. I've heard jemalloc doesn't work correctly with Valgrind, so I'm guessing heaptrack has the same issue.
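If heaptrack turns out to be unreliable here, one way to sanity-check memory usage is to ask jemalloc itself for its statistics. A rough sketch, assuming the tikv-jemalloc-ctl crate alongside the jemalloc global allocator (not something this PR adds):

```rust
// Sketch only: assumes tikv-jemalloc-ctl is a dependency and jemalloc is the
// global allocator, so these statistics describe the process's own heap.
use tikv_jemalloc_ctl::{epoch, stats};

fn print_jemalloc_stats() {
    // jemalloc caches its statistics; advancing the epoch refreshes them.
    epoch::advance().unwrap();
    let allocated = stats::allocated::read().unwrap(); // bytes handed out to the application
    let resident = stats::resident::read().unwrap(); // bytes of physically resident memory
    println!("jemalloc: allocated = {allocated} B, resident = {resident} B");
}

fn main() {
    // Allocate something noticeable, then report what jemalloc thinks it holds.
    let data: Vec<u8> = vec![0; 16 * 1024 * 1024];
    print_jemalloc_stats();
    drop(data);
}
```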

@obycode (Contributor) left a comment

👍

@jbencin jbencin merged commit 059ae88 into stacks-network:next Feb 20, 2024
1 of 2 checks passed
@jbencin jbencin deleted the chore/use-jemalloc branch March 8, 2024 22:08