Switch the default global allocator to System, remove alloc_jemalloc, use jemallocator in rustc #36963
Comments
Depends on having stable global allocators. Since this will result in immediate performance regressions on platforms using jemalloc today, we'll need to be sensitive about how the transition is done and make sure it's clear how to regain that allocator performance. It might be a good idea to simultaneously publish other allocator crates to demonstrate the value of choice and for benchmark comparisons.
brson added the C-enhancement, A-allocators, T-libs labels · Oct 4, 2016
brson referenced this issue · Oct 4, 2016: Performance regression on Windows due to removal of jemalloc #36328 (Closed)
An alternative I've heard @sfackler advocate from time to time is:
That would allow us to optionally include jemalloc, but if you want the system allocator for heap profiling, valgrind, or other use cases, you can choose it.
brson referenced this issue · Oct 4, 2016: Tracking issue for changing the global, default allocator (RFC 1974) #27389 (Closed)
I would specifically like to jettison jemalloc entirely and use the system allocator. It breaks way too often, it dropped valgrind support, it adds a couple hundred kilobytes to binaries, etc.
cuviper referenced this issue · Oct 6, 2016: Segfault running aarch64-unknown-linux-gnu binaries on Fedora #36994 (Closed)
Some historical speed bumps we've had with jemalloc:
I'll try to keep this updated as we run into more issues.
jemalloc also makes Rust look bad to newcomers, because it makes "Hello World" executables much larger. (I know that's not a fair way to judge a language, but people do, and I can't stop myself from caring about the size of redistributable executables, too.)
Another observation: jemalloc seems to add a significant amount of overhead to thread creation, on both Linux and macOS. This hasn't been a major issue for me, as we plan to use the system allocator on Fuchsia, but it's probably worth looking into.
fweimer commented · Jan 3, 2017

On the glibc side, we would be interested in workloads where jemalloc shows significant benefits. (@djdelorie is working on improving glibc malloc performance.)
japaric added a commit to japaric/rust that referenced this issue · Jan 4, 2017
PR implementing this: #38820
I'd like to see some light benchmarks to get an idea of the magnitude of the default performance regression we can expect.
rkruppe referenced this issue · Apr 10, 2017: Binary size violates the you-do-not-pay-for-what-you-do-not-use principle. #41177 (Closed)
I'm not sure if this is active, but I wanted to voice a recent pain point: I am using stable Rust. I wrote an executable. I wrote a dylib. I called one from the other. It explodes because they have different default allocators, and I cannot change either on stable. Independent of which allocator is fastest, or hardest to maintain, etc., the fact that there is a difference between the default allocators makes the shared-library FFI story on stable Rust pretty bad. Edit: Also, this issue was opened on my birthday, so you should just make it happen. <3
I believe we've also encountered a deadlock on OSX with recent versions of jemalloc: jemalloc/jemalloc#895
I'm going to close this in favor of #27389. It's highly likely that all programs will stop linking to jemalloc by default once we stabilize that feature, but there's not really much we can do until that issue lands.
alexcrichton closed this · Jun 20, 2017
SimonSapin referenced this issue · Nov 13, 2017: Does inlining work through the __rust_alloc symbol? #45831 (Closed)
SimonSapin referenced this issue · May 24, 2018: Make rustc use jemalloc through #[global_allocator] #51038 (Closed)
glandium referenced this issue · May 31, 2018: Stabilize GlobalAlloc and #[global_allocator] #51241 (Merged)
SimonSapin changed the title from "Switch to liballoc_system by default, move liballoc_jemalloc to crates.io" to "Switch the default global allocator to System" · May 31, 2018
SimonSapin reopened this · May 31, 2018
I've opened #55238 to close out this issue.
@alexcrichton what about the memory usage? It regressed, apparently.
Do we not want to provide …
@SimonSapin yeah I sort of see …
alexcrichton added a commit to alexcrichton/rust that referenced this issue · Oct 21, 2018
bors added a commit that referenced this issue · Oct 21, 2018
pietroalbini added a commit to pietroalbini/rust that referenced this issue · Oct 25, 2018
alexcrichton added a commit to alexcrichton/rust that referenced this issue · Oct 29, 2018
bors added a commit that referenced this issue · Oct 29, 2018
bors added a commit that referenced this issue · Oct 30, 2018
alexcrichton added a commit to alexcrichton/rust that referenced this issue · Oct 30, 2018
alexcrichton added a commit to alexcrichton/rust that referenced this issue · Oct 31, 2018
alexcrichton added a commit to alexcrichton/rust that referenced this issue · Nov 2, 2018
bors added a commit that referenced this issue · Nov 2, 2018
bors added a commit that referenced this issue · Nov 3, 2018
bors added a commit that referenced this issue · Nov 3, 2018
bors closed this in #55238 · Nov 3, 2018
This was referenced Nov 27, 2018
@alexcrichton Is there a tracking issue to track when defaulting to the system allocator lands on stable?
@johnthagen It just has to ride the normal release train. The PR that closed this issue is currently on the beta branch, on track for 1.32.
@johnthagen We generally close tracking issues when something is done/implemented. In this case, you can see that this issue was closed by #55238 on 2018-11-03, so it likely reached the Nightly channel the next day. Every 6 weeks, Beta becomes Stable and Nightly is forked as the new Beta, so it takes 6 to 12 weeks for a PR merge to reach the Stable channel. https://github.com/rust-lang/rust/blob/master/RELEASES.md shows the dates of past releases, and https://forge.rust-lang.org/ the expected date of the next one.
Should this be tagged with relnotes?
SimonSapin added the relnotes label · Dec 10, 2018
Good point! Done.
This was referenced Dec 14, 2018
spacejam commented · Jan 18, 2019 (edited)

I am quite saddened by this. PL-scale memory throughput regressions like this will use a lot more energy, cost most users (who are unlikely to learn about GlobalAlloc) more on their server bills, and blunt the surprising bliss experienced by so many newcomers whose uncertain first steps blow their previous implementations out of the water.

Binary size is a vanity metric for computing at scale, and those who require it to be smaller have the flexibility to change. This has real ethical implications, as our DCs are set to consume 20% of the world's electricity by 2025, and the decisions made by those shaping the foundational layers have massive implications. Overriding GlobalAlloc is not a realistic option for authors of allocation-intensive libraries, as it prevents users from using tools like the LLVM sanitizers, etc.

As engineers building foundational infrastructure, we have an ethical obligation to the planet to minimize the costs we impose on it. This decision was made in direct contradiction of that responsibility to our shared home. Amazing efficiency by default on the platform that is the main driver of worldwide datacenter power consumption is a precious metric for a language with as bright a future for massive-scale adoption as Rust.
@spacejam I don't think it's quite fair to characterize this as that grand of a problem. It's not as though jemalloc exclusively makes things faster, and thus not as though this is universally a regression; quite the contrary, some workloads are made much better by this. This change also means that, as system allocators improve, so will Rust programs. That would not be the case with a compiled-in memory allocator.

If you want to go down the life-cycle-analysis path, it could also be argued that we are saving countless person-hours by allowing the use of standardized tools by people who previously had to waste time figuring out why valgrind or whatever didn't just work. Along those same lines, one could argue that every change to the standard library has wide-reaching implications for global energy use, but a) that impact is minute; b) that impact is basically impossible to predict; and c) it is infeasible to perform that kind of analysis on any kind of representative scale for every (if any) change.
brson commented · Oct 4, 2016 · edited by SimonSapin

Updated description
A long time coming, this issue is that we should implement these changes simultaneously:

- Remove the `alloc_jemalloc` crate.
- Switch the default global allocator to `std::alloc::System`. While currently the default for cdylib/staticlib, it's not the default for staticlib/executable.
- Add the `jemallocator` crate to rustc, but only rustc.
- Remove the `alloc_system` crate.

We have for the longest time defaulted to jemalloc as the default allocator for Rust programs. This has been in place since pre-1.0, and the vision was that we'd give programs a by-default faster allocator than what's on the system. Over time, this has not fared well. Users can use `#[global_allocator]` to opt in to a jemalloc-based global allocator (through the `jemallocator` crate or any other allocator crate).

The compiler, however, still receives a good deal of benefit from using jemalloc (measured in #55202 (comment)). If that link is broken: it's basically a blanket, across-the-board 8-10% regression in compile time for many benchmarks (apparently the max RSS also regressed on many benchmarks!). For this reason, we don't want to remove jemalloc from rustc itself.
The rest of this issue is now going to be technical details about how we can probably get rid of `alloc_jemalloc` while preserving jemalloc in rustc itself. The tier 1 platforms that use `alloc_jemalloc`, which this issue will be focused on, are:

Jemalloc is notably disabled on all Windows platforms (I believe due to our inability to ever get it building over there). Furthermore, jemalloc is enabled on some Linux platforms but I think ended up basically being disabled on all but the above. This, I believe, narrows the targets we need to design for, as we basically need to keep the above working.
Note that we also have two modes of using jemalloc. In one mode we could actually use jemalloc-specific API functions, like `alloc_jemalloc` does today. We could also use the standard API it has and its support for hooking into the standard allocator on these two platforms. The tradeoff between these two strategies has not (AFAIK) been measured at this time. Note that in any case we want to route LLVM's allocations to jemalloc, so we want to be sure to hook into the default allocator somehow.

I believe that this default-allocator hooking on Linux works by jemalloc exporting its own `malloc` symbol, overriding the one in `libc` and routing all memory allocation to jemalloc. I'm personally quite fuzzy on the details for OSX, but I think it has something to do with "zone allocators" and not much to do with symbol names. I think this means we can build jemalloc without symbol prefixes on Linux, and with symbol prefixes on OSX, and, using that build, we should be able to override the default allocator in both situations.

I would propose, first, a "hopefully easy" route to solve this:
- Depend on the `jemalloc_sys` crate, pulling in all of jemalloc itself. This should, with the right build configuration, mean that we're now using jemalloc everywhere in the compiler (just as we're rerouting LLVM, we're rerouting the compiler).

I'm testing out the performance of this in #55217 and will report back with results. Results are that this is almost universally positive! @alexcrichton will make a PR.

Failing this, @alexcrichton has ideas for a more invasive solution that uses jemalloc-specific API calls in rustc itself, but hopefully that won't be necessary...
Original Description
@alexcrichton and I have increasingly come to think that Rust should not maintain jemalloc bindings in tree and link it by default. The primary reasons being:
For the sake of consistency and maintenance we'd prefer to just always use the system allocator, and make jemalloc an easy option to enable via the global allocator and a jemalloc crate on crates.io.