static TLS errors from jemalloc 5.0.0 built on CentOS 6 #937
Trying to dlopen jemalloc is bad news bears. Thread-local storage in shared libraries will, by default, require a malloc on first use of the thread-local data by a thread. This can trigger some reentrancy scenarios, and even if it doesn't, it's easy to accidentally mix malloc-from-jemalloc and malloc-from-glibc. We also don't try to reclaim global data structures on shutdown, so dlopen-ing and dlclose-ing libjemalloc.so repeatedly will leak.

To avoid the reentrancy scenarios, we mark our TLS so that its storage gets allocated along with new threads. This works fine for LD_PRELOAD'd (or whatever) shared libraries, but has some problems with dlopen'd ones (where, obviously, the loader doesn't know in advance that it will need that space). glibc tries to save some extra space for any dlopen that happens to come along, but not a ton. jemalloc 5.0 increased the amount of TLS it uses, so it plausibly pushed itself over the limit and can no longer fit in glibc's spare capacity.

We could potentially add a config option to move our TLS out of line, but given the pitfalls here I'm pretty nervous. Could you describe a little more why people want to be able to dlopen libjemalloc.so?

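To make the distinction concrete, here is a minimal sketch of pinning a thread-local variable to the initial-exec model with the GCC/Clang `tls_model` attribute, next to one declared with the global-dynamic model. The variable and function names are hypothetical and this is not jemalloc's code; it only illustrates the two kinds of declaration being discussed.

```c
#include <stddef.h>

/* Initial-exec: the variable is placed in the static TLS block that the
 * loader sets up together with each thread, so accesses never call into
 * the dynamic loader. A library using this model has to fit into that
 * block, which is what can fail when the library is dlopen'd late. */
static __thread size_t ie_counter __attribute__((tls_model("initial-exec")));

/* Global-dynamic: the usual model for -fPIC code in a shared library.
 * For a dlopen'd library, glibc may set up this storage lazily inside
 * __tls_get_addr on a thread's first access (the malloc-on-first-use
 * mentioned above). The compiler may still pick a cheaper model when it
 * can, e.g. when the file is not compiled with -fPIC. */
static __thread size_t gd_counter __attribute__((tls_model("global-dynamic")));

/* Hypothetical helpers: bump each per-thread counter, return the new value. */
size_t bump_ie_counter(void) { return ++ie_counter; }
size_t bump_gd_counter(void) { return ++gd_counter; }
```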
(also, in any case, a bugfix 5.0.1 release will be coming out relatively soon that may be worth waiting for before increasing the blast radius of the 5.0 change).
|
Just to outline our usage of jemalloc in Apache Arrow: We don't explicitly

@davidtgoldblatt sorry to be confusing, we have a shared library

Could you deploy a jemalloc built with custom config options? We'd want to support this by moving the TLS to be backed by the system malloc, which we'd need a config flag for. In fact, I think ideally we'd only let someone do this when a custom prefix is turned on, to flash a big neon warning sign saying "malloc here does not mean malloc elsewhere". Would that badly break something for you?

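One way to picture "TLS backed by the system malloc" is POSIX thread-specific data: no compiler-level TLS is used at all, and a pthread key maps each thread to a heap block obtained from the platform malloc and freed by a destructor at thread exit. This is a hedged sketch of that general technique, not jemalloc's implementation; `thread_state`, `get_thread_state` and the field names are hypothetical.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-thread state that would otherwise live in static TLS. */
typedef struct thread_state {
    unsigned long ops;  /* operations performed by this thread */
} thread_state;

static pthread_key_t state_key;
static pthread_once_t state_once = PTHREAD_ONCE_INIT;

/* Runs at thread exit for every thread that stored a non-NULL value. */
static void free_thread_state(void *p) { free(p); }

static void make_state_key(void) {
    /* Error handling omitted in this sketch. */
    pthread_key_create(&state_key, free_thread_state);
}

/* Returns this thread's state, allocating it with the system malloc on the
 * first call from each thread. Returns NULL if allocation fails. */
thread_state *get_thread_state(void) {
    pthread_once(&state_once, make_state_key);
    thread_state *s = pthread_getspecific(state_key);
    if (s == NULL) {
        s = calloc(1, sizeof *s);
        if (s != NULL && pthread_setspecific(state_key, s) != 0) {
            free(s);
            s = NULL;
        }
    }
    return s;
}
```

The trade-off is an extra lookup on every access, which is why it would make sense only as an opt-in build mode rather than the default.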
Our usage of jemalloc is fairly limited (https://github.com/apache/arrow/blob/master/cpp/src/arrow/memory_pool.cc#L54), so as long as we can
then I think we should be OK.

Why not just always delegate to the platform allocator (e.g. posix_memalign)? If all you're getting from linking against jemalloc is a perf boost, you can LD_PRELOAD it instead. The two bullet points you mention are in some tension, because people linking against libjemalloc.so may (quite reasonably) expect to be able to free() memory they mallocx(), and will hit a nasty surprise.

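For the "always delegate to the platform allocator" option, an aligned allocate/free pair on top of `posix_memalign` could look roughly like this. The function names and the 64-byte alignment are hypothetical choices, not Arrow's actual memory pool code; the point is that memory from `pool_allocate` is always released with plain `free`, so there is no allocator mismatch.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* expose posix_memalign in strict modes */
#endif
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical alignment; posix_memalign requires a power of two that is
 * also a multiple of sizeof(void *). */
#define POOL_ALIGNMENT 64

/* Allocate `size` bytes aligned to POOL_ALIGNMENT using the platform
 * allocator. Returns NULL on failure. */
void *pool_allocate(size_t size) {
    void *p = NULL;
    if (posix_memalign(&p, POOL_ALIGNMENT, size) != 0) {
        return NULL;  /* posix_memalign returns an error code, not errno */
    }
    return p;
}

/* Memory obtained from posix_memalign is released with free(), so any
 * code holding the pointer can free it without knowing where it came from. */
void pool_free(void *p) { free(p); }
```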
We explicitly call

Would the strategy of introduce a special This wouldn't let users use the *allocx functions unprefixed, but, because of the symbol collision issues involved, I think that's more of a feature than a bug.

I think that this should work for us for the moment until we can raise the minimal glibc requirement.

Yes, I think in this case we would use a prefix that is unique to our library and statically link these symbols to avoid conflicts with other applications that may expect to use

I'm running into this same issue with

I put up a hypothetical change we could make in davidtgoldblatt@dbd61d6 . This would let you switch at build time to a dlsym-able libjemalloc.so by passing a --enable-general-dynamic-tls configure flag. I don't love this, for two reasons:
If you're hitting problems with the limited static TLS size, could you try out your build with this fix to ensure you don't hit any issues? I think if this doesn't work, we'd have to add a config option that lets you go back to storing TLS out of line and taking a pointer indirection hit on every operation. I'd really like to avoid something like that, which will hurt common case performance (since I assume that's the only version of the library that would get installed in that case).

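A build-time switch like the one described above boils down to choosing the TLS model through a macro that the configure step sets. The sketch below is hypothetical: `HYPOTHETICAL_GENERAL_DYNAMIC_TLS`, `TLS_MODEL_ATTR` and the helper functions are invented names, and this is not the code in the linked commit. When the macro is defined the attribute is dropped, so -fPIC builds fall back to the compiler's default dynamic model and the library no longer needs room in the static TLS block.

```c
#ifdef HYPOTHETICAL_GENERAL_DYNAMIC_TLS
/* No attribute: -fPIC code gets the compiler's default (dynamic) model,
 * which can be set up at dlopen time. */
#  define TLS_MODEL_ATTR
#  define TLS_MODEL_NAME "compiler default"
#else
/* Default build: keep the fast initial-exec model. */
#  define TLS_MODEL_ATTR __attribute__((tls_model("initial-exec")))
#  define TLS_MODEL_NAME "initial-exec"
#endif

static __thread int per_thread_value TLS_MODEL_ATTR;

/* Reports which model this translation unit was configured with. */
const char *tls_model_name(void) { return TLS_MODEL_NAME; }

/* Touch the variable so the example performs a real TLS access. */
int bump_per_thread_value(void) { return ++per_thread_value; }
```

Whichever branch is compiled in applies to every access, which is why such a switch forces packagers to pick one trade-off for everyone who installs that build.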
I agree that solution doesn't sound ideal. If I were packaging Would using

I tested Original install: With I'm not sure how to test the performance in both situations.

I created a commit (https://github.com/KenMacD/jemalloc/commit/7036e64d36b54e75c6428a3c2aa62db0471f1047) for anyone else looking to test this.

I can confirm that this fixes the problems we (@wesm and I, in pyarrow/conda) were seeing. If needed, I can provide a reproducible environment, but that is a bit more work.

Is there a known timeline for having this in a released version? We have a user hitting another bug (https://issues.apache.org/jira/browse/ARROW-1282) which is possibly fixed by #802.

I'm talking to some compiler people about the gnu2 TLS dialect. It seems like it's used almost nowhere (in fact, this thread is now on the first page of Google results for it), and I'm worried about weird codegen bugs. It'll probably be a while before we cut another release; there's some core feature work we want to get to.

+1

FYI: because of the cluster of issues involving increased TLS usage (this and #1106), we'll add a configure option to move the thread-local allocation cache (the largest consumer of TLS space in jemalloc by far) out of thread-local data and into an internal allocation. This will cost a few fast-path nanoseconds if you turn it on, but will let this static TLS space hack continue to work for a while longer.

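Moving a large per-thread structure out of thread-local data generally means leaving only a pointer in TLS and allocating the structure itself on first use, paying one extra indirection per access. A minimal sketch of that idea, with hypothetical names (`big_cache`, `get_cache`) and the system `calloc` standing in for jemalloc's internal allocation (inside a real allocator this call would have to avoid reentering itself):

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical large per-thread structure (standing in for a thread cache). */
typedef struct big_cache {
    void *slots[4096];
    size_t used;
} big_cache;

/* Only one pointer per thread lives in TLS, so the static TLS footprint
 * stays small no matter how large big_cache grows. */
static __thread big_cache *tls_cache __attribute__((tls_model("initial-exec")));

/* Returns this thread's cache, allocating it on first use, or NULL if the
 * allocation fails. Every access pays one extra pointer dereference compared
 * with embedding big_cache in TLS directly. Freeing at thread exit is
 * omitted; a real version would need a thread-exit hook. */
big_cache *get_cache(void) {
    big_cache *c = tls_cache;
    if (c == NULL) {
        c = calloc(1, sizeof *c);
        tls_cache = c;
    }
    return c;
}
```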
Timeline for that is probably a few weeks, though.

Can you point me to where you do that? I'm trying to take a pointer from you guys on this issue, and I've been unable to figure out how you accomplish that. Thanks!
|
@joshlf, the tsd struct is defined at Line 15 in a315688
Line 725 in 82d1a3f
Basically, you just need to set the TLS model to be "initial-exec" (or "gnu2" if you feel more confident about the implementation issues there than we do). (Not sure how easy it is to do this with Rust.) Happy to help more, but let's discuss on the elfmalloc issue to avoid polluting this thread.
As of jemalloc version 5, more TLS space is required for performance reasons than may be pre-allocated. This is an issue when dlopening jemalloc, which the view engine does. Fix this by recompiling jemalloc with the --disable-initial-exec-tls flag. For more information, see the following jemalloc issues: jemalloc/jemalloc#1237, jemalloc/jemalloc#937.
Change-Id: I0b51e43ed1110174f021b741bbec2ef5df500a8a
Reviewed-on: http://review.couchbase.org/104949
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: Build Bot <build@couchbase.com>
This allows libraries that link against jemalloc to be dlopened. See jemalloc/jemalloc#937 and https://github.com/jemalloc/jemalloc/blob/dev/INSTALL.md for a detailed explanation of the feature.
Starting with jemalloc 5, we must build jemalloc with --disable-initial-exec-tls to support linking libmxnet.so with libjemalloc.so. As Debian Stretch+ and Ubuntu 18.10+ ship with jemalloc 5 built without --disable-initial-exec-tls, building MXNet with jemalloc support on any of those platforms is currently broken. jemalloc/jemalloc#937
|
@interwq you asked at #303 (comment) why The use I'm referring to is with
Starting with jemalloc 5, we must build jemalloc with --disable-initial-exec-tls to support linking libmxnet.so with libjemalloc.so. As Debian Stretch+ and Ubuntu 18.10+ ship with jemalloc 5 built without --disable-initial-exec-tls, building MXNet with jemalloc support on any of those platforms is currently broken. jemalloc/jemalloc#937 To simplify integration with MXNet's CMake build, we rely on the yet-to-be-merged CMake version of jemalloc: jemalloc/jemalloc#303
Including jemalloc in libfdb_c seems to be problematic, since it's apparently not recommended to dlopen jemalloc: jemalloc/jemalloc#937 (comment)
Disable the initial-exec thread-local storage model in jemalloc 5 to prevent shared libraries linked to libjemalloc from crashing on dlopen(); see jemalloc/jemalloc#937. This bug affects both Java JNI and Python libraries which link to jemalloc 5, such as RocksDB, which will crash the program when loaded.
* gnu/packages/jemalloc.scm (jemalloc)[arguments]: Add --disable-initial-exec-tls configure flag.
Co-authored-by: Ludovic Courtès <ludo@gnu.org>
I help maintain packages on conda-forge, which has become fairly popular in the Python community. We recently added jemalloc 5.0.0 to the package manager, built on CentOS 6 with devtoolset-2 from this base Docker image (glibc 2.12, I think):
https://github.com/conda-forge/docker-images/blob/master/linux-anvil/Dockerfile
On some platforms, like Ubuntu 14.04 (glibc 2.19), using dlopen on the produced shared library leads to errors like
What is the recommended workaround given that we need to compile on a glibc 2.12 system and deploy the binaries on systems with newer glibc?
This may be related to https://sourceware.org/bugzilla/show_bug.cgi?id=14898
cc @xhochy
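To check whether a particular build can be dlopen'd on a target machine (as opposed to being linked or LD_PRELOAD'd), a small diagnostic can report the loader's error string. This is a sketch: `try_dlopen` is a hypothetical helper, and the path you pass is whatever your package installs.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Try to dlopen `path`. On failure, copy the loader's message into `msg`
 * (when a buffer is provided) and return -1; on success, close the handle
 * again and return 0. */
int try_dlopen(const char *path, char *msg, size_t msg_len) {
    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        const char *err = dlerror();
        if (msg != NULL && msg_len > 0) {
            snprintf(msg, msg_len, "%s", err != NULL ? err : "unknown dlopen error");
        }
        return -1;
    }
    dlclose(handle);
    return 0;
}
```

Since repeatedly dlopen-ing and dlclose-ing libjemalloc.so leaks (see the first comment), this is meant for one-off checks, not for use in a loop. On older glibc versions the program needs to be linked with `-ldl`.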