Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

static TLS errors from jemalloc 5.0.0 built on CentOS 6 #937

Closed
wesm opened this issue Jun 26, 2017 · 28 comments · Fixed by #1180
Assignees
Milestone

Comments

@wesm
Copy link

@wesm wesm commented Jun 26, 2017

I help maintain packages on conda-forge which has become fairly popular in the Python community. We recently added jemalloc 5.0.0 to the package manager, built on CentOS 6 with devtoolset-2 from this base Docker image (glibc 2.12 I think)

https://github.com/conda-forge/docker-images/blob/master/linux-anvil/Dockerfile

On some platforms, like Ubuntu 14.04 (glibc 2.19), using dlopen on the produced shared library leads to errors like

libjemalloc.so: cannot allocate memory in static TLS block

What is the recommended workaround given that we need to compile on a glibc 2.12 system and deploy the binaries on systems with newer glibc?

this may be related to https://sourceware.org/bugzilla/show_bug.cgi?id=14898

cc @xhochy

@davidtgoldblatt

This comment has been minimized.

Copy link
Member

@davidtgoldblatt davidtgoldblatt commented Jun 27, 2017

Trying to dlopen jemalloc is bad news bears.

Thread local storage in shared libraries will, by default, require a malloc on first use of the thread-local data by a thread. This can trigger some reentrancy scenarios, and even if you don't, it's easy to accidentally mix malloc-from-jemalloc and malloc-from-glibc. We also don't try to reclaim global data structures on shutdown, so dlopen-ing and dlclose-ing libjemalloc.so repeatedly will leak.

To avoid the reentrancy scenarios, we mark our TLS so that its storage gets allocated with new threads. This works fine for LD_PRELOAD'd (or whatever) shared libraries, but has some problems with dlopen'd ones (where, obviously, the loader doesn't know that it will need that space allocated in advance). glibc tries to save some extra space in advance for any dlopen that happens to come along, but not a ton. jemalloc 5.0 increased the amount of TLS it uses, so it plausibly pushed itself over the limit and can no longer fit in glibc's spare capacity.

We could potentially add a config option to move our TLS out of line, but given the pitfalls here I'm pretty nervous. Could you describe a little more why people want to be able to dlopen libjemalloc.so?

@davidtgoldblatt

This comment has been minimized.

Copy link
Member

@davidtgoldblatt davidtgoldblatt commented Jun 27, 2017

(also, in any case, a bugfix 5.0.1 release will be coming out relatively soon that may be worth waiting for before increasing the blast radius of the 5.0 change).

@xhochy

This comment has been minimized.

Copy link

@xhochy xhochy commented Jun 27, 2017

Just to outline our usage of jemalloc in Apache Arrow: We don't explicitly dlopen the library but simply link dynamically against it. It is never unloaded explicitly, usage is exactly like any other shared library. We also explicitly use the jemalloc mallocx API thus LD_PRELOADing would not suffice for our usage.

@wesm

This comment has been minimized.

Copy link
Author

@wesm wesm commented Jun 27, 2017

@davidtgoldblatt sorry to be confusing, we have a shared library libarrow.so that is linked to libjemalloc.so, and the TLS error occurs in the dynamic loader

@davidtgoldblatt

This comment has been minimized.

Copy link
Member

@davidtgoldblatt davidtgoldblatt commented Jun 27, 2017

Could you deploy a jemalloc built with custom config options? We'd want to support this by moving the TLS to be backed by the system malloc, which we'd have to have a config flag for.

In fact, I think ideally we'd only let someone do this when a custom prefix is turned on, to flash a big neon warning sign saying "malloc here does not mean malloc elsewhere". Would that badly break something for you?

@wesm

This comment has been minimized.

Copy link
Author

@wesm wesm commented Jun 27, 2017

Our usage of jemalloc is fairly limited (https://github.com/apache/arrow/blob/master/cpp/src/arrow/memory_pool.cc#L54), so as long as we can

  • avoid the glibc incompatibility
  • not be too difficult for other users of libjemalloc.so to link to

then I think we should be OK.

@davidtgoldblatt

This comment has been minimized.

Copy link
Member

@davidtgoldblatt davidtgoldblatt commented Jun 27, 2017

Why not just always delegate to the platform allocator (e.g. posix_memalign)? If all you're getting from linking against jemalloc is a perf boost, you can LD_PRELOAD it instead.

The two bullet points you mention are in some tension, because people linking against libjemalloc.so may (quite reasonably) expect to be able to free() memory they mallocx(), and will hit a nasty surprise.

@xhochy

This comment has been minimized.

Copy link

@xhochy xhochy commented Jun 28, 2017

We explicitly call rallocx to get 64byte-aligned memory reallocations. This is not available in the POSIX API and thus we see a great benefit of directly using the jemalloc API instead of simply the POSIX one.

@davidtgoldblatt

This comment has been minimized.

Copy link
Member

@davidtgoldblatt davidtgoldblatt commented Jun 28, 2017

Would the strategy of introduce a special configure flag that moves jemalloc's TLS to be allocated by glibc malloc, but requiring this to be used only with --with-jemalloc-prefix work?

This wouldn't let users use the *allocx functions unprefixed, but, because of the symbol collision issues involved, I think that's more of a feature than a bug.

@xhochy

This comment has been minimized.

Copy link

@xhochy xhochy commented Jun 28, 2017

I think that this should work for us for the moment until we can raise the minimal glibc requirement.

@wesm

This comment has been minimized.

Copy link
Author

@wesm wesm commented Jun 28, 2017

Yes, I think in this case we would use a prefix that is unique to our library and statically link these symbols to avoid conflicts with other applications that may expect to use libjemalloc.so

@KenMacD

This comment has been minimized.

Copy link

@KenMacD KenMacD commented Jul 5, 2017

I'm running in to this same issue with rocksdb on Arch when trying to load that library using python-rocksdb.

@davidtgoldblatt

This comment has been minimized.

Copy link
Member

@davidtgoldblatt davidtgoldblatt commented Jul 11, 2017

I put up a hypothetical change we could make in davidtgoldblatt@dbd61d6 . This would let you switch at build time to a dlsym-able libjemalloc.so by passing a --enable-general-dynamic-tls configure flag.

I don't love this, for a two reasons:

  • I don't know how package maintainers would deal with it (this way always breaks for people who depend on jemalloc without dlsym, and the other way always breaks for people who depend on it with dlsym).
  • I don't think there's a good way of stopping pointers from leaking between the two malloc implementations running in the single process. This will blow up in a confusing way in an end-user's face at some point. Then they'll come angrily yell at you, or, worse, me.

If you're hitting problems with the limited static TLS size, could you try out your build with this fix to ensure you don't hit any issues?

I think if this doesn't work, we'd have to add a config option that lets you go back to storing TLS out of line and taking a pointer indirection hit on every operation. I'd really like to avoid something like that, which will hurt common case performance (since I assume that's the only version of the library that would get installed in that case).

@KenMacD

This comment has been minimized.

Copy link

@KenMacD KenMacD commented Jul 11, 2017

I agree that solution doesn't sound ideal. If I was packaging jemaloc and rocksdb I really wouldn't know what option to use.

Would using -mtls-dialect=gnu2 be likely to work? https://bugs.freedesktop.org/show_bug.cgi?id=35268#c15

@KenMacD

This comment has been minimized.

Copy link

@KenMacD KenMacD commented Jul 11, 2017

I tested -mtls-dialect=gnu2 and removing the __attribute__((tls_model("initial-exec"))) and it worked fine both being loaded dynamically and on start. For reference, here's a readelf --all <lib>> | grep -i tls on each:

Original install:

  L (link order), O (extra OS processing required), G (group), T (TLS),
  TLS            0x000000000006a870 0x000000000026a870 0x000000000026a870
 0x000000000000001e (FLAGS)              STATIC_TLS
00000026f0f8  001d00000007 R_X86_64_JUMP_SLO 0000000000000000 __tls_get_addr@GLIBC_2.3 + 0
    29: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __tls_get_addr@GLIBC_2.3 (9)

With tls-dialect:

  L (link order), O (extra OS processing required), G (group), T (TLS),
  TLS            0x0000000000071870 0x0000000000271870 0x0000000000271870
 0x000000006ffffef6 (TLSDESC_PLT)        0x5990
 0x000000006ffffef7 (TLSDESC_GOT)        0x275ff0
000000276220  000000000024 R_X86_64_TLSDESC                     0
   504: 0000000000000000  6104 TLS     LOCAL  DEFAULT   18 je_tsd_tls

I'm not sure how to test the performance in both situations.

@KenMacD

This comment has been minimized.

Copy link

@KenMacD KenMacD commented Jul 12, 2017

I created a commit (KenMacD@7036e64) for anyone else looking to test this.

@xhochy

This comment has been minimized.

Copy link

@xhochy xhochy commented Jul 16, 2017

I can confirm that this fixes the problems we ( @wesm and me in pyarrow/conda) were seeing. If needed, I can provide a reproducible environment but that is a bit more work.

@wesm

This comment has been minimized.

Copy link
Author

@wesm wesm commented Jul 27, 2017

Is there a known timeline for having this in a released version? We have a user hitting another bug (https://issues.apache.org/jira/browse/ARROW-1282) which is possibly fixed by #802

@davidtgoldblatt

This comment has been minimized.

Copy link
Member

@davidtgoldblatt davidtgoldblatt commented Jul 28, 2017

I'm talking to some compiler people about the gnu2 tls dialect. It seems like it's used almost nowhere (in fact, this thread is now on the first page of Google results for it), and I'm worried about weird codegen bugs.

It'll probably be a while before we cut another release. There's some core feature work we want to get to.

@mchodson

This comment has been minimized.

Copy link

@mchodson mchodson commented Aug 10, 2017

+1

@davidtgoldblatt

This comment has been minimized.

Copy link
Member

@davidtgoldblatt davidtgoldblatt commented Sep 3, 2017

FYI: because of the cluster of issues involving increased TLS usage (this and #1106), we'll add a configure option to move the thread-local allocation cache (the largest consumer of TLS space in jemalloc by far) out of thread-local data and into an internal allocation. This will cost a few fast-path nanoseconds if you turn it on, but will let this static TLS space hack continue to work for a while longer.

@davidtgoldblatt

This comment has been minimized.

Copy link
Member

@davidtgoldblatt davidtgoldblatt commented Sep 3, 2017

Timeline for that is probably a few weeks, though.

@joshlf

This comment has been minimized.

Copy link

@joshlf joshlf commented Sep 8, 2017

@davidtgoldblatt

To avoid the reentrancy scenarios, we mark our TLS so that its storage gets allocated with new threads.

Can you point me to where you do that? I'm trying to take a pointer from you guys on this issue, and I've been unable to figure out how you accomplish that. Thanks!

@davidtgoldblatt

This comment has been minimized.

Copy link
Member

@davidtgoldblatt davidtgoldblatt commented Sep 11, 2017

@joshlf, the tsd struct is defined at

#ifdef JEMALLOC_MALLOC_THREAD_CLEANUP
, with the tls model set in the configure script
AC_DEFINE([JEMALLOC_TLS_MODEL],
.

Basically, you just need to set the TLS model to be "initial-exec" (or "gnu2" if you feel more confident about the implementation issues there than we do). (Not sure how easy it is to do this with rust).

Happy to help more, but let's discuss on the the elfmalloc issue to avoid polluting this thread.

@cfossace

This comment has been minimized.

Copy link

@cfossace cfossace commented Nov 16, 2017

Hey just wanted to bump this issue. I am having some trouble with this because Binary Ninja uses jemalloc. I wrote a C extension module that I compiled with gcc and everything ran in my python interpreter but would segfault in the binja interpreter. I thought recompiling my C extensions with jemalloc would help me troubleshoot the segfault but now I get the error:

ImportError: /usr/local/lib/libjemalloc.so.2: cannot allocate memory in static TLS block

in response to simply importing the custom module in python
(import modulename )

For reference, it is being imported from a modulename.so file in the same folder

I have modified Python.h for python 2.7, replacing stdlib with jemalloc

I've been trying to implement some of the suggested fixes above, so far no dice but I'll keep working on it for another day or two.

@maxnbk

This comment has been minimized.

Copy link

@maxnbk maxnbk commented Dec 5, 2017

KenMacD's solution above worked for me, as applied to tag 5.0.1 and appears fine in our testing environment. (Nothing performance related but has enabled me not to LD_PRELOAD where I didn't want to, and LD_PRELOAD was causing it's own issues in my circumstance)
Cheers.

@nehaljwani

This comment has been minimized.

Copy link
Contributor

@nehaljwani nehaljwani commented Jan 1, 2018

I faced the same issue where mxnet does a dlopen() on libmxnet.so, which links to libjemalloc.so. Can confirm that @KenMacD 's patch worked for me.

@leezu

This comment has been minimized.

Copy link

@leezu leezu commented Jan 22, 2018

@davidtgoldblatt do you have an updated timeline for adding the configure option to move the thread-local allocation cache (the largest consumer of TLS space in jemalloc by far) out of thread-local data and into an internal allocation?

@interwq interwq referenced this issue Apr 12, 2018
@interwq interwq added this to the 5.1.0 milestone Apr 12, 2018
@yorickvP yorickvP referenced this issue Apr 16, 2018
2 of 8 tasks complete
msarahan added a commit to AnacondaRecipes/jemalloc-feedstock that referenced this issue Apr 20, 2018
@ihnorton ihnorton referenced this issue Apr 25, 2018
@veox veox referenced this issue Jan 25, 2019
3 of 3 tasks complete
ns-codereview pushed a commit to couchbase/tlm that referenced this issue Feb 18, 2019
As of jemalloc version 5, more TLS space is required for performance
reasons than may be pre-allocated. This is an issue when dlopening
jemalloc which view engine does. Fix this by recompiling jemalloc with
the --disable-initial-exec-tls flag.

For more information, see the following jemalloc issues.
jemalloc/jemalloc#1237
jemalloc/jemalloc#937

Change-Id: I0b51e43ed1110174f021b741bbec2ef5df500a8a
Reviewed-on: http://review.couchbase.org/104949
Reviewed-by: Dave Rigby <daver@couchbase.com>
Tested-by: Build Bot <build@couchbase.com>
gnzlbg added a commit to gnzlbg/jemallocator that referenced this issue Mar 20, 2019
This allows libraries that link against jemalloc to be dlopened. 

See

jemalloc/jemalloc#937
https://github.com/jemalloc/jemalloc/blob/dev/INSTALL.md

for detailed explanation of the feature.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
You can’t perform that action at this time.