Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MKL causes FFTW to segfault if loaded first #4054

Merged
merged 7 commits into from
Jun 15, 2022

Conversation

ahnitz
Copy link
Member

@ahnitz ahnitz commented Jun 14, 2022

This changes FFTW to use RTLD_DEEPBIND when loading the shared libraries as opposed to RTLD_GLOBAL. The latter is ineffective if MKL is first loaded (which can happen due to innocuous reasons like someone loading numpy before pycbc) because the symbols will still point to the MKL version. DEEPBIND enforces that the symbol lookup first goes to the loaded library rather than the global table so we can ensure FFTW functions are actually called when they are intended.

@ahnitz
Copy link
Member Author

ahnitz commented Jun 14, 2022

This fixes the test case segfault reported in #4052

# library rather than first going to the global symbol table
# This should be used when multiple libraries may export incompatible
# symbols
RTLD_DEEPBIND = 0x00008
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ahnitz I don't know if you saw my comments in slack, but is is easy for you to test rather than hard-coding this, to instead do:

import os
RTLD_DEEPBIND = os.RTLD_DEEPBIND

Is that a hard test for you to do in short order, or do you otherwise prefer not to do that? I guess we really need to wrap that in a test that the imported os has an RTLD_DEEPBIND attribute.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer using this (thanks Josh!), but it seems like mac does not have this. If we are right that this flag is not needed in mac anyway, then perhaps just wrap this in a "is linux" block and use default flags for mac.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the least it should probably really be:

import os
if hasattr(os, RTLD_DEEPBIND):
    RTLD_DEEPBIND = os.RTLD_DEEPBIND
else
    RTLD_DEEPBIND = 0x00008 # Or maybe we set it to RTLD_GLOBAL if that flag
                   # just doesn't exist on a given platform?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@josh-willis That's my fault, yeah, we should use the os provided value. I just saw that ctypes didn't have it assigned and jumped to the assumption that it wasn't.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are the same value, so yes, in testing they both work for linux at least.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@josh-willis Ok, looks like that this doesn't work on macos unfortunately as the variable doesn't exist on that platform. Darn.

@@ -30,8 +30,8 @@
# We need to construct them directly with CDLL so
# we can give the RTLD_GLOBAL mode, which we must do
# in order to use the threaded libraries as well.
double_lib = get_ctypes_library('fftw3',['fftw3'],mode=ctypes.RTLD_GLOBAL)
float_lib = get_ctypes_library('fftw3f',['fftw3f'],mode=ctypes.RTLD_GLOBAL)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the mode be set deeper, ie. in the get_ctypes_library function itself, so that it is used all the time when using ctypes??

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... Or make this flag the default??

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should check, but I think this would make sense. Certainly if we are putting logic to decide whether or not a given installation has (or should use) RTLD_DEEPCOPY versus RTLD_GLOBAL, then I think it makes sense to do that in only one place. However I think I prefer the route of making it a default, so that we retain the ability to do something different on a library by library basis, in case we find we need to do something different elsewhere.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@josh-willis @spxiwh I'm also OK with making DEEPBIND the 'default' assuming this works on all the platforms. Reading its documentation, it certainly sounds like what we would have wanted the default to be, at least in the common scenario where we are essentially treating each shared library as an isolated blob to call.

@ahnitz
Copy link
Member Author

ahnitz commented Jun 14, 2022

@josh-willis @spxiwh If there aren't any objections, I think we go with the current PR, where we both set the defaul to DEEPBIND, but also explicitly set so in the fftw package so we don't accidentally backslide on that.

@spxiwh
Copy link
Contributor

spxiwh commented Jun 14, 2022

I'm happy to have this be the default and be set in fftw.py.

But I think we do need to use os. RTLD_DEEPBIND. If this isn't defined in mac it implies that this would do nothing in mac anyway, and then it's not clear what setting this would do on macs at all.

I would prefer not changing the previous behaviour at all on mac, and using os. RTLD_DEEPBIND on linux.

I tried running the initial test at the top of #4052 on my mac and it ran fine anyway!!

@ahnitz
Copy link
Member Author

ahnitz commented Jun 14, 2022

@spxiwh Is the current change OK? I have it read from os.RTLD_DEEPBIND, but if not defined, it will equivalent to an unset flag. I'd like to keep the name though for use so the intention is clear in other modules when it is used, even if it essentially becomes a NOOP on macos

#
# This is only defined for linux systems, on macosx, the default behavior
# is similar
RTLD_DEEPBIND = 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the same as previous behaviour (on mac). Before we would call ctypes.CDLL with mode not set, and the default is then ctypes.DEFAULT_MODE which is 4 on mac, not 0 as it is on linux.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is set to 4, that would be RTLD_LOCAL, but we were explicitly setting it to RTLD_GLOBAL before.
https://github.com/realthunder/mac-headers/blob/master/usr/include/dlfcn.h

So should I set it to RTLD_GLOBAL for MacOSX, that would replicate the existing input we gave in FFTW, but I'm not sure if that's a sane default or not in MacOSX

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gosh darn, it looks like the bit definitions aren't even compatible, for comparison with glibc https://code.woboq.org/userspace/glibc/bits/dlfcn.h.html

# is similar
RTLD_DEEPBIND = 0
if hasattr(os, 'RTLD_DEEPBIND'):
RTLD_DEEPBIND = os.RTLD_DEEPBIND

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so maybe taking @spxiwh 's comment into account, what we want here is:

if hasattr(os, 'RTLD_DEEPBIND'):
    DEFAULT_RTLD_MODE = os.RTLD_DEEPBIND
else:
   DEFAULT_RTLD_MODE = ctypes.DEFAULT_MODE

and then change below to:

def get_ctypes_library(libname, packages, mode=DEFAULT_RTLD_MODE)

which then also allows override in individual libraries. Or would you rather export something like LIBUTILS_RTLD_DEEPCOPY and have that the default, so that we are then also explicitly setting that in fftw rather than relying on the default? Although it is arguably also confusing to have something that implies you're doing DEEPCOPY but in fact on some os's it is a noop.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@josh-willis Yeah, either way, non-perfect, but I think I'll opt for just setting up the default as you've proposed and fftw won't set the mode at all.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM assuming all unit tests pass. And I agree we should do a release after this.

@ahnitz
Copy link
Member Author

ahnitz commented Jun 14, 2022

@josh-willis Do you recall why we were setting the mode at all in FFTW to RTLD_GLOBAL? That's the ony niggle I'm not sure about.

@ahnitz
Copy link
Member Author

ahnitz commented Jun 14, 2022

Test failures atm are due to git.ligo.org being down.

@josh-willis
Copy link
Contributor

@josh-willis Do you recall why we were setting the mode at all in FFTW to RTLD_GLOBAL? That's the ony niggle I'm not sure about.

I'm afraid I don't. The things I would want to check are that (for instance) we can load wisdom from a file and then that wisdom gets used, since that's a variable internal inside of FFTW that we can't inspect. I may have just added that "to be sure."

Also, reading over the ctypes docs, I see this:


ctypes.RTLD_GLOBAL

    Flag to use as mode parameter. On platforms where this flag is not available, it is defined as the integer zero.

ctypes.RTLD_LOCAL

    Flag to use as mode parameter. On platforms where this is not available, it is the same as RTLD_GLOBAL.

ctypes.DEFAULT_MODE

    The default mode which is used to load shared libraries. On OSX 10.3, this is RTLD_GLOBAL, otherwise it is the same as RTLD_LOCAL.

So before, when in FFTW we were explicitly passing RTLD_GLOBAL, on new enough macs, we may in fact have been passing zero, since we were not opening it with the default for that OS.

@josh-willis
Copy link
Contributor

@josh-willis Do you recall why we were setting the mode at all in FFTW to RTLD_GLOBAL? That's the ony niggle I'm not sure about.

I'm afraid I don't. The things I would want to check are that (for instance) we can load wisdom from a file and then that wisdom gets used, since that's a variable internal inside of FFTW that we can't inspect. I may have just added that "to be sure."

Well I should read my own code comments, which say:

 
#We need to construct them directly with CDLL so
# we can give the RTLD_GLOBAL mode, which we must do
# in order to use the threaded libraries as well.

so I think the issue was that if we use a multi-threaded FFT, then FFTW needs to link against additional FFTW libraries that contain the threading support (and you have to choose which!), and we need these to all somehow live in the same "FFTW" namespace in the module.

So we definitely need to check that threading still works correctly after this change. I think it should, from the description of the RTLD modes in the docs, and also our test suite should exercise that.

One other thing: do we also want to test building and running with this fix under conda? How confident are we that it will work there? Mostly because I know that conda relies heavily on using RPATH in the libraries it builds to tie them into the environment. I think that if DEEPCOPY behaves as documented then this shouldn't matter, but still...

@ahnitz
Copy link
Member Author

ahnitz commented Jun 14, 2022

@josh-willis I tested this fixed the initial issue reported in #4052 in conda.

@josh-willis @spxiwh Is there anything specific we need to run that isn't covered in the unittests, or should we merge this now?

I think after we merge we should also do a release.

# is similar
RTLD_DEEPBIND = 0
if hasattr(os, 'RTLD_DEEPBIND'):
RTLD_DEEPBIND = os.RTLD_DEEPBIND

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM assuming all unit tests pass. And I agree we should do a release after this.

@josh-willis
Copy link
Contributor

As none of the codebeat issues even seem related to these changes, I am fine with merging this whenever you are ready @ahnitz . I'll leave that to you, and also to wait and see if @spxiwh has additional feedback.

Copy link
Contributor

@spxiwh spxiwh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also looks good to me. For anyone referring to this PR in the future, we should be clear that this failure won't always happen if you have MKL and FFTW installed on conda. For e.g. our basic tests use conda, and install both MKL and FFTW, and work fine (though they may use pip install to bring in at least MKL). The IGWN conda environments also seem to work fine (and I assume there conda is used to bring in MKL). I think the common factor is some form of accelerated libraries (BLAS??, numpy??) that are brought in when you want MKL for more than just doing FFTs.

Anyway, happy to have this merged and let's cut a new release.

@ahnitz ahnitz merged commit 047fb00 into gwastro:master Jun 15, 2022
connor-mcisaac pushed a commit to connor-mcisaac/pycbc that referenced this pull request Oct 12, 2022
* update fftw

* use OS given RTLD_DEEPBIND

* set default

* point to libutils

* update

* update

* fix
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants