-
Notifications
You must be signed in to change notification settings - Fork 348
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MKL causes FFTW to segfault if loaded first #4054
Conversation
351d95b
to
828f410
Compare
This fixes the test case segfault reported in #4052 |
pycbc/libutils.py
Outdated
# library rather than first going to the global symbol table | ||
# This should be used when multiple libraries may export incompatible | ||
# symbols | ||
RTLD_DEEPBIND = 0x00008 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ahnitz I don't know if you saw my comments in slack, but is is easy for you to test rather than hard-coding this, to instead do:
import os
RTLD_DEEPBIND = os.RTLD_DEEPBIND
Is that a hard test for you to do in short order, or do you otherwise prefer not to do that? I guess we really need to wrap that in a test that the imported os
has an RTLD_DEEPBIND
attribute.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer using this (thanks Josh!), but it seems like mac does not have this. If we are right that this flag is not needed in mac anyway, then perhaps just wrap this in a "is linux" block and use default flags for mac.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the least it should probably really be:
import os
if hasattr(os, RTLD_DEEPBIND):
RTLD_DEEPBIND = os.RTLD_DEEPBIND
else
RTLD_DEEPBIND = 0x00008 # Or maybe we set it to RTLD_GLOBAL if that flag
# just doesn't exist on a given platform?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@josh-willis That's my fault, yeah, we should use the os provided value. I just saw that ctypes didn't have it assigned and jumped to the assumption that it wasn't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are the same value, so yes, in testing they both work for linux at least.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@josh-willis Ok, looks like that this doesn't work on macos unfortunately as the variable doesn't exist on that platform. Darn.
@@ -30,8 +30,8 @@ | |||
# We need to construct them directly with CDLL so | |||
# we can give the RTLD_GLOBAL mode, which we must do | |||
# in order to use the threaded libraries as well. | |||
double_lib = get_ctypes_library('fftw3',['fftw3'],mode=ctypes.RTLD_GLOBAL) | |||
float_lib = get_ctypes_library('fftw3f',['fftw3f'],mode=ctypes.RTLD_GLOBAL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the mode be set deeper, ie. in the get_ctypes_library
function itself, so that it is used all the time when using ctypes??
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... Or make this flag the default??
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should check, but I think this would make sense. Certainly if we are putting logic to decide whether or not a given installation has (or should use) RTLD_DEEPCOPY
versus RTLD_GLOBAL
, then I think it makes sense to do that in only one place. However I think I prefer the route of making it a default, so that we retain the ability to do something different on a library by library basis, in case we find we need to do something different elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@josh-willis @spxiwh I'm also OK with making DEEPBIND the 'default' assuming this works on all the platforms. Reading its documentation, it certainly sounds like what we would have wanted the default to be, at least in the common scenario where we are essentially treating each shared library as an isolated blob to call.
@josh-willis @spxiwh If there aren't any objections, I think we go with the current PR, where we both set the defaul to DEEPBIND, but also explicitly set so in the fftw package so we don't accidentally backslide on that. |
I'm happy to have this be the default and be set in fftw.py. But I think we do need to use I would prefer not changing the previous behaviour at all on mac, and using I tried running the initial test at the top of #4052 on my mac and it ran fine anyway!! |
@spxiwh Is the current change OK? I have it read from os.RTLD_DEEPBIND, but if not defined, it will equivalent to an unset flag. I'd like to keep the name though for use so the intention is clear in other modules when it is used, even if it essentially becomes a NOOP on macos |
pycbc/libutils.py
Outdated
# | ||
# This is only defined for linux systems, on macosx, the default behavior | ||
# is similar | ||
RTLD_DEEPBIND = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not the same as previous behaviour (on mac). Before we would call ctypes.CDLL
with mode not set, and the default is then ctypes.DEFAULT_MODE
which is 4 on mac, not 0 as it is on linux.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it is set to 4, that would be RTLD_LOCAL, but we were explicitly setting it to RTLD_GLOBAL before.
https://github.com/realthunder/mac-headers/blob/master/usr/include/dlfcn.h
So should I set it to RTLD_GLOBAL for MacOSX, that would replicate the existing input we gave in FFTW, but I'm not sure if that's a sane default or not in MacOSX
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gosh darn, it looks like the bit definitions aren't even compatible, for comparison with glibc https://code.woboq.org/userspace/glibc/bits/dlfcn.h.html
# is similar | ||
RTLD_DEEPBIND = 0 | ||
if hasattr(os, 'RTLD_DEEPBIND'): | ||
RTLD_DEEPBIND = os.RTLD_DEEPBIND | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so maybe taking @spxiwh 's comment into account, what we want here is:
if hasattr(os, 'RTLD_DEEPBIND'):
DEFAULT_RTLD_MODE = os.RTLD_DEEPBIND
else:
DEFAULT_RTLD_MODE = ctypes.DEFAULT_MODE
and then change below to:
def get_ctypes_library(libname, packages, mode=DEFAULT_RTLD_MODE)
which then also allows override in individual libraries. Or would you rather export something like LIBUTILS_RTLD_DEEPCOPY
and have that the default, so that we are then also explicitly setting that in fftw
rather than relying on the default? Although it is arguably also confusing to have something that implies you're doing DEEPCOPY
but in fact on some os's it is a noop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@josh-willis Yeah, either way, non-perfect, but I think I'll opt for just setting up the default as you've proposed and fftw won't set the mode at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM assuming all unit tests pass. And I agree we should do a release after this.
@josh-willis Do you recall why we were setting the mode at all in FFTW to RTLD_GLOBAL? That's the ony niggle I'm not sure about. |
Test failures atm are due to git.ligo.org being down. |
I'm afraid I don't. The things I would want to check are that (for instance) we can load wisdom from a file and then that wisdom gets used, since that's a variable internal inside of FFTW that we can't inspect. I may have just added that "to be sure." Also, reading over the
So before, when in FFTW we were explicitly passing |
Well I should read my own code comments, which say:
so I think the issue was that if we use a multi-threaded FFT, then FFTW needs to link against additional FFTW libraries that contain the threading support (and you have to choose which!), and we need these to all somehow live in the same "FFTW" namespace in the module. So we definitely need to check that threading still works correctly after this change. I think it should, from the description of the RTLD modes in the docs, and also our test suite should exercise that. One other thing: do we also want to test building and running with this fix under conda? How confident are we that it will work there? Mostly because I know that conda relies heavily on using |
@josh-willis I tested this fixed the initial issue reported in #4052 in conda. @josh-willis @spxiwh Is there anything specific we need to run that isn't covered in the unittests, or should we merge this now? I think after we merge we should also do a release. |
# is similar | ||
RTLD_DEEPBIND = 0 | ||
if hasattr(os, 'RTLD_DEEPBIND'): | ||
RTLD_DEEPBIND = os.RTLD_DEEPBIND | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM assuming all unit tests pass. And I agree we should do a release after this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also looks good to me. For anyone referring to this PR in the future, we should be clear that this failure won't always happen if you have MKL and FFTW installed on conda. For e.g. our basic tests use conda, and install both MKL and FFTW, and work fine (though they may use pip install
to bring in at least MKL). The IGWN conda environments also seem to work fine (and I assume there conda is used to bring in MKL). I think the common factor is some form of accelerated libraries (BLAS??, numpy??) that are brought in when you want MKL for more than just doing FFTs.
Anyway, happy to have this merged and let's cut a new release.
* update fftw * use OS given RTLD_DEEPBIND * set default * point to libutils * update * update * fix
This changes FFTW to use RTLD_DEEPBIND when loading the shared libraries as opposed to RTLD_GLOBAL. The latter is ineffective if MKL is first loaded (which can happen due to innocuous reasons like someone loading numpy before pycbc) because the symbols will still point to the MKL version. DEEPBIND enforces that the symbol lookup first goes to the loaded library rather than the global table so we can ensure FFTW functions are actually called when they are intended.