MKL FFT failures #110
I thought this problem looked suspiciously familiar... I had actually mostly fixed this in an earlier commit: But that was done directly to master by accident, as part of other changes, and was then reverted. The fix/cleanup was never migrated to a branch for proper testing and merging. I'll cherry-pick it into a branch now for testing.
With the latest master branch (after merging the rework of MKL FFTs in ee43a97), all C++ and Python unit tests pass on a workstation using the Intel compilers and MKL version 2018-beta. On edison.nersc.gov, using gcc and MKL, the following behavior is seen:
Leaving this issue open to track why this combination (gcc + MKL on edison) fails while a pure Intel build does not.
Testing this on cori.nersc.gov (KNL) with the Intel compilers and MKL also passes all tests, so this seems to be an interaction between MKL and gcc-6.3 on edison. I will test the same combination on a workstation to determine whether it is NERSC-specific.
OK, confirmed: I see the same failures on a Linux workstation with the latest MKL and gcc-6.3. That should make it much easier to debug.
On deeper inspection, this is actually a problem with the persistent cache of FFT plans. In older versions of toast, independent OpenMP threads processed separate pieces of the data, so we needed a different FFT plan for each thread. In the current version of toast, we made a design decision to keep threading at a lower level, so that mid-level operations could be done in Python. So rather than individual threads doing separate FFTs, we rely on the FFT libraries being smart about threading (which for 1D FFTs is not great, but...). The per-process cache of FFT plans should therefore hold only one copy of each plan rather than one per thread. I now have unit tests that use the plan cache and fail as they should. Working on a fix.
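A minimal C++ sketch of a per-process plan cache that keeps a single plan per (length, batch count, direction), shared by all callers. The names `FFTPlan` and `FFTPlanCache` are hypothetical, not toast's actual classes, and `FFTPlan` only records its parameters where a real implementation would hold an MKL descriptor or FFTW plan. The mutex protects the cache map only; it does not serialize execution of a plan.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>
#include <tuple>

// Hypothetical stand-in for a backend plan (MKL descriptor, FFTW plan, ...).
struct FFTPlan {
    std::size_t length;
    std::size_t n_batch;
    bool forward;
};

// One cache per process, holding exactly one plan per key. Threading is left
// to the FFT library instead of giving each OpenMP thread its own plan.
class FFTPlanCache {
public:
    static FFTPlanCache& get() {
        static FFTPlanCache instance;  // single instance per process
        return instance;
    }

    std::shared_ptr<FFTPlan> plan(std::size_t length, std::size_t n_batch, bool forward) {
        assert(length > 0 && n_batch > 0);
        std::lock_guard<std::mutex> lock(mutex_);
        Key key{length, n_batch, forward};
        auto it = plans_.find(key);
        if (it != plans_.end()) {
            return it->second;  // reuse the existing plan
        }
        auto p = std::make_shared<FFTPlan>(FFTPlan{length, n_batch, forward});
        plans_.emplace(key, p);
        return p;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return plans_.size();
    }

    void clear() {
        std::lock_guard<std::mutex> lock(mutex_);
        plans_.clear();
    }

private:
    FFTPlanCache() = default;
    using Key = std::tuple<std::size_t, std::size_t, bool>;
    mutable std::mutex mutex_;
    std::map<Key, std::shared_ptr<FFTPlan>> plans_;
};
```

Requesting the same (length, batch, direction) twice returns the same `shared_ptr`, so the cache never grows per thread.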
OK, I found the real source of the problem. We always link to the single-library version of MKL (libmkl_rt). However, as documented here: https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/657060 this defaults to using the Intel OpenMP library rather than the GNU one. This means that when compiling with gcc and linking to MKL, operations with OMP_NUM_THREADS > 1 give the wrong answer. The solution is to link explicitly to the full set of individual MKL libraries, choosing the correct set depending on whether we are compiling with the Intel compilers or gcc.
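A sketch of choosing the explicit MKL link line by compiler, assuming LP64 integers, dynamic linking, and Intel's classic compilers. Those compilers define `__INTEL_COMPILER` but also `__GNUC__`, so the Intel check has to come first. Treating every other compiler as GNU/libgomp is an assumption, and `mkl_link_libs` / `link_line` are hypothetical helpers; the exact library list should be checked against Intel's link line advisor for the MKL version in use.

```cpp
#include <cassert>
#include <string>
#include <vector>

// OpenMP runtime family the code is compiled against.
enum class Compiler { Intel, GNU };

// Classic Intel compilers also define __GNUC__, so test __INTEL_COMPILER first.
// Assumption: anything that is not classic Intel uses GNU libgomp.
constexpr Compiler detected_compiler() {
#if defined(__INTEL_COMPILER)
    return Compiler::Intel;
#else
    return Compiler::GNU;
#endif
}

// Explicit dynamic link line for LP64 threaded MKL, instead of libmkl_rt.
inline std::vector<std::string> mkl_link_libs(Compiler c) {
    std::vector<std::string> libs = {"-lmkl_intel_lp64"};
    if (c == Compiler::Intel) {
        libs.push_back("-lmkl_intel_thread");  // threading layer for Intel OpenMP
        libs.push_back("-lmkl_core");
        libs.push_back("-liomp5");
    } else {
        libs.push_back("-lmkl_gnu_thread");    // threading layer for GNU OpenMP
        libs.push_back("-lmkl_core");
        libs.push_back("-lgomp");
    }
    for (const char* sys : {"-lpthread", "-lm", "-ldl"}) {
        libs.push_back(sys);
    }
    return libs;
}

// Join the flags into a single string, e.g. for a configure-time test.
inline std::string link_line(const std::vector<std::string>& libs) {
    std::string out;
    for (const auto& lib : libs) {
        if (!out.empty()) out += ' ';
        out += lib;
    }
    return out;
}
```

`link_line(mkl_link_libs(detected_compiler()))` gives the flags for the current build compiler.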
I explored this in more detail and implemented a configure-time test that links to MKL differently depending on the compiler used. This works when linking C++ code (like the toast_test executable used to run the compiled unit tests). However, when we dlopen the libtoast.so library (through ctypes), the loader also loads its chain of dependent libraries and reports that libmkl_gnu_thread has missing symbols (it does; they are defined in libmkl_core). This seems to leave 3 choices:
Static linking (option 2) might introduce problems with cross compilation. Also, getting the linking of libtoast "just right" so that option 1 works could be very machine-, compiler-, and MKL-version-specific. Instead, I think option 3 is the way to go. There would be a toast function "toast::mkl_init()" which would:
The discussion so far in this issue holds true for C++ code linking to MKL. If you use the GNU compilers with -fopenmp and want to link to threaded MKL, you must ensure that the GNU threading interface is used (by calling mkl_set_threading_layer() or setting the corresponding environment variable). Things become more difficult once Anaconda Python is involved. Anaconda (and the Intel Distribution for Python) ship their own libmkl_rt with only the Intel threading interface (no libmkl_gnu_thread.so). Since Python/NumPy does not use OpenMP, this is fine: the Intel threading library is always used. In toast, however, we can have a threaded C++ library built with the GNU compilers and linked to threaded MKL. We can make these play together using the techniques above, but we then dlopen that library from Python (using ctypes). When this happens, the calls from libtoast.so into MKL somehow end up resolving to symbols/functions in Anaconda's libmkl_rt. For compatibility, I believe we will either have to link MKL statically when building toast, OR simply not allow MKL with non-Intel compilers. In the latter case, it is troubling that our runtime calls would use the MKL shipped with the Python packages rather than the (potentially newer) MKL from Intel that we compiled against. For example, the C++ unit tests run from the toast_test executable and the same tests run from the Python toast.test() function would follow different code paths (through different MKL libraries). This point pushes us towards static linking of MKL at compile time.
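A sketch of the runtime approach for pure C++ code: select the MKL threading layer that matches the compiler's OpenMP runtime before the first MKL call. `MKL_THREADING_LAYER` and `mkl_set_threading_layer()` only apply when linking through libmkl_rt. The function name `mkl_select_threading_layer` and the `TOAST_HAVE_MKL` macro are hypothetical, and mapping all non-Intel compilers to the GNU layer is an assumption. This does not help in the Anaconda case, where Python's libmkl_rt has no GNU threading layer to switch to.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

#ifdef TOAST_HAVE_MKL  // hypothetical build macro
#include <mkl.h>
#endif

// Threading layer matching the OpenMP runtime this code was compiled with.
// Assumption: classic Intel compilers -> INTEL, everything else -> GNU.
inline const char* matching_threading_layer() {
#if defined(__INTEL_COMPILER)
    return "INTEL";
#else
    return "GNU";
#endif
}

// Hypothetical init hook; intended to run before any other MKL call.
inline std::string mkl_select_threading_layer() {
    const std::string layer = matching_threading_layer();
    // Environment variable read by libmkl_rt; only expected to take effect
    // if it is set before MKL starts dispatching.
    setenv("MKL_THREADING_LAYER", layer.c_str(), 1);
#ifdef TOAST_HAVE_MKL
    // Explicit call; takes precedence over the environment variable.
    mkl_set_threading_layer(layer == "INTEL" ? MKL_THREADING_INTEL : MKL_THREADING_GNU);
#endif
    return layer;
}
```

Setting the environment variable and also calling the function keeps the behavior the same whether or not the MKL headers are available at build time.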
I have looked into the static linking option, and even after jumping through hoops with libtool it does not really work. I think the only sane path is to use MKL only when we are also using the Intel compiler. Again, the solution for pure C++ code is straightforward (use libmkl_rt with the threading layer set to GNU or Intel, OR link explicitly to the list of MKL libraries). But when threaded C++ code that uses MKL is loaded into a Python process that already has a different MKL library loaded, the C++ code must be compatible with the MKL used by Python. That means it must use Intel threads, which means it must be built with the Intel compiler.
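The resulting policy can be expressed as a small build-time selection. This is a sketch assuming FFTW is always available as the fallback backend; `FFTBackend`, `select_fft_backend`, and `build_fft_backend` are hypothetical names, not toast's actual configure logic.

```cpp
#include <cassert>

enum class FFTBackend { MKL, FFTW };

// Use MKL only when the library is built with the Intel compiler, so that it
// uses Intel threads like the libmkl_rt Python may already have loaded.
// Otherwise fall back to FFTW (assumed to be available).
inline FFTBackend select_fft_backend(bool have_mkl, bool intel_compiler) {
    return (have_mkl && intel_compiler) ? FFTBackend::MKL : FFTBackend::FFTW;
}

// Apply the policy for the compiler building this translation unit.
inline FFTBackend build_fft_backend(bool have_mkl) {
#if defined(__INTEL_COMPILER)
    return select_fft_backend(have_mkl, true);
#else
    return select_fft_backend(have_mkl, false);
#endif
}
```

With gcc, `build_fft_backend(true)` still selects FFTW, which is the behavior argued for above.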
Closed by #114
When building with gcc and linking to modern MKL versions, the MKL unit tests fail, for both single and batched FFTs. This problem is not seen when using FFTW. I will disable MKL FFTs in the meantime.