
dlopen libOpenCL by SONAME to support split prefix distributions #472

Closed
maleadt opened this issue Apr 15, 2024 · 3 comments

@maleadt

maleadt commented Apr 15, 2024

Summary

MKL dynamically loads libOpenCL.so by calling dlopen("libOpenCL.so"). This ignores any previously loaded copy of the OpenCL loader. MKL should instead dlopen the library by its SONAME so that the already-loaded copy is reused.

Version

oneMKL from oneAPI 2024.1.0

Environment

Tested on Linux with MKL installed from Conda.

Observed behavior

At run time, oneMKL dynamically loads libOpenCL.so, for example during execution of sgemm, as this backtrace shows:

#0  ___dlopen (file=0x7fffffffad08 "libOpenCL.so", mode=257) at ./dlfcn/dlopen.c:77
#1  0x00007ffc9b99dbf8 in mkl_cl_load_lib () from /home/sdp/.julia/artifacts/c834e5913bd3a3923a5029184fba0ad0af2d08e6/lib/libmkl_sycl_blas.so.4
#2  0x00007ffc994395fd in oneapi::mkl::gpu::mkl_gpu_map_l0_to_cl(int*, _ze_device_handle_t*, _cl_device_id**, _cl_context**) ()
   from /home/sdp/.julia/artifacts/c834e5913bd3a3923a5029184fba0ad0af2d08e6/lib/libmkl_sycl_blas.so.4
#3  0x00007ffc994364bb in oneapi::mkl::gpu::add_arch_info(sycl::_V1::queue*, oneapi::mkl::gpu::mkl_gpu_device_info_t*) ()
   from /home/sdp/.julia/artifacts/c834e5913bd3a3923a5029184fba0ad0af2d08e6/lib/libmkl_sycl_blas.so.4
#4  0x00007ffc99438ed7 in oneapi::mkl::gpu::get_device_info_with_arch(sycl::_V1::queue*, oneapi::mkl::gpu::mkl_gpu_device_info_t*) ()
   from /home/sdp/.julia/artifacts/c834e5913bd3a3923a5029184fba0ad0af2d08e6/lib/libmkl_sycl_blas.so.4
#5  0x00007ffc9acf1a6a in oneapi::mkl::gpu::mkl_blas_gpu_sgemm_driver_sycl(int*, sycl::_V1::queue*, oneapi::mkl::gpu::blas_arg_usm_t*, mkl_gpu_event_list_t*) ()
   from /home/sdp/.julia/artifacts/c834e5913bd3a3923a5029184fba0ad0af2d08e6/lib/libmkl_sycl_blas.so.4
#6  0x00007ffc9acdc93b in oneapi::mkl::gpu::sgemm_sycl_internal(sycl::_V1::queue*, MKL_LAYOUT, MKL_TRANSPOSE, MKL_TRANSPOSE, long, long, long, oneapi::mkl::value_or_pointer<float>, float const*, long, float const*, long, oneapi::mkl::value_or_pointer<float>, float*, long, oneapi::mkl::blas::compute_mode, std::vector<sycl::_V1::event, std::allocator<sycl::_V1::event> > const&, long, long, long) ()
   from /home/sdp/.julia/artifacts/c834e5913bd3a3923a5029184fba0ad0af2d08e6/lib/libmkl_sycl_blas.so.4
#7  0x00007ffc9acdaf5d in oneapi::mkl::gpu::sgemm_sycl(sycl::_V1::queue*, MKL_LAYOUT, MKL_TRANSPOSE, MKL_TRANSPOSE, long, long, long, oneapi::mkl::value_or_pointer<float>, float const*, long, float const*, long, oneapi::mkl::value_or_pointer<float>, float*, long, oneapi::mkl::blas::compute_mode, std::vector<sycl::_V1::event, std::allocator<sycl::_V1::event> > const&, long, long, long) ()
   from /home/sdp/.julia/artifacts/c834e5913bd3a3923a5029184fba0ad0af2d08e6/lib/libmkl_sycl_blas.so.4
#8  0x00007ffc9b71ad22 in oneapi::mkl::blas::sgemm(sycl::_V1::queue&, MKL_LAYOUT, oneapi::mkl::transpose, oneapi::mkl::transpose, long, long, long, oneapi::mkl::value_or_pointer<float>, float const*, long, float const*, long, oneapi::mkl::value_or_pointer<float>, float*, long, oneapi::mkl::blas::compute_mode, std::vector<sycl::_V1::event, std::allocator<sycl::_V1::event> > const&) ()
   from /home/sdp/.julia/artifacts/c834e5913bd3a3923a5029184fba0ad0af2d08e6/lib/libmkl_sycl_blas.so.4
#9  0x00007ffc9b6a2f2f in oneapi::mkl::blas::column_major::gemm(sycl::_V1::queue&, oneapi::mkl::transpose, oneapi::mkl::transpose, long, long, long, oneapi::mkl::value_or_pointer<float>, float const*, long, float const*, long, oneapi::mkl::value_or_pointer<float>, float*, long, oneapi::mkl::blas::compute_mode, std::vector<sycl::_V1::event, std::allocator<sycl::_V1::event> > const&) ()
   from /home/sdp/.julia/artifacts/c834e5913bd3a3923a5029184fba0ad0af2d08e6/lib/libmkl_sycl_blas.so.4

It does so by simply calling dlopen("libOpenCL.so"). This is problematic because it does not respect any previously loaded copy of the OpenCL loader: as the LD_DEBUG=libs output of my application shows, a second copy of libOpenCL gets loaded:

# libOpenCL.so as loaded by my application
calling init: /vendored/lib/libOpenCL.so

# MKL loading a second copy
find library=libOpenCL.so [0]; searching
calling init: /lib/libOpenCL.so
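To check from inside a process which already-loaded copy (if any) a given library name would resolve to, a small diagnostic along these lines could help. It is a sketch assuming glibc: dlopen with RTLD_NOLOAD only returns a handle when a matching library is already loaded, and dlinfo with RTLD_DI_LINKMAP (a GNU extension) exposes that library's path. The helper name already_loaded_path is hypothetical.

```c
#define _GNU_SOURCE /* for dlinfo() and RTLD_DI_LINKMAP */
#include <assert.h>
#include <dlfcn.h>
#include <link.h>
#include <stddef.h>

/* Hypothetical diagnostic helper: return the path of an already-loaded
 * library that dlopen(name) would reuse, or NULL if none is loaded.
 * RTLD_NOLOAD never maps a new copy, so this only takes a temporary
 * reference on a library that is already present. */
static const char *already_loaded_path(const char *name)
{
    assert(name != NULL);

    void *handle = dlopen(name, RTLD_LAZY | RTLD_NOLOAD);
    if (handle == NULL)
        return NULL; /* nothing matching `name` is loaded yet */

    struct link_map *map = NULL;
    const char *path = NULL;
    if (dlinfo(handle, RTLD_DI_LINKMAP, &map) == 0 && map != NULL)
        path = map->l_name;

    /* Drop the reference taken above. The library stays mapped because
     * whoever loaded it first still holds a reference, so `path` remains
     * valid. */
    dlclose(handle);
    return path;
}
```

Calling it with both "libOpenCL.so.1" and "libOpenCL.so" after the application has loaded its own copy, and comparing the two results, shows which name dlopen would match against that copy.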

Although the OpenCL loader seems to tolerate multiple copies of itself being loaded, that does not seem like a good idea. Worse, there may be no globally discoverable libOpenCL.so at all, in which case MKL fails when it tries to load it:

calling init: /vendored/lib/libOpenCL.so

find library=libOpenCL.so [0]; searching
Intel MKL FATAL ERROR: Error on loading function 'clGetPlatformIDs'.
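MKL's message names only the function it could not resolve, while the LD_DEBUG output suggests the underlying failure is that dlopen("libOpenCL.so") found nothing. As a sketch (the helper name and messages are hypothetical, not MKL's code), a loader wrapper can report the two steps separately using dlerror():

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: load `libname` and resolve `symbol`, reporting which
 * of the two steps failed. Returns the symbol's address or NULL. On success
 * the library handle is intentionally kept open so the address stays valid. */
static void *load_symbol(const char *libname, const char *symbol, int mode)
{
    assert(libname != NULL && symbol != NULL);

    void *handle = dlopen(libname, mode);
    if (handle == NULL) {
        fprintf(stderr, "cannot load %s: %s\n", libname, dlerror());
        return NULL;
    }

    dlerror(); /* clear any stale error before calling dlsym() */
    void *addr = dlsym(handle, symbol);
    const char *err = dlerror();
    if (err != NULL) {
        fprintf(stderr, "cannot resolve %s in %s: %s\n", symbol, libname, err);
        dlclose(handle);
        return NULL;
    }
    return addr;
}
```

With the two steps reported separately, a failure like the one above would presumably surface as a missing library rather than as a missing clGetPlatformIDs.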

Expected behavior

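MKL should load the OpenCL loader by its SONAME, simply by switching to (or first trying) dlopen("libOpenCL.so.1"). Because glibc's dynamic linker matches a requested file name against the SONAMEs of libraries that are already loaded, and the OpenCL ICD loader's SONAME is libOpenCL.so.1, this would let MKL reuse the already-loaded copy of the OpenCL loader. It would avoid both problems: multiple copies of the library being loaded, and the library not being found at all.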

This situation is common in environments that use so-called split prefixes, where libraries are not globally discoverable. In Julia, we make the necessary dependencies available by eagerly dlopening them, which leads to the situation described above. Other projects, such as Nix and Spack, use similar split prefixes, so this issue is not limited to Julia.
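A minimal sketch of the proposed change, assuming glibc: try the SONAME first, so that an already-loaded copy such as the vendored one is reused, and fall back to the unversioned name only if that fails. dlopen_first and open_opencl_loader are hypothetical names, not MKL functions; the mode RTLD_LAZY | RTLD_GLOBAL corresponds to the mode=257 visible in the backtrace.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Try each candidate name in order; return the first handle dlopen()
 * yields, or NULL if none loads (dlerror() then describes the last
 * failure). */
static void *dlopen_first(const char *const names[], size_t count, int mode)
{
    assert(names != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        void *handle = dlopen(names[i], mode);
        if (handle != NULL)
            return handle;
    }
    return NULL;
}

/* Hypothetical replacement for MKL's lookup of the OpenCL ICD loader. */
static void *open_opencl_loader(void)
{
    static const char *const candidates[] = {
        "libOpenCL.so.1", /* SONAME: matches a copy that is already loaded */
        "libOpenCL.so",   /* unversioned symlink, typically from -dev packages */
    };
    /* On glibc, RTLD_LAZY | RTLD_GLOBAL == 257, the mode in the backtrace */
    return dlopen_first(candidates, sizeof candidates / sizeof candidates[0],
                        RTLD_LAZY | RTLD_GLOBAL);
}
```

Keeping "libOpenCL.so" as a fallback preserves the current behavior on systems where only the unversioned name resolves, which is the "or first trying" variant suggested above.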

@mkrainiuk
Contributor

Thank you for root-causing the issue! It looks like the issue is in the Intel oneMKL product, so I have reported the problem to the product team. It should be fixed in an upcoming release.
Since the problem is not related to the open-source oneMKL interfaces project, may I close this issue? I can post an update here when an Intel oneMKL release includes the fix.

@maleadt
Author

maleadt commented Apr 18, 2024

> Since the problem is not related to the open-source oneMKL interfaces project, may I close this issue? I can post an update here when an Intel oneMKL release includes the fix.

Ah OK, I didn't realize this tracker is only for the interfaces. Yes, this issue can be closed then.

Where should issues against oneMKL itself be filed, then? #473 would probably be better suited there as well.

@mkrainiuk
Contributor

> Where should issues against oneMKL itself be filed, then? #473 would probably be better suited there as well.

For Intel oneMKL related problems we usually work with
