[Regression] New segmentation fault when using get_kernel_bundle #5618

Closed
Michoumichmich opened this issue Feb 20, 2022 · 0 comments · Fixed by #5624
Labels: bug (Something isn't working), confirmed, runtime (Runtime library related issue)

Michoumichmich commented Feb 20, 2022

Hi,
Since #5496, I get segmentation faults when using sycl::get_kernel_bundle.
The following standalone code works as expected:

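#include <algorithm>  // std::min, std::max
#include <cstdint>    // uint32_t
#include <iostream>   // std::cout
#include <sycl/sycl.hpp>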

template<typename KernelName> static inline size_t max_work_groups_for_kernel(sycl::queue q) {
    size_t max_items = std::max(1U, std::min(4096U, static_cast<uint32_t>(q.get_device().get_info<sycl::info::device::max_work_group_size>())));
#if defined(SYCL_IMPLEMENTATION_INTEL) || defined(SYCL_IMPLEMENTATION_ONEAPI)
    try {
        sycl::kernel_id id = sycl::get_kernel_id<KernelName>();
        auto kernel = sycl::get_kernel_bundle<sycl::bundle_state::executable>(q.get_context()).get_kernel(id);
        max_items = std::min(max_items, kernel.get_info<sycl::info::kernel_device_specific::work_group_size>(q.get_device()));
    } catch (std::exception& e) {
        std::cout << "Couldn't read kernel properties for device: " << q.get_device().get_info<sycl::info::device::name>() << " got exception: " << e.what() << std::endl;
    }
#endif
    return max_items;
}

class my_kernel;

int main() {
    sycl::queue q{};
    q.parallel_for<my_kernel>(sycl::range(100), [=](sycl::item<1> it) { (void) it.get_id(); }).wait();
    std::cout << max_work_groups_for_kernel<my_kernel>(q) << std::endl;
}
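
For readers who want to check the clamping arithmetic in max_work_groups_for_kernel without a SYCL toolchain, here is a minimal sketch of that logic in plain C++. The helper name clamp_work_group_size and its two parameters are hypothetical and not part of SYCL or of the reproducer; they stand in for the values returned by the two get_info queries. Unlike the original, the sketch does not narrow the device limit to uint32_t first.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper, not part of SYCL or of the reproducer above.
// device_max stands in for info::device::max_work_group_size,
// kernel_max for info::kernel_device_specific::work_group_size.
constexpr std::size_t clamp_work_group_size(std::size_t device_max,
                                            std::size_t kernel_max) {
    constexpr std::size_t upper_cap = 4096;
    // Same bounds as the reproducer: at least 1, at most 4096.
    const std::size_t from_device =
        std::max<std::size_t>(1, std::min(upper_cap, device_max));
    // The per-kernel limit can only lower the result further.
    return std::min(from_device, kernel_max);
}
```

With this helper, a device limit of 8192 and a kernel limit of 1024 yields 1024, while a device limit of 8192 and a larger kernel limit is capped at 4096.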

However, when the same code is used in another project (https://github.com/Michoumichmich/MolecularDynamics), it systematically fails. That project builds several backends as shared libraries, all of which depend on SYCL. As discussed in KhronosGroup/SYCL-Docs#209, each "kernel query" is (properly?) compiled in the same translation unit as the kernels it refers to, the shared libraries are compiled with -fvisibility=hidden, and everything worked fine (on the "commercial" version of oneAPI too). Since 4817b3f, using the code above anywhere in main.cpp or in a shared library systematically results in a segfault here:

ProgramManager::getSYCLDeviceImagesWithCompatibleState(

To Reproduce
To build and run the reproducer:

git clone https://github.com/Michoumichmich/MolecularDynamics -b sycl_kernel_bundle_segfault_reproducer
cd MolecularDynamics && mkdir -p build && cd build
CXX=clang++ cmake .. -DSYCL=CPU && cmake --build . --target main
SYCL_DEVICE_FILTER=cpu ./main

It seems to affect at least the CPU and CUDA backends.

Backtrace:

(gdb) bt
#0  0x00007ffff6dfbe3a in cl::sycl::detail::ProgramManager::getSYCLDeviceImagesWithCompatibleState(cl::sycl::context const&, std::vector<cl::sycl::device, std::allocator<cl::sycl::device> > const&, cl::sycl::bundle_state, std::vector<cl::sycl::kernel_id, std::allocator<cl::sycl::kernel_id> > const&) () from /home/michel/sycl_workspace/deploy/lib/libsycl.so.5
#1  0x00007ffff6e00419 in cl::sycl::detail::ProgramManager::getSYCLDeviceImages(cl::sycl::context const&, std::vector<cl::sycl::device, std::allocator<cl::sycl::device> > const&, cl::sycl::bundle_state) ()
   from /home/michel/sycl_workspace/deploy/lib/libsycl.so.5
#2  0x00007ffff6e8e8df in cl::sycl::detail::kernel_bundle_impl::kernel_bundle_impl(cl::sycl::context, std::vector<cl::sycl::device, std::allocator<cl::sycl::device> >, cl::sycl::bundle_state) ()
   from /home/michel/sycl_workspace/deploy/lib/libsycl.so.5
#3  0x00007ffff6e8e6b7 in void __gnu_cxx::new_allocator<cl::sycl::detail::kernel_bundle_impl>::construct<cl::sycl::detail::kernel_bundle_impl, cl::sycl::context const&, std::vector<cl::sycl::device, std::allocator<cl::sycl::device> > const&, cl::sycl::bundle_state&>(cl::sycl::detail::kernel_bundle_impl*, cl::sycl::context const&, std::vector<cl::sycl::device, std::allocator<cl::sycl::device> > const&, cl::sycl::bundle_state&) ()
   from /home/michel/sycl_workspace/deploy/lib/libsycl.so.5
#4  0x00007ffff6e87475 in cl::sycl::detail::get_kernel_bundle_impl(cl::sycl::context const&, std::vector<cl::sycl::device, std::allocator<cl::sycl::device> > const&, cl::sycl::bundle_state) ()
   from /home/michel/sycl_workspace/deploy/lib/libsycl.so.5
#5  0x0000000000407628 in cl::sycl::kernel_bundle<(cl::sycl::bundle_state)2> cl::sycl::get_kernel_bundle<(cl::sycl::bundle_state)2>(cl::sycl::context const&, std::vector<cl::sycl::device, std::allocator<cl::sycl::device> > const&) ()
#6  0x000000000040734e in cl::sycl::kernel_bundle<(cl::sycl::bundle_state)2> cl::sycl::get_kernel_bundle<(cl::sycl::bundle_state)2>(cl::sycl::context const&) ()

Environment:

  • OS: Red Hat Enterprise Linux 8
  • Target device and vendor: all?
  • DPC++ version: beb7277
@Michoumichmich Michoumichmich added the bug Something isn't working label Feb 20, 2022
Michoumichmich added a commit to Michoumichmich/MolecularDynamics that referenced this issue Feb 20, 2022
@Michoumichmich Michoumichmich changed the title [Regression] New segmentation faults when using get_kernel_bundle [Regression] New segmentation fault when using get_kernel_bundle Feb 20, 2022
@alexbatashev alexbatashev added the runtime Runtime library related issue label Feb 20, 2022