
example fails to execute on amd gpu #1059

Closed
hiaselhans opened this issue Jan 27, 2020 · 12 comments
Labels: enhancement (New feature or request), help wanted (We don't have the ability to look into this at the moment, but contributions are welcome), hip (Issues related to execution on HIP backend)

Comments

@hiaselhans
Contributor

I couldn't get this to run on my AMD GPU.
One thing I noted was that SYCL supports out-of-order queue devices only.
So I wonder: could we provide a compatibility matrix for non-Intel GPU users, and also add a check for CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE to the example?

Below is my stack trace.

malloc(): invalid size (unsorted)
./sycl.sh: line 7: 99250 Aborted             (core dumped) ./sycl
           PID: 99073 (sycl)
           UID: 1000 (simon)
           GID: 985 (users)
        Signal: 6 (ABRT)
     Timestamp: Mon 2020-01-27 09:59:22 CET (1min 54s ago)
  Command Line: ./sycl
    Executable: /home/simon/dev/para/gpufem2/sycl
 Control Group: /user.slice/user-1000.slice/user@1000.service/gnome-shell-wayland.service
          Unit: user@1000.service
     User Unit: gnome-shell-wayland.service
         Slice: user-1000.slice
     Owner UID: 1000 (simon)
       Boot ID: b51b7840a444460387160f00266bd337
    Machine ID: 083de9055791414a8b3ff42ababe9529
      Hostname: simon-thinkpad
       Storage: /var/lib/systemd/coredump/core.sycl.1000.b51b7840a444460387160f00266bd337.99073.1580115562000000000000.lz4
       Message: Process 99073 (sycl) of user 1000 dumped core.
                
                Stack trace of thread 99073:
                #0  0x00007fcb3ea12f25 raise (libc.so.6 + 0x3bf25)
                #1  0x00007fcb3e9fc897 abort (libc.so.6 + 0x25897)
                #2  0x00007fcb3ea56258 __libc_message (libc.so.6 + 0x7f258)
                #3  0x00007fcb3ea5d77a malloc_printerr (libc.so.6 + 0x8677a)
                #4  0x00007fcb3ea606c4 _int_malloc (libc.so.6 + 0x896c4)
                #5  0x00007fcb3ea61fc9 malloc (libc.so.6 + 0x8afc9)
                #6  0x00007fcb3bed45dd n/a (libamdocl-orca64.so + 0x2fc05dd)
                #7  0x00007fcb3b3a1fef n/a (libamdocl-orca64.so + 0x248dfef)
                #8  0x00007fcb3b3a1e21 n/a (libamdocl-orca64.so + 0x248de21)
                #9  0x00007fcb3b384890 n/a (libamdocl-orca64.so + 0x2470890)
                #10 0x00007fcb3b3849c7 n/a (libamdocl-orca64.so + 0x24709c7)
                #11 0x00007fcb3b385600 n/a (libamdocl-orca64.so + 0x2471600)
                #12 0x00007fcb3b36494c n/a (libamdocl-orca64.so + 0x245094c)
                #13 0x00007fcb3b3707e8 n/a (libamdocl-orca64.so + 0x245c7e8)
                #14 0x00007fcb3b336daa n/a (libamdocl-orca64.so + 0x2422daa)
                #15 0x00007fcb3a3e41bb n/a (libamdocl-orca64.so + 0x14d01bb)
                #16 0x00007fcb3a3dfb04 n/a (libamdocl-orca64.so + 0x14cbb04)
                #17 0x00007fcb3a3b0028 aclCompile (libamdocl-orca64.so + 0x149c028)
                #18 0x00007fcb39a06743 n/a (libamdocl-orca64.so + 0xaf2743)
                #19 0x00007fcb39a08d8f n/a (libamdocl-orca64.so + 0xaf4d8f)
                #20 0x00007fcb399f1998 n/a (libamdocl-orca64.so + 0xadd998)
                #21 0x00007fcb399dea52 clLinkProgram (libamdocl-orca64.so + 0xacaa52)
                #22 0x00007fcb3f203c45 n/a (/home/simon/dev/para/gpufem2/build/lib/libpi_opencl.so + 0x4c45)
                #23 0x00007fcb3edf4021 n/a (/home/simon/dev/para/gpufem2/build/lib/libsycl.so + 0x256021)
                #24 0x00007fcb3edf588f n/a (/home/simon/dev/para/gpufem2/build/lib/libsycl.so + 0x25788f)
                #25 0x0000000001f572d0 n/a (n/a + 0x0)
@bader
Contributor

bader commented Jan 27, 2020

The SYCL runtime library supports in-order queues as well.
@hiaselhans, could you please provide a small reproducer for the issue (source code if possible)?

@hiaselhans
Contributor Author

@bader thanks for the fast response.

I am actually using more or less the example.
It's running on Arch Linux with the amdgpu-pro OpenCL drivers.

#include <CL/sycl.hpp>

int main() {

  class GPUSelector : public cl::sycl::device_selector {
  public:
    int operator()(const cl::sycl::device &Device) const override {
      using namespace cl::sycl::info;

      const std::string DeviceName = Device.get_info<device::name>();
      const std::string DeviceVendor = Device.get_info<device::vendor>();

      std::cout << DeviceName << " (" << DeviceVendor << ")" << std::endl;

      int score = Device.is_gpu();
      std::cout << score << std::endl;

      return score;
    }
  };


  // Creating SYCL queue
  GPUSelector gpu_selector;
  cl::sycl::queue Queue(gpu_selector);

  // Creating buffer of 4 ints to be used inside the kernel code
  cl::sycl::buffer<cl::sycl::cl_int, 1> Buffer(4);


  // Size of index space for kernel
  cl::sycl::range<1> NumOfWorkItems{Buffer.get_count()};

  std::cout << "x " << std::endl;
  // Submitting command group(work) to queue
  Queue.submit([&](cl::sycl::handler &cgh) {
    // Getting write only access to the buffer on a device
    auto Accessor = Buffer.get_access<cl::sycl::access::mode::write>(cgh);
    // Executing kernel
    cgh.parallel_for<class FillBuffer>(
        NumOfWorkItems, [=](cl::sycl::id<1> WIid) {
          // Fill buffer with indexes
          Accessor[WIid] = (cl::sycl::cl_int)WIid.get(0);
        });
  });
  std::cout << "x2 " << std::endl;

  // Getting read only access to the buffer on the host.
  // Implicit barrier waiting for queue to complete the work.
  const auto HostAccessor = Buffer.get_access<cl::sycl::access::mode::read>();

  std::cout << "x3 " << std::endl;
  // Check the results
  bool MismatchFound = false;
  for (size_t I = 0; I < Buffer.get_count(); ++I) {
    if (HostAccessor[I] != I) {
      std::cout << "The result is incorrect for element: " << I
                << " , expected: " << I << " , got: " << HostAccessor[I]
                << std::endl;
      MismatchFound = true;
    }
  }

  if (!MismatchFound) {
    std::cout << "The results are correct!" << std::endl;
  }

  return MismatchFound;
}

@bader
Contributor

bader commented Jan 27, 2020

I don't see why this code couldn't be executed on a non-Intel GPU.

                #17 0x00007fcb3a3b0028 aclCompile (libamdocl-orca64.so + 0x149c028)
                #18 0x00007fcb39a06743 n/a (libamdocl-orca64.so + 0xaf2743)
                #19 0x00007fcb39a08d8f n/a (libamdocl-orca64.so + 0xaf4d8f)
                #20 0x00007fcb399f1998 n/a (libamdocl-orca64.so + 0xadd998)
                #21 0x00007fcb399dea52 clLinkProgram (libamdocl-orca64.so + 0xacaa52)

It seems like the OpenCL implementation failed to compile the device code.
I looked at the device code and don't see any problems, so it might be a bug in libamdocl-orca64.so.
I suggest reporting this issue to the team working on this OpenCL implementation.

You can get the device code in LLVM IR or SPIR-V from the source code using these instructions:
https://github.com/intel/llvm/blob/sycl/sycl/doc/SYCLCompilerUserManual.md#sycl-device-code-compilation

@hiaselhans
Contributor Author

Well, thanks for the fast response. I tried to investigate but didn't get any further with the libamdocl...

However, I tried on a different machine with a GTX 1050 card.
That card supports out-of-order queues.
It failed there too. I will try to make a test case to reproduce.

Running on GeForce GTX 1050
terminate called after throwing an instance of 'cl::sycl::runtime_error'
  what():  OpenCL API failed. OpenCL API returns: -34 (CL_INVALID_CONTEXT) -34 (CL_INVALID_CONTEXT)
./sycl.sh: line 7: 22442 Aborted             (core dumped) ./sycl

@bader
Contributor

bader commented Jan 27, 2020

This is expected. Our compiler produces a binary that relies on the OpenCL driver supporting the SPIR-V format of device code. AFAIK, NVIDIA OpenCL doesn't support SPIR-V.
I'm not sure why the return code is -34 (CL_INVALID_CONTEXT).
@romanovvlad, is it possible to improve error reporting for this case?

@hiaselhans
Contributor Author

Ok. Thanks for all the help!

I could contribute some hints to the "getting started" md on how to find out whether one's card is supported, and maybe add some logic to the selector.

clinfo | grep -i spir

Clear instructions on the fact that the driver needs to support SPIR-V might already save some users time.
Better error reporting too, of course...

Anyhow, I'm glad a big name has taken on open standards in GPU computing!
:)

@bader
Contributor

bader commented Jan 28, 2020

We have a few words about supported devices in our FAQ document: https://github.com/intel/llvm/blob/sycl/sycl/doc/FAQ.md#q-what-devices-are-supported-by-intel-sycl-compiler.

If you have any ideas on how we can make this information more accessible, feel free to share them via issues or open a PR.

I could contribute some hints to the "getting started" md on how to find out whether one's card is supported, and maybe add some logic to the selector.

I expected the error message from the default_selector to be a bit clearer in this case.

clinfo | grep -i spir

NOTE: spir here is SPIR 1.2, the old LLVM-based version of the SPIR standard; our implementation uses the new version, SPIR-V. I think the right name for the extension enabling SPIR-V support is cl_khr_il_program.
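To illustrate the distinction with a made-up extension list (the sample string is hypothetical; on a real machine you would pipe actual clinfo output instead of echoing a sample):

```shell
# Hypothetical clinfo extension line; real usage: clinfo | grep -o 'cl_khr_il_program'
sample='Device Extensions   cl_khr_spir cl_khr_il_program'

# Legacy SPIR 1.2 flag -- matching this alone is NOT enough for DPC++:
echo "$sample" | grep -o 'cl_khr_spir'        # prints "cl_khr_spir"
# SPIR-V ingestion extension that DPC++ actually needs:
echo "$sample" | grep -o 'cl_khr_il_program'  # prints "cl_khr_il_program"
```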

@bader
Contributor

bader commented Feb 5, 2020

I looked at the device code and don't see any problems, so it might be a bug in libamdocl-orca64.so.
I suggest reporting this issue to the team working on this OpenCL implementation.

I was recently educated that AMD OpenCL doesn't support SPIR-V, so it looks like AMD GPUs require a different compilation workflow. The one we use for Intel devices requires SPIR-V support.

@bader added the enhancement and help wanted labels on Feb 12, 2020
@jeffhammond
Contributor

@hiaselhans I don't see it in this thread, but if you are not already aware, you should be able to use https://github.com/illuhad/hipSYCL to compile SYCL programs for AMD GPUs that support HIP. Unfortunately, HIP support for AMD GPUs is not as widespread as one would hope (https://github.com/RadeonOpenCompute/ROCm#Hardware-and-Software-Support), but if you are lucky enough to be using the right Linux distro and AMD GPU architecture to get HIP support, hipSYCL will serve you well. If your AMD GPU doesn't support HIP, then I guess you'll have to take it up with the vendor.

Right now, DPC++ is ahead of hipSYCL w.r.t. SYCL 2020 features, so if you want to use e.g. USM, you'll have to wait a bit, but probably not too long. @illuhad might have more to say here.

@illuhad

illuhad commented Aug 16, 2020

@hiaselhans To my knowledge, no AMD OpenCL driver stack supports SPIR-V. You will not be able to use DPC++ on those devices.
As @jeffhammond points out, hipSYCL supports AMD GPUs that are supported by AMD's ROCm compute platform (not amdgpu-pro). This includes in particular gfx9 (Vega 10 and Vega 20) chips as well as gfx8 (Fiji, Polaris), although gfx8 has occasionally been reported to be less stable. Unfortunately, AMD does not yet support Navi (gfx10) for ROCm compute, so if you have a Navi card it will be difficult.
There are hipSYCL packages for Arch Linux in the AUR. Additionally, we also provide binary packages, including the entire ROCm stack, for Arch. See here for more details: https://github.com/illuhad/hipSYCL/blob/develop/doc/installing.md#repositories

SYCL 2020 support in hipSYCL is indeed not yet as far along as in DPC++, but USM and all the other fancy SYCL 2020 features will come sooner rather than later. However, unless you have specific needs such as device memory oversubscription, complex pointer-based data structures, or integration with existing pointer-based code, I would generally recommend sticking with accessors and buffers instead of USM for performance and portability reasons anyway.
From a quick glance I don't see any SYCL 2020 features in your code, so hipSYCL might work well for you. I do however see that you use cl_int. Note that there is zero OpenCL interoperability in hipSYCL, because it does not use OpenCL as a backend.

@hiaselhans
Contributor Author

hiaselhans commented Aug 17, 2020

Yes, sorry, I forgot this issue was still open.
I learned that without ROCm there is no OpenCL >= 2 support.

I'm using unnamed kernel functions and have a CUDA card now...

@illuhad

illuhad commented Aug 18, 2020

I learned that without Rocm there is no OpenCl >= 2 support.

Even with ROCm you don't get SPIR-V.

I'm using unnamed kernel functions and have a CUDA card now...

Current hipSYCL supports unnamed kernel lambdas when using Clang >= 10.

@bader added the hip (Issues related to execution on HIP backend) label on Aug 4, 2021