Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL][CUDA] sycl::atomic_ref usage on CUDA backend produces linking error #5647

Closed
krasznaa opened this issue Feb 23, 2022 · 8 comments
Closed
Labels
bug Something isn't working compiler Compiler related issue cuda CUDA back-end

Comments

@krasznaa
Copy link
Contributor

Describe the bug

In the past I've made use of sycl::atomic for atomic operations in our code, which used to behave well on all backends that I tried. (Including the CUDA and HIP ones.) Now that the latest nightlies warn about sycl::atomic being deprecated, I tried to teach our code to use sycl::atomic_ref instead when it is available.

But when I do that, building the code for an NVIDIA backend fails like:

ptxas fatal   : Unresolved extern function '_Z18__spirv_AtomicIAddPiN5__spv5Scope4FlagENS0_19MemorySemanticsMask4FlagEi'
llvm-foreach: 
clang-15: error: ptxas command failed with exit code 255 (use -v to see invocation)
clang version 15.0.0 (https://github.com/intel/llvm.git 64e92cbc41d1bcde9c728798e4e1fed9e3fab253)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/krasznaa/software/intel/clang/nightly-20220217/x86_64-ubuntu2004-gcc9-opt/bin
clang-15: note: diagnostic msg: Error generating preprocessed source(s).

To Reproduce

Take the following example:

#include <CL/sycl.hpp>
#include <cassert>
#include <iostream>

//#define USE_ATOMIC_REF

int main() {

  // Set up the queue.
  sycl::queue queue;
  std::cout << "Running on device: " << queue.get_device().get_info<sycl::info::device::name>() << std::endl;

  // Set up the array.
  const std::size_t ARRAY_SIZE = 100;
  int* mem = static_cast<int*>(sycl::malloc_shared(ARRAY_SIZE * sizeof(int), queue));
  for (std::size_t i = 0; i < ARRAY_SIZE; ++i) {
    mem[i] = 0;
  }

  // Modify the array atomically on a device.
  queue.submit([mem, ARRAY_SIZE](sycl::handler& h) {
		 h.parallel_for<class atomic_test>(sycl::range<1>(ARRAY_SIZE),
						   [mem](sycl::id<1> id) {
#ifdef USE_ATOMIC_REF
						     sycl::atomic_ref<int, sycl::memory_order::relaxed,
								      sycl::memory_scope::device> aref(mem[id]);
						     aref.fetch_add(1);
#else
						     sycl::atomic_fetch_add<int>(sycl::atomic<int>(sycl::global_ptr<int>(mem + id)), 1);
#endif
						   });
	       }).wait_and_throw();

  // Check the array's payload.
  for (std::size_t i = 0; i < ARRAY_SIZE; ++i) {
    assert(mem[i] == 1);
  }

  // Finish up.
  std::cout << "All OK!" << std::endl;
  sycl::free(mem, queue);
  return 0;
}

In this exact form (without USE_ATOMIC_REF being defined) it builds and runs correctly, albeit with a good number of warnings.

[bash][Legolas]:sycl > clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda -g atomic_ref_error.cpp 
atomic_ref_error.cpp:30:46: warning: 'atomic<int, sycl::access::address_space::global_space>' is deprecated: sycl::atomic is deprecated since SYCL 2020 [-Wdeprecated-declarations]
                                                     sycl::atomic_fetch_add<int>(sycl::atomic<int>(sycl::global_ptr<int>(mem + id)), 1);
                                                                                       ^
/home/krasznaa/software/intel/clang/nightly-20220217/x86_64-ubuntu2004-gcc9-opt/bin/../include/sycl/CL/sycl/atomic.hpp:171:7: note: 'atomic<int, sycl::access::address_space::global_space>' has been explicitly marked deprecated here
class __SYCL2020_DEPRECATED(
      ^
/home/krasznaa/software/intel/clang/nightly-20220217/x86_64-ubuntu2004-gcc9-opt/bin/../include/sycl/CL/sycl/detail/defines_elementary.hpp:54:40: note: expanded from macro '__SYCL2020_DEPRECATED'
#define __SYCL2020_DEPRECATED(message) __SYCL_DEPRECATED(message)
                                       ^
/home/krasznaa/software/intel/clang/nightly-20220217/x86_64-ubuntu2004-gcc9-opt/bin/../include/sycl/CL/sycl/detail/defines_elementary.hpp:45:38: note: expanded from macro '__SYCL_DEPRECATED'
#define __SYCL_DEPRECATED(message) [[deprecated(message)]]
                                     ^
warning: linking module '/home/krasznaa/software/intel/clang/nightly-20220217/x86_64-ubuntu2004-gcc9-opt/lib/clang/15.0.0/../../clc/remangled-l64-signed_char.libspirv-nvptx64--nvidiacl.bc': Linking two modules of different target triples: '/home/krasznaa/software/intel/clang/nightly-20220217/x86_64-ubuntu2004-gcc9-opt/lib/clang/15.0.0/../../clc/remangled-l64-signed_char.libspirv-nvptx64--nvidiacl.bc' is 'nvptx64-unknown-nvidiacl' whereas 'atomic_ref_error.cpp' is 'nvptx64-nvidia-cuda'
 [-Wlinker-warnings]
2 warnings generated.
atomic_ref_error.cpp:30:46: warning: 'atomic<int, sycl::access::address_space::global_space>' is deprecated: sycl::atomic is deprecated since SYCL 2020 [-Wdeprecated-declarations]
                                                     sycl::atomic_fetch_add<int>(sycl::atomic<int>(sycl::global_ptr<int>(mem + id)), 1);
                                                                                       ^
/home/krasznaa/software/intel/clang/nightly-20220217/x86_64-ubuntu2004-gcc9-opt/bin/../include/sycl/CL/sycl/atomic.hpp:171:7: note: 'atomic<int, sycl::access::address_space::global_space>' has been explicitly marked deprecated here
class __SYCL2020_DEPRECATED(
      ^
/home/krasznaa/software/intel/clang/nightly-20220217/x86_64-ubuntu2004-gcc9-opt/bin/../include/sycl/CL/sycl/detail/defines_elementary.hpp:54:40: note: expanded from macro '__SYCL2020_DEPRECATED'
#define __SYCL2020_DEPRECATED(message) __SYCL_DEPRECATED(message)
                                       ^
/home/krasznaa/software/intel/clang/nightly-20220217/x86_64-ubuntu2004-gcc9-opt/bin/../include/sycl/CL/sycl/detail/defines_elementary.hpp:45:38: note: expanded from macro '__SYCL_DEPRECATED'
#define __SYCL_DEPRECATED(message) [[deprecated(message)]]
                                     ^
1 warning generated.
[bash][Legolas]:sycl > SYCL_DEVICE_FILTER=CUDA ./a.out 
Running on device: NVIDIA GeForce RTX 3080
All OK!
[bash][Legolas]:sycl >

But if I un-comment the line defining USE_ATOMIC_REF, the build fails with:

[bash][Legolas]:sycl > clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda -g atomic_ref_error.cpp 
warning: linking module '/home/krasznaa/software/intel/clang/nightly-20220217/x86_64-ubuntu2004-gcc9-opt/lib/clang/15.0.0/../../clc/remangled-l64-signed_char.libspirv-nvptx64--nvidiacl.bc': Linking two modules of different target triples: '/home/krasznaa/software/intel/clang/nightly-20220217/x86_64-ubuntu2004-gcc9-opt/lib/clang/15.0.0/../../clc/remangled-l64-signed_char.libspirv-nvptx64--nvidiacl.bc' is 'nvptx64-unknown-nvidiacl' whereas 'atomic_ref_error.cpp' is 'nvptx64-nvidia-cuda'
 [-Wlinker-warnings]
1 warning generated.
ptxas fatal   : Unresolved extern function '_Z18__spirv_AtomicIAddPiN5__spv5Scope4FlagENS0_19MemorySemanticsMask4FlagEi'
llvm-foreach: 
clang-15: error: ptxas command failed with exit code 255 (use -v to see invocation)
clang version 15.0.0 (https://github.com/intel/llvm.git 64e92cbc41d1bcde9c728798e4e1fed9e3fab253)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/krasznaa/software/intel/clang/nightly-20220217/x86_64-ubuntu2004-gcc9-opt/bin
clang-15: note: diagnostic msg: Error generating preprocessed source(s).
[bash][Legolas]:sycl >

I tried a few different arguments for the memory order and memory scope template arguments, but that didn't seem to make much of a difference.

Environment (please complete the following information)

  • OS: Ubuntu 20.04
  • Target device and vendor: NVIDIA GPU
  • DPC++ version:
[bash][Legolas]:sycl > clang++ -v
clang version 15.0.0 (https://github.com/intel/llvm.git 64e92cbc41d1bcde9c728798e4e1fed9e3fab253)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/krasznaa/software/intel/clang/nightly-20220217/x86_64-ubuntu2004-gcc9-opt/bin
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/7
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/7.5.0
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/8
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/9
Selected GCC installation: /usr/lib/gcc/x86_64-linux-gnu/9
Candidate multilib: .;@m64
Candidate multilib: 32;@m32
Candidate multilib: x32;@mx32
Selected multilib: .;@m64
Found CUDA installation: /home/krasznaa/software/cuda/11.5.2/x86_64-ubuntu2004, version 11.5
Found HIP installation: /opt/rocm, version 4.2.21155-37cb3a34
[bash][Legolas]:sycl >
  • Dependencies version: N/A

Pinging @fwyzard and @ivorobts.

@Michoumichmich
Copy link
Contributor

I had the same error previously. I believe it's because the default address space is generic. Using sycl::atomic_ref<int, sycl::memory_order::relaxed, sycl::memory_scope::device, sycl::access::address_space::global_space> aref(mem[id]); works with your code (on CUDA).

@Michoumichmich
Copy link
Contributor

Michoumichmich commented Feb 23, 2022

Yes it generic:

access::address_space AddressSpace =
and it's not supported on CUDA: #5215

@AerialMantis AerialMantis added cuda CUDA back-end compiler Compiler related issue labels Feb 23, 2022
@krasznaa
Copy link
Contributor Author

Thanks a lot! That indeed works. I'll have to check tomorrow if that combination of template parameters will work in our actual code under all circumstances, but it smells like it should.

@Michoumichmich
Copy link
Contributor

Michoumichmich commented Feb 23, 2022

@krasznaa Glad it helped :) You might still need to adapt the address_space (as global_space was just a suggestion, of course)

@zhongMou-lilSister
Copy link

zhongMou-lilSister commented Jun 24, 2022

I wrote

              atomic_ref<float, 
                memory_order::relaxed, 
                memory_scope::device,
                access::address_space::global_space> ao (shRow[colW - i * COL_BLK]);
              ao.fetch_add(valY * valW);

However, it gives this error

fatal error: error in backend: Cannot cast between two non-generic address spaces
llvm-foreach: 
clang-15: error: clang frontend command failed with exit code 70 (use -v to see invocation)
clang version 15.0.0 (https://github.com/intel/llvm a258a552dc28a10dda618aeb6da1057797bc18f8)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/name/sycl_workspace/llvm/build/bin
clang-15: note: diagnostic msg: Error generating preprocessed source(s).

@zhongMou-lilSister
Copy link

I wrote

              atomic_ref<float, 
                memory_order::relaxed, 
                memory_scope::device,
                access::address_space::global_space> ao (shRow[colW - i * COL_BLK]);
              ao.fetch_add(valY * valW);

However, it gives this error

fatal error: error in backend: Cannot cast between two non-generic address spaces
llvm-foreach: 
clang-15: error: clang frontend command failed with exit code 70 (use -v to see invocation)
clang version 15.0.0 (https://github.com/intel/llvm a258a552dc28a10dda618aeb6da1057797bc18f8)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/name/sycl_workspace/llvm/build/bin
clang-15: note: diagnostic msg: Error generating preprocessed source(s).

When I tried atomic<float>(global_ptr<float>(&shRow[colW - i * COL_BLK])).fetch_add(valY * valW);, it says float aren't available yet. error: static_assert failed due to requirement '!std::is_same<float, float>::value' "SYCL atomic function not available for float type", and it seems that this "atomic().fetch_add()" function is somewhat deprecated.

Thanks in advance!

@npmiller
Copy link
Contributor

This has been fixed in the CUDA backend in: #7391

@krasznaa
Copy link
Contributor Author

Let me close this then. It has not affected out code in quite some time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working compiler Compiler related issue cuda CUDA back-end
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants