Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CUDA device code does not support variadic functions #58410

Open
vchuravy opened this issue Oct 17, 2022 · 9 comments
Open

CUDA device code does not support variadic functions #58410

vchuravy opened this issue Oct 17, 2022 · 9 comments
Assignees
Labels
backend:NVPTX clang:frontend Language frontend issues, e.g. anything involving "Sema" cuda

Comments

@vchuravy
Copy link
Contributor

vchuravy commented Oct 17, 2022

I was looking at atomics support on CUDA: https://godbolt.org/z/3WqPYxEve

#include <cuda/std/atomic>

__global__ void k1(cuda::std::atomic<int>& a) {
    a++;
}

__global__ void k2(cuda::std::atomic<int>* a) {
    a->load(cuda::std::memory_order_seq_cst);
}

// __global__ void k3(cuda::std::atomic<int>* a) {
//     a->load(cuda::std::memory_order_acq_rel);
// }

__global__ void k4(cuda::std::atomic<int>* a) {
    a->load(cuda::std::memory_order_relaxed);
}

int main() {}

fails on Clang with:

In file included from <source>:1:
In file included from /opt/compiler-explorer/cuda/11.3.1/include/cuda/std/atomic:42:
In file included from /opt/compiler-explorer/cuda/11.3.1/include/cuda/std/type_traits:24:
/opt/compiler-explorer/cuda/11.3.1/include/cuda/std/detail/libcxx/include/type_traits:520:12: error: CUDA device code does not support variadic functions
false_type __sfinae_test_impl(...);
           ^
/opt/compiler-explorer/cuda/11.3.1/include/cuda/std/detail/libcxx/include/type_traits:1059:69: error: CUDA device code does not support variadic functions
    template <class _Tp> _LIBCUDACXX_INLINE_VISIBILITY static __two __test(...);
                                                                    ^
/opt/compiler-explorer/cuda/11.3.1/include/cuda/std/detail/libcxx/include/type_traits:1171:5: error: CUDA device code does not support variadic functions
    __any(...);
    ^
/opt/compiler-explorer/cuda/11.3.1/include/cuda/std/detail/libcxx/include/type_traits:1848:16: error: CUDA device code does not support variadic functions
   static void __test(...);
               ^
/opt/compiler-explorer/cuda/11.3.1/include/cuda/std/detail/libcxx/include/type_traits:4332:16: error: CUDA device code does not support variadic functions
  static __nat __try_call(...);
               ^
5 errors generated when compiling for sm_86.
Compiler returned: 1

cc: @wsmoses

@vchuravy vchuravy added cuda clang:frontend Language frontend issues, e.g. anything involving "Sema" backend:NVPTX labels Oct 17, 2022
@llvmbot
Copy link
Collaborator

llvmbot commented Oct 17, 2022

@llvm/issue-subscribers-clang-frontend

@Artem-B
Copy link
Member

Artem-B commented Oct 17, 2022

It's a known issue that can be worked around with -Xclang -fcuda-allow-variadic-functions.
We do not know how to lower varargs on the GPU side (though it may be possible in recent CUDA versions), but in this particular case variadic functions are only used as function declarations used for SFINAE, so it's OK to allow them during parsing.

I guess we may eventually relax the restriction on variadic arguments and make them deferred, so they become an error only if we end up generating code for such function.

@Artem-B Artem-B self-assigned this Oct 17, 2022
@Artem-B
Copy link
Member

Artem-B commented Oct 17, 2022

PTX does not support variadic functions directly, but does allow passing unsized array parameter to a function which can be used to implement variadic functions. https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#variadic-functions

https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#kernel-and-function-directives-func

The last parameter in the parameter list may be a .param array of type .b8 with no size specified. It is used to pass an arbitrary number of parameters to the function packed into a single array object.

When calling a function with such an unsized last argument, the last argument may be omitted from the call instruction if no parameter is passed through it. Accesses to this array parameter must be within the bounds of the array. The result of an access is undefined if no array was passed, or if the access was outside the bounds of the actual array being passed.

@PavelKopyl
Copy link
Contributor

Hello,

Please, check the following patch that adds support of variadic functions to NVPTX backend:
https://reviews.llvm.org/D138531
But this is only half of the story. Clang frontend also needs corresponding support.

@Artem-B
Copy link
Member

Artem-B commented May 16, 2023

Patch to automatically enable -fcuda-allow-variadic-functions : https://reviews.llvm.org/D150718
We still can't actually compile variadic functions that need to access variadic arguments, but this should avoid the original problem in this bug report.

Artem-B added a commit that referenced this issue May 17, 2023
Allow parsing GPU-side variadic functions when we're compiling with CUDA-9 or
newer. We still do not allow accessing variadic arguments.

CUDA-9 was the version which introduced PTX-6.0 which allows implementing
variadic functions, so older versions can't have variadics in principle.

This is required for dealing with headers in recent CUDA versions that rely on
variadic function declarations in some of the templated code in libcu++.
E.g. #58410

Differential Revision: https://reviews.llvm.org/D150718
@Artem-B
Copy link
Member

Artem-B commented May 23, 2023

Clang should be able to compile CUDA atomics now, though it needs _LIBCUDACXX_HAS_CUDA_ATOMIC_IMPL to enable the right code path in CUDA headers: https://godbolt.org/z/f3sj36rq6

@jrhemstad
Copy link

Clang should be able to compile CUDA atomics now, though it needs _LIBCUDACXX_HAS_CUDA_ATOMIC_IMPL to enable the right code path in CUDA headers: https://godbolt.org/z/f3sj36rq6

Is there a feature test macro or specific version of clang we could test to appropriately update the definition of _LIBCUDACXX_HAS_CUDA_ATOMIC_IMPL to be enabled when using a new enough version of clang?

@Artem-B
Copy link
Member

Artem-B commented May 23, 2023

AFAICT, the atomics support has been there since ~clang-13: https://godbolt.org/z/cGGroW74q

__clang_major__ >= 13 should work.

@jrhemstad
Copy link

Filed NVIDIA/cccl#1020

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:NVPTX clang:frontend Language frontend issues, e.g. anything involving "Sema" cuda
Projects
None yet
Development

No branches or pull requests

5 participants