CUDA code compile error: function template partial specialization is not allowed #68772

zhiweij1 · 2023-10-11T07:32:24Z

Smaller reproducer:

template<class T1, class T2>
struct A {
  template <class T3>
  void foo(T3 x);
};

template<typename T1, typename T2>
template<typename T3>
void A<T1, T2>::foo<T3>(T3 x) {

}

https://godbolt.org/z/xn7vre9db

The original code comes from https://github.com/NVIDIA/FasterTransformer/blob/afdf9a9eb86f15363c0249117d166d6b45dbb371/src/fastertransformer/kernels/cutlass_kernels/fpA_intB_gemm/fpA_intB_gemm_template.h#L426

It can be built with nvcc successfully.

The text was updated successfully, but these errors were encountered:

llvmbot · 2023-10-11T08:36:49Z

@llvm/issue-subscribers-c-1

Author: Jiang, Zhiwei (zhiweij1)

Smaller reproducer: ``` template<class T1, class T2> struct A { template <class T3> void foo(T3 x); };

template<typename T1, typename T2>
template<typename T3>
void A<T1, T2>::foo<T3>(T3 x) {

}

https://godbolt.org/z/xn7vre9db

The original code comes from https://github.com/NVIDIA/FasterTransformer/blob/afdf9a9eb86f15363c0249117d166d6b45dbb371/src/fastertransformer/kernels/cutlass_kernels/fpA_intB_gemm/fpA_intB_gemm_template.h#L426

It can be built with nvcc successfully.
</details>

Endilll · 2023-10-11T08:38:24Z

I think out-of-line member function template definition looks the following way:

template<typename T1, typename T2>
template<typename T3>
void A<T1, T2>::foo(T3 x) {}

So Clang is correct to reject your code.

zhiweij1 · 2023-10-11T08:44:28Z

Hi @Endilll ,

Actually, the key point here is that the original code is compliable witn nvcc (and it is treated as CUDA language).
So, for clang, if I also treat it as CUDA code, I expect the code can be compiled.

I changed the title from "C++" => "CUDA"

Endilll · 2023-10-11T08:51:12Z

I'm not sure we are bug-compatible with nvcc (or want to be) when it clashes with the C++ standard. That said, I think your code falls into conforming extension territory.
CC @AaronBallman

AaronBallman · 2023-10-11T11:48:22Z

We don't currently aim for compatibility with NVCC within Clang, though given that we aim for compatibility with GCC and MSVC, I would suspect an RFC with a plan for NVCC compatibility would be well-received once we knew the shape of what the goal was (ABI compat sufficient, or do we need bug for bug compat, etc). CC @erichkeane on this point.

The official CUDA documentation lists C++ language extensions, but makes no mention of this being a language extension. So I think this is very plausibly a bug in NVCC that their headers are relying on. Have you reported this to NVidia and gotten their stance on it?

erichkeane · 2023-10-11T13:54:00Z

I would presume that this is something that is a biproduct of using a different CFE that doesn't catch this (EDG) and is not an intentional extension. At the moment, there is no plan to get CUDA to be NVCC compatible, I don't believe we make any guarantees at all. AFAIK, Google did the implementation for the purposes of some internal products, so I doubt we'll see any additional work on it.

I would think this is a situation where filing a bug against the FasterTransform library to see if they would use a correct type of C++ would be the best course of action.

AaronBallman · 2023-10-11T15:13:42Z

I would think this is a situation where filing a bug against the FasterTransform library to see if they would use a correct type of C++ would be the best course of action.

Agreed.

I'm going to close this as not a bug, but if we do decide we want NVCC CUDA compatibility at some point in the future or if @zhiweij1 comes back with more information on why this is needed regardless, we can reopen the issue at that point.

cor3ntin · 2023-10-11T15:26:40Z

@AaronBallman @erichkeane This is probably an edg bug https://godbolt.org/z/81osqz3nd - it's not a partial specialization, it's just lets you repeat the parameter list exactly, probably a missing diag

erichkeane · 2023-10-11T15:28:09Z

@AaronBallman @erichkeane This is probably an edg bug https://godbolt.org/z/81osqz3nd - it's not a partial specialization, it's just lets you repeat the parameter list exactly, probably a missing diag

That was my conclusion as well (though I phrased it less explicitly).

zhiweij1 · 2023-10-12T00:43:17Z

I have reported this issue in nvidia forum https://forums.developer.nvidia.com/t/nvcc-bug-incorrect-template-method-definition-syntax-can-pass-nvcc-compilation/269184 and in FasterTransformer community NVIDIA/FasterTransformer#765

github-actions bot added the new issue label Oct 11, 2023

Endilll added c++ clang:diagnostics New/improved warning or error message in Clang, but not in clang-tidy or static analyzer and removed new issue labels Oct 11, 2023

zhiweij1 changed the title ~~C++ code compile error: function template partial specialization is not allowed~~ CUDA code compile error: function template partial specialization is not allowed Oct 11, 2023

Endilll added the cuda label Oct 11, 2023

AaronBallman added the invalid Resolved as invalid, i.e. not a bug label Oct 11, 2023

AaronBallman closed this as not planned Won't fix, can't repro, duplicate, stale Oct 11, 2023

zhiweij1 mentioned this issue Oct 12, 2023

CUDA code compile error with clang: function template partial specialization is not allowed NVIDIA/FasterTransformer#765

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CUDA code compile error: function template partial specialization is not allowed #68772

CUDA code compile error: function template partial specialization is not allowed #68772

zhiweij1 commented Oct 11, 2023

llvmbot commented Oct 11, 2023

Endilll commented Oct 11, 2023 •

edited

Loading

zhiweij1 commented Oct 11, 2023 •

edited

Loading

Endilll commented Oct 11, 2023

AaronBallman commented Oct 11, 2023

erichkeane commented Oct 11, 2023

AaronBallman commented Oct 11, 2023

cor3ntin commented Oct 11, 2023

erichkeane commented Oct 11, 2023

zhiweij1 commented Oct 12, 2023 •

edited

Loading

CUDA code compile error: function template partial specialization is not allowed #68772

CUDA code compile error: function template partial specialization is not allowed #68772

Comments

zhiweij1 commented Oct 11, 2023

llvmbot commented Oct 11, 2023

Endilll commented Oct 11, 2023 • edited Loading

zhiweij1 commented Oct 11, 2023 • edited Loading

Endilll commented Oct 11, 2023

AaronBallman commented Oct 11, 2023

erichkeane commented Oct 11, 2023

AaronBallman commented Oct 11, 2023

cor3ntin commented Oct 11, 2023

erichkeane commented Oct 11, 2023

zhiweij1 commented Oct 12, 2023 • edited Loading

Endilll commented Oct 11, 2023 •

edited

Loading

zhiweij1 commented Oct 11, 2023 •

edited

Loading

zhiweij1 commented Oct 12, 2023 •

edited

Loading