Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CUDA code compile error: function template partial specialization is not allowed #68772

Closed
zhiweij1 opened this issue Oct 11, 2023 · 10 comments
Closed
Labels
c++ clang:diagnostics New/improved warning or error message in Clang, but not in clang-tidy or static analyzer cuda invalid Resolved as invalid, i.e. not a bug

Comments

@zhiweij1
Copy link

Smaller reproducer:

template<class T1, class T2>
struct A {
  template <class T3>
  void foo(T3 x);
};

template<typename T1, typename T2>
template<typename T3>
void A<T1, T2>::foo<T3>(T3 x) {

}

https://godbolt.org/z/xn7vre9db

The original code comes from https://github.com/NVIDIA/FasterTransformer/blob/afdf9a9eb86f15363c0249117d166d6b45dbb371/src/fastertransformer/kernels/cutlass_kernels/fpA_intB_gemm/fpA_intB_gemm_template.h#L426

It can be built with nvcc successfully.

@Endilll Endilll added c++ clang:diagnostics New/improved warning or error message in Clang, but not in clang-tidy or static analyzer and removed new issue labels Oct 11, 2023
@llvmbot
Copy link
Collaborator

llvmbot commented Oct 11, 2023

@llvm/issue-subscribers-c-1

Author: Jiang, Zhiwei (zhiweij1)

Smaller reproducer: ``` template<class T1, class T2> struct A { template <class T3> void foo(T3 x); };

template<typename T1, typename T2>
template<typename T3>
void A<T1, T2>::foo<T3>(T3 x) {

}

https://godbolt.org/z/xn7vre9db

The original code comes from https://github.com/NVIDIA/FasterTransformer/blob/afdf9a9eb86f15363c0249117d166d6b45dbb371/src/fastertransformer/kernels/cutlass_kernels/fpA_intB_gemm/fpA_intB_gemm_template.h#L426

It can be built with nvcc successfully.
</details>

@Endilll
Copy link
Contributor

Endilll commented Oct 11, 2023

I think out-of-line member function template definition looks the following way:

template<typename T1, typename T2>
template<typename T3>
void A<T1, T2>::foo(T3 x) {}

So Clang is correct to reject your code.

@zhiweij1
Copy link
Author

zhiweij1 commented Oct 11, 2023

Hi @Endilll ,

Actually, the key point here is that the original code is compliable witn nvcc (and it is treated as CUDA language).
So, for clang, if I also treat it as CUDA code, I expect the code can be compiled.

I changed the title from "C++" => "CUDA"

@zhiweij1 zhiweij1 changed the title C++ code compile error: function template partial specialization is not allowed CUDA code compile error: function template partial specialization is not allowed Oct 11, 2023
@Endilll Endilll added the cuda label Oct 11, 2023
@Endilll
Copy link
Contributor

Endilll commented Oct 11, 2023

I'm not sure we are bug-compatible with nvcc (or want to be) when it clashes with the C++ standard. That said, I think your code falls into conforming extension territory.
CC @AaronBallman

@AaronBallman
Copy link
Collaborator

We don't currently aim for compatibility with NVCC within Clang, though given that we aim for compatibility with GCC and MSVC, I would suspect an RFC with a plan for NVCC compatibility would be well-received once we knew the shape of what the goal was (ABI compat sufficient, or do we need bug for bug compat, etc). CC @erichkeane on this point.

The official CUDA documentation lists C++ language extensions, but makes no mention of this being a language extension. So I think this is very plausibly a bug in NVCC that their headers are relying on. Have you reported this to NVidia and gotten their stance on it?

@erichkeane
Copy link
Collaborator

I would presume that this is something that is a biproduct of using a different CFE that doesn't catch this (EDG) and is not an intentional extension. At the moment, there is no plan to get CUDA to be NVCC compatible, I don't believe we make any guarantees at all. AFAIK, Google did the implementation for the purposes of some internal products, so I doubt we'll see any additional work on it.

I would think this is a situation where filing a bug against the FasterTransform library to see if they would use a correct type of C++ would be the best course of action.

@AaronBallman AaronBallman added the invalid Resolved as invalid, i.e. not a bug label Oct 11, 2023
@AaronBallman
Copy link
Collaborator

I would think this is a situation where filing a bug against the FasterTransform library to see if they would use a correct type of C++ would be the best course of action.

Agreed.

I'm going to close this as not a bug, but if we do decide we want NVCC CUDA compatibility at some point in the future or if @zhiweij1 comes back with more information on why this is needed regardless, we can reopen the issue at that point.

@AaronBallman AaronBallman closed this as not planned Won't fix, can't repro, duplicate, stale Oct 11, 2023
@cor3ntin
Copy link
Contributor

@AaronBallman @erichkeane This is probably an edg bug https://godbolt.org/z/81osqz3nd - it's not a partial specialization, it's just lets you repeat the parameter list exactly, probably a missing diag

@erichkeane
Copy link
Collaborator

@AaronBallman @erichkeane This is probably an edg bug https://godbolt.org/z/81osqz3nd - it's not a partial specialization, it's just lets you repeat the parameter list exactly, probably a missing diag

That was my conclusion as well (though I phrased it less explicitly).

@zhiweij1
Copy link
Author

zhiweij1 commented Oct 12, 2023

I have reported this issue in nvidia forum https://forums.developer.nvidia.com/t/nvcc-bug-incorrect-template-method-definition-syntax-can-pass-nvcc-compilation/269184 and in FasterTransformer community NVIDIA/FasterTransformer#765

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c++ clang:diagnostics New/improved warning or error message in Clang, but not in clang-tidy or static analyzer cuda invalid Resolved as invalid, i.e. not a bug
Projects
None yet
Development

No branches or pull requests

6 participants