Add support for fp16 in the HIP backend #4688
Conversation
Force-pushed from c3e23e8 to 4b7fa60.
Is this rebased on top of #4650? I think we agreed to do that one first and go from there for other backends.
@Rombur please rebase on develop
#ifdef __HIP_DEVICE_COMPILE__
  return half_t(__short2half_rn(val));
#else
  return half_t(__float2half(static_cast<float>(val)));
#endif
Should we use KOKKOS_IF_ON_HOST? (I see that we did use #ifdef __CUDA_ARCH__, but I am wondering what the right thing to do is. In any case this does not need to be resolved in this PR, but I want to get the conversation started.)
My preference is to keep the macros for code that is only for one backend.
@@ -76,17 +76,19 @@ struct in_place_shfl_op {
  union conv_type {
    Scalar orig;
    shfl_type conv;
    // This should be fine, members get explicitly reset, which changes the
    // active member
    KOKKOS_FUNCTION conv_type() { conv = 0; }
So here I am just copying the Cuda implementation from kokkos/core/src/Cuda/Kokkos_Cuda_Vectorization.hpp, lines 79 to 102 at 20a090b:
// sizeof(Scalar) <= sizeof(int) case
template <class Scalar>
// requires _assignable_from_bits<Scalar>
__device__ inline typename std::enable_if<sizeof(Scalar) <= sizeof(int)>::type
operator()(Scalar& out, Scalar const& in, int lane_or_delta, int width,
           unsigned mask = shfl_all_mask) const noexcept {
  using shfl_type = int;
  union conv_type {
    Scalar orig;
    shfl_type conv;
    // This should be fine, members get explicitly reset, which changes the
    // active member
    KOKKOS_FUNCTION conv_type() { conv = 0; }
  };
  conv_type tmp_in;
  tmp_in.orig = in;
  shfl_type tmp_out;
  tmp_out = reinterpret_cast<shfl_type&>(tmp_in.orig);
  conv_type res;
  //------------------------------------------------
  res.conv = self().do_shfl_op(mask, tmp_out, lane_or_delta, width);
  //------------------------------------------------
  out = reinterpret_cast<Scalar&>(res.conv);
}
It was updated in #2991. I suppose you don't want to keep the FIXME at L68?
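To make the bit-punning step above concrete, here is a minimal standalone sketch of the same pattern for a HIP warp shuffle (shfl_narrow is a hypothetical helper, not code from this PR; note that HIP's __shfl takes no mask argument, so the mask is dropped):

#include <type_traits>

// Hypothetical sketch of the union-punning pattern above: shuffle a Scalar
// narrower than int by moving its bits through an int.
template <class Scalar>
__device__ std::enable_if_t<sizeof(Scalar) <= sizeof(int), Scalar>
shfl_narrow(Scalar const& in, int lane_or_delta, int width) {
  using shfl_type = int;
  union conv_type {
    Scalar orig;
    shfl_type conv;
    // Zero the whole int so the bytes beyond sizeof(Scalar) are defined.
    __device__ conv_type() { conv = 0; }
  };
  conv_type tmp_in;
  tmp_in.orig = in;  // writing activates the Scalar member
  // Reinterpret the Scalar's storage as the int that is actually shuffled.
  shfl_type tmp_out = reinterpret_cast<shfl_type&>(tmp_in.orig);
  conv_type res;
  res.conv = __shfl(tmp_out, lane_or_delta, width);  // HIP warp shuffle
  return reinterpret_cast<Scalar&>(res.conv);
}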
Looks good to me.
Failure (perf test in
The code is identical to fp16 in the CUDA backend except here.
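As a usage note, a small sketch of how the conversion above gets exercised through Kokkos' experimental half_t API (a hedged illustration: scale is a made-up function, and the cast_to_half/cast_from_half names assume the existing Kokkos_Half.hpp interface):

#include <Kokkos_Core.hpp>

// Hedged illustration (scale is hypothetical): round-trip a short through
// half_t, which goes through __short2half_rn in HIP device code.
KOKKOS_FUNCTION float scale(short v) {
  Kokkos::Experimental::half_t h = Kokkos::Experimental::cast_to_half(v);
  return Kokkos::Experimental::cast_from_half<float>(h) * 0.5f;
}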