Add support for fp16 in the HIP backend #4688

Merged
merged 2 commits into from
Jan 26, 2022

Conversation

Member

@Rombur Rombur commented Jan 19, 2022

The code is identical to fp16 in the CUDA backend except here

@Rombur Rombur force-pushed the half branch 2 times, most recently from c3e23e8 to 4b7fa60 Compare January 19, 2022 14:54
@masterleinad
Contributor

Is this rebased on top of #4650? I think we agreed to do that one first and go from there for other backends.

@crtrott crtrott added the "Blocks Promotion" label (overview issue for release-blocking bugs) Jan 19, 2022
@nmm0 nmm0 added this to In progress in Kokkos Release 3.6 via automation Jan 19, 2022
@crtrott
Member

crtrott commented Jan 21, 2022

@Rombur please rebase on develop

core/src/HIP/Kokkos_HIP_Half_Impl_Type.hpp
Comment on lines +73 to +77
#ifdef __HIP_DEVICE_COMPILE__
  return half_t(__short2half_rn(val));
#else
  return half_t(__float2half(static_cast<float>(val)));
#endif
Member

Should we use KOKKOS_IF_ON_HOST?
(I see that we did use #ifdef __CUDA_ARCH__, but I am wondering what the right thing to do is. In any case, this does not need to be resolved in this PR, but I want to get the conversation started.)
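
For readers comparing the two styles raised in this comment, here is a minimal host-only sketch, not the Kokkos implementation. The `SKETCH_*` macros, the `Path` enum, and the function names are hypothetical stand-ins; they only mimic the general shape of a block-style macro that takes a parenthesized piece of code, next to a raw `#ifdef __HIP_DEVICE_COMPILE__` guard like the one in the snippet under review.

```cpp
#include <cassert>

// Hypothetical stand-in macros for this sketch only; these are NOT the Kokkos
// definitions. Each takes a parenthesized block of code and keeps or drops it.
#define SKETCH_STRIP_PARENS(...) __VA_ARGS__
#ifdef __HIP_DEVICE_COMPILE__
#define SKETCH_IF_ON_DEVICE(CODE) { SKETCH_STRIP_PARENS CODE }
#define SKETCH_IF_ON_HOST(CODE) {}
#else
#define SKETCH_IF_ON_DEVICE(CODE) {}
#define SKETCH_IF_ON_HOST(CODE) { SKETCH_STRIP_PARENS CODE }
#endif

enum class Path { Host, Device };

// Style 1: guard on the backend-specific macro directly.
inline Path path_with_ifdef() {
#ifdef __HIP_DEVICE_COMPILE__
  return Path::Device;
#else
  return Path::Host;
#endif
}

// Style 2: wrap each branch in a block-style macro.
inline Path path_with_block_macros() {
  Path p = Path::Host;
  SKETCH_IF_ON_DEVICE((p = Path::Device;))
  SKETCH_IF_ON_HOST((p = Path::Host;))
  return p;
}

// Both styles must select the same branch in any given compilation pass.
inline void check_styles_agree() {
  assert(path_with_ifdef() == path_with_block_macros());
}
```

With a plain host compiler, `__HIP_DEVICE_COMPILE__` is undefined, so both functions take the host branch. The two styles differ only in how the branch is spelled, which is why the choice can be treated as a convention question for backend-specific files.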

Contributor

My preference is to keep the macros for code that is only for one backend.

@@ -76,17 +76,19 @@ struct in_place_shfl_op {
    union conv_type {
      Scalar orig;
      shfl_type conv;
      // This should be fine, members get explicitly reset, which changes the
      // active member
      KOKKOS_FUNCTION conv_type() { conv = 0; }
Member

So this is just copying the Cuda implementation:

// sizeof(Scalar) <= sizeof(int) case
template <class Scalar>
// requires _assignable_from_bits<Scalar>
__device__ inline typename std::enable_if<sizeof(Scalar) <= sizeof(int)>::type
operator()(Scalar& out, Scalar const& in, int lane_or_delta, int width,
           unsigned mask = shfl_all_mask) const noexcept {
  using shfl_type = int;
  union conv_type {
    Scalar orig;
    shfl_type conv;
    // This should be fine, members get explicitly reset, which changes the
    // active member
    KOKKOS_FUNCTION conv_type() { conv = 0; }
  };
  conv_type tmp_in;
  tmp_in.orig = in;
  shfl_type tmp_out;
  tmp_out = reinterpret_cast<shfl_type&>(tmp_in.orig);
  conv_type res;
  //------------------------------------------------
  res.conv = self().do_shfl_op(mask, tmp_out, lane_or_delta, width);
  //------------------------------------------------
  out = reinterpret_cast<Scalar&>(res.conv);
}

It was updated in #2991.
I suppose you don't want to keep the FIXME on line 68?
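
As an illustration of what the quoted operator does for small scalars, here is a minimal host-only sketch under stated assumptions; it is not the Kokkos code. The names `fake_shfl`, `shfl_small_sketch`, and `check_round_trip` are hypothetical, the shuffle is replaced by an identity function so the example runs without a GPU, and `std::memcpy` is swapped in for the union and `reinterpret_cast` used in the quoted snippet.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical stand-in for a warp shuffle: on the host it returns its input.
inline int fake_shfl(int value) { return value; }

// Sketch of the sizeof(Scalar) <= sizeof(int) case: widen the scalar's bits
// into an int, pass the int through the shuffle, then narrow the bits back.
template <class Scalar>
typename std::enable_if<sizeof(Scalar) <= sizeof(int)>::type
shfl_small_sketch(Scalar& out, Scalar const& in) {
  static_assert(std::is_trivially_copyable<Scalar>::value,
                "bitwise copies require a trivially copyable type");
  int bits = 0;  // zero first so the unused bytes are well defined
  std::memcpy(&bits, &in, sizeof(Scalar));
  int const result = fake_shfl(bits);
  std::memcpy(&out, &result, sizeof(Scalar));
}

// Round-trip a 16-bit pattern (0x3C00 encodes 1.0 in IEEE binary16).
inline void check_round_trip() {
  std::uint16_t in = 0x3C00;
  std::uint16_t out = 0;
  shfl_small_sketch(out, in);
  assert(out == in);
}
```

Zeroing `bits` before the copy plays the same role as the `conv = 0` constructor in the quoted union: the bytes not covered by `Scalar` have a defined value when the int is handed to the shuffle.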

Contributor

@masterleinad masterleinad left a comment

Looks good to me.

@dalg24
Member

dalg24 commented Jan 26, 2022

Failure (perf test in CUDA-9.2-NVCC build) is clearly unrelated.

@dalg24 dalg24 merged commit 4e0611f into kokkos:develop Jan 26, 2022
Kokkos Release 3.6 automation moved this from In progress to Done Jan 26, 2022
@Rombur Rombur deleted the half branch February 1, 2022 18:54