Add support for fp16 in the HIP backend #4688
Conversation
Force-pushed from c3e23e8 to 4b7fa60.
Is this rebased on top of #4650? I think we agreed to do that one first and go from there for other backends.
@Rombur please rebase on develop
#ifdef __HIP_DEVICE_COMPILE__
  return half_t(__short2half_rn(val));
#else
  return half_t(__float2half(static_cast<float>(val)));
#endif
Should we use KOKKOS_IF_ON_HOST? (I see that we did use #ifdef __CUDA_ARCH__, but I am wondering what the right thing to do is. In any case this does not need to be resolved in this PR, but I want to get the conversation started.)
My preference is to keep the macros for code that is only for one backend.
@@ -76,17 +76,19 @@ struct in_place_shfl_op {
  union conv_type {
    Scalar orig;
    shfl_type conv;
    // This should be fine, members get explicitly reset, which changes the
    // active member
    KOKKOS_FUNCTION conv_type() { conv = 0; }
So here I am just copying the Cuda implementation from kokkos/core/src/Cuda/Kokkos_Cuda_Vectorization.hpp, lines 79 to 102 at 20a090b:
// sizeof(Scalar) <= sizeof(int) case
template <class Scalar>
// requires _assignable_from_bits<Scalar>
__device__ inline typename std::enable_if<sizeof(Scalar) <= sizeof(int)>::type
operator()(Scalar& out, Scalar const& in, int lane_or_delta, int width,
           unsigned mask = shfl_all_mask) const noexcept {
  using shfl_type = int;
  union conv_type {
    Scalar orig;
    shfl_type conv;
    // This should be fine, members get explicitly reset, which changes the
    // active member
    KOKKOS_FUNCTION conv_type() { conv = 0; }
  };
  conv_type tmp_in;
  tmp_in.orig = in;
  shfl_type tmp_out;
  tmp_out = reinterpret_cast<shfl_type&>(tmp_in.orig);
  conv_type res;
  //------------------------------------------------
  res.conv = self().do_shfl_op(mask, tmp_out, lane_or_delta, width);
  //------------------------------------------------
  out = reinterpret_cast<Scalar&>(res.conv);
}
It was updated in #2991. I suppose you don't want to keep the FIXME at L68?
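To make the bit-punning step above concrete, here is a minimal standalone sketch of the same pattern for a HIP warp shuffle (shfl_narrow is a hypothetical helper, not code from this PR; note that HIP's __shfl takes no mask argument, so the mask is dropped):

#include <type_traits>

// Hypothetical sketch of the union-punning pattern above: shuffle a Scalar
// narrower than int by moving its bits through an int.
template <class Scalar>
__device__ std::enable_if_t<sizeof(Scalar) <= sizeof(int), Scalar>
shfl_narrow(Scalar const& in, int lane_or_delta, int width) {
  using shfl_type = int;
  union conv_type {
    Scalar orig;
    shfl_type conv;
    // Zero the whole int so the bytes beyond sizeof(Scalar) are defined.
    __device__ conv_type() { conv = 0; }
  };
  conv_type tmp_in;
  tmp_in.orig = in;  // writing activates the Scalar member
  // Reinterpret the Scalar's storage as the int that is actually shuffled.
  shfl_type tmp_out = reinterpret_cast<shfl_type&>(tmp_in.orig);
  conv_type res;
  res.conv = __shfl(tmp_out, lane_or_delta, width);  // HIP warp shuffle
  return reinterpret_cast<Scalar&>(res.conv);
}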
Looks good to me.
Failure (perf test in
The code is identical to fp16 in the CUDA backend except here.
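As a usage note, a small sketch of how the conversion above gets exercised through Kokkos' experimental half_t API (a hedged illustration: scale is a made-up function, and the cast_to_half/cast_from_half names assume the existing Kokkos_Half.hpp interface):

#include <Kokkos_Core.hpp>

// Hedged illustration (scale is hypothetical): round-trip a short through
// half_t, which goes through __short2half_rn in HIP device code.
KOKKOS_FUNCTION float scale(short v) {
  Kokkos::Experimental::half_t h = Kokkos::Experimental::cast_to_half(v);
  return Kokkos::Experimental::cast_from_half<float>(h) * 0.5f;
}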