Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add half_t and bhalf_t limits #5778

Merged
merged 10 commits into from
Mar 30, 2023
Merged

Add half_t and bhalf_t limits #5778

merged 10 commits into from
Mar 30, 2023

Conversation

e10harvey
Copy link
Contributor

Move these limits from KokkosKernels. Mostly epsilon is used in unit-tests. ArithTraits also uses the other limits.

Related to kokkos/kokkos-kernels#1414.

@e10harvey e10harvey self-assigned this Jan 18, 2023
@ajpowelsnl ajpowelsnl requested a review from lucbv January 18, 2023 19:15
@cz4rs
Copy link
Contributor

cz4rs commented Jan 18, 2023

Is this relevant?
llvm/llvm-project@059b823

@e10harvey
Copy link
Contributor Author

Is this relevant? llvm/llvm-project@059b823

It looks relevant. I don't think all toolchains define these constants though.

@cz4rs
Copy link
Contributor

cz4rs commented Jan 18, 2023

Is this relevant? llvm/llvm-project@059b823

It looks relevant. I don't think all toolchains define these constants though.

Specifically, changing MIN_EXP to -13 and MAX_EXP to 16 (as it happens in the linked commit due to C specification).

dalg24
dalg24 previously requested changes Jan 20, 2023
Copy link
Member

@dalg24 dalg24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why define macros rather than provide specialization of the numeric traits?

@crtrott
Copy link
Member

crtrott commented Jan 23, 2023

Yeah: definitely lets just specialize the numeric traits. We need those anyway ...

@e10harvey
Copy link
Contributor Author

Yea, I was thinking about putting them in numeric traits. Let me know if you want the defines somewhere else. I'd prefer to keep the defines.

Copy link
Member

@crtrott crtrott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need the macros for anything. Those should go away.
Also the numeric traits specialization for half should be in the half header not in the Kokkos_NumericTraits header.

@masterleinad
Copy link
Contributor

diff --git a/core/src/Kokkos_Half.hpp b/core/src/Kokkos_Half.hpp
index 656b177cf..59bbac948 100644
--- a/core/src/Kokkos_Half.hpp
+++ b/core/src/Kokkos_Half.hpp
@@ -321,9 +321,7 @@ class alignas(FloatType) floating_point_wrapper {
   KOKKOS_FUNCTION
   floating_point_wrapper(long long rhs) : val(cast_to_wrapper(rhs, val).val) {}
   KOKKOS_FUNCTION
-  constexpr floating_point_wrapper(unsigned short rhs) {
-    reinterpret_cast<unsigned short>(val) = rhs;
-  }
+  constexpr floating_point_wrapper(unsigned short rhs) : val(cast_to_wrapper(rhs, val).val) {}
   KOKKOS_FUNCTION
   floating_point_wrapper(unsigned int rhs)
       : val(cast_to_wrapper(rhs, val).val) {}
@@ -865,7 +863,7 @@ class alignas(FloatType) floating_point_wrapper {
 template <class T>
 static KOKKOS_INLINE_FUNCTION constexpr Kokkos::Experimental::half_t
 cast_to_wrapper(T x, const volatile Kokkos::Impl::half_impl_t::type&) {
-  return Kokkos::Experimental::cast_to_half(x);
+  return Kokkos::Experimental::half_t::impl_type(x);
 }
 
 #ifdef KOKKOS_IMPL_BHALF_TYPE_DEFINED
@@ -1046,7 +1044,7 @@ struct Kokkos::Experimental::Impl::round_error_helper<
 template <>
 struct Kokkos::Experimental::Impl::norm_min_helper<
     Kokkos::Experimental::half_t> {
-  static constexpr int value = 0.000000059604645F;
+  static constexpr Kokkos::Experimental::half_t value = 0.000000059604645F;
 };
 // Quiet not a half precisioin number
 template <>
diff --git a/core/src/SYCL/Kokkos_SYCL_Half_Conversion.hpp b/core/src/SYCL/Kokkos_SYCL_Half_Conversion.hpp
index d7b0271e1..89a167903 100644
--- a/core/src/SYCL/Kokkos_SYCL_Half_Conversion.hpp
+++ b/core/src/SYCL/Kokkos_SYCL_Half_Conversion.hpp
@@ -29,7 +29,7 @@ namespace Experimental {
 KOKKOS_INLINE_FUNCTION
 half_t cast_to_half(half_t val) { return val; }
 
-KOKKOS_INLINE_FUNCTION
+constexpr KOKKOS_INLINE_FUNCTION
 half_t cast_to_half(float val) { return half_t::impl_type(val); }
 KOKKOS_INLINE_FUNCTION
 half_t cast_to_half(double val) { return half_t::impl_type(val); }

is enough to fix this for SYCL.

@e10harvey e10harvey requested a review from dalg24 February 2, 2023 15:41
@e10harvey
Copy link
Contributor Author

diff --git a/core/src/Kokkos_Half.hpp b/core/src/Kokkos_Half.hpp
index 656b177cf..59bbac948 100644
--- a/core/src/Kokkos_Half.hpp
+++ b/core/src/Kokkos_Half.hpp
@@ -321,9 +321,7 @@ class alignas(FloatType) floating_point_wrapper {
   KOKKOS_FUNCTION
   floating_point_wrapper(long long rhs) : val(cast_to_wrapper(rhs, val).val) {}
   KOKKOS_FUNCTION
-  constexpr floating_point_wrapper(unsigned short rhs) {
-    reinterpret_cast<unsigned short>(val) = rhs;
-  }
+  constexpr floating_point_wrapper(unsigned short rhs) : val(cast_to_wrapper(rhs, val).val) {}
   KOKKOS_FUNCTION
   floating_point_wrapper(unsigned int rhs)
       : val(cast_to_wrapper(rhs, val).val) {}
@@ -865,7 +863,7 @@ class alignas(FloatType) floating_point_wrapper {
 template <class T>
 static KOKKOS_INLINE_FUNCTION constexpr Kokkos::Experimental::half_t
 cast_to_wrapper(T x, const volatile Kokkos::Impl::half_impl_t::type&) {
-  return Kokkos::Experimental::cast_to_half(x);
+  return Kokkos::Experimental::half_t::impl_type(x);
 }
 
 #ifdef KOKKOS_IMPL_BHALF_TYPE_DEFINED
@@ -1046,7 +1044,7 @@ struct Kokkos::Experimental::Impl::round_error_helper<
 template <>
 struct Kokkos::Experimental::Impl::norm_min_helper<
     Kokkos::Experimental::half_t> {
-  static constexpr int value = 0.000000059604645F;
+  static constexpr Kokkos::Experimental::half_t value = 0.000000059604645F;
 };
 // Quiet not a half precisioin number
 template <>
diff --git a/core/src/SYCL/Kokkos_SYCL_Half_Conversion.hpp b/core/src/SYCL/Kokkos_SYCL_Half_Conversion.hpp
index d7b0271e1..89a167903 100644
--- a/core/src/SYCL/Kokkos_SYCL_Half_Conversion.hpp
+++ b/core/src/SYCL/Kokkos_SYCL_Half_Conversion.hpp
@@ -29,7 +29,7 @@ namespace Experimental {
 KOKKOS_INLINE_FUNCTION
 half_t cast_to_half(half_t val) { return val; }
 
-KOKKOS_INLINE_FUNCTION
+constexpr KOKKOS_INLINE_FUNCTION
 half_t cast_to_half(float val) { return half_t::impl_type(val); }
 KOKKOS_INLINE_FUNCTION
 half_t cast_to_half(double val) { return half_t::impl_type(val); }

is enough to fix this for SYCL.

The same changes on CUDA give:

kokkos/core/src/Kokkos_Half.hpp(1043): error: expression must have a constant value

@masterleinad
Copy link
Contributor

The same changes on CUDA give:

Yes, I am seeing

/tmp/kokkos/core/src/Kokkos_Half.hpp(1024): error: expression must have a constant value
/tmp/kokkos/core/src/Kokkos_Half.hpp(866): note #2703-D: cannot call non-constexpr function "__half::__half(float)"
/soft/compilers/cuda/cuda-11.6.2/include/cuda_fp16.hpp(201): here
/tmp/kokkos/core/src/Kokkos_Half.hpp(309): note #2693-D: called from:

indicating that none of the conversion functions is constexpr in Cuda.

@ajpowelsnl
Copy link
Contributor

@e10harvey - does this issue need some discussion today?

@e10harvey
Copy link
Contributor Author

@e10harvey - does this issue need some discussion today?

@ajpowelsnl - thanks for following up. I think it needs another review by @crtrott and @dalg24 -- please let me know if you have any questions.

@masterleinad
Copy link
Contributor

There are failing tests.

@masterleinad
Copy link
Contributor

[ RUN      ] defaultdevicetype.bhalf_operators
11: /var/jenkins/workspace/Kokkos/core/unit_test/TestHalfOperators.hpp:926: Failure
11: The difference between f_device_actual_lhs(op_test) and f_device_expected_lhs(op_test) is 0.00020498037338256836, which exceeds epsilon, where
11: f_device_actual_lhs(op_test) evaluates to -0.765625,
11: f_device_expected_lhs(op_test) evaluates to -0.76542001962661743, and
11: epsilon evaluates to 1.1920928955078125e-07.
11: /var/jenkins/workspace/Kokkos/core/unit_test/TestHalfOperators.hpp:926: Failure
11: The difference between f_device_actual_lhs(op_test) and f_device_expected_lhs(op_test) is 3.9145350456237793e-05, which exceeds epsilon, where
11: f_device_actual_lhs(op_test) evaluates to 0.234619140625,
11: f_device_expected_lhs(op_test) evaluates to 0.23457999527454376, and
11: epsilon evaluates to 1.1920928955078125e-07.
11: /var/jenkins/workspace/Kokkos/core/unit_test/TestHalfOperators.hpp:926: Failure
11: The difference between f_device_actual_lhs(op_test) and f_device_expected_lhs(op_test) is 0.00020503997802734375, which exceeds epsilon, where
11: f_device_actual_lhs(op_test) evaluates to 1.234375,
11: f_device_expected_lhs(op_test) evaluates to 1.2345800399780273, and
11: epsilon evaluates to 1.1920928955078125e-07.
11: /var/jenkins/workspace/Kokkos/core/unit_test/TestHalfOperators.hpp:926: Failure
11: The difference between f_device_actual_lhs(op_test) and f_device_expected_lhs(op_test) is 0.00020503997802734375, which exceeds epsilon, where
11: f_device_actual_lhs(op_test) evaluates to 2.234375,
11: f_device_expected_lhs(op_test) evaluates to 2.2345800399780273, and
11: epsilon evaluates to 1.1920928955078125e-07.
11: [  FAILED  ] defaultdevicetype.bhalf_operators (0 ms)

@e10harvey
Copy link
Contributor Author

Hi @mhoemmen ! Is there a way to invoke CUDA half and bhalf casting intrinsics at compile time?

Copy link
Member

@crtrott crtrott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I think I get it now.

@crtrott
Copy link
Member

crtrott commented Mar 15, 2023

Uhoh work graph test failed???

@crtrott
Copy link
Member

crtrott commented Mar 15, 2023

I mean workgraph is clearly not an issue coming from this I would expect but this is weird

@crtrott
Copy link
Member

crtrott commented Mar 15, 2023

Retest this please

@dalg24
Copy link
Member

dalg24 commented Mar 15, 2023

How did you come up with these distinguished values?

Ping

@e10harvey
Copy link
Contributor Author

@dalg24, see response in #5778 (comment).

@e10harvey
Copy link
Contributor Author

I mean workgraph is clearly not an issue coming from this I would expect but this is weird

I don't see the workgraph failure. Instead, I see that OPENMPTARGET-Clang failed in 0s. How do I triage this?

@dalg24
Copy link
Member

dalg24 commented Mar 16, 2023

@dalg24, see response in #5778 (comment).

Is there documentation somewhere that you can refer to?

@e10harvey
Copy link
Contributor Author

Is there documentation somewhere that you can refer to?

@dalg24: I'm confused. These values are removed now.

@dalg24
Copy link
Member

dalg24 commented Mar 16, 2023

Is there documentation somewhere that you can refer to?

@dalg24: I'm confused. These values are removed now.

What I am asking is "where would one find out what is the maximum value that is representable with half type". Presumably you looked that up somewhere or you did some experiment. Please document how you came up with these numbers.

@e10harvey
Copy link
Contributor Author

Is there documentation somewhere that you can refer to?

@dalg24: I'm confused. These values are removed now.

What I am asking is "where would one find out what is the maximum value that is representable with half type". Presumably you looked that up somewhere or you did some experiment. Please document how you came up with these numbers.

Gotcha. I will document this. Should this be documented in the Kokkos_Half.hpp file?

@dalg24
Copy link
Member

dalg24 commented Mar 16, 2023

Gotcha. I will document this. Should this be documented in the Kokkos_Half.hpp file?

At the very least here in the discussion but if there is a nice reference yes also add as a comment in code.

@e10harvey
Copy link
Contributor Author

Gotcha. I will document this. Should this be documented in the Kokkos_Half.hpp file?

At the very least here in the discussion but if there is a nice reference yes also add as a comment in code.

OK, thanks for the clarifications, @dalg24. I did not follow a particular reference. I simply looked up definitions of the descriptions for each type of constant, as well as the format of each half precision type (i.e. N bits for significand and M bits for mantissa) and then computed the constants. For the format of the half type I followed https://ieeexplore.ieee.org/abstract/document/9973611. For computing the values, I used https://observablehq.com/@benaubin/floating-point or python, depending on whether I wanted to set individual bits in the half precision type or evaluate an expression. In some cases, I also set the bits via hex constants in the Kokkos half type and used printf to print the base10 representation.

In short, I do not have a single reference for where these values came from but I did my best to verify each constant is correct. Perhaps there is a better reference for these constants, but I did not find one during my half precision development work.

I also documented the human readable description and equation or high bits above each constants in Kokkos_Half.hpp.

I hope this helps.

@e10harvey
Copy link
Contributor Author

@dalg24: If you think a509f58 suffices, I will add similar docs for bhalf_t.

@dalg24
Copy link
Member

dalg24 commented Mar 29, 2023

Retest this please

@e10harvey
Copy link
Contributor Author

The GCC 8.4.0 pipeline stage in jenkins timed out.

@e10harvey
Copy link
Contributor Author

Retest this please

@dalg24
Copy link
Member

dalg24 commented Mar 29, 2023

The GCC 8.4.0 pipeline stage in jenkins timed out.

We are having some issues with the CPU only testing machines. I would have merged w/o that one :/

@dalg24
Copy link
Member

dalg24 commented Mar 30, 2023

Retest this please

@dalg24
Copy link
Member

dalg24 commented Mar 30, 2023

Evan dug out a build that had only GCC 8.4 not run
https://cloud.cees.ornl.gov/jenkins-ci/blue/organizations/jenkins/Kokkos/detail/Kokkos/12521/pipeline/57/
That is good enough for me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants