-
Notifications
You must be signed in to change notification settings - Fork 407
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add half precision base support #3439
Add half precision base support #3439
Conversation
578c4e4
to
abbfe23
Compare
Forgot the license notice ... |
#include <cstdint> | ||
#include <Cuda/Kokkos_Cuda_Half.hpp> | ||
|
||
#ifndef KOKKOS_IMPL_HALF_TYPE_DEFINED |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be IMPL? I feel like users might want to know whether their Kokkos has half-precision (though I'm open to other mechanisms)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
half_t will always be defined, the way to figure out whether we use the fallback is half_is_float constexpr bool.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@crtrott and I discussed in Slack. I was worried about a model where a code used different versions of Kokkos (including one before we make this change), and they might switch off functionality based on that macro. We're going with a model where codes use a version of Kokkos that evolves with the code, so this won't come up
7d178f8
to
cfe944f
Compare
KOKKOS_INLINE_FUNCTION | ||
half_t cast_to_half(float val) { return __float2half(val); } | ||
KOKKOS_INLINE_FUNCTION | ||
half_t cast_to_half(double val) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we care that double2half(val)
and float2half(static_cast<float>(val))
don't have to return identical values (at least this is true for arm-gcc)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't that expected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Users might be surprised if on one platform they get different results from cast_to_half
than on another.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that is the nature of half precision right now. To the degree vendors agree on conversions we should be good. But CUDA doesn't provide us the conversion calls on both sides so I rather call the fast one on the device and do the extra conversion (with potentially different result) on host. I think you argument above though that is just generally expected. static_cast<int>(val)
doesn't necessarily give the same result as static_cast<int>(static_cast<float>(val))
either.
cfe944f
to
5339250
Compare
|
||
template <class T> | ||
void test_half_conversion_type() { | ||
double epsilon = Kokkos::Experimental::half_is_float ? 0.0000003 : 0.0003; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How did you come up with this relative tolerance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wikipedia :-) This is the actual (or pretty close) value based on the number of bits for the mantisse.
T base = static_cast<T>(3.3); | ||
Kokkos::Experimental::half_t a = Kokkos::Experimental::cast_to_half(base); | ||
T b = Kokkos::Experimental::cast_from_half<T>(a); | ||
ASSERT_TRUE((double(b - base) / double(base)) < epsilon); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you look into gtest support for fp comparison?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use ASSERT_NEAR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm I am not sure that that works, since I want to explicitly cast b-base not b and base and then do the difference.
5339250
to
288ab36
Compare
core/src/Cuda/Kokkos_Cuda_Half.hpp
Outdated
#include <Kokkos_Macros.hpp> | ||
#ifdef KOKKOS_ENABLE_CUDA | ||
#include <cuda_fp16.h> | ||
#include <cstdint> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you need it any more
#include <cstdint> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right
core/src/Cuda/Kokkos_Cuda_Half.hpp
Outdated
KOKKOS_INLINE_FUNCTION | ||
half_t cast_to_half(double val) { | ||
// double2half was only introduced in CUDA 11 too | ||
return __float2half(static_cast<float>(val)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer if you wrote [unsigned] short int, [unsigned] long long int
(not omitting the trailing "int")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
core/src/Cuda/Kokkos_Cuda_Half.hpp
Outdated
|
||
template <class T> | ||
KOKKOS_INLINE_FUNCTION | ||
typename std::enable_if<std::is_same<T, float>::value, T>::type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typename std::enable_if<std::is_same<T, float>::value, T>::type | |
std::enable_if_t<std::is_same<T, float>::value, T> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah right I am forgetting that we now got C++14
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets change all of those.
There are more changes coming to this. In particular for the CUDA version we need our own struct which does the right thing on the host and the device (i.e. overloaded math operators). Later we will add more versions: Intel only, CUDA on ARM (where we can actually use __fp16 on the host), HIP, etc. |
Adds conversion functions.
288ab36
to
e7fae55
Compare
Adds operator overloads for half_t. Adds support for half_t arithmetic on host via promotion to float.
Add Cuda half_t type and operators
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a number of changes which need to be done see my comments.
core/src/Cuda/Kokkos_Cuda_Half.hpp
Outdated
operator half_device_type() const { return val; } | ||
|
||
// NOTE: Changing below to 1 produces constructor overload error | ||
#if 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok we need to figure out why this is still an issue.
core/src/Cuda/Kokkos_Cuda_Half.hpp
Outdated
namespace Kokkos { | ||
namespace Experimental { | ||
|
||
using half_device_type = __half; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be an internal typedef to half_t and it should be called impl_type.
|
||
KOKKOS_FUNCTION | ||
half_t(half_device_type rhs = 0) : val(rhs) {} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe the bool thing gets solved if we explicitly add half_t(const half_t& rhs):val(rhs.val) {} constructor?
core/src/Cuda/Kokkos_Cuda_Half.hpp
Outdated
half_t operator+() const { | ||
half_t tmp = *this; | ||
#ifdef __CUDA_ARCH__ | ||
// printf("half_t unary operator+\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove all the printf statements.
core/src/Cuda/Kokkos_Cuda_Half.hpp
Outdated
|
||
// Logical operators | ||
KOKKOS_FUNCTION | ||
half_t operator!() const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needs to return bool. Which also means we don't need the tmp but simply return !val or !__half2float(val)
core/src/Cuda/Kokkos_Cuda_Half.hpp
Outdated
return tmp; | ||
} | ||
|
||
#if 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get rid of the #if 1
core/src/Cuda/Kokkos_Cuda_Half.hpp
Outdated
// NOTE: Loses short-circuit evaluation | ||
KOKKOS_FUNCTION | ||
bool operator&&(half_t rhs) const { | ||
half_t tmp = *this; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't need a tmp here.
core/src/Cuda/Kokkos_Cuda_Half.hpp
Outdated
// NOTE: Loses short-circuit evaluation | ||
KOKKOS_FUNCTION | ||
bool operator||(half_t rhs) const { | ||
half_t tmp = *this; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't need a tmp here.
core/src/Cuda/Kokkos_Cuda_Half.hpp
Outdated
// Comparison operators | ||
KOKKOS_FUNCTION | ||
bool operator==(half_t rhs) const { | ||
half_t tmp = *this; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't need a tmp here.
core/src/Cuda/Kokkos_Cuda_Half.hpp
Outdated
|
||
template <class T> | ||
KOKKOS_INLINE_FUNCTION | ||
typename std::enable_if<std::is_same<T, float>::value, T>::type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets change all of those.
core/src/Cuda/Kokkos_Cuda_Half.hpp
Outdated
} | ||
|
||
template <class T> | ||
KOKKOS_FUNCTION half_t operator=(T rhs) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should return reference.
I've attempted to address the CI build errors and feedback in crtrott#7. |
Implement PR feedback
core/src/Kokkos_Half.hpp
Outdated
// Using an explicit list here too, since the other ones are explicit and for | ||
// example don't include char | ||
template <class T> | ||
KOKKOS_INLINE_FUNCTION typename std::enable_if< |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: we can use enable_if_t
here (C++14).
core/src/Cuda/Kokkos_Cuda_Half.hpp
Outdated
half_t(const half_t&) = default; | ||
|
||
KOKKOS_FUNCTION | ||
half_t(impl_type rhs = cast_to_half(0)) : val(rhs) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a slight preference for two constructors here instead of relying on defaulted parameters.
core/src/Cuda/Kokkos_Cuda_Half.hpp
Outdated
|
||
// Cast rhs to half for assignment to lhs of type half_t | ||
template <class T> | ||
KOKKOS_FUNCTION half_t(T rhs) : half_t(cast_to_half(rhs)) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if we should constrain this (otherwise it matches just about everything) instead of the hard error calling it on something not convertible to a parameter that cast_to_half
can take.
core/src/Cuda/Kokkos_Cuda_Half.hpp
Outdated
} | ||
|
||
KOKKOS_FUNCTION | ||
half_t(const half_t&) = default; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking Rule of 0 here. I can't imagine move does anything different vs. copy for __half
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we had some issues with ambiguity before anything to consider in that direction? Or are you just saying we shouldn't define any of those constructors, and just write the explicit conversion constructors explicitly.
core/src/Cuda/Kokkos_Cuda_Half.hpp
Outdated
|
||
// Binary Arithmetic | ||
KOKKOS_FUNCTION | ||
half_t operator+(half_t rhs) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These binary operators should be hidden friends, as in:
half_t friend operator+(half_t lhs, half_t rhs)
{
#ifdef __CUDA_ARCH__
lhs.val += rhs.val;
#else
lhs.val = __float2half(__half2float(lhs.val) + __half2float(rhs.val));
#endif
return lhs;
}
The benefits are symmetry and ADL.
It is less common to do those for unary operators, although Anthony Williams makes a case for it at https://www.justsoftwaresolutions.co.uk/cplusplus/hidden-friends.html.
Remove implicit conversion from half_t to __half and bool. This may cause the compiler to consider all __half operators or bool operators for expressions involving half_t. Make operator{+,-,*,/} symmetric Explicitly overload half_t constructors Add casting to/from bool Add test cases for symmetry Do not force a copy constructor for half_t
Implement PR feedback and implicit ops
Add explicit conversion ops for casting from half_t to supported types Add copy constructor back to make half_t both trivially and copy constructible
Use explicit casting to support deep_copy from T to half_t
half_t Cuda and deep_copy updates
CI workarounds and fixes
Add back type_traits include...
apply-clang-format...
Try KOKKOS_ENABLE_SYCL to disable the lambda...
Conditionally disable TestHalfOperators
Test whether half_t is trivially copyable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Drive-by review. I need more time.
T base = static_cast<T>(3.3); | ||
Kokkos::Experimental::half_t a = Kokkos::Experimental::cast_to_half(base); | ||
T b = Kokkos::Experimental::cast_from_half<T>(a); | ||
ASSERT_TRUE((double(b - base) / double(base)) < epsilon); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use ASSERT_NEAR
template <class T> | ||
KOKKOS_INLINE_FUNCTION std::enable_if_t<std::is_same<T, float>::value, T> | ||
cast_from_half(half_t); | ||
template <class T> | ||
KOKKOS_INLINE_FUNCTION std::enable_if_t<std::is_same<T, bool>::value, T> | ||
cast_from_half(half_t); | ||
template <class T> | ||
KOKKOS_INLINE_FUNCTION std::enable_if_t<std::is_same<T, double>::value, T> | ||
cast_from_half(half_t); | ||
template <class T> | ||
KOKKOS_INLINE_FUNCTION std::enable_if_t<std::is_same<T, short>::value, T> | ||
cast_from_half(half_t); | ||
template <class T> | ||
KOKKOS_INLINE_FUNCTION std::enable_if_t<std::is_same<T, int>::value, T> | ||
cast_from_half(half_t); | ||
template <class T> | ||
KOKKOS_INLINE_FUNCTION std::enable_if_t<std::is_same<T, long>::value, T> | ||
cast_from_half(half_t); | ||
template <class T> | ||
KOKKOS_INLINE_FUNCTION std::enable_if_t<std::is_same<T, long long>::value, T> | ||
cast_from_half(half_t); | ||
template <class T> | ||
KOKKOS_INLINE_FUNCTION | ||
std::enable_if_t<std::is_same<T, unsigned short>::value, T> | ||
cast_from_half(half_t); | ||
template <class T> | ||
KOKKOS_INLINE_FUNCTION std::enable_if_t<std::is_same<T, unsigned int>::value, T> | ||
cast_from_half(half_t); | ||
template <class T> | ||
KOKKOS_INLINE_FUNCTION | ||
std::enable_if_t<std::is_same<T, unsigned long>::value, T> | ||
cast_from_half(half_t); | ||
template <class T> | ||
KOKKOS_INLINE_FUNCTION | ||
std::enable_if_t<std::is_same<T, unsigned long long>::value, T> | ||
cast_from_half(half_t); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Readability is poor. Why did you change from
std::enable_if_t<std::is_same<T, Foo>::value &&
std::is_same<T, Bar>::value &&
... , T>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I will update accordingly. I used a regex to convert the cast_to_half forward decls above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dalg24: Thinking about this more now. I cannot overload based on return type only. This was the best way I could find to provide the forward decls for cast_from_half
. Is there a better way to do this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::enable_if_t<std::is_same<T, Foo>::value && std::is_same<T, Bar>::value && ... , T>
I don't think this expression will ever evaluate to true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be && -> ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get a multiple definition error with ||
. Is there a better way to provide the forward decls than what is shown above?
core/src/Cuda/Kokkos_Cuda_Half.hpp
Outdated
|
||
namespace Kokkos { | ||
namespace Experimental { | ||
#define HALF_IMPL_TYPE __half |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why a macro?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is a better way to do this. I needed a compile time conditional since using impl_type = T
was not possible in Kokkos_Half.hpp
. Any suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not quite following why using a type alias is not possible.
In any case you would probably need to comment to that effect if you absolutely have to use a macro and also make sure you #undef
it when you don't need it any more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a suggestion, this is mostly about the test thing there we want to test interoperability with implementation types:
We can simply have a trait to do that however:
template<class T>
struct half_impl_type {
using type = typename T::impl_type;
};
template<>
struct half_impl_type<float> {
using type = float;
};
I would put that into the Impl namespace for now.
KOKKOS_FUNCTION | ||
half_t(impl_type rhs) : val(rhs) {} | ||
KOKKOS_FUNCTION | ||
half_t(float rhs) : val(cast_to_half(rhs).val) {} | ||
KOKKOS_FUNCTION | ||
half_t(bool rhs) : val(cast_to_half(rhs).val) {} | ||
KOKKOS_FUNCTION | ||
half_t(double rhs) : val(cast_to_half(rhs).val) {} | ||
KOKKOS_FUNCTION | ||
half_t(short rhs) : val(cast_to_half(rhs).val) {} | ||
KOKKOS_FUNCTION | ||
half_t(int rhs) : val(cast_to_half(rhs).val) {} | ||
KOKKOS_FUNCTION | ||
half_t(long rhs) : val(cast_to_half(rhs).val) {} | ||
KOKKOS_FUNCTION | ||
half_t(long long rhs) : val(cast_to_half(rhs).val) {} | ||
KOKKOS_FUNCTION | ||
half_t(unsigned short rhs) : val(cast_to_half(rhs).val) {} | ||
KOKKOS_FUNCTION | ||
half_t(unsigned int rhs) : val(cast_to_half(rhs).val) {} | ||
KOKKOS_FUNCTION | ||
half_t(unsigned long rhs) : val(cast_to_half(rhs).val) {} | ||
KOKKOS_FUNCTION | ||
half_t(unsigned long long rhs) : val(cast_to_half(rhs).val) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are these not explicit
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a best faith effort to get half_t behaving like other precisions (ex: float). I observed that float can be implicitly cast to and from T. For half_t, we cannot have implicit conversion operators to T as the compiler will consider T and its operators in all expressions involving half_t; but, we cannot mark half_t's operators as higher precedence than T's.
Fix half_t trivially copyable test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, other than the question about short circuiting operators.
|
||
// NOTE: Loses short-circuit evaluation | ||
KOKKOS_FUNCTION | ||
bool operator&&(half_t rhs) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really want this? Losing short circuiting can lead to bugs in complex expressions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what kind of bugs? We could collect feedback on this. But since we don't have implicit conversion to bool (only from) statements like a&&b wouldn't work otherwise and you would need to do bool(a)&&bool(b) instead. So I am in favor of this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In favor of keeping these logical operator overloads? Shall I update the unit tests with some more complex expressions involving these logical operators?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I just merge it. Nevin did approve after all. And if its a real issue he shouldn't have :-)
If possible we should move that in after the backend refactor since this file might be one of the per backend specializations.