Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add half precision base support #3439

Merged
merged 31 commits into from
Nov 2, 2020

Conversation

crtrott
Copy link
Member

@crtrott crtrott commented Sep 30, 2020

If possible we should move that in after the backend refactor since this file might be one of the per backend specializations.

@crtrott crtrott force-pushed the add-half-precision-base-support branch from 578c4e4 to abbfe23 Compare September 30, 2020 16:58
@crtrott
Copy link
Member Author

crtrott commented Sep 30, 2020

Forgot the license notice ...

#include <cstdint>
#include <Cuda/Kokkos_Cuda_Half.hpp>

#ifndef KOKKOS_IMPL_HALF_TYPE_DEFINED
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be IMPL? I feel like users might want to know whether their Kokkos has half-precision (though I'm open to other mechanisms)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

half_t will always be defined, the way to figure out whether we use the fallback is half_is_float constexpr bool.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@crtrott and I discussed in Slack. I was worried about a model where a code used different versions of Kokkos (including one before we make this change), and they might switch off functionality based on that macro. We're going with a model where codes use a version of Kokkos that evolves with the code, so this won't come up

@crtrott crtrott force-pushed the add-half-precision-base-support branch 2 times, most recently from 7d178f8 to cfe944f Compare September 30, 2020 17:28
core/src/Cuda/Kokkos_Cuda_Half.hpp Outdated Show resolved Hide resolved
core/src/Cuda/Kokkos_Cuda_Half.hpp Outdated Show resolved Hide resolved
core/src/Cuda/Kokkos_Cuda_Half.hpp Outdated Show resolved Hide resolved
KOKKOS_INLINE_FUNCTION
half_t cast_to_half(float val) { return __float2half(val); }
KOKKOS_INLINE_FUNCTION
half_t cast_to_half(double val) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we care that double2half(val) and float2half(static_cast<float>(val)) don't have to return identical values (at least this is true for arm-gcc)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't that expected?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Users might be surprised if on one platform they get different results from cast_to_half than on another.

Copy link
Member Author

@crtrott crtrott Oct 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is the nature of half precision right now. To the degree vendors agree on conversions we should be good. But CUDA doesn't provide us the conversion calls on both sides so I rather call the fast one on the device and do the extra conversion (with potentially different result) on host. I think you argument above though that is just generally expected. static_cast<int>(val) doesn't necessarily give the same result as static_cast<int>(static_cast<float>(val)) either.

@crtrott crtrott force-pushed the add-half-precision-base-support branch from cfe944f to 5339250 Compare September 30, 2020 19:13

template <class T>
void test_half_conversion_type() {
double epsilon = Kokkos::Experimental::half_is_float ? 0.0000003 : 0.0003;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How did you come up with this relative tolerance?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wikipedia :-) This is the actual (or pretty close) value based on the number of bits for the mantisse.

T base = static_cast<T>(3.3);
Kokkos::Experimental::half_t a = Kokkos::Experimental::cast_to_half(base);
T b = Kokkos::Experimental::cast_from_half<T>(a);
ASSERT_TRUE((double(b - base) / double(base)) < epsilon);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you look into gtest support for fp comparison?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use ASSERT_NEAR

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm I am not sure that that works, since I want to explicitly cast b-base not b and base and then do the difference.

@crtrott crtrott force-pushed the add-half-precision-base-support branch from 5339250 to 288ab36 Compare October 1, 2020 21:18
#include <Kokkos_Macros.hpp>
#ifdef KOKKOS_ENABLE_CUDA
#include <cuda_fp16.h>
#include <cstdint>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need it any more

Suggested change
#include <cstdint>

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right

KOKKOS_INLINE_FUNCTION
half_t cast_to_half(double val) {
// double2half was only introduced in CUDA 11 too
return __float2half(static_cast<float>(val));
Copy link
Member

@dalg24 dalg24 Oct 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer if you wrote [unsigned] short int, [unsigned] long long int

(not omitting the trailing "int")

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure


template <class T>
KOKKOS_INLINE_FUNCTION
typename std::enable_if<std::is_same<T, float>::value, T>::type
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
typename std::enable_if<std::is_same<T, float>::value, T>::type
std::enable_if_t<std::is_same<T, float>::value, T>

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah right I am forgetting that we now got C++14

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets change all of those.

core/src/Kokkos_Half.hpp Outdated Show resolved Hide resolved
@crtrott
Copy link
Member Author

crtrott commented Oct 5, 2020

There are more changes coming to this. In particular for the CUDA version we need our own struct which does the right thing on the host and the device (i.e. overloaded math operators). Later we will add more versions: Intel only, CUDA on ARM (where we can actually use __fp16 on the host), HIP, etc.

@crtrott crtrott force-pushed the add-half-precision-base-support branch from 288ab36 to e7fae55 Compare October 5, 2020 20:38
e10harvey and others added 3 commits October 13, 2020 07:01
Adds operator overloads for half_t.

Adds support for half_t arithmetic on host via promotion to float.
Copy link
Member Author

@crtrott crtrott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a number of changes which need to be done see my comments.

operator half_device_type() const { return val; }

// NOTE: Changing below to 1 produces constructor overload error
#if 0
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok we need to figure out why this is still an issue.

namespace Kokkos {
namespace Experimental {

using half_device_type = __half;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be an internal typedef to half_t and it should be called impl_type.


KOKKOS_FUNCTION
half_t(half_device_type rhs = 0) : val(rhs) {}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe the bool thing gets solved if we explicitly add half_t(const half_t& rhs):val(rhs.val) {} constructor?

half_t operator+() const {
half_t tmp = *this;
#ifdef __CUDA_ARCH__
// printf("half_t unary operator+\n");
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove all the printf statements.


// Logical operators
KOKKOS_FUNCTION
half_t operator!() const {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs to return bool. Which also means we don't need the tmp but simply return !val or !__half2float(val)

return tmp;
}

#if 1
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get rid of the #if 1

// NOTE: Loses short-circuit evaluation
KOKKOS_FUNCTION
bool operator&&(half_t rhs) const {
half_t tmp = *this;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't need a tmp here.

// NOTE: Loses short-circuit evaluation
KOKKOS_FUNCTION
bool operator||(half_t rhs) const {
half_t tmp = *this;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't need a tmp here.

// Comparison operators
KOKKOS_FUNCTION
bool operator==(half_t rhs) const {
half_t tmp = *this;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't need a tmp here.


template <class T>
KOKKOS_INLINE_FUNCTION
typename std::enable_if<std::is_same<T, float>::value, T>::type
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets change all of those.

}

template <class T>
KOKKOS_FUNCTION half_t operator=(T rhs) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should return reference.

@e10harvey
Copy link
Contributor

I've attempted to address the CI build errors and feedback in crtrott#7.

// Using an explicit list here too, since the other ones are explicit and for
// example don't include char
template <class T>
KOKKOS_INLINE_FUNCTION typename std::enable_if<
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: we can use enable_if_t here (C++14).

half_t(const half_t&) = default;

KOKKOS_FUNCTION
half_t(impl_type rhs = cast_to_half(0)) : val(rhs) {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a slight preference for two constructors here instead of relying on defaulted parameters.


// Cast rhs to half for assignment to lhs of type half_t
template <class T>
KOKKOS_FUNCTION half_t(T rhs) : half_t(cast_to_half(rhs)) {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we should constrain this (otherwise it matches just about everything) instead of the hard error calling it on something not convertible to a parameter that cast_to_half can take.

}

KOKKOS_FUNCTION
half_t(const half_t&) = default;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking Rule of 0 here. I can't imagine move does anything different vs. copy for __half.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we had some issues with ambiguity before anything to consider in that direction? Or are you just saying we shouldn't define any of those constructors, and just write the explicit conversion constructors explicitly.


// Binary Arithmetic
KOKKOS_FUNCTION
half_t operator+(half_t rhs) const {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These binary operators should be hidden friends, as in:

half_t friend operator+(half_t lhs, half_t rhs)
{
 #ifdef __CUDA_ARCH__
     lhs.val += rhs.val;
 #else
     lhs.val = __float2half(__half2float(lhs.val) + __half2float(rhs.val));
 #endif
     return lhs;
}

The benefits are symmetry and ADL.

It is less common to do those for unary operators, although Anthony Williams makes a case for it at https://www.justsoftwaresolutions.co.uk/cplusplus/hidden-friends.html.

e10harvey and others added 2 commits October 15, 2020 13:43
Remove implicit conversion from half_t to __half and bool.
This may cause the compiler to consider all __half operators
or bool operators for expressions involving half_t.

Make operator{+,-,*,/} symmetric

Explicitly overload half_t constructors

Add casting to/from bool

Add test cases for symmetry

Do not force a copy constructor for half_t
Copy link
Member

@dalg24 dalg24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drive-by review. I need more time.

T base = static_cast<T>(3.3);
Kokkos::Experimental::half_t a = Kokkos::Experimental::cast_to_half(base);
T b = Kokkos::Experimental::cast_from_half<T>(a);
ASSERT_TRUE((double(b - base) / double(base)) < epsilon);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use ASSERT_NEAR

Comment on lines +86 to +121
template <class T>
KOKKOS_INLINE_FUNCTION std::enable_if_t<std::is_same<T, float>::value, T>
cast_from_half(half_t);
template <class T>
KOKKOS_INLINE_FUNCTION std::enable_if_t<std::is_same<T, bool>::value, T>
cast_from_half(half_t);
template <class T>
KOKKOS_INLINE_FUNCTION std::enable_if_t<std::is_same<T, double>::value, T>
cast_from_half(half_t);
template <class T>
KOKKOS_INLINE_FUNCTION std::enable_if_t<std::is_same<T, short>::value, T>
cast_from_half(half_t);
template <class T>
KOKKOS_INLINE_FUNCTION std::enable_if_t<std::is_same<T, int>::value, T>
cast_from_half(half_t);
template <class T>
KOKKOS_INLINE_FUNCTION std::enable_if_t<std::is_same<T, long>::value, T>
cast_from_half(half_t);
template <class T>
KOKKOS_INLINE_FUNCTION std::enable_if_t<std::is_same<T, long long>::value, T>
cast_from_half(half_t);
template <class T>
KOKKOS_INLINE_FUNCTION
std::enable_if_t<std::is_same<T, unsigned short>::value, T>
cast_from_half(half_t);
template <class T>
KOKKOS_INLINE_FUNCTION std::enable_if_t<std::is_same<T, unsigned int>::value, T>
cast_from_half(half_t);
template <class T>
KOKKOS_INLINE_FUNCTION
std::enable_if_t<std::is_same<T, unsigned long>::value, T>
cast_from_half(half_t);
template <class T>
KOKKOS_INLINE_FUNCTION
std::enable_if_t<std::is_same<T, unsigned long long>::value, T>
cast_from_half(half_t);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Readability is poor. Why did you change from

std::enable_if_t<std::is_same<T, Foo>::value &&
                 std::is_same<T, Bar>::value &&
                 ... , T>

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I will update accordingly. I used a regex to convert the cast_to_half forward decls above.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dalg24: Thinking about this more now. I cannot overload based on return type only. This was the best way I could find to provide the forward decls for cast_from_half. Is there a better way to do this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::enable_if_t<std::is_same<T, Foo>::value &&
                 std::is_same<T, Bar>::value &&
                 ... , T>

I don't think this expression will ever evaluate to true.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be && -> ||

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get a multiple definition error with ||. Is there a better way to provide the forward decls than what is shown above?


namespace Kokkos {
namespace Experimental {
#define HALF_IMPL_TYPE __half
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why a macro?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is a better way to do this. I needed a compile time conditional since using impl_type = T was not possible in Kokkos_Half.hpp. Any suggestions?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not quite following why using a type alias is not possible.
In any case you would probably need to comment to that effect if you absolutely have to use a macro and also make sure you #undef it when you don't need it any more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a suggestion, this is mostly about the test thing there we want to test interoperability with implementation types:
We can simply have a trait to do that however:

template<class T>
struct half_impl_type {
  using type = typename T::impl_type;
};

template<>
struct half_impl_type<float> {
  using type = float;
};

I would put that into the Impl namespace for now.

Comment on lines +173 to +196
KOKKOS_FUNCTION
half_t(impl_type rhs) : val(rhs) {}
KOKKOS_FUNCTION
half_t(float rhs) : val(cast_to_half(rhs).val) {}
KOKKOS_FUNCTION
half_t(bool rhs) : val(cast_to_half(rhs).val) {}
KOKKOS_FUNCTION
half_t(double rhs) : val(cast_to_half(rhs).val) {}
KOKKOS_FUNCTION
half_t(short rhs) : val(cast_to_half(rhs).val) {}
KOKKOS_FUNCTION
half_t(int rhs) : val(cast_to_half(rhs).val) {}
KOKKOS_FUNCTION
half_t(long rhs) : val(cast_to_half(rhs).val) {}
KOKKOS_FUNCTION
half_t(long long rhs) : val(cast_to_half(rhs).val) {}
KOKKOS_FUNCTION
half_t(unsigned short rhs) : val(cast_to_half(rhs).val) {}
KOKKOS_FUNCTION
half_t(unsigned int rhs) : val(cast_to_half(rhs).val) {}
KOKKOS_FUNCTION
half_t(unsigned long rhs) : val(cast_to_half(rhs).val) {}
KOKKOS_FUNCTION
half_t(unsigned long long rhs) : val(cast_to_half(rhs).val) {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these not explicit?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a best faith effort to get half_t behaving like other precisions (ex: float). I observed that float can be implicitly cast to and from T. For half_t, we cannot have implicit conversion operators to T as the compiler will consider T and its operators in all expressions involving half_t; but, we cannot mark half_t's operators as higher precedence than T's.

@e10harvey
Copy link
Contributor

@dalg24, @nliber: Thank you for your reviews.

@dalg24: Do you have any other feedback before I start addressing it?

@nliber: I've attempted to address your feedback in
15104e6. Is there anything else you'd like addressed?

Copy link
Contributor

@nliber nliber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, other than the question about short circuiting operators.


// NOTE: Loses short-circuit evaluation
KOKKOS_FUNCTION
bool operator&&(half_t rhs) const {
Copy link
Contributor

@nliber nliber Oct 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really want this? Losing short circuiting can lead to bugs in complex expressions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, @nilber. The unit tests were written to check for simple short circuiting. I will update the unit tests with some more complex expressions that rely on operator precedence as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what kind of bugs? We could collect feedback on this. But since we don't have implicit conversion to bool (only from) statements like a&&b wouldn't work otherwise and you would need to do bool(a)&&bool(b) instead. So I am in favor of this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In favor of keeping these logical operator overloads? Shall I update the unit tests with some more complex expressions involving these logical operators?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I just merge it. Nevin did approve after all. And if its a real issue he shouldn't have :-)

@e10harvey
Copy link
Contributor

@crtrott, @dalg24, @nliber: Is there anything else that should be addressed here before we squash and merge?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants