-
Notifications
You must be signed in to change notification settings - Fork 738
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL] Move bfloat support from experimental to supported. #6524
Conversation
Signed-off-by: Rajiv Deodhar <rajiv.deodhar@intel.com>
Looks good, do we want to also move these bfloat16 math functions out of experimental also:
since they are defined in the same extension document as the main bfloat16 class? btw there should be an accompanying PR to intel/llvm-test-suite updating corresponding tests, otherwise there will be lots of failures: For example in this test: https://github.com/intel/llvm-test-suite/blob/intel/SYCL/BFloat16/bfloat16_type.hpp. |
Also FYI there is another open PR updating the bfloat16 class that it might be good to consider merging first #6492. |
/verify with intel/llvm-test-suite#1129 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM, though I think the comment should be addressed first.
Also, should we consider leaving a deprecated version of bfloat16.hpp
in the experimental folder to warn old users? Since the feature was experimental I don't think we strictly have to, but we could.
@@ -408,4 +406,5 @@ Compute absolute value of a `bfloat16`. | |||
|3|2021-08-18|Alexey Sotkin |Remove `uint16_t` constructor | |||
|4|2022-03-07|Aidan Belton and Jack Kirk |Switch from Intel vendor specific to oneapi | |||
|5|2022-04-05|Jack Kirk | Added section for bfloat16 math builtins | |||
|6|2022-08-03|Alexey Sotkin |Add `operator sycl::half()` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should be your name here. 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is carried over from that author, but I agree that more changes have been made, so changing name.
I also have two global comments:
|
operator float() const; | ||
operator sycl::half() const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this conversion to sycl::half
. However, we should also add the opposite conversion from sycl::half
to bfloat16
:
bfloat16(const sycl::half &a);
bfloat16 &operator=(const sycl::half &a);
Do we also need conversion to / from double
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR is intended to move the current bfloat16 support out of experimental space. Any changes to the level of bfloat16 support can be done in future PRs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On Intel platforms the bfloat16 to/from float is done using the __spirv_ConvertBF16ToFINTELoperator. I suspect a double version of that does not exist.
Float to double conversion can be made in the usual C++ way more efficiently in hardware. A direct version of bfloat16 to double conversion in software will involve more bit twiddling than the float conversion where only trailing 0 bits of fraction need to be inserted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sycl::half class includes conversions to/from float. Those kick in when bfloat16 is used with sycl::half, so conversions between bfloat16 and sycl::half are not needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you saying that we should remove this conversion from bfloat16
to sycl::half
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, its not needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This item was revisited and it turns out that sycl::half <-> bfloat16 conversions are needed. They have been added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for joining the discussion late. May be it's a nitpick, but should we tell, that conversion half <-> bfloat16 follows IEEE 754 float <-> half conversion? In other words, what happens, if bfloat16 value overflows half range? Also are we adding last 3 fraction bits stochastically or they are guarantied to be zero (or it's implementation detail)?
/verify with intel/llvm-test-suite#1129 |
/run with intel/llvm-test-suite#1129 |
/verify with intel/llvm-test-suite#1129 |
/verify with intel/llvm-test-suite#1129 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking a lot better, just a few more comments below.
sycl/doc/extensions/supported/sycl_ext_oneapi_bfloat16.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/supported/sycl_ext_oneapi_bfloat16.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/supported/sycl_ext_oneapi_bfloat16.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/supported/sycl_ext_oneapi_bfloat16.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/supported/sycl_ext_oneapi_bfloat16.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/supported/sycl_ext_oneapi_bfloat16.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/supported/sycl_ext_oneapi_bfloat16.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/supported/sycl_ext_oneapi_bfloat16.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/supported/sycl_ext_oneapi_bfloat16.asciidoc
Outdated
Show resolved
Hide resolved
/verify with intel/llvm-test-suite#1129 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some minor doc comments below. I think the main remaining issue is that the aspect needs to be enabled.
sycl/doc/extensions/supported/sycl_ext_oneapi_bfloat16.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/supported/sycl_ext_oneapi_bfloat16.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/supported/sycl_ext_oneapi_bfloat16.asciidoc
Outdated
Show resolved
Hide resolved
Line 5 of bloat16_type.cpp looks like it is missing the -fsycl-targets=%sycl_triple ?: // RUN: %clangxx -fsycl %s -o %t.out |
You're absolutely right, it seems like we have a separate CUDA test for this since it has additional requirements, so I think is better to just keep these separate for now. I have opened a PR for this in intel/llvm-test-suite#1423. |
@steffenlarsen should we revert this change? |
intel/llvm#6524 moved bfloat16 out of the experimental namespace. This commit removes the last remaining uses of the experimental namespace in bfloat16 for ESIMD and matrix tests. Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
Tests seem to be resolvable, but in order to make sure we need the next nightly. Once all the mentioned patches are merged we should only have a check-sycl failure on no-assert to also address, which should only affect post-commit so I suggest we keep it in unless other issues pop up. |
@steffenlarsen, thanks a lot for handling bunch of these!!! |
@steffenlarsen has also dealt with this: " 5. " |
#6524 accidentally removed the experimental bfloat16 math functions while moving bfloat16 out of the experimental namespace. This commit reintroduces these in the bfloat16_math.hpp header file. Changes to sub_group.hpp are to resolve detail namespace ambiguities are are NFC. Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
Test was modified at intel#6524 Change fixes post-commit issue in no-asserts mode
Test was modified at #6524 Change fixes post-commit issue in no-asserts mode
…ntel#9143) The error in LIT test esimd/intel_fp16_converts.cpp is caused by intel#6524 which - moved 'bfloat16' out of 'experimental' namespace - created a wrapper __devicelib_ConvertBF16ToFINTEL() which simply calls __spirv_ConvertBF16ToFINTEL() The fix - create a test for bfloat16 conversions. - allows __devicelib_ConvertBF16ToFINTEL() and __devicelib_ConvertFToBF16INTEL() for ESIMD context. Signed-off-by: Vyacheslav N Klochkov <vyacheslav.n.klochkov@intel.com>
#7981) …9143) The error in LIT test esimd/intel_fp16_converts.cpp is caused by #6524 which - moved 'bfloat16' out of 'experimental' namespace - created a wrapper __devicelib_ConvertBF16ToFINTEL() which simply calls __spirv_ConvertBF16ToFINTEL() The fix - creates a test for bfloat16 conversions. - allows __devicelib_ConvertBF16ToFINTEL() and __devicelib_ConvertFToBF16INTEL() for ESIMD context. Signed-off-by: Vyacheslav N Klochkov <vyacheslav.n.klochkov@intel.com>
…YCL (#8257) This PR addresses an issue where if we use `__CUDA_ARCH__` causes intrinsics not to be defined in the CUDA include files. - Replace `__CUDA_ARCH__` with `__SYCL_CUDA_ARCH__` for SYCL device - Update the `sycl-macro.cpp` test to check the appropriate macro. --- As far as I could find the original issue was introduced from PR [#6524](7b47ebb) for enabling the bfloat16 support moving it from the experimental extension, and it breaks some codebases with CUDA interop calls. Current reports include github issues [#7722](#7722), [#8133](#8133) and [oneapi-src/oneMKL#257](oneapi-src/oneMKL#257). For that reason we define a unique `__SYCL_CUDA_ARCH__` macro and use it instead for SYCL device targets and leave `__CUDA_ARCH__` as before for CUDA targets.
…ntal status. (intel#1129) Tests changes for intel#6524 Signed-off-by: Rajiv Deodhar <rajiv.deodhar@intel.com> Co-authored-by: JackAKirk <jack.kirk@codeplay.com>
…ntal status. (intel/llvm-test-suite#1129) Tests changes for intel#6524 Signed-off-by: Rajiv Deodhar <rajiv.deodhar@intel.com> Co-authored-by: JackAKirk <jack.kirk@codeplay.com>
…vm-test-suite#1422) intel#6524 moved bfloat16 out of the experimental namespace. This commit removes the last remaining uses of the experimental namespace in bfloat16 for ESIMD and matrix tests. Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
This change makes bfloat16 a supported feature.
Signed-off-by: Rajiv Deodhar rajiv.deodhar@intel.com