Half-precision floating-point elements in Bigarray
#10775
Like many other suggestions for new features, this PR will have to wait for after the great Multicore OCaml merge. However, I'm globally in favor: it's very much in the spirit of the Bigarray module to support 16-bit float arrays, since they are used in a number of other libraries we may want to interface with. The one thing that makes me sad is that there is no standardized C compiler support or library function to convert to/from 16-bit FP numbers. To be revisited after the Multicore merge.
@a12n - sorry that this has been forgotten. Would you be happy to rebase this so that CI and so forth can be re-triggered and then we can move forwards with a review?
@dra27 Sure, I'll do it shortly. Looks like more changes are needed, it fails to build now.
I've found a few more places where a new kind of bigarray elements should be handled ( Looks like it's unrelated to the failing builds, though. The build in AppVeyor Cygwin fails like this:
As far as I understand, error 127 means "command not found". When I try to configure and build on my machine (Debian Linux, amd64) it fails in an even stranger way:
I'm not sure what to do next :(
This looks like a buffer overflow or an out-of-bounds access. Without looking in detail, I have a possible explanation: you've added I'm not quite sure where the compiler uses bigarrays to trigger this problem, but it might simply be because of You could do a quick check by moving |
Thanks! That's what I've missed. I've bootstrapped the compiler according to the instructions.
This PR has been on my pile for a very long time. Apologies for the delay. Here is a high-level review:
There's one bug in native-code generation. Consider This can be fixed easily by generating calls to the generic get/set functions in this case, as if the kind of the bigarray Finally, I'm unsure about using processor instructions when available.
I just pushed the simple fix for the ocamlopt issue: 8fd71c2.
Conversion and hash mix functions.
More general version of the internal DO_FLOAT_COMPARISON macro, to allow different types of stored and compared values. Rewrite DO_FLOAT_COMPARISON in terms of the DO_GENERIC_UNORDERED_COMPARISON macro.
Also: on ARM 64-bit, use the _Float16 type for hardware conversions.
Some additions to the ba_float16 proposal
Thanks @a12n for having merged my suggestions in this PR:
I'm very happy with the PR in its current state. Since this is a standard library change and since I contributed some code, it would be good to have a second approval.
I can review the changes to the compiler itself, but I don't have anything to say about either the runtime code or the changes to the standard library.
The changes to the stdlib look OK (modulo some missing "since" annotations).
Question: is the bootstrap absolutely necessary?
There was a change 55dc02f (a new element for Please see #10775 (comment) above. |
Right, I had missed that comment. Thanks!
@nojb are you happy with the new |
Thanks for the ping. Yes, looks OK.
Excellent. Would you feel confident to approve this PR, or do we need to find another second reviewer?
I'm happy to do so. I had refrained from doing it because I only took a cursory glance at the C code.
LGTM
POWER 10 (Power ISA Version 3.1B) should also bring bfloat16 support, but getting access to the hardware is tricky at the moment.
@@ -34,6 +34,7 @@ typedef unsigned short caml_ba_uint16;
#define CAML_BA_MAX_NUM_DIMS 16

enum caml_ba_kind {
  CAML_BA_FLOAT16, /* Half-precision floats */
Moving this new constructor to the end would probably help with backward compatibility.
In its current form, marshaling a bigarray gives a different result from previous OCaml versions because all existing constructors have been shifted by 1.
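The shift described in this comment can be illustrated with a small C sketch. The enums below are hypothetical, shortened stand-ins (the real `caml_ba_kind` has more entries, and every name here is made up for illustration); they only show why prepending an enumerator changes the integer tag of every existing kind, while appending does not.

```c
#include <assert.h>

/* Hypothetical, shortened kind lists. These are NOT the real caml_ba_kind
   definitions; they only mimic the shape of the problem. */
enum kind_before    { B_FLOAT32, B_FLOAT64, B_SINT8 };            /* old numbering */
enum kind_prepended { P_FLOAT16, P_FLOAT32, P_FLOAT64, P_SINT8 }; /* new kind first */
enum kind_appended  { A_FLOAT32, A_FLOAT64, A_SINT8, A_FLOAT16 }; /* new kind last */

/* Returns 1 when the float32 tag of a new numbering equals the tag that
   older code wrote for float32, i.e. when old serialized data still decodes
   to the same kind. */
int float32_tag_survives(int new_float32_tag)
{
  return new_float32_tag == (int)B_FLOAT32;
}

/* Sanity checks for the two layouts. */
void check_layouts(void)
{
  assert(!float32_tag_survives(P_FLOAT32)); /* prepending shifts the tag by 1 */
  assert(float32_tag_survives(A_FLOAT32));  /* appending keeps the old tag */
}
```

Any integer tag written by older code keeps its meaning under the appended layout, which is the property the suggestion is after.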
@Octachron, @gasche, should I open an issue to track this properly?
My understanding is that @xavierleroy and @damiendoligez are on it (so: no need for a separate issue), but this is tricky because it requires an acrobatic bootstrap. Thanks for the report, this is indeed a good catch.
FYI, it was spotted while adapting jsoo to trunk.
I like this suggestion. Implemented in #12357.
This adds support for half-precision float elements in `Bigarray`. The values are stored as `uint16_t` elements. Intrinsics from the F16C instruction set are used for `float16`/`float` conversions, if available. Fallback conversion functions are from public domain code by Fabian Giesen (looks like the same code is used in Intel SPMD Program Compiler).

To validate the fallback conversion functions, I've written the following C program. I'm not sure how (if at all) to incorporate it in the testsuite. F16C intrinsics are used as the reference, and it requires a proper `-march=` in the compiler flags and a proper CPU in the machine running the tests.

Is there any interest in such a feature?
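As background for the `uint16_t` storage and the software fallback mentioned above, here is a minimal sketch of decoding an IEEE 754 binary16 bit pattern into a `float` in portable C. It is neither the code from this PR nor Fabian Giesen's routine; the names `half_bits_to_float` and `half_decode_self_check` are hypothetical, and the sketch assumes `float` is IEEE 754 binary32.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not the PR's code): decode a binary16 bit pattern
   (1 sign bit, 5 exponent bits, 10 mantissa bits) into a binary32 float. */
float half_bits_to_float(uint16_t h)
{
  uint32_t sign = ((uint32_t)h & 0x8000u) << 16;
  uint32_t exp  = ((uint32_t)h >> 10) & 0x1Fu;
  uint32_t mant = (uint32_t)h & 0x3FFu;
  uint32_t bits;
  float f;

  if (exp == 0x1Fu) {
    /* Infinity (mant == 0) or NaN (mant != 0). */
    bits = sign | 0x7F800000u | (mant << 13);
  } else if (exp != 0) {
    /* Normal number: rebias the exponent from 15 to 127 (difference 112). */
    bits = sign | ((exp + 112u) << 23) | (mant << 13);
  } else if (mant == 0) {
    /* Signed zero. */
    bits = sign;
  } else {
    /* Subnormal half: shift until the implicit leading 1 appears.
       113 is the biased float exponent of 2^-14, the smallest normal half. */
    exp = 113u;
    while ((mant & 0x400u) == 0) {
      mant <<= 1;
      exp--;
    }
    bits = sign | (exp << 23) | ((mant & 0x3FFu) << 13);
  }
  memcpy(&f, &bits, sizeof f); /* reinterpret the bits without aliasing issues */
  return f;
}

/* A few exactly representable reference values. */
void half_decode_self_check(void)
{
  assert(half_bits_to_float(0x3C00) == 1.0f);
  assert(half_bits_to_float(0xC000) == -2.0f);
  assert(half_bits_to_float(0x7BFF) == 65504.0f); /* largest finite half */
}
```

Every binary16 value is exactly representable in binary32, so this direction needs no rounding. The reverse conversion (`float` to `float16`) does need rounding and overflow handling, which is where a reference such as the F16C intrinsics is most useful for validation.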