Half-precision floating-point elements in `Bigarray` #10775

a12n · 2021-11-16T07:40:05Z

This adds support for half-precision float elements in Bigarray.

The values are stored as uint16_t elements. Intrinsics from the F16C instruction set are used for float16/float conversions, if available. The fallback conversion functions come from public domain code by Fabian Giesen (it looks like the same code is used in the Intel SPMD Program Compiler).
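For readers unfamiliar with the representation, here is a minimal C sketch of what a software float16/float32 conversion over `uint16_t` storage can look like. It is not the code from this PR and not Fabian Giesen's routines: the names `half_to_float`, `float_to_half` and `check_all_roundtrips` are made up for illustration, rounding is round-to-nearest-even, and NaN payloads are deliberately not preserved.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret bits via memcpy to stay clear of aliasing problems. */
static float bits_to_float(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }
static uint32_t float_to_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }

/* binary16 -> binary32. Every half value is exactly representable. */
float half_to_float(uint16_t h)
{
  uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
  uint32_t exp = (h >> 10) & 0x1Fu;
  uint32_t mant = h & 0x3FFu;

  if (exp == 0x1Fu)                 /* infinity or NaN */
    return bits_to_float(sign | 0x7F800000u | (mant << 13));
  if (exp == 0)                     /* zero or subnormal: mant * 2^-24 */
    return bits_to_float(sign | float_to_bits((float)mant * 0x1p-24f));
  return bits_to_float(sign | ((exp + 112u) << 23) | (mant << 13));
}

/* binary32 -> binary16, rounding to nearest, ties to even. */
uint16_t float_to_half(float f)
{
  uint32_t x = float_to_bits(f);
  uint32_t sign = (x >> 16) & 0x8000u;
  uint32_t exp = (x >> 23) & 0xFFu;
  uint32_t mant = x & 0x7FFFFFu;
  uint32_t half, shift, rem, halfway;

  if (exp == 0xFFu)                 /* infinity, or NaN mapped to a quiet NaN */
    return (uint16_t)(sign | 0x7C00u | (mant ? 0x0200u : 0u));
  if (exp > 142u)                   /* |f| >= 2^16: overflows to infinity */
    return (uint16_t)(sign | 0x7C00u);
  if (exp < 102u)                   /* |f| < 2^-25: rounds to zero */
    return (uint16_t)sign;

  if (exp >= 113u) {                /* result is a normal half */
    shift = 13;
    half = (exp - 112u) << 10;
  } else {                          /* result is a subnormal half */
    shift = 126u - exp;
    mant |= 0x800000u;              /* make the implicit leading 1 explicit */
    half = 0;
  }
  half += mant >> shift;
  rem = mant & ((1u << shift) - 1u);
  halfway = 1u << (shift - 1u);
  /* A carry out of the mantissa correctly bumps the exponent (up to infinity). */
  if (rem > halfway || (rem == halfway && (half & 1u))) half++;
  return (uint16_t)(sign | half);
}

/* Every non-NaN half must survive half -> float -> half unchanged. */
void check_all_roundtrips(void)
{
  for (uint32_t h = 0; h <= 0xFFFFu; h++) {
    int is_nan = (h & 0x7C00u) == 0x7C00u && (h & 0x03FFu) != 0;
    if (!is_nan)
      assert(float_to_half(half_to_float((uint16_t)h)) == h);
  }
}
```

Round-tripping all 65536 bit patterns is a cheap sanity check that needs no F16C hardware, although it is weaker than comparing against the intrinsics because it does not exercise the rounding of arbitrary float inputs.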

To validate the fallback conversion functions, I've written the following C program. I'm not sure how (if at all) to incorporate it into the testsuite. F16C intrinsics are used as the reference, so it requires a suitable -march= in the compiler flags and a CPU with F16C support in the machine running the tests.

Is there any interest in such a feature?

xavierleroy · 2021-12-03T18:08:44Z

Like many other suggestions for new features, this PR will have to wait for after the great Multicore OCaml merge.

However, I'm globally in favor: it's very much in the spirit of the Bigarray module to support 16-bit float arrays, since they are used in a number of other libraries we may want to interface with.

The one thing that makes me sad is that there is no standardized C compiler support or library function to convert to/from 16-bit FP numbers.

To be revisited after the Multicore merge.

dra27 · 2023-02-23T14:16:00Z

@a12n - sorry that this has been forgotten. Would you be happy to rebase this so that CI and so forth can be re-triggered and then we can move forwards with a review?

a12n · 2023-02-26T19:29:14Z

@dra27 Sure, I'll do it shortly. Looks like more changes are needed, it fails to build now.

a12n · 2023-03-07T19:36:14Z

I've found a few more places where the new kind of bigarray element should be handled (Typeopt, Lambda, ...).

Looks like it's unrelated to the failing builds, though.

The build in AppVeyor Cygwin fails like this:

…
  MKEXE runtime/ocamlruns.exe
cp runtime/ocamlruns.exe boot/ocamlruns.exe
make -C stdlib OCAMLRUN='$(ROOTDIR)/boot/ocamlruns.exe' \
    CAMLC='$(BOOT_OCAMLC)' all
make[2]: Entering directory '/cygdrive/c/projects/🐫реализация-mingw64/stdlib'
  OCAMLC camlinternalFormatBasics.cmi
make[2]: *** [Makefile:196: camlinternalFormatBasics.cmi] Error 127

As far as I understand, error 127 means "command not found".

When I try to configure and build on my machine (Debian Linux, amd64), it fails in an even stranger way:

…
make -C stdlib \
  OCAMLRUN='$(ROOTDIR)/runtime/ocamlrun' \
  CAMLC='$(BOOT_OCAMLC) -use-prims ../runtime/primitives' all
make[2]: Entering directory '/home/arn/proj/ocaml/stdlib'
  OCAMLC camlinternalFormatBasics.cmi
  OCAMLC camlinternalFormatBasics.cmo
  OCAMLC stdlib.cmi
malloc_consolidate(): invalid chunk size
Aborted (core dumped)
make[2]: *** [Makefile:196: stdlib.cmi] Error 134

gdb runtime/ocamlrun stdlib/core

Core was generated by `../runtime/ocamlrun ../boot/ocamlc -use-prims ../runtime/primitives -strict-seq'.
Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt 
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007f236e8b9537 in __GI_abort () at abort.c:79
#2  0x00007f236e912768 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7f236ea303a5 "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#3  0x00007f236e919a5a in malloc_printerr (str=str@entry=0x7f236ea32690 "malloc_consolidate(): invalid chunk size") at malloc.c:5347
#4  0x00007f236e91a918 in malloc_consolidate (av=av@entry=0x7f236ea66b80 <main_arena>) at malloc.c:4477
#5  0x00007f236e91c755 in _int_malloc (av=av@entry=0x7f236ea66b80 <main_arena>, bytes=bytes@entry=65656) at malloc.c:3699
#6  0x00007f236e91e164 in __GI___libc_malloc (bytes=65656) at malloc.c:3058
#7  0x000055fc9380ae45 in caml_stat_alloc_noexc (sz=sz@entry=65656) at runtime/memory.c:500
#8  caml_stat_alloc (sz=sz@entry=65656) at runtime/memory.c:554
#9  0x000055fc93804603 in caml_open_descriptor_in (fd=3) at runtime/io.c:168
#10 0x000055fc93804ffd in caml_ml_open_descriptor_in_with_flags (fd=<optimized out>, flags=0) at runtime/io.c:576
#11 0x000055fc93818d72 in caml_interprete (prog=<optimized out>, prog_size=<optimized out>) at runtime/interp.c:1037
#12 0x000055fc9381ae0c in caml_main (argv=0x7ffde7410a28) at runtime/startup_byt.c:573
#13 0x000055fc937ef9cc in main (argc=<optimized out>, argv=<optimized out>) at runtime/main.c:37

I'm not sure what to do next :(

damiendoligez · 2023-03-08T09:10:22Z

> malloc_consolidate(): invalid chunk size

This looks like a buffer overflow or an out-of-bounds access. Without looking in detail, I have a possible explanation: you've added Pbigarray_float16 close to the beginning of the type bigarray_kind, shifting all other constructors by one place. You make the corresponding change in caml_ba_element_size in runtime/bigarray.c, but that will need a careful bootstrap (see BOOTSTRAP.adoc) because a plain build uses the new runtime with an old compiler (the bootstrap compiler).

I'm not quite sure where the compiler uses bigarrays to trigger this problem, but it might simply be because of stdlib/random.ml.

You could do a quick check by moving Pbigarray_float16 to the end of bigarray_kind and changing caml_ba_element_size accordingly, then see if the build works.
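The renumbering problem described above can be illustrated with a small C sketch. The enums and size tables below are simplified stand-ins written for this illustration (they are not the real `caml_ba_kind` or `caml_ba_element_size` definitions); the point is only that code built against the old numbering looks up the wrong element size once a constructor is inserted at the front, which is the kind of mismatch that can lead to an undersized allocation.

```c
#include <assert.h>
#include <stddef.h>

/* Kind numbers that already-compiled code was built against (simplified). */
enum old_kind { OLD_FLOAT32, OLD_FLOAT64, OLD_SINT8 };

/* Variant 1: the new kind is inserted first, so every old kind is renumbered. */
enum front_kind { FRONT_FLOAT16, FRONT_FLOAT32, FRONT_FLOAT64, FRONT_SINT8 };
static const size_t front_element_size[] = { 2, 4, 8, 1 };

/* Variant 2: the new kind is appended, so old numbers keep their meaning. */
enum back_kind { BACK_FLOAT32, BACK_FLOAT64, BACK_SINT8, BACK_FLOAT16 };
static const size_t back_element_size[] = { 4, 8, 1, 2 };

/* Element size a new runtime computes for a kind number produced by old code. */
size_t size_with_front_insertion(int kind) { return front_element_size[kind]; }
size_t size_with_appending(int kind) { return back_element_size[kind]; }

void check_kind_numbering(void)
{
  /* Old code asks for float64 (8 bytes per element)... */
  assert(size_with_front_insertion(OLD_FLOAT64) == 4);  /* ...and gets 4: too small */
  assert(size_with_appending(OLD_FLOAT64) == 8);        /* ...and gets 8: correct */
}
```

With the appended variant, numbers written by older code (for example in marshaled data) still decode to the same kind, which is why moving the constructor to the end is the less disruptive choice.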

a12n · 2023-03-08T13:34:04Z

Thanks! That's what I missed. I've bootstrapped the compiler according to the instructions.

xavierleroy · 2023-06-09T09:58:05Z

This PR has been on my pile for a very long time. Apologies for the delay. Here is a high-level review:

The feature is definitely useful.
The changes to runtime/bigarray.c are fine.
The changes in the compilers are fine except for one bug, see below.
The portable implementation of float32 <-> float16 conversions is fine, I don't think we can do significantly faster.

There's one bug in native-code generation. Consider a.{i} <- a.{i} +. 1.0 where a is statically known to have kind Float16. ocamlopt generates 16-bit integer loads and stores, but no conversions to/from FP, so the FP addition operates on integer registers, causing the emitter to fail.

This can be fixed easily by generating calls to the generic get/set functions in this case, as if the kind of the bigarray a was unknown. Alternatively, the conversions between int16-representing-a-float16 and float64 can be implemented as calls to runtime functions.

Finally, I'm unsure about using processor instructions when available.

ARMv8 (ARM 64 bits) is the only supported platform where we can be sure the hardware supports FP16<->FP32 conversions.
For x86, the current PR uses F16C instructions if the runtime system is compiled with gcc -mf16c, which is not the default, and the resulting runtime system cannot run on processors that don't support the F16C extension.
An alternative that I played with is to detect F16C availability at run-time, and dynamically switch between the hardware instructions and the software emulation. But it's tricky code that may not be worth the effort.
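The run-time detection alternative could look roughly like the sketch below. It is not code from this PR or from the experiment mentioned above. All function names are hypothetical, only the half-to-float direction is shown, the x86 path assumes GCC or Clang (`<cpuid.h>`, `__attribute__((target("f16c")))`, inline `xgetbv`), and the lazy initialisation is not thread-safe. On other platforms it simply falls back to the software path.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#include <cpuid.h>
#include <immintrin.h>
#define X86_F16C_CANDIDATE 1
#endif

/* Portable fallback: binary16 -> binary32, always exact. */
static float half_to_float_sw(uint16_t h)
{
  uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
  uint32_t exp = (h >> 10) & 0x1Fu, mant = h & 0x3FFu, bits;
  float f;
  if (exp == 0x1Fu) {                 /* infinity or NaN */
    bits = sign | 0x7F800000u | (mant << 13);
  } else if (exp != 0) {              /* normal number */
    bits = sign | ((exp + 112u) << 23) | (mant << 13);
  } else {                            /* zero or subnormal: mant * 2^-24 */
    f = (float)mant * 0x1p-24f;
    memcpy(&bits, &f, sizeof bits);
    bits |= sign;
  }
  memcpy(&f, &bits, sizeof f);
  return f;
}

#ifdef X86_F16C_CANDIDATE
/* F16C is enabled for this one function only, not for the whole file. */
__attribute__((target("f16c")))
static float half_to_float_hw(uint16_t h) { return _cvtsh_ss(h); }

/* CPUID leaf 1, ECX: bit 27 = OSXSAVE, bit 28 = AVX, bit 29 = F16C. */
static int cpu_has_f16c(void)
{
  unsigned eax, ebx, ecx, edx, xcr0_lo, xcr0_hi;
  const unsigned need = (1u << 27) | (1u << 28) | (1u << 29);
  if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx) || (ecx & need) != need)
    return 0;
  /* The OS must also save XMM and YMM state (XCR0 bits 1 and 2). */
  __asm__ volatile ("xgetbv" : "=a"(xcr0_lo), "=d"(xcr0_hi) : "c"(0));
  (void)xcr0_hi;
  return (xcr0_lo & 6u) == 6u;
}
#endif

typedef float (*half_to_float_fn)(uint16_t);

static half_to_float_fn select_impl(void)
{
#ifdef X86_F16C_CANDIDATE
  if (cpu_has_f16c()) return half_to_float_hw;
#endif
  return half_to_float_sw;
}

/* Public entry point: picks an implementation on first use. */
float half_to_float(uint16_t h)
{
  static half_to_float_fn impl = 0;
  if (!impl) impl = select_impl();
  return impl(h);
}

/* Number of non-NaN inputs where the selected path and the fallback differ. */
unsigned count_disagreements(void)
{
  unsigned bad = 0;
  for (uint32_t h = 0; h <= 0xFFFFu; h++) {
    float a, b;
    if ((h & 0x7C00u) == 0x7C00u && (h & 0x03FFu)) continue;  /* skip NaNs */
    a = half_to_float((uint16_t)h);
    b = half_to_float_sw((uint16_t)h);
    if (memcmp(&a, &b, sizeof a) != 0) bad++;
  }
  return bad;
}

void check_dispatch(void) { assert(count_disagreements() == 0); }
```

Even in this reduced form, the detection needs CPUID, an XGETBV check and a per-function target attribute, which gives a sense of why the extra complexity may not be worth it.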

xavierleroy · 2023-06-09T10:01:44Z

I just pushed the simple fix for the ocamlopt issue: 8fd71c2.

xavierleroy · 2023-06-13T13:16:16Z

Thanks @a12n for having merged my suggestions in this PR:

make ocamlopt produce specialized get/set code for bigarrays that are statically known to hold FP16 numbers;
add support for ARMv8 (ARM 64-bit) FP16 conversions;
a test for specialized bigarray accesses (there was none before?).

xavierleroy · 2023-06-13 (review, approved)

I'm very happy with the PR in its current state. Since this is a standard library change and since I contributed some code, it would be good to have a second approval.

gasche · 2023-06-13T14:04:02Z

I wonder if @lthls or @nojb would be curious to have a look.

lthls · 2023-06-13T14:25:19Z

I can review the changes to the compiler itself, but I don't have anything to say about either the runtime code or the changes to the standard library.

nojb · 2023-06-13 (review)

The changes to the stdlib look OK (modulo some missing "since" annotations).

stdlib/bigarray.mli

nojb · 2023-06-13T19:30:54Z

Question: is the bootstrap absolutely necessary?

a12n · 2023-06-13T21:28:39Z

> Question: is the bootstrap absolutely necessary?

There was a change 55dc02f (a new element for enum caml_ba_kind and a new entry in caml_ba_element_size[]). Looks like bootstrap is needed (at least) for this.

Please see #10775 (comment) above.

nojb · 2023-06-14T04:18:27Z

> > Question: is the bootstrap absolutely necessary?
>
> There was a change 55dc02f (a new element for enum caml_ba_kind and a new entry in caml_ba_element_size[]). Looks like bootstrap is needed (at least) for this.
>
> Please see #10775 (comment) above.

Right, I had missed that comment. Thanks!

xavierleroy · 2023-06-18T17:07:35Z

@nojb are you happy with the new @since annotations?

nojb · 2023-06-18T17:14:37Z

> @nojb are you happy with the new @since annotations?

Thanks for the ping. Yes, looks OK.

xavierleroy · 2023-06-18T17:16:43Z

Excellent. Would you feel confident to approve this PR, or do we need to find another second reviewer?

nojb · 2023-06-18T17:19:52Z

> Excellent. Would you feel confident to approve this PR, or do we need to find another second reviewer?

I'm happy to do so. I had refrained from doing it because I only took a cursory glance at the C code.

nojb · 2023-06-18 (review, approved)

LGTM

xavierleroy · 2023-06-19T12:40:12Z

The bootstrap had to be re-done once more, so I squashed and merged manually on trunk (08276af, 8816d27). Thanks to all who contributed.

tmcgilchrist · 2023-06-29T01:37:42Z

> Finally, I'm unsure about using processor instructions when available.
>
> ARMv8 (ARM 64 bits) is the only supported platform where we can be sure the hardware supports FP16<->FP32 conversions.

POWER 10 (Power ISA Version 3.1B) should also bring bfloat16 support but getting access to the hardware is tricky at the moment.

hhugo · 2023-07-04T18:32:49Z

runtime/caml/bigarray.h

@@ -34,6 +34,7 @@ typedef unsigned short caml_ba_uint16;
 #define CAML_BA_MAX_NUM_DIMS 16

 enum caml_ba_kind {
+  CAML_BA_FLOAT16,             /* Half-precision floats */


Moving this new constructor to the end would probably help with backward compatibility.
In its current form, marshaling a bigarray gives a different result from previous OCaml versions because all constructors have been shifted by 1.

hhugo · 2023-07-05

@Octachron, @gasche, should I open an issue to track this properly?

gasche · 2023-07-05

My understanding is that @xavierleroy and @damiendoligez are on it (so: no need for a separate issue), but this is tricky because it requires an acrobatic bootstrap. Thanks for the report, this is indeed a good catch.

hhugo · 2023-07-05

Fyi, it was spotted while adapting jsoo to trunk

xavierleroy · 2023-07-05

I like this suggestion. Implemented in #12357.

gasche added this to the post-freeze milestone Dec 3, 2021

Octachron assigned dra27 Feb 23, 2023

a12n force-pushed the ba_float16 branch from 875080c to ea4f970 Compare February 24, 2023 19:52

a12n force-pushed the ba_float16 branch from bdb8066 to fd6b66a Compare March 8, 2023 13:14

damiendoligez modified the milestones: post-freeze, 5.1 Mar 8, 2023

Octachron assigned Octachron and unassigned dra27 Mar 14, 2023

Octachron assigned xavierleroy and unassigned Octachron Apr 21, 2023

Octachron modified the milestones: 5.1, 5.2 Apr 21, 2023

a12n added 10 commits June 9, 2023 23:38

Add C functions to support float16 elements in Bigarray

b05a0de

Conversion and hash mix functions.

Add CAML_BA_FLOAT16 kind in Bigarray C stubs

55dc02f

Handle float16 elements in Bigarray serialize/deserialize functions

7109d18

Handle float16 elements in Bigarray get/set functions

2c39fff

Support float16 elements in Bigarray fill function

9619949

Add float16 elements comparison code in Bigarray

2de1e8f

More general version of internal DO_FLOAT_COMPARISON macro, to allow different type of stored/compared values. Rewrite DO_FLOAT_COMPARISON with DO_GENERIC_UNORDERED_COMPARISON macro.

Handle float16 elements in Bigarray caml_ba_hash function

2db51b8

Make use of F16C intrinsics for float16/float conversions, if available

bbb4f49

Update Bigarray module interface to add support for float16 elements

ba6a2c3

Add test to Bigarray testsuite for get/set with float16 elements

db104e6

Generate correct specialized native code for float16 bigarray accesses

11eff3c

Also: on ARM 64-bit, use the _Float16 type for hardware conversions.

xavierleroy mentioned this pull request Jun 12, 2023

Some additions to the ba_float16 proposal a12n/ocaml#1

Merged

xavierleroy and others added 2 commits June 12, 2023 15:43

Testing type-specialized accesses to bigarrays

348558d

Merge pull request #1 from xavierleroy/ba_float16_plus

c0ae4a1

Some additions to the ba_float16 proposal

xavierleroy approved these changes Jun 13, 2023

nojb reviewed Jun 13, 2023

Add @since tags to comments related to Bigarray.float16_elt

275a899

a12n force-pushed the ba_float16 branch from 431f713 to 275a899 Compare June 13, 2023 21:15

nojb approved these changes Jun 18, 2023

Update reviewers for 10775

18d5357

xavierleroy pushed a commit that referenced this pull request Jun 19, 2023

Support 16-bit floating-point numbers as elements of bigarrays (#10775)

08276af

xavierleroy closed this Jun 19, 2023

xavierleroy mentioned this pull request Jun 19, 2023

POWER back-end: wrong compilation of FP32 stores #12311

Merged

hhugo reviewed Jul 4, 2023

xavierleroy mentioned this pull request Jul 5, 2023

Move constructor Bigarray.Float16 last in the type Bigarray.kind #12357

Merged

gasche mentioned this pull request Jul 12, 2023

Get rid of the LongString module #12360

Merged

yallop mentioned this pull request Feb 8, 2024

Float16 support yallop/ocaml-ctypes#763

Open

nojb mentioned this pull request Apr 12, 2024

Fix undefined behavior of left-shifting a negative number #13094

Merged
