Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Half-precision floating-point elements in Bigarray #10775

Closed
wants to merge 25 commits into from

Conversation

a12n
Copy link
Contributor

@a12n a12n commented Nov 16, 2021

This adds support for half-precision float elements in Bigarray.

The values stored as uint16_t elements. Intrinsics from F16C instruction set are used for float16/float conversions, if available. Fallback conversion functions are from public domain code by Fabian Giesen (looks like the same code is used in Intel SPMD Program Compiler).

To validate the fallback conversion functions, I've written the following C program. I'm not sure how (if at all) to incorporate it in the testsuite. F16C intrinsics are used as the reference, and it requires a proper -march= in the compiler flags and a proper CPU in the machine running the tests.

Is there any interest in such a feature?

@xavierleroy
Copy link
Contributor

xavierleroy commented Dec 3, 2021

Like many other suggestions for new features, this PR will have to wait for after the great Multicore OCaml merge.

However, I'm globally in favor: it's very much in the spirit of the Bigarray module to support 16-bit float arrays, since they are used in a number of other libraries we may want to interface with.

The one thing that makes me sad is that there is no standardized C compiler support or library function to convert to/from 16-bit FP numbers.

To be revisited after the Multicore merge.

@gasche gasche added this to the post-freeze milestone Dec 3, 2021
@dra27
Copy link
Member

dra27 commented Feb 23, 2023

@a12n - sorry that this has been forgotten. Would you be happy to rebase this so that CI and so forth can be re-triggered and then we can move forwards with a review?

@a12n
Copy link
Contributor Author

a12n commented Feb 26, 2023

@dra27 Sure, I'll do it shortly. Looks like more changes are needed, it fails to build now.

@a12n
Copy link
Contributor Author

a12n commented Mar 7, 2023

I've found a few more places where a new kind of bigarray elements should be handled (Typeopt, Lambda, ...).

Looks like it's unrelated to the failing builds, though.

The build in AppVeyor Cygwin fails like this:

…
  MKEXE runtime/ocamlruns.exe
cp runtime/ocamlruns.exe boot/ocamlruns.exe
make -C stdlib OCAMLRUN='$(ROOTDIR)/boot/ocamlruns.exe' \
    CAMLC='$(BOOT_OCAMLC)' all
make[2]: Entering directory '/cygdrive/c/projects/🐫реализация-mingw64/stdlib'
  OCAMLC camlinternalFormatBasics.cmi
make[2]: *** [Makefile:196: camlinternalFormatBasics.cmi] Error 127

As far as I understand, error 127 means "command not found".

When I try to configure and build on my machine (Debian Linux, amd64) it fails in even stranger way:

…
make -C stdlib \
  OCAMLRUN='$(ROOTDIR)/runtime/ocamlrun' \
  CAMLC='$(BOOT_OCAMLC) -use-prims ../runtime/primitives' all
make[2]: Entering directory '/home/arn/proj/ocaml/stdlib'
  OCAMLC camlinternalFormatBasics.cmi
  OCAMLC camlinternalFormatBasics.cmo
  OCAMLC stdlib.cmi
malloc_consolidate(): invalid chunk size
Aborted (core dumped)
make[2]: *** [Makefile:196: stdlib.cmi] Error 134
gdb runtime/ocamlrun stdlib/core
Core was generated by `../runtime/ocamlrun ../boot/ocamlc -use-prims ../runtime/primitives -strict-seq'.
Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt 
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007f236e8b9537 in __GI_abort () at abort.c:79
#2  0x00007f236e912768 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7f236ea303a5 "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#3  0x00007f236e919a5a in malloc_printerr (str=str@entry=0x7f236ea32690 "malloc_consolidate(): invalid chunk size") at malloc.c:5347
#4  0x00007f236e91a918 in malloc_consolidate (av=av@entry=0x7f236ea66b80 <main_arena>) at malloc.c:4477
#5  0x00007f236e91c755 in _int_malloc (av=av@entry=0x7f236ea66b80 <main_arena>, bytes=bytes@entry=65656) at malloc.c:3699
#6  0x00007f236e91e164 in __GI___libc_malloc (bytes=65656) at malloc.c:3058
#7  0x000055fc9380ae45 in caml_stat_alloc_noexc (sz=sz@entry=65656) at runtime/memory.c:500
#8  caml_stat_alloc (sz=sz@entry=65656) at runtime/memory.c:554
#9  0x000055fc93804603 in caml_open_descriptor_in (fd=3) at runtime/io.c:168
#10 0x000055fc93804ffd in caml_ml_open_descriptor_in_with_flags (fd=<optimized out>, flags=0) at runtime/io.c:576
#11 0x000055fc93818d72 in caml_interprete (prog=<optimized out>, prog_size=<optimized out>) at runtime/interp.c:1037
#12 0x000055fc9381ae0c in caml_main (argv=0x7ffde7410a28) at runtime/startup_byt.c:573
#13 0x000055fc937ef9cc in main (argc=<optimized out>, argv=<optimized out>) at runtime/main.c:37

I'm not sure what to do next :(

@damiendoligez
Copy link
Member

malloc_consolidate(): invalid chunk size

This looks like a buffer overflow or an out-of-bounds access. Without looking in detail, I have a possible explanation: you've added Pbigarray_float16 close to the beginning of the type bigarray_kind, shifting all other constructors by one place. You make the corresponding change in caml_ba_element_size in runtime/bigarray.c, but that will need a careful bootstrap (see BOOTSTRAP.adoc) because a plain build uses the new runtime with an old compiler (the bootstrap compiler).

I'm not quite sure where the compiler uses bigarrays to trigger this problem, but it might simply be because of stdlib/random.ml.

You could do a quick check by moving Pbigarray_float16 to the end of bigarray_kind and changing caml_ba_element_size accordingly, then see if the build works.

@a12n
Copy link
Contributor Author

a12n commented Mar 8, 2023

Thanks! That's what I've missed. I've bootstrapped the compiler according to the instructions.

@damiendoligez damiendoligez modified the milestones: post-freeze, 5.1 Mar 8, 2023
@Octachron Octachron assigned Octachron and unassigned dra27 Mar 14, 2023
@Octachron Octachron assigned xavierleroy and unassigned Octachron Apr 21, 2023
@Octachron Octachron modified the milestones: 5.1, 5.2 Apr 21, 2023
@xavierleroy
Copy link
Contributor

This PR has been on my pile for a very long time. Apologies for the delay. Here is a high-level review:

  • The feature is definitely useful.
  • The changes to runtime/bigarray.c are fine.
  • The changes in the compilers are fine except for one bug, see below.
  • The portable implementation of float32 <-> float16 conversions is fine, I don't think we can do significantly faster.

There's one bug in native-code generation. Consider a.{i} <- a.{i} +. 1.0 where a is statically known to have kind Float16. ocamlopt generates 16-bit integer loads and stores, but no conversions to/from FP, so the FP addition operates on integer registers, causing the emitter to fail.

This can be fixed easily by generating calls to the generic get/set functions in this case, as if the kind of the bigarray a was unknown. Alternatively, the conversions between int16-representing-a-float16 and float64 can be implemented as calls to runtime functions.

Finally, I'm unsure about using processor instructions when available.

  • ARMv8 (ARM 64 bits) is the only supported platform where we can be sure the hardware supports FP16<->FP32 conversions.
  • For x86, the current PR uses F16C instructions if the runtime system is compiled with gcc -mf16c, which is not the default, and the resulting runtime system cannot run on processors that don't support the F16C extension.
  • An alternative that I played with is to detect F16C availability at run-time, and dynamically switch between the hardware instructions and the software emulation. But it's tricky code that may not be worth the effort.

@xavierleroy
Copy link
Contributor

I just pushed the simple fix for the ocamlopt issue : 8fd71c2 .

Also: on ARM 64-bit, use the _Float16 type for hardware conversions.
@xavierleroy
Copy link
Contributor

Thanks @a12n for having merged my suggestions in this PR:

  • make ocamlopt produce specialized get/set code for bigarrays that are statically known to hold FP16 numbers;
  • add support for ARMv8 (ARM 64-bit) FP16 conversions;
  • a test for specialized bigarray accesses (there was none before?).

Copy link
Contributor

@xavierleroy xavierleroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm very happy with the PR in its current state. Since this is a standard library change and since I contributed some code, it would be good to have a second approval.

@gasche
Copy link
Member

gasche commented Jun 13, 2023

I wonder if @lthls or @nojb would be curious to have a look.

@lthls
Copy link
Contributor

lthls commented Jun 13, 2023

I can review the changes to the compiler itself, but I don't have anything to say about either the runtime code or the changes to the standard library.

Copy link
Contributor

@nojb nojb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes to the stdlib look OK (modulo some missing "since" annotations).

stdlib/bigarray.mli Show resolved Hide resolved
stdlib/bigarray.mli Show resolved Hide resolved
@nojb
Copy link
Contributor

nojb commented Jun 13, 2023

Question: is the bootstrap absolutely necessary?

@a12n
Copy link
Contributor Author

a12n commented Jun 13, 2023

Question: is the bootstrap absolutely necessary?

There was a change 55dc02f (a new element for enum caml_ba_kind and a new entry in caml_ba_element_size[]). Looks like bootstrap is needed (at least) for this.

Please see #10775 (comment) above.

@nojb
Copy link
Contributor

nojb commented Jun 14, 2023

Question: is the bootstrap absolutely necessary?

There was a change 55dc02f (a new element for enum caml_ba_kind and a new entry in caml_ba_element_size[]). Looks like bootstrap is needed (at least) for this.

Please see #10775 (comment) above.

Right, I had missed that comment. Thanks!

@xavierleroy
Copy link
Contributor

@nojb are you happy with the new @since annotations?

@nojb
Copy link
Contributor

nojb commented Jun 18, 2023

@nojb are you happy with the new @since annotations?

Thanks for the ping. Yes, looks OK.

@xavierleroy
Copy link
Contributor

Excellent. Would you feel confident to approve this PR, or do we need to find another second reviewer?

@nojb
Copy link
Contributor

nojb commented Jun 18, 2023

Excellent. Would you feel confident to approve this PR, or do we need to find another second reviewer?

I'm happy to do so. I had refrained from doing it because I only took a cursory glance at the C code.

Copy link
Contributor

@nojb nojb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@xavierleroy
Copy link
Contributor

The bootstrap had to be re-done once more, so I squashed and merged manually on trunk (08276af, 8816d27). Thanks to all who contributed.

@tmcgilchrist
Copy link
Contributor

Finally, I'm unsure about using processor instructions when available.

  • ARMv8 (ARM 64 bits) is the only supported platform where we can be sure the hardware supports FP16<->FP32 conversions.

POWER 10 (Power ISA Version 3.1B) should also bring bfloat16 support but getting access to the hardware is tricky at the moment.

@@ -34,6 +34,7 @@ typedef unsigned short caml_ba_uint16;
#define CAML_BA_MAX_NUM_DIMS 16

enum caml_ba_kind {
CAML_BA_FLOAT16, /* Half-precision floats */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving this new constructor at the end would probably help with backward compatibility.
In its current form, marshaling a bigarray gives a different result from previous ocaml version because all constructors have been shifted by 1.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Octachron, @gasche, should I open an issue to track this properly ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that @xavierleroy and @damiendoligez are on it (so: no need for a separate issue), but this is tricky because it requires an acrobatic bootstrap. Thanks for the report, this is indeed a good catch.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fyi, it was spotted while adapting jsoo to trunk

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this suggestion. Implemented in #12357 .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet