Split runtime/array.c functions, use the new floatarray primitives in Float.Array#13361
gasche merged 8 commits into ocaml:trunk
nojb
left a comment
As far as I can see, the patch is correct. LGTM
The debug CI caught a couple of mistakes in
That will teach me to look more carefully at assertions!
(the implementations mirror those of array.ml)
There was a tricky bug hiding in the CI failure: I believed that all empty arrays are (physically) equal to `Atom(0)`. I was using this assumption in the following implementation:

```c
for (mlsize_t i = 0; i < num_arrays; i++) {
  /* An array is either the empty array Atom(0),
     or a float array, or a non-float array.
     We know which implementation to use on the first non-empty array. */
  if (arrays[i] == Atom(0))
    continue;
  else if (Tag_val(arrays[i]) == Double_array_tag)
    return caml_floatarray_gather(num_arrays, arrays, offsets, lengths);
  else
    break;
}
return caml_uniform_array_gather(num_arrays, arrays, offsets, lengths);
```

(This code was written to avoid a second full traversal of the arrays.)

This code is wrong due to the existence of empty arrays that are not `Atom(0)`. The corrected version tests the size of each array instead:

```c
for (mlsize_t i = 0; i < num_arrays; i++) {
  /* An array is either an empty array,
     or a float array, or a non-float array.
     We know which implementation to use on the first non-empty array. */
  if (Wosize_val(arrays[i]) == 0)
    continue;
  else if (Tag_val(arrays[i]) == Double_array_tag)
    return caml_floatarray_gather(num_arrays, arrays, offsets, lengths);
  else
    break;
}
return caml_uniform_array_gather(num_arrays, arrays, offsets, lengths);
```
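The difference between the two checks above can be reproduced in a small standalone model. This is only a sketch: `toy_block`, `toy_atom0`, `pick_by_identity`, `pick_by_size` and the other `toy_` names are hypothetical stand-ins invented for illustration, not the OCaml runtime API, and the model only decides which implementation would be called.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for runtime concepts (hypothetical, NOT the OCaml runtime). */
enum toy_tag { TOY_UNIFORM_TAG, TOY_DOUBLE_ARRAY_TAG };

typedef struct {
  enum toy_tag tag; /* plays the role of Tag_val */
  size_t wosize;    /* plays the role of Wosize_val */
} toy_block;

enum toy_impl { TOY_USE_FLOAT, TOY_USE_UNIFORM };

/* The canonical empty block, playing the role of Atom(0). */
static toy_block toy_atom0 = { TOY_UNIFORM_TAG, 0 };

/* Another empty block that is NOT the canonical one. */
static toy_block toy_other_empty = { TOY_UNIFORM_TAG, 0 };

/* A non-empty float array. */
static toy_block toy_floats = { TOY_DOUBLE_ARRAY_TAG, 3 };

/* Inputs: a non-canonical empty array followed by a float array. */
static toy_block *const toy_inputs[] = { &toy_other_empty, &toy_floats };

/* First version: skips only blocks physically equal to the canonical
   empty block, so any other empty block is mistaken for a non-float array. */
static enum toy_impl pick_by_identity(size_t n, toy_block *const *arrays) {
  for (size_t i = 0; i < n; i++) {
    if (arrays[i] == &toy_atom0)
      continue;
    else if (arrays[i]->tag == TOY_DOUBLE_ARRAY_TAG)
      return TOY_USE_FLOAT;
    else
      break;
  }
  return TOY_USE_UNIFORM;
}

/* Second version: skips every block of size zero, whatever its address. */
static enum toy_impl pick_by_size(size_t n, toy_block *const *arrays) {
  for (size_t i = 0; i < n; i++) {
    if (arrays[i]->wosize == 0)
      continue;
    else if (arrays[i]->tag == TOY_DOUBLE_ARRAY_TAG)
      return TOY_USE_FLOAT;
    else
      break;
  }
  return TOY_USE_UNIFORM;
}
```

On `toy_inputs`, `pick_by_identity` stops at the first element and selects the uniform implementation even though the only non-empty array holds floats, while `pick_by_size` skips the empty block and selects the float implementation.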
I suspect that even
FTR, we are currently exploring the performance of
Another point about the performance of
Looking at the definition of the
I was using this trivial microbenchmark: test_floatarray.ml.txt. I'll try to find time to play with your suggestion next week.
I've tried different variants with/without the noalloc attribute, with/without inlining, with/without bound checking.

Conclusion: yes, adding noalloc brings a visible gain for small sizes, and there is no downside, I guess. Forcing the inlining of

The inlined+noalloc version (with bound checks) is still quite a bit slower than the version with a for-loop (with bound checks), but the gap is much less dramatic than with the current version (not inlined, no noalloc attribute).
This PR is buildup work for ocaml/RFCs#37. It splits the `caml_array_foo` functions in runtime/array.c into three versions: `caml_floatarray_foo`, `caml_uniform_array_foo`, and finally `caml_array_foo`. The new `floatarray_foo` versions are used in Float.Array, which could noticeably improve performance as they replace pure-OCaml implementations.

@nojb would you maybe be interested in reviewing this?