Join GitHub today
GitHub is home to over 36 million developers working together to host and review code, manage projects, and build software together.Sign up
Expose a GADT representation for Bigarray.kind #6064
Original bug ID: 6064
The current interface to Bigarray.kind looks like this:
type ('a, 'b) kind
val float32 : (float, float32_elt) kind
The kind type is almost completely abstract -- there are no operations on kinds exposed except for bigarray construction and querying the element kind of an existing bigarray.
Exposing the representation of kind as a GADT would make kinds much more useful. The interface is the obvious one -- there's a nullary constructor for each kind value currently in the signature -- and we can even keep the same in-memory representation (with the small exception described below), since kinds are currently ints:
type ('a, 'b) kind =
With this change in place it becomes possible to write intensionally-polymorphic functions over bigarrays, such as the following trivial one that sets every element to zero:
let zero : type a b. (a, b) kind -> a = function
let zero_fill arr = Genarray.fill arr (zero (Genarray.kind arr))
Exposing the representation of kinds in this way would make bigarray/ctypes integration easier, and is apparently useful to other people (e.g. #3962).
I've attached a patch, written with Leo White, that makes this change and also exposes a GADT representation for Bigarray.layout. It's mostly backwards-compatible, but the representation of char is now distinct from the representation of int8_unsigned.
Comment author: @gasche
This patch is doing three different things:
(1) it adds a new CHAR kind to the C runtime part of bigarrays (previously only INT8 was added, and both kind were aliased on the OCaml side)
(2) it adds GADTs to allow richly-typed destruction of kinds by pattern-matching
(3) it modifies the C runtime to take as input from the OCaml side not the previous integers representing kind and layouts, but directly the GADTs values. As constant constructors are represented as integers from 0 to N-1, the kind-handling code (whose OCaml values where consecutive integers) mostly doesn't change, but the layout-handling code (which used 0 and 0x100) needs to change.
This is all rather clever, maybe a tad too much. Another possible option would be to separate the GADT-values used for their rich typing, and the integer-values used to communicate with the runtime. One could either:
(A) keep the code as is (Bigarray.int32 would still be an integer), add GADTs definitions, and add functions turning the integers into the GADTs values; I suspect those int-to-GADT functions need to be unsafe, but that's a reasonable kind of unsafe under the usual phatom type hypotheses
(B) make the GADT definitions the authoritative values on the OCaml side (Bigarray.int32 would be a GADT constructor as in your patch), but write proper tag-to-int conversion functions to avoid having to change the C runtime.
Approach (A) has the defect of making the OCaml-side interface more complex that it is now (two different presentations of kind, one phantom and one GADT).
Approach (B) has the same OCaml-side interface than you propose, is less invasive on the runtime part, but may introduce a slight performance cost (non-magic, non-dubious) conversion. However, one problem is that the Bigarray interface (.mli file) advertizes that, for example, GenArray.create, GenArray.kind and stuff are the C externals, so there is no room for intermediate OCaml-side translations between the input of those functions and the values passed on the C side.
All in all, I think your choice of changing the C side is reasonable. However, I would rather see more explicit and systematic conversions between the OCaml tags and the C-side values. You have added logic for layouts because you had to, but cleverly used 1 as a kind value to make it work transparently. I would rather see back-and-forth macros for both (possibly doing nothing for kinds) to help reviewing the code.
Comment author: @lpw25
I think that there are no backward-compatibility implications for type-safe programs. However, some type unsafe code (e.g. marshalling and C bindings) could potentially break, but probably only if it is quite fragile.
As long as code does not assume that Bigarray.int8_unsigned = Bigarray.char it will be fine.
Old char bigarrays would be read as int8_unsigned bigarrays, other than that there should be no problems.
Comment author: @gasche
Included in trunk. Thanks for the patch!
I think this is technically the first use of GADT for greater good included in the OCaml codebase, but to be fair I think that Benoît Vaugon's enormous work in #6017, which I hope to include as well soon-ish, will by release time have taken precedence in scope and practical impact.
Comment author: @dbuenzli
Btw. I don't know if there's a way to add this without breaking everything. But one thing I miss is kinds for unsigned int32 and int64. They would still be read/written with int32 and int64 but you'd actually know how you are supposed to interpret them.