This proposal is discussed at this pull request.
OCaml currently has two attributes that this proposal may easily be confused with:
- We can mark arguments to external calls as
[@unboxed]
, as documented in the manual. This proposal essentially takes this idea and expands it to be usable within OCaml itself, instead of just at the foreign function interface boundary. - We can mark some types as
[@@unboxed]
, as briefly described in a bullet point in this manual section. This attribute, applicable to a type holding one field of payload (either a one-field record or a one-constructor variant with one field), means that no extra box is allocated when building values of that type. It is notionally separate from this proposal, though there is an interaction: see the section mentioning[@@unboxed]
in its title.
Suppose you define this type in OCaml:
type t = { x : Int32.t; y : Int32.t }
Values of this type carry 8 bytes of actual data. However, in OCaml, values of this type are represented as:
-
an 8-byte pointer
-
to a 24-byte block (8B header + 2x 8B fields)
-
pointing at two more 24-byte blocks (8B header + 8B
caml_int32_ops
+ 4B data + 4B padding)
There are some efficiency gains to be had here.
In this design, we are guided by the following high-level principles. These principles are not incontrovertible, but any design that fails to uphold these principles should be considered carefully.
Backward compatibility: The extension does not change the correctness or performance of existing programs.
Availability: Unboxed types should be easy to use. That is, it should not take much change to code to use unboxed types and values.
This is a proposed extension to OCaml to allow the layout of data types to be controlled by the programmer. This proposal won't affect the type defined above, but will allow you to define better layouts. With unboxed types, we can write the following:
type t = { x : #int32; y : #int32 }
Here, the x
and y
fields have the unboxed type #int32
, which
consumes only 4 bytes. Values of type t
are still boxed: they are
represented as 8-byte pointers to a 16-byte block, consisting of an
8-byte header and two 4-byte fields. Despite containing unboxed
fields, the type t is an ordinary OCaml type: it can be passed to
normal functions, stored in normal data structures, and so on.
On the other hand, the new type #int32
is unboxed. This means that
it can't be used to instantiate ordinary type parameters. For
example, fun (x : #int32) -> Fn.id x
will not work: Fn.id
doesn't
work on types that are not represented by exactly one word. Similarly,
you can't have a #int32 list
, because a list expects its contents to
be represented by exactly one word. Getting this right is important
for, e.g., the garbage collector, which expects values in memory to
have a certain format.
Unboxed types, thus, require a trade-off: they are more compact, but do not work as smoothly as normal boxed types.
The following types will be available in the initial environment, given with example literals of that type:
#() : #unit
#42b : #int8
#42s : #int16
#42l : #int32
#42L : #int64
#42n : #nativeint
#42. : #float
So, unboxed types mean that polymorphic definitions need to be polymorphic over only a subset of the possible types (say, the boxed ones). We do this by introducing layouts, which classify types in the way that types classify values. (The technical term for such things is kinds, but we're avoiding that word because (a) it's unfamiliar to programmers who aren't type theorists and (b) it doesn't describe what we're using kinds for.) The layout of a type determines how many bytes of memory values of that type consume, how many and what kind of registers are needed to pass such values to functions, and whether locations holding such values need to be registered with the GC.
Once we have layouts, the type of, say, diag : 'a -> 'a * 'a
is a short form
of:
val diag : ('a : value) . 'a -> 'a * 'a
The layout value
is the layout of ordinary OCaml values (boxed or
immediate). This means that trying to use diag at type #int32 -> #int32 * #int32
causes a compilation error, as #int32
is not of
layout value
.
The following layouts are available, given with example types of that layout:
(* concrete ones *)
string : value
int : immediate
#unit : void
#int8 : bits8
#int16 : bits16
#int32 : bits32
#int64 : bits64
#nativeint : word
#float : float64
(* indeterminate ones *)
any
gc_friendly
gc_ignorable
(On 64-bit machines, word
has the same representation as bits64
, but
we don't want to bake this coincidence into the type system.)
As well as the above primitive layouts, layouts may be combined using
the *
and +
operators. Unlike in types, *
and +
are associative
in layouts. That is, the
following are all equivalent at runtime:
(value * value) * value
value * (value * value)
value * value * value
There is a (transitive) subtyping relation between these layouts, as pictured here:
----any-----
/ \
gc_friendly gc_ignorable-------------
| \ / | \ \
value void word float64 bits_8..64
|
immediate
(It would be safe to add immediate <= word
, as well, but we don't for now. See
an Extension toward the end of this document.) The subtyping relationship
between product and sum layouts is pointwise, and only between products/sums of
the same length. A product/sum is a subtype of gc_friendly
(resp. gc_ignorable
) iff its components are subtypes gc_friendly
(resp. gc_ignorable
). All products/sums are subtypes of any
.
A gc_friendly
layout is one that the garbage collector knows how to treat:
these are pointers to gc-managed memory and tagged immediates. A gc_ignorable
layout is one that the garbage collector is free to ignore: these values are not
pointers to gc-managed memory (and contain no such pointers).
We might imagine that we could make, e.g. (value * value) * value
a subtype of value * (value * value)
(and vice versa), but this
would play poorly with type inference. For example, if we had
'l1 * 'l2 = value * value * value
(where 'l1
and 'l2
are unification
variables), how could we know how to proceed? Perhaps we could make
a subtyping relationship here, but only when there is an explicit check;
we leave this off the design for now.
Names used in layouts (both the alphanumeric names listed above and the infix binary operators) live in a new layout namespace (distinct from all existing namespaces) and are in scope in the initial environment. Though this proposal does not introduce syntax to do so, we can imagine constructs defining new names in the layout namespace that could shadow these existing names.
Because a layout describes how a type is stored in memory and passed to and from
functions, an [@@unboxed]
type must have the same layout as its field
type. That is, in both of the following declarations, outer
is assigned the
same layout as inner
:
type outer = { f : inner } [@@unboxed]
type outer = K of inner [@@unboxed]
Without the [@@unboxed]
attribute, both outer
s would be value
s. It is
possible that recursion prevents us from finding the layout of the right-hand
side:
type loopy = { f : loopy } [@@unboxed]
In this case, we default the layout of loopy
to value
, although a programmer
can also write a layout annotation to choose a different layout, as follows:
type immediate_loopy : immediate = { f : immediate_loopy } [@@unboxed]
Both loopy
and immediate_loopy
are uninhabited.
All of the currently-existing types in OCaml have layout value
. Some
also have layout immediate
, such as int
or unit
. We're planning
to add unboxed versions of a few builtin types, such as #int32
and
#float
. This gives:
type float : value
type int32 : value
type int : immediate
type #float : float64
type #int32 : bits32
All of these can be passed and returned from functions, but only the
value
types (including immediate
ones) can be stored in, say,
lists. Note that the existing +.
, etc. primitives won't work on
#float
, because they have the wrong type. New ones for #float
will
be made available. (Same for #int32
, etc.)
Unboxed types will be allowed in record fields:
type t = {
foo : int;
bar : #float;
...
}
The OCaml runtime forbids mixing safe-for-GC fields and unsafe-for-GC fields in a single block. See the mixed-block restriction, below.
A key aspect of the design of unboxed types is that we must know
the representation of a type before we can ever manipulate a value
of that type. For example, check out wrong
here:
let wrong : ('a : any). 'a -> 'a = fun x -> x
This function is impossible to define; it would get rejected at
compile time. The reason is that the compiled wrong
function must
take an argument somehow and then return it somehow. Yet, if we don't
know the representation of that argument, we cannot do this! Does the
argument fit in one register or two? Is it a pointer (that the GC would
have to be aware of) or not? We cannot know: the argument's layout is
any
.
Now consider this function:
let strange : ('a : any). string -> 'a = fun msg -> raise (Msg msg)
This one is OK. This function never has to manipulate the value
of type 'a
: that value is produced by a tail call, and so the code for
strange
never has to move it. (raise
is magical, allowed to compile
by virtue of the fact that it doesn't actually return to its caller.) In
practice, a function like strange
would always be called with a known
layout (such as value
or bits32 * bits32
), and so the caller would know
where and how to expect the result. The key aspect of this, though, is that
we can compile strange
to executable code without knowing the representation
of 'a
.
Here's one last example:
let rec silly : ('a : any). unit -> 'a -> 'a =
fun () -> silly ()
(The unit
argument is just to prevent
the definition from falling into a loop right away.) This definition
is also accepted. Note that, even though the type of silly
has a
type of layout any
to the left of an arrow, it does not
manipulate a value of that type. Thus, this is fine.
Abstracting from these examples, there are precisely two places we
require a concrete layout -- no any
here.
- When binding a term variable of type
t
,t
must have a concrete layout. - When passing a term of type
t
to a function,t
must have a concrete layout.
There is no restriction on returning a value from a function. See "Kinds are calling conventions" for more exploration of these restrictions.
There is also no restriction on knowing the representation of type variables in type declarations. For example, the following declaration is fine:
type ('a : any) pair = { first : 'a; second : 'a }
The above restrictions imply that any construction of pair
or pattern-match on
pair
would work only for instantiations that have known representations, but
we can suffice with just one, flexible type declaration.
Because of the restrictions around the usage of unboxed types (both that they cannot work with standard polymorphism and the restriction just above about mixing representations in record types), we sometimes want to box an unboxed type. The types we have seen so far have obvious boxed cousins, but we will soon see the ability to make custom unboxed types. We thus introduce
type ('a : any) box
val box : ('a : any). 'a -> 'a box
val unbox : ('a : any). 'a box -> 'a
These definitions are magical in a number of ways:
-
It is best to think of the type
box
as describing an infinite family of types, where the member of the family is chosen by the layout of the boxed type. (The family is infinite because of the*
and+
layouts, which can be used to create new layouts.) -
It would be impossible to define the
box
function without magic, because it runs afoul of the restriction around function arguments. A consequence of this design is that thebox
function cannot be abstracted over: a definition likelet mybox (x : ('a : any)) = box x
runs into trouble, because it binds a variable of a type whose layout is unknown. -
Because the function
box
is really a family of functions, keyed by the layout of its argument, it is not allowed to usebox
without choosing a layout. The simplest way to explain this is to say thatbox
must always appear fully applied. In this way, it might be helpful to think ofbox
more as a keyword than a function. But we could, in theory, allow something likelet box_float : ('a : float64). 'a -> 'a box = box
, wherebox
is unapplied (but specialized). However, there is little advantage to doing so, and it's harder to explain to users, so let's not: just saybox
must always appear applied. (Note: GHC has a similar restriction around similar constructs.) -
There is additional magic around various forms of unboxed types, as described below.
Going beyond storing just one unboxed value in a memory box, we also
want to be able to store many unboxed values in an array. Accordingly,
we extend the existing array
to take a type argument of any
,
as if array
were declared with
type ('a : any) array
Just like we need magical value-level primitives box
and unbox
to deal
with box
, we will also need similar primitives to deal with arrays
of unboxed values. This proposal does not spell out the entire API for this
feature, but we will work it out during the implementation. Regardless
of the API details, the array
type must have a similar layout-indexed
magic as box
, though array
could conceivably use a different memory
layout as box
does. In particular, the memory format of a boxed variant
(as described below in the section on "Unboxed variants") has a variable length,
making it impossible to pack into an array
. Thus array
may choose to use
a different memory format as box
in order to allow for indexing.
Note that extraction of an element from an array of unboxed values (e.g.
with get
) requires
copying the element. There are two ways a user might want to get an
unboxed value, via the following hypothetical API:
get : 'a array -> int -> 'a
get_boxed : 'a array -> int -> 'a box
If the type 'a
is unboxed, the get
function copies the value from
the memory allocated as part of the array into registers (or other
destination). To avoid this copying, one might imagine get_boxed
, intended
to return a pointer to the existing allocated memory within the array.
However, get_boxed
cannot do this: an 'a box
is expected to be a pointer
to a block with a header, and an element within an array lacks this header.
Instead, we might imagine
get_element : 'a array -> int -> 'a element
where element
is a new type that represented a 1-element slice of an
allocated array. The get_element
function would indeed just return
a pointer, but it requires yet another magical type element
, represented
by a pointer to the middle of a block. (This would pose a challenge
for garbage collection, though presumably not an insurmountable one.)
We do not explore this aspect of the design further in this document.
Instead, this consideration of get_boxed
and get_element
serves to
explain why there is no loss of run-time efficiency if the memory format used
by box
and the memory format used by array
are completely different.
In addition to new primitive types, like #int32
, this proposal also
includes user-defined unboxed
types (by contrast to what's above: user-defined boxed types
containing unboxed fields). The simplest of these is unboxed tuples:
#( int * float * string )
This is an unboxed tuple, and is passed around as three separate
values. Passing it to a function requires three registers.
The layout of an unboxed tuple is the product of the layout of its
components. In particular, the layout of #( string * string )
is
value * value
, not value
, and so it cannot be stored in, say, a
list.
Constructing and pattern-matching against an unboxed tuple requires
a #
: let ubx : #(int * #float) = #(5, #4.)
. Note that parentheses
are required for construction and pattern-matching unboxed tuples.
Naturally, boxing an unboxed tuple yields a boxed tuple. We thus add the following rules:
-
For all unboxed tuples
#(ty1 * ty2 * ...)
,#(ty1 * ty2 * ...) box = (ty1 * ty2 * ...)
. The...
is meant to denote that this works for any arity, including 0. -
The syntax
(e1, e2, ...)
continues to work for boxed tuples.
The current OCaml runtime has the following restriction:
-
For every block of memory, one of the following must hold:
-
All words in the block (other than the header) may be inspected by the garbage collector. That is, every word in the block is either a pointer to GC-managed memory or has its bottom bit tagged. This is a gc-friendly block.
-
No words in the block are pointers to GC-managed memory. This is a gc-ignorable block.
-
This is also described at in Real World OCaml.
We call this rule the mixed-block restriction, and we call types whose memory layout is mixed as mixed-block types. We imagine that we will be able to lift this restriction in a future release of the runtime, but we do not expect to do this as part of our initial rollout of unboxed types. We thus refer to this restriction in several places in this specification so that readers might imagine a future improvement.
Note that a mixed-block type has a layout that is neither a subtype of
gc_friendly
nor a subtype of gc_ignorable
.
The first consequence of the mixed-block restriction is already apparent:
we cannot store, say, an unboxed tuple #(string * #float)
in a memory
block, as that block would be mixed.
We thus have the following restriction on mixed-block types: box
may
never be called on a mixed-block type. Equivalently, the layout of the argument
to box
must be either a subtype of gc_friendly
or gc_ignorable
.
(This is indeed the essense of
the mixed-block restriction.) However, because box
is implicitly
used by constructors of boxed types -- such as a boxed tuple -- this
restriction applies more widely than code literally calling box
. For
example, the following would be rejected:
let bad : (string * #float) = "hi", #4.
The problem is the implicit call to box
in the definition, which would
expand to
box #("hi", #4.)
Note that the mixed-block restriction never requires looking directly at
types; instead, a mixed-block type can be identified by looking only at
its layout. This is in keeping with our understanding of box
as a
family of functions, indexed by layout. The members of the family corresponding
to mixed-block types simply do not exist.
Unboxed records are in theory just as straightforward as unboxed tuples, but have some extra complexity just to handle the various ways that types are defined and used. Suppose you write:
type t = { x : #int32; y : #int32 }
Sometimes we want to refer to t
as a boxed type, to store it in
lists and other standard data structures. Sometimes we want to refer
to t
as an unboxed type, to avoid extra allocations (in particular,
to inline it into other records). The language never infers which is
which (boxing and allocation is always explicit in this design), so we
need different syntax for both.
To make this all work, we consider a declaration like the one for
t
above to actually declare two types. To wit, we desugar the
above syntax to these declarations:
type #t = #{ x : #int32; y : #int32 }
type t = #t box
That is, we have a primitive notion of an unboxed record, called
#t
, and then we define t
just as #t box
. (The #{
syntax is
also available to users, if they want to just directly declare an
unboxed record.)
To make this backward compatible (and at all pleasant to program in),
we add a little more magic to box
:
-
In the syntax
e.x
(wheree
is an arbitrary expression andx
is a path), ife
has typety box
(for somety
), then we treat the expression as if it were projecting fromty
after unboxing. -
Along similar lines, we allow record-creation and pattern-matching syntax
{ ... }
to create and match againstbox
.
We might imagine an alternative design without automatically creating unboxed record definitions in this way, but that design does not uphold the Availability principle as well as the current design.
We use a #
in the syntax for creating unboxed records:
type q = #{ x : int; y : #int32 }
let g : q -> q = fun #{ x; y } as r -> #{ x = x + r.x; y }
(If we didn't have the special treatment for box
, we wouldn't need to
differentiate construction syntax, as we can infer that the record is unboxed
from the field labels, as we do for other record types. We could also
release a first version that requires #{
and then see if users want to
be able to avoid the #
. If we drop the #
requirement, we could simply
default to boxed, unless the type system disambiguation mechanism specifically
selects for unboxed.)
Type aliases do not automatically bind an unboxed version; rather it is a property of the record type declaration that creates the extra unboxed definition. However, if you create a transparent alias like
type u = t = { x : int32; y : int32 }
then we translate this to become
type #u = #t = #{ x : int32; y : int32 }
type u = #u box (* equivalent to type u = #t box *)
Naturally, you can also directly alias an unboxed type:
type u = #t
Similarly, an abstract type abstracts only the boxed type:
module M : sig
type t
end = struct
type t = { x : int32; y : int32 }
end
defines M.t
but not M.#t
. If you want to export both
an unboxed and boxed version of an abstract type, you can with
module M : sig
type unboxed_t : value * value
type t = unboxed_t box
end = struct
type t = { x : int32; y : int32 }
type unboxed_t = #t
end
We might imagine allowing module signatures to include type #t
when they also include type t
, but as we see here, this feature
is not strictly necessary.
If we have a
type t = { x : int; y : int }
and write a function
let f t = t.x + t.y
what type do we infer for f
? Of course, we want to infer t -> int
, but in
the presence of unboxed types, we might imagine inferring #t -> int
, as t.x
is a valid way of projecting out the x
field of both t
and #t
. Yet
inferring #t -> int
would be terribly non-backward-compatible.
We thus effectively have the boxed projection shadow the unboxed one. That is,
when we spot t.x
(for a t
of as-yet-unknown type), we assume that the
projection is from the boxed type t
, not the unboxed type #t
. If a user
wants to project from the unboxed type, they can disambiguate using a prefix
#
, like t.#x
. Thus, if we have
let g t = t.#x + t.y
we infer g : #t -> int
, though inference is non-principal. Users could also
naturally write
let g t = t.#x + t.#y
which supports principled type inference.
To be clear, the #
mark is just used for disambiguation. The following also
works:
let g (t : #t) = t.x + t.y
There is no ambiguity here, and thus the #
mark is not needed on record
projections.
Although this section discusses only record projections, the same idea applies
to record construction and pattern matches: the field of the boxed record
shadows the field of the unboxed record, though the latter can be written with a
#
prefix (or can be discovered by type-directed disambiguation).
There are several concerns that arise when thinking about mutable fields and unboxed types; this section lays out the scenarios and how they are treated. We assume the following declarations:
type point = { x : #int32; y : #int32 }
type mut_point = { mutable x : #int32; mutable y : #int32; identity : int }
-
Mutable unboxed field in a boxed record. Example:
type t = { radius : #int32; mutable center : #point }
A mutable unboxed field can be updated with
<-
, analogously to a mutable boxed field. However, updating an unboxed field might take multiple separate writes to memory. Accordingly, there is the possibility of data races when doing so: one thread might be writing to a field word-by-word while another thread is reading it, and the reading thread might witness an inconsistent state. (For records with existential variables and unboxed sums, this inconsistent state might lead to a segmentation fault; for other types, the problem might arise only as an unexpected result.) We call this undesirable action tearing.To help the programmer avoid tears, an update of a mutable unboxed field is sometimes disallowed. We surely want to disallow such an update if it is not type safe. We also want to disallow such an update if it could violate abstraction: perhaps some abstract record type internally maintains an invariant, and a torn record might not support the invariant.
Some cases are easy: we definitely want to support mutation of
#float
s, and we definitely want to disallow mutation of wide unboxed variants (where we might update the tag and the fields in separate writes). What about wide unboxed records? It depends. Consider the typet
in this section, and imaginetype s = { mutable circle : #t }
. Should we allows.circle <- ...
? Our answer, in essence: if a type is concrete, then allow the update. After all, a concrete type cannot be maintaining any invariants (or if it is, the programmer is responsible). So, if we have the definition of#t
, thens.circle <- ...
is allowed. However, if we do not have the definition of#t
, thens.circle <- ...
is disallowed.Yet this design -- based on the concreteness of a type -- is disappointing, as it bases the choice of allowing mutable updates on whether or not an interface exposes a representation. Instead, we would want to be able to write an abstract type yet which allows a potentially-tearing update.
We thus introduce a new layout
*w
(the "w" stands for "writable"). (We will use*w
in our notation here, but the surface syntax is different and defined below.)*w
is like*
(andl1 *w l2 <= l1' * l2'
iffl1 <= l1'
andl2 <= l2'
), but a product built with*w
is allowed in a mutable update, even if it is potentially non-atomic. The inferred layout of a most (see below) product types (like#t
) will be built with*w
, not*
. Yet if a type is exported abstractly from a module, the module signature will have to specify the layout of that type; most users will write a layout using*
, thus protecting values of that type from getting torn.If a user wants to write a layout with
*w
, they can do so like this:type t : bits32 * (bits32 * bits32) [@writable]
The
[@writable]
attribute transforms all*
s in a layout to be*w
s.Despite most product types being inferred to have a
*w
layout, product types with existential variables are inferred to have a*
layout, as tearing an existential could be type-unsafe. Here is an example:type ex_int = K : 'a * ('a -> int) -> ex_int
The layout of
#ex_int
isvalue * value
(the first components isvalue
because there is no annotation on'a
suggesting otherwise), with the non-writable*
.Putting this all together, we define the following predicate to test for writeability:
writable : concrete_layout -> bool writable(l1 * l2) = false writable(l1 + l2) = false writable(l1 *w l2) = writable(l1) && writable(l2) writable(_) = true
The update of a mutable field is thus allowed if either:
- The layout of the field fits in a 64-bit word, or
- The layout of the field satisfies the
writable
predicate above.
The first possibility -- for word-sized layouts -- is because a type whose values fit in a 64-bit word have no threat of getting torn, and thus can be updated safely. We choose 64-bit words (not 32-bit ones) because 32-bit OCaml does not support multicore, and so no possibility of tearing exists.
The restriction described here is called the tearing restriction.
The tearing restriction allows updates of our
mutable center
field, for two reasons: the type#point
is concrete and thus has layoutbits32 *w bits32
, with*w
, and a#point
fits in a 64-bit word.A user who wishes to update a non-atomic mutable field (with their own plan for synchronization between threads) may do so with the
[@non_atomic]
attribute, like so:type point = { x : #int32; y : #int32 } module Circle : sig type t : bits32 * (bits32 * bits32) end = struct type t = #{ radius : #int32; mutable center : #point } end type t = { mutable circle : #Circle.t; color : Color.t } let f (t : t) = t.circle <- #{ ... } [@non_atomic]
The design in this section upholds the following principle:
- Even in the presence of data races, a programmer can never observe an
intermediate state of a single assignment (unless they write
[@non_atomic]
).
Optional. It may be convenient for users to get a warning when they declare a mutable field that falls afoul of the tearing restriction and thus requires
[@non_atomic]
on every update. -
Mutable field in an unboxed record. Example:
fun (pt : #mut_point) -> (pt.x <- pt.x + #1l); pt
Mutation within an unboxed record would be unusual, because unboxed values have no real notion of identity. That is, an update-in-place appears to be indistinguishable from a functional update (that is,
{pt with x = pt.x + #1l}
). By its unboxed nature, an unboxed value cannot be aliased, and so the mutation will not propagate.Here is an example:
fun (pt : #mut_point) -> let boxed = box pt in pt.x <- pt.x + #1l; (unbox boxed).x = pt.x
This function would always return
false
, because the point stored in the box is a copy ofpt
, not a reference topt
. In other words, unboxed values are passed by value, not by reference. Accordingly, every time an unboxed value is passed to or returned from a function, it is copied.Because of the potential for bugs around mutability within unboxed records, this proposal disallows it: in an unboxed record, any
mutable
keywords are ignored. Accordingly, the example above is rejected:pt.x <- ...
is disallowed, becausept
is an unboxed record.This section can be summarized easily:
- Fields of unboxed records are never mutable.
-
Fields within a mutable unboxed record within a boxed record. Example: if we have
t : t
(where the typet
is from mutation case 1, above), thent.center.x
is a field within a mutable unboxed record within a boxed record.The case of such a field is interesting in that an unboxed record placed within a boxed record does have a notion of identity: it lives at a particular place in the heap, and the surrounding boxed record might indeed be aliased. Thus mutation in this nested case does make good sense, and we surely want to support it.
We thus introduce a new syntax for updates of individual fields of a mutable unboxed record: We write e.g.
t.(.center.x) <- ...
. Note the parentheses: they denote that we're updating thex
sub-field within the mutablecenter
. The parentheses are required here, as is the leading.
. (The leading.
within the parentheses disambiguates the syntax with array access.)The rule is this: Consider the list of field accesses in a parenthesized field assignment. The first field must be mutable, and all fields in the list must be unboxed.
By the first field being mutable, we know that the record directly before the open-parenthesis is boxed, because unboxed records do not have mutable fields.
This allows
t.(.center.x) <- #3l
even thoughx
is not declared asmutable
. This does not break abstraction: the type ofcenter
must already be concrete in order to write this, and so a user can always writet.center <- {t.center with x = #3l}
instead (modulo the tearing restriction). In effect,t.(.center.x) <- #3l
can be seen as syntactic sugar fort.center <- #{ t.center with x = #3l }
, but allowed even ift.center
is too wide for the tearing restriction.Beyond just mutable fields in boxed records, this new syntax form extends to elements of mutable arrays. So if we have a
#pt array
namedarr
, users could writearr.(.(3).x) <- #4l
to update one field of an element of the array.
According to the descriptions above, the #
prefix on types is
not recursive. That is, the contents of a #t
are the same as
the contents of a t
-- it's just that the #t
is laid out in
memory differently. Let's explore the consequences of this aspect
of the design by example:
Consider the definitions (let's assume a 64-bit machine)
type t0 = { x : int32; y : int16 }
type t1 = { x : #int32; y : #int16 }
type t2 = { t1 : t1; z : #int16 }
type t3 = { t1 : #t1; z : #int16 }
type t4 = #{ t1 : #t1; z : #int16 }
The points below suggest a calling convention around passing arguments in registers. These comments are meant to apply both to passing arguments and returning values, and are up to discussion with back-end folks.
-
A value of type
t0
will be a pointer to a block of memory with 3 words: a header, a pointer to a boxedint32
forx
, and a word-sized immediate fory
. It is passed in one register (the pointer). -
A value of type
#t0
will comprise two words of memory: a pointer forx
, and an immediate fory
; it is passed in two registers to a function. -
A value of type
t1
will be a pointer to a block of memory with 2 words: a header and a word containingx
andy
packed together (precise layout within the word to be determined later, and likely machine-dependent). It is passed in one register (the pointer). -
A value of type
#t1
will comprise one word of memory, containingx
andy
packed together. When passed to a function, it will be passed as two registers, one for each component. -
A value of type
t2
does not exist: it would be pointer to a mixed block, containing a pointer to at1
structure and a non-GC word forz
.Alternative: (We do not plan to do this.) A value of type
t2
is a pointer to a block of memory with three words: a header, a pointer to at1
structure (as described above), and a tagged immediate word, 16 bits of which storez
. (In this design, az : #int64
would be rejected because there is no room for the tag.) -
A value of type
#t2
comprises two words: one is a pointer to at1
structure, and one containsz
. (Thez
word does not need to be tagged, as the GC will know not to look at it.) It may not be stored in memory; it is passed in two registers. -
A value of type
t3
is a pointer to a block of memory with 2 words: a header and one packed word containing 48 bits oft1
and 16 bits ofz
. It is passed in one register. -
A value of type
#t3
comprises one packed word in memory, containing all ofx
,y
, andz
, with no tag bit. It is passed in three registers, one each forx
,y
, andz
. -
The type
t4
is utterly identical to the type#t3
. -
The type
#t4
does not exist; it is an error, as the name#t4
is unbound.
The key takeaway here is that if a programmer wants tight packing, they have to
specify the #
in the definition of their types. Note that there is no way to
get from the definition of t0
or t2
to a tightly packed representation; use
t1
or t3
instead!
A separate takeaway here is that memory layout in our design is associative:
the memory layout does not depend on how the type definitions are
structured. This is in contrast to C struct
s, where each sub-struct
must be
appropriately aligned and padded. For example, the C translation of this example
is
struct t3 {
struct t1 { int32 x; int16 y; } t1;
int16 z;
};
Yet this would take 2 words in memory, as t1
would be padded in order to be
32-bit aligned.
We wish for unboxed records (and tuples, to which this discussion equally applies) to be packed as tightly as possible when stored in memory. (This dense packing does not apply when an unboxed record is stored in local variables, as it may be more efficient to store components in separate registers.)
This section of the proposal describes a possible approach to tight memory packing that supports reordering. This section is more hypothetical than others, and we type theorists defer to back-end experts if they disagree here. The section ends with a few user-facing design conclusions.
The key example is
type t5 = { t1 : #t1; a : int; z : #int16 }
Note that now, we have an a
between the #t1
and the #int16
.
A bad idea would be to tightly pack the int
against the #t1
, like this (not
to scale):
63 31 0 63 31 0
xxxxxxxxxyyyyyaaaaaa aaaaaaaaaaaaaaazzzz
Note that the 64 bits of a
are spread across two words. This would make
operations on a
very expensive. Just say "no"!
Instead, we insist that a
is word-aligned. We might thus imagine
63 31 0 63 31 0 63 31 0
xxxxxxxxxyyyyy000000 aaaaaaaaaaaaaaaaaaa zzzzz0000000000000000
where 0
denotes padding. That works, but it's wasteful. Instead, we do
63 31 0 63 31 0
xxxxxxxxxyyyyyzzzzzz aaaaaaaaaaaaaaaaaaa
Everything is aligned nicely, and there's no waste. The only drawback is that we have reordered.
In general, we reserve the right to reorder components in an unboxed tuple or
record in order to reduce padding. With demand, we could imagine introducing
a way for programmers to request that we do not reorder. (An unboxed type that
does not participate in reordering would have a different layout from one that
does participate in reordering, meaning essentially a new layout former, such
as &
, analogous to *
.)
However, the reordering is not encoded in the layout system. Imagine now
type t6 = { t1 : #t1; z : #int16; a : int }
This t6
is the same as t5
, but with fields in a different order. We have
t5 : (bits32 * bits16) * immediate * bits16
t6 : (bits32 * bits16) * bits16 * immediate
Accordingly, a type variable that ranges over t5
would not also be able to
range over t6
: the two have different layouts. We could imagine a very
fancy layout equivalence relation that would detect the reordering here and
say that t5
's layout equals t6
's; we choose not to do this, for several
reasons:
- Encoding reordering in the layout system potentially constrains us in the future, in case we wish to do fewer reorderings.
- This significantly complicates layouts, for little perceived benefit.
- Different architectures may benefit from different reorderings; we do not want the type system to depend on the architecture.
Accordingly, the reordering of physical layout is mostly undetectable by users:
they just get a more compact running program. The way to detect the reordering
is by either inspecting memory manually (of course) or by sending unboxed
records through a foreign function. In order to support foreign functions, we
will add an interaction with ocamlc
that produces a C header that offers
functions that extract and set the various fields of an unboxed tuple. Foreign
code could then use this header to be sure that it interacts with our reordering
algorithm correctly.
Existing OCaml programs may have foreign interfaces that rely on a certain
ordering of record fields. The reordering story need not disrupt this. To wit,
we promise that, as we work out the details of reordering, we never reorder
fields in records (or tuples) where all fields have layout value
. (This
includes all types expressible before our change.)
Going further, and imagining possibly lifting the mixed-block restriction, we can imagine the following rule:
- (Hypothetical) In any record or tuple type, its fields are ordered in memory
with all
value
s first, in the order given in the declaration, followed by non-value
s, in an implementation-defined order.
This rule actually contradicts the layout of t5
above, which would put a
(a
value
, because immediate
s are value
s) first. However, we make no promises
about the hypothetical rule and include it here just as a possible way forward.
Unboxed variants pose a particular challenge, because the boxed representation is quite different than the unboxed representation.
Consider
type t = K1 of int * bool | K2 of #float * #int32 | K3 of #int32 | K4
An unboxed representation of this would have to essentially be like a C
union with a tag. This particular #t
would use registers as follows for
argument passing:
bits2 (* tag information *)
immediate (* for the int *)
immediate (* for the bool *)
float64 (* for the #float *)
bits32 (* for both #int32s, shared across constructors *)
No matter which variant is chosen, this register format works. This is important
because the layout of an unboxed type must describe its register
requirements, regardless of what constructor is used. We thus use a new
layout former to describe unboxed variants, +
. That is, the layout of t
would
be (immediate * immediate) + (float64 * bits32) + bits32 + void
.
By specifying only the
layouts of the variants' fields -- not the actual register format itself -- we leave
abstract the actual algorithm for determining the register format.
(See "Aside" below for discussion on this point.)
One challenge in working with unboxed variants is that it may be hard for the
programmer to predict the size of the unboxed variant. That is, a programmer
might have a variant v
and then think that #v
will be more performant than
v
. However, #v
might be wider than even the widest single variant of v
and
thus actually be less efficient. (Of course, this is always true: we should
always test before assuming that e.g. unboxing improves performance.) As we
implement, we should keep in mind producing a mechanism where programmers can
discover the memory layout of an unboxed variant, so they can make an informed
decision as to its usage.
In contrast to the fixed register format above, a boxed variant needs only enough memory to store the fields for one particular constructor. That's because boxed variants get allocated specifically for one constructor -- there is no requirement that all variants have the same size and layout.
Despite the challenges here, box
can still work to convert an unboxed variant
to a boxed one: box
simply understands the +
layout form to mean
alternatives,
just as variants have always worked.
We thus extend the treatment of unboxed records to work analogously for unboxed variants. That is, we treat a definition of a boxed variant
type t = K1 of ty1 | K2 of ty2 | ...
to really mean
type #t = #( K1 of ty1 | K2 of ty2 | ... )
type t = #t box
We then additionally add magic to box
to make this change transparent to users:
- A boxed unboxed variant may be constructed and pattern-matched against as
if the box were not there. Following our example,
K1
andK2
could construct values of typet
, and values of typet
could be matched against constructorsK1
andK2
.
Depending on how an unboxed variant ends up in memory, it has one of three possible representations:
-
If an unboxed variant is
box
ed, then it has the same representation as today's variants: a block including a tag and the fields of the particular constructor (only). -
If an unboxed variant is part of a boxed product (i.e. record, tuple, or array), then it takes up exactly as much space as needed to store the tag and the widest constructor. Here is the example:
type t = K1 of int * string * string | K2 of bool * bool | K3 of string type t2 = { f1 : float; f2 : #t; f3 : string }
A value of type
t2
will be a pointer to a block arranged as follows:word 0: header word 1: f1, a pointer to a boxed float word 2: tag for f2 word 3: K1's int (immediate) or K2's bool (immediate) or K3's string (pointer) word 4: K1's string (pointer) or K2's bool (immediate) word 5: K1's string (pointer) word 6: f3, a pointer to a string
Note that the
#t
takes up 4 words, regardless of the choice of constructor. This fixed-width representation is needed in order to givef3
a fixed offset from the beginning of the record, which makes accesses off3
more efficient. Imagining a variable-width encoding requires examining the tag of the variant in order to address later fields; this becomes untenable if a record has multiple inlined variants. (We can think of#t
here more as an inlined variant than an unboxed one.) -
If an unboxed variant is part of an outer variant, we essentially inline the inner unboxed variant. The section on "Nesting", below, covers this case.
Just as we have done for record fields, we consider the constructor for the
boxed variant to shadow the constructor for the unboxed one. That is, writing
K1 blah
will construct a t
. However, if we already know that the expected
type of K1 blah
is #t
, then we use type-directed disambiguation to discover
that the K1
is meant to refer to the unboxed variant, not the boxed
one. Echoing the design for record fields, we can use a #
prefix to
disambiguate manually: #K1 blah
unambiguously creates a #t
.
We can naturally nest unboxed variants, just like we did with unboxed records.
Just as before, the #
mark is not recursive. Also just as before, we can
gain extra efficiency by cleverly packing one variant inside of another. Let's
explore an example:
type t = K1 of int * bool | K2 of #float * #int32 | K3 of #int32 | K4
(* as above *)
type t2 = K2_1 of #int32 | K2_2 of #float * #float | K2_3 of #t
The t
is unedited from above, but the t2
definition contains a #t
.
We imagine the following register format for t2
:
bits3 (* for the combined tag *)
immediate (* for K1's int *)
immediate (* for K1's bool *)
float64 (* for K2's #float and K2_2's first #float *)
float64 (* for K2_2's second #float *)
bits32 (* for K2's, K3's, and K2_1's #int32s #)
A few observations to note about this format:
- There is no part of this arrangement that matches the arrangement for
t
; indeed, it is not necessary to efficiently get from at2
to at
: when the user asked to unboxt
in the definition oft2
, they gave up on this possibility. - If the
#t
component ofK2_3
is, say, passed to another function, its components will have to be copied into new registers, so that the function can extract the pieces it needs. - If the
#t
component ofK2_3
is, say, matched against, the matcher can simply remember an offset when looking at the tag. In our case, we might imagine that tags0
and1
correspond toK2_1
andK2_2
, with tags2
through5
corresponding to the constructors oft
. Then, a match ont
would simply subtract 2 from the tag before comparing againstt
's constructors.
Note that the description above around boxing applies to nested variants, too,
because the layout of #t2
is
bits32 + (float64 * float64) + (immediate * immediate) +
(float64 * bits32) + bits32 + void
. Each summand is interpreted as a variant by box
, and thus the in-memory
representation does not need to take any extra space. Spelling this out, a
#t2 box
would be either be the tagged immediate for 0
(representing
K2_3 K4
) or a pointer to a block containing a header and the following fields
tag | fields | represents
===================================================
0 | bits32 | K2_1 x
1 | float64 ; float64 | K2_2 (x, y)
2 | immediate ; immediate | K2_3 (K1 (x, y))
3 | float64 ; bits32 | K2_3 (K2 (x, y))
4 | bits32 | K2_3 (K3 x)
As in the discussions above about low-level details: it's all
subject to change. The key observation here is that unboxing a variant
effectively inlines it within another variant. Put another way, we want the
concrete use of memory and registers for a type to be unaffected by the choice
of how the type is structured (that is, ignore the associativity choices of +
,
much like we have ignored the associativity choices of *
).
The section above describes a fresh layout constructor +
for unboxed
variants. However, in memory, an unboxed variant is laid out just like
an unboxed tuple, so we could, in theory, use the same layout constructor
for both. Under this idea, the t
example above would have layout
immediate * immediate * float * int32 * int2
.
However, we do not adopt this design for (at least) these reasons:
-
Encoding the unboxed-variant layout algorithm in the type system seems fragile in the face of possible future improvements/changes. Maybe we can come up with a better way of packing certain fields in the future, and it would thus be a shame if such a change broke layout checking.
-
An unboxed product type of that kind can reasonably be the type of a mutable record field. However, the variant case can't. That's because racing writes of an unboxed variant type risk memory safety -- the fields from one constructor might win the race, whilst the tag from the other constructor wins.
-
The most natural representation for including an unboxed sum type within a boxed sum type is actually to add aditional constructors to the boxed sum type -- dual to how unboxing a product into a product adds additional fields. In other words, we would store the tag of the nested unboxed sum type as part of the tag of the enclosing boxed sum type.
That representation also gives an additional excuse for not allowing mutable unboxed variant fields.
Note that the kind of representation described above is still available using GADTs to make the tag fields explicit. For example:
type foo = { a : int;
unused1 : unit;
unused2 : unit; }
type bar = { b : float;
c : string;
unused : unit; }
type baz = { d : string;
e : string;
f : string; }
type ('a : value * value * value) tag =
| Foo : #foo tag
| Bar : #bar tag
| Baz : #baz tag
type t =
T : { g : float;
tag : 'a tag
data : 'a } -> t
The design notes below here are about the details of making unboxed types play nicely with the rest of the type system.
The key problem is that we must now somehow figure out the layouts
of all type variables, including those in type declarations and
in value descriptions. The Availability principle suggests that
we should infer layouts as far as is possible. Yet the Backward
compatibility principle tells us that we should prefer the
value
layout over other layouts when there is a choice.
We thus operate with the following general design:
-
The layout of a rigid variable is itself rigid: it must be supplied at the type variable's binding site.
-
The layout of a flexible (unification) variable is itself flexible: it can be inferred from usage sites. If we still do not know the layout of a variable when we have finished processing a compilation unit, we default it to
value
.
(Why wait for the whole compilation unit? Because we might get critical information in an mli file, seen only at the very end of an entire file.)
We retain principal types, but we don't have principal layouts: for any expression, there's a best type for any given layout, but there's no best layout (or at least, the best layout isn't always compilable).
Programmers may also specify a layout, using the familiar :
syntax
for type ascriptions. A layout may be given wherever a type appears.
For example, these declarations are accepted:
type ('a : value, 'b : immediate) t = Foo of 'a * 'b
val fn : ('a : immediate) . 'a ref -> 'a -> unit
val fn2 : 'a ref -> ('a : immediate) -> unit (* NB: not on declaration of 'a *)
val fn3 : (unit : immediate) -> unit (* NB: layout ascription on type, not variable *)
type ('a : bits32 * bits32) t2 = #{ x : #float; stuff : 'a }
val fn4 : 'a t2 -> 'a t2 (* layout of fn4 is inferred *)
type t5 : bits32 (* layout ascription on an abstract type *)
val fn5 : 'a -> 'a (* NB: 'a is defaulted to have layout `value` *)
It may seem a little weird here to allow abstraction over types of
layout bits32
. After all, there's only one such type (#int32
).
However, this ability becomes valuable when combined with abstract
types: two different modules can expose two different abstract types
M.id
, N.id
, both representing opaque 32-bit identifiers. By
exposing them as abstract types of layout #int32
, these modules can
advertise the fact that the types are represented in four bytes
(allowing clients of these modules to store them in a memory-efficient
way), yet still prevent clients from mixing up IDs from different
modules or doing arithmetic on IDs.
A concrete layout actually encodes three different properties of a type: how
the type is represented in memory (how many bits the type needs
and what kind of register should hold the value), whether the GC will
be able to successfully inspect values of the type (is the type gc_friendly
?),
and whether the GC can ignore values of the type (is the type gc_ignorable
?).
Here are the possible representations:
value_rep
word_rep
void_rep
bits8_rep
bits16_rep
bits32_rep
bits64_rep
float_rep
rep1 * rep2
rep1 + rep2
We now define several meta-functions, defined only over concrete layouts:
rep : concrete_layout -> representation
rep(value) = value_rep
rep(immediate) = value_rep
rep(bits_n) = bits_n_rep
rep(word) = word_rep
rep(float64) = float_rep
rep(lay1 * lay2) = rep(lay1) * rep(lay2)
rep(lay1 + lay2) = rep(lay1) + rep(lay2)
gc_friendly : concrete_layout -> bool
gc_friendly(value) = true
gc_friendly(immediate) = true
gc_friendly(void) = true
gc_friendly(lay1 * lay2) = gc_friendly(lay1) && gc_friendly(lay2)
gc_friendly(lay1 + lay2) = gc_friendly(lay1) && gc_friendly(lay2)
gc_friendly(_) = false
gc_ignorable : concrete_layout -> bool
gc_ignorable(value) = false
gc_ignorable(lay1 * lay2) = gc_ignorable(lay1) && gc_ignorable(lay2)
gc_ignorable(lay1 + lay2) = gc_ignorable(lay1) && gc_ignorable(lay2)
gc_ignorable(_) = true
These meta-functions respect subtyping among concrete layouts, as stated in the following properties:
- If
l1 <= l2
, thenrep(l1) = rep(l2)
. - If
l1 <= l2
, thengc_friendly(l2) ⇒ gc_friendly(l1)
(where⇒
denotes logical implication). - If
l1 <= l2
, thengc_ignorable(l2) ⇒ gc_ignorable(l1)
.
In addition, these definitions allow us to derive the mixed-block restriction. It would definitely be problematic to have a layout that is both not gc-friendly and not gc-ignorable: the GC can't ignore it, but also can't look at it. No primitive layouts have this undesirable property. But mixed blocks do! And so it had better be the case that we never build them.
Every concrete type has a most accurate (i.e. lowest) layout. I think it is reasonable in this system to consider this as its "true" layout. A statement of the form:
t : l
Does not say that type t
has l
as its true layout, merely
that l
is an upper bound on the true layout of t
. In
particular, a unification variable with kind l
:
'a : l
stands for some type that we haven't worked out yet, whose true
layout has l
as an upper bound.
That's why, if we have:
'a : la
'b : lb
'a = 'b
then, even though the true layout of 'a
must be equal to the
true layout of 'b
, it is not the case that la
must be equal
to lb
. What is true is that the true layout of 'a
and 'b
must
be a sublayout of both la
and lb
. That means it must be a
sublayout of the greatest lower bound of la
and lb
--
written la ⊓ lb
. Thus a most general solution for these
constraints is to have:
'c : la ⊓ lb
and substitute 'c
for both 'a
and 'b
.
(Equivalently, we can substitute 'a
for 'b
and replace 'a
s
layout with lb ⊓ la
, which is how we actually implement
it. But it is simpler to think of the layout of a
variable as immutable, and think of variable-variable
unification as using a third variable.)
lb ⊓ la
is commutative and defined in the following cases (omitting symmetric
cases):
- If
lb
isany
, thenlb ⊓ la = la
. - If
lb
isgc_friendly
andla <= gc_friendly
, thenlb ⊓ la = la
. - If
lb
isgc_ignorable
andla <= gc_ignorable
, thenlb ⊓ la = la
. - If
lb
isgc_friendly
andla
isgc_ignorable
, thenlb ⊓ la = void
. - Otherwise, we require
rep(lb) = rep(la)
. Assumingrep(lb) = rep(la)
:- If
lb = la
, thenlb ⊓ la = la
. - If
lb = value
, thenlb ⊓ la = la
.
- If
(Correctness condition: for all layouts lb
and la
, lb ⊓ la <= lb
, lb ⊓ la <= la
, and for all layouts lc
such that lc <= lb
and lc <= la
, lc <= lb ⊓ la
.)
(Note that if immediate <= word
, gc_friendly ⊓ gc_ignorable
would not
exist and this algorithm would need to change. Furthermore, the definition would
become incomplete, because value ⊓ word
would exist -- it would be
immediate
-- and yet the definition above would not compute this.)
Because of the rep(lb) = rep(la)
restriction, we must track this as a new form
of constraint, arising whenever we must compute lb ⊓ la
.
Similarly, if we have:
'a : la
'a = t
We can solve that by substituting t
for 'a
, but this generates
an additional constraint that:
t : la
This is equivalent to:
l <= la
where l
is the best (lowest) layout we have available for t
.
The key to handling these additional forms of constraints is the observation that: When creating a unification variable for a type we are inferring, there is no requirement that we also infer its true layout. We are already inferring the type, and the type determines what the true layout is anyway. Instead we only need to provide an appropriate upper bound on that layout.
This means that we do not need unification variables that range
over indeterminate layouts. Instead
we can just assign fresh type unification variables the upper bound
any
:
'a : any
Note that, given such an 'a
, we can solve any constraint of the
form
'a = t
by substituting t
for 'a
because
t : any
holds for all t
.
Similarly, we do not actually need unification variables that range over concrete layouts either. Instead we can use unification variables that range over representations. When we know that a type needs to have a concrete layout it is sufficient to create a type unification variable the following upper bound:
'a : upper_bound('r)
where 'r is a representation variable and upper_bound(r)
is a function that
maps a representation to the layout that is the upper bound of all the layouts
with that representation:
upper_bound(value_rep) = value
upper_bound(word_rep) = word
upper_bound(bits_n) = bits_n
upper_bound(float_rep) = float64
upper_bound(rep1 * rep2) = upper_bound(rep1) * upper_bound(rep2)
upper_bound(rep1 + rep2) = upper_bound(rep1) + upper_bound(rep2)
(Correctness property: for all concrete layouts l
, l <= upper_bound(rep(l))
.)
This restricts the type to have a concrete layout, whilst still allow its representation to be discovered by inference.
With all this in place, we can now trivially solve all those additional constraints.
rep(upper_bound(ra)) = rep(upper_bound(rb))
is solved by just:
ra = rb
and
l <= upper_bound(r)
is solved by:
l ≠ any /\ rep(l) = r
meaning that the only layout related non-trivial constraints in the system are equality constraints on representation variables, which can be handled by ordinary undirected unification.
When we invent a new unification variable with an unknown layout,
how should we discover its layout? We must be careful, because some
unification variables will be required to have known, concrete layouts;
while others will be allowed to have any
. Let's look at
some examples:
let f x = x
let rec g : unit -> 'a -> 'a = fun () -> g ()
In f
, we guess the type for x
to be some type 'a
. We also
must create a representation unification variable 'r
, where
upper_bound('r)
is the layout of 'a
. The codomain of the upper_bound
meta-function
is only the concrete layouts, so we are guaranteed to infer a concrete
layout.
In g
, we must also infer the layout for 'a
. However, here,
because there is no place whether the layout is restricted to be
concrete, we just say 'a : any
. If something were to constrain
the layout, we would end up unifying 'a
with a type with a more
restrictive layout.
Let's now return to f
and g
and see how this all plays out:
f
:
-
We spot that
f
is a function of one variable. We thus say thatf : 'a -> 'b
for fresh variables'a
and'b
. We assign layoutupper_bound('ra)
to'a
andany
to'b
, for a fresh unification variable'ra
. Note that, because'b
is the return type, we do not require a concrete layout for it. -
We see that the return value is
x
. This forces us to unify'b
with'a
. We now have this unification problem:('b : any) ~=~ ('a : upper_bound('ra))
As described above, we solve this by inventing
'c : any ⊓ upper_bound('ra)
and filling both 'b
and
'a
with 'c
. The layout here quickly simplifies to upper_bound('ra)
.
- At this point there is no more information about
f
we can use. We see it has type'a -> 'a
, where'a : upper_bound('ra)
. Because'a
is not free in the environment (or, equivalently, it has a level higher than the surrounding context) we can generalize it. But we still have a problem with'ra
and'ga
; our type system does not support layout polymorphism, so we cannot generalize these variables. Instead, we make them into weakly polymorphic variables. When we finish processing the top-level definition containingf
(which might just bef
itself), we will default'ra := word
and'ga := must_gc
if we have not already unified them.
The final type for f
is thus ('a : value). 'a -> 'a
, just as it
always has been.
g
:
-
We have a type signature for
g
, but it mentions a unification variable'a
. We choose'a : any
. -
Processing the body of
g
doesn't yield anything interesting, and so we assigng
the type('a : any). unit -> 'a -> 'a
, which is strictly more general than the typeg
is assigned today.
Thanks to how types are constructed in OCaml, we need no separate well-formedness check. Suppose we're given the following type:
type ('a : immediate) t = ...
and the user attempts to write string t
. The syntax string t
is
converted to a type expression by first constructing a template with a
fresh unification variable 'b t
, and then unifying 'b
with
string
. Here, since 'b
ranges over layout immediate
, the
unification with string
will fail.
One alternative idea we considered was to make #
a (partial)
operator on the type algebra, computing an unboxed version of
a type, given its declaration. This ran aground fairly quickly,
though.
One problem is that OCaml doesn't have the ability to extract
a type argument from a composite type (like to get the int
from int list
). So viewing #
as the inverse of box
doesn't
quite work. (Richard thought for a little bit that
type 'a unlist = 'b constraint 'a = 'b list
did this unwrapping,
but it doesn't: it just requires that the argument to unlist
be
statically a list
type; this is not powerful enough for our needs,
because it would reject 'a. ... # 'a ...
, which would be part
of this more general operator's purview.)
Another problem is that #
would have to have an elaborate type:
it would need to be something like # : Pi (t : value) -> t layout_of_unboxed_version_of
,
necessitating the definition of layout_of_unboxed_version_of
.
This is a dependent type. Dependent types are cool and all, but
unboxed types seem like a poor motivation for them.
Today, OCaml users expect a line like let x = y
to be quick, copying a word of
data and requiring ~1 instruction. However, if y
is a wide unboxed record,
this assignment might take many individual copy operations and be unexpectedly
slow. The same problem exists when an unboxed record is passed as a function
argument. Perhaps worse, this kind of copy can happen invisibly, as part of
closure capture. Example:
type t = { a : int; b : int; c : int; d : int; e : int }
let f (t : #t) =
let g n = frob (n+1) t in
...
Building the closure for g
requires copying the five fields of t
.
We could imagine a restriction in the language stopping such copies from happening, or requiring extra syntax to signal them. However, we hope that there will not be too many wide unboxed records, and copying a few fields really isn't so slow. So, on balance, we do not think this is necessary.
We can imagine an extension where
the layout of (p, q) r
could in general depend on
the layouts of p
and q
. For instance:
type ('a : value) id = 'a
Since 'a : value
, we know that 'a id
is always of layout value
,
but if 'a = int
it may also have the more specific layout
immediate
.
For each type constructor (_, _) r
in the environment, we store
along with it a layout K
which is an upper bound for the layout of
any instance of r
. (For 'a id
above, this is value
). When
checking (p, q) r : L
, we compare L
and K
. If K
is a subtype
of L
, then the check passes. However, if the check fails and r
is
an alias, we must expand r
and try again. We only report a
unification failure once the check fails on a type which is not an
alias.
For example, suppose we have the unification variables 'a : value
and 'b : immediate
, and we're trying to unify 'a id = 'b
. We need
to check 'a id : immediate
, but the environment tells us that the
layout of _ id
is value
. So, we expand 'a id
to 'a
and retry
unification, which succeeds after unifying 'a
and 'b
(the
resulting type variable has layout immediate
). The effect here is
that after unifying 'a id
with 'b
, we leave 'a
as an
uninstantiated unification variable, but its layout is restricted to
immediate
, to match 'b
.
The current design describes that we require users to tell us the layout of rigid variables but that we will infer the layout of flexible variables. However, it is possible to do better, inferring the layouts even of rigid variables. For example:
let f1 : ('a : float). 'a -> 'a = fun x -> x
let f2 : 'a. 'a -> 'a = fun y -> f1 y (* 'a inferred to be a float *)
module type T = sig
type ('a : float) t1
type t2
type t3 = t2 t1 (* this forces t2 : float *)
end
This can be done simply by using unification variables for the layouts
of even rigid type variables; if we don't learn anything interesting
about the layout within the top-level definition, we default to
value
, as elsewhere.
It would be nice to add this rule to the box
magic:
- If
ty : value
, thenty box = ty
. There is no reason to box something already boxed. Note that this equality depends only on the layout ofty
, not thety
itself. For a given layout of the argument,box
is parametric: it treats two types of the same layout identically.
This is hard, because at the time we see ty box
, it might be the case
that we haven't yet figured out the layout for ty
, meaning we'd have
to continue inference, perhaps figure out the layout, and then return
to ty box
and perhaps simplify it to ty
. This is possible (GHC
does it), but it would be breaking new ground for OCaml, and so should
be strongly motivated. So we hold off for now, noting that adding this
feature later will be backward compatible.
The main proposal describes that a declaration like
type t = { x : int; y : float }
would really be shorthand for
type #t = #{ x : int; y : float }
type t = #t box
and that the expression e.g. { x = 5; y = 3.14 }
is shorthand
for box #{ x = 5; y = 3.14 }
, along with similar shorthand for
record projection and variants.
However, we could have a flag, say -strict-boxes
, that removes
the auto-boxing and auto-unboxing magic. This would make record-construction
syntax like { x = 5; y = 3.14 }
an error, requiring a manual call
to box
to package up an unboxed record. The advantage to this
is that programmers who want to be aware of allocations get more
compiler support. A downside (other than the verbosity) is that -strict-boxes
would still not prevent all allocation: any feature that doesn't go
via box
might still allocate, including closures.
Semantically, immediate <= word
makes good sense. To see why,
we start by observing we can collapse the value_rep
and word_rep
representations: after all, both value
s and word
s take up the
same space in memory. (Their relationship to the garbage collector
is different, but we track that separately, not in the representation.)
We can
see this in the following table:
| representation | gc_friendly | gc_ignorable
----------+----------------+-------------+-------------
immediate | word_rep | true | true
value | word_rep | true | false
word | word_rep | false | true
Just as immediate <= value
works, immediate <= word
works:
immediate
is really the combination of value
(with its
gc-friendliness) and word
(with its gc-ignorableness).
However, we hold off on adding immediate <= word
to our type
system for now, because it adds complication without anyone
asking for it.
Here are some complications introduced:
-
It is very convenient to reduce computing
la ⊓ lb
to a constraintrep(la) = rep(lb)
. In order to supportvalue ⊓ word = immediate
, we would thus drop thevalue_rep
representation, settingrep(value) = word_rep
. -
The
upper_bound
function we use in type inference then becomes impossible to define: neithervalue
norword
is an upper bound for the other, and soupper_bound(word_rep)
has no definition. We thus have to design a different type inference algorithm than the one described above, likely tracking representation, gc-friendliness, and gc-ignorability separately. This would not be all that hard, but it's definitely more complicated than the algorithm proposed here. -
We sometimes might infer
immediate
even when no function nearby has anything of layoutimmediate
:val f1 : ('a : value). 'a -> ... val f2 : ('a : word). 'a -> ... let f3 x = (f1 x, f2 x)
Here, we need to infer that the
x
inf3
has layoutimmediate
, even though neitherf1
norf2
mentionsimmediate
. This might be unexpected by users.
This proposal forbids mutability within an unboxed record (unless that record is itself contained within a boxed record). This decision is made to reduce the possibility of confusion and bugs, not because mutability within unboxed records is unsound. Here, we consider some reasons that push against the decision above and include some examples illustrating the challenges.
Reasons we might want mutability within unboxed types:
-
The designer of a type may choose to have some fields be mutable while other fields would not be. For example, our
#mut_point
type allows the location of the point to change, while the identity of the point must remain the same. -
OCaml programmers expect to be able to update their mutable fields, and it may cause confusion if a programmer cannot, say, update the
x
field of a#mut_point
.
The reason we decided not to allow mutability within unboxed types is because of examples like this:
type mut_point = { mutable x : #int32; mutable y : #int32; identity : int }
let () =
let p : #mut_point = #{ x = #4l; y = #5l; identity = 0 } in
let foo () = print_uint32 p.x in
foo ();
p.x <- #10l;
foo ()
Because the closure for foo
captures that unboxed record p
(by copying p
),
this code prints 4 twice, rather confusingly. Yet the current design does not
fully rule such a confusion out, as we see here:
type point = { x : int; y : int }
type t = { mutable pt : #point }
let () =
let t : t = { pt = #{ x = 5; y = 6 } } in
let pt = t.pt in
let foo () = print_int pt.x in
foo ();
t.(.pt.x) <- 7;
foo ()
This will print 5 twice, because the unboxed t.pt
is copied into the local
variable pt
. Yet the boxed version of this (with a mutable x
and no hash
marks) prints 5 and then 7. This is a key motivation for putting the parentheses
in the t.(.pt.x)
syntax: it forces the programmer to think about the fact that
they are mutating pt.x
, not just x
. Accordingly, they should expect pt
not
to be changed (even if pt
were an alias -- which it's not). Without the
parentheses there, a user might think they're mutating x
and be surprised.
Here's yet another example:
type t2 = { x : #int32; y : #int32 }
type t1 = { mutable t2 : #t2; other : int }
fun (t1 : t1) ->
let other_t1 : t1 = {t1 with other = 5} in
t1.(.t2.x) <- #10l;
other_t1.t2.x = t1.t2.x
Assuming t1.t2.x
doesn't start as #10l
, this will return false
. But this
should not come as a surprise: t1.t2
is copied as part of the functional
record update, and thus mutating t1.t2
(as part of t1.(.t2.x) <- #10l
) changes
it. If, on the other hand, we did not have the parentheses, users might expect
other_t1
's x
to be changed when t1
's x
gets changed.
We wondered at one point whether we could support int8 <= int64
in
the subtyping relation. After all, a function expecting an int64
argument can indeed deal with an int8
one. The problem would be
collections. That is, if we have an #int8 array
(for a magical
array
type that can deal with multiple layouts, like box
), how
would we index into it? Is it a #int8
-as-64-bits or a plain #int8
in there? The way to tell is to have a second argument to array
that denotes the layout. Richard thinks this could work out in the
end, but probably would still be a bad idea.
The mixed-block restriction tells us that there are really two forms of box
:
one for gc-friendly boxes and one for gc-ignorable boxes. We briefly thought
about having two different primitives box_gc
and box_non_gc
(and no box
function), where users would choose the one they wanted. The type system would
ensure that the layout of the argument is appropriate for the type of box
created.
This plan would be good because it would simplify inference: currently, every
use of box
imposes a requirement that the layout of the argument is either
gc-friendly or gc-ignorable. But "or" requirements are annoying: we can't
simplify or usefully propagate the requirement until we know which branch of the
"or" we wish to take. If the user specified which kind of box they want, then
we don't have the "or" any more.
However, this plan doesn't work very well, because of auto-boxing. That is, if we have something like
let x = ... in
let y = ... in
x, y
what kind of box should we use when constructing the tuple? It depends on the
type of x
and y
-- or rather on the layouts of those types. At the point
where our boxing magic inserts the call to box
, we haven't yet completed type
inference, and so we can't know which kind of box
to insert. In the end, we've
just re-created the challenge with "or" described above. So there seems to be
little incentive to have the two different boxes explicitly anywhere.