Unboxing of data types using high-bit pointer tagging #149

Merged 5 commits on Nov 16, 2023

@melsman melsman commented Nov 15, 2023

This PR implements an approach to unboxing that uses the 16 most-significant bits of 64-bit pointers to tag constructor values. The approach allows two levels of unboxing: the outer level uses the high bits of pointers and the inner level uses the low bits. For instance, the following datatype declaration is implemented with unboxed lists and unboxed trees:

    datatype t = Leaf of string | Empty | Children of t list

Here it is recognised that the arguments to t's unary constructors are either boxed (string) or use only the lower bits for tagging (t list). The higher bits are thus available for discriminating between the three constructors.

Notice that the scheme is still based on a uniform representation of values (64-bit words).
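To make the bit layout concrete, here is a minimal sketch of high-bit tagging on a 64-bit word. It is illustrative only and not taken from the PR: the function names, the tag numbers, and the exact placement of the tag in bits 48..63 are assumptions, and Python integers stand in for machine words.

```python
TAG_BITS = 16
TAG_SHIFT = 64 - TAG_BITS             # tag occupies bits 48..63 (assumed layout)
PAYLOAD_MASK = (1 << TAG_SHIFT) - 1   # low 48 bits: pointer or low-bit-tagged value

# Hypothetical tag numbers for the unary constructors of t.
TAG_LEAF = 0
TAG_CHILDREN = 1

def tag_high(payload: int, tag: int) -> int:
    """Place a constructor tag in the 16 most-significant bits of a word."""
    if not 0 <= tag < (1 << TAG_BITS):
        raise ValueError("tag must fit in 16 bits")
    if payload & ~PAYLOAD_MASK:
        raise ValueError("payload must leave the high 16 bits free")
    return (tag << TAG_SHIFT) | payload

def high_tag(word: int) -> int:
    """Read the constructor tag back from the high bits."""
    return (word >> TAG_SHIFT) & ((1 << TAG_BITS) - 1)

def strip_high_tag(word: int) -> int:
    """Recover the payload; its low bits, including any inner tag, are untouched."""
    return word & PAYLOAD_MASK
```

Because the tag lives entirely in the high bits, the result is still a single 64-bit word, and any low-bit tag carried by the payload survives a round trip through `tag_high` and `strip_high_tag`.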

GC note: For the garbage collector to work, it is important that nullary constructors are represented with the least significant bit set, to distinguish them from unary constructors.
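The GC invariant can be sketched as follows. This is a hypothetical encoding, not the one used by the runtime: it assumes heap pointers are at least 2-byte aligned (so their least significant bit is clear) and that a nullary constructor is stored as its index shifted left by one with the least significant bit set. The function names are invented for illustration.

```python
def encode_nullary(index: int) -> int:
    """Hypothetical encoding: constructor index in the upper bits, lsb set."""
    return (index << 1) | 1

def is_nullary(word: int) -> bool:
    """A set least significant bit marks a nullary constructor."""
    return (word & 1) == 1

def gc_should_trace(word: int) -> bool:
    """Under this convention a collector only follows words whose lsb is clear."""
    return not is_nullary(word)
```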

There is a series of good examples where this optimisation turns out to be effective.

The implemented algorithm for determining the boxity (i.e., UNB_LOW, UNB_ALL, BOXED, ENUM, SINGLE) of a datatype attempts to find a boxity fixpoint involving all simultaneously declared datatypes. The algorithm starts with an optimistic guess for each datatype and checks if the guess is valid according to a series of rules:

  1. No unary constructor of an UNB_ALL datatype can take an argument whose representation uses all 64 bits.
  2. No UNB_LOW datatype can have more than one unary constructor.
  3. Unary constructors of UNB_LOW datatypes must take boxed arguments.

If a rule is violated, a new attempt is made with less optimistic assumptions.
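The guess-and-check loop can be sketched as below. This is a simplified model under several assumptions, not the compiler's implementation: it covers only UNB_LOW, UNB_ALL and BOXED (ENUM and SINGLE are left out because their rules are not stated here), it assumes the fallback order UNB_LOW, then UNB_ALL, then BOXED, and it demotes each datatype individually. The `Datatype` record and the argument kinds `"boxed"`, `"low"` and `"full"` are hypothetical names.

```python
from dataclasses import dataclass, field

# Hypothetical classification of a unary constructor's argument.
BOXED_ARG, LOW_ARG, FULL_ARG = "boxed", "low", "full"

UNB_LOW, UNB_ALL, BOXED = "UNB_LOW", "UNB_ALL", "BOXED"
FALLBACK = {UNB_LOW: UNB_ALL, UNB_ALL: BOXED}   # assumed demotion order

@dataclass
class Datatype:
    name: str
    # One entry per unary constructor: an argument kind, or the name of
    # another datatype declared in the same group.
    unary_args: list = field(default_factory=list)

def arg_kind(arg, guess):
    """Resolve an argument to the bits it uses under the current guess."""
    if arg in guess:
        return {UNB_LOW: LOW_ARG, UNB_ALL: FULL_ARG, BOXED: BOXED_ARG}[guess[arg]]
    return arg

def is_valid(dt, guess):
    kinds = [arg_kind(a, guess) for a in dt.unary_args]
    boxity = guess[dt.name]
    if boxity == UNB_ALL:      # rule 1
        return FULL_ARG not in kinds
    if boxity == UNB_LOW:      # rules 2 and 3
        return len(kinds) <= 1 and all(k == BOXED_ARG for k in kinds)
    return True                # BOXED is always acceptable

def infer_boxities(group):
    """Demote invalid guesses until every datatype in the group is valid."""
    guess = {dt.name: UNB_LOW for dt in group}
    changed = True
    while changed:
        changed = False
        for dt in group:
            if not is_valid(dt, guess):
                guess[dt.name] = FALLBACK[guess[dt.name]]
                changed = True
    return guess

# t = Leaf of string | Empty | Children of t list, modelled with a boxed
# argument (string) and a low-bit-tagged argument (t list).
t = Datatype("t", [BOXED_ARG, LOW_ARG])
print(infer_boxities([t]))   # {'t': 'UNB_ALL'}
```

The loop terminates because a guess is only ever demoted, and each datatype can be demoted at most twice before it reaches BOXED, which always passes.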

Future work will split mutually recursive datatype declarations into strongly-connected components. Currently, declarations of the form

    datatype t = A of int | B
    and e = C | D

result in both e and t using a boxed representation, even though e does not depend on t.

@melsman melsman self-assigned this Nov 15, 2023
@melsman melsman merged commit adf941b into master Nov 16, 2023
4 checks passed