Unboxing of data types using high-bit pointer tagging #149
This PR implements an approach to unboxing that uses the 16 most-significant bits of 64-bit pointers for tagging constructor values. The approach allows for two levels of unboxing, where the outer level uses the high bits of pointers and the inner level uses the lower bits of pointers. For instance, the following datatype declaration is implemented with unboxed lists and unboxed trees:
Here it is recognised that the arguments to `t`'s unary constructors are either boxed or use only lower bits for tagging (`list t`). The higher bits are thus available for discriminating between the three constructor values.

Notice that the scheme is still based on a uniform representation of values (64-bit words).
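The two tagging levels can be pictured with a small C sketch. It is not taken from the MLKit runtime: the helper names, the 48-bit address assumption, and the 2-bit width of the low tag are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: addresses fit in the low 48 bits of a 64-bit word, so the
   16 most-significant bits are free to carry a constructor tag. */
#define HIGH_SHIFT 48
#define ADDR_MASK  ((UINT64_C(1) << HIGH_SHIFT) - 1)

/* Assumption: the inner level uses the 2 least-significant bits. */
#define LOW_MASK   UINT64_C(3)

/* Hypothetical helpers for the outer (high-bit) level. */
static uint64_t tag_high(uint64_t word, uint16_t ctor) {
    return (word & ADDR_MASK) | ((uint64_t)ctor << HIGH_SHIFT);
}
static uint16_t high_tag(uint64_t word)   { return (uint16_t)(word >> HIGH_SHIFT); }
static uint64_t strip_high(uint64_t word) { return word & ADDR_MASK; }

/* Hypothetical helpers for the inner (low-bit) level. */
static uint64_t tag_low(uint64_t word, unsigned t) {
    return (word & ~LOW_MASK) | (t & LOW_MASK);
}
static unsigned low_tag(uint64_t word)   { return (unsigned)(word & LOW_MASK); }
static uint64_t untag_low(uint64_t word) { return word & ~LOW_MASK; }
```

A value of the outer datatype would be built as `tag_high(tag_low(addr, inner), outer)`. Because the two tags occupy disjoint bit ranges, each can be read or stripped without disturbing the other, and the result is still a single 64-bit word.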
GC note: For the garbage collector to work, it is important that nullary constructors are represented with the least significant bit set, to distinguish them from unary constructors.
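A minimal C sketch of the distinction the collector relies on is given here. The PR only states that the least-significant bit is set for nullary constructors; the `(i << 1) | 1` encoding and all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding: nullary constructor number i becomes the odd
   word (i << 1) | 1. Only the set low bit comes from the PR text. */
static uint64_t make_nullary(uint64_t i)  { return (i << 1) | 1; }
static bool     is_nullary(uint64_t v)    { return (v & 1) != 0; }
static uint64_t nullary_index(uint64_t v) { return v >> 1; }

/* In this sketch, a collector follows a word only if it is not a
   nullary constructor, i.e. only if the least-significant bit is clear. */
static bool gc_should_trace(uint64_t v)   { return !is_nullary(v); }
```

Word-aligned pointers have a clear least-significant bit, so under this assumed encoding a unary constructor value is never mistaken for a nullary one.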
There is a series of good examples where this optimisation turns out to be effective:

- Patricia trees
- URefs, directed graphs
- IR grammars, including DecGrammar and LambdaExp (and associated types)
- Effect nodes
The implemented algorithm for determining the boxity (i.e., `UNB_LOW`, `UNB_ALL`, `BOXED`, `ENUM`, `SINGLE`) of a datatype attempts to find a boxity fixpoint involving all simultaneously declared datatypes. The algorithm starts with an optimistic guess for each datatype and checks if the guess is valid according to a series of rules:

- `UNB_ALL` datatype can take a value as arguments that uses all 64 bits.
- `UNB_LOW` datatype can have more than one unary constructor.
- `UNB_LOW` datatypes must take boxed arguments.

If a rule is violated, an attempt is made with less optimistic assumptions.
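The guess-and-demote loop can be sketched in C as follows. This is an illustration, not the MLKit implementation, and it rests on several assumptions: the first two rules are read as prohibitions, the initial guess is `UNB_ALL` for datatypes with more than one unary constructor and `UNB_LOW` otherwise, demotion goes `UNB_ALL`, then `UNB_LOW`, then `BOXED`, and `ENUM` and `SINGLE` are left out. The `datatype` struct and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Boxity names are from the PR; ENUM and SINGLE are omitted here and
   the demotion order UNB_ALL -> UNB_LOW -> BOXED is an assumption. */
typedef enum { UNB_ALL, UNB_LOW, BOXED } boxity;

#define MAX_ARGS  4
#define BOXED_ARG (-1)  /* argument that is already boxed, e.g. a tuple */

typedef struct {
    int n_unary;        /* number of unary constructors */
    int arg[MAX_ARGS];  /* per unary constructor: index of the argument
                           datatype in the group, or BOXED_ARG */
} datatype;

/* Assumed reading of the rules: an UNB_ALL datatype may not take an
   UNB_ALL argument; an UNB_LOW datatype has at most one unary
   constructor and only boxed arguments. */
static bool guess_ok(const datatype *ds, const boxity *g, int i) {
    const datatype *d = &ds[i];
    for (int k = 0; k < d->n_unary; k++) {
        int a = d->arg[k];
        boxity ab = (a == BOXED_ARG) ? BOXED : g[a];
        if (g[i] == UNB_ALL && ab == UNB_ALL) return false;
        if (g[i] == UNB_LOW && ab != BOXED)   return false;
    }
    if (g[i] == UNB_LOW && d->n_unary > 1) return false;
    return true;
}

/* Start from an optimistic guess and demote until a full pass over the
   group finds no violated rule. */
static void solve_boxity(const datatype *ds, boxity *g, int n) {
    for (int i = 0; i < n; i++)
        g[i] = ds[i].n_unary > 1 ? UNB_ALL : UNB_LOW;
    bool changed = true;
    while (changed) {
        changed = false;
        for (int i = 0; i < n; i++) {
            if (g[i] != BOXED && !guess_ok(ds, g, i)) {
                g[i] = (boxity)(g[i] + 1);  /* one step less optimistic */
                changed = true;
            }
        }
    }
}
```

Each guess can only move towards `BOXED`, so the loop terminates after at most two demotions per datatype. Under these assumptions, a datatype with two unary constructors that both take the datatype itself as argument ends up `BOXED`.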
Future work will split mutually recursive datatype declarations into strongly-connected components. Currently, declarations of the form
result in `e` (and `t`) using a boxed representation.