Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non-extensible records and variants #180

Merged
merged 6 commits into from Jul 31, 2020

Conversation

danieldjohnson
Copy link
Collaborator

@danieldjohnson danieldjohnson commented Jul 30, 2020

This PR adds support for non-extensible records and variants, with optionally repeated labels.

Syntax-wise, records look like

x : {a: Int & b: Real} = {a = 3, b = 4.0}

and variants look like

x : {a: Int | b: Real} = {| a = 3 |}

I used & and | for parity with tuples and Either. Empty records are denoted as x : {&} = {}. Technically there's an empty variant type {|} but it is uninhabited.

One interesting property of the implementation is that you are allowed to repeat labels. This was originally designed to support easy extensible records (see Daan Leijen's "Extensible records with scoped labels") but I don't think it causes any harm to include them in the non-extensible case. Perhaps this will make it easier to add extensibility in the future.

Records can be unpacked using pattern-matching let bindings, and variants can be unpacked using a case statement. For now, variants need to be explicitly annotated with their type unless it is sufficiently obvious (to the typechecker) from the preceding code what the type is. You can freely put either records or variants into tables, and hopefully can combine them in arbitrary ways.

I've implemented some preliminary logic for treating records and variants as index sets, since records have a nice interpretation as named axes, and variants as named slices/concatenations. However, so far I haven't figured out how to plumb everything through the low level internals. I got far enough that you can create tables indexed by variants, but once you have one of those tables, you can't print it or get anything out of it without getting a bunch of errors. Hopefully this is easy to fix for someone with a better mental model of how Imp and Interpreter work.

Some things I wasn't sure about:

  • At some points I needed a binder, but I didn't have one, so I used Ignore ty even though I wasn't necessarily going to ignore it. There's probably a better way of doing that, though.
  • I sort of hacked together the Imp translation rules, so it's very possible that I missed something.
  • There were some shenanigans I had to do to get the index set logic working, in particular generating case statements and casting things to PreludeBoolTy to enable pattern matching on them. I feel like there's probably something simpler than that? Although I suppose it's not too terrible.

Eventually, we may want to make the records and variants extensible, so that you could write something like {a=x, b=y, ...c} to add fields a and b to a record c. This could enable some cool polymorphic helper functions or other interesting use cases, and would also make it so variants don't have to be annotated with a type (since we could infer some extensible type and then let unification figure out what the real type was).

Adds syntax, parsing, pretty printing, type inference, typechecking,
and some very simple embed/simplify steps for records and enums. For
now, only supports non-extensible records, which must have known types.
Eventually we can hopefully expand this to allow extensible records as
well. I've left in some helper functions for dealing with extensible
record syntax in the parser and pretty-printer.

Notably, we support repeated labels, inspired by the approach described
by Daan Leijen in "Extensible records with scoped labels". Repeated
labels may have different types, and are disambiguated by position.

Mirroring tuple/Either syntax, the record constructor uses `,`, the
record type constructor uses `&`, and the variant type constructor uses
`|`. I've modified the parser a bit to make sure this is still
unambiguous by requiring the non-variant versions of `&` and `|` to
apppear inside parentheses.

Note that type inference for variants requires the type to be known.
In the future perhaps we can use extensible variant machinery to relax
this restriction. For now, users can always use an annotated let-binding
to give it a type.
This basically follows the existing implementation for ADTs, but allows
records to be unpacked as well. It is never possible to unpack a
variant, even if there's only one alternative in it, because that seems
like a lot of effort for no particular gain.
Since we are now matching both DataCons and Variants, we can't use the
name of the con to determine the order. Instead, we order alternatives
based on either the index of the con, or the index of the type in the
label map for the variant.

Since you can't tell the type of a variant from its syntax alone,
there's a bit of extra zonking to make sure that we know the type of
the expression to unpack before we check the patterns we are unpacking.

There are a few times where I compute indices into variant types by
enumerating their contents. This will probably be fairly negligible
performance-wise but we could try computing that in a more efficient
way if we need to for some reason.
Records are preserved through the lowering. Variants are converted using
SumAsProd, similar to other sum types. This makes it possible to use
records and variants in tables.
This makes it possible to directly annotate variant literals with
their types, which makes it possible to put them inside other
expressions.
Adds support for records and variants as index sets. Uses a fairly
arbitrary ordering of the labels: alphabetical order, then label
repeats.

I was able to get all of the fromOrdinal/toOrdinal/size functions
working, and it's possible to construct tables with records or variants
as the index set. Strangely, problems occur when trying to print those
tables or index into them, so it's possible I've made mistakes in how
I implemented the lowering, or that I'm hitting some unimplemented
corner cases.
Copy link
Collaborator

@apaszke apaszke left a comment

This is awesome, thanks for the PR!

I went through all the files except for the parser, pretty printer and inference since I'm not super up to date with how they work. The rest looks great, but I left some suggestions inline.

This might be a silly idea, but would it be possible to make unnamed tuples and sum types be represented as record and variant types internally? We could have a special field name to denote that.

RecordTy types -> do
-- TODO: is using Ignore here appropriate? We don't have any existing
-- binders to bind, but we still plan to use the results.
let bs = toNest $ map Ignore $ toList types
Copy link
Collaborator

@apaszke apaszke Jul 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That should be fine I think. The names are not used below for anything else than a name hint. The types are what's important.

@@ -256,6 +264,10 @@ getSnd :: MonadEmbed m => Atom -> m Atom
getSnd (PairVal _ y) = return y
getSnd p = emitOp $ Snd p

getUnpacked :: MonadEmbed m => Atom -> m [Atom]
getUnpacked (Record items) = return $ toList items
getUnpacked a = emitUnpack (Atom a)
Copy link
Collaborator

@apaszke apaszke Jul 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well single-case ADTs can also be unpacked, so this might end up being a little confusing.

Copy link
Collaborator

@apaszke apaszke Jul 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And it seems like there's only a single use-site, so it's better to inline this.

Copy link
Collaborator Author

@danieldjohnson danieldjohnson Jul 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the fallback case makes it so that this will still work for ADTs, but it just doesn't have the opportunistic optimization (which would be easy to add). Although I'm happy to just inline this. Wasn't sure what the bar was for adding helper functions to Embed (maybe it's two or three use-sites?).

@@ -715,6 +741,25 @@ indexToIntE ty idx = case ty of
imul rSize lIdx >>= iadd rIdx
TC (IntRange _ _) -> indexAsInt idx
TC (IndexRange _ _ _) -> indexAsInt idx
RecordTy types -> do
Copy link
Collaborator

@apaszke apaszke Jul 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW is there any reason to keep the record type as an n-ary constructor instead of a binary one, like we do for tuples?

Copy link
Collaborator Author

@danieldjohnson danieldjohnson Jul 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By binary constructor do you mean something like RecordTy Label Type Type, and expressing records as something like RecordTy "foo" x (RecordTy "bar" y (Unit))? The main reasons I went with this over that are:

  • I intended the types {a:Int & b:Real} and {b:Real & a:Int} to be the same. If we used a binary constructor, we would have to worry about the labels not being in the right order, and possibly need to do list sorting to check that things match. Using a single n-ary constructor lets you easily enforce a single order internally while allowing arbitrary orders in the parsing and typechecking logic.
  • The "rest" argument in a binary constructor needs to be another RecordTy instance, but I don't think the type system can support that. For polymorphic extensible records we probably need to introduce a new kind for a row of types, but that didn't seem worth the effort in the short term. (Tuples don't have this problem because (a, b) is a valid type no matter what b is, but {foo:a, ...b} only makes sense if b is a record type.)

RecordTy types -> do
sizes <- traverse indexSetSizeE types
(strides, _) <- traverseFunM
(\sz prev -> do {v <- imul sz prev; return (prev, v)}) sizes (IntVal 1)
Copy link
Collaborator

@apaszke apaszke Jul 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm one interesting question to ask would be whether we want to support dependent record types, and if the answer is yes, then this code will need to be re-worked a little.

Copy link
Collaborator Author

@danieldjohnson danieldjohnson Jul 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dependent in the sense that one record field's type depends on the value from another field? I wonder if that would be useful. I'm not sure how you'd even write the type for that, since at least right now the labels of the record don't actually bind the label names to anything (although they kind of look like they do, I guess, which maybe is confusing), and there's no syntactical ordering imposed on the labels (so we can't say that earlier things are "in scope" for later things, because that depends on how the user wrote the type down).

My gut feeling is that if you want dependent things you should define an ADT, at least for now.

Copy link
Collaborator

@dougalm dougalm Jul 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we should only do dependent constructors via ADTs for now. For a general dependent pair, you need some sort of annotation to indicate what the type of the second element is (e.g. (2, [1,2]) could have type (Int & (Fin 2 => Int)) or (n:Int & (Fin n => Int)). Extensible records are tricky enough as they are!

asState f = do
traverseFunM :: (Monad m, Traversable t)
=> (a -> s -> m (b, s)) -> t a -> s -> m (t b, s)
traverseFunM f xs s = runStateT (traverse (asStateT . f) xs) s
Copy link
Collaborator

@apaszke apaszke Jul 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't scanM be a better name?

Copy link
Collaborator Author

@danieldjohnson danieldjohnson Jul 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds reasonable to me. Should I change traverseFun to scan too?

index <- intToIndexE ty shifted
return $ Variant types label repeatNum index
beforeCase <- buildNAbs Empty $ \[] -> return $ prev
emit $ Case beforeThis [hereCase, beforeCase] rty
Copy link
Collaborator

@apaszke apaszke Jul 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a good candidate for a standalone function like

boolCase :: MonadEmbed m => Atom -> m Atom -> m Atom -> m Atom
boolCase intCond trueBlock falseBlock = ...

Copy link
Collaborator

@apaszke apaszke Jul 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, IMO the case to PreludeBoolTy is ok!

Copy link
Collaborator Author

@danieldjohnson danieldjohnson Jul 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you still need to pass in the return type, right? Since that gets passed to Case. (Or maybe there's a way to easily infer that from the trueBlock and falseBlock? I guess we could wrap the case in a Lam and then check its type, but that seems a bit annoying...)

Copy link
Collaborator Author

@danieldjohnson danieldjohnson Jul 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, actually, maybe I can just getType body for the body of the Abs?

Copy link
Collaborator

@dougalm dougalm Jul 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, you can just query the type. (getType doesn't need to know about the variables in scope since their types are cached at the occurrences.)

return $ Variant types label repeatNum index
beforeCase <- buildNAbs Empty $ \[] -> return $ prev
emit $ Case beforeThis [hereCase, beforeCase] rty
((l0, 0), ty0, _):zs = zip3 (toList reflect) (toList types) (toList offsets)
Copy link
Collaborator

@apaszke apaszke Jul 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You ignore the first offset anyway, so why is it zipped?

Copy link
Collaborator Author

@danieldjohnson danieldjohnson Jul 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ignore the first offset because I know it's exactly 0. I don't ignore the rest of the offsets (see go), and I still wanted them to be aligned correctly in zs.

beforeCase <- buildNAbs Empty $ \[] -> return $ prev
emit $ Case beforeThis [hereCase, beforeCase] rty
((l0, 0), ty0, _):zs = zip3 (toList reflect) (toList types) (toList offsets)
start <- Variant types l0 0 <$> intToIndexE ty0 i
Copy link
Collaborator

@apaszke apaszke Jul 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this just intToIndexE rty 0? I understand if you want to make it more efficient, but it might deserve a comment

Copy link
Collaborator Author

@danieldjohnson danieldjohnson Jul 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I don't follow? This is in the definition of intToIndexE rty i so I don't think I want to recursively call intToIndexE rty here. Also the offset isn't necessarily 0. This line is essentially saying "if the first variant is the correct variant to use, what would the index atom be", and then the foldM overrides this if it wasn't actually the first variant.

Maybe I'm not understanding the suggestion?

((l0, 0), ty0, _):zs = zip3 (toList reflect) (toList types) (toList offsets)
start <- Variant types l0 0 <$> intToIndexE ty0 i
choice <- foldM go start zs
return choice
Copy link
Collaborator

@apaszke apaszke Jul 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: no need to have the trailing return

VariantTy types -> do
tag <- rec IntTy
contents <- forM (toList types) rec
return $ Con $ SumAsProd ty tag $ map (\x->[x]) contents
Copy link
Collaborator

@apaszke apaszke Jul 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks incorrect to me, because it doesn't preserve the type!

Copy link
Collaborator Author

@danieldjohnson danieldjohnson Jul 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It saves the type into the SumAsProd constructor, which I think preserves the type for any downstream usage? At least the typechecker doesn't complain about it.

Copy link
Collaborator

@dougalm dougalm Jul 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SumAsProd was intended for ADTs (and if it passes the type checker then that was probably an oversight on my part). But this seems like a fine use of it too. Btw, we're going to remove it soon in favor of a Minicase Var [AtomAlt] atom, as mentioned in #179.

@danieldjohnson
Copy link
Collaborator Author

@danieldjohnson danieldjohnson commented Jul 30, 2020

I think it would definitely be possible to represent tuples as records internally (and @dougalm and I actually discussed this briefly at one point). Although that would probably change the semantics of the user-level language too. Right now ((a, b), c) and (a, b, c) are the exact same thing, but if they were records internally then those would be different types {v: {v:a, v:b}, v:c} and {v:a, v:b, v:c}. I guess this would make it closer to Haskell where (,,) is different than (,).

Similarly, if we represented anonymous sum types with variants, this would change how you match on them, since there's no such thing as Left or Right for a three-element variant. So the user-level pattern-matching syntax might have to change. (Although I guess we already sort of have two notions of sum types, the one defined as an ADT and the internal one.)

I guess a halfway thing that would be pretty easy would be to still require the tuple and sum constructors to be binary, so that a & b becomes {left:a & right:b} and a | b becomes {left:a | right:b}? But then I'm not sure we gain a lot.

@dougalm
Copy link
Collaborator

@dougalm dougalm commented Jul 30, 2020

Although I guess we already sort of have two notions of sum types, the one defined as an ADT and the internal one.

Yes, the plan is to remove the built-in Either. What's blocking it is that we don't have index set instances for ADTs (either built-in or user-definable).

Although that would probably change the semantics of the user-level language too. Right now ((a, b), c) and (a, b, c) are the exact same thing, but if they were records internally then those would be different types

That sounds fine to me. Actually, I think it's better that way. Treating tuples as nested pairs was a cheap trick I was hoping nobody would complain about :)

@dougalm
Copy link
Collaborator

@dougalm dougalm commented Jul 31, 2020

Look great!

As we discussed, the variants are very neat, but they might be too much complexity, since we can always define sums with ADTs. We can make that call once we have more experience using them.

I'm looking forward to the extensible version. That will let us pull out a single field by name rather than pattern-matching on all the fields, right? Unfortunately, we already use the conventional . for table indexing. Maybe myRecord#myField? More radically, we could just use whitespace for table indexing (since it's actually just function application underneath) and then . would be free for record indexing.

@dougalm dougalm merged commit 46c2aa9 into google-research:main Jul 31, 2020
2 checks passed
@dougalm
Copy link
Collaborator

@dougalm dougalm commented Jul 31, 2020

It just occurred to me that an immediate application of these records would be to make the typeclass dictionaries a bit more readable:

data Add a:Type =
  MkAdd { add : (a->a->a) 
        & sub : (a->a->a) 
        & zero : a }

We could also add conventional class ... where ... and instance ... where ... syntax rather than this awkward explicit dictionary-construction business.

dougalm added a commit that referenced this issue Jul 31, 2020
We still don't have index set instances for ADTs, but we can use variants added
in #180 for sum types as index sets.

Also add unification rules for records and variants.
dougalm added a commit that referenced this issue Jul 31, 2020
We still don't have index set instances for ADTs, but we can use variants added
in #180 for sum types as index sets.

Also add unification rules for records and variants.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants