-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Inline records for constructor arguments #5528
Comments
Comment author: @alainfrisch Some motivation. I think that most of the time, using "records" instead of "tuples" for constructor arguments is better (and so it's a good idea to make it as easy as possible to do it in the language):
|
Comment author: @alainfrisch (Note: Camlp4 has not been updated; it does not compile. OCamldoc compiles but fails on the new feature.) |
Comment author: @gasche You surely know that Caml Light had mutable sum constructors. Your mutation syntax reminds me of it -- but better behaved, as the syntactic lvalue is still a field projection rather an arbitrary variable. http://caml.inria.fr/pub/docs/manual-caml-light/node4.6.html This is also reminding of SML's anonymous record types (see eg. http://adam.chlipala.net/mlcomp/ ); those are not unboxed with the constructor (but MLton can optimize it) but allow field disambiguation -- in sometimes treacherous ways. This extension prompts a question (say, for a teacher): do we have a duplication of concepts here (two slightly different ways to define records, just as tuples and the * used in sum constructors are slightly different), or is there a reasonable explanation of the first-class record concept in terms of this feature? I mean, maybe it's possible to say once and for all that the general data type is a labelled sum of labelled products, and explain current records as sums with only one case, where the constructor is left implicit. Could something like that reliably explain the semantics? The restrictions on the "special" variables seem a bit icky. From a language point of view, I'd rather have general unboxed datatypes, with a kind system guaranteeing that they can't instantiate polymorphic parameters. Haskell has those and it's nice -- not sure how that works with the GC, however, and I understand changing the type checker is probably not an option. The pragmatism of your solution makes it simpler, and OCaml already has special cases for unboxed float anyway. You included no example where two constructors of a single sum types would share some field names. Is that possible/reasonable? That looks like a potential use case -- having a second constructor that has the same fields than the first, plus some, as it is more memory-efficient than a record with an option type at the end. This construction would allow to implement Queue efficiently without magic. |
Comment author: @jberdine For what it's worth, I have several times wanted such a feature. I think that the value of flexible extension and consistent naming, in particular, would be significant for larger code bases. |
Comment author: @alainfrisch I've added my extra proposal, allowing field projection and mutation on pseudo record-argument capture variables in patterns. Example: type t = A of {x:int; mutable y:int} let get_x (A r) = r.x let bad (A r) = r (* rejected *) When used in a or-pattern, these pseudo-variable must have the same kind on both sides; the kind includes the constructor, so the following is rejected: type t = A of {x:int} | B of {x:int} |
Comment author: @alainfrisch Gabriel, thanks for your note. I believe that real records are indeed mostly a special case of this new feature Currently, two features of records are not supported: polymorphic fields, and overriding ({e with x = ...}), although I don't see any reason why they couldn't be. You have a question about two constructors in a given sum type sharing some field names. Yes, this is possible: type t = A of {x:int} | B of {foo: int; x:int} | C of {x:string} let get = function It is even feasible to make the following work: let enrich foo = function i.e. use a record-argument capture variable as the starting point for overriding, with a different constructor (having more fields). I'm not sure if this is really desirable. |
Comment author: @garrigue I almost proposed that one (including mutable fields) almost ten years ago. Note that the use of special variables is not a big problem in my mind: this is already the case for instance variables in objects. Also an extra potential feature would be the ability to use type information to extract without pattern-matching. type t = A of {x:int} | B of {x:int;y:int} let get_x (r:t) = r.x |
Comment author: @alainfrisch
I'm not entirely convinced by this extension. How would the code above behave if some constructor in t don't have an "x" field? (Also, we'd need to be careful with type inference and principality here, but you know that better than I do!) Pattern-matching with punning is not so bad: let get_x (A{x}|B{x}) = x (yes it becomes tedious with more constructors) |
Comment author: @garrigue Actually I was not thinking of generating a pattern-matching, but rather to restrict this feature to types for which it can be implemented trivially, as a direct field access without any test. I.e. x would have to be present, with the same type, and at the same position in all cases of the type. This of course includes the trivial case where the type has only one case. But do not take this suggestion as meaning that I support variants of records unconditionally. |
Comment author: @alainfrisch
I see, but I'm not sure it's worth introducing an extra feature, whose meaning is not completely obvious. It just avoids the need to define (once for each common field) functions like: let get_loc (Var{loc}|App{loc}|Lambda{loc}) = loc
A first reason is that it makes it makes it possible to define sum types with mutable fields, without space overhead. For some low-level data structures (for instance mutable binary trees), it can give some performance boost without requiring Obj.magic. But the most important reason, in my opinion, is that it encourages to name arguments of constructors instead of relying on position, and it is often a good idea. In particular, it removes the following counter-arguments to naming constructor arguments with explicit record declarations:
In the past, for instance, I've proposed to replace some n-ary constructors in OCaml sources with records. This would have simplified maintaining on LexiFi side some of our local patches (avoiding the need to add wildcard patterns at many places). This was rejected, based on the fact that it would have increased memory usage and the size of cmi files (argument 3 above). |
Comment author: @damiendoligez
This is a red flag. Are you sure your implementation is safe with pattern-matching and "when" clauses? |
Comment author: @alainfrisch
I'm not sure at all, this would need to be checked. Note that the semantics is weird anyway (and I'm not absolutely convinced it is safe) even with regular records: type t = A of int | B of string;; It's really weird that f can return "BAD" (and it does, with argument (R (ref (B""))) ). |
Comment author: lavi This is perhaps to early to speak about optimization, but for records with float only fields, it would be better to de-inline them internally, or even better to treat the first constructor with float only fields as a float array and de-inline the others. |
Comment author: Bardou Hello, I have been wishing for this for a long time. Mainly because it's just tedious to declare a record for each constructor. Too often do I start with constructors with a low argument count (say 1 to 3) and find that I have to add new arguments. At some point I have so many arguments that I really have to declare the record, and thus rewrite all the relevant parts of the code. I would add that it would be convenient for me to be able to name only some of the arguments. For instance : type t = Sum of pos: Lexing.position * t * t I would access the anonymous arguments using pattern-matching as usual, and use ".pos" as a shortcut sometimes. Unifying records and sums is great, unifying tuples at the same time seems even better to me. The OPA language (of Mlstate) does this, if I'm not mistaken. |
Comment author: bobot For a semantic, syntax, typing perspective, can't we consider that: type t = is exactly the same thing than: type __t = {msg: string; mutable foo:int} and t = Of course the "when" clause can lead to strange behaviors, but not different than the one we have with such a type t and __t. Moreover we can consider the record nearly like an actual record. The nearly mean that we create it with the tag of B instead of the default one (the first, if I remember well) used for record. So: is exactly for semantic, syntax and typing the same thing than: type t2 = {msg: string; mutable foo:int} and t = eg: match x with For compilation: B r is just the identity function, the matching just doesn't extract. You can't make the difference between the two definitions, except if you use Obj.magic to see that (Obj.magic (B r)) == (Obj.magic r), but Obj.magic is not in the semantic right? (It's in fact not the same thing for typing in regard of module subtyping, if t is made abstract, t2 must be done private. But that can be quite useful) My 2 cents, |
Comment author: bobot With #5759 (Using well-disciplined type-propagation to disambiguate label and constructor names) the last view (#5528#c7199) should be able to work with such a type (#5528#c7028): type t = A of {x:int} | B of {foo: int; x:int} | C of {x:string} let get = function |
Comment author: @bobzhang F# 3.1 has added named tuples |
Comment author: bobot The implementation done by Alain already work well:type t = let x = A {msg = "toto"; foo = 1} let foo =
|
Comment author: @alainfrisch
Indeed, it could be interesting to consider that: ===
|
Comment author: bobot In order to simplify the implementation and reduce the impact on the compiler code, we can avoid to create a new kind of type constructors by requiring that:
The problem is for the ocamlc -i option since in that case we need to print a type for eg. the f, g function that can be parsed by ocaml. |
Comment author: @alainfrisch I'm wondering if we can reuse the current internal definition of Path.t, interpreting a lowercase component before a dot as a type (and not a module). So t.B would be represented as Pdot(Pident {name="t";..}, "B", 0) and M.t.B as Pdot(Pdot(Pident {name="M";..}, "t"), "B"). |
Comment author: @lpw25 Could we possibly separate this into two separate branches? A simple one adding constructors with named fields (no mutability, no special record values), and a more complex one with the other features under discussion. This might help get the simple (and very useful) feature merged much more quickly (i.e. before the next release). |
Comment author: @alainfrisch Before committing anything, we should have a clearer picture of where to go. For instance, now that we have type-based disambiguation of names, it seems most benefits would be achievable with an "inline" annotation within the type declaration: type t = A of foo inline | B of bar inline and foo = {x: int; z: string} ("inline" can be used only in such context: reference to a record type as argument of a constructor defined in the same group). The effect is to inform the compiler about the tag to use for the inlined records. It seems the impact on the type-system and on the language syntax is rather limited, while providing most of the benefits and no restriction compared to regular records (polymorphic fields, etc). |
Comment author: bobot I agree that we should have a clear view of all the proposal, after that we can cut in little pieces for easier integration. Whichever way we choose, if we require the user to name the type of the record the impact on the type-system is small (only type signature inclusion). The inline tag changes the syntax in a very small way, which is nice.
type t = A of foo inline | B of bar inline and foo = {x: int; z: string} All the uses must be with the same tag.
|
Comment author: @alainfrisch
I'd say: at most one "inline" reference to a given record. Other (non-inline) references are allowed, of course. I don't like the idea that multiple inline references to a given record type are allowed as long as the tags are identical: this is too fragile and expose the tag assignment scheme as a constraint in the type-checker. Moreover, I'm not convinced this is actually useful.
No, the "inline" marker assumes the target record type is defined in the same group. When the definition group is type-checked, the compiler chooses how to represents datatypes: tags for constructors, and, in this proposal, also tags for records. Any client code of the record type has access to this representation choice. This is necessary so that any code creating a record value can pick to correct tag. Note that the compiler already chooses the runtime representation of a record type when its definition is type-checked (Regular/Float records). The only difference would be that in the "Regular" case, the compiler also remembers the corresponding tag (and from which constructors it comes). And one must also keep the information in the internal constructor description, to adapt code generation.
Yes, globally, the sum type and the inlined record(s) types would form a monolithic group, which cannot be split in the signature. This would come from free from the inclusion check, which ensures that the representation of records is equivalent. This is why the following is rejected today: module X : (error message: the first declaration uses unboxed float representation) One can decide to allow or not the case where the inlined record type is made abstract in the signature.
Indeed. We could also use an attribute (from the "extension points" work). |
Comment author: bobot
I don't want to push it to much, but to be able to have an internal view with more constructor than the external view and a constant-time conversion from one to the other is very interesting. But I agree that we shouldn't complicate heavily this proposal for this possibility. So one use of the "inlined" record. Thank you for the float records example, I forgot about it. But contrary to this example you can read the record even if you don't know the number of the tag.
|
Comment author: @gasche If we have the constraint that a given record may only be used for one constructor, I prefer bobot's "as" syntax: type t = It is lighter (some people have pointed out that, even if they don't care about the memory representation, they don't use a record per constructor now because it's too painful to write), and it naturally enforces this restriction (but then people are going to ask about how to share sets of fields between those...). |
Comment author: @alainfrisch Arguments in favor of the "as" syntax:
Arguments in favor of the "inline" syntax:
|
Comment author: @hcarty Would the constraint of one record to one constructor allow for: type t = ? |
Comment author: @alainfrisch Some news from the caml-devel front:
(we could write !t.A instead of !A) |
Comment author: lavi Sorry to rise an old subject: I mean type parameters declaration. Current situation is probably enought for most cases, but still fragile. Taking an old exemple, this may look likes: type (_, _) funchain = |
Comment author: @alainfrisch There was indeed some resistance against the use of the syntactic ordering (first occurrence) of free variables in the inlined record to define the type parameters for that inlined record. The last branch (constructors_with_record4) has been adapted as follows:
The case of a GADT constructor with more than one free variable requires an explicit ordering between the variables. There are plans to allow specifying the ordering of such variables anyway (in order to provide a way to pattern match on them), and we will wait for such a feature to be available. |
Comment author: @lpw25
It might be better to just leave out GADT record constructors entirely. This is a much simpler restriction for people to understand than "their can be at most one free type variable if you use GADT syntax".
It is not really clear to me why we do not use the version without phantom type variables for constructors with regular syntax. The rule would be quite simple: constructors have the same type variables in the same order as their type, with unused type variables removed. |
Comment author: @alainfrisch
The case of GADT constructors with one free variable is quite common, and there is no ambiguity for that case, so why should we arbitrarily reject it?
This is also an option, but is it significantly more difficult to define the list of free variables ordered by their first syntactic appearance than the set of free variables (ordered by their rank in the list of parameters of the sum type) ? |
Comment author: @lpw25
Because it is more important that the reason for rejecting things is easy to understand than to accept as many things as possible. Whilst GADTs with a single argument might be common, I don't think that many of them would benefit much from using record syntax.
It is for users, especially with any kind of object or polymorphic variant involved. I'm also not convinced that "first syntactic appearance" rule is stable: do all equal (according to |
Comment author: @garrigue
The answer is clearly no: you would have to normalize the type in some way first. type ('a, 'b) bpair = 'b * 'a Not only that, but you cannot even now for sure what are the free variables of a type without normalizing it. type 'a s = string Does the constructor t have one parameter ? zero parameter ? For these reasons I would be careful about any implicit definition of the parameter list, including removing absent variables. In the 0-parameter case, GADT constructors with no parameters could be seen as a special case of parameterized constructors, but does it justify having a different behavior for apparently similar cases: type 'a t = Int of {v: int} |
Comment author: @lpw25
Yes it does seem like using the same parameter list as the type is the only way to guarantee that the list remains stable under changes of type environment. |
Comment author: @alainfrisch Thanks Jacques, this is very helpful and shows clearly that even the notion of free variables is not so clear (and context dependent). So let's drop support for inlined record on GADT constructors for now (committed to the *4 branch, as a type-checking level error; it would be easy to reject the case during parsing) and settle for the simple and stable definition for type parameters on non-GADT constructors (the same as the sum type). Any opinion on the support for extension records? This support represents a non-negligible fraction of the patch and of the conceptual complexity of the current proposal. |
Comment author: @lpw25
I thought it no longer affected the "conceptual complexity" of the proposal, as the same things work on both regular and extension constructors. Is there something I've missed? |
Comment author: @alainfrisch Consequences of supporting extension constructors:
# module X = struct type t = .. type t += A of {x:int} let x : !A = {x=0} type s = A end;; module X : sig type t = .. type t += A of { x : int; } val x : !A type s = A end # X.x;; - : !X.A = {x = 0} # (X.x : !X.A);; Error: Constructor X.A does not have an inline record argument Here, !X.A is a type path with no type component; internally, it refers to the extension constructor A in module X (there can be at most one). But for the surface language, !X.A can also be used to refer to !X.s.A. So the displayed type is incompatible with the interpretation of syntactic !-types. There are plenty of other such cases already in the type system, but this is slightly unsatisfactory. We will somehow need to explain that a type printed as !X.A necessarily refers to the argument of an extension constructor. Or we could decide to use the syntax !X.A only for such constructors, and force users to write !X.s.A to refer to the argument of a regular constructor (in which case the syntax !X.A becomes specific to extension constructors). (The same example can be done with local constructors.)
|
Comment author: @alainfrisch I forgot to mention that since extension constructor rebinding cannot be specified in signature (contrary to sum types with a manifest), it is not possible to expose equality of the inlined record type attached to two identical extension constructors (one being bound to the other). |
Comment author: @lpw25
Personally, I think this is backwards compatibility that it would be good to break if possible. Giving two exceptions in the same module the same name is more likely to be a bug than deliberate, and it is quite likely to be a run-time bug on the error path -- so quite a nasty bug to find. It also doesn't allow any extra functionality, since you can just rename one of the exceptions (they must be internal to the module after all). Obviously, as with all breaks in backwards compatibility we should be cautious. Perhaps a check through OPAM would indicate how often this occurs in practise.
I had forgotten about this, but as you pointed out earlier, it is unlikely to cause any backwards compatibility issues since this feature was only introduced in 4.02 and is quite obscure.
I had also forgotten about this issue. It is slightly annoying, but not too different from other type scoping issues. If the error message included the type of the constructor (X.s) in it's output then I don't think it will cause much confusion.
Do you mean that
works? I'm not sure whether this is the correct behaviour or not. Rebinding an exception is currently an implementation issue (i.e. it cannot be exposed in the signature), so it is not clear that it should effect the typing environment. It has been previously suggested (or at least it has been suggested to me) that rebinding could be exposed in signatures, which would make your behaviour definitely correct. Interestingly, the current restriction is purely syntactic: the type-checker is perfectly happy with rebindings in signatures. |
Comment author: @alainfrisch
We don't need to go very far to find such a situation: typing/ctype.ml has two instances of duplicated exception names. The fix is indeed trivial (rename one of them), but we've seen users complaining about the new {|...|} syntax breaking their comments :-) Let's see on caml-devel if anyone is against this move to a stricter semantics.
I did not say there is anything wrong with it, just that, considering how simple the entire stuff is, such tiny details contribute to a significant fraction of the overall "complexity". And it shows that it's not completely unlikely that we forgot something subtle.
Yes. The (small) implementation overhead (need to pass the information on the side, since the extension_constructor doesn't hold the information) is directly related to the fact that the rebinding information is lost in the signature. It's good that you commented on that. I'll thus simplify the patch to stop propagating the manifest type in such rebinding of extension constructors. If we ever track rebinding in signatures, getting the proper manifest where we need in (in Datarepr) will become straightforward. Otherwise, there is indeed no strong reason to make the example above work. (Interestingly, tracking rebinding in signatures will also require to add the constraint on non-duplication of extension constructor names, and to support substitution of constructor paths in Subst as we do in the branch.)
I wouldn't go as far: rebinding is currently not represented in signatures (and thus not supported in Subst nor checked in signature inclusion). More code will be needed than just allowing to specify rebinding in syntactic signatures. |
Comment author: @lpw25
Good point, the type-checker would currently basically desugar them to regular exception definitions, which is probably not what you want for things like functor parameters. Still they would not be hard to support if we wanted to. |
Comment author: @alainfrisch
Good idea! We now get: # module X = struct type t = .. type t += A of {x:int} let x : !A = {x=0} type s = A end;; module X : sig type t = .. type t += A of { x : int; } val x : !A type s = A end # (X.x : !X.A);; Error: Constructor X.s.A does not have an inline record argument |
Comment author: @alainfrisch Yet another branch... constructors_with_record5. This one drops support for referring to inline record types, and adds some syntactic restrictions to ensure that these types never escape. The only constructions that are allowed on pattern variables r bound to inline records (with a pattern of the form "A r" or "A (... as r)") are: r.x Inline record types are now only used as an internal type-checking device allowing us to reuse all the existing machinery for type-checking and compiling records (including mutation, polymorphic fields, etc), but they are not exposed to users any more. Some work is done so that error messages don't mention them (not sure if this can still happen). There was some concerns raised amongst developers about the specific naming mechanism, and this latest proposal, inspired by a patch from Jacques Garrigue, aims at minimality to facilitate inclusion, while keeping the key benefits of inlined records. Future versions might re-allow some kind of naming ability for inline records, and drop the syntactic constraints. Or not. |
Comment author: ybarnoy This seems like the way to go for now. How soon can we integrate it? |
Comment author: @alainfrisch The latest branch (*5) has been integrated in the trunk! |
Comment author: @bobzhang thanks for the nice work. type t = A of {x : int} | B of {x :int};;type t = A of { x : int; } | B of { x : int; } let f = function (A x | B x) -> x.x;;Error: The variable x on the left-hand side of this or-pattern has type The error message still exposes 't.B' which might not be friendly for new users |
Comment author: @garrigue Actually I would argue that using named types in error messages makes them more understandable, in particular when the error is a conflict. |
Comment author: ybarnoy I think it's very clear in error messages and should be kept as is. The feature as a whole works very well, from what I've seen so far. |
Comment author: @alainfrisch Let's mark this is resolved! The current solution addresses quite exactly what was in the original description. Further discussion about extending the system to make the inline record types visible in the type system can be done in another ticket. |
Original bug ID: 5528
Reporter: @alainfrisch
Assigned to: @alainfrisch
Status: closed (set by @xavierleroy on 2017-09-24T15:31:52Z)
Resolution: fixed
Priority: normal
Severity: feature
Fixed in version: 4.03.0+dev / +beta1
Category: ~DO NOT USE (was: OCaml general)
Related to: #5525 #6374
Monitored by: pierpa @bobzhang @gasche @rixed @diml pveber bobot Bardou Camarade_Tux lavi @jmeber @yallop @hcarty yminsky @Chris00 @yakobowski
Bug description
OCaml allows n-ary constructors for sum types. Instead of relying on position, it would be convenient to name the fields. Of course, one can use records, but this requires an extra type declaration and has some runtime overhead.
I've started a new branch (constructors_with_record) in the SVN to allow naming arguments of constructors, with the same syntax as records.
Example:
type t =
| A of {msg: string; foo:int}
| B of string
| C
let f = function
| A {msg; _} | B msg -> msg
| C -> ""
GADTs and exceptions are supported. It is possible to define mutable fields, but there is currently no way to mutate them. Polymorphic fields are not supported.
Note that this proposal also gives a way to disambiguate record field names:
type a = A of {foo: string; bar: int}
type b = B of {foo: int; baz: bool}
let f (A{foo; _}) = foo
let g (B{foo; _}) = foo
To support mutation and field overriding, I was thinking of a syntax like:
type t = A of {mutable l: int} | B of {x:int; y:int}
match ... with
| A r -> r.l <- r.l + 10
| ...
match ... with
| B r -> B {r with x = r.y; y = r.x}
| ...
Binding (directly like above or with an alias pattern) the "record" argument creates a special value (special in the same way that "self" or "ancestor" variables in objects are special) which can only be used in the following context:
File attachments
The text was updated successfully, but these errors were encountered: