Ambiguity in ML -> Reason conversion #63
Another approach is to choose a default interpretation to convert to. The result will likely fail to compile, which forces users to fix it right away (this should be easy), but it makes the conversion experience worse for first-time users. |
This ambiguity arises only when a type constructor has been declared in
OCaml as taking multiple arguments (e.g. `type t = C of u * v` but not
for `type t = C of (u * v)`), right? In the first case, code such as
`match x with C uv -> fst uv` that treats `C` as if it took a pair
implicitly allocates in OCaml. I wonder if instead of inserting
ambiguity warning tokens when importing such OCaml code into Reason, it
would be better to generate code that makes the allocation explicit:
`switch x { C u v -> let uv = (u,v) ; fst uv }`.
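To make the two declarations concrete, here is a minimal OCaml sketch. The types `binary` and `unary` and the payload types `int` and `string` are made up for illustration. It shows the explicit-allocation form for a two-argument constructor next to the single-tuple form, where the pair already exists.

```ocaml
(* Hypothetical types for illustration; not taken from any real project. *)
type binary = B of int * string     (* two arguments *)
type unary = U of (int * string)    (* one argument that is a pair *)

(* Two-argument constructor: obtaining a pair means building one. *)
let pair_of_binary (x : binary) : int * string =
  match x with
  | B (u, v) ->
    let uv = (u, v) in   (* the allocation is explicit here *)
    uv

(* Single-tuple constructor: the pair is simply bound, nothing is rebuilt. *)
let pair_of_unary (x : unary) : int * string =
  match x with
  | U uv -> uv
```

In Reason the first match arm would correspond to the `C u v -> let uv = (u,v) ; ...` form above.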
|
@jberdine Interesting, this looks like a better approach. Let me try it. |
@jberdine Hmm ... Actually this doesn't quite work. The ambiguity here comes from the fact that we don't know whether a constructor in OCaml takes multiple arguments or not. For example, suppose our OCaml -> Reason reformatter sees the following OCaml code segment:
What code do we generate? If we think C is taking one argument (
Or, if we think C is taking two arguments (
But the problem is that we don't have any information about C at parse time (C may be declared in another package). I thought about generating the following code:
But it doesn't compile. |
I wonder if, perhaps longer term, it would be better for OCaml to Reason to start from the Typedtree instead of Untypeast. Are there other cases where the translation into Reason could be better if the OCaml types were available?
|
@jberdine This is the only case that I can think of where it would be helpful, and it seems that there are workarounds for converting both from and to I could imagine another mode that uses the typed tree in order to make the conversion higher quality (no need to append attributes like |
Yes, there are definite advantages to using the untyped tree.
Perhaps I'm in the minority, but I generally consider using a non-unary constructor as if it was a unary constructor of tuple type to be a bug/bad code in OCaml. That is, I wish there was some warning I could turn into an error to catch that. My impression, but I could be wrong, is that materializing a constructor into a tuple is rare. So it would be great if converting OCaml code that did not play tricks with constructor arity produced nice Reason code, and back. Does that seem feasible? I'm imagining in the simplest form, an option to reasonfmt that would assume that e.g. C (_,_) implies that C is a binary constructor, and if the assumption is wrong, then failing to compile is the desired outcome.
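A rough sketch of what such an option could look like, written against a toy pattern type rather than the real parsetree. Everything here (`pat`, `mode`, `print`, and the exact output strings) is hypothetical and only meant to show the two behaviours side by side, not how reasonfmt is implemented.

```ocaml
(* Toy model of a constructor pattern as an untyped parser sees it.
   All names are hypothetical. *)
type pat =
  | Var of string
  | Tuple of pat list
  | Construct of string * pat option   (* C, C p, or C (p1, p2, ...) *)

type mode =
  | Assume_multi_arg   (* C (_, _) means C is binary; wrong guesses fail to compile *)
  | Keep_ambiguous     (* mark the pattern so that either reading still compiles *)

let rec print mode = function
  | Var x -> x
  | Tuple ps -> "(" ^ String.concat ", " (List.map (print mode) ps) ^ ")"
  | Construct (c, None) -> c
  | Construct (c, Some (Tuple ps)) ->
    (match mode with
     | Assume_multi_arg ->
       (* print each tuple component as a separate argument *)
       c ^ " " ^ String.concat " " (List.map (print_arg mode) ps)
     | Keep_ambiguous ->
       c ^ " " ^ print mode (Tuple ps) ^ " [@implicit_arity]")
  | Construct (c, Some p) -> c ^ " " ^ print_arg mode p

(* Parenthesize nested constructor applications used as arguments. *)
and print_arg mode p =
  match p with
  | Construct (_, Some _) -> "(" ^ print mode p ^ ")"
  | _ -> print mode p
```

With `Assume_multi_arg`, the pattern `C (u, v)` comes out as `C u v`; with `Keep_ambiguous` it keeps the tuple and gains the attribute.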
|
Yes, that would require the typed tree, but I think the problem is that there is no way to distinguish between the two in OCaml so you will inevitably have a ton of errors when converting.
I think the problem is that so much code must play tricks with constructor arity in OCaml so virtually every project would fail conversion. |
Perhaps I'm overlooking something, but my impression is that OCaml code intentionally using type definitions such as `type t = C of (u * v) | ...` is rare, as it is almost universally a latent perf bug, and there are only a few niche cases where it is convenient.
Are you entirely opposed to an option that would essentially tell reasonfmt to assume that the input OCaml code does not use the arity trick 'feature' in order to produce nice Reason code? I really don't want to wade through 10s of KLOC removing implicit arity attributes and deleting parens and commas... In my opinion, it would be much preferable to flag any existing reliance on constructor arity tricks, fix them in the OCaml code, and reconvert. Do you agree @cristianoc, or is my position really strange here?
|
I'm certainly not opposed to it. What would a tool like that look like?
Is that the same thing as saying "assume the input OCaml code does not declare variant types with a single tuple as their data"? If so, then yeah, that would be an easy feature to add. If the assumption is wrong, you'd likely get one or two compiler errors which you could fix up. If what you say is true (that this is super rare), then maybe fixing one/two compiler errors at the time of conversion is worth not having to fix 50 I think the mode that generates all the |
Yes, exactly that.
If it's something like 24, no problem. What I worry about is getting an attribute on every pattern match of a constructor with multiple arguments. Perhaps I'm not understanding just when the attributes are introduced. |
The way I see it, we should have one language, and many language representations. And, try not to break those properties, as any crack in those properties would be pretty difficult to recover from. In particular, we should not change ocaml, unless any changes are pushed upstream. Essentially, the property would be: So there should be no conversion between two languages, as there’s only one language, but just a conversion of representation, obtained by parsing in one and printing in the other. Any actual change, such as disambiguation, would need to happen separately at the language-to-language level, and with this property there are 3 equivalent places to do that: ml to ml syntax, re to re syntax, ast to ast. |
I agree that there is a very high cost to deviating from one language with several representations. I'm not sure about the exact formulation as
parse_x (prettyprint_x (ast)) = ast
In particular, parse_x may not be a surjection. IIUC, at least for OCaml, there are asts not in the image of the parser. For example, I guess that there are asts a0 with pattern matches over constructors with explicit arities, and almost identical a1 with pattern matches over constructors with implicit arities, that can only be pretty-printed to the same OCaml code. So parsing that code can at best give a0 or a1, but not both.
Perhaps we want to consider "the language" not to be all asts, but instead the image of the parsers, or the image of the composition prettyprint o parse. But these images would need to match for all language representations or we'll be back in the same jam as above.
At this point I am thinking that perhaps the best plan is to see if adding (and upstreaming) some (new) attributes to OCaml code could eliminate the problem of not being able to generate all asts with the OCaml parser. For example, I guess at least an attribute to indicate that constructors have explicit arity. Of course, then we're talking not about all asts, but those in the image of some annotation processor. But that might be ok. IIUC, this would be a language-to-language disambiguation operation as Cris suggested, done on the ast representation.
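The non-surjectivity point can be illustrated with a toy model. The `ast`, `print`, and `parse` below are made up for this example and are not the real OCaml parsetree or parser. Two asts that differ only in arity print to the same text, so parsing can recover at most one of them.

```ocaml
(* Toy AST: the same constructor pattern with two arity annotations.
   Hypothetical model, not the real Parsetree. *)
type arity = Explicit | Implicit
type ast = Construct of string * string list * arity

(* Toy OCaml-syntax printer: it has no way to show the arity flag. *)
let print (Construct (c, args, _)) =
  c ^ " (" ^ String.concat ", " args ^ ")"

(* Toy parser for exactly the strings [print] produces. It has to pick
   one arity, and here it always picks Implicit. *)
let parse (s : string) : ast =
  let i = String.index s ' ' in
  let c = String.sub s 0 i in
  let inner = String.sub s (i + 2) (String.length s - i - 3) in
  let args = List.map String.trim (String.split_on_char ',' inner) in
  Construct (c, args, Implicit)

let a0 = Construct ("C", ["u"; "v"], Explicit)
let a1 = Construct ("C", ["u"; "v"], Implicit)

let () =
  assert (print a0 = print a1);      (* same concrete syntax *)
  assert (parse (print a1) = a1);    (* round trip holds on the parser's image *)
  assert (parse (print a0) <> a0)    (* a0 is outside the parser's image *)
```

An attribute that the printer can emit and the parser can read back would, in this toy, amount to making `print` show the arity flag.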
|
Yes, sorry for omitting the detail: I was thinking about a subset of the AST, the representable subset. |
@jberdine You are right. My current proposal is to introduce the attribute whenever there is a constructor with multiple arguments. I think it is fair to have an option that makes reasonfmt assume the constructor always takes multiple arguments. On the other hand, I think we should still add [@implicit_arity] as the default behavior, since it may turn people away if they try to convert their OCaml project to Reason and get type errors. @cristianoc We have three components in the ast today related to this topic:
For all the cases except 2.b, the property |
@yunxing that's great! As good as we can hope. |
Yes, the current situation is looking very good. Am I the only one who finds attributes fugly, and thinks that they should be few and far between, and exceptional rather than common? In the current situation, it is not possible to have code that has only few attributes when written in both Reason and OCaml, one or the other will have many attributes about arity. I think that it would be significantly lower friction for adopters if it was possible to have code without many attributes in both its Reason and its OCaml representation. So I guess that this is a vote for @yunxing's `C <u,v>` syntax, or something similar.
|
Could 2b be expressed in OCaml as: `C uv -> let (u,v) = uv in`? For this to type check, the type definition for C would have to be of the form `type t = C of (a * b) ...`.
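For reference, a minimal version of that form that type checks in OCaml. The payload types `int` and `string` stand in for `a` and `b`, and the second constructor `D` is made up to fill in the `...`.

```ocaml
(* C takes a single argument that is a pair. *)
type t = C of (int * string) | D

let describe (x : t) : string =
  match x with
  | C uv ->
    let (u, v) = uv in   (* destructure the single tuple argument *)
    string_of_int u ^ v
  | D -> "none"
```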
|
@jberdine: (Edit) If converting from Reason back to OCaml, yes that would very much work. Unfortunately, none of the existing OCaml code uses that convention, so upgrading from OCaml -> Reason must take one of the two approaches mentioned: 1. Assume multiple args, and make the user fix up rare type errors when it's actually a single tuple, or 2. Litter with attributes. It would be cool (maybe lower priority) to have a third option that performed some very basic analysis of the set of files you're converting in batch, looking for a type definition matching that constructor, and using that to help guide the decision. Supporting those same heuristics on third party libraries is much more difficult but also possible if you examine the Regarding attributes: I see them as a temporary thing. In practice, you would either run the upgrade tool in fail mode (requires you to fix a rare set of compile errors but leaves you with no additional attributes) or just-works mode, which leaves you with attributes that you should fix up by hand over time. Either way, attributes are not being embraced as a long term strategy here. |
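The batch-analysis idea could be sketched roughly as follows. The types and functions (`decl_arity`, `decision`, `build_table`, `decide`) are hypothetical and work on a pre-digested list of declarations rather than real source files. The sketch also ignores that two types may reuse the same constructor name.

```ocaml
(* Toy summary of a constructor declaration found while scanning the
   files being converted. Hypothetical model, not the real Parsetree. *)
type decl_arity =
  | Multi of int      (* e.g. type t = C of u * v *)
  | Single_tuple      (* e.g. type t = C of (u * v) *)

type decision = Print_multi_arg | Print_tuple | Mark_implicit_arity

(* Record what the scanned declarations say about each constructor name. *)
let build_table (decls : (string * decl_arity) list) =
  let tbl = Hashtbl.create 16 in
  List.iter (fun (name, arity) -> Hashtbl.replace tbl name arity) decls;
  tbl

(* Decide how to print a pattern like C (u, v). Constructors with no
   declaration in the batch fall back to the attribute. *)
let decide tbl (constructor : string) : decision =
  match Hashtbl.find_opt tbl constructor with
  | Some (Multi _) -> Print_multi_arg
  | Some Single_tuple -> Print_tuple
  | None -> Mark_implicit_arity
```

Constructors declared in third party libraries would end up in the `None` case unless their declarations are fed into the table as well.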
The following idea would admittedly be lower priority to implement, but I'm happy that this option exists so I thought I'd share. Since most projects have |
@jordwalke Good point, we should look into that again once we have a project file (we also need to think about the form of migration). |
Using the .cmt files sounds interesting.
Regarding the comment about not having cases to convert from Reason to OCaml, perhaps my expectations are off, but I thought that one objective was to enable developers to use whatever representation they want. So for example I could imagine contributing to an existing project written in OCaml (say the compiler even), do my own development using the Reason tools and representation, but convert back to OCaml when I commit my changes. In short, the choice of representation (such as Reason or OCaml) ought to be made per developer rather than per project. This is maybe not a strict necessity, but it would be so cool, would reduce friction and barriers for adoption, and doesn't seem to be far away technically.
|
@jberdine: I totally agree and it's amazing that there was only one place that needed some extra work in converting. Even with the current proposal above, if you used Another option, even without upstream fixes, is to just provide a copy of the ML parser that is exactly the same as upstream, except it also follows the (superior) convention of distinguishing between multiple arguments. Perhaps this version would also be pushed upstream, but even without it, you could still offer it as an option, fulfilling the one-language-multiple-representations goal. |
Another option which I haven't emphasized but that would also accomplish that feat would be to make Reason worse, matching OCaml's convention. |
Regarding not being able to convert from Reason to OCaml, another solution, based on the observation that Reason's ast is a superset of OCaml's, is to make 2b fall back to an ambiguous syntax, which is 3b. In today's code, when the ML printer meets the ast component of (2)*, it automatically prints it as 1b, which creates a compiler error since the arity is wrong. We can find a way to make it print 3b instead, but I haven't figured out how. *: In Reason, the ast for
, which is printed as If we can convert it to
, it will be automatically printed as But the ast above is invalid since it violates the invariant that a tuple has to have 2+ components. |
How to print 2b will be tracked by #68. Before we get to the .cmi parsing, as a short term goal we still need to decide if we want |
Have you guys seen the related discussion on this topic: http://caml.inria.fr/mantis/view.php?id=6455 and ocaml/ocaml#284 .
tl;dr: A pull request to resolve the ambiguity in ocaml by supporting use of n-ary constructors as tuples. Didn't meet with much support: Xavier wants to keep multi-arity constructors, and in hindsight thinks that constructors should have been curried. There is mention of a curried-constructors option for ocaml in a later version, so maybe an avenue to gain support for upstreaming a change to unify reason and ocaml in this regard.
And yeah, really amazing that this is the one sticking point. I guess some credit here is due to the ocaml ast, it must be in reasonably good shape to avoid having a bunch of unrepresentable stuff.
Regarding printing constructors with explicit arity to ocaml, have you looked at what the revised syntax does in this case? In particular, what does camlp5 do when printing such asts to the standard ocaml syntax? Perhaps it doesn't end up with parsing and printing being inverses.
|
It sounds like Reason is in alignment with what Xavier would want. We are merely making the syntactic distinction even more apparent. If currying were added to the constructors, it would all come together nicely. There's hope of upstreaming improvements. |
Getting back to this. @jberdine: camlp5 does a straightforward conversion that doesn't compile (sort of what you've been asking for).
|
As a summary of the options so far: When converting
|
Thanks for looking into camlp5. So much for the hope of learning a trick from it, oh well. Ok, it looks like this situation is in good shape.
|
@cristianoc mentioned in the other issue that lots of [@implicit_arity] attributes are added after the conversion. The current implementation asks the client to fix all the [@implicit_arity] attributes by hand whenever they like. The conversion can still compile without the implicit_arity, so they can do it incrementally. If we don't like the conversion story, we have three alternative solutions:
I personally like option 1 better. What do you think from a user's point of view? cc @cristianoc @jberdine @jordwalke |
Assuming you have an ML program that looks like this:
What information about its arguments do you get from constructor C? Is it a constructor that takes a tuple, or a constructor that takes two arguments? The compiler has to use heuristics to figure it out, but we don't have that information at parse time.
So here is the solution we are thinking about using:
When translating from ML to Reason, we can provide an ambiguous syntax that supports both cases. The ambiguous Reason syntax will be compilable. The user later has to decide which case they intend and gradually migrate from the ambiguous syntax to a specific Reason syntax.
As an example, when a user first converts the above code from ML to Reason, it will be something like this:
The above syntax will be valid and they can start compiling it; the pretty printer is idempotent for this code segment.
Later on they have to decide which interpretation to convert to. It is either:
Or
I will send out a diff about this soon.