Ambiguity in ML -> Reason conversion #63

yunxing · 2016-01-21T21:57:35Z

Assume you have an ML program that looks like this:

match a with
  | C (c, d) -> 1
;;

What does this tell you about the arguments of constructor C? Is it a constructor that takes a single tuple, or a constructor that takes two arguments? The compiler has to use heuristics to figure it out, but we don't have that information at parse time.
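To make the ambiguity concrete, here is a minimal OCaml sketch. The type and function names (`multi`, `single`, `sum_multi`, and so on) are made up for illustration and are not part of the converter. Both declarations accept the same pattern syntax at the use site, which is why the parser alone cannot tell them apart:

```ocaml
(* Two declarations that look alike at every use site. *)
type multi = M of int * int      (* constructor with two arguments *)
type single = S of (int * int)   (* constructor with one argument, a pair *)

(* The patterns below are written identically for both declarations. *)
let sum_multi x = match x with M (c, d) -> c + d
let sum_single x = match x with S (c, d) -> c + d

(* Only the single-tuple declaration can hand the pair over as one value. *)
let pair_of_single x = match x with S p -> p
```

Nothing in `M (c, d)` or `S (c, d)` reveals which declaration is in scope; only the type definition does.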

So here is the solution we are thinking about using:

When translating from ML to Reason, we can provide an ambiguous syntax that supports both cases. The ambiguous Reason syntax will compile. Later, the user decides which case they intend and gradually migrates from the ambiguous syntax to a specific Reason syntax.

As an example, when a user first converts the above code from ML to Reason, the result will be something like this:

  switch a {
    | C (TODO_REMOVE_AMBIGUITY__ f x __TODO_REMOVE_AMBIGUITY) -> 1
  }

The above syntax is valid, so the user can start compiling right away, and the pretty printer is idempotent for this code segment.

Later on, the user has to decide which interpretation to convert to. It is either:

  switch a {
    | C (f, x) -> 1
  }

Or

  switch a {
    | C f x -> 1
  }

I will send out a diff about this soon.

yunxing · 2016-01-21T22:15:05Z

Another approach is to choose a default interpretation to convert to. The output will likely not compile, which forces users to fix the errors right away (this should be easy), but it makes the conversion experience worse for first-time users.

jberdine · 2016-01-24T22:07:51Z

This ambiguity arises only when a type constructor has been declared in OCaml as taking multiple arguments (e.g. `type t = C of u * v` but not for `type t = C of (u * v)`), right? In this first case, code such as `match x with C uv -> fst uv` that treats `C` as if it took a pair implicitly allocates in OCaml. I wonder if, instead of inserting ambiguity warning tokens when importing such OCaml code into Reason, it would be better to generate code that makes the allocation explicit: `switch x { C u v -> let uv = (u,v) ; fst uv }`.
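A sketch of what the explicit form could look like on the OCaml side, assuming a hypothetical declaration `type t = C of int * string` (the concrete argument types and the function name `first` are invented for the example):

```ocaml
type t = C of int * string   (* two-argument constructor *)

(* The pair is rebuilt explicitly, so the allocation is visible in the source. *)
let first (x : t) : int =
  match x with
  | C (u, v) ->
      let uv = (u, v) in
      fst uv
```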

yunxing · 2016-01-26T06:20:34Z

@jberdine Interesting, this looks like a better approach. Let me try it.

yunxing · 2016-01-30T07:06:12Z

@jberdine Hmm ... Actually this doesn't quite work.

The ambiguity here comes from the fact that we don't know whether a constructor in OCaml takes multiple arguments or not.

For example, suppose our OCaml -> Reason reformatter sees the following OCaml code segment:

type t = C of ...

match x with
  | C (1, 1) -> 1
;;

...

What code do we generate?

If we think C is taking one argument (type t = C of (int * int);;), we can generate:


switch x {
  | C (1, 1) => 1
};

...

Or, if we think C is taking two arguments (type t = C of int * int;;), we can generate:

...

switch x {
  | C 1 1 => 1
};

...

But the problem is that we don't have any information about C at parse time (C may be declared in another package).

I thought about generating the following code:


switch x {
  | C 1 1 | C (1, 1) => 1
};

But it doesn't compile.

jberdine · 2016-02-01T19:43:01Z

I wonder if, perhaps longer term, it would be better for OCaml to Reason to start from the Typedtree instead of Untypeast. Are there other cases where the translation into Reason could be better if the OCaml types were available?

jordwalke · 2016-02-01T20:33:49Z

@jberdine This is the only case that I can think of where it would be helpful, and it seems that there are workarounds for converting both from and to OCaml syntax. The benefit of working at the parse tree level is that upgrading existing code from OCaml to Reason is trivial, and doesn't require any knowledge of your build system.

I could imagine another mode that uses the typed tree in order to make the conversion higher quality (no need to append attributes like [@implicit_arity]).

jberdine · 2016-02-01T22:59:31Z

Yes, there are definite advantages to using the untyped tree. Perhaps I'm in the minority, but I generally consider using a non-unary constructor as if it was a unary constructor of tuple type to be a bug/bad code in OCaml. That is, I wish there was some warning I could turn into an error to catch that. My impression, but I could be wrong, is that materializing a constructor into a tuple is rare. So it would be great if converting OCaml code that did not play tricks with constructor arity produced nice Reason code, and back. Does that seem feasible? I'm imagining in the simplest form, an option to reasonfmt that would assume that e.g. C (_,_) implies that C is a binary constructor, and if the assumption is wrong, then failing to compile is the desired outcome.

jordwalke · 2016-02-02T02:30:06Z

Yes, that would require the typed tree, but I think the problem is that there is no way to distinguish between the two in OCaml, so you will inevitably have a ton of errors when converting.

> So it would be great if converting OCaml code that did not play tricks with constructor arity

I think the problem is that so much code must play tricks with constructor arity in OCaml so virtually every project would fail conversion.

jberdine · 2016-02-02T09:54:13Z

Perhaps I'm overlooking something, but my impression is that OCaml code intentionally using type definitions such as `type t = C of (u * v) | ...` is rare, as it is almost universally a latent perf bug, and there are only a few niche cases where it is convenient. Are you entirely opposed to an option that would essentially tell reasonfmt to assume that the input OCaml code does not use the arity trick 'feature' in order to produce nice Reason code? I really don't want to wade through 10s of KLOC removing implicit arity attributes and deleting parens and commas... In my opinion, it would be much preferable to flag any existing reliance on constructor arity tricks, fix them in the OCaml code, and reconvert. Do you agree @cristianoc, or is my position really strange here?

jordwalke · 2016-02-02T10:45:52Z

I'm certainly not opposed to it. What would a tool like that look like?

> an option that would essentially tell reasonfmt to assume that the input OCaml code does not use the arity trick 'feature'

Is that the same thing as saying "assume the input OCaml code does not declare variant types with a single tuple as their data"? If so, then yeah, that would be an easy feature to add. If the assumption is wrong, you'd likely get one or two compiler errors which you could fix up. If what you say is true (that this is super rare), then maybe fixing one/two compiler errors at the time of conversion is worth not having to fix 50 [@implicit_arity] attributes.

I think the mode that generates all the [@implicit_arity] flags is still very nice because it gives us a way to 100% guarantee that a project can be converted over without hassle (imagine people just want to quickly try it out without commitment etc) .
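The two modes discussed here can be sketched as a small printer. This is a toy under stated assumptions: `mode`, `Assume_multi_args`, `Annotate`, and `render` are hypothetical names, not actual reasonfmt options, and arguments are modeled as plain strings rather than AST nodes.

```ocaml
(* Hypothetical conversion modes; not the actual reasonfmt options. *)
type mode =
  | Assume_multi_args   (* print `C u v`; may need manual fixes afterwards *)
  | Annotate            (* print `C (u, v) [@implicit_arity]` *)

(* Render an OCaml constructor application `ctor (args...)` as Reason text,
   using the Reason syntax that appears in this thread. *)
let render (mode : mode) (ctor : string) (args : string list) : string =
  match mode, args with
  | _, [] -> ctor
  | _, [ a ] -> ctor ^ " " ^ a
  | Assume_multi_args, _ -> ctor ^ " " ^ String.concat " " args
  | Annotate, _ -> ctor ^ " (" ^ String.concat ", " args ^ ") [@implicit_arity]"
```

For example, `render Assume_multi_args "C" ["u"; "v"]` yields `C u v`, while `render Annotate "C" ["u"; "v"]` yields `C (u, v) [@implicit_arity]`.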

jberdine · 2016-02-02T16:38:38Z

> an option that would essentially tell reasonfmt to assume that the input OCaml code does not use the arity trick 'feature'

> Is that the same thing as saying "assume the input OCaml code does not declare variant types with a single tuple as their data"?

Yes, exactly that.

> If so, then yeah, that would be an easy feature to add. If the assumption is wrong, you'd likely get one or two compiler errors. If what you say is true (that this is super rare), then maybe fixing one/two compiler errors at the time of conversion is worth not having to fix 24 [@implicit_arity] attributes.

If it's something like 24, no problem. What I worry about is getting an attribute on every pattern match of a constructor with multiple arguments. Perhaps I'm not understanding just when the attributes are introduced.

cristianoc · 2016-02-02T17:48:29Z

The way I see it, we should have one language, and many language representations. And, try not to break those properties, as any crack in those properties would be pretty difficult to recover from. In particular, we should not change ocaml, unless any changes are pushed upstream.

Essentially, the property would be:
parse_x (prettyprint_x (ast)) = ast
For any representation x, currently x=ml or x=re.
Do we have that property at the moment?

So there should be no conversion between two languages, as there’s only one language, but just a conversion of representation, obtained by parsing in one and printing in the other.

Any actual change, such as disambiguation, would need to happen separately at the language-to-language level, and with this property there are 3 equivalent places to do that: ml to ml syntax, re to re syntax, ast to ast.
One can imagine an automatic disambiguation operation by simple pattern matching, that will make a default choice, and leave a bunch of type errors to be manually fixed afterwards.

jberdine · 2016-02-02T19:28:01Z

I agree that there is a very high cost to deviating from one language with several representations. I'm not sure about the exact formulation as parse_x (prettyprint_x (ast)) = ast. In particular, parse_x may not be a surjection. IIUC, at least for OCaml, there are asts not in the image of the parser. For example, I guess that there are asts a0 with pattern matches over constructors with explicit arities, and almost identical a1 with pattern matches over constructors with implicit arities, that can only be pretty-printed to the same OCaml code. So parsing that code can at best give a0 or a1, but not both.

Perhaps we want to consider "the language" not to be all asts, but instead the image of the parsers, or the image of the composition prettyprint o parse. But these images would need to match for all language representations or we'll be back in the same jam as above.

At this point I am thinking that perhaps the best plan is to see if adding (and upstreaming) some (new) attributes to OCaml code could eliminate the problem of not being able to generate all asts with the OCaml parser. For example, I guess at least an attribute to indicate that constructors have explicit arity. Of course, then we're talking not about all asts, but those in the image of some annotation processor. But that might be ok. IIUC, this would be a language-to-language disambiguation operation as Cris suggested, done on the ast representation.

cristianoc · 2016-02-02T19:52:48Z

Yes, sorry for omitting the detail: I was thinking about a subset of the AST, the representable subset.

yunxing · 2016-02-02T21:19:02Z

@jberdine You are right. My current proposal is to introduce the attribute whenever there is a constructor with multiple arguments. I think it is fair to have an option that makes reasonfmt assume the constructor always takes multiple arguments. On the other hand, I think we should still add [@implicit_arity] as the default behavior, since it may turn people away if they try to convert their OCaml project to Reason and get type errors.

@cristianoc We have three components in the ast today related to this topic:

1. A disambiguated constructor that takes multiple arguments:
   a. Representation in Reason: C u v.
   b. Representation in OCaml: C (u, v) [@explicit_arity].
2. A disambiguated constructor that takes a tuple as a single argument:
   a. Representation in Reason: C (u, v).
   b. Not representable in OCaml yet.
3. An ambiguous constructor that can either take a tuple or multiple arguments:
   a. Representation in Reason: C (u, v)[@implicit_arity].
   b. Representation in OCaml: C (u, v).

For all the cases except 2.b, the property parse_x (prettyprint_x (ast)) = ast is maintained
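The round-trip property for the three cases listed above (multiple arguments, single tuple, ambiguous) can be illustrated with a toy model. Everything here is an assumption made for the sketch: `node`, `ml_surface`, `print_ml`, and `parse_ml` are invented names, and the "surface syntax" is a record rather than real text. Since the single-tuple case has no OCaml form of its own, this toy prints it like the ambiguous case.

```ocaml
(* Toy model of the three cases; not Reason's real AST or printer. *)
type arity =
  | Multi_args     (* disambiguated: multiple arguments *)
  | Single_tuple   (* disambiguated: one tuple argument *)
  | Ambiguous      (* could be either *)

type node = { ctor : string; args : string list; arity : arity }

(* What the OCaml surface syntax can carry: `C (u, v)` plus an optional
   [@explicit_arity] attribute. *)
type ml_surface = { head : string; items : string list; explicit_attr : bool }

let print_ml (n : node) : ml_surface =
  { head = n.ctor; items = n.args; explicit_attr = (n.arity = Multi_args) }

let parse_ml (s : ml_surface) : node =
  { ctor = s.head;
    args = s.items;
    arity = (if s.explicit_attr then Multi_args else Ambiguous) }

(* parse_x (prettyprint_x ast) = ast, specialised to the toy ML representation. *)
let round_trips (n : node) : bool = parse_ml (print_ml n) = n
```

In this model `round_trips` holds for `Multi_args` and `Ambiguous` nodes and fails for `Single_tuple`, which comes back from the parser as `Ambiguous`.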

cristianoc · 2016-02-02T22:20:25Z

@yunxing that's great! As good as we can hope.
I think having reason representing a slightly larger set of ASTs than ocaml is perfectly OK until we can propose to push an ocaml extension upstream, e.g. by having the ocaml parser and pretty printer recognize certain attributes.

jberdine · 2016-02-03T13:11:30Z

Yes, the current situation is looking very good. Am I the only one who finds attributes fugly, and thinks that they should be few and far between, and exceptional rather than common? In the current situation, it is not possible to have code that has only a few attributes when written in both Reason and OCaml; one or the other will have many attributes about arity. I think that it would be significantly lower friction for adopters if it was possible to have code without many attributes in both its Reason and its OCaml representation. So I guess that this is a vote for @yunxing's `C <u,v>` syntax, or something similar.

jberdine · 2016-02-03T13:55:26Z

Could 2b be expressed in OCaml as: `C uv -> let (u,v) = uv in`? For this to type check, the type definition for C would have to be of the form `type t = C of (a * b) ...`.
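As a minimal sketch of this suggestion, assuming a hypothetical declaration with `int` components (the function name `sum` is invented for the example):

```ocaml
type t = C of (int * int)   (* single argument that is a pair *)

(* Bind the whole pair first, then destructure it in a separate step. *)
let sum (x : t) : int =
  match x with
  | C uv ->
      let (u, v) = uv in
      u + v
```

With `type t = C of int * int` instead, the `C uv` pattern is rejected by the type checker, which is what makes this form unambiguous.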

jordwalke · 2016-02-03T20:10:59Z

@jberdine: (Edit) If converting from Reason back to OCaml, yes that would very much work. Unfortunately, none of the existing OCaml code uses that convention, so upgrading from OCaml -> Reason must take one of the two approaches mentioned: 1. assume multiple args, and make the user fix up rare type errors when it's actually a single tuple; 2. litter with attributes.

It would be cool (maybe lower priority) to have a third option that performed some very basic analysis of the set of files you're converting in batch, looking for a type definition matching that constructor, and using that to help guide the decision. Supporting those same heuristics on third party libraries is much more difficult but also possible if you examine the .cmt files.
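A rough sketch of how such a lookup might drive the decision. The names `declared`, `decision`, and `decide` are hypothetical, and the table of known constructors is assumed to have been collected beforehand (from the batch of files or from .cmt files); how to collect it is not shown.

```ocaml
(* Hypothetical summary of a type definition found while scanning the batch. *)
type declared = Multi_args | Single_tuple

(* What the converter would print for `C (u, v)`. *)
type decision =
  | Print_curried          (* C u v *)
  | Print_tuple            (* C (u, v) *)
  | Print_implicit_arity   (* C (u, v) [@implicit_arity] *)

(* Use the declaration when it is known; otherwise fall back to the attribute. *)
let decide (known : (string * declared) list) (ctor : string) : decision =
  match List.assoc_opt ctor known with
  | Some Multi_args -> Print_curried
  | Some Single_tuple -> Print_tuple
  | None -> Print_implicit_arity
```

Constructors whose definition was not found keep the ambiguous form, so the output should still compile for code that depends on third-party libraries.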

Regarding attributes: I see them as a temporary thing. In practice, you would either run the upgrade tool in fail mode (requires you to fix a rare set of compile errors but leaves you with no additional attributes) or just-works mode, which leaves you with attributes that you should fix up by hand over time. Either way, attributes are not being embraced as a long term strategy here.

yunxing · 2016-02-04T05:07:26Z

@jberdine Good point, 2b could be expressed as that. I think it is also the right solution to #68. It is currently low pri since we don't think today we have many cases to convert from Reason back to OCaml.

jordwalke · 2016-02-04T06:10:55Z

The following idea would admittedly be lower priority to implement, but I'm happy that this option exists so I thought I'd share. Since most projects have .merlin files, which list the location of the build artifacts of their own files (and their dependencies), we can use merlin to tell us the types of any AST node without having to tap into anyone's build system. We only ask that you have built your project when you do the conversion. This would give you the ideal output: everything type checks perfectly with no attributes, and the syntax uses the superior form! (Again, perhaps it's lower priority, but the fact that this option exists might change how much we choose to invest in a shorter term solution).

yunxing · 2016-02-04T06:32:54Z

@jordwalke Good point, we should look into that again once we have a project file (we also need to think about the form of migration).

jberdine · 2016-02-04T11:02:41Z

Using the .cmt files sounds interesting. Regarding the comment about not having cases to convert from Reason to OCaml, perhaps my expectations are off, but I thought that one objective was to enable developers to use whatever representation they want. So for example I could imagine contributing to an existing project written in OCaml (say the compiler even), do my own development using the Reason tools and representation, but convert back to OCaml when I commit my changes. In short, the choice of representation (such as Reason or OCaml) ought to be made per developer rather than per project. This is maybe not a strict necessity, but it would be so cool, would reduce friction and barriers for adoption, and doesn't seem to be far away technically.

jordwalke · 2016-02-04T20:26:56Z

@jberdine: I totally agree and it's amazing that there was only one place that needed some extra work in converting. Even with the current proposal above, if you used .merlin, you could convert back and forth somewhat reliably, but not rapidly enough to fulfill the spirit of the "one language - several representations in dev environment". I think some relatively small upstream fixes to OCaml could actually end up letting us fulfill that vision in the truest sense.

Another option, even without upstream fixes, is to just provide a copy of the ML parser that is exactly the same as upstream, except it also follows the (superior) convention of distinguishing between multiple arguments. Perhaps this version would also be pushed upstream, but even without it, you could still offer it as an option, fulfilling the one-language-multiple-representations goal.

jordwalke · 2016-02-04T23:23:12Z

Another option which I haven't emphasized but that would also accomplish that feat would be to make Reason worse, matching OCaml's convention.

yunxing · 2016-02-05T06:44:32Z

Regarding not being able to convert from Reason to OCaml: based on the observation that Reason's AST is a superset of OCaml's, another solution is to make 2b fall back to the ambiguous syntax, which is 3b.

In today's code, when the ML printer meets the ast component of (2)*, it automatically prints it to 1b -- which creates a compiler error since the arity is wrong.

We can find a way to make it print it to 3b, but I haven't figured out how.

*: In Reason, the ast for C (u, v) looks like this:

attribute "explicit_arity"
  []
Pexp_constructor "C"
Some 
  expression
    Pexp_tuple
   [
     expression
       Pexp_tuple
       [
          expression
            Pexp_ident u
          expression
            Pexp_ident v
       ]
   ]

, which is printed as C (u, v) [@explicit_arity] in ML

If we can convert it to

Pexp_constructor "C"
Some 
  expression
    Pexp_tuple
   [
     expression
       Pexp_tuple
       [
          expression
            Pexp_ident u
          expression
            Pexp_ident v
       ]
   ]

, it will be automatically printed as C (u, v) in ML.

But the AST above is invalid, since it violates the invariant that a tuple has to have 2+ components.
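The invariant mentioned above can be made explicit with a small validity check. This is a toy model under assumptions: `expr` is a simplified stand-in for the parse tree (not the real `Pexp_*` constructors), and the `explicit_arity` flag stands in for the attribute.

```ocaml
(* Simplified stand-in for the parse tree. *)
type expr =
  | Ident of string
  | Tuple of expr list
  | Construct of { name : string; arg : expr option; explicit_arity : bool }

let rec valid (e : expr) : bool =
  match e with
  | Ident _ -> true
  | Tuple es -> List.length es >= 2 && List.for_all valid es
  | Construct { arg = None; _ } -> true
  | Construct { arg = Some (Tuple es); explicit_arity = true; _ } ->
      (* With the attribute, the outer tuple only lists the arguments,
         so a single component is acceptable. *)
      es <> [] && List.for_all valid es
  | Construct { arg = Some e'; _ } -> valid e'

(* The tree for Reason's `C (u, v)`, with and without the attribute. *)
let inner = Tuple [ Ident "u"; Ident "v" ]
let with_attr =
  Construct { name = "C"; arg = Some (Tuple [ inner ]); explicit_arity = true }
let stripped =
  Construct { name = "C"; arg = Some (Tuple [ inner ]); explicit_arity = false }
```

Here `valid with_attr` holds, while `valid stripped` does not, because the outer one-component tuple is no longer excused by the attribute.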

yunxing · 2016-02-05T07:14:24Z

How to print 2b will be tracked by #68.

Before we get to the .cmi parsing, as a short term goal we still need to decide if we want C (u, v)[@implicit_arity], C <u,v>, or something else.

jberdine · 2016-02-05T11:34:30Z

Have you guys seen the related discussion on this topic: http://caml.inria.fr/mantis/view.php?id=6455 and ocaml/ocaml#284? tl;dr: A pull request to resolve the ambiguity in ocaml by supporting use of n-ary constructors as tuples. Didn't meet with much support: Xavier wants to keep multi-arity constructors, and in hindsight thinks that constructors should have been curried. There is mention of a curried-constructors option for ocaml in a later version, so maybe an avenue to gain support for upstreaming a change to unify reason and ocaml in this regard.

And yeah, really amazing that this is the one sticking point. I guess some credit here is due to the ocaml ast, it must be in reasonably good shape to avoid having a bunch of unrepresentable stuff.

Regarding printing constructors with explicit arity to ocaml, have you looked at what the revised syntax does in this case? In particular, what does camlp5 do when printing such asts to the standard ocaml syntax? Perhaps it doesn't end up with parsing and printing being inverses.

jordwalke · 2016-02-05T22:23:03Z

It sounds like Reason is in alignment with what Xavier would want. We are merely making the syntactic distinction even more apparent. If currying were added to the constructors, it would all come together nicely. There's hope of upstreaming improvements.

yunxing · 2016-02-12T05:29:56Z

Getting back to this.

@jberdine: camlp5 does a straightforward conversion that doesn't compile (sort of what you've been asking for).

~/p/ocaml $ cat try.ml # a regular OCaml file
type t = C of (int * int);;
C (1, 2);;

~/p/ocaml $ camlp5o pr_r.cmo try.ml
type t =
  [ C of (int * int) ]
;
C 1 2;

~/p/ocaml $ camlp5o pr_r.cmo try.ml > revised.ml  # converting it to revised syntax

~/p/ocaml $ cat revised.ml
type t =
  [ C of (int * int) ]
;
C 1 2;

~/p/ocaml $ ocamlc -pp camlp5r revised.ml # Doesn't compile
File "revised.ml", line 4, characters 0-5:
Error: The constructor C expects 1 argument(s),
       but is applied here to 2 argument(s)

yunxing · 2016-02-12T05:37:27Z

As a summary of the options so far:

When converting C (1, 2);; from OCaml to Reason, there are three different proposals so far:

1. Converting them as C 1 2:
   This is what camlp5 uses, and it is provided as a command-line option in "Add implicit_arity attribute when converting from OCaml" #70. It assumes the constructor is N-ary and could generate compiler errors after conversion.
2. As C (1, 2) [@implicit_arity]:
   This is the default behavior in "Add implicit_arity attribute when converting from OCaml" #70 and is ambiguous.
3. As C <1, 2>:
   This is the same as 2, but a bit nicer.

jberdine · 2016-02-12T11:03:15Z

Thanks for looking into camlp5. So much for the hope of learning a trick from it, oh well. Ok, it looks like this situation is in good shape.

yunxing · 2016-03-02T17:19:40Z

@cristianoc mentioned in the other issue that there are lots of @implicit_arity added after the conversion.

The current implementation asks the client to fix all the [@implicit_arity] attributes by hand when they are in the mood. The converted code can still compile without the implicit_arity, so they can do it incrementally.

If we don't like the conversion story, we have three alternative solutions:

1. Convert them to a new syntax C <1, 2>: the same as C (1, 2) [@implicit_arity], but a bit nicer.
2. Add an option in rebuild which forces the conversion to produce C 1 2.
3. Use heuristics on the AST to automatically decide between C 1 2 and C (1, 2). This will only work with constructors defined in a single file.

I personally like option 1 better. What do you think from a user's point of view? cc @cristianoc @jberdine @jordwalke

yunxing mentioned this issue Jan 21, 2016

Introducing an ambiguous syntax for construct args #64

Closed

yunxing mentioned this issue Jan 26, 2016

Any objection to converting the pretty printer to Reason? #11

Closed

yunxing mentioned this issue Feb 1, 2016

Add implicit_arity attribute when converting from OCaml #70

Closed

yunxing added the OCaml->Reason label Feb 12, 2016

yunxing added the Infer Conversion label Feb 12, 2016

yunxing mentioned this issue Mar 2, 2016

[ExplicitArity] [pprinter] Generated OCaml is not compilable #68

Closed

yunxing closed this as completed Mar 15, 2016
