Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ambiguity in ML -> Reason conversion #63

Closed
yunxing opened this issue Jan 21, 2016 · 33 comments
Closed

Ambiguity in ML -> Reason conversion #63

yunxing opened this issue Jan 21, 2016 · 33 comments

Comments

@yunxing
Copy link
Contributor

yunxing commented Jan 21, 2016

Assuming you have a ML program that looks like this:

match a with
  | C (c, d) -> 1
;;

What information about arguments do you get from constructor C? Is it a constructor that takes a tuple? or is it a constructor that takes two arguments? The compiler has to use heuristics to figure it out but we don't have that information at the parsing time.

So here is the solution we are thinking about using:

When translate from ML to Reason, we can provide an ambiguous syntax thats support both cases. The reason ambiguous syntax will be compilable. The user later on has to decide which case he intends to use and gradually migrate from the ambiguous syntax to a specific Reason syntax.

As an example, when an user first convert the above code from ML to Reason. It will be something like this:

  switch a {
    | C (TODO_REMOVE_AMBIGUITY__ f x __TODO_REMOVE_AMBIGUITY) -> 1
  }

The above syntax will be valid and he can start compiling it, the pretty printer is idempotent for this code segment.

Later on he has to decide which interpretation to convert to, it is either:

  switch a {
    | C (f, x) -> 1
  }

Or

switch a {
    | C f x -> 1
}

I will send out a diff about this soon.

@yunxing
Copy link
Contributor Author

yunxing commented Jan 21, 2016

Another approach is that we can choose a default interpretation to convert to -- it is likely not be able compile and will force users to fix them right away (should be easy). But it will make the converting experience worse for first time users.

@jberdine
Copy link
Contributor

jberdine commented Jan 24, 2016 via email

@yunxing
Copy link
Contributor Author

yunxing commented Jan 26, 2016

@jberdine Interesting, this looks like a better approach. Let me try it.

@yunxing
Copy link
Contributor Author

yunxing commented Jan 30, 2016

@jberdine Hmm ... Actually this doesn't quite work.

The ambiguity here comes from the fact that we don't know if Constructor in OCaml is taking multiple arguments or not.

For example, consider our Ocaml -> Reason reformatter is seeing the following OCaml code segment:

type t = C of ...

match x with
  | C (1, 1) -> 1
;;

...

What code do we generate?

If we think C is taking one argument (type t = C of (int * int);;), we can generate:


switch x {
  | C (1, 1) => 1
};

...

Or, if we think C is taking two arguments (type t = C of int * int;;), we can generate:

...

switch x {
  | C 1 1 => 1
};

...

But the problem is that we don't have any information with C during parse time (C may be declared in another package).

I thought about to generate the following code:


switch x {
  | C 1 1 | C (1, 1) => 1
};


But it doesn't compile.

@jberdine
Copy link
Contributor

jberdine commented Feb 1, 2016 via email

@jordwalke
Copy link
Member

@jberdine This is the only case that I can think of where it would be helpful, and it seems that there are workarounds for converting both from and to OCaml syntax. The benefit of working at the parse tree level is that upgrading of existing code fromOCaml to Reason is trivial, and doesn't require any knowledge of your build system.

I could imagine another mode that uses the typed tree in order to make the conversion higher quality (no need to append attributes like [@implitic_arity].

@jberdine
Copy link
Contributor

jberdine commented Feb 1, 2016 via email

@jordwalke
Copy link
Member

Yes, that would required the typed tree, but I think the problem is that there is no way to distinguish between the two in OCaml so you will inevitably have a ton of errors when converting.

So it would be great if converting OCaml code that did not play tricks with constructor arity

I think the problem is that so much code must play tricks with constructor arity in OCaml so virtually every project would fail conversion.

@jberdine
Copy link
Contributor

jberdine commented Feb 2, 2016 via email

@jordwalke
Copy link
Member

I'm certainly not opposed to it. What would a tool like that look like?

an option that would essentially tell reasonfmt to assume that the input OCaml code does not use the arity trick 'feature'

Is that the same thing as saying "assume the input OCaml code does not declare variant types with a single tuple as their data"? If so, then yeah, that would be an easy feature to add. If the assumption is wrong, you'd likely get one or two compiler errors which you could fix up. If what you say is true (that this is super rare), then maybe fixing one/two compiler errors at the time of conversion is worth not having to fix 50 [@implicit_arity] attributes.

I think the mode that generates all the [@implicit_arity] flags is still very nice because it gives us a way to 100% guarantee that a project can be converted over without hassle (imagine people just want to quickly try it out without commitment etc) .

@jberdine
Copy link
Contributor

jberdine commented Feb 2, 2016

an option that would essentially tell reasonfmt to assume that the input OCaml code does not use the arity trick 'feature'

Is that the same thing as saying "assume the input OCaml code does not declare variant types with a single tuple as their data"?

Yes, exactly that.

If so, then yeah, that would be an easy feature to add. If the assumption is wrong, you'd likely get one or two compiler errors. If what you say is true (that this is super rare), then maybe fixing one/two compiler errors at the time of conversion is worth not having to fix 24 [@implicit_arity] attributes.

It it's something like 24, no problem. What I worry about is getting an attribute on every pattern match of a constructor with multiple arguments. Perhaps I'm not understanding just when the attributes are introduced.

@cristianoc
Copy link
Contributor

The way I see it, we should have one language, and many language representations. And, try not to break those properties, as any crack in those properties would be pretty difficult to recover from. In particular, we should not change ocaml, unless any changes are pushed upstream.

Essentially, the property would be:
parse_x (prettyprint_x (ast)) = ast
For any representation x, currently x=ml or x=re.
Do we have that property at the moment?

So there should be no conversion between two languages, as there’s only one language, but just a conversion of representation, obtained by parsing in one and printing in the other.

Any actual change, such as disambiguation, would need to happen separately at the language-to-language level, and with this property there are 3 equivalent places to do that: ml to ml syntax, re to re syntax, ast to ast.
One can imagine an automatic disambiguation operation by simple pattern matching, that will make a default choice, and leave a bunch of type errors to be manually fixed afterwards.

@jberdine
Copy link
Contributor

jberdine commented Feb 2, 2016 via email

@cristianoc
Copy link
Contributor

Yes, sorry for omitting the detail: I was thinking about a subset of the AST, the representable subset.

@yunxing
Copy link
Contributor Author

yunxing commented Feb 2, 2016

@jberdine You are right. My current proposal is to introduce the attribute whenever there is a constructor with multiple arguments. I think it is fair to have an option to make the reasonfmt assuming the constructor is always operating on multiple arguments. On the other hand, I think we should still add the [@implicit_arity] as a default behavior -- it may turn people away when they try to convert their OCaml project to Reason and get type errors.

@cristianoc We have three components in the ast today related to this topic:

  1. A disambiguated constructor that takes multiple arguments:
    a. Representation in Reason: C u v.
    b. Representation in OCaml: C (u, v) [@explicit_arity].
  2. A disambiguated constructor that take tuple as single argument:
    a. Representation in Reason: C (u, v).
    b. Not representable in OCaml yet.
  3. A ambiguous constructor that can either take a tuple or multiple arguments:
    a. Representation in Reason: C (u, v)[@implicit_arity].
    b. Representation in OCaml: C (u, v).

For all the cases except 2.b, the property parse_x (prettyprint_x (ast)) = ast is maintained

@cristianoc
Copy link
Contributor

@yunxing that's great! As good as we can hope.
I think having reason representing a slightly larger set of ASTs than ocaml is perfectly OK until we can propose to push an ocaml extension upstream, e.g. by having the ocaml parser and pretty printer recognize certain attributes.

@jberdine
Copy link
Contributor

jberdine commented Feb 3, 2016 via email

@jberdine
Copy link
Contributor

jberdine commented Feb 3, 2016 via email

@jordwalke
Copy link
Member

@jberdine: (Edit) If converting from Reason back to OCaml, yes that would very much work. (Unfortunately, none of the existing OCaml code uses that convention so upgrading from OCaml -> Reason must take one of the two approaches mentioned (1. Assume multiple args, and make user fixup rare type errors when it's actually a single tuple, 2. Litter with attributes).

It would be cool (maybe lower priority) to have a third option that performed some very basic analysis of the set of files you're converting in batch, looking for a type definition matching that constructor, and using that to help guide the decision. Supporting those same heuristics on third party libraries is much more difficult but also possible if you examine the .cmt files.

Regarding attributes : I see them as a temporary thing. In practice, you would either run the upgrade tool in fail mode (requires you to fix a rare set of compile errors but leaves you with no additional attributes) or just-works mode which leaves you with attributes that you should fix up by hand over time. Either way, attributes are not being embraces as a long term strategy here.

@yunxing
Copy link
Contributor Author

yunxing commented Feb 4, 2016

@jberdine Good point, 2b could be expressed as that. I think it is also the right solution to #68. It is currently low pri since we don't think today we have many cases to convert from Reason back to OCaml.

@jordwalke
Copy link
Member

The following idea would admitedly be lower priority to implement, but I'm happy that this option exists so I thought I'd share. Since most projects have .merlin files, which list the location of the build artifacts of their own files (and their dependencies), we can use merlin to tell us the types of any AST node without having to tap into anyone's build system. We only ask that you have built your project when you do the conversion. This would give you the ideal output - everything type checks perfectly with no attributes, and the syntax uses the superior form! (Again, perhaps it's lower priority, but the fact that this option exists might change how much we choose to invest in a shorter term solution).

@yunxing
Copy link
Contributor Author

yunxing commented Feb 4, 2016

@jordwalke Good point, we should look into that again once we have a project file (we also need to think about the form of migration).

@jberdine
Copy link
Contributor

jberdine commented Feb 4, 2016 via email

@jordwalke
Copy link
Member

@jberdine: I totally agree and it's amazing that there was only one place that needed some extra work in converting. Even with the current proposal above, if you used .merlin, you could convert back and forth somewhat reliably, but not rapidly enough to fulfill the spirit of the "one language - several representations in dev environment". I think some relatively small upstream fixes to OCaml could actually end up letting us fulfill that vision in the truest sense.

Another option, even without upstream fixes, is to just provide a copy of the ML parser that is exactly the same as upstream, except it also follows the (superior) convention of distinguishing between multiple arguments. Perhaps this version would also be pushed upstream, but even without it, you could still offer it as an option, fulfilling the one-language-multiple-representations goal.

@jordwalke
Copy link
Member

Another option which I haven't emphasized but that would also accomplish that feat would be to make Reason worse, matching OCaml's convention.

@yunxing
Copy link
Contributor Author

yunxing commented Feb 5, 2016

Regarding to not being able to converting from Reason to OCaml, another solution is -- based on the observation that a Reason's ast is a superset of OCaml -- to make 2b fallback into an ambiguous syntax, which is 3b.

In today's code, when the ML printer meets the ast component of (2)*, it automatically prints it to 1b -- which creates a compiler error since the arity is wrong.

We can find a way to make it print it to 3b, but I haven't figured out how.

*: In Reason, the ast for C (u, v) looks like this:

attribute "axplicit_arity" 
  []
Pexp_constructor "C"
Some 
  expression
    Pexp_tuple
   [
     expression
       Pexp_tuple
       [
          expression
            Pexp_ident u
          expression
            Pexp_ident v
       ]
   ]

, which is printed as C (u, v) [@explicit_arity] in ML

If we can convert it to

Pexp_constructor "C"
Some 
  expression
    Pexp_tuple
   [
     expression
       Pexp_tuple
       [
          expression
            Pexp_ident u
          expression
            Pexp_ident v
       ]
   ]

,it will be automatically printed as C (u, v) in ML.

But the ast above is invalid since it violates the invariance that a tuple has to have 2+ components .

@yunxing
Copy link
Contributor Author

yunxing commented Feb 5, 2016

How to print 2b will be tracked by #68.

Before we get to the .cmi parsing, as a short term goal we still need to decide if we want C (u, v)[@implicit_arity], C <u,v>, or something else.

@jberdine
Copy link
Contributor

jberdine commented Feb 5, 2016 via email

@jordwalke
Copy link
Member

It sounds like Reason in alignment with what Xavier would want. We are merely making the syntactic distinction even more apparent. If currying were added to the constructors, it would all come together nicely. There's hope of upstreaming improvements.

@yunxing
Copy link
Contributor Author

yunxing commented Feb 12, 2016

Getting back to this.

@jberdine: ocamlp5 does a straight forward conversion that doesn't compile (sort of what you've been asking for).

~/p/ocaml $ cat try.ml # a regular OCaml file
type t = C of (int * int);;
C (1, 2);;

~/p/ocaml $ camlp5o pr_r.cmo try.ml
type t =
  [ C of (int * int) ]
;
C 1 2;

~/p/ocaml $ camlp5o pr_r.cmo try.ml > revised.ml  # converting it to revised syntax

~/p/ocaml $ cat revised.ml
type t =
  [ C of (int * int) ]
;
C 1 2;

~/p/ocaml $ ocamlc -pp camlp5r revised.ml # Doesn't compile
File "revised.ml", line 4, characters 0-5:
Error: The constructor C expects 1 argument(s),
       but is applied here to 2 argument(s)

@yunxing
Copy link
Contributor Author

yunxing commented Feb 12, 2016

As a summary of the options so far:

When converting C (1, 2);; from OCaml to Reason, there are three different proposals so far:

  1. Converting them as C 1 2:
    This is what ocamp5 uses and it is provided as a commandline option in Add implicit_arity attribute when converting from OCaml #70. It assumes the constructors is N-arity and could generate compiler errors after conversion.
  2. As C (1, 2) [@implicit_arity]:
    This is default behavior in Add implicit_arity attribute when converting from OCaml #70 and is ambiguous.
  3. As C <1, 2>:
    This is same as 2, but a bit nicer.

@jberdine
Copy link
Contributor

jberdine commented Feb 12, 2016 via email

@yunxing
Copy link
Contributor Author

yunxing commented Mar 2, 2016

@cristianoc mentioned in the other issue that there are lots of @implicit_arity added after the conversion.

The current implementation asks the client to manually fix all the [@implicity_arity] by hand when they have the mood -- the conversion can still compile without the implicity_arity so they can do it incrementally.

If we don't like the conversion story, we have three alternative solutions:

  1. Convert them to a new syntax C <1, 2> -- same as C (1, 2) [@implicit_airty], but a bit nicer.
  2. Add an option in rebuild which forces the conversion to produce C 1 2.
  3. Use heuristics in the AST tree and automatically decide to use C 1 2 or C (1, 2) -- this will only work with constructor defined in a single file.

I personally like option 1 better. What do you think from an user's point of view? cc @cristianoc @jberdine @jordwalke

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants