M.(::) syntax and printing exotic lists in the toplevel. #1247

Merged
merged 7 commits into from
Aug 11, 2017


Octachron
Member

The initial aim of this PR was to fix the printing of exotic lists in the toplevel.
For instance, consider the following silly example of an alternating list:

module L = struct type ('a,'b) t = [] | (::) of 'a * ('b,'a) t end;;

Trying to print a value of type _ L.t in the toplevel leads to

#  L.[ ([1;2]:int list); "3"; [4;5;6] ];;
- : (int list, string) L.t = L.:: ([1; 2], L.:: ("3", L.:: ([4; 5; 6], L.[])))

Similarly, after opening the module L:

# open L ;;
# [1;[];2;[];3];;
- : (int, ('a, 'b) t) t = :: (1, :: ([], :: (2, :: ([], :: (3, [])))))

In both examples, the printed value is not valid OCaml syntax. Fortunately, the last case can be fixed directly by escaping :: in identifiers as (::), which is done in the first commit of this PR.

However, for the first example, this change yields

#  L.[ ([1;2]:int list); "3"; [4;5;6] ];;
- : (int list, string) L.t = L.(::) ([1; 2], L.(::) ("3", L.(::) ([4; 5; 6], L.[])))

which is still not syntactically valid, because L.(::) is not. Moreover, L.(::)( [1;2], L.[] ) cannot be mapped directly to an existing syntactic construction: L.[ [1;2]; "3"; [4;5;6] ] would lead to the wrong type for the inner lists [1;2] and [4;5;6]. To fix this issue, the third commit in this PR adds Mod.Long.Ident.(::)(_,_) to the parser as a valid pattern and expression, mirroring the existing syntax M.[].

With this change, the toplevel can now print exotic lists as proper OCaml values (even if the sugared form [a;…;z] is lost compared to standard lists).
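
As an illustration of the syntax described above, here is a minimal sketch that builds the alternating list with explicitly qualified constructors and traverses it with the qualified pattern form. It assumes a compiler that includes this change; the `length` helper is hypothetical and only serves to exercise the pattern.

```ocaml
module L = struct
  type ('a, 'b) t = [] | (::) of 'a * ('b, 'a) t
end

(* Qualified constructor applications: the arguments are typed outside of
   the scope of L, so [1; 2] is an ordinary int list. *)
let v : (int list, string) L.t =
  L.(::) ([1; 2], L.(::) ("3", L.(::) ([4; 5; 6], L.[])))

(* Hypothetical helper: the two type parameters swap at each step, hence
   the explicitly polymorphic annotation (polymorphic recursion). *)
let rec length : type a b. (a, b) L.t -> int = function
  | L.[] -> 0
  | L.(::) (_, rest) -> 1 + length rest
```

The value `v` is written exactly in the form that the toplevel prints after this change, so the printed output can be pasted back as input.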

@gasche
Member

gasche commented Jul 16, 2017

Thanks for working on this, it is indeed an interesting problem.

At first glance, I find it a bit surprising that 3b5e4ba needs to handle :: specially. Why can it not be handled by the code logic dedicated to infix operators? How does M.(+) 1 2 work differently, such that it does not need this special treatment?

@gasche
Member

gasche commented Jul 16, 2017

The most natural solution that I would expect is for M.[1; 2; 3] to be printed back as M.[1; 2; 3]. This is not what you implement. Have you considered this approach?

One justification for your approach is that there are AST nodes, whenever the list ends on a list expression that is not M.[], that cannot be put in this form. If I am not mistaken, such nodes cannot be produced by source programs, but they could be produced by AST-generating code. So I agree that your approach (allowing the standard function notation for an M.(::) infix operator) makes sense, but I would still find it natural to have the resugaring done when possible.

Changes Outdated
@@ -9,6 +9,10 @@ Working version
can be used as a placeholder for a polymorphic function.
(Stephen Dolan)

- GPR#1247: M.(::) construction for expression and pattern
(and fix printing of (::) in toplevel)
Member

Changes specialist trick of the day: cutting lines early can avoid sentences starting with a parenthesis, to avoid confusion with the credit line. For example, the second line could read: and pattern (plus fixing of (::) in toplevel).

Member Author

Thanks for the idea, fixed.

@gasche
Member

gasche commented Jul 16, 2017

L.[ ([1;2]:int list); "3"; [4;5;6] ];;

When reading this I was at first confused as to why this is valid OCaml code -- I would expect [1; 2] to be resolved in the L scope and thus not build a list. If someone is equally confused: this is using type-directed disambiguation of constructors.
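
To make the disambiguation concrete, here is a small sketch reusing the module L from the PR description. The binding names `v` and `first` are illustrative only. Inside `L.[ ... ]` the constructors of L shadow the standard ones, and the type annotation is what directs the inner literal back to the standard list type.

```ocaml
module L = struct
  type ('a, 'b) t = [] | (::) of 'a * ('b, 'a) t
end

(* The outer brackets build an L.t because L is locally opened; the
   annotated inner literal is resolved as a standard int list. *)
let v = L.[ ([1; 2] : int list); "3" ]

(* Extract the first element to check that it really is an int list. *)
let first : int list =
  match v with
  | L.(::) (x, _) -> x
  | L.[] -> []
```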

@Octachron
Member Author

Octachron commented Jul 16, 2017

For the typing/oprint.ml part, neither (+) nor M.(+) 1 2 are values (or types) and thus they are never represented or printed as an outcome tree? An alternative would be to intercept :: in the construction of Oval_constr in toplevel/genprintval.ml.

Initially, I implemented the M.[ a; …; b] form but the problem is that this means that one has to detect all shadowing of constructors triggered by opening M and reverse them, if possible. For instance, with

     type t = A | B
     let x = A
     module M = struct type u = A | C type ('a,'b) t = [] | (::) of 'a * ('b,'a) t end

How can M.[ x; A ] be printed without ambiguity in the toplevel? To solve this issue straightforwardly, one needs to be able to specify the path of the constructors :: and [], and only them. This was already possible for [], but not for ::. This is the point of the M.(::) form. The less than ideal form, M.(::)(A, M.(::)(M.A, [])), at least has the merit of being very regular and extends the existing syntax quite naturally. A heavier solution might be to add a construction that specifies the path of these constructors, and only them, while retaining the [a ; …] syntax, but then you are right that exotic terminators would be a problem.

For types with exotic terminators, the cons syntax a :: b :: c :: d :: Not_nil could be an intermediate sugared form, but there is still no way to specify the path of the constructor :: independently. So one would need a syntax for qualified operators (which would be really nice, but I fear this would be a much more involved change than this PR).
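
The shadowing problem can be seen in a short sketch built from the definitions above. The names `sugared` and `explicit` are illustrative. The sugared form relies on the local open, which changes what the bare constructor A refers to, whereas the explicit form names each constructor path independently.

```ocaml
type t = A | B
let x = A

module M = struct
  type u = A | C
  type ('a, 'b) t = [] | (::) of 'a * ('b, 'a) t
end

(* Under the local open, the bare constructor A is M.A (of type M.u),
   whereas x keeps its outer type t. *)
let sugared = M.[ x; A ]

(* The same value with every constructor path spelled out. *)
let explicit : (t, M.u) M.t = M.(::) (x, M.(::) (M.A, M.[]))
```

A printer that wanted to emit the sugared form would have to know that the outer A (the value of x) has no unambiguous name once M is opened, which is the difficulty described above.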

@gasche
Member

gasche commented Jul 16, 2017

I don't understand the "neither (+) nor M.(+) 1 2 are values (or types) and thus they are never represented or printed as an outcome tree?" comment: (+) 1 2 and Pervasives.(+) 1 2 are both valid input expressions, that get printed back as 1 + 2 and Pervasives.(+) 1 2. I don't understand what their outcometree representation is. That said, now I see a difference between the two cases, which is that (::) occurs in a constructor node while (+) occurs in an application node, so maybe the resugaring logic I had in mind is only present at application nodes.

I do understand the rest of the answer: basically resugaring is too hard here. Another option would be to keep track of the sugaring in out-of-band attributes of the parsetree (I believe that some constructions already do this?). The fact that you already considered resugaring definitely suggests that going with what you propose now is a reasonable (if maybe not complete) step forward.

@Octachron
Member Author

Octachron commented Jul 16, 2017

Note that I am speaking of the printing of output values in the toplevel:

# Pervasives.(+) 1 2;;
- : int = 3 (* ← i.e. this right-hand side of the toplevel output *) 

Sorry for the confusing description.

@gasche
Member

gasche commented Jul 16, 2017

Oh. Sorry for missing this important aspect of the problem.

I agree that correctness (of the "can be printed back" specification) is the first concern here, so your approach does sound reasonable. I will do a code review, and approve the PR if I believe it is correct. However, I would like to get a third informed opinion on this problem -- it does seem that we are on a slippery slope.

Re. resugaring: it is impossible to always qualify the list elements in a way that makes resugaring correct, as opening the module may shadow some identifiers from the global scope that currently have no non-ambiguous qualified name. (Makes you wish for a ~M.<expr> construction to un-open a module...). We could, relying on much typing information, write a function that perfectly resugars or fails if impossible. However, my intuition would be that, in most cases, the list elements would not have to be printed differently because of the opened module (for example, M.(::)(A, M.(::)(1, M.[])), when A is not defined in M). Is it possible (and interestingly easier) to determine when that is the case, and resugar only then? I guess that in general the question is whether <expr> and M.<expr> are equivalent, and that sounds painful to implement...

typing/oprint.ml Outdated
@@ -22,11 +22,13 @@ let cautious f ppf arg =
try f ppf arg with
Ellipsis -> fprintf ppf "..."

let fix_ident = function "::" -> "(::)" | s -> s
Member

This approach is correct, but a more elegant approach would be to consider the type out_ident as if it was defined by the more informative

type out_ident =
  | Oide_apply of out_ident * out_ident
  | Oide_dot of out_ident * out_lident
  | Oide_ident of out_lident
and out_lident = string (* lident: short lowercase identifier *)

and have a print_lident ppf s function that internally applies the fix logic.

(There may be a better name than lident; in the parser lident/uident clearly refer to path components without dots, but in the typer "ident" rather refers to "long idents" that are in fact value paths. Maybe "name" would work, with the idea that a value path is a (possibly empty) module path followed by a value name, but it is less self-evident.)

Member Author

I went with print_lident since it seemed a good enough fit.
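
For readers following along, the suggestion can be sketched as below. This is a standalone approximation and not the actual code of typing/oprint.ml: the type definitions are copied from the review comment, and the body of `print_ident` is an assumption about how such a printer might be organised.

```ocaml
type out_lident = string (* short identifier, without dots *)

type out_ident =
  | Oide_apply of out_ident * out_ident
  | Oide_dot of out_ident * out_lident
  | Oide_ident of out_lident

(* The escaping of :: lives in a single place: the printer for short
   identifiers. *)
let print_lident ppf = function
  | "::" -> Format.pp_print_string ppf "(::)"
  | s -> Format.pp_print_string ppf s

(* Assumed shape of the path printer: every path component goes through
   print_lident, so L.:: is printed as L.(::). *)
let rec print_ident ppf = function
  | Oide_ident s -> print_lident ppf s
  | Oide_dot (id, s) ->
      Format.fprintf ppf "%a.%a" print_ident id print_lident s
  | Oide_apply (id1, id2) ->
      Format.fprintf ppf "%a(%a)" print_ident id1 print_ident id2
```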

Member

@gasche gasche left a comment

Thanks for the changes.

@alainfrisch
Contributor

I find it weird that in M.(::)(e1, e2), the module M is opened for e1 and e2. This is at odds with the idea that (::) is a valid constructor name, which would suggest that M.(::)(e1, e2) is interpreted as a simple constructor application with a qualified constructor name.

@alainfrisch
Contributor

Moreover, I'm concerned that a change to the grammar does not impact the reference manual; but it seems that using (::) as a constructor name is not documented anyway, right?

@objmagic
Contributor

@alainfrisch [] and :: have been documented: https://caml.inria.fr/pub/docs/manual-ocaml/typedecl.html

@Octachron
Member Author

I find it weird that in M.(::)(e1, e2), the module M is opened for e1 and e2. This is at odds with the idea that (::) is a valid constructor name

I agree that it was the wrong behavior, I have fixed it (and the printing of M.(::) in pprintast.ml).

Concerning the documentation, maybe it would make sense to move the description of [] and (::) directly to the documentation of constr-name? I will check if they are now allowed in all context where constr-name is admitted.

@alainfrisch
Contributor

[] and :: have been documented: https://caml.inria.fr/pub/docs/manual-ocaml/typedecl.html

Indeed, but not when used as constructors in expressions ((::)(1, 2)). Btw, I find it confusing that one can define a constructor (::) of arbitrary arity but not use it with the prefix syntax:

# type t = (::) of int * int * int;;
type t = (::) of int * int * int
# (::) (1, 2, 3);;
Characters 13-14:
  (::) (1, 2, 3);;
               ^
Error: Syntax error: operator expected.
# type t = (::);;
type t = (::)
# (::);;
Characters 4-6:
  (::);;
      ^^
Error: Syntax error: operator expected.

We should try to make it all more uniform, both in the grammar definition and in the parser. Ideally, M.(::)(e1, e2) would not require any special parsing rule. It would be the standard syntax for a constructor application.

@alainfrisch alainfrisch self-requested a review July 19, 2017 07:27
@Octachron
Member Author

Octachron commented Jul 19, 2017

On one hand, I would think that the fact that a non-binary :: constructor can be used neither with the binary sugar form a::b::c … nor with the list syntax sugar [a;…;z] is an argument for rejecting such definitions of a non-binary :: constructor.

On the other hand, currently the parser spends some time and effort to special-case ::. It would be more homogeneous to simply add the (::) and M.….(::) cases to constr_longident and treat (::) as a standard constructor (see this commit for instance).

@alainfrisch
Contributor

We should aim at uniformity and reducing special cases in the grammar and type-checker. If we decide that (::) can be used as a constructor in a user-defined sum type, there is no reason to impose any restriction on its arity, neither on the declaration side nor on the use side. I don't see why users would use this constructor if not to use the infix syntax (and thus of arity 2), but there is no point adding ad hoc restrictions here, and I'm sure one could find a scenario where it would be useful to use a different arity (here is one: an automated tool that rewrites code to add an extra argument to each constructor, e.g. the callstack of its allocation point).

Note that this is coherent with infix binary operators. You can define let (+) = 42.

@xavierleroy
Contributor

I feel this issue is consuming way too much brain power and developer time relative to its importance.

Do we really need (::) as a constructor? Is anyone using it? If I remember correctly, this syntax was introduced following the suggestion of the Coq folks, to facilitate the extraction (autogeneration) of Caml code. Yet Coq's extraction today doesn't generate (::)(a, b) because they special-cased Caml lists and their syntax.

@gasche
Member

gasche commented Jul 19, 2017

A hack this cute always ends up being reused, and it happens that this one is instrumental to having a nice syntax for heterogeneous lists using GADTs. See for example this type declaration in the excellent fuzzing library Crowbar (to be presented at the OCaml workshop!), and these uses in a fuzzer for PPrint (found no bugs).

@Drup
Contributor

Drup commented Jul 19, 2017

The ability to redefine :: for custom list-like structures is very useful to create small DSLs. Also, the cat is out of the bag now. This has been available for several OCaml versions. There was already a (long) discussion in #234.

On the topic at hand, I agree with @alainfrisch : just behave exactly like normal operators and allow arbitrary arity. This is more uniform.

@Octachron
Member Author

Note that this is coherent with infix binary operators. You can define let (+) = 42.

Good point, I agree that coherence with infix operators should be respected. Fixed.
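
As a sketch of what this uniformity allows, assuming a compiler that includes the fix, a (::) constructor of arity three can be applied and matched with the ordinary prefix constructor syntax. The module name `Triple` and the bindings `v` and `sum` are hypothetical; the type is wrapped in a module only to avoid shadowing the standard list constructor in the rest of the file.

```ocaml
module Triple = struct
  (* A (::) constructor that is deliberately not binary. *)
  type t = (::) of int * int * int
end

(* Prefix application uses the ordinary constructor syntax. *)
let v = Triple.(::) (1, 2, 3)

(* The same qualified form works in patterns. *)
let sum =
  match v with
  | Triple.(::) (a, b, c) -> a + b + c
```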

@xavierleroy, I agree that the lack of polish of this PR on my end ended up taking too much of the time of the people involved, and I shall fix this point in the future; but it seems reasonable that fixing polish issues generates some discussions.

@gasche
Member

gasche commented Jul 19, 2017

It seems reasonable that fixing polish issues generates some discussions.

Especially the Polish issues about UTF8 internalization support.

@@ -2388,8 +2381,10 @@ val_longident:
;
constr_longident:
mod_longident %prec below_DOT { $1 }
| mod_longident DOT LPAREN COLONCOLON RPAREN { Ldot($1,"::") }
Contributor

What about mod_longident DOT LPAREN RPAREN and mod_longident DOT LBRACKET RBRACKET?

Member Author

They are currently interpreted as delimited local opens; in other words, they are mapped to (let open M in ()) or (let open M in []), which means that

    module M = struct end
    let ( ) = M.( )

is well typed but triggers warning 33 (unused open).

Contributor

Well, it's a bit weird that the qualified syntax is not available for these constructors.

But interpreting M.[] as a qualified reference to constructor [] would be incompatible with the current interpretation of M.[e1; ...; en] (which is a local open). I now think it was a mistake to have this local open interpretation. It would have been better to map the expression above to:

  M.(::) (e1, M.(::) (e2, .. M.(::) (en, M.[])))

(interpreting M.(::) and M.[] as normal qualified references)

Member

That would seem to violate the principle of least astonishment.

@alainfrisch
Contributor

I think this is a small but clear improvement to the parser. The grammar is simpler and more regular.

@gasche
Member

gasche commented Jul 19, 2017

I thought about the "was it a mistake?" point as well but I don't like the idea (with the alternative) that M.[a; b; c] and M.([a; b; c]) have different interpretations. I'm afraid that the cat was out of the box when people decided to use M.(...) as syntactic sugar for local open.

@alainfrisch alainfrisch added this to the 4.06.0 milestone Jul 20, 2017
@mshinwell
Contributor

@alainfrisch In which case, could you review the recent patches and then approve them, so this can be merged?

@alainfrisch
Contributor

Ok for merging once the conflict is resolved.

@mshinwell
Contributor

I'm unsure about @Octachron 's CLA status but I think this is small enough that it doesn't need a CLA. I took the liberty of fixing the conflict (was only in Changes) and am about to merge.

@mshinwell mshinwell merged commit 7d671fc into ocaml:trunk Aug 11, 2017