M.(::) syntax and printing exotic lists in the toplevel. #1247

Octachron · 2017-07-16T21:12:05Z

The initial aim of this PR was to fix the printing of exotic lists in the toplevel.
For instance, consider the following silly example of an alternating list:

module L = struct type ('a,'b) t = [] | (::) of 'a * ('b,'a) t end;;

Trying to print a value of type _ L.t in the toplevel leads to

#  L.[ ([1;2]:int list); "3"; [4;5;6] ];;
- : (int list, string) L.t = L.:: ([1; 2], L.:: ("3", L.:: ([4; 5; 6], L.[])))

Similarly, after opening the module L:

# open L ;;
# [1;[];2;[];3];;
- : (int, ('a, 'b) t) t = :: (1, :: ([], :: (2, :: ([], :: (3, [])))))

In both examples, the printed value is not using a valid OCaml syntax. Fortunately, the last case can be fixed directly by escaping :: in identifiers as (::), which is done in the first commit of this PR.

However, for the first example, this change yields

#  L.[ ([1;2]:int list); "3"; [4;5;6] ];;
- : (int list, string) L.t = L.(::) ([1; 2], L.(::) ("3", L.(::) ([4; 5; 6], L.[])))

which is still not syntactically valid because L.(::) is not. Moreover, L.(::)( [1;2], L.[] ) cannot be mapped directly to an existing syntactic construct: L.[ [1;2]; "3"; [4;5;6] ] would lead to the wrong type for the inner lists [1;2] and [4;5;6]. To fix this issue, the third commit in this PR adds Mod.Long.Ident.(::)(_,_) to the parser as a valid pattern and expression, mirroring the existing syntax M.[].

With this change, the toplevel can now print exotic lists as proper OCaml values (even if the sugared form [a;…;z] is lost compared to standard lists).
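As a rough illustration of the qualified form described above, here is a minimal sketch that builds the alternating list with L.(::) and consumes it with a pattern using the same syntax. It assumes a compiler that includes this change (4.06 or later); the helper name `length` is hypothetical and not part of the PR.

```ocaml
(* Alternating list from the PR description: element types swap at each cons. *)
module L = struct
  type ('a, 'b) t = [] | (::) of 'a * ('b, 'a) t
end

(* Fully qualified form: only the constructors are taken from L,
   so the inner [1; 2] and [4; 5; 6] remain ordinary lists. *)
let v : (int list, string) L.t =
  L.(::) ([1; 2], L.(::) ("3", L.(::) ([4; 5; 6], L.[])))

(* Hypothetical helper. Polymorphic recursion is needed because the
   tail swaps the two type parameters. *)
let rec length : type a b. (a, b) L.t -> int = function
  | L.[] -> 0
  | L.(::) (_, rest) -> 1 + length rest
```

The value `v` is written exactly in the shape the toplevel prints after this PR, which is the point of the change: the printed output can be pasted back as input.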

gasche · 2017-07-16T21:40:02Z

Thanks for working on this, it is indeed an interesting problem.

Cold, I find it a bit surprising in 3b5e4ba that you need to handle :: specially. Why can it not be handled in the code logic dedicated to infix operators? How does M.(+) 1 2 work differently that it does not need this special treatment?

gasche · 2017-07-16T21:43:50Z

The most natural solution that I would expect is for M.[1; 2; 3] to be printed back as M.[1; 2; 3]. This is not what you implement. Have you considered this approach?

One justification for your approach is that there are AST nodes, whenever the list ends on a list expression that is not M.[], that cannot be put in this form. If I am not mistaken, such nodes cannot be produced by source programs, but they could be produced by AST-generating code. So I agree that your approach (to allow the standard function notation for a M.(::) infix operator) should be allowed, but I would still find it natural to have the resugaring done when possible.

gasche · 2017-07-16T21:45:05Z

Changes

@@ -9,6 +9,10 @@ Working version
  can be used as a placeholder for a polymorphic function.
  (Stephen Dolan)

+- GPR#1247: M.(::) construction for expression and pattern
+  (and fix printing of (::) in toplevel)


Changes specialist trick of the day: cutting lines early can avoid sentences starting with a parenthesis, to avoid confusion with the credit line. Suggested wording for the end of the entry: and pattern (plus fixing of (::) in toplevel).

Thanks for the idea, fixed.

gasche · 2017-07-16T21:46:53Z

L.[ ([1;2]:int list); "3"; [4;5;6] ];;

When reading this I was at first confused as to why this is valid OCaml code -- I would expect [1; 2] to be resolved in the L scope and thus not build a list. If someone is equally confused: this is using type-directed disambiguation of constructors.
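For readers unfamiliar with type-directed disambiguation, the following sketch may help. The types `t` and `u` and the names `default_a`, `annotated` and `inner` are made up for illustration; only module L comes from the PR description.

```ocaml
type t = A | B
type u = A | C

(* Without an annotation the most recent definition wins: this A belongs to u. *)
let default_a = A

(* With a known expected type, the type checker picks the constructor of that type. *)
let annotated : t = A

module L = struct
  type ('a, 'b) t = [] | (::) of 'a * ('b, 'a) t
end

(* Inside the local open, the constructors of L shadow the list ones,
   yet the annotation makes [1; 2] an ordinary int list. *)
let inner = L.(([1; 2] : int list))
```

The same mechanism is what lets `([1;2] : int list)` build a standard list inside `L.[ ... ]`, even though `[]` and `::` from L are the ones in scope there.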

Octachron · 2017-07-16T22:16:17Z

For the typing/oprint.ml part, neither (+) nor M.(+) 1 2 are values (or types) and thus they are never represented or printed as an outcome tree? An alternative would be to intercept :: in the construction of Oval_constr in toplevel/genprintval.ml.

Initially, I implemented the M.[ a; …; b] form but the problem is that this means that one has to detect all shadowing of constructors triggered by opening M and reverse them, if possible. For instance, with

     type t = A | B
     let x = A
     module M = struct type u = A | C type ('a,'b) t = [] | (::) of 'a * ('b,'a) t end

How to print M.[ x; A ] without ambiguity in the toplevel? To solve this issue in a straightforward way, one needs to be able to specify the path of the constructors :: and [], and only them. This was already possible for [], but not for ::. This is the point of the M.(::) form. The less than ideal form,
M.(::)(A, M.(::)(M.A, [])), has at least the merit of being very regular and of extending the existing syntax quite naturally. A heavier solution might be to add a construction that specifies the path of these constructors, and only them, while retaining the [a ; …] syntax, but then you are right that exotic terminators will be a problem.

For types with exotic terminators, the cons syntax could be an intermediary sugared form
a :: b :: c :: d :: Not_nil but there is still no way to specify the path of the constructor :: independently. So one will need a syntax for qualified operators (which would be really nice but I fear this would be a much more involved change than this PR).
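The shadowing problem above can be reproduced with a small sketch. It reuses the definitions from this comment; the names `sugared` and `explicit` are hypothetical, and the explicit form uses M.[] as terminator.

```ocaml
type t = A | B
let x = A

module M = struct
  type u = A | C
  type ('a, 'b) t = [] | (::) of 'a * ('b, 'a) t
end

(* Opening M shadows A: the first element is the outer constructor (through x),
   the second one is the constructor A defined in M. *)
let sugared = M.[x; A]

(* Qualifying only the list constructors keeps each element unambiguous. *)
let explicit = M.(::) (A, M.(::) (M.A, M.[]))
```

Printing `sugared` back as `M.[A; A]` would silently change the first element to the constructor of `M.u`, which is why the resugared form cannot be used blindly.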

gasche · 2017-07-16T22:24:59Z

I don't understand the "neither (+) nor M.(+) 1 2 are values (or types) and thus they are never represented or printed as an outcome tree?" comment: (+) 1 2 and Pervasives.(+) 1 2 are both valid input expressions, that get printed back as 1 + 2 and Pervasives.(+) 1 2. I don't understand what their outcometree representation is. That said, now I see a difference between the two cases, which is that (::) occurs in a constructor node while (+) occurs in an application node, so maybe the resugaring logic I had in mind is only present at application nodes.

I do understand the rest of the answer: basically resugaring is too hard here. Another option would be to keep track of the sugaring in out-of-band attributes of the parsetree (I believe that some constructions already do this?). The fact that you already considered resugaring definitely suggests that going with what you propose now is a reasonable (if maybe not complete) step forward.

Octachron · 2017-07-16T22:32:47Z

Note that I am speaking of the printing of output values in the toplevel:

# Pervasives.(+) 1 2;;
- : int = 3 (* ← i.e. this right-hand side of the toplevel output *)

Sorry for the confusing description.

gasche · 2017-07-16T23:02:28Z

Oh. Sorry for missing this important aspect of the problem.

I agree that correctness (of the "can be printed back" specification) is the first concern here, so your approach does sound reasonable. I will do a code review, and approve the PR if I believe it is correct. However, I would like to get a third informed opinion on this problem -- it does seem that we are on a slippery slope.

Re. resugaring: it is impossible to always qualify the list elements in a way that makes resugaring correct, as opening the module may shadow some identifiers from the global scope that currently have no non-ambiguous qualified name. (Makes you wish for a ~M.<expr> construction to un-open a module...). We could, relying on much typing information, write a function that perfectly resugars or fails if impossible. However, my intuition would be that, in most cases, the list elements would not have to be printed differently because of the opened module (for example, M.(::)(A, M.(::)(1, M.[])), when A is not defined in M). Is it possible (and interestingly easier) to determine when that is the case, and resugar only then? I guess that in general the question is whether <expr> and M.<expr> are equivalent, and that sounds painful to implement...

gasche · 2017-07-16T23:36:34Z

typing/oprint.ml

@@ -22,11 +22,13 @@ let cautious f ppf arg =
  try f ppf arg with
    Ellipsis -> fprintf ppf "..."

+let fix_ident = function "::" -> "(::)" | s -> s


This approach is correct, but a more elegant approach would be to consider the type out_ident as if it was defined by the more informative

type out_ident =
  | Oide_apply of out_ident * out_ident
  | Oide_dot of out_ident * out_lident
  | Oide_ident of out_lident
and out_lident = string (* lident: short lowercase identifier *)

and have a print_lident ppf s function that internally applies the fix logic.

(There may be a better name than lident; in the parser lident/uident clearly refer to path components without dots, but in the typer "ident" rather refers to "long idents" that are in fact value paths. Maybe "name" would work, with the idea that a value path is a (possibly empty) module path followed by a value name, but it is less self-evident.)

I went with print_lident since it seemed a good enough fit.
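A minimal, self-contained sketch of the idea discussed in this review thread is given below. It is not the code from typing/oprint.ml: the type is redeclared locally with plain strings, and `print_ident` and `to_string` are hypothetical helpers written only to show where the escaping would happen.

```ocaml
(* Local stand-in for the outcome-tree identifier type. *)
type out_ident =
  | Oide_apply of out_ident * out_ident
  | Oide_dot of out_ident * string
  | Oide_ident of string

(* Escape the cons constructor so that the printed name is valid syntax. *)
let fix_ident = function "::" -> "(::)" | s -> s

(* Every short identifier goes through the escaping logic. *)
let print_lident ppf s = Format.pp_print_string ppf (fix_ident s)

let rec print_ident ppf = function
  | Oide_ident s -> print_lident ppf s
  | Oide_dot (id, s) -> Format.fprintf ppf "%a.%a" print_ident id print_lident s
  | Oide_apply (id1, id2) ->
      Format.fprintf ppf "%a(%a)" print_ident id1 print_ident id2

(* Hypothetical convenience wrapper for inspecting the result. *)
let to_string id = Format.asprintf "%a" print_ident id
```

Routing all short names through a single printer means a future special case only has to be handled in one place.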

gasche

Thanks for the changes.

alainfrisch · 2017-07-18T21:30:11Z

I find it weird that in M.(::)(e1, e2), the module M is opened for e1 and e2. This is at odds with the idea that (::) is a valid constructor name, which would suggest that M.(::)(e1, e2) is interpreted as a simple constructor application with a qualified constructor name.

alainfrisch · 2017-07-18T21:32:20Z

Moreover, I'm concerned that a change to the grammar does not impact the reference manual; but it seems that using (::) as a constructor name is not documented anyway, right?

objmagic · 2017-07-18T22:34:22Z

@alainfrisch [] and :: have been documented: https://caml.inria.fr/pub/docs/manual-ocaml/typedecl.html

Octachron · 2017-07-18T22:41:36Z

I find it weird that in M.(::)(e1, e2), the module M is opened for e1 and e2. This is at odds with the idea that (::) is a valid constructor name

I agree that it was the wrong behavior, I have fixed it (and the printing of M.(::) in pprintast.ml).

Concerning the documentation, maybe it would make sense to move the description of [] and (::) directly to the documentation of constr-name? I will check if they are now allowed in all context where constr-name is admitted.

alainfrisch · 2017-07-19T07:26:41Z

[] and :: have been documented: https://caml.inria.fr/pub/docs/manual-ocaml/typedecl.html

Indeed, but not when used as constructors in expressions ((::)(1, 2)). Btw, I find it confusing that one can define a constructor (::) of arbitrary arity but not use it with the prefix syntax:

# type t = (::) of int * int * int;;
type t = (::) of int * int * int
# (::) (1, 2, 3);;
Characters 13-14:
  (::) (1, 2, 3);;
               ^
Error: Syntax error: operator expected.
# type t = (::);;
type t = (::)
# (::);;
Characters 4-6:
  (::);;
      ^^
Error: Syntax error: operator expected.

We should try to make it all more uniform, both in the grammar definition and in the parser. Ideally, M.(::)(e1, e2) would not require any special parsing rule. It would be the standard syntax for a constructor application.
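For comparison, the sketch below shows what the uniform treatment would allow, assuming a compiler where (::) is parsed as an ordinary constructor name (which is what this PR ended up doing). The modules `Triple` and `Constant` and the function `sum` are hypothetical names used only for this example.

```ocaml
(* A cons constructor of arity 3, kept inside a module so that
   ordinary lists are not shadowed in the rest of the file. *)
module Triple = struct
  type t = (::) of int * int * int
end

(* Prefix application with a qualified constructor name. *)
let v = Triple.(::) (1, 2, 3)

(* The same syntax is available in patterns. *)
let sum (Triple.(::) (a, b, c)) = a + b + c

(* A constant constructor named (::). *)
module Constant = struct
  type t = (::)
end

let c = Constant.(::)
```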

Octachron · 2017-07-19T10:23:02Z

On one hand, I would think that the fact that a non-binary :: constructor can be used neither with the binary sugared form a::b::c … nor with the list syntax sugar [a;…;z] is an argument for rejecting such definitions of a non-binary :: constructor.

On the other hand, the parser currently spends some time and effort special-casing ::. It would be more homogeneous to simply add the (::) and M.….(::) cases to constr_longident and treat (::) as a standard constructor (see this commit for instance).

alainfrisch · 2017-07-19T12:15:36Z

We should aim at uniformity and reducing special cases in the grammar and type-checker. If we decide that (::) can be used as a constructor in a user-defined sum type, there is no reason to impose any restriction on its arity, neither on the declaration side nor on the use side. I don't see why users would use this constructor if not to use the infix syntax (and thus of arity 2), but there is no point adding ad hoc restrictions here, and I'm sure one could find a scenario where it would be useful to use a different arity (here is one: an automated tool that rewrites code to add an extra argument to each constructor, e.g. the callstack of its allocation point).

Note that this is coherent with infix binary operators. You can define let (+) = 42.

xavierleroy · 2017-07-19T12:20:10Z

I feel this issue is consuming way too much brain power and developer time relative to its importance.

Do we really need (::) as a constructor? Is anyone using it? If I remember correctly, this syntax was introduced following the suggestion of the Coq folks, to facilitate the extraction (autogeneration) of Caml code. Yet Coq's extraction today doesn't generate (::)(a, b) because they special-cased Caml lists and their syntax.

gasche · 2017-07-19T12:27:09Z

A hack this cute always ends up being reused, and it happens that this one is instrumental to having a nice syntax for heterogeneous lists using GADTs. See for example this type declaration in the excellent fuzzing library Crowbar (to be presented at the OCaml workshop!), and these uses in a fuzzer for PPrint (found no bugs).
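To give an idea of the pattern, here is a generic sketch of a heterogeneous list built on redefined [] and (::) constructors. It is not taken from Crowbar; the module `Hlist` and the functions `length` and `head` are hypothetical.

```ocaml
module Hlist = struct
  (* The type index records the type of every element. *)
  type _ t =
    | [] : unit t
    | (::) : 'a * 'b t -> ('a * 'b) t
end

(* The usual list sugar, with elements of different types. *)
let h = Hlist.[1; "two"; 3.0]

(* Hypothetical helpers, written with the qualified constructor syntax. *)
let rec length : type a. a Hlist.t -> int = function
  | Hlist.[] -> 0
  | Hlist.(::) (_, rest) -> 1 + length rest

let head : type a b. (a * b) Hlist.t -> a = function
  | Hlist.(::) (x, _) -> x
```

Here `h` has type `(int * (string * (float * unit))) Hlist.t`, so `head` can only be applied to lists that are statically known to be non-empty.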

Drup · 2017-07-19T12:36:57Z

The ability to redefine :: for custom list-like structures is very useful to create small DSLs. Also, the cat is out of the bag now. This has been available for several OCaml versions. There was already a (long) discussion in #234.

On the topic at hand, I agree with @alainfrisch : just behave exactly like normal operators and allow arbitrary arity. This is more uniform.

Octachron · 2017-07-19T12:58:12Z

Note that this is coherent with infix binary operators. You can define let (+) = 42.

Good point, I agree that coherence with infix operators should be respected. Fixed.

@xavierleroy, I agree that the lack of polish of this PR on my end ended up taking too much time from the people involved, and I shall fix this point in the future; but it seems reasonable that fixing polish issues generates some discussions.

gasche · 2017-07-19T13:04:35Z

It seems reasonable that fixing polish issues generates some discussions.

Especially the Polish issues about UTF8 internalization support.

alainfrisch · 2017-07-19T15:17:35Z

parsing/parser.mly

@@ -2388,8 +2381,10 @@ val_longident:
 ;
 constr_longident:
    mod_longident       %prec below_DOT         { $1 }
+  | mod_longident DOT LPAREN COLONCOLON RPAREN  { Ldot($1,"::") }


What about mod_longident DOT LPAREN RPAREN and mod_longident DOT LBRACKET RBRACKET?

They are currently interpreted as delimited local open, in other words they are mapped to (let open M in ()) or (let open M in []), which means that

module M = struct end
let ( ) = M.( )

is well typed but raises a warning(33) for unused open.

Well, it's a bit weird that the qualified syntax is not available for these constructors.

But interpreting M.[] as a qualified reference to constructor [] would be incompatible with the current interpretation of M.[e1; ...; en] (which is a local open). I now think it was a mistake to have this local open interpretation. It would have been better to map the expression above to:

M.(::) e1(M.(::) e2 ..(M.(::) en M.[])

(interpreting M.(::) and M.[] as normal qualified references)

That would seem to violate the principle of least astonishment.

alainfrisch · 2017-07-19T15:23:23Z

I think this is a small but clear improvement to the parser. The grammar is simpler and more regular.

gasche · 2017-07-19T16:29:28Z

I thought about the "was it a mistake?" point as well but I don't like the idea (with the alternative) that M.[a; b; c] and M.([a; b; c]) have different interpretations. I'm afraid that the cat was out of the box when people decided to use M.(...) as syntactic sugar for local open.
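The equivalence mentioned here can be checked with a short sketch. The module `M` and its contents are made up for the example; since M defines no list constructors, all three expressions build ordinary lists.

```ocaml
module M = struct
  let one = 1
  let two = 2
end

(* M.[ ... ] is a local open: the elements are resolved inside M. *)
let a = M.[one; two]

(* Same interpretation as the parenthesized local open. *)
let b = M.([one; two])

(* Equivalent fully qualified version. *)
let c = [M.one; M.two]
```

Under the alternative interpretation discussed above, `a` would have qualified only the list constructors, and `one` and `two` would not have been found in the outer scope.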

mshinwell · 2017-08-10T12:16:30Z

@alainfrisch In which case, could you review the recent patches and then approve them, so this can be merged?

alainfrisch · 2017-08-10T17:36:59Z

Ok for merging once the conflict is resolved.

mshinwell · 2017-08-11T09:10:34Z

I'm unsure about @Octachron's CLA status but I think this is small enough that it doesn't need a CLA. I took the liberty of fixing the conflict (was only in Changes) and am about to merge.

gasche reviewed Jul 16, 2017

gasche approved these changes Jul 16, 2017

gasche approved these changes Jul 17, 2017

alainfrisch self-requested a review July 19, 2017 07:27

Octachron force-pushed the oprint_exotic_list branch from 42930cb to 69c4e1c Compare July 19, 2017 13:05

alainfrisch reviewed Jul 19, 2017

Octachron added 3 commits July 20, 2017 14:00

oprint: do not print "(::)" as "::"

a6452ca

test the printing of exotic lists in the toplevel

529b3ca

make M.(::) parsable

60f6fd6

update changes

570c31f

Octachron force-pushed the oprint_exotic_list branch from 69c4e1c to 570c31f Compare July 20, 2017 13:20

Merge branch 'trunk' into oprint_exotic_list

f31e125

alainfrisch added this to the 4.06.0 milestone Jul 20, 2017

mshinwell added 2 commits August 11, 2017 10:09

Merge with trunk

c0562d3

Merge with trunk

8170e02

mshinwell merged commit 7d671fc into ocaml:trunk Aug 11, 2017
