-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
M.(::) syntax and printing exotic lists in the toplevel. #1247
Conversation
Thanks for working on this, it is indeed an interesting problem. Cold, I find it a bit surprising in 3b5e4ba that you need to handle |
The most natural solution that I would expect is for One justification for your approach is that there are AST nodes, whenever the list ends on a list expression that is not |
Changes
Outdated
@@ -9,6 +9,10 @@ Working version | |||
can be used as a placeholder for a polymorphic function. | |||
(Stephen Dolan) | |||
|
|||
- GPR#1247: M.(::) construction for expression and pattern | |||
(and fix printing of (::) in toplevel) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes specialist trick of the day: cutting lines early can avoid sentences starting with a parenthesis, to avoid confusion with the credit line. and pattern (plus fixing of (::) in toplevel)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the idea, fixed.
L.[ ([1;2]:int list); "3"; [4;5;6] ];; When reading this I was at first confused as to why this is valid OCaml code -- I would expect |
For the Initially, I implemented the type t = A | B
let x = A
module M = struct type u = A | C type ('a,'b) t = [] | (::) of 'a * ('b,'a) t end How to print without ambiguity For types that with exotic terminators, cons syntax could be an intermediary sugared form |
I don't understand the "neither I do understand the rest of the answer: basically resugaring is too hard here. Another option would be to keep trace of the sugaring in out-of-band attributes of the parsetree (I believe that some constructions already do this?). The fact that you already considered resugaring definitely suggests that going with what you propose now is a reasonable (if maybe not complete) step forward. |
Note that I am speaking of the printing of output values in the toplevel: # Pervasives.(+) 1 2;;
- : int = 3 (* ← i.e. this right-hand side of the toplevel output *) Sorry for the confusing description. |
Oh. Sorry for missing this important aspect of the problem. I agree that correctness (of the "can be printed back" specification) is the first concern here, so your approach does sound reasonable. I will do a code review, and approve the PR if I believe it is correct. However, I would like to get a third informed opinion on this problem -- it does seem that we are on a slippery slope. Re. resugaring: it is impossible to always qualify the list elements in a way that makes resugaring correct, as opening the module may shadow some identifiers from the global scope that currently have no non-ambiguous qualified name. (Makes you wish for a |
typing/oprint.ml
Outdated
@@ -22,11 +22,13 @@ let cautious f ppf arg = | |||
try f ppf arg with | |||
Ellipsis -> fprintf ppf "..." | |||
|
|||
let fix_ident = function "::" -> "(::)" | s -> s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This approach is correct, but a more elegant approach would be to consider the type out_ident
as if it was defined by the more informative
type out_ident =
| Oide_apply of out_ident * out_ident
| Oide_dot of out_ident * out_lident
| Oide_ident of out_lident
and out_lident = string (* lident: short lowercase identifier *)
and have a print_lident ppf s
function that internally applies the fix
logic.
(There may be a better name than lident
; in the parser lident/uident clearly refer to path components without dots, but in the typer "ident" rather refers to "long idents" that are in fact value paths. Maybe "name" would work, with the idea that a value path is a (possibly empty) module path followed by a value name, but it is less self-evident.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went with print_lident
since it seemed a good enough fit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes.
I find it weird that in |
Moreover, I'm concerned that a change to the grammar does not impact the reference manual; but it seems that using |
@alainfrisch |
I agree that it was the wrong behavior, I have fixed it (and the printing of Concerning the documentation, maybe it would make sense to move the description of |
Indeed, but not when used as constructors in expressions ( # type t = (::) of int * int * int;;
type t = (::) of int * int * int
# (::) (1, 2, 3);;
Characters 13-14:
(::) (1, 2, 3);;
^
Error: Syntax error: operator expected.
# type t = (::);;
type t = (::)
# (::);;
Characters 4-6:
(::);;
^^
Error: Syntax error: operator expected. We should try to make it all more uniform, both in the grammar definition and in the parser. Ideally, |
On one hand, I would think that the fact that not-binary On the other hand, currently the parser spend some time and effort to special case |
We should aim at uniformity and reducing special cases in the grammar and type-checker. If we decide that Note that this is coherent with infix binary operators. You can define |
I feel this issue is consuming way too much brain power and developer time relative to its importance. Do we really need |
A hack as cute ends up being reused, always, and it happens that this one is instrumental to having a nice syntax for heterogeneous lists using GADTs. See for example this type declaration in the excellent fuzzing library Crowbar (to be presented at the OCaml workshop!), and these uses in a fuzzer for PPrint (found no bugs). |
The ability to redefine On the topic at hand, I agree with @alainfrisch : just behave exactly like normal operators and allow arbitrary arity. This is more uniform. |
Good point, I agree that coherence with infix operators should be respected. Fixed. @xavierleroy , I agree that the lack of polish of this PR on my end ended up taking to much time of the people involved, and I shall fix this point in the future ; but it seems reasonable that fixing polish issues generate some discussions. |
Especially the Polish issues about UTF8 internalization support. |
42930cb
to
69c4e1c
Compare
@@ -2388,8 +2381,10 @@ val_longident: | |||
; | |||
constr_longident: | |||
mod_longident %prec below_DOT { $1 } | |||
| mod_longident DOT LPAREN COLONCOLON RPAREN { Ldot($1,"::") } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about mod_longident DOT LPAREN RPAREN
and mod_longident DOT LBRACKET RBRACKET
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are currently interpreted as delimited local open, in other words they are mapped to (let open M in ())
or (let open M in [])
, which means that
module M = struct end
let ( ) = M.( )
is well typed but raises a warning(33) for unused open.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, it's a bit weird that the qualified syntax is not available for these constructors.
But interpreting M.[]
as a qualified reference to constructor []
would be incompatible with the current interpretation of M.[e1; ...; en]
(which is a local open). I now think it was a mistake to have this local open interpretation. It would have been better to map the expression above to:
M.(::) e1(M.(::) e2 ..(M.(::) en M.[])
(interpreting M.(::) and M.[] as normal qualified references)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would seem to violate the principle of least astonishment.
I think this is a small but clear improvement to the parser. The grammar is simpler and more regular. |
I thought about the "was it a mistake?" point as well but I don't like the idea (with the alternative) that |
69c4e1c
to
570c31f
Compare
@alainfrisch In which case, could you review the recent patches and then approve them, so this can be merged? |
Ok for merging once the conflict is resolved. |
I'm unsure about @Octachron 's CLA status but I think this is small enough that it doesn't need a CLA. I took the liberty of fixing the conflict (was only in Changes) and am about to merge. |
fix link to security blog post
The initial aim of this PR was to fix the printing of exotic lists in the toplevel.
For instance, consider the following silly example of an alternating list:
Trying to print a value of type
_ L.t
in the toplevel leads toSimilarly, after opening the module
L
:In both examples, the printed value is not using a valid OCaml syntax. Fortunately, the last case can be fixed directly by escaping
::
in identifiers as(::)
, which is done in the first commit of this PR.However, for the first example, this change yields
which is still not syntactically valid because
L.(::)
is not. Moreover,L.(::)( [1;2], L.[] )
cannot be mapped directly to an existing synctatic construction:L.[ [1;2]; "3"; [4;5;6] ]
would lead to the wrong type for the inner lists[1;2]
and[4;5;6]
. To fix this issue, the third commit in this PR adds to the parserMod.Long.Ident.(::)(_,_)
as a valid pattern and expression, mirroring the existing syntaxM.[]
.With this change, the toplevel can now print exotic lists as proper ocaml values (even if the sugared form
[a;…;z]
is lost compared to standard lists).