-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support 'raw identifier' syntax #11252
Conversation
0b680a0
to
463589f
Compare
@gasche You expressed interest in this patch at some point, would you be willing / have time to review? |
Yes, during ICFP last September I realized that I had forgotten about the PR (I liked the RFC and ended up in the awkward position of encouraging you to send a PR about it), and then I quickly forgot about it again.
Yes!
No! But tomorrow we have a meeting about encouraging people to do right by reviewing more stuff, let us try to designate some volunteers there. So many people eager to help, it would be a shame if I stole this opportunity from them. |
I regret that this was allowed to slip off the radar for 5.0 - it should certainly be in 5.1 and if you're happy to rebase I'm happy to either review it or ensure it gets reviewed in time! |
Now rebased! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the code changes of this PR are sound.
My only interrogation is at the level of the specification. As of the current state of this PR:
- Module names and value constructors can never be raw identifiers (unlike what the original RFC suggests), because raw identifiers cannot be capitalized (e.g.
\#Foo
). - However, polymorphic variant constructors can be raw identifiers (e.g.
`\#true
).
(I thought polymorphic variant constructors were necessarily capitalized, but I checked and apparently they don’t need to be, i.e. This is discouraged, see my message below.`x
is valid.)
As a consequence, the simple rule “raw identifiers can be used whenever any non-capitalized identifier can be used” applies. I think this simple rule should be stated somewhere (probably in the manual) so that it is clear how to use raw identifiers.
| raw_ident_escape (lowercase identchar * as name) | ||
{ LIDENT name } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was expecting to see the equivalent of this for uppercase
and UIDENT
. Without it, raw identifiers can only start with a lowercase character, and are a bit less general than what the RFC mentions:
This syntax can be used anywhere that foo can be used:
~\#foo
is a labelled argument,`\#Foo
is a polymorphic variant,'\#foo
is a type variable,\#Foo
is a module or constructor name, etc.
Arguably this is sufficient as OCaml has no capitalized keywords. I just want to make sure that’s what we want.
| `btrue | ||
| `bfalse ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it necessary to add cases for true
and false
in this patch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I think I understand. true
and false
are the only keywords that can be used as constructors, so we must be wary of not printing them as escaped (since, on the other hand, raw identifiers cannot be used as constructor names).
Using non-capitalized tag names is in fact not considered standard and may be explicitly disallowed in the future (see #12047). So maybe we don’t want to encourage the use of raw identifiers as tag names. |
I think that raw identifiers as tag names should have the same status as lowercase tag names i.e. undocumented but supported. Then if we ever remove lowercase tag names or make them official we can do the same with raw identifier tag names. For the same reasons that no one has managed to remove lowercase tag names -- they are actually widely used -- it's important to be able to use raw identifiers for them too. Otherwise when adding a keyword you risk not being able to use a library that uses that keyword as a tag name. |
I see, that makes sense. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked at this PR.
I find myself confused by some of the naming choices in the code, and I think that this is because there is a lack of clarity in the terminology. When we write the OCaml expression \#foo
, we may want to talk about \#foo
(the identifier as it appears in source code) and foo
(the internal payload / identifying information of the identifier). These two different things should have different names. Currently we both call them "identifiers", and then we have to remember to "protect" an identifier before printing it. This is the same distinction between (+)
and +
, (let!)
and let!
.
Then there is a similar question at the level of type variables, between 'a
and a
. (Is it exactly the same distinction, or a distinct distinction?)
I would propose to call \#foo
and (+)
identifiers, and foo
and +
the name of those identifiers. This feels natural to me, and it is consistent with the Rust documentation of raw identifiers (emphasis mine):
This is particularly useful when Rust introduces new keywords, and a library using an older edition of Rust has a variable or function with the same name as a keyword introduced in a newer edition.
If we like this terminology (other proposals are welcome), we should use it in the code, in particular protect_ident
could be ident_of_name
or just ident
(and be properly documented) and tyvar_name
could be tyvar_of_name
or just tyvar
.
@@ -93,7 +96,7 @@ Working version | |||
(Stephen Dolan, review by Xavier Leroy) | |||
|
|||
- #11418, #11708: RISC-V multicore support. | |||
(Nicolás Ojeda Bär, review by KC Sivaramakrishnan) | |||
(Nicolás Ojeda Bär, review by KuC Sivaramakrishnan) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an unrelated typo.
@@ -95,16 +95,17 @@ let needs_parens txt = | |||
let needs_spaces txt = | |||
first_is '*' txt || last_is '*' txt | |||
|
|||
let string_loc ppf x = fprintf ppf "%s" x.txt | |||
|
|||
(* add parentheses to binders when they are in fact infix or prefix operators *) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This docstring should be updated, given that the feature scope was extended to also "protect" identifiers that are also keywords. Suggestion:
(* Turn an arbitrary identifier name into a lexically valid identifier. *)
I realize as I write this that I don't know what is the right terminology for these distinct objects (the "string payload of an identifier", which I call "identifier name" here, and the "identifier as it appears in the source file", which I call a "lexically valid identifier'.)
@@ -268,16 +273,21 @@ let iter_loc f ctxt {txt; loc = _} = f ctxt txt | |||
|
|||
let constant_string f s = pp f "%S" s | |||
|
|||
let tyvar ppf s = | |||
let tyvar_name s = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find the new function name tyvar_name
confusing. If I was asked to guess what is the "name" of a tyvar, I would say that foo
is the name of the tyvar 'foo
. This function is in fact doing the opposite thing.
@@ -684,7 +686,8 @@ open Printtyp | |||
|
|||
let report_error env ppf = function | |||
| Unbound_type_variable name -> | |||
let add_name name _ l = if name = "_" then l else ("'" ^ name) :: l in | |||
let add_name name _ l = | |||
if name = "_" then l else Pprintast.tyvar_name name :: l in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this logic (being a no-op on "_"
) should be added to tyvar_name
.
@OlivierNicole I remember that we discussed this PR and formed a cunning plan, but I don't remember what the plan was. You were supposed to resubmit a rebased/updated version of this PR, and I would review it; or maybe the other way around? Do you remember, which one do you prefer now? (If you resubmit and I review, we don't need to find yet another maintainer for approval before merging. But then if you prefer to keep reviewing that is also fine by me.) |
I can rebase and update it within a few days. |
This has been rebased as an independent PR #12323, which is now merged. Closing. Thanks @stedolan for the nice RFC and implementation, and @OlivierNicole for going the last mile. |
This PR implements part of RFC 27 by adding support for 'raw identifiers'
\#foo
. The syntax\#foo
means the same thing asfoo
, but is always an identifier even whenfoo
is a keyword. This allows future versions of the language to introduce new keywords in a backwards-compatible way (see the RFC for full details).The important part of this patch is the small patch to
lexer.mll
which adds the new concrete syntax. The entire rest of the patch is changes to various pretty-printers, to ensure that keywords are printed in appropriately escaped form should they ever occur as variable names, in types, in error messages, and the like.