Add a -keywords <version?+list> flag #13471
I'm fine with the proposed design, but haven't done a review yet. I wonder if we should ask for I discussed with @Octachron the possibility of supporting |
| | Unknown_keyword name -> | ||
| Location.errorf ~loc | ||
| "%a has been marked as a future keyword,@ \ | ||
| but this version of OCaml cannot handle it." |
|
|
||
| let keyword_table = | ||
| let keyword_table_all = | ||
| create_hashtable 149 [ |
Could this be an association list, now that it isn't used in a performance-sensitive way anymore? It would help clarify what is mutable via configuration and what is not.
| Hashtbl.remove keyword_table "effect" | ||
| ) v | ||
|
|
||
| let parse_keyword_edition s = |
Not fond of the overall logic design here. The Lexer is a busy enough place, with a lot of subtle stuff, and now we are adding configuration parsing and interpretation inside it. Here would be my preferred API:
- when we initialize the lexer, we can pass it a list or set of keywords (this could be an argument to Lexer.init, or an alternate initializer for compatibility reasons)
- we have code somewhere else in the compiler codebase, for example in our beloved utils/misc.ml module (a submodule Keyword_edition?), that parses the configured edition and turns it into a list or set of supported keywords
If I understand correctly, the logic in the lexer initialization code would then be to populate its keyword table from an immutable keyword->token association list and an (optional) keyword list: for each keyword in the keyword list, add them to the keyword table, with None if they are missing from the association list. This is much less code in lexer.mll than you currently have, and I like it.
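A minimal sketch of the initialization logic described above, under stated assumptions: the `token` type is a stand-in for the parser's real token type, the `init` signature with an optional `keywords` argument is hypothetical, and `find_keyword` returns a `result` instead of raising a lexer error.

```ocaml
(* Stand-in for the parser's token type (assumption for this sketch). *)
type token = LET | NONREC | EFFECT | LIDENT of string

(* Immutable keyword -> token association list (only a few entries shown). *)
let all_keywords = [ "let", LET; "nonrec", NONREC; "effect", EFFECT ]

(* Mutable table consulted by the lexer. [None] marks a keyword that was
   requested by the configuration but has no token in this compiler. *)
let keyword_table : (string, token option) Hashtbl.t = Hashtbl.create 149

(* Hypothetical initializer: [keywords] defaults to every known keyword. *)
let init ?(keywords = List.map fst all_keywords) () =
  Hashtbl.reset keyword_table;
  List.iter
    (fun name ->
      Hashtbl.replace keyword_table name (List.assoc_opt name all_keywords))
    keywords

(* Lookup used by the identifier rule of the lexer. *)
let find_keyword name =
  match Hashtbl.find_opt keyword_table name with
  | Some (Some tok) -> Ok tok        (* active, known keyword *)
  | Some None -> Error name          (* requested, but unknown here *)
  | None -> Ok (LIDENT name)         (* ordinary identifier *)
```

With this shape, the association list stays immutable and only the table depends on configuration.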
The flag parsing logic could certainly be moved to Clflags, and having the list of keywords as an optional argument of Lexer.init would work too.
| begin match Option.map parse_keyword_edition !Clflags.keyword_edition with | ||
| | None -> () | ||
| | Some (edition,(remove,add)) -> set_keyword_edition ~remove ~add edition | ||
| | Some (edition, add) -> set_keyword_edition ~add edition |
Note: if we are to review commit-by-commit, I would encourage you to squash both commits so that I only see the simpler version. (You can extract the support for keyword removal as a separate patch by just using git revert to this one, and then keep it around in a separate branch in case people ask for it again.)
| "<version+list> set keywords following the <version+list> spec:\n | ||
| \ -<version> if present specifies the base set of keywords\n | ||
| \ (if absent the current set of keywords is used) | ||
| \ -<list> is a \"+\"-separated list of keywords to add to\n |
the spacing looks a bit odd here, I think the second list item is one space to the right of the first item
| Without an explicit version number, the base set of keywords is the | ||
| set of keywords in the current version of OCaml. | ||
| Supplementary keywords that does not match any known keyword in the current | ||
| version of the compiler triggers an error whenever they are present in the |
Supplementary keywords that do not match ... trigger an error whenever they are present ...
| set of keywords in the current version of OCaml. | ||
| Supplementary keywords that does not match any known keyword in the current | ||
| version of the compiler triggers an error whenever they are present in the | ||
| source code. |
| let parse_version s = | ||
| let bad_version () = | ||
| raise (Arg.Bad "Ill-formed version in keywords flag,\n\ | ||
| the expected format is %d.%d .") |
It took me a while to understand that %d.%d is not a format literal here, it is something you would show to the user. I would rather use: is <major>.<minor>, for example 4.12.
Note: I would use "the supported format", in case some future compiler release starts accepting more (or fewer) formats. Maybe the format provided by the user is not wrong (for the up-to-date version of the compiler), we just don't support it yet.
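For illustration, here is one way the version parser could read with both wording suggestions applied. This is a sketch, not the code of the PR: using `Scanf` is an assumption made here to keep the example short.

```ocaml
(* Parse "<major>.<minor>" into a pair of integers, raising [Arg.Bad]
   with the reworded message on any other input. *)
let parse_version s =
  let bad_version () =
    raise
      (Arg.Bad
         "Ill-formed version in keywords flag,\n\
          the supported format is <major>.<minor>, for example 4.12.")
  in
  (* "%!" requires the end of input, so "4.12.1" is rejected. *)
  try Scanf.sscanf s "%u.%u%!" (fun major minor -> (major, minor))
  with Scanf.Scan_failure _ | Failure _ | End_of_file -> bad_version ()
```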
| if v < (4,2) then | ||
| Hashtbl.remove keyword_table "nonrec"; | ||
| if v < (5, 3) then | ||
| Hashtbl.remove keyword_table "effect" |
Note: if you do refactor this code to simply return a list/set of supported keywords, you will probably rewrite this piece of code, and maybe the result will be nicer. (Ideally the natural approach reads like keywords from newer versions are added, rather than removed.)
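A sketch of the "added rather than removed" reading, assuming each keyword is stored with the version that introduced it. The table below is a hypothetical, abbreviated one; only the `nonrec` (4.2) and `effect` (5.3) entries come from the snippet above, and `(0, 0)` is a placeholder meaning "always a keyword".

```ocaml
(* Each keyword paired with the version that introduced it. *)
let all_keywords =
  [ "let", (0, 0); "match", (0, 0); "nonrec", (4, 2); "effect", (5, 3) ]

(* Keywords of edition [v]: those introduced at or before [v]. *)
let keywords_at v =
  List.filter_map
    (fun (name, since) -> if since <= v then Some name else None)
    all_keywords
```

Adding a keyword in a future release then means adding one line to the table, with no version test to maintain elsewhere.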
| | Unknown_keyword name -> | ||
| Location.errorf ~loc | ||
| "%a has been defined as an additional keyword.@ \ | ||
| This version of OCaml does not support this keyword." |
We could even do "The current version of OCaml (%s)@ [...]" Sys.ocaml_version?
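A possible rendering of that suggestion, as a standalone sketch: it uses `Format.asprintf` and a plain `%s` for the keyword name instead of `Location.errorf` and the compiler's `%a` printer, so that it runs outside the compiler. The function name is hypothetical.

```ocaml
(* Build the error text, mentioning the running compiler version. *)
let unknown_keyword_message name =
  Format.asprintf
    "@[%s has been defined as an additional keyword.@ \
     The current version of OCaml (%s)@ does not support this keyword.@]"
    name Sys.ocaml_version
```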
| match Hashtbl.find keyword_table name with | ||
| | Some x -> x | ||
| | None -> error lexbuf (Unknown_keyword name) | ||
| | exception Not_found -> LIDENT name |
This might be clearer with a custom type:
let find_keyword lexbuf name =
match Hashtbl.find keyword_table name with
| Known x -> x
| Future -> error lexbuf (Unknown_keyword name)
| exception Not_found -> LIDENT name
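For reference, a self-contained variant of the suggestion with the custom type spelled out. The `token` type, the `keyword_status` name, and the exception are assumptions for this sketch; the real lexer would report the error through `error lexbuf` rather than a bare exception.

```ocaml
(* Stand-in for the parser's token type. *)
type token = LET | LIDENT of string

(* Status of an entry in the keyword table. *)
type keyword_status =
  | Known of token  (* keyword with a token in this compiler *)
  | Future          (* registered by the user, unknown to this compiler *)

exception Unknown_keyword of string

let keyword_table : (string, keyword_status) Hashtbl.t = Hashtbl.create 149

let find_keyword name =
  match Hashtbl.find keyword_table name with
  | Known x -> x
  | Future -> raise (Unknown_keyword name)
  | exception Not_found -> LIDENT name
```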
gasche left a comment:
I was hoping to move even more logic out of the lexer (the interpretation of the configuration language, not just the parsing of its syntax, and in particular almost everything related to reasoning on OCaml versions), but I think that you chose the current approach to avoid duplicating the list of keywords in two different places. Those who do the work decide. Approved.
(Maybe the UI would deserve some scrutiny by other people. Did you ping the original thread to point at this PR?)
kit-ty-kate left a comment:
While that option looks really cool, i'm personally a bit skeptical about its usefulness in practice.
As far as i've seen so far during this upgrade session for OCaml 5.3 (which introduced the effect keyword), changes due to the introduction of a new keyword are fairly straightforward so i doubt active package maintainers would use it.
So in practice i can only see it being used for ancient software that someone wants to compile with a more recent version of the compiler. However for that case i would argue that most of the breaking changes are caused by the standard library, C API or compiler-libs (for stubs or larger software), and sometimes the type-checker (in that order). Comparatively these breakages are infinitely more time-consuming to deal with than a couple of new keywords.
The only use-case for this that i can see so far would be for self-contained ocaml scripts (think setup.ml from the late OASIS). However i feel like the number of such scripts in the wild has dwindled in the past 5 years or so. So i would expect very few people to use it.
So my main question is: what is the target use-case?
I can see some parallels with the concept of Editions in Rust. However while i can see the use-case for Editions which:
- includes more than just keywords
- doesn't really need to bother with the standard library (i'm not sure how it works but at least that's what they say)
- is set by default in every new project by cargo
I wonder if, instead of calling it -keywords, something more general like -compat with extra features such as:
- limiting access to the standard library functions or modules according to their @since declaration, to avoid issues with shadowing when using open
- tying future big changes like safe-string was
would actually be used more widely. I, for one, would be happy to use that.
| in | ||
| List.iter add_keyword all_keywords; | ||
| List.iter (fun name -> | ||
| match List.find (fun (n,_,_) -> n = name) all_keywords with |
| match List.find (fun (n,_,_) -> n = name) all_keywords with | |
| match List.find (fun (n,_,_) -> n = (name : string)) all_keywords with |
Since this doesn't change the comparison used (in both cases), I will keep the code as it is for now.
|
@kit-ty-kate, I have mostly two use cases in mind:
Note that this PR has been mostly written on a sense of duty, because it was planned that we will have a way to disable the |
|
I stumbled upon the new I understand it's not easy to produce a nicer error in this case, but it could have been nice with a transition period where a user would be warned about this future keyword. I think adding future keywords is a really interesting idea. This could be used in e.g. dune, which I will upgrade more frequently than my compiler, to warn about new future keywords. |
Indeed.
For what it's worth, this is what we did at LexiFi to future-proof our 4.14 codebase in preparation for 5.3: we added the following rule to the lexer:
diff --git a/ocaml/parsing/lexer.mll b/ocaml/parsing/lexer.mll
index 89d68763007..a54b5e87828 100644
--- a/ocaml/parsing/lexer.mll
+++ b/ocaml/parsing/lexer.mll
@@ -406,6 +406,9 @@ rule token = parse
| "?" (lowercase_latin1 identchar_latin1 * as name) ':'
{ warn_latin1 lexbuf;
OPTLABEL name }
+ | "effect" as name (* LEXIFI *)
+ { Location.alert ~kind:"future-keyword" (Location.curr lexbuf) "identifier will become keyword in 5.3";
+ LIDENT name }
| lowercase identchar * as name
{ try Hashtbl.find keyword_table name
      with Not_found -> LIDENT name }
This means that whenever the lexer sees |
|
There are three reasonable places to specify the keyword set / language edition in use:
I now think it is a mistake to provide only (1). It might even be a mistake to provide (1) at all. The problem is that there are lots of tools other than the compiler which care what a keyword is: some preprocessors, Providing mechanism (1) as a fallback can be useful for when you want to compile old code without modification, where you accept that tooling won't work properly on said code. But it should at least not be the only provided mechanism. We had this experience on the Jane Street branch when we first added locals. Initially, we used mechanism (1), a command-line flag, and regretted it. In that context, there is only one build system and a limited number of editors / syntax highlighters, and it was still painful. |
|
Suggesting to have this in the file directly is related to ocaml/RFCs#26, an RFC discussing how to put information for the whole compilation unit at the top of a file. There is no progress on this RFC because no one seems to care about it, so it would help if people pointed out that there is a need. |
|
@stedolan , note that this PR also allow to set the My aim for the |
|
Consensus from the maintainer meeting: okay, let's have this feature in! (@nojb asked whether we could get a version that warns, instead of erroring, on the additional keywords in the list. To be discussed later.) |
dccc5f1 to ea248e7
This commit adds a `-keywords <version?+list>` flag which takes as argument:
- an optional version number v (formatted as %d.%d)
- a +-separated list of additional keywords
and defines the set of keywords recognized by the lexer as the set of keywords at version `v` of OCaml (defaulting to the current version if no version is given) completed by the list of additional keywords. This is intended to provide an easy way to keep compiling old OCaml code with newer versions of the compiler that have additional keywords.
(cherry picked from commit f5ff742)
This PR proposes to add a -keywords <version?+list> flag which takes as argument:
- an optional version number v (formatted as %d.%d)
- a +-separated list of additional keywords
and defines the set of keywords recognized by the lexer as the set of keywords at version v of OCaml (defaulting to the current version if no version is given) completed by the list of additional keywords.
In other words,
is equivalent to
whereas
makes the lexer read both nonrec and effect as ordinary lowercase identifiers.
Moreover, codebases worrying about future compatibility can now register additional keywords with
Whenever the lexer encounters a registered keyword with unknown semantics (like atomic) in the source, it raises an "unknown keyword" error.
To help with compatibility with old codebases, the new flag can be provided through OCAMLPARAM:
Note that this PR is a slight alternative to the compiler flags for enabling and disabling keywords suggested in ocaml/RFCs#27
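To make the `<version?+list>` format concrete, here is a minimal sketch of how such an argument could be split. It assumes that the first `+`-separated component is the (possibly empty) version and that the remaining components are the additional keywords. The function name and the use of `invalid_arg` are hypothetical, and the parser in the PR may differ.

```ocaml
(* Returns [(version, extra_keywords)]; [None] means "current version". *)
let parse_keyword_spec s =
  let parse_version v =
    match List.map int_of_string_opt (String.split_on_char '.' v) with
    | [ Some major; Some minor ] -> (major, minor)
    | _ -> invalid_arg ("ill-formed version: " ^ v)
  in
  match String.split_on_char '+' s with
  | [] -> (None, [])
  | "" :: extra -> (None, extra)                  (* no version given *)
  | v :: extra -> (Some (parse_version v), extra)
```

Under these assumptions, a version alone selects a base keyword set, while a leading `+` keeps the current set and only registers extra keywords.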