Skip to content

Add a -keywords <version?+list> flag#13471

Merged
Octachron merged 11 commits intoocaml:trunkfrom
Octachron:keywords_edition
Oct 29, 2024
Merged

Add a -keywords <version?+list> flag#13471
Octachron merged 11 commits intoocaml:trunkfrom
Octachron:keywords_edition

Conversation

@Octachron
Copy link
Copy Markdown
Member

This PR proposes to add a -keywords <version?+list> which takes as argument:

  • an optional version v number (formatted as %d.%d)
  • a +-separated list of additional keywords

and defines the set of keywords recognized by the lexer as the set of keywords at the version v of OCaml (defaulting to the current version if no versions were given) completed by the list of additional keywords.

In other words,

-keywords 5.2+effect

is equivalent to

-keywords 5.3

whereas

-keywords 4.1

makes the lexer read both nonrec and effect as ordinary lower case identifiers.

Moreover, codebases worrying about future compatibility can now register additional keywords with

-keywords +atomic+effect

Whenever the lexer encounters registered keywords with an unknown semantic (like atomic) in source, it raises an "unknown keyword" error.:

Error: atomic has been marked as a future keyword,
       but this version of OCaml cannot handle it.

To help with compatibility with old codebase, the new flag can be provided through OCAMLPARAM:

OCAMLPARAM=_,keywords=5.2 ocamlopt ...

Note that PR proposal is a slight alternative to the compiler flags for enabling and disabling keywords suggested in ocaml/RFCs#27

@gasche
Copy link
Copy Markdown
Member

gasche commented Sep 25, 2024

I'm fine with the proposed design, but haven't done a review yet.

I wonder if we should ask for ocaml.5.2+effects rather than just 5.2+effects (or maybe support both), to leave room for experimental versions or forks that have their own release (for example the Multicore people could have used ocaml-multicore.4.12 as the name of their own keyword set).

I discussed with @Octachron the possibility of supporting -keyword in addition to +keyword, so that I can use -keywords -effect as a quick-and-dirty approximation of 5.2 when I just want to get rid of one specific keyword. He is not excited about that suggestion and I'm happy to do without it for now.

Copy link
Copy Markdown
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Preliminary review comments.

Comment thread parsing/lexer.mll Outdated
| Unknown_keyword name ->
Location.errorf ~loc
"%a has been marked as a future keyword,@ \
but this version of OCaml cannot handle it."
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does not support it?

Comment thread parsing/lexer.mll Outdated

let keyword_table =
let keyword_table_all =
create_hashtable 149 [
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be an association list, now that it isn't used in a performance-sensitive way anymore? It would help clarify what is mutable via configuration and what is not.

Comment thread parsing/lexer.mll Outdated
Hashtbl.remove keyword_table "effect"
) v

let parse_keyword_edition s =
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not fond of the overall logic design here. The Lexer is a busy enough place, with a lot of subtle stuff, and now we are adding configuration parsing and interpretation inside it. Here would be my preferred API:

  • when we initialize the lexer, we can pass it a list or set of keywords
    (this could be an argument to Lexer.init, or an alternate initializer for compatibility-reason)
  • we have code somewhere else in the compiler codebase, for example in our beloved utils/misc.ml module (a submodule Keyword_edition?) that parses the configured edition and turns it into a list or set of supported keywords

If I understand correctly, the logic in the lexer initialization code would then be to populate its keyword table from an immutable keyword->token association list and an (optional) keyword list: for each keyword in the keyword list, add them to the keyword table, with None if they are missing from the association list. This is much less code in lexer.mll than you currently have, and I like it.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The flag parsing logic could certainly be moved to Clflags, and having the list of keywords as an optional argument of Lexer.init would work too.

Comment thread parsing/lexer.mll Outdated
begin match Option.map parse_keyword_edition !Clflags.keyword_edition with
| None -> ()
| Some (edition,(remove,add)) -> set_keyword_edition ~remove ~add edition
| Some (edition, add) -> set_keyword_edition ~add edition
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: if we are to review commit-by-commit, I would encourage you to squash both commits so that I only see the simpler version. (You can extract the support for keyword removal as a separate patch by just using git revert to this one, and then keep it around in a separate branch in case the people ask for it again.)

Comment thread driver/main_args.ml Outdated
"<version+list> set keywords following the <version+list> spec:\n
\ -<version> if present specifies the base set of keywords\n
\ (if absent the current set of keywords is used)
\ -<list> is a \"+\"-separated list of keywords to add to\n
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the spacing looks a bit odd here, I think the second list item is one space to the right of the first item

Comment thread man/ocamlc.1 Outdated
Without an explicit version number, the base set of keywords is the
set of keywords in the current version of OCaml.
Supplementary keywords that does not match any known keyword in the current
version of the compiler triggers an error whenever they are present in the
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Supplementary keywords that do not match ... trigger an error whenever they are present ...

set of keywords in the current version of OCaml.
Supplementary keywords that does not match any known keyword in the current
version of the compiler triggers an error whenever they are present in the
source code.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(same singular/plural fix)

Comment thread parsing/lexer.mll Outdated
let parse_version s =
let bad_version () =
raise (Arg.Bad "Ill-formed version in keywords flag,\n\
the expected format is %d.%d .")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It took me a while to understand that %d.%d is not a format literal here, it is something you would show to the user. I would rather use: is <major>.<minor>, for example 4.12.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: I would use "the supported format", in case some future compiler releases starts accepting more (or less) formats. Maybe the format provided by the user is not wrong (for the up-to-date version of the compiler), we just don't support it yet.

Comment thread parsing/lexer.mll Outdated
if v < (4,2) then
Hashtbl.remove keyword_table "nonrec";
if v < (5, 3) then
Hashtbl.remove keyword_table "effect"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: if you do refactor this code to simply return a list/set of supported keywords, you will probably rewrite this piece of code, and maybe the result will be nicer. (Ideally the natural approach reads like keywords from newer versions are added, rather than removed.)

Comment thread parsing/lexer.mll
| Unknown_keyword name ->
Location.errorf ~loc
"%a has been defined as an additional keyword.@ \
This version of OCaml does not support this keyword."
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could even do "The current version of OCaml (%s)@ [...]" Sys.ocaml_version?

Comment thread parsing/lexer.mll
match Hashtbl.find keyword_table name with
| Some x -> x
| None -> error lexbuf (Unknown_keyword name)
| exception Not_found -> LIDENT name
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be clearer with a custom type:

let find_keyword lexbuf name =
  match Hashtbl.find keyword_table name with
  | Known x -> x
  | Future -> error lexbuf (Unknown_keyword name)
  | exception Not_found -> LIDENT name

Copy link
Copy Markdown
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was hoping to move even more logic out of the lexer (the interpretation of the configuration language, not just the parsing of its syntax, and in particular almost everything related to reasoning on OCaml versions), but I think that you chose the current approach to avoid duplicating the list of keywords in two different places. Those who do the work decide. Approved.

(Maybe the UI would deserve some scrutiny by other people. Did you ping the original thread to point at this PR?)

Copy link
Copy Markdown
Member

@kit-ty-kate kit-ty-kate left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While that option looks really cool, i'm personally a bit skeptic about its usefulness in practice.

As far as i've seen so far during this upgrade session for OCaml 5.3 (which introduced the effect keyword), changes due to the introduction of a new keyword are fairly straightforward so i doubt active package maintainers would use it.

So in practice i can only see it being used for ancient software that someone want to compile with a more recent version of the compiler. However for that case i would argue that most of the breaking changes are caused by the standard library, C API or compiler-libs (for stubs or larger software), and sometimes type-checker (in that order). Comparatively these breakages are infinitely more time-consuming to deal with than a couple of new keywords.

The only use-case for this that i can see so far would be for self-contained ocaml scripts (think setup.ml from the late OASIS). However i feel like the number of such scripts in the wild have dwindled in the past 5 years or so. So i would expect very few people to use it.

So my main question is: what is the target use-case?

I can see some parallels with the concept of Editions in Rust. However while i can see the use-case for Editions which:

  • includes more than just keywords
  • doesn't really need to bother with the standard library (i'm not sure how it works but at least that's what they say)
  • is set by default in every new project by cargo

I wonder if instead of calling it -keywords, something more general like -compat with extra features such as:

  • limiting access to the standard library functions or modules according to their @since declaration to avoid issues with shadowing when using open
  • tying future big changes like safe-string was

would actually be used more widely. I for one, would be happy to use that.

Comment thread parsing/lexer.mll
in
List.iter add_keyword all_keywords;
List.iter (fun name ->
match List.find (fun (n,_,_) -> n = name) all_keywords with
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
match List.find (fun (n,_,_) -> n = name) all_keywords with
match List.find (fun (n,_,_) -> n = (name : string)) all_keywords with

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this doesn't change the comparison used (in both cases), I will keep the code as it is for now.

Comment thread parsing/lexer.mll
Comment thread driver/main_args.ml Outdated
@Octachron
Copy link
Copy Markdown
Member Author

@kit-ty-kate , I have mostly two use cases in mind:

  • an easy and small fix at the build system or package manager for code base using the effect, without needing any patch to the source code itself.
  • a way to make sure that a codebase don't use additional keywords from a future or an experimental version of the compiler.

Note that this PR has been mostly written on a sense of duty, because it was planned that we will have a way to disable the effect keyword when enabling the effect handler syntax. If we agree that finally we don't need the added complexity, this is all the better in my opinion.

@reynir
Copy link
Copy Markdown

reynir commented Sep 30, 2024

I stumbled upon the new effect keyword in CI. Please see the error message below. To me the error message was not very helpful. It was telling me the colon was a syntax error. I now understand why, but with me being unaware of effect becoming a keyword in 5.3 (though with developments in recent years it was not hard to guess) it was not obvious why this was a syntax error. While the fix is easy to implement I think the error is not easy to spot if you don't follow the OCaml changelog closely.

File "src/config.ml", line 2073, characters 23-24:
2073 | let parse_next (effect : parser_effect) initial_state :
                              ^
Error: Syntax error

I understand it's not easy to produce a nicer error in this case, but it could have been nice with a transition period where a user would be warned about this future keyword. I think adding future keywords is a really interesting idea. This could be used in e.g. dune, which I will upgrade more frequently than my compiler, to warn about new future keywords.

@nojb
Copy link
Copy Markdown
Contributor

nojb commented Sep 30, 2024

I understand it's not easy to produce a nicer error in this case, but it could have been nice with a transition period where a user would be warned about this future keyword.

Indeed.

I think adding future keywords is a really interesting idea. This could be used in e.g. dune, which I will upgrade more frequently than my compiler, to warn about new future keywords.

For what is worth, this is what we did at LexiFi to future-proof our 4.14 codebase in preparation for 5.3: we added the following rule to the lexer:

diff --git a/ocaml/parsing/lexer.mll b/ocaml/parsing/lexer.mll
index 89d68763007..a54b5e87828 100644
--- a/ocaml/parsing/lexer.mll
+++ b/ocaml/parsing/lexer.mll
@@ -406,6 +406,9 @@ rule token = parse
   | "?" (lowercase_latin1 identchar_latin1 * as name) ':'
       { warn_latin1 lexbuf;
         OPTLABEL name }
+  | "effect" as name (* LEXIFI *)
+      { Location.alert ~kind:"future-keyword" (Location.curr lexbuf) "identifier will become keyword in 5.3";
+        LIDENT name }
   | lowercase identchar * as name
       { try Hashtbl.find keyword_table name
         with Not_found -> LIDENT name }

This means that whenever the lexer sees effect an alert is emitted. The user is then free to enable the alert or not if she wants to be warned about the use of the (future-) keyword. The advantage is that this uses an existing mechanism (alerts) instead of inventing yet another one (with a separate command-line flag, etc).

@stedolan
Copy link
Copy Markdown
Contributor

There are three reasonable places to specify they keyword set / language edition in use:

  1. On the compiler command line (this PR, proposed in RFC 27)
  2. In the file, as a toplevel attribute or lexer directive (not in this PR, but also proposed in RFC 27)
  3. Globally, as parameters to configure or to the opam switch (neither in this PR nor RFC 27)

I now think it is a mistake to provide only (1). It might even be a mistake to provide (1) at all.

The problem is that there are lots of tools other than the compiler which care what a keyword is: some preprocessors, merlin, ocamlformat, ocp-indent, editor syntax highlighting, etc. In many cases, the tool is given the text of the file, and there is no obvious way to communicate other information from the build. This can work with mechanism (2) (since the tool can see the attribute or directive) and mechanism (3) (since the tool is configured at install time), but not with mechanism (1).

Providing mechanism (1) as a fallback can be useful for when you want to compile old code without modification, where you accept that tooling won't work properly on said code. But it should at least not be the only provided mechanism.

We had this experience on the Jane Street branch when we first added locals. Initially, we used mechanism (1), a command-line flag, and regretted it. In that context, there is only one build system and a limited number of editors / syntax highlighters, and it was still painful.

@gasche
Copy link
Copy Markdown
Member

gasche commented Sep 30, 2024

Suggesting to have this in the file directly is related to ocaml/RFCs#26 , a RFC discussing how to put information for the whole compilation unit at the top of a file. There is no progress on this RFC because no one seems to care about it, so it would help if people pointed out that there is a need.

@Octachron
Copy link
Copy Markdown
Member Author

@stedolan , note that this PR also allow to set the keywords flag through environment variable via OCAMLPARAM.

My aim for the -keywords flag is mostly to be able to mix legacies libraries (or opam packages) that use effect keyword with more up-to-date library. In other words, it provides the -keywords flag as a fallback mechanism only for which tooling support is not expected because the time spent implementing support for -keywords in tools would probably better be spent updating the libraries using identifiers newly promoted to keywords.

@gasche
Copy link
Copy Markdown
Member

gasche commented Oct 28, 2024

Consensus from the maintainer meeting: okay, let's have this feature in!

(@nojb asked whether we could get a version that warns, instead of erroring, on the additional keywords in the list. To be discussed later.)

@Octachron Octachron merged commit f5ff742 into ocaml:trunk Oct 29, 2024
Octachron added a commit that referenced this pull request Oct 29, 2024
This commit adds a `-keywords <version?+list>` flag which takes as argument:

    - an optional version v number (formatted as %d.%d)
    - a +-separated list of additional keywords

and defines the set of keywords recognized by the lexer as the set of keywords at the version `v` of OCaml (defaulting to the current version if no versions were given) completed by the list of additional keywords. This is intended to provide an easy way to keep old OCaml code with newer version of the compiler with additional keywords.

(cherry picked from commit f5ff742)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants