New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Menhir-generated parser #292

Merged
merged 92 commits into from Sep 2, 2018

Conversation

Projects
None yet
@gasche
Member

gasche commented Nov 15, 2015

This WIP is meant to present work ongoing to switch the OCaml parser from using ocamlyacc as a parser generator to Menhir. This work is joint work with Frédéric Bour ( @def-lkb ), who started this work inside Merlin, François Pottier and Yann Régis-Gianas who have been very reactive at improving Menhir as necessary¹, and Jacques-Henri Jourdan who provided various comments, explanations of the LR stuff, and suggestions along the way.

¹: we rely on several Menhir features introduced for this work, so this patch only works with the very last (or in-development) version of Menhir.

Ocamlyacc is a stable tool (that hasn't required much maintenance effort in the past ten years) and gives good parsing performance. Some reasons to replace it by Menhir are the following, in decreasing order of perceived strength:

  • Menhir lets us write better grammars. Menhir has abstraction features that allow to maintain grammars more conveniently (we can use variable names instead of $2, parametrized rules can remove a lot of redundancy); I expect that most of the grammar-related woes of the recent docstring parsing effort would have been removed by Menhir. Its conflict explanation features should also make it much easier to refactor the grammar while preserving input programs, or evaluate proposed syntax changes
  • Menhir will give us better error messages. Menhir is getting solid support for providing better error messages than would be ever possible with OCamlYacc, thanks to work initiated by Frédéric in Merlin (that resulted in particular in Menhir's new incremental parsing interface, and several suggestion of error explanation and recovery) and additional work by François (see his recent draft Reachability and error diagnosis in LR(1) automata).
  • Menhir is a living project (rather than an esteemed fossil) with a pure OCaml codebase. This makes it easier to maintain and evolve, and much of the above benefits come directly from this flexibility.

Note that we have not yet started applying any work on syntax error messages on the OCaml grammar (there is good work in Merlin and François' experiment on Compcert's C grammar are very promising). This is strictly future work, but getting a Menhir grammar that we would be ready to integrate in the compiler (hopefully a future evolution of the present PR) is a necessary first step.

Performance aspects

Menhir has a --table backend, that generates OCaml code traversing the automated represented as a tabular data-structure (just as OcamlYacc does), and a --code backend that compiles the automaton traversal into pure OCaml code (removing the interpretation overhead). We plan to use the --table backend in the OCaml parser for two reasons:

  • the coming features for syntax error support are only available for the --table backend for now (and may never come to the --code backend)
  • the --code backend generates a huge nest of recursive functions that is hard to handle for the OCaml compiler (both in the type-checking part and later code-generation parts); it currently takes around a minute to compile (which would add a lot to the total compiler build time, which is around a few minutes) and this could even worsen with future compiler changes, so it seems to fragile to use inside the compiler.

While the --code backend is more than competitive with Ocamlyacc performance-wise, using the --table backend would mean a degradation in parsing time from the current parser. The PR includes a benchmark script that you can run on your machine (bash menhir-bench.bash after make world.opt and make install).

On my machine, using Menhir --table adds a 20% overhead to the parsing pass of ocamlc.opt, and a 50% overhead to the parsing pass of ocamlc (we pay a lot when we replace C code into bytecode-interpreted OCaml code). However, the overhead is much more acceptable when considering the whole compilation chain: for bytecode compilation (noticeably faster than native compilation), the overhead I measure becomes 6% in native code and 5% in bytecode.

Is the OCaml community ready to accept a 5% performance degradation for bytecode compilation? I think that it is worth it, as it comes with better error messages and (for maintainers) a grammar that is easier to maintain and evolve.

Bootstrap story

After a few experiments, the bootstrap story that I ended with is the following:

  • the code of the Menhir runtime library and the generated OCaml parser are stored in boot/menhir/ (and copied in parsing/ at parser-compilation time); the grammar is stored in parsing/parser.mly as usual (parser_menhir.mly in the current patch, that maintains the ocamlyacc parser in parallel)
  • a target promote-menhir is added to the Makefile; it generates the OCaml parser from the .mly and copies it, along with a matching copy of the Menhir runtime library, to boot/menhir

This means that Menhir is not a build dependency (nor runtime dependency) of the OCaml compiler distribution, as the source of the parser is kept. The only time at which Menhir is necessary is when running the promote-menhir target to update the parser, that is, for the people that want to change the OCaml grammar.

Status of the proposed migration

The first step of the migration was to get a working Menhir-generated parser that could be compared to the OcamlYacc parser, with as little changes to the grammar as possible. As Frédéric predicted, the main difficulty was the handling of symbol locations, which uses an imperative interface (the Parsing module) in OCamlyacc, and relies on a pure data-passing interface in Menhir. This required extensive changes to the auxiliary functions called by the parser (and the Docstrings module supporting the parsing of documentation comments). On the side of grammar, I use the very ugly hack of relying on cpp to make #define hiding the location-passing from the semantic actions. This allows to write semantic actions that are extremely close to the yacc ones, which simplifies review and correctness checking.

Once this point is done, we have a parser that can easily be verified to be correct: I set up the OCaml frontend so that it parses each input file with both the yacc-generated and menhir-generated parsers, and fail if the resulting ASTs do not exactly match (including locations, etc.). The current parser passes this test for all the files touched by make world.opt (we could test it further on all OPAM-accessible sources, but this would require extensive data collection work; I am already confident that the parser has very few regressions, if any). (Getting the exact same locations as yacc require some changes in Menhir's location handling by François.)

The next step is to use Menhir's abstraction features to remove all the #define used in the semantic actions, and thus get a proper .mly -- the "cpp war". This patch is still in RFC stage because this step is not finished, and I would not propose to include a cpp-preprocessed grammar in trunk. I have done two first patches in this direction, one for mkrhs and one for extra_{str,sig,...}. It is easy to verify that these changes, that get us further and further from the yacc grammar, are correctness-preserving, thanks to the AST comparison machinery.

Requirements for a final replacement?

My personal opinion is that, as soon as this last step is finished, the resulting grammar could be considered for inclusion, replacing the current ocamlyacc grammar. This requires accepting the parsing performance overhead. Finally, this would not be the end of the road, as more work is required to get better syntax error messages.

@gasche gasche changed the title from [RFC] Menhir-generated parser to [WIP] Menhir-generated parser Nov 15, 2015

@gasche gasche referenced this pull request Nov 15, 2015

Closed

Shave the yacc #33

@gasche gasche added the enhancement label Nov 15, 2015

@gasche

This comment has been minimized.

Show comment
Hide comment
@gasche

gasche Nov 15, 2015

Member

I should point out that what initially motivated to start this work was a comment by Daniel Bünzli ( @dbuenzli ) on a blog post about improving typing error messages. Daniel wrote:

The parser would certainly benefit of being rewritten by hand to provide good error messages and error recovery.

Which made me realize that the message "Error: Syntax error" may be a first low-hanging fruit for improving the usability of OCaml errors. Daniel then added,

(yes, hand written parsers are the only way to achieve that)

and we are working to prove him wrong on this part.

Member

gasche commented Nov 15, 2015

I should point out that what initially motivated to start this work was a comment by Daniel Bünzli ( @dbuenzli ) on a blog post about improving typing error messages. Daniel wrote:

The parser would certainly benefit of being rewritten by hand to provide good error messages and error recovery.

Which made me realize that the message "Error: Syntax error" may be a first low-hanging fruit for improving the usability of OCaml errors. Daniel then added,

(yes, hand written parsers are the only way to achieve that)

and we are working to prove him wrong on this part.

@dbuenzli

This comment has been minimized.

Show comment
Hide comment
@dbuenzli

dbuenzli Nov 16, 2015

Contributor

and we are working to prove him wrong on this part.

I'd love to be.

Contributor

dbuenzli commented Nov 16, 2015

and we are working to prove him wrong on this part.

I'd love to be.

@Chris00

This comment has been minimized.

Show comment
Hide comment
@Chris00

Chris00 Nov 16, 2015

Member

Maybe we can use this change to parse -. x**n rightly as -. (x**n) and not as (-. x)**n?

Member

Chris00 commented Nov 16, 2015

Maybe we can use this change to parse -. x**n rightly as -. (x**n) and not as (-. x)**n?

@lefessan

This comment has been minimized.

Show comment
Hide comment
@lefessan

lefessan Nov 16, 2015

Contributor

I like the fact not to have menhir as a build dependency, but I think it should not be the default : most users just use OPAM now to compile OCaml, so the OPAM build steps for OCaml could be make world-without-menhir, whereas the OCaml coreteam would probably want make world to regenerate the parser if it has been modified.

Maybe it would be possible to include Menhir's runtime library into OCaml sources, so that we would only depend on the external executable (that could be compiled with another version of OCaml), while using the bootstrap compiler for Menhir's library ?

Contributor

lefessan commented Nov 16, 2015

I like the fact not to have menhir as a build dependency, but I think it should not be the default : most users just use OPAM now to compile OCaml, so the OPAM build steps for OCaml could be make world-without-menhir, whereas the OCaml coreteam would probably want make world to regenerate the parser if it has been modified.

Maybe it would be possible to include Menhir's runtime library into OCaml sources, so that we would only depend on the external executable (that could be compiled with another version of OCaml), while using the bootstrap compiler for Menhir's library ?

@gasche

This comment has been minimized.

Show comment
Hide comment
@gasche

gasche Nov 16, 2015

Member

@Chris00 noted, but I'd like to avoid any discussion of specific syntactic change, an keep this PR focused on the discussion of the parser change itself.

@lefessan What I would like to add in the short term is a test, when compiling the parser OCaml source, that the .mly file is not more recent -- and fail if that test does not pass. This is not as automatic as what you suggest (I could even test whether Menhir is available and only compile the grammar in that case), but it would already provide a clear workflow (if make world fails, just run make promote-menhir and start again). The dependencies information are good (I use menhir --depend for this), so there should be no conflict issues when restarting the build from a non-clean state. (I did indeed make the mistake during my testing of forgetting to update the OCaml parser after modifying the .mly(p).)

Right now, Menhir's runtime library is included in the OCaml distribution, in boot/menhirLib.{ml,mli}, and this is versioned. The promote-menhir target takes care to refresh these files at the same time the parser_menhir.ml is produced from the .mly, so no mismatch should happen (I was careful to ensure this was possible and abundantly whined to François until he implemented the features necessary to support this workflow). Furthermore, François added to recent versions of Menhir a cute trick that makes sure that, if you try to compile a generated parser against a mismatched runtime library, compilation will fail with a typing error (the parser requires a version_$VERSION identifier that the library provides), so there should be no silent error in this scenario.

(Indeed, the menhir executable can be compiled with another version of OCaml, typically 4.02.3 while you compile for trunk. Then the menhirLib.ml file included in boot/ will be those released by François for that other version of OCaml, so in theory this could fail if you use a very recent version of Menhir to compile a very old version of OCaml. The generated grammar also imposes version constraints, as the type-safe stack introspection interface uses GADTs.)

Member

gasche commented Nov 16, 2015

@Chris00 noted, but I'd like to avoid any discussion of specific syntactic change, an keep this PR focused on the discussion of the parser change itself.

@lefessan What I would like to add in the short term is a test, when compiling the parser OCaml source, that the .mly file is not more recent -- and fail if that test does not pass. This is not as automatic as what you suggest (I could even test whether Menhir is available and only compile the grammar in that case), but it would already provide a clear workflow (if make world fails, just run make promote-menhir and start again). The dependencies information are good (I use menhir --depend for this), so there should be no conflict issues when restarting the build from a non-clean state. (I did indeed make the mistake during my testing of forgetting to update the OCaml parser after modifying the .mly(p).)

Right now, Menhir's runtime library is included in the OCaml distribution, in boot/menhirLib.{ml,mli}, and this is versioned. The promote-menhir target takes care to refresh these files at the same time the parser_menhir.ml is produced from the .mly, so no mismatch should happen (I was careful to ensure this was possible and abundantly whined to François until he implemented the features necessary to support this workflow). Furthermore, François added to recent versions of Menhir a cute trick that makes sure that, if you try to compile a generated parser against a mismatched runtime library, compilation will fail with a typing error (the parser requires a version_$VERSION identifier that the library provides), so there should be no silent error in this scenario.

(Indeed, the menhir executable can be compiled with another version of OCaml, typically 4.02.3 while you compile for trunk. Then the menhirLib.ml file included in boot/ will be those released by François for that other version of OCaml, so in theory this could fail if you use a very recent version of Menhir to compile a very old version of OCaml. The generated grammar also imposes version constraints, as the type-safe stack introspection interface uses GADTs.)

@whitequark

This comment has been minimized.

Show comment
Hide comment
@whitequark

whitequark Nov 18, 2015

Contributor

@gasche amazing work.

Contributor

whitequark commented Nov 18, 2015

@gasche amazing work.

@gasche

This comment has been minimized.

Show comment
Hide comment
@gasche

gasche Nov 18, 2015

Member

We discussed this at the developer meeting today, and it was decided (in a rather predictable way) that it is too soon to consider for integration the next release -- so in particular I won't try to get it mergeable in trunk before 4.03.

Member

gasche commented Nov 18, 2015

We discussed this at the developer meeting today, and it was decided (in a rather predictable way) that it is too soon to consider for integration the next release -- so in particular I won't try to get it mergeable in trunk before 4.03.

@agarwal

This comment has been minimized.

Show comment
Hide comment
@agarwal

agarwal Nov 24, 2015

Member

Out of scope for this PR, but I wanted to mention that in the long term it would be useful to have lexer/parser that is completely lossless, i.e. every input character is retained. It would then be possible to implement a good syntax highlighter. A highlighter needs to output the exact same content that was input, but also annotate tokens based on their identity in the grammar.

Member

agarwal commented Nov 24, 2015

Out of scope for this PR, but I wanted to mention that in the long term it would be useful to have lexer/parser that is completely lossless, i.e. every input character is retained. It would then be possible to implement a good syntax highlighter. A highlighter needs to output the exact same content that was input, but also annotate tokens based on their identity in the grammar.

@let-def

This comment has been minimized.

Show comment
Hide comment
@let-def

let-def Nov 24, 2015

Contributor

@agarwal Not sure that this is really necessary.
A lossless roundtrip through lexing/parsing is useful for all kind of automated source to source transformation, e.g. refactoring.
For syntactic (or semantic) highlighting, current frontend was precise enough (based on my experiments with merlin).

(Although thanks to my fine taste the resulting buffers looked like christmas trees)

Contributor

let-def commented Nov 24, 2015

@agarwal Not sure that this is really necessary.
A lossless roundtrip through lexing/parsing is useful for all kind of automated source to source transformation, e.g. refactoring.
For syntactic (or semantic) highlighting, current frontend was precise enough (based on my experiments with merlin).

(Although thanks to my fine taste the resulting buffers looked like christmas trees)

@damiendoligez damiendoligez added this to the 4.04-or-later milestone Dec 17, 2015

@mshinwell mshinwell changed the title from [WIP] Menhir-generated parser to Menhir-generated parser Jun 10, 2016

@keleshev

This comment has been minimized.

Show comment
Hide comment
@keleshev

keleshev Jun 15, 2016

Contributor

On --code vs. --table: consider—why not both? The compiler could start by using --code generated parser (which gives speed in case of successful parse), then if syntax error is encountered it can switch to the --table parser and re-parse the file to get a better error message.

This approach might change a 6% performance regression into a comparable performance improvement.

Sexplib follows this approach, although with two different parsers: a hand-written one and an ocamlyacc one.

Contributor

keleshev commented Jun 15, 2016

On --code vs. --table: consider—why not both? The compiler could start by using --code generated parser (which gives speed in case of successful parse), then if syntax error is encountered it can switch to the --table parser and re-parse the file to get a better error message.

This approach might change a 6% performance regression into a comparable performance improvement.

Sexplib follows this approach, although with two different parsers: a hand-written one and an ocamlyacc one.

@gasche

This comment has been minimized.

Show comment
Hide comment
@gasche

gasche Jun 15, 2016

Member

We thought about this as well, but it does complicate the setup a bit; for example the toplevel is running the lexer+parser directly from the interactive user input, so preparing for a re-parse would require memoizing the token stream.

(Right now the priority would be to update the prototype to the trunk grammar -- the grammar keeps evolving -- and move away from cpp macros for location handling. I personally have no time to commit to this in the short term.)

Member

gasche commented Jun 15, 2016

We thought about this as well, but it does complicate the setup a bit; for example the toplevel is running the lexer+parser directly from the interactive user input, so preparing for a re-parse would require memoizing the token stream.

(Right now the priority would be to update the prototype to the trunk grammar -- the grammar keeps evolving -- and move away from cpp macros for location handling. I personally have no time to commit to this in the short term.)

@bluddy

This comment has been minimized.

Show comment
Hide comment
@bluddy

bluddy Jul 25, 2016

Would it be possible to move Menhir to github?

bluddy commented Jul 25, 2016

Would it be possible to move Menhir to github?

@gasche

This comment has been minimized.

Show comment
Hide comment
@gasche

gasche Jul 25, 2016

Member

I'm not the author of Menhir, so it wouldn't be my decision; also, I'm not sure how it relates to this particular work. I generally try to work along with each project's choice of tooling and development model -- as long as it is free software, of course.

Member

gasche commented Jul 25, 2016

I'm not the author of Menhir, so it wouldn't be my decision; also, I'm not sure how it relates to this particular work. I generally try to work along with each project's choice of tooling and development model -- as long as it is free software, of course.

@bluddy

This comment has been minimized.

Show comment
Hide comment
@bluddy

bluddy Jul 25, 2016

Obviously not directly related.

I was directed to this PR after asking a question on IRC regarding a compiler PR I'm working on (so-called 'safe-syntax'). Menhir would definitely make adding syntax improvements easier and involve less duplication than the current method.

bluddy commented Jul 25, 2016

Obviously not directly related.

I was directed to this PR after asking a question on IRC regarding a compiler PR I'm working on (so-called 'safe-syntax'). Menhir would definitely make adding syntax improvements easier and involve less duplication than the current method.

@bluddy bluddy referenced this pull request Jul 25, 2016

Closed

-safe-syntax option #715

@damiendoligez damiendoligez removed this from the 4.04 milestone Aug 2, 2016

@DemiMarie

This comment has been minimized.

Show comment
Hide comment
@DemiMarie

DemiMarie Dec 1, 2016

Contributor

One issue is that Menhir is under the QPL, whereas OCaml is now fully under the LGPL2.1 plus exceptions. Furthermore, the OCaml compiler distribution is fully self-contained right now.

Contributor

DemiMarie commented Dec 1, 2016

One issue is that Menhir is under the QPL, whereas OCaml is now fully under the LGPL2.1 plus exceptions. Furthermore, the OCaml compiler distribution is fully self-contained right now.

@gasche

This comment has been minimized.

Show comment
Hide comment
@gasche

gasche Dec 1, 2016

Member

One issue is that Menhir is under the QPL, whereas OCaml is now fully under the LGPL2.1 plus exceptions.

I think I can convince François to use an OCaml-compatible license.

Furthermore, the OCaml compiler distribution is fully self-contained right now.

Sure, but the bytecode binaries present in the boot folder are part of this "fully self-contained" set. I propose to add a menhir-generated grammar and the corresponding runtime (both being simple OCaml source files) to this boot folder -- preserving the property that the distribution is self-contained.

Member

gasche commented Dec 1, 2016

One issue is that Menhir is under the QPL, whereas OCaml is now fully under the LGPL2.1 plus exceptions.

I think I can convince François to use an OCaml-compatible license.

Furthermore, the OCaml compiler distribution is fully self-contained right now.

Sure, but the bytecode binaries present in the boot folder are part of this "fully self-contained" set. I propose to add a menhir-generated grammar and the corresponding runtime (both being simple OCaml source files) to this boot folder -- preserving the property that the distribution is self-contained.

@dra27

This comment has been minimized.

Show comment
Hide comment
@dra27

dra27 Dec 9, 2016

Contributor

For the bootstrap, it would be good (and I would be very happy to help!) with being able to pull in Menhir as a bootstrapped library in the same way FlexDLL is via a submodule on the repository.

Contributor

dra27 commented Dec 9, 2016

For the bootstrap, it would be good (and I would be very happy to help!) with being able to pull in Menhir as a bootstrapped library in the same way FlexDLL is via a submodule on the repository.

@gasche

This comment has been minimized.

Show comment
Hide comment
@gasche

gasche Dec 9, 2016

Member

I'm a bit wary of submodules. What are your reasons to think that it would be better than just including the parser and its runtime as OCaml source files in boot/?

Do I correctly understand that you are proposing to pull Menhir as whole, not just the bits necessary to run the parser? Then the development step for people willing to modify the parser would be to pull and build the embedded-menhir rather than their own install of Menhir (the latter is the workflow in the proposed patch). I find it a bit heavy, but one advantage is that it makes it easy for the compiler distribution to have its own patches to Menhir (for example if trunk breaks something that breaks Menhir we can fix it locally).

One thing is that @fpottier still lives in the 19th century of romantic software development: there is no publicly available Menhir repository -- only release tarballs. I think it would be very nice to have him change that process, but I expect it to be a more difficult than getting a version released under an OCaml-compatible license.

Member

gasche commented Dec 9, 2016

I'm a bit wary of submodules. What are your reasons to think that it would be better than just including the parser and its runtime as OCaml source files in boot/?

Do I correctly understand that you are proposing to pull Menhir as whole, not just the bits necessary to run the parser? Then the development step for people willing to modify the parser would be to pull and build the embedded-menhir rather than their own install of Menhir (the latter is the workflow in the proposed patch). I find it a bit heavy, but one advantage is that it makes it easy for the compiler distribution to have its own patches to Menhir (for example if trunk breaks something that breaks Menhir we can fix it locally).

One thing is that @fpottier still lives in the 19th century of romantic software development: there is no publicly available Menhir repository -- only release tarballs. I think it would be very nice to have him change that process, but I expect it to be a more difficult than getting a version released under an OCaml-compatible license.

@dra27

This comment has been minimized.

Show comment
Hide comment
@dra27

dra27 Dec 9, 2016

Contributor

Ah, the 19th Century is certianly a problem! Can't have a submodule without a repo...

My main reason - if it becomes feasible - is that as with FlexDLL, there is a mutual dependency. But it's largely philosophical!

Contributor

dra27 commented Dec 9, 2016

Ah, the 19th Century is certianly a problem! Can't have a submodule without a repo...

My main reason - if it becomes feasible - is that as with FlexDLL, there is a mutual dependency. But it's largely philosophical!

@agarwal

This comment has been minimized.

Show comment
Hide comment
@agarwal

agarwal Dec 9, 2016

Member

One thing is that @fpottier still lives in the 19th century of romantic software development: there is no publicly available Menhir repository -- only release tarballs. I think it would be very nice to have him change that process, but I expect it to be a more difficult than getting a version released under an OCaml-compatible license.

Consider petitioning him via Change.org. 😄

Member

agarwal commented Dec 9, 2016

One thing is that @fpottier still lives in the 19th century of romantic software development: there is no publicly available Menhir repository -- only release tarballs. I think it would be very nice to have him change that process, but I expect it to be a more difficult than getting a version released under an OCaml-compatible license.

Consider petitioning him via Change.org. 😄

@fpottier

This comment has been minimized.

Show comment
Hide comment
@fpottier

fpottier Dec 9, 2016

Love the 19th century bit. Actually, as you (@gasche) know, menhirs date back to the pre-Christian era. That said, I am considering moving Menhir's repo to the publicly-visible gitlab.inria.fr. Would that help?

fpottier commented Dec 9, 2016

Love the 19th century bit. Actually, as you (@gasche) know, menhirs date back to the pre-Christian era. That said, I am considering moving Menhir's repo to the publicly-visible gitlab.inria.fr. Would that help?

@gasche

This comment has been minimized.

Show comment
Hide comment
@gasche

gasche Dec 9, 2016

Member

I don't want to say that it is necessary for this particular project -- it's probably not -- but yes, I believe that having a development repository publicly accessible would help people feel confident about Menhir. It's kind of irrational but "having a public repository" is what everyone expects now, and not doing that gives the feeling of a old, closed project (same thing with SVN versioning or... forge hosting; Gitlab is fine, it's the modern underground alternative). It would also have been helpful, I believe, for the people that forked Menhir and built things on top of it (Frédéric, Pippijn), and I am sure it will be helpful to them or others in the future.

Member

gasche commented Dec 9, 2016

I don't want to say that it is necessary for this particular project -- it's probably not -- but yes, I believe that having a development repository publicly accessible would help people feel confident about Menhir. It's kind of irrational but "having a public repository" is what everyone expects now, and not doing that gives the feeling of a old, closed project (same thing with SVN versioning or... forge hosting; Gitlab is fine, it's the modern underground alternative). It would also have been helpful, I believe, for the people that forked Menhir and built things on top of it (Frédéric, Pippijn), and I am sure it will be helpful to them or others in the future.

@DemiMarie

This comment has been minimized.

Show comment
Hide comment
@DemiMarie

DemiMarie Dec 13, 2016

Contributor

I personally think that one obstacle might be licensing. @fpottier, would you be opposed to relicensing Menhir under the same terms as OCaml?

Contributor

DemiMarie commented Dec 13, 2016

I personally think that one obstacle might be licensing. @fpottier, would you be opposed to relicensing Menhir under the same terms as OCaml?

@fpottier

This comment has been minimized.

Show comment
Hide comment
@fpottier

fpottier Dec 13, 2016

I just checked with Yann Régis-Gianas. We have no objection in principle to changing Menhir's license so as to use the same license as OCaml. We just need a little time to understand exactly what the OCaml license says.

fpottier commented Dec 13, 2016

I just checked with Yann Régis-Gianas. We have no objection in principle to changing Menhir's license so as to use the same license as OCaml. We just need a little time to understand exactly what the OCaml license says.

@gasche

This comment has been minimized.

Show comment
Hide comment
@gasche

gasche May 25, 2017

Member

I just pushed a partially-rebased version of the branch that is in line with the current trunk. No other progress in terms of removing the use of cpp macros has been made (in fact I didn't rebase some of the work I previously did removing cpp macros).

Member

gasche commented May 25, 2017

I just pushed a partially-rebased version of the branch that is in line with the current trunk. No other progress in terms of removing the use of cpp macros has been made (in fact I didn't rebase some of the work I previously did removing cpp macros).

@gasche

This comment has been minimized.

Show comment
Hide comment
@gasche

gasche May 25, 2017

Member

Interestingly the performance numbers, while they remain in the same ballpark, have improved a bit: on a "best out of 5" test, bytecode ocamlc's parsing pass becomes 56% slower, but native ocamlc.opt is only 13% slower at parsing than with ocamlyacc (it was previously a 20% overhead). I suspect that Menhir improved its efficiency since the last time I ran benchmarks -- November 2015.

For a full bytecode compilation, ocamlc has a 5% slowndown and the ocamlc.opt difference is in the noise. This is good: it means that there is no performance difference when people use .opt compilers. Note that since the well-numbered #512 included in 4.04, the ocamlc executable installed is actually ocamlc.opt, so I expect most people to use the native versions now.

Member

gasche commented May 25, 2017

Interestingly the performance numbers, while they remain in the same ballpark, have improved a bit: on a "best out of 5" test, bytecode ocamlc's parsing pass becomes 56% slower, but native ocamlc.opt is only 13% slower at parsing than with ocamlyacc (it was previously a 20% overhead). I suspect that Menhir improved its efficiency since the last time I ran benchmarks -- November 2015.

For a full bytecode compilation, ocamlc has a 5% slowndown and the ocamlc.opt difference is in the noise. This is good: it means that there is no performance difference when people use .opt compilers. Note that since the well-numbered #512 included in 4.04, the ocamlc executable installed is actually ocamlc.opt, so I expect most people to use the native versions now.

@whitequark

This comment has been minimized.

Show comment
Hide comment
@whitequark

whitequark May 26, 2017

Contributor

Note that since the well-numbered #512 included in 4.04, the ocamlc executable installed is actually ocamlc.opt, so I expect most people to use the native versions now.

This regression will hit the cross-compiler users as those only get usable bytecode compilers. Arguably it is a bug in the cross-compiler build process.

Contributor

whitequark commented May 26, 2017

Note that since the well-numbered #512 included in 4.04, the ocamlc executable installed is actually ocamlc.opt, so I expect most people to use the native versions now.

This regression will hit the cross-compiler users as those only get usable bytecode compilers. Arguably it is a bug in the cross-compiler build process.

@damiendoligez damiendoligez added this to the 4.07-or-later milestone Sep 27, 2017

gasche and others added some commits Jul 4, 2018

Makefile.menhir: enable strict mode
The yacc parser already uses its --strict flag, which turns grammar
conflicts into build errors so that they are noticed. Menhir uses
a more extensive semantics for the --strict flag, as it turns into
errors grammar warnings that have no yacc counterpart, in particular
unused precedence levels. We have to fix more mistakes, which I guess
is good news.
Use $loc and $sloc (requires Menhir >= 20180703)
This commit includes a menhir parser bootstrap, as it changes the
Docstrings API and thus wouldn't work for the previous bootstrapped
parser.
Revert "parser.mly: use prefixed+currified helpers to facilitate Menh…
…ir port"

This reverts commit f0b7535.

Those changes to the ocamlyacc parser helped during the yacc->menhir
transition, as they allowed to make the semantic action code
indentical between the yacc and menhir parser, which makes rebasing
against new yacc-parser versions much easier.

Instead of removing the change from the history, we revert it
explicitly so that we can easiy come back later to the non-reverted
version for testing or rebasing.
menhir parser: rename MenhirLib into CamlinternalMenhirLib
The goal of this change is to avoid conflicts encountered by
compiler-libs users that would also use their own MenhirLib runtime
for their own parsers.

I first tried to implement a solution to this module-name-conflict
issue using module aliases and -open, but this proven too fragile and
too difficult to get right.
menhir parser: reorder driver/pparse to support (-pp) preprocessing
Damien Doligez spotted a bug with the parser-comparison logic, where
any use of -pp preprocessors would break compilation due to differing
ASTs between yacc and menhir.

This comes from the fact that the parser-comparison logic would
perform the preprocessing step twice, which would generate two
different temporary files to store the postprocessed source, and the
different names of these temporary files would end up in locations of
the two produced ASTs, making them differ.

The present patch simplifies the Pparse logic (which remains rather
involuted due to its slightly over-clever use of function arguments)
and moves the parser comparison to after preprocessing has been performed.
remove the yacc parser
The large diff in boot/menhir/parser_menhir.ml comes from the fact
that the token list is now included in it, instead of being merely
a reference to the yacc parsers' Parser.token type.
add a .depend.menhir file for menhir parser dependencies
We do not write to .depend during the common 'depend' target,
because computing dependencies requires Menhir installed.

@gasche gasche merged commit c303d7b into ocaml:trunk Sep 2, 2018

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@pmetzger

This comment has been minimized.

Show comment
Hide comment
@pmetzger

pmetzger Sep 2, 2018

Member

I assume that pull requests to make use of Menhir's facilities to improve the compiler's error messages are soon going to be accepted?

Member

pmetzger commented Sep 2, 2018

I assume that pull requests to make use of Menhir's facilities to improve the compiler's error messages are soon going to be accepted?

@gasche

This comment has been minimized.

Show comment
Hide comment
@gasche

gasche Sep 2, 2018

Member

This is the plan, but "soon" is too optimistic. @let-def has worked on this and is going to keep working on this, we've discussed it, and we are not convinced that the current error-message facilities are scalable/maintenable enough for the OCaml grammar. @let-def has plan for a different approach (if it works, it can be made available to all Menhir users), but we will need at least a few months to have something to show. This first PR took about three years, so maybe that's "soon" in comparison.

Member

gasche commented Sep 2, 2018

This is the plan, but "soon" is too optimistic. @let-def has worked on this and is going to keep working on this, we've discussed it, and we are not convinced that the current error-message facilities are scalable/maintenable enough for the OCaml grammar. @let-def has plan for a different approach (if it works, it can be made available to all Menhir users), but we will need at least a few months to have something to show. This first PR took about three years, so maybe that's "soon" in comparison.

@gasche

This comment has been minimized.

Show comment
Hide comment
@gasche

gasche Sep 2, 2018

Member

Three other things that could be done and are easier tasks:

  1. We could include Menhir as a subrepo and try to hook it within the build system as a mode of use of Menhir. I'm not sure precisely how that would work (would any user of the parser be asked to use this versioned inclusion instead of an external installation?), and to be honest I'm not very interested in working on it, but @dra27 mentioned it and I trust him that it could be an interesting solution.

  2. We could think about using Menhir for other parsers included in the OCaml distribution. In particular, this would be required if we go with a long-term plan to deprecate ocamlyacc and replace it with Menhir globally within the OCaml community. This also sounds a bit tedious: I don't think we want to bootstrap all those other parsers, so it may be that the submodule (task (1)) is a required prerequisite.

  3. I have vague ideas of small things to play with in the Menhir code generation to get some small performance improvements. Not a priority (performance is already fine), but maybe something to do for fun someday.

  4. We could do a writeup of how to do a transition from ocamlyacc to menhir. The way we went at it in this PR is not a reference, because in the meantime @fpottier added new Menhir features to make the transition easier (in fact, quite easy). Our final result is representative of the best way we know to do migrations, but we want to tell a simpler story than our git history.

  5. There are various ways we could factorize the parsing rules to improve sharing because similar syntactic constructions. For example, many of the signature items are also valid, as is, as structure items, and this could be represented by a shared intermediary-AST type with a single set of parsing rule, whose result get injected into both signature and structure items. We discussed those during the Menhir migration, but decided to keep a very non-invasive approach of preserving the parser structure as much as possible, and left those for future work.

Member

gasche commented Sep 2, 2018

Three other things that could be done and are easier tasks:

  1. We could include Menhir as a subrepo and try to hook it within the build system as a mode of use of Menhir. I'm not sure precisely how that would work (would any user of the parser be asked to use this versioned inclusion instead of an external installation?), and to be honest I'm not very interested in working on it, but @dra27 mentioned it and I trust him that it could be an interesting solution.

  2. We could think about using Menhir for other parsers included in the OCaml distribution. In particular, this would be required if we go with a long-term plan to deprecate ocamlyacc and replace it with Menhir globally within the OCaml community. This also sounds a bit tedious: I don't think we want to bootstrap all those other parsers, so it may be that the submodule (task (1)) is a required prerequisite.

  3. I have vague ideas of small things to play with in the Menhir code generation to get some small performance improvements. Not a priority (performance is already fine), but maybe something to do for fun someday.

  4. We could do a writeup of how to do a transition from ocamlyacc to menhir. The way we went at it in this PR is not a reference, because in the meantime @fpottier added new Menhir features to make the transition easier (in fact, quite easy). Our final result is representative of the best way we know to do migrations, but we want to tell a simpler story than our git history.

  5. There are various ways we could factorize the parsing rules to improve sharing because similar syntactic constructions. For example, many of the signature items are also valid, as is, as structure items, and this could be represented by a shared intermediary-AST type with a single set of parsing rule, whose result get injected into both signature and structure items. We discussed those during the Menhir migration, but decided to keep a very non-invasive approach of preserving the parser structure as much as possible, and left those for future work.

@Octachron

This comment has been minimized.

Show comment
Hide comment
@Octachron

Octachron Sep 2, 2018

Contributor

For 5, updating #1726 reminded me that the extended operator grammar rules could be reworked to avoid a lot of duplications. I might propose a PR to do just that.

Contributor

Octachron commented Sep 2, 2018

For 5, updating #1726 reminded me that the extended operator grammar rules could be reworked to avoid a lot of duplications. I might propose a PR to do just that.

@smuenzel-js

This comment has been minimized.

Show comment
Hide comment
@smuenzel-js

smuenzel-js Sep 3, 2018

Contributor

It looks like this can cause a hang on syntax errors, see MPR#7847

Contributor

smuenzel-js commented Sep 3, 2018

It looks like this can cause a hang on syntax errors, see MPR#7847

@shindere

This comment has been minimized.

Show comment
Hide comment
@shindere

shindere Sep 3, 2018

Contributor
Contributor

shindere commented Sep 3, 2018

@let-def

This comment has been minimized.

Show comment
Hide comment
@let-def

let-def Sep 3, 2018

Contributor

@smuenzel-js I am looking at it, thanks.

Contributor

let-def commented Sep 3, 2018

@smuenzel-js I am looking at it, thanks.

@pmetzger

This comment has been minimized.

Show comment
Hide comment
@pmetzger

pmetzger Sep 3, 2018

Member

@gasche:

We could do a writeup of how to do a transition from ocamlyacc to menhir. The way we went at it in this PR is not a reference, because in the meantime @fpottier added new Menhir features to make the transition easier (in fact, quite easy). Our final result is representative of the best way we know to do migrations, but we want to tell a simpler story than our git history.

This seems like an excellent idea.

Member

pmetzger commented Sep 3, 2018

@gasche:

We could do a writeup of how to do a transition from ocamlyacc to menhir. The way we went at it in this PR is not a reference, because in the meantime @fpottier added new Menhir features to make the transition easier (in fact, quite easy). Our final result is representative of the best way we know to do migrations, but we want to tell a simpler story than our git history.

This seems like an excellent idea.

@gasche

This comment has been minimized.

Show comment
Hide comment
@gasche

gasche Sep 4, 2018

Member

@shindere I just checked that make clean; make coreall; make bootstrap; make bootstrap works as expected.

Member

gasche commented Sep 4, 2018

@shindere I just checked that make clean; make coreall; make bootstrap; make bootstrap works as expected.

@shindere

This comment has been minimized.

Show comment
Hide comment
@shindere

shindere Sep 4, 2018

Contributor
Contributor

shindere commented Sep 4, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment