Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Optimize "ocamllex -ml" #1585

Merged
merged 11 commits into from Jan 30, 2018

Conversation

Projects
None yet
3 participants
@alainfrisch
Copy link
Contributor

alainfrisch commented Jan 26, 2018

ocamllex has a -ml mode, where the lexing automaton is translated to OCaml functions instead of the usual approach of generating binary tables interpreted by the C runtime. This PR tweaks ocamllex to produce better code in -ml mode.

Benchmarks are included in the PR, and available on https://github.com/alainfrisch/ocaml/blob/optim_ocamllex_ml/experimental/frisch/bench_ocamllex_optims.md .

Long story short: in native code, the optimizations brings almost a 2x speedup for the part of lexing under the responsibility of ocamllex (i.e. excluding actions). This materializes with a smaller but significant speedup e.g. when lexing OCaml source code with the OCaml lexer. While ocamllex -ml was a bit slower than ocamllex, it becomes significantly faster.

One might thus consider switching to ocamllex -ml to compile OCaml itself (at least for ocamlc.opt/ocamlopt.opt).

In bytecode, ocamllex -ml remains much slower than ocamllex, but the gap is reduced is bit. I believe that the impact for Javascript backends might be significant (not tested).

Optimizations:

  • Inlining functions corresponding to "states", when they are called once or are trivial => fewer function calls, more effective constant propagation.

  • Also avoid a function call to fetch the next character when no buffer refill is needed.

  • Keep more state in OCaml variables, synchronizing with the lexbuf structure only when needed => fewer memory reads/writes.

  • Avoid (non-)allocation of lex_mem when its size is 0 (surprisingly, this brings some significant speedup).

About "keeping more state in OCaml variables":

  • last_action is never stored in lexbuf, but always kept in an OCaml variable (moreover, some constant propagation is performed -- I believe this is only useful to make some closures constant in the "refill handler" case).

  • buffer_len, buffer, curr_pos, last_pos are read from the lexbuf
    when the automaton starts or after refilling the buffer.

  • curr_pos, last_pos are stored to the lexbuf when the automaton returns its result (before passing the hand to the action) or before refilling the buffer.

  • buffer_len, buffer are never modified by the automaton and thus never written back to the lexbuf.

Notes:

  • This has not been extensively tested yet with the "refill handler" feature (only tested that one lexer works with an identity "handler").

  • I've only tested the case where the string is in memory (no refill needed). I suspect that refill is rare enough to not affect performances, but correctness should be tested.

  • A good fraction of the remaining time (esp. with trivial actions) is caused by the update of the lex_curr_p field. It might be worth having a mode where the lexer specification tells ocamllex not to update the field (either because the lexer does not need to track position, or track them by character position, not using Lexing.position).

alainfrisch added some commits Jan 26, 2018

Do not update lex_start_p/lex_curr_p when lex_curr_p == Lexing.dummy_…
…pos.

This allows the client code to disable update of those fields, which
accounts for a significant amount of time spend in the generated code.

Perhaps this should rather be controlled more statically (to avoid a
runtime check -- but it is very cheap), by some directive in the lexer
specification itself (global, or on each rule).
@alainfrisch

This comment has been minimized.

Copy link
Contributor Author

alainfrisch commented Jan 26, 2018

I've added an extra optimization: when lexbuf.lex_curr_p is initially physically equal to Lexing.dummy_pos, the generated code does not update it (in ocamllex -ml mode). The same optimization could trivially be applied for the standard mode (in Lexing.engine/Lexing.new_engine). The speedup is quite impressive.

end
else begin
pr ctx {|
let next_char, _buf, _len, _curr, _last =

This comment has been minimized.

@alainfrisch

alainfrisch Jan 26, 2018

Author Contributor

This depends critically on the optimized compilation scheme for tuple-binding. We would not want to allocate the 4-tuple for every character.

@gasche
Copy link
Member

gasche left a comment

(cc @let-def in his quality of lexer expert :-)

I haven't done a proper review (yet?). I have two questions:

  • is there a risk of interaction between the new OCaml variables you declare and the lexical scope of user code in semantic actions?
  • could you show (parts of) generated code before and after the fact, as a more concrete way to understand the changes that you performed?

let output_auto_defs ctx =
if ctx.has_refill then
pr ctx {|

This comment has been minimized.

@gasche

gasche Jan 28, 2018

Member

I am mildly worried that the implicit scope escapes here would have the result that an ocamllex compiled from Windows would use \r\n-style line escapes, while an ocamllex compiled from Linux would use \n-style line escape -- it seems desirable that two developers on the same project would generate exactly the same lexer code.

This comment has been minimized.

@let-def

let-def Jan 30, 2018

Contributor

Side remark: while reviewing, I tweaked the generator to output indented code. That's especially useful to follow the levels of inlining.
I had to switch to back to strings with explicit \n to have better control in the part I touched. I am not fond of the raw strings escaping.
You can find the patch here https://gist.github.com/9ac8e5f57dec8e52bd3c52cbb882f3f3.

This comment has been minimized.

@alainfrisch

alainfrisch Jan 30, 2018

Author Contributor

Thanks! Patch integrated. I've removed remaining uses of quoted strings, and fixed a bit more the indentation, also for the "refill handler" mode.

(last_action ctx)
(if ctx.has_refill then " k" else "")
in
let ctx = {has_refill; oc; goto_state; last_action=None} in

This comment has been minimized.

@gasche

gasche Jan 28, 2018

Member

I think it would be nicer to move this big blob of code to its own auxiliary function (the current proposal is "mixing abstraction levels").

This comment has been minimized.

@alainfrisch

alainfrisch Jan 30, 2018

Author Contributor

Is the new code better? Another approach would be to avoid the first-class function in the "context", but this would force to make many functions mutually recursive.

@let-def
Copy link
Contributor

let-def left a comment

Overall, the code looks fine to me.
@gasche concerns about semantic changes are right: user actions can now executes in a environment where variables _curr_p, _buf, _len, _last and _curr are bound while previously only variables prefixed by __ would leak. However a minor restructuring of the code would avoid that.

Maybe this can be changed?


let output_auto_defs ctx =
if ctx.has_refill then
pr ctx {|

This comment has been minimized.

@let-def

let-def Jan 30, 2018

Contributor

Side remark: while reviewing, I tweaked the generator to output indented code. That's especially useful to follow the levels of inlining.
I had to switch to back to strings with explicit \n to have better control in the part I touched. I am not fond of the raw strings escaping.
You can find the patch here https://gist.github.com/9ac8e5f57dec8e52bd3c52cbb882f3f3.

@alainfrisch

This comment has been minimized.

Copy link
Contributor Author

alainfrisch commented Jan 30, 2018

is there a risk of interaction between the new OCaml variables you declare and the lexical scope of user code in semantic actions?

Now fixed.

alainfrisch added some commits Jan 30, 2018

@alainfrisch

This comment has been minimized.

Copy link
Contributor Author

alainfrisch commented Jan 30, 2018

Before/after, withµ/without the "refill handler" feature:

https://gist.github.com/alainfrisch/49f9a75db9133f70d8d6f628910c4805

@alainfrisch

This comment has been minimized.

Copy link
Contributor Author

alainfrisch commented Jan 30, 2018

@let-def I think I've addressed your request for change. Can you confirm?

Do you guys think I should "port" the optimization related to not updating lex_start_p, lex_cur_p when lex_start_p == Lexing.dummy_pos to the non-ml mode? Either way, this should definitely be documented somewhere.

@gasche

This comment has been minimized.

Copy link
Member

gasche commented Jan 30, 2018

The code is much nicer with the refactoring commits, thanks.

@let-def

This comment has been minimized.

Copy link
Contributor

let-def commented Jan 30, 2018

@alainfrisch Yes thanks, I review the changes.

@gasche gasche merged commit 1030265 into ocaml:trunk Jan 30, 2018

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@alainfrisch

This comment has been minimized.

Copy link
Contributor Author

alainfrisch commented Jan 30, 2018

Ah, you were fast and merged before I could remove the [WIP] from the title :)

I'll open another PR for updating to the documentation and porting the optimization (not updating lex_curr_p/lex_start_p) to the non-ml mode.

alainfrisch added a commit that referenced this pull request Feb 5, 2018

alainfrisch added a commit that referenced this pull request Feb 5, 2018

alainfrisch added a commit that referenced this pull request Feb 5, 2018

alainfrisch added a commit that referenced this pull request Feb 22, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.
You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session.