Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make updating lex_curr_p/lex_start_p optional #1590

Merged
merged 4 commits into from May 31, 2018

Conversation

Projects
None yet
4 participants
@alainfrisch
Copy link
Contributor

commented Jan 30, 2018

Follow up to #1585, where I noticed that disabled the update of lex_curr_p/lex_start_p can bring significant speedup to lexers generated by ocamllex. Since #1585, lexers generated by "ocamllex -ml" don't update these fields if lex_curr_p == Lexing.dummy_pos (the lexer actions need to be careful not to break this property).

This PR document this behavior, and port it to lexers generated by ocamllex (without -ml).

A slightly tangential question is whether Lexing.lexeme_start should return buf.lex_abs_pos + buf.lex_start_pos instead of lexbuf.lex_start_p.pos_cnum as currently (and similarly for lexeme_end). This would make the function usable even when updating the position fields is disabled. Do people see use cases where the new definition would not work?

@alainfrisch alainfrisch force-pushed the alainfrisch:ocamllex_dont_track_pos branch from 6df58aa to 6877c43 Jan 30, 2018

@alainfrisch alainfrisch added the stdlib label Feb 28, 2018

@alainfrisch

This comment has been minimized.

Copy link
Contributor Author

commented May 11, 2018

@let-def : do you have an opinion on this small change?

@diml

This comment has been minimized.

Copy link
Member

commented May 16, 2018

Personally I find the motivation valid, but the API a bit error prone. For instance if some code outside of lexing.ml modifies lex_curr_p without comparing it to Lexing.dummy_pos, then the results are going to be random.

Couldn't we arrange things so that when updating lex_curr_p and lex_start_p is disabled, these fields have type unit?

@alainfrisch

This comment has been minimized.

Copy link
Contributor Author

commented May 16, 2018

Couldn't we arrange things so that when updating lex_curr_p and lex_start_p is disabled, these fields have type unit?

I don't really see precisely how to do that, and even if it were possible, this would prevent using the same code in different contexts where the locations need to be updated or not. With the current API and the proposed PR, if the lexer is properly written (so as not to update lex_curr_p/lex_start_p manually without checking for dummy first), it can be used in the two modes; the caller decides if locations need to be tracked or not. This allows in particular running a parser first without collecting locations, and running it again collecting location in case an error needs to be reported.

@diml

This comment has been minimized.

Copy link
Member

commented May 16, 2018

I suppose it could be done with a GADT:

type 'a lexbuf_kind =
  | With_positions : position lexbuf_kind
  | Without_positions : unit lexbuf_kind

type _ generic_lexbuf =
  { lex_kind : 'a lexbuf_kind
  ; mutable lex_curr_p : 'a
  ; ...
  }

type lexbuf = position generic_lexbuf

then lexers that have been updated to support both modes would take a generic_lexbuf.

@alainfrisch

This comment has been minimized.

Copy link
Contributor Author

commented May 16, 2018

With the current API, the worst thing that can happen is that people fail to benefit from the optimization if they are not careful about how they write their code. I don't think this justifies the added complexity of the API you suggest.

For instance if some code outside of lexing.ml modifies lex_curr_p without comparing it to Lexing.dummy_pos, then the results are going to be random.

Well, such code would already behave in a weird way if the lexer keeps its initial dummy_pos fields (with cnum = -1, for instance).

@diml

This comment has been minimized.

Copy link
Member

commented May 17, 2018

Hmm, I'm still not fully convinced, but I'm OK with this implementation if we can make the API express the intent a bit better. At least to make it easier to understand what the programmer had in mind when reading code using this feature.

For instance, we can add an argument ?update_positions:bool to the creation functions and a new function should_update_positions : lexbuf -> bool. IMO, the physical equality check shouldn't be documented and left as an implementation detail.

@damiendoligez damiendoligez added this to the 4.07 milestone May 29, 2018

@damiendoligez

This comment has been minimized.

Copy link
Member

commented May 29, 2018

Note that this should be ported to 4.07 to avoid a discrepancy between the two modes of ocamllex.

@alainfrisch @diml can we converge quickly?

@alainfrisch

This comment has been minimized.

Copy link
Contributor Author

commented May 29, 2018

For instance, we can add an argument ?update_positions:bool to the creation functions and a new function should_update_positions : lexbuf -> bool

Do you suggest (1) extending the lexbuf record with an extra field, or just (2) having should_update_positions defined as fun buf -> buf.lex_curr_p != dummy_pos?

I'm happy with (2), but it does not seem compatible with the idea of adding the ?update_positions:bool argument to creation functions.

@alainfrisch

This comment has been minimized.

Copy link
Contributor Author

commented May 29, 2018

Note that this should be ported to 4.07 to avoid a discrepancy between the two modes of ocamllex.

I think it's only a performance discrepancy (except for people who rely on dummy_pos being updated, but this would seem a bit crazy).

@diml

This comment has been minimized.

Copy link
Member

commented May 30, 2018

I suggest (2). I think it's compatible with adding the ?update_positions:bool argument: lex_curr_p would be initialised as follow in lexing.ml:

lex_curr_p = if update_positions then zero_pos else dummy_pos

The idea is that if one sees lexbuf.lex_curr_p <- Lexing.dummy_pos in user code, it's not really clear why the code is doing this. On the other hand it's quite clear what Lexing.from_string s ~update_positions:false is expected to do.

@alainfrisch

This comment has been minimized.

Copy link
Contributor Author

commented May 30, 2018

Ah right, sorry, I forgot that functions building lexbufs were using zero_pos, not dummy_pos. I like your proposal, now! Just wondering about the risk of changing the API for 4.07. The extra optional argument could break some code; the risk is small, but do we want to take it so close from a release?

@alainfrisch alainfrisch force-pushed the alainfrisch:ocamllex_dont_track_pos branch from 6877c43 to 98556bc May 31, 2018

@alainfrisch alainfrisch force-pushed the alainfrisch:ocamllex_dont_track_pos branch from 1fdd39a to dfdea59 May 31, 2018

@alainfrisch

This comment has been minimized.

Copy link
Contributor Author

commented May 31, 2018

@diml : I've implemented your proposal, with slightly different names. Are you ok with the names, and the documentation?

I've left the Changes entry in the 4.07 section in order to simplify cherry-picking if we decide to have this in 4.07.

@gasche

This comment has been minimized.

Copy link
Member

commented May 31, 2018

I think that this is a nice change, but I don't feel a particular urgency to get it into 4.07 rather than trunk.

@diml

diml approved these changes May 31, 2018

Lexers can optionally maintain the [lex_curr_p] and [lex_start_p]
position fields. This "position tracking" mode is the default, and
it corresponds to passing [~with_position:true] to functions that
create lexer buffer. In this mode, the lexing engine and lexer

This comment has been minimized.

Copy link
@diml

diml May 31, 2018

Member

this should be plural: buffers*

(** Create a lexer buffer with the given function as its reading method.
When the scanner needs more characters, it will call the given
function, giving it a byte sequence [s] and a byte
count [n]. The function should put [n] bytes or fewer in [s],
starting at index 0, and return the number of bytes
provided. A return value of 0 means end of input. *)

val with_positions : lexbuf -> bool
(** Tell whether the lexer buffer keeps track of position fields
[lex_curr_p] / [lex_start_p], as determined by the correspond

This comment has been minimized.

Copy link
@diml

diml May 31, 2018

Member

corresponding*

val with_positions : lexbuf -> bool
(** Tell whether the lexer buffer keeps track of position fields
[lex_curr_p] / [lex_start_p], as determined by the correspond
optional argument for function that create lexer buffers

This comment has been minimized.

Copy link
@diml

diml May 31, 2018

Member

functions*

@diml
Copy link
Member

left a comment

I ticked the wrong box...

@diml

This comment has been minimized.

Copy link
Member

commented May 31, 2018

@alainfrisch, yes, they look good to me

alainfrisch added some commits May 31, 2018

@alainfrisch

This comment has been minimized.

Copy link
Contributor Author

commented May 31, 2018

Typos fixed. Changes entry fixed and moved to "Working version".

@diml If you agree not to include that in 4.07, can you approve and possibly merge this PR (I'd use "Squash and merge")?

@diml

diml approved these changes May 31, 2018

@diml diml merged commit 4f581b8 into ocaml:trunk May 31, 2018

1 of 2 checks passed

continuous-integration/appveyor/pr Waiting for AppVeyor build to complete
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@diml

This comment has been minimized.

Copy link
Member

commented May 31, 2018

Sure, done

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.