New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

provide a hook for line directives in the lexer #823

Closed
wants to merge 1 commit into
base: trunk
from

Conversation

Projects
None yet
4 participants
@hendriktews
Contributor

hendriktews commented Sep 24, 2016

Users of the lexer (and the parser) can register a function that is called for each line directive in the source.

This patch solves the following problem: Some users of Parse.implementation want to extract pieces from the original source files for some locations in the parse tree. Otags is an example: The emacs tags file must contain a copy from the beginning of the line to the tagged identifier. The character positions in Lexing.pos_cnum in the location can however not be used to access the original source file. Lexing.pos_cnum counts characters in the lexing source. As soon as this contains line directives, pos_cnum cannot be used to seek in the original files. Because line directives are treated completely internally in the lexer, users of Parse.implementation do not have a way to detect whether there was a line directive in the source or not.

This patch adds a reference to the lexer, where one can register a hook function to be called on every line directive. With this hook otags can generate tags for files that were preprocessed with ocamlyacc, ocamllex or cppo.

provide a hook for line directives in the lexer
Users of the lexer (and the parser) can register a function that is
called for each line directive in the source.
@alainfrisch

This comment has been minimized.

Show comment
Hide comment
@alainfrisch

alainfrisch Sep 26, 2016

Contributor

With the info currently in Lexing, it is possible to get the filename (pos_fname), line number (pos_lnum) and character on that line (pos_cnum - pos_bol), but not the file offset for the beginning of the current line. So in order to implement:

The emacs tags file must contain a copy from the beginning of the line to the tagged identifier.

you would need to load the entire file and detect newlines manually. Since I assume Otags needs to read most of the file anyway, what is the drawback of this approach?

Contributor

alainfrisch commented Sep 26, 2016

With the info currently in Lexing, it is possible to get the filename (pos_fname), line number (pos_lnum) and character on that line (pos_cnum - pos_bol), but not the file offset for the beginning of the current line. So in order to implement:

The emacs tags file must contain a copy from the beginning of the line to the tagged identifier.

you would need to load the entire file and detect newlines manually. Since I assume Otags needs to read most of the file anyway, what is the drawback of this approach?

@hendriktews

This comment has been minimized.

Show comment
Hide comment
@hendriktews

hendriktews Sep 26, 2016

Contributor

You are right, otags parses the whole file via Parse.implementation or Parse.interface. Without the hook one would need to make a pass over all files to which line directives refer to in order to find the positions of the newlines in them.

Most of the files don't contain line directives. For them it is much more convenient and efficient to use pos_cnum to seek directly to the right point. The hooks makes it possible to distinguish files with and files without line directives. Moreover, with the hook one only needs to find out the positions of those lines the line directives refer to and can use offsets in between line directives or after the last line directive.

Contributor

hendriktews commented Sep 26, 2016

You are right, otags parses the whole file via Parse.implementation or Parse.interface. Without the hook one would need to make a pass over all files to which line directives refer to in order to find the positions of the newlines in them.

Most of the files don't contain line directives. For them it is much more convenient and efficient to use pos_cnum to seek directly to the right point. The hooks makes it possible to distinguish files with and files without line directives. Moreover, with the hook one only needs to find out the positions of those lines the line directives refer to and can use offsets in between line directives or after the last line directive.

@diml

This comment has been minimized.

Show comment
Hide comment
@diml

diml Mar 9, 2017

Member

@hendriktews what does otag do at the moment? Have you observed the line-number to offset resolution to be noticeably slow? I would expect it to be a lot faster than parsing files and more generic than handling only line directives.

I'm marking this issue as suspended until it's clear we need a new hook in the lexer.

Member

diml commented Mar 9, 2017

@hendriktews what does otag do at the moment? Have you observed the line-number to offset resolution to be noticeably slow? I would expect it to be a lot faster than parsing files and more generic than handling only line directives.

I'm marking this issue as suspended until it's clear we need a new hook in the lexer.

@diml diml added the suspended label Mar 9, 2017

@hendriktews

This comment has been minimized.

Show comment
Hide comment
@hendriktews

hendriktews Mar 16, 2017

Contributor

At the moment otags relies on pos_cnum and therefore, for files that do contain line directives, otags produces wrong output or dies with a warning. I have not yet implemented a work-around, because the necessary information is available inside Parse.implementation and I am still hoping to get access to it.

I am not arguing about the efficiency of otags. I just find it odd that, without the hook, otags is forced to reread all the files a second time, while for most files this would not be necessary, because they don't contain line directives. Otags is just forced to reread everything because Parse.implementation buries the information about the presence of line directives inside it.

How about changing the hook into a boolean reference that tells whether Parse.implementation found a line directive? This would permit otags to rely on pos_cnum for all those files that don't contain line directives and to reread files a second time only in those cases where it is unavoidable, because of the presence of line directives.

Contributor

hendriktews commented Mar 16, 2017

At the moment otags relies on pos_cnum and therefore, for files that do contain line directives, otags produces wrong output or dies with a warning. I have not yet implemented a work-around, because the necessary information is available inside Parse.implementation and I am still hoping to get access to it.

I am not arguing about the efficiency of otags. I just find it odd that, without the hook, otags is forced to reread all the files a second time, while for most files this would not be necessary, because they don't contain line directives. Otags is just forced to reread everything because Parse.implementation buries the information about the presence of line directives inside it.

How about changing the hook into a boolean reference that tells whether Parse.implementation found a line directive? This would permit otags to rely on pos_cnum for all those files that don't contain line directives and to reread files a second time only in those cases where it is unavoidable, because of the presence of line directives.

@diml

This comment has been minimized.

Show comment
Hide comment
@diml

diml Mar 20, 2017

Member

Well, the work-around is only a few lines of code and it will be more robust in general. For instance you could use it on the output of a ppx rewriter that include other files via some special [%include ...] syntax.

Given that this feature request is motivated by a performance optimization, can you please try to enable the workaround systematically for all files, and if it turns out that it causes a noticeable slowdown then we can reconsider changing Lexer?

Sorry to be pushing back on this simple PR, but I'm not very keen on adding more global state to Lexer. I know there are plenty of global variables in the compiler, but I'd rather try to get rid of them rather than add new ones.

Member

diml commented Mar 20, 2017

Well, the work-around is only a few lines of code and it will be more robust in general. For instance you could use it on the output of a ppx rewriter that include other files via some special [%include ...] syntax.

Given that this feature request is motivated by a performance optimization, can you please try to enable the workaround systematically for all files, and if it turns out that it causes a noticeable slowdown then we can reconsider changing Lexer?

Sorry to be pushing back on this simple PR, but I'm not very keen on adding more global state to Lexer. I know there are plenty of global variables in the compiler, but I'd rather try to get rid of them rather than add new ones.

@mshinwell

This comment has been minimized.

Show comment
Hide comment
@mshinwell

mshinwell May 8, 2017

Contributor

There haven't been any more comments for two months, so I am hoping the workaround that Jeremie suggests has worked. @hendriktews If that is not the case please open another PR in due course.

Contributor

mshinwell commented May 8, 2017

There haven't been any more comments for two months, so I am hoping the workaround that Jeremie suggests has worked. @hendriktews If that is not the case please open another PR in due course.

@mshinwell mshinwell closed this May 8, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment