New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Inline comments with # getting associated with the wrong source-code element? #201
Comments
Oddly, even if I insert a junk $code = '<?php
$foo !== 2; # just because
;null;
';
$nodes = $parser->parse($code);
var_dump($nodes); Output:
The comment before the junk I really think that attempting to associate comments as attributes of nodes is a mistake - comment nodes probably should have been represented as just comment nodes. Establishing the association of comments as attributes of (e.g.) class/interface declarations, doesn't really seem like a parser concern? With the possible exception of doc-blocks which do proceed a declaration that can have the previous doc-block applied as an attribute... (?) |
Regarding the null statement, the problem is the first Regarding the actual problem: Yes, you're correct in saying that assigning comments as attributes of the next node is a rather inaccurate representation. The thing is that this parser constructs an AST, where the "A" is for "abstract", which means it's not supposed to include comments or whitespace information. Comments are primarily collected for the sake of doc comments, which are often important for analysis (and in the case of annotations are really part of the code). Comments can turn up virtually anywhere, e.g. When I'm doing source-to-source transformations that are supposed to preserve comments and whitespace, I do it by enabling file/token offsets in the lexer and directly modify the code/tokens based on the offsets stored in the ASTs. Haven't found a way to reliably to these transformations directly on the AST yet. |
Yeah, I see the problem. Hmm, too bad. I can tell you what I had in mind - I think it's a pretty cool idea, I got it from Nim's test approach, and discovered that this is valid syntax and possible in PHP as well. test(
'My test description',
function () {
$foo = 1;
$bar = 2;
$foo < $bar; # foo is less than bar
$foo + 1 === $bar; # and so forth
}
); The test(
'My test description',
function () {
$foo = 1;
$bar = 2;
ok($foo < $bar, '$foo < $bar', 'foo is less than bar');
ok($foo + 1 === $bar, '$foo + 1 === $bar', 'and so forth');
}
); From which the test-runner output would be:
I love this test format - not having to write assertion statements, but just simply writing stand-alone expressions with comments, would make for a very low-noise test format. You could even eliminate most of the comments by just giving meaningful names to variables, which would make the test (and output) practically self-explanatory. For that matter, you could even place the tests in external files, eliminating the use of closures and any test-framework API at all :-) Perhaps I can use the Lexer stand-alone without the Parser to implement this? At least that would be better than Do you know of any other implementation of a CFT for PHP? Or would it make sense to introduce an additional layer in your parser? (not asking you to do the work here of course, just asking if you think it's feasible - or would the additional intermediary CFT model cause too much CPU and memory overhead?) |
So, here is how I would go about your particular case: In the lexer, enable token end positions. Get the tokens from the current file via function getTrailingComment(array $tokens, Node $node) /* : ?string */ {
assert($node->hasAttribute('endTokenPos'));
$pos = $node->getAttribute('endTokenPos');
$endLine = $node->getAttribute('endLine');
for (; $pos < count($tokens); ++$pos) {
if (!is_array($tokens[$pos])) continue;
list($type, $content, $line) = $tokens[$pos];
if ($line > $endLine) break;
if ($type === T_COMMENT || $type === T_DOC_COMMENT) {
return $content;
}
}
return null;
}
The lexer is just token_get_all() with a few compatibility hacks for different PHP versions ;)
I think https://github.com/grom358/pharborist might be doing this. I'm not really sure about this, because the project sadly doesn't have documentation.
Having an additional parse tree layer would indeed be problematic for perf. I can only see this as a separate mode of operation. But I've never looked into this much. |
Any chance this improved with the 4.0 release? Perhaps comments will be preserved like white space without modifications? |
@mindplay-dk If formatting-preservation is enabled, then comments will be kept without modification, yes. The representation of the comments in the AST has stayed the same though. |
I seem to be running into a parser issue, but can't be sure if this is a bug or as-intended. It seems buggy.
Code:
Output:
The comment trailing the
$foo !== 2;
expression (index 1) somehow gets associated with the$bar = 2;
assignment (index 2) which seems odd.Maybe the intention here is for doc-blocks to get correctly associated with the node following the doc-block... Are all comments deliberately associated with the following node whether they're doc-blocks or not?
That seems odd, and appears to be lead to orphaned last comment nodes - example:
Output:
The comment is lost.
I don't know how you'd get around that, since comments are associated forwards with the next node, if there is no next node. I guess you'd need something like an "end of file" node with which the orphaned trailing comment could be associated?
Even so, that would seem kind of patchy, since what's causing the problem in the first place, is the fact that comments aren't treated as nodes, but as "attributes" of nodes - which is true of
/**
doc-blocks; these are attributes of the following node - but other types of comments really aren't.Perhaps this is consistent with the native PHP lexer/parser, and if so, an "end of file" node might be the most reasonable solution (?)
I'm using the release 1.3.0 version.
The text was updated successfully, but these errors were encountered: