-
Notifications
You must be signed in to change notification settings - Fork 6
fix: getTokenAtPositon gets the full token, even when it spans elements #41
Conversation
Codecov Report
@@ Coverage Diff @@
## master #41 +/- ##
==========================================
+ Coverage 65.24% 66.82% +1.58%
==========================================
Files 15 15
Lines 587 615 +28
Branches 155 161 +6
==========================================
+ Hits 383 411 +28
Misses 204 204
Continue to review full report at Codecov.
|
| ASCII, | ||
| /** Every token we encounter that doesn't fall into the other two TokenTypes */ | ||
| NonVariable, | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/Identity/Identifier/?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, I thought it was Identity in the go parser but its just Ident and they mention identifier in the comments. The name is confusing anyways because it can include keywords too. I'll switch it to Alphanumeric.
src/token_position.ts
Outdated
| * When tokenizing the DOM, alphanumeric characters are grouped because they are identities. | ||
| * | ||
| * We also group whitespace just in case. See `consumeNextToken` comments for more information. | ||
| * This funciton is a helper function for making sure the node is the same type of a token and if we care |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/funciton/function/
| } | ||
|
|
||
| offsetStep += text.length | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could probably reduce the scope of the offsetStep variable by factoring this out as a function.
src/token_position.ts
Outdated
| } | ||
|
|
||
| const endNode = textNodes[findTokenEdgeIndex(nodeIndex, 1)] | ||
| startNode = textNodes[findTokenEdgeIndex(nodeIndex, -1)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
startNode is a bit of a misnomer since it gets reassigned here. Maybe it makes sense to introduce a third variable and call it currentNode or nodeAtOffset.
| tokenRange.setEndAfter(endNode) | ||
|
|
||
| // Return the common ancestor as the full token. | ||
| return tokenRange.commonAncestorContainer as HTMLElement |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could the common ancestor sometimes be so large (i.e. too high up in the DOM tree) that it includes too many nodes (i.e. the hovered token plus some surrounding nodes)? I'm thinking it might be possible to end up with the opposite problem of what this PR solves, but I haven't taken a close enough look at the DOM for diff views to say for sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If convertNode works properly, we shouldn't have to worry about this.
felixfbecker
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this fix https://github.com/sourcegraph/sourcegraph/issues/11901?
It seems like this issue would have also been fixed by using event-positions, how much work is it to move to that vs continuing the investment in the tokenization logic?
| * @param node The node containing the token. | ||
| */ | ||
| function getTokenType(node: Node): TokenType { | ||
| const text = unescape(node.textContent || '') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont think unescape is correct here, textContent shouldn’t contain HTML entities, unless the code contains actual HTML entities, in which case this would be wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does that mean it is incorrect here https://sourcegraph.com/github.com/sourcegraph/codeintellify/-/blob/src/token_position.ts#L112?
src/token_position.ts
Outdated
|
|
||
| const nodes: Node[] = [] | ||
|
|
||
| for (const child of Array.from(node.childNodes)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn’t DOM collections be iterable without converting to an array?
| * | ||
| * @param node is the node in which you want to get all of the text nodes from it's children | ||
| */ | ||
| export const getTextNodes = (node: Node): Node[] => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is a good candidate to write as a generator to avoid creating all the intermediate arrays
src/token_position.ts
Outdated
| /** Tokens that are ascii characters but aren't in identies (i.e. {, }, [, ], |, ;, etc) */ | ||
| ASCII, | ||
| /** Every token we encounter that doesn't fall into the other two TokenTypes */ | ||
| NonVariable, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
“Other”?
| const codeView = dom.createElementFromString('<div>To<span>ken</span></div>') | ||
|
|
||
| const positions = [{ line: 1, character: 1 }, { line: 1, character: 3 }] | ||
| for (const position of positions) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When testing the same thing with multiple values it’s better to make each value an individual test (the for loop can be around the it). This makes it clearer when just one of the values start failing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disagree. Just changed it to try it out and it made it less clear to me where the values were coming from and broke the flow of how the tests read to me.
|
@felixfbecker I'm not sure if it fixes that issue. It seems unrelated to me. I did this for a bug in the implementation for Phabricator. Yes, |
|
🎉 This PR is included in version 3.7.1 🎉 The release is available on: Your semantic-release bot 📦🚀 |
This PR fixes an issue where only portions of a token can be highlighted. This can happen any time a token spans multiple elements but mostly happens in diffs when just part of a variable name was changed. The resulting DOM can look like this:
my<span>Token</span>. If you hover overmywithout these changes, onlymywill be highlighted and same forToken.These changes walk backwards and forwards in the list of text nodes ensuring we get the full token.