fix: getTokenAtPositon gets the full token, even when it spans elements #41

ijsnow · 2018-08-15T02:10:54Z

This PR fixes an issue where only portions of a token can be highlighted. This can happen any time a token spans multiple elements but mostly happens in diffs when just part of a variable name was changed. The resulting DOM can look like this: my<span>Token</span>. If you hover over my without these changes, only my will be highlighted and same for Token.

These changes walk backwards and forwards in the list of text nodes ensuring we get the full token.

codecov · 2018-08-15T02:12:05Z

Codecov Report

Merging #41 into master will increase coverage by 1.58%.
The diff coverage is 97.77%.

@@            Coverage Diff             @@
##           master      #41      +/-   ##
==========================================
+ Coverage   65.24%   66.82%   +1.58%     
==========================================
  Files          15       15              
  Lines         587      615      +28     
  Branches      155      161       +6     
==========================================
+ Hits          383      411      +28     
  Misses        204      204

Impacted Files	Coverage Δ
src/token_position.ts	`90.9% <97.77%> (+2.21%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 178b32e...c4ddcb1. Read the comment docs.

chrismwendt · 2018-08-15T02:45:04Z

src/token_position.ts

+    ASCII,
+    /** Every token we encounter that doesn't fall into the other two TokenTypes */
+    NonVariable,
+}


s/Identity/Identifier/?

Yea, I thought it was Identity in the go parser but its just Ident and they mention identifier in the comments. The name is confusing anyways because it can include keywords too. I'll switch it to Alphanumeric.

chrismwendt · 2018-08-15T03:25:04Z

src/token_position.ts

+ * When tokenizing the DOM, alphanumeric characters are grouped because they are identities.
+ *
+ * We also group whitespace just in case. See `consumeNextToken` comments for more information.
+ * This funciton is a helper function for making sure the node is the same type of a token and if we care


s/funciton/function/

chrismwendt · 2018-08-15T03:37:40Z

src/token_position.ts

        }
+
+        offsetStep += text.length
+    }


You could probably reduce the scope of the offsetStep variable by factoring this out as a function.

chrismwendt · 2018-08-15T03:39:07Z

src/token_position.ts

+    }
+
+    const endNode = textNodes[findTokenEdgeIndex(nodeIndex, 1)]
+    startNode = textNodes[findTokenEdgeIndex(nodeIndex, -1)]


startNode is a bit of a misnomer since it gets reassigned here. Maybe it makes sense to introduce a third variable and call it currentNode or nodeAtOffset.

chrismwendt · 2018-08-15T03:45:12Z

src/token_position.ts

+    tokenRange.setEndAfter(endNode)
+
+    // Return the common ancestor as the full token.
+    return tokenRange.commonAncestorContainer as HTMLElement


Could the common ancestor sometimes be so large (i.e. too high up in the DOM tree) that it includes too many nodes (i.e. the hovered token plus some surrounding nodes)? I'm thinking it might be possible to end up with the opposite problem of what this PR solves, but I haven't taken a close enough look at the DOM for diff views to say for sure.

If convertNode works properly, we shouldn't have to worry about this.

felixfbecker

Does this fix https://github.com/sourcegraph/sourcegraph/issues/11901?

It seems like this issue would have also been fixed by using event-positions, how much work is it to move to that vs continuing the investment in the tokenization logic?

felixfbecker · 2018-08-15T08:52:43Z

src/token_position.ts

+ * @param node The node containing the token.
+ */
+function getTokenType(node: Node): TokenType {
+    const text = unescape(node.textContent || '')


I dont think unescape is correct here, textContent shouldn’t contain HTML entities, unless the code contains actual HTML entities, in which case this would be wrong.

Does that mean it is incorrect here https://sourcegraph.com/github.com/sourcegraph/codeintellify/-/blob/src/token_position.ts#L112?

felixfbecker · 2018-08-15T08:53:39Z

src/token_position.ts

+
+    const nodes: Node[] = []
+
+    for (const child of Array.from(node.childNodes)) {


Shouldn’t DOM collections be iterable without converting to an array?

felixfbecker · 2018-08-15T08:55:18Z

src/token_position.ts

+ *
+ * @param node is the node in which you want to get all of the text nodes from it's children
+ */
+export const getTextNodes = (node: Node): Node[] => {


This function is a good candidate to write as a generator to avoid creating all the intermediate arrays

felixfbecker · 2018-08-15T08:56:55Z

src/token_position.ts

+    /** Tokens that are ascii characters but aren't in identies (i.e. {, }, [, ], |, ;,  etc) */
+    ASCII,
+    /** Every token we encounter that doesn't fall into the other two TokenTypes */
+    NonVariable,


“Other”?

felixfbecker · 2018-08-15T08:58:12Z

src/token_position.test.ts

+            const codeView = dom.createElementFromString('<div>To<span>ken</span></div>')
+
+            const positions = [{ line: 1, character: 1 }, { line: 1, character: 3 }]
+            for (const position of positions) {


When testing the same thing with multiple values it’s better to make each value an individual test (the for loop can be around the it). This makes it clearer when just one of the values start failing.

I disagree. Just changed it to try it out and it made it less clear to me where the values were coming from and broke the flow of how the tests read to me.

ijsnow · 2018-08-15T17:38:48Z

@felixfbecker I'm not sure if it fixes that issue. It seems unrelated to me. I did this for a bug in the implementation for Phabricator.

Yes, event-position solved this problem already but using it here, as we've discussed, would take a lot more changes. I had these changes slated for last iteration but we didn't get those done. At this point, its more important to get code hosts updated and making this change was quicker for me than moving to event-positions. Each time I make a change to hoverifier.ts, I have a lot more trouble than expected, so I'm just making changes where I feel confident I can get the work done without losing a bunch of time.

sourcegraph-bot · 2018-08-15T19:06:54Z

🎉 This PR is included in version 3.7.1 🎉

The release is available on:

Your semantic-release bot 📦🚀

fix: getTokenAtPositon gets the full token, even when it spans elements

54cd23d

ijsnow requested a review from chrismwendt August 15, 2018 02:12

chrismwendt approved these changes Aug 15, 2018

View reviewed changes

felixfbecker reviewed Aug 15, 2018

View reviewed changes

refactor: address review comments

c4ddcb1

ijsnow merged commit 9938078 into master Aug 15, 2018

ijsnow deleted the get-full-token branch August 15, 2018 19:05

ijsnow mentioned this pull request Sep 6, 2018

Convert node changes text #55

Merged


		const nodes: Node[] = []

		for (const child of Array.from(node.childNodes)) {

fix: getTokenAtPositon gets the full token, even when it spans elements #41

fix: getTokenAtPositon gets the full token, even when it spans elements #41

Uh oh!

Conversation

ijsnow commented Aug 15, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov bot commented Aug 15, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

felixfbecker left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ijsnow commented Aug 15, 2018

Uh oh!

sourcegraph-bot commented Aug 15, 2018

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

ijsnow commented Aug 15, 2018 •

edited

Loading

codecov bot commented Aug 15, 2018 •

edited

Loading