Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Semantic tokens: add module name support and improve performance and accuracy by traversing the hieAst along with source code #3958

Merged
merged 84 commits into from
Jan 29, 2024

Conversation

soulomoon
Copy link
Collaborator

@soulomoon soulomoon commented Jan 15, 2024

fix #3957

Things have been done:

  1. Switch Name to Identifier in the implementation and add ModuleName to the HsSemanticTokenType
  2. Strip ` ` and (), and split out qualified names. e.g.`Preclude.length` to Preclude. length
  3. add tokenizer to walk ast with the souce rope to get more accurate result and faster. Should fix Semantic Tokens treats a standalone deriving strategy as type parameter #3983.
  4. add type sig to semanticConfig's TH result

@soulomoon soulomoon self-assigned this Jan 15, 2024
@soulomoon soulomoon marked this pull request as ready for review January 15, 2024 09:34
@soulomoon soulomoon enabled auto-merge (squash) January 15, 2024 09:34
@michaelpj
Copy link
Collaborator

This looks like it's still a work in progress?

@soulomoon
Copy link
Collaborator Author

soulomoon commented Jan 17, 2024

This looks like it's still a work in progress?

WIP, I decicded to do the module name split in here as well.

@michaelpj
Copy link
Collaborator

Looks like we could also do module <name> where but maybe that's not in the HIE ast?

@michaelpj michaelpj marked this pull request as draft January 17, 2024 11:17
@michaelpj
Copy link
Collaborator

The issue with parens is actually kind of tricky. Consider the following cases:

  1. (+): an operator section, maybe the whole thing should be highlighted as a function? Probably shouldn't be an operator since it's not infix in this case? Or maybe it should be an operator? I don't know.
  2. (head): just a plain function, parenthesized
  3. (Prelude.+): again an operator section, but qualified
  4. (Prelude.head): again a function, but parenthesized

My feeling is that 1 should be highlighted with the whole thing as a function, since the parens are "part of" the syntax there (like the backticks in `isSubsetOf`); in 2 and 4 the parens should not be highlighted, and in 3 I'm confused.

I think maybe we're already getting sort of the right thing? I was confused why we were getting the parens in the span for (Prelude.+) but it makes sense since it's an operator section, not just a random parenthesized thing.

In this case I almost think that the cleanest thing to do would be to just add an (overlapping) token for the Prelude bit, and then split the token for the function on either side, so you'd have [<function>(][<module>Prelude][<operator>.+)] or something.

@soulomoon
Copy link
Collaborator Author

soulomoon commented Jan 17, 2024

2, 4 won't happen, () is only included if it is operator inside.
we should worry about the `Prelude.head` which suffered the same thing as (Prelude.+)

@michaelpj
Copy link
Collaborator

Okay, so the parentheses are only included in the span if it's an operator section. That seems fine. The question is whether we should highlight the parens when it's an operator section.

@soulomoon
Copy link
Collaborator Author

soulomoon commented Jan 17, 2024

In this case I almost think that the cleanest thing to do would be to just add an (overlapping) token for the Prelude bit, and then split the token for the function on either side, so you'd have [(][Prelude][.+)] or something.

Having overlapping is the ideal situation. But we need to consider the fall back situation with no overlapping now. Since most lsp client does not support it yet.

@soulomoon
Copy link
Collaborator Author

soulomoon commented Jan 17, 2024

Before we support overlapping tokens.

The following might be a good idea:
Drop the ` `, () for now, as suggested by @konn #3969 (comment).
We do not need to consider the special split involed "``, ()" for now.

@michaelpj
Copy link
Collaborator

But if we drop the parens for operator sections around qualified names, we should do it for normal operator sections like (+) too for consistency

@soulomoon
Copy link
Collaborator Author

yes this is what I have in mind too.

@soulomoon soulomoon changed the title add module name support for semantic tokens in imported section add module name support for semantic tokens Jan 17, 2024
@soulomoon soulomoon marked this pull request as ready for review January 17, 2024 15:49
@soulomoon soulomoon enabled auto-merge (squash) January 17, 2024 15:52
@soulomoon soulomoon enabled auto-merge (squash) January 17, 2024 15:52
@soulomoon soulomoon added merge me Label to trigger pull request merge and removed merge me Label to trigger pull request merge labels Jan 17, 2024
@soulomoon soulomoon marked this pull request as draft January 17, 2024 16:03
@soulomoon soulomoon marked this pull request as ready for review January 17, 2024 16:17
@soulomoon
Copy link
Collaborator Author

soulomoon commented Jan 28, 2024

@wz1000 Thank you for your detailed reviews. Many good advices, I have made some adjustments to the code.

@wz1000
Copy link
Collaborator

wz1000 commented Jan 29, 2024

another thing, though this doesn't have to be addressed in this MR and could be a follow up - now that we are traversing the entire document in order, do we still need to accumulate results into a M.Map Range HsSemanticTokenType? Couldn't we instead directly accumulate a [(Range, HsSemanticTokenType)]?

@soulomoon
Copy link
Collaborator Author

another thing, though this doesn't have to be addressed in this MR and could be a follow up - now that we are traversing the entire document in order, do we still need to accumulate results into a M.Map Range HsSemanticTokenType? Couldn't we instead directly accumulate a [(Range, HsSemanticTokenType)]?

Good point, I'll take a look some time later if there is optimization space we can do on this.

@soulomoon soulomoon enabled auto-merge (squash) January 29, 2024 09:13
@soulomoon soulomoon merged commit 8dfcaf8 into haskell:master Jan 29, 2024
39 checks passed
@soulomoon
Copy link
Collaborator Author

another thing, though this doesn't have to be addressed in this MR and could be a follow up - now that we are traversing the entire document in order, do we still need to accumulate results into a M.Map Range HsSemanticTokenType? Couldn't we instead directly accumulate a [(Range, HsSemanticTokenType)]?

One thing I am not sure about would be wether the mappings of range change would change the range ordering.
If the ordered would change, we are forced to sort it again.

@wz1000
Copy link
Collaborator

wz1000 commented Feb 4, 2024

One thing I am not sure about would be wether the mappings of range change would change the range ordering. If the ordered would change, we are forced to sort it again.

you mean the positionMappings? I'm pretty sure that they are guaranteed to be monotonic, so I don't think they would change sort order.

@soulomoon
Copy link
Collaborator Author

Yes, thanx for clearing this out!

soulomoon added a commit that referenced this pull request Feb 7, 2024
A follow up of #3958 , we have added a tokenizor to walk the hieAst along with the file rope, it means we no longer need to do the detour of storing temperal result as Map Range (Set identifier), instead we can optimize by fusing most of the logic into tokenizer and return [(Range, HsSemanticTokenType)] directly.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants