Consider supporting embedded language grammars when using semantic tokens #163292

Open
DanTup opened this issue Oct 11, 2022 · 3 comments

@DanTup
Contributor

DanTup commented Oct 11, 2022

This has been discussed a little in some other issues:

The issue is that when a language uses semantic tokens, injected embedded grammars do not work. The suggestion given by @alexdima at #113640 (comment) is for the embedded language to coordinate with the language server to suppress semantic tokens where this grammar needs to be used (which Rust has gone ahead with, adding an option to suppress semantic tokens on strings).

This does not seem like a very scalable solution. I had a request at Dart-Code/Dart-Code#4212 related to this, where another extension provides highlighting of some strings inside Dart. When semantic tokens are disabled, everything is fine, but with semantic tokens enabled the Dart server produces string tokens (because strings are a non-default colour) that break the embedded language's highlighting.

Having the Dart server suppress these tokens is not a good solution because:

  1. It means strings that aren't in the embedded language's format would lose their colouring
  2. It requires an LSP server (which is intended to be generic and editor-agnostic by design) to make changes for specific functionality of another extension (of which there could be many, with varying needs)

It would be much better if this could be done without changes to the server. I don't know what a solution would look like, but perhaps the injected language could be allowed to layer its scopes over the semantic tokens (while semantic tokens are more accurate, I don't believe that's a reason to prevent this), or the injected language could be allowed to apply specifically to some tokens from the server, such as strings (though VS Code's lack of support for multiline semantic tokens may complicate that).

If there are caveats to switching to semantic tokens, languages may think twice about switching to them (or more users may turn them off), which would be a shame.
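As a rough illustration of the second idea (letting an injected grammar take over only certain server tokens), here is a minimal TypeScript sketch. Everything in it is hypothetical: `SemanticToken`, `ClaimedRange` and `yieldToInjectedGrammar` are illustrative names, not VS Code or LSP APIs, and it assumes single-line tokens and that the editor already knows which ranges an injected grammar has claimed.

```typescript
// Hypothetical shapes for illustration only; not VS Code or LSP types.
interface SemanticToken {
  line: number;      // zero-based line
  startChar: number; // zero-based start column
  length: number;    // token length in characters (single-line token)
  type: string;      // e.g. "string", "keyword"
}

interface ClaimedRange {
  line: number;
  startChar: number;
  endChar: number; // exclusive
}

// True when the token and the range share at least one character on the same line.
function overlaps(token: SemanticToken, range: ClaimedRange): boolean {
  if (token.line !== range.line) return false;
  const tokenEnd = token.startChar + token.length;
  return token.startChar < range.endChar && range.startChar < tokenEnd;
}

// Drop only server tokens of the "yielding" types that overlap a range claimed
// by an injected grammar. All other tokens, including strings that no grammar
// claimed, keep their semantic colouring.
function yieldToInjectedGrammar(
  tokens: SemanticToken[],
  claimed: ClaimedRange[],
  yieldingTypes: ReadonlySet<string> = new Set(["string"]),
): SemanticToken[] {
  return tokens.filter(
    (t) => !(yieldingTypes.has(t.type) && claimed.some((r) => overlaps(t, r))),
  );
}
```

With a filter like this on the client side, the server would not need to change, and strings that are not in the embedded language's format would still be coloured.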

@alexdima alexdima added feature-request Request for new features or functionality tokenization Text tokenization semantic-tokens Semantic tokens issues labels Oct 11, 2022
@alexdima alexdima removed their assignment Oct 11, 2022
@VSCodeTriageBot VSCodeTriageBot added this to the Backlog Candidates milestone Oct 11, 2022
@VSCodeTriageBot
Collaborator

This feature request is now a candidate for our backlog. The community has 60 days to upvote the issue. If it receives 20 upvotes we will move it to our backlog. If not, we will close it. To learn more about how we handle feature requests, please see our documentation.

Happy Coding!

@VSCodeTriageBot
Collaborator

This feature request has not yet received the 20 community upvotes it takes to make it to our backlog. 10 days to go. To learn more about how we handle feature requests, please see our documentation.

Happy Coding!

@wakaztahir

To support semantic tokens in embedded languages

Solution 1

1 - VS Code sends my LSP server a request for semantic tokens
2 - I lex my language and reach a token for an embedded language
3 - I set a field on that semantic token to indicate where the embedded language starts, its length, and which embedded language it is

Cons:
1 - VS Code needs to go through my semantic tokens, find the embedded-language regions, and get tokens for them from its own extensions or LSP servers
2 - An LSP server might need to be started to provide semantic tokens for the embedded language
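Solution 1 could be sketched roughly as follows. This is only an illustration under stated assumptions: the `embeddedLanguageId` field, the `TaggedToken` and `EmbeddedRegion` interfaces and the `partitionTokens` function are all hypothetical, and neither LSP nor VS Code defines such a field today.

```typescript
// Hypothetical token shape: a host-language token that may be tagged as the
// start of an embedded-language region. Not part of LSP or VS Code.
interface TaggedToken {
  line: number;
  startChar: number;
  length: number;
  type: string;
  embeddedLanguageId?: string; // hypothetical field, e.g. "sql"
}

// A region the editor would hand to another extension or language server.
interface EmbeddedRegion {
  languageId: string;
  line: number;
  startChar: number;
  length: number;
}

// Editor side: split the server's tokens into ordinary host tokens and
// regions that should be tokenized by the embedded language's own tooling.
function partitionTokens(tokens: TaggedToken[]): {
  hostTokens: TaggedToken[];
  embeddedRegions: EmbeddedRegion[];
} {
  const hostTokens: TaggedToken[] = [];
  const embeddedRegions: EmbeddedRegion[] = [];
  for (const token of tokens) {
    if (token.embeddedLanguageId === undefined) {
      hostTokens.push(token);
    } else {
      embeddedRegions.push({
        languageId: token.embeddedLanguageId,
        line: token.line,
        startChar: token.startChar,
        length: token.length,
      });
    }
  }
  return { hostTokens, embeddedRegions };
}
```

The partition step itself is cheap; the cost named in the cons above is in what follows it, since the editor then has to find (and possibly start) a provider for each `languageId`.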

Solution 2

1 - VS Code sends my LSP server a request for semantic tokens
2 - I lex my language and reach a token for an embedded language
3 - I send a request back to VS Code to get tokens for the embedded language (a two-way semanticTokens/range)
4 - VS Code returns the semantic tokens; I might need to parse them because the format is different, then I add them to my own tokens and return the combined result to VS Code

Cons:

1 - Harder to implement: tokens are compressed when sent, so VS Code would have to send them to the server uncompressed
2 - Still requires an LSP server to be started to provide semantic tokens for the embedded language
3 - This approach is worse than Solution 1
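To make the "compressed" format concrete: LSP sends semantic tokens as a flat integer array with five numbers per token (`deltaLine`, `deltaStart`, `length`, `tokenType`, `tokenModifiers`), where positions are relative to the previous token. Below is a minimal sketch of what step 4 of Solution 2 might involve on the server: decoding both arrays, shifting the embedded tokens to their position in the host document, and re-encoding. The function names and the `offset` parameter are hypothetical, and the sketch assumes both token sets share one token-type legend and that the host server emits no token of its own over the embedded region.

```typescript
interface AbsoluteToken {
  line: number;
  startChar: number;
  length: number;
  tokenType: number;      // index into the token-type legend
  tokenModifiers: number; // bit set of modifier indices
}

// Decode the relative five-integer LSP encoding into absolute positions.
function decodeTokens(data: number[]): AbsoluteToken[] {
  const tokens: AbsoluteToken[] = [];
  let line = 0;
  let startChar = 0;
  for (let i = 0; i + 4 < data.length; i += 5) {
    const deltaLine = data[i];
    line += deltaLine;
    // deltaStart is relative to the previous token only on the same line.
    startChar = deltaLine === 0 ? startChar + data[i + 1] : data[i + 1];
    tokens.push({ line, startChar, length: data[i + 2], tokenType: data[i + 3], tokenModifiers: data[i + 4] });
  }
  return tokens;
}

// Encode absolute tokens back into the relative form, sorted by position.
function encodeTokens(tokens: AbsoluteToken[]): number[] {
  const sorted = [...tokens].sort((a, b) => a.line - b.line || a.startChar - b.startChar);
  const data: number[] = [];
  let prevLine = 0;
  let prevStart = 0;
  for (const t of sorted) {
    const deltaLine = t.line - prevLine;
    const deltaStart = deltaLine === 0 ? t.startChar - prevStart : t.startChar;
    data.push(deltaLine, deltaStart, t.length, t.tokenType, t.tokenModifiers);
    prevLine = t.line;
    prevStart = t.startChar;
  }
  return data;
}

// Hypothetical merge step: embedded tokens are positioned relative to the
// embedded text, so shift them by where that text starts in the host file.
// Assumes a shared legend and no overlapping host tokens (see lead-in).
function mergeEmbedded(
  hostData: number[],
  embeddedData: number[],
  offset: { line: number; startChar: number },
): number[] {
  const shifted = decodeTokens(embeddedData).map((t) => ({
    ...t,
    line: t.line + offset.line,
    // Only the first embedded line starts mid-line in the host document.
    startChar: t.line === 0 ? t.startChar + offset.startChar : t.startChar,
  }));
  return encodeTokens([...decodeTokens(hostData), ...shifted]);
}
```

In practice the two servers would usually have different legends, so a real merge would also need to remap `tokenType` and `tokenModifiers`, which adds to the implementation cost listed above.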

The biggest problem

I don't just need semantic token support for the embedded language; I also need completions and the other language features.
