Allow language extensions to have consistent symbol matching for document and workspace #34605

gilmoreorless · 2017-09-18T23:37:23Z

There’s an inconsistency in how language extensions are able to match symbols for a single document versus a workspace.

I’ll explain with an example. Assume for the sake of simplicity that all files in a workspace are the same language, and there is only one extension registered for that language.

When a user chooses “Go to Symbol in File”:

1. If the language extension has registered a `DocumentSymbolProvider`, the `provideDocumentSymbols` method is called with the currently-focused `TextDocument`.
2. The language extension returns a list of all symbols in the document.
3. VS Code filters the full list to match what the user has typed.
4. VS Code ranks and highlights matched string parts.

When a user chooses “Go to Symbol in Workspace”:

1. If the language extension has registered a `WorkspaceSymbolProvider`, the `provideWorkspaceSymbols` method is called with a query string.
2. The language extension returns a list of symbols in the workspace that match the query string. In this case, the extension filters the list to match what the user has typed.
3. VS Code ranks and highlights matched string parts.
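
The two flows above can be sketched with simplified stand-in types. This is only a rough illustration and not the real `vscode` API: `SymbolEntry`, `documentProvider`, `workspaceProvider` and `editorFilter` are hypothetical names, and the substring check inside `editorFilter` is a placeholder for whatever matching VS Code actually performs.

```typescript
// Simplified stand-in for a symbol; the real API uses richer types.
interface SymbolEntry {
  name: string;
  file: string;
}

// Hypothetical symbol index for a tiny two-file workspace.
const index: SymbolEntry[] = [
  { name: "parseZone", file: "a.txt" },
  { name: "ZoneParser", file: "a.txt" },
  { name: "formatZone", file: "b.txt" },
];

// Document flow: the extension returns every symbol in the file, unfiltered.
function documentProvider(file: string): SymbolEntry[] {
  return index.filter((s) => s.file === file);
}

// Placeholder for the editor-side filtering (an assumption, not VS Code's algorithm).
function editorFilter(symbols: SymbolEntry[], query: string): SymbolEntry[] {
  const q = query.toLowerCase();
  return symbols.filter((s) => s.name.toLowerCase().indexOf(q) !== -1);
}

// Workspace flow: the extension receives the query and filters by itself,
// here with a deliberately naive prefix match.
function workspaceProvider(query: string): SymbolEntry[] {
  const q = query.toLowerCase();
  return index.filter((s) => s.name.toLowerCase().indexOf(q) === 0);
}

const inDocument = editorFilter(documentProvider("a.txt"), "zone").map((s) => s.name);
const inWorkspace = workspaceProvider("zone").map((s) => s.name);
```

With the query `zone`, the document flow keeps both `parseZone` and `ZoneParser`, while the prefix-matching workspace flow keeps only `ZoneParser`.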

So the main difference is that for a document VS Code is doing the string matching, but for a workspace the extension is doing the string matching. This produces inconsistencies because they are not using the same matching algorithm. The problem is the same regardless of whether the language extension is doing the symbol matching within the extension process or as part of a client/server model.

This difference in behaviour has led to multiple issues in the past (e.g. #20039 – "Go to symbol" looks like only take care of upper cases). Within my vscode-zoneinfo extension, I’m using a knowingly-naïve string matching method for the provideWorkspaceSymbols method, because I know that I’m almost guaranteed to have different results from the single-file symbol matching.

I don’t have a definitive solution, but I do have some suggestions.

Option 1: Change `provideDocumentSymbols` to accept a query string

Pro:
If the provideDocumentSymbols method accepted an extra argument for the query string, an extension could then use the same internal methods for matching document and workspace symbols consistently.
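
As a rough sketch of what that could look like, assuming a query parameter that does not exist in the current API: the stand-in functions below only borrow the names `provideDocumentSymbols` and `provideWorkspaceSymbols`, and `SymbolEntry`, `index` and `matches` are hypothetical.

```typescript
// Simplified stand-in for a symbol; not the real vscode type.
interface SymbolEntry {
  name: string;
  file: string;
}

// Hypothetical symbol index kept by the extension.
const index: SymbolEntry[] = [
  { name: "parseZone", file: "a.txt" },
  { name: "ZoneParser", file: "a.txt" },
  { name: "formatZone", file: "b.txt" },
];

// The single matcher owned by the extension (naive case-insensitive substring).
function matches(name: string, query: string): boolean {
  return name.toLowerCase().indexOf(query.toLowerCase()) !== -1;
}

// Hypothetical variant: the document provider also receives the query.
function provideDocumentSymbols(file: string, query: string): SymbolEntry[] {
  return index.filter((s) => s.file === file && matches(s.name, query));
}

// The workspace provider reuses the same matcher.
function provideWorkspaceSymbols(query: string): SymbolEntry[] {
  return index.filter((s) => matches(s.name, query));
}
```

Because both functions call `matches`, a symbol found in one view for a given query is also found in the other.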

Con:
Of course, this continues to have the current downside that different extensions will match symbols in different ways, so there will still be inconsistent results across languages.

Option 2: Provide a shared API for symbol name matching

There have been many issues raised previously about the fuzzy-ish string matching behaviour in VS Code, collected in a single meta-issue at #27317. If there are going to be changes to the string matching behaviour, it would be a good opportunity to provide the matcher as an API for extensions.

Pro:
If an extension’s provideWorkspaceSymbols method could call something like vscode.matchString(symbolName, query), there would be consistency between document and workspace symbol matching.
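
A minimal sketch of how a provider might consume such a matcher, assuming it were exposed as a simple boolean function. Nothing named `vscode.matchString` exists today, so the `MatchString` type and `standInMatchString` below are hypothetical placeholders, and the provider is a simplified stand-in that works on plain names.

```typescript
// Hypothetical signature for a host-provided matcher. The thread only
// floats this idea; it is not part of the vscode API.
type MatchString = (symbolName: string, query: string) => boolean;

// Stand-in implementation so the sketch runs outside the editor.
const standInMatchString: MatchString = (symbolName, query) =>
  symbolName.toLowerCase().indexOf(query.toLowerCase()) !== -1;

// The provider delegates every match decision to the shared matcher,
// so it never has to guess how the editor matches document symbols.
function provideWorkspaceSymbols(
  names: string[],
  query: string,
  matchString: MatchString
): string[] {
  return names.filter((name) => matchString(name, query));
}

const found = provideWorkspaceSymbols(
  ["parseZone", "formatDate"],
  "zone",
  standInMatchString
);
```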

Con:
This would only work for language extensions that find symbols within the extension process. I presume that most language extensions use the language server protocol instead, so they won’t be able to use a vscode.* API.

A partial solution to this could be to provide the string matcher as a separate Node.js module (or use one of a few existing ones), which can then be included in VS Code as well as a language server. Unfortunately that only helps language servers that use the Node.js ecosystem. Thinking out loud... maybe provide an open abstract description of the matching algorithm, so that other language servers could implement it as well? Though that has veered wildly into massively over-engineering a solution to a relatively minor problem...


jrieken · 2017-09-19T07:13:57Z

> So the main difference is that for a document VS Code is doing the string matching, but for a workspace the extension is doing the string matching.

Yeah, the reason for that is that workspace symbols can number in the tens of thousands, and sending them all back and forth might be too expensive for a language service. That's why we provide the query string; ideally we would have full access to all models.

For document symbols we expect fewer symbols, and we also display all of them (in their order, and in future in a hierarchy). So there we take on the filtering and highlighting.

I think these constraints remain and that we should spec how we expect language servers to interpret/match that query. Many do a "starts-with" or "indexOf" match, but we would favour a more lax subsequence matching. So, `foo` matches `For you` because all letters f, o, o appear in that order, case-insensitive, in the target string.
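
A minimal sketch of that kind of matching, assuming a plain character-by-character comparison with no scoring or ranking. `isSubsequenceMatch` is a hypothetical name for illustration, not a function from VS Code or the language server protocol.

```typescript
// Returns true when every character of `query` appears in `target`
// in the same order, ignoring case. Characters do not need to be adjacent.
function isSubsequenceMatch(query: string, target: string): boolean {
  const q = query.toLowerCase();
  const t = target.toLowerCase();
  let qi = 0; // index of the next query character still to be found
  for (let ti = 0; ti < t.length && qi < q.length; ti++) {
    if (t.charAt(ti) === q.charAt(qi)) {
      qi++;
    }
  }
  return qi === q.length;
}

const laxMatch = isSubsequenceMatch("foo", "For you"); // f, o, then the o in "you"
const noMatch = isSubsequenceMatch("foo", "of");       // letters present but not in order
```

A "starts-with" or "indexOf" check would reject `For you` for the query `foo`, which is the difference in strictness being described.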

gilmoreorless · 2017-09-19T10:56:03Z

Yep, the constraints make sense, especially regarding a workspace (which is why I suggested making provideDocumentSymbols work more like provideWorkspaceSymbols, rather than the other way around).

> Many do a "starts-with" or "indexOf" match, but we would favour a more lax subsequence matching. So, `foo` matches `For you` because all letters f, o, o appear in that order, case-insensitive, in the target string.

That’s the approach I ended up taking with my extension in lieu of proper fuzzy searching, but a lot of the results don’t get the matched parts highlighted in the picker:

That’s not to say that the matching and/or highlighting is wrong, just inconsistent due to the different processes involved.

jrieken · 2017-09-19T13:30:26Z

> That’s not to say that the matching and/or highlighting is wrong, just inconsistent due to the different processes involved.

Yeah, we have a strong matching algorithm that we use for IntelliSense and we should also use it here.

jrieken · 2017-12-11T11:51:46Z

Closing this as we have updated the docs about this

vscodebot bot added the api label Sep 18, 2017

ramya-rao-a assigned jrieken Sep 19, 2017

jrieken added the under-discussion Issue is under discussion for relevance, priority, approach label Sep 19, 2017

jrieken added the editor-symbols definitions, declarations, references label Sep 19, 2017

jrieken added a commit that referenced this issue Sep 22, 2017

Add a word or two about the query-argument, #34605

e59d629

deysaikat95 mentioned this issue Sep 25, 2017

Update from original (#1) #34941

Closed

jrieken closed this as completed Dec 11, 2017

vscodebot bot locked and limited conversation to collaborators Jan 25, 2018
