"Open symbol by name" not fuzzy matching #697

aniforprez · 2019-03-05T10:43:25Z

Using the "Open symbol by name" feature in VSCode, when fuzzy searching for a symbol with Microsoft Python Language Server enabled, the list of symbols returns absolutely nothing. In contrast, when using Jedi, I'm getting results.

With Jedi off:

With Jedi on:

Searching for the symbol exactly works just fine

jakebailey · 2019-03-05T16:39:21Z

Currently, the symbol index we've implemented uses case-insensitive substrings for matching, so this is "expected" but stands to be improved.

Note that this only applies to workspace symbols, where a query string is sent to the language server. Document symbols (and the outline) have no query, so VS Code is able to do the fuzzy matching itself over the result we return.
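To illustrate the behavior described above, here is a minimal sketch of a case-insensitive substring filter. It is not the language server's actual code; the function name `substringMatch` and the sample symbol name are only illustrative assumptions.

```typescript
// Hypothetical helper, not taken from the language server:
// a symbol matches only if the query appears as one contiguous,
// case-insensitive run inside the symbol name.
function substringMatch(query: string, symbolName: string): boolean {
  return symbolName.toLowerCase().indexOf(query.toLowerCase()) !== -1;
}

// A contiguous fragment is found...
const contiguous = substringMatch("dash", "RHUserDashboard");
// ...but an abbreviated query that skips characters is not.
const abbreviated = substringMatch("RHDash", "RHUserDashboard");
```

With this kind of filter, a query that skips characters returns nothing, which is consistent with the empty result list reported in the original post.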

aniforprez · 2019-06-05T07:15:02Z

Hey any ETA on this? This is the only thing holding me back from using this extension and it's a pretty big oversight. Searching through symbols over a huge repository without fuzzy search is a huge PITA

jakebailey · 2019-06-06T18:32:35Z

No specific ETA, there are other things we're trying to improve and this hasn't been worked on. It's not an oversight per se, when I implemented the index I just did the minimum working implementation and knew fuzzy matching would be better.

To add this, we'll have to go look at the fuzzy logic the extension uses when looking through its ctags database (https://github.com/microsoft/vscode-python/blob/8f4a3070c3973ddfbb0841decfcb02cccf9f32af/src/client/workspaceSymbols/parser.ts#L155) and do something similar.

jakebailey · 2019-06-06T18:44:07Z

This isn't a real solution, but you can also use VS Code's builtin search tool, which uses ripgrep under the hood (and is very effective, but of course will lack any Python-specific info).

aniforprez · 2019-06-06T20:18:41Z

Ah it's a shame this isn't priority. I will wait for the day it is added because the extension is quite good except for this one grievous oversight. I'm aware of the search tool but it's quite a bit unwieldy and doesn't fuzzy match

jahan01 · 2019-10-28T12:44:39Z

I was following this one microsoft/vscode#33746, but should have followed it here. +1 for this feature

ThiefMaster · 2020-03-11T11:22:40Z

+1 on this: I'm very much used to this from PyCharm, so right now I basically have to choose between:

using pycharm, and not being able to do proper remote development
using vscode+JEDI, and not being able to autogenerate imports
using vscode+PLS, and not being able to jump to symbols in a fuzzy way

jrieken · 2020-03-11T11:47:10Z

index I just did the minimum working implementation and knew fuzzy matching would be better.

We recommend minimal work from extensions and strongly suggest simply searching for all query characters, in order, inside the "symbol strings". We'll then do the rest. Our implementation of that, for reference and copying:

export function isPatternInWord(patternLow: string, patternPos: number, patternLen: number, wordLow: string, wordPos: number, wordLen: number): boolean {
	while (patternPos < patternLen && wordPos < wordLen) {
		if (patternLow[patternPos] === wordLow[wordPos]) {
			patternPos += 1;
		}
		wordPos += 1;
	}
	return patternPos === patternLen; // pattern must be exhausted
}

https://github.com/microsoft/vscode/blob/21dc66054203ab742d36be9f0ef6ecb774ae62f2/src/vs/base/common/filters.ts#L506-L514

aniforprez · 2020-03-11T11:50:21Z

Ah thanks Johannes. I was wondering why the extension had to do the fuzzy matching. Please Python Language Server team I hope this issue can be addressed soon


jrieken · 2020-03-11T11:52:54Z

Yeah, no fuzzy matching required, only very relaxed filtering (as described above), and in fact with small-ish result sets (<5000) no filtering is required at all

ThiefMaster · 2020-03-11T11:55:47Z

I think CamelHumps-style matching is very useful. For example, I have classes named RHUserDashboard and RHManageEventBase. Ideally I'd like to find the former when searching for #RHDash and the latter with #RHEvent.

jrieken · 2020-03-11T13:48:39Z

@ThiefMaster I do agree. What I have described is deliberately more relaxed, to ensure that VS Code receives as many results as possible, so that we can then filter and sort them, which also includes camel-hump logic
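As a rough sketch of how the relaxed "characters in order" filter treats the CamelHumps queries mentioned above: the wrapper below lowercases both strings and checks that every query character appears in order. The name `matchesInOrder` is a hypothetical helper for illustration, not a VS Code or language server API.

```typescript
// Hypothetical wrapper illustrating the relaxed in-order filter.
// Returns true when every character of the query occurs in the
// symbol name, in the same order, ignoring case.
function matchesInOrder(query: string, symbolName: string): boolean {
  const q = query.toLowerCase();
  const w = symbolName.toLowerCase();
  let qi = 0;
  for (let wi = 0; wi < w.length && qi < q.length; wi++) {
    if (q[qi] === w[wi]) {
      qi++;
    }
  }
  return qi === q.length; // the query must be fully consumed
}

const dashHit = matchesInOrder("RHDash", "RHUserDashboard");
const eventHit = matchesInOrder("RHEvent", "RHManageEventBase");
const eventMiss = matchesInOrder("RHEvent", "RHUserDashboard");
```

Both CamelHumps-style queries pass this filter for their intended class, so the server would return those symbols and leave scoring and ordering to the client.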

aniforprez · 2020-03-19T01:05:38Z

YES thank you!

ThiefMaster · 2020-03-19T13:11:36Z

Works fine now! :)

Is there any way to specify that you want to search for an exact name btw? For example, I have hundreds of classes named RHSomething, so it's almost impossible to jump to the class named just RH atm. I tried putting it in quotes or adding a regex-style $ at the end but that didn't help.

jrieken · 2020-03-19T13:15:31Z

Is there any way to specify that you want to search for an exact name btw

Type the exact name 🤓 - it filters and scores based on the input text but scoring should put better matches atop. So the closer the match, the more to the top things should be

ThiefMaster · 2020-03-19T13:19:41Z

I tried that. Doesn't work for this particular name. Is there a minimum length (looks like 3 or 4 chars) for this to work?

jrieken · 2020-03-19T13:25:13Z

Hm, should work with all lengths... Are you trying this with latest insiders? @bpasero is currently reworking this and it might be a regression or feature-request

bpasero · 2020-03-19T13:42:31Z

Fyi the new work is not yet enabled in insiders.

ThiefMaster · 2020-03-19T13:43:55Z

Yes, latest insiders. If someone wants to try to reproduce it, this is the codebase where it happens.

jakebailey · 2020-03-19T16:28:27Z

We have our own workspace symbol query limit internally (1000) to prevent sending back too many results. It's possible RH is so small that we're not getting to the exact match (or a perfect substring match) before finding a fuzzy-ish match and then it's not in the list. I'm not sure if the UI displays how many results it was given, though.

ThiefMaster · 2020-03-19T16:53:01Z

Any chance the limit could be made configurable? Or even better, returning exact (whole string) matches before applying the limit? Not finding some class that just contains a very vague search term is one thing, but not finding the class that has this exact name is quite confusing. I encountered the issue for another class as well (Event), and after finding many classes starting with Event, I get results for other symbol types that aren't even exact (at least in a case-sensitive way) matches.

jakebailey · 2020-03-19T17:02:07Z

I'd sooner just make it match exact, then substring, then fuzzy, then limit it (without configuration), but all of this is a pretty large nested structure in memory that we'd have to scan multiple times. Maybe that doesn't matter. I'll spend a bit of time on it later.
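One possible shape for the "exact, then substring, then fuzzy, then limit" ordering is sketched below. This is an assumption-laden illustration, not the language server's implementation: it works on a flat array of names rather than a nested index, and `rankSymbols` and `isSubsequence` are hypothetical names.

```typescript
// Hypothetical in-order character check (expects lowercased inputs).
function isSubsequence(q: string, w: string): boolean {
  let qi = 0;
  for (let wi = 0; wi < w.length && qi < q.length; wi++) {
    if (q[qi] === w[wi]) {
      qi++;
    }
  }
  return qi === q.length;
}

// Sort candidates into three buckets in a single pass, then
// concatenate the buckets by priority and apply the limit last.
function rankSymbols(query: string, names: string[], limit: number): string[] {
  const q = query.toLowerCase();
  const exact: string[] = [];
  const substring: string[] = [];
  const fuzzy: string[] = [];
  for (const name of names) {
    const n = name.toLowerCase();
    if (n === q) {
      exact.push(name);
    } else if (n.indexOf(q) !== -1) {
      substring.push(name);
    } else if (isSubsequence(q, n)) {
      fuzzy.push(name);
    }
  }
  return exact.concat(substring, fuzzy).slice(0, limit);
}

const sample = ["RHSomething", "RHOther", "RH", "Arch", "RightHand"];
const top2 = rankSymbols("RH", sample, 2); // ["RH", "RHSomething"]
```

Bucketing during one scan avoids walking the data several times, at the cost of holding every match in memory before the limit is applied.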

jakebailey · 2020-03-19T21:31:06Z

0.5.43+ has the aforementioned behavior. If needed, I can modify the substring check to be a prefix only, in case the exact/substring matches are so many that fuzzy stops being useful. Or, bump the limit if we think we can handle more than 1000 of these at a time.

jrieken · 2020-03-20T08:22:09Z

Please don't match on prefixes but use this logic: #697 (comment). Then returning only 1000 elements seems very conservative, 10000 should be no problem

jakebailey · 2020-03-20T15:50:12Z

I am using that method, it's what closed this issue. See #1950.

The follow-up PR just prioritized exact and substring matches before hitting the limit. If there's a limit at all, then there's a chance the exact match never shows up if we're only fuzzy matching.

I'll see if bumping the limit seems sane, but there are other users of the language server besides VS Code that I'd prefer not to break. Symbol values contain full paths, so it can turn into an incredible amount of data. If the user's workspace has a path with 50 characters and we send 10,000 symbols, that's at least half a megabyte of data just on path prefixes alone.
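The estimate above can be checked with a quick calculation. This sketch assumes one byte per path character (plain ASCII in UTF-8) and ignores everything else in the response; `pathPrefixBytes` is a hypothetical helper.

```typescript
// Hypothetical helper: bytes spent on the shared workspace path
// prefix alone, assuming one byte per character.
function pathPrefixBytes(prefixLength: number, symbolCount: number): number {
  return prefixLength * symbolCount;
}

const prefixBytes = pathPrefixBytes(50, 10000); // 500000
const prefixMegabytes = prefixBytes / 1e6;      // 0.5
```

The same 50-character prefix at the existing limit of 1000 symbols comes to 50,000 bytes, a tenth of that.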

jakebailey added the enhancement New feature or request label Mar 5, 2019

jahan01 mentioned this issue Oct 28, 2019

Add fuzzy search to search by symbol microsoft/vscode#33746

Closed

jakebailey mentioned this issue Mar 18, 2020

Add loose fuzzy matching for workspace symbol queries #1950

Merged

jakebailey closed this as completed in #1950 Mar 18, 2020

jakebailey mentioned this issue Mar 19, 2020

Prefer exact and substring matches over fuzzy matches #1957

Merged

jakebailey mentioned this issue Sep 14, 2020

Symbol search matching in middle of word, or with fuzzy matching microsoft/pylance-release#362

Closed

michaelmagistro mentioned this issue Mar 13, 2021

"Open symbol by name" not fuzzy matching fabiospampinato/vscode-todo-plus#309

Closed

edditler mentioned this issue Mar 23, 2021

Fuzzy search for symbols hansec/fortran-language-server#195

Open
