This repository has been archived by the owner on Apr 14, 2022. It is now read-only.

"Open symbol by name" not fuzzy matching #697

Closed
aniforprez opened this issue Mar 5, 2019 · 25 comments · Fixed by #1950
Labels
enhancement New feature or request

Comments

@aniforprez

aniforprez commented Mar 5, 2019

Using the "Open symbol by name" feature in VSCode, when fuzzy searching for a symbol with Microsoft Python Language Server enabled, the list of symbols returns absolutely nothing. In contrast, when using Jedi, I'm getting results.

With Jedi off:
[Screenshot: Python Language Server results]

With Jedi on:
[Screenshot: Jedi results]

Searching for the exact symbol name works just fine.

@jakebailey added the "enhancement" label Mar 5, 2019
@jakebailey
Member

Currently, the symbol index we've implemented uses case-insensitive substrings for matching, so this is "expected" but stands to be improved.

Note that this only applies to workspace symbols, which has a query string sent to the language server. Document symbols (and the outline) have no query, so VS Code is able to do the fuzzy matching itself over the result we return.
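
For illustration, a minimal sketch of what case-insensitive substring matching accepts and rejects. The helper `substringMatch` and the symbol names are hypothetical and are not taken from the language server's code; the sketch only shows why a query whose characters are not contiguous in the symbol name returns nothing.

```typescript
// Hypothetical helper: case-insensitive substring test, as described in the comment above.
function substringMatch(query: string, symbolName: string): boolean {
  return symbolName.toLowerCase().indexOf(query.toLowerCase()) !== -1;
}

// Made-up symbol names, for illustration only.
const symbols = ["RHUserDashboard", "RHManageEventBase", "user_dashboard"];

// A contiguous query matches regardless of case.
const hits = symbols.filter((s) => substringMatch("dash", s));

// A "fuzzy" query does not: "rhdash" is not a contiguous run in any name.
const fuzzyHits = symbols.filter((s) => substringMatch("RHDash", s));

console.log(hits, fuzzyHits);
```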

@aniforprez
Author

Hey any ETA on this? This is the only thing holding me back from using this extension and it's a pretty big oversight. Searching through symbols over a huge repository without fuzzy search is a huge PITA

@jakebailey
Member

No specific ETA, there are other things we're trying to improve and this hasn't been worked on. It's not an oversight per se, when I implemented the index I just did the minimum working implementation and knew fuzzy matching would be better.

To add this, we'll have to go look at the fuzzy logic the extension uses when looking through its ctags database (https://github.com/microsoft/vscode-python/blob/8f4a3070c3973ddfbb0841decfcb02cccf9f32af/src/client/workspaceSymbols/parser.ts#L155) and do something similar.

@jakebailey
Member

This isn't a real solution, but you can also use VS Code's builtin search tool, which uses ripgrep under the hood (and is very effective, but of course will lack any Python-specific info).

@aniforprez
Author

Ah, it's a shame this isn't a priority. I will wait for the day it is added, because the extension is quite good except for this one grievous oversight. I'm aware of the search tool, but it's quite unwieldy and doesn't fuzzy match.

@jahan01

jahan01 commented Oct 28, 2019

I was following this one microsoft/vscode#33746, but should have followed it here. +1 for this feature

@ThiefMaster

+1 on this: I'm very much used to this from PyCharm, so right now I basically have to choose between:

  • using PyCharm, and not being able to do proper remote development
  • using VS Code + Jedi, and not being able to autogenerate imports
  • using VS Code + PLS, and not being able to jump to symbols in a fuzzy way

@jrieken
Member

jrieken commented Mar 11, 2020

> index I just did the minimum working implementation and knew fuzzy matching would be better.

We recommend minimal work from extensions and strongly suggest simply searching for all query characters, in order, inside the "symbol strings". We'll then do the rest. Our implementation of that, for reference and copying:

export function isPatternInWord(patternLow: string, patternPos: number, patternLen: number, wordLow: string, wordPos: number, wordLen: number): boolean {
	while (patternPos < patternLen && wordPos < wordLen) {
		if (patternLow[patternPos] === wordLow[wordPos]) {
			patternPos += 1;
		}
		wordPos += 1;
	}
	return patternPos === patternLen; // pattern must be exhausted
}

https://github.com/microsoft/vscode/blob/21dc66054203ab742d36be9f0ef6ecb774ae62f2/src/vs/base/common/filters.ts#L506-L514
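
As a usage sketch, the function above can be wrapped so that a caller passes plain strings. The wrapper `matchesQuery` and the sample names are hypothetical; lower-casing both sides is an assumption based on the `patternLow` and `wordLow` parameter names. `isPatternInWord` is repeated here (with added comments) only so the example is self-contained.

```typescript
// Copied from the comment above (VS Code's filters.ts), with comments added.
function isPatternInWord(patternLow: string, patternPos: number, patternLen: number, wordLow: string, wordPos: number, wordLen: number): boolean {
  while (patternPos < patternLen && wordPos < wordLen) {
    if (patternLow[patternPos] === wordLow[wordPos]) {
      patternPos += 1; // consume one pattern character
    }
    wordPos += 1; // always advance in the word
  }
  return patternPos === patternLen; // pattern must be exhausted
}

// Hypothetical wrapper: lower-cases both strings before the in-order check.
function matchesQuery(query: string, symbolName: string): boolean {
  const q = query.toLowerCase();
  const w = symbolName.toLowerCase();
  return isPatternInWord(q, 0, q.length, w, 0, w.length);
}

// Made-up symbol names, for illustration only.
const names = ["RHUserDashboard", "RHManageEventBase", "EventDashboard"];
const kept = names.filter((n) => matchesQuery("RHDash", n));
console.log(kept);
```

Only names containing r, h, d, a, s, h in that order survive, and an empty query matches everything, which leaves ranking to the client.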

@aniforprez
Author

aniforprez commented Mar 11, 2020 via email

@jrieken
Member

jrieken commented Mar 11, 2020

Yeah, no fuzzy matching required, only very relaxed filtering (as described above). In fact, with small-ish result sets (<5000), no filtering is required at all.
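
One way to read this advice, as a hedged sketch: return everything when the symbol set is small, and apply the relaxed in-order filter otherwise. The function names and the threshold constant are hypothetical; 5000 is taken from the figure in the comment and is not a documented VS Code constant.

```typescript
// Hypothetical threshold, based on the "<5000" figure mentioned above.
const DEFAULT_NO_FILTER_BELOW = 5000;

// True if every character of `query` occurs in `word` in the same order.
function hasCharsInOrder(query: string, word: string): boolean {
  let qi = 0;
  for (let wi = 0; wi < word.length && qi < query.length; wi++) {
    if (query[qi] === word[wi]) {
      qi++;
    }
  }
  return qi === query.length;
}

// Hypothetical provider logic: skip filtering entirely for small symbol sets.
function candidates(query: string, allNames: string[], noFilterBelow: number = DEFAULT_NO_FILTER_BELOW): string[] {
  if (allNames.length < noFilterBelow) {
    return allNames; // small set: let the client filter and sort
  }
  const q = query.toLowerCase();
  return allNames.filter((n) => hasCharsInOrder(q, n.toLowerCase()));
}

const unfiltered = candidates("rhdash", ["RHUserDashboard", "Event"]);
// Threshold lowered to 1 here only to force the filtering branch.
const filtered = candidates("rhdash", ["RHUserDashboard", "Event"], 1);
console.log(unfiltered, filtered);
```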

@ThiefMaster

I think CamelHumps-style matching is very useful. For example, I have classes named RHUserDashboard and RHManageEventBase. Ideally I'd like to find the former when searching for #RHDash and the latter with #RHEvent.

@jrieken
Member

jrieken commented Mar 11, 2020

@ThiefMaster I do agree. What I have described is more relaxed, to ensure that VS Code receives as many results as possible, so that we can then filter and sort them, which also includes camel-hump logic.

@aniforprez
Author

YES thank you!

@ThiefMaster

Works fine now! :)

Is there any way to specify that you want to search for an exact name btw? For example, I have hundreds of classes named RHSomething, so it's almost impossible to jump to the class named just RH atm. I tried putting it in quotes or adding a regex-style $ at the end but that didn't help.

@jrieken
Member

jrieken commented Mar 19, 2020

> Is there any way to specify that you want to search for an exact name btw

Type the exact name 🤓. It filters and scores based on the input text, and scoring should put better matches at the top: the closer the match, the higher it should rank.

@ThiefMaster

I tried that. Doesn't work for this particular name. Is there a minimum length (looks like 3 or 4 chars) for this to work?

@jrieken
Member

jrieken commented Mar 19, 2020

Hm, it should work with all lengths... Are you trying this with the latest Insiders? @bpasero is currently reworking this, so it might be a regression or a feature request.

@bpasero
Member

bpasero commented Mar 19, 2020

Fyi the new work is not yet enabled in insiders.

@ThiefMaster

Yes, latest insiders. If someone wants to try to reproduce it, this is the codebase where it happens.

@jakebailey
Member

jakebailey commented Mar 19, 2020

We have our own workspace symbol query limit internally (1000) to prevent sending back too many results. It's possible RH is so small that we're not getting to the exact match (or a perfect substring match) before finding a fuzzy-ish match and then it's not in the list. I'm not sure if the UI displays how many results it was given, though.

@ThiefMaster

ThiefMaster commented Mar 19, 2020

Any chance the limit could be made configurable? Or even better, returning exact (whole string) matches before applying the limit? Not finding some class that just contains a very vague search term is one thing, but not finding the class that has this exact name is quite confusing. I encountered the issue for another class as well (Event), and after finding many classes starting with Event, I get results for other symbol types that aren't even exact (at least in a case-sensitive way) matches.

@jakebailey
Member

I'd sooner just make it match exact, then substring, then fuzzy, then limit it (without configuration), but all of this is a pretty large nested structure in memory that we'd have to scan multiple times. Maybe that doesn't matter. I'll spend a bit of time on it later.
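
The ordering described here (exact, then substring, then fuzzy, then limit) could look roughly like the sketch below. It is not the language server's implementation: the function names, the bucket approach, and the sample symbols are all assumptions for illustration. It classifies each name once and concatenates the buckets, which is one possible way to avoid scanning the index several times.

```typescript
// True if every character of `query` occurs in `word` in the same order.
function isSubsequence(query: string, word: string): boolean {
  let qi = 0;
  for (let wi = 0; wi < word.length && qi < query.length; wi++) {
    if (query[qi] === word[wi]) {
      qi++;
    }
  }
  return qi === query.length;
}

// 0 = exact, 1 = substring, 2 = characters in order, -1 = no match (all case-insensitive).
function tier(query: string, symbolName: string): number {
  const q = query.toLowerCase();
  const n = symbolName.toLowerCase();
  if (n === q) return 0;
  if (n.indexOf(q) !== -1) return 1;
  return isSubsequence(q, n) ? 2 : -1;
}

// Hypothetical ranking: bucket by tier in one pass, then apply the limit last.
function rankedSymbols(query: string, allNames: string[], limit: number): string[] {
  const exact: string[] = [];
  const substring: string[] = [];
  const fuzzy: string[] = [];
  for (const symbolName of allNames) {
    const t = tier(query, symbolName);
    if (t === 0) exact.push(symbolName);
    else if (t === 1) substring.push(symbolName);
    else if (t === 2) fuzzy.push(symbolName);
  }
  return exact.concat(substring, fuzzy).slice(0, limit);
}

// Made-up index order; the limit is 2 instead of 1000 to keep the example small.
const indexOrder = ["ResourceHandler", "RHUserDashboard", "RH"];
const naive = indexOrder.filter((n) => tier("RH", n) >= 0).slice(0, 2);
const ranked = rankedSymbols("RH", indexOrder, 2);
console.log(naive, ranked);
```

With the limit applied in index order, the exact match `RH` is cut off; with tiered ordering it comes first.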

@jakebailey
Member

jakebailey commented Mar 19, 2020

0.5.43+ has the aforementioned behavior. If needed, I can modify the substring check to be a prefix only, in case the exact/substring matches are so many that fuzzy stops being useful. Or, bump the limit if we think we can handle more than 1000 of these at a time.

@jrieken
Member

jrieken commented Mar 20, 2020

Please don't match on prefixes but use this logic: #697 (comment). Then returning only 1000 elements seems very conservative, 10000 should be no problem

@jakebailey
Member

jakebailey commented Mar 20, 2020

I am using that method, it's what closed this issue. See #1950.

The follow-up PR just prioritized exact and substring matches before hitting the limit. If there's a limit at all, then there's a chance the exact match never shows up if we're only fuzzy matching.

I'll see if bumping the limit seems sane, but the language server has other clients besides VS Code that I'd prefer not to break. Symbol values contain full paths, so the response can turn into an incredible amount of data. If the user's workspace has a path with 50 characters and we send 10,000 symbols, that's at least half a megabyte of data on path prefixes alone.
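
The half-megabyte estimate can be checked with a short calculation. The one-byte-per-character figure is an assumption (ASCII paths in a UTF-8 encoded response); the other two numbers come from the comment above.

```typescript
const pathPrefixChars = 50; // workspace path length from the comment
const symbolCount = 10000;  // proposed result limit from the comment
const bytesPerChar = 1;     // assumption: ASCII path characters encoded as UTF-8

// Bytes spent on the repeated path prefix alone, ignoring names, ranges, and JSON overhead.
const prefixBytes = pathPrefixChars * symbolCount * bytesPerChar;
const prefixMegabytes = prefixBytes / 1e6;

console.log(prefixBytes, prefixMegabytes);
```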


6 participants