"Open symbol by name" not fuzzy matching #697
Comments
Currently, the symbol index we've implemented uses case-insensitive substring matching, so this is "expected" but stands to be improved. Note that this only applies to workspace symbols, which have a query string sent to the language server. Document symbols (and the outline) have no query, so VS Code is able to do the fuzzy matching itself over the result we return. |
Hey any ETA on this? This is the only thing holding me back from using this extension and it's a pretty big oversight. Searching through symbols over a huge repository without fuzzy search is a huge PITA |
No specific ETA; there are other things we're trying to improve and this hasn't been worked on. It's not an oversight per se: when I implemented the index I just did the minimum working implementation and knew fuzzy matching would be better. To add this, we'll have to go look at the fuzzy logic the extension uses when looking through its ctags database (https://github.com/microsoft/vscode-python/blob/8f4a3070c3973ddfbb0841decfcb02cccf9f32af/src/client/workspaceSymbols/parser.ts#L155) and do something similar. |
This isn't a real solution, but you can also use VS Code's builtin search tool, which uses ripgrep under the hood (and is very effective, but of course will lack any Python-specific info). |
Ah, it's a shame this isn't a priority. I will wait for the day it is added because the extension is quite good except for this one grievous oversight. I'm aware of the search tool, but it's quite unwieldy and doesn't fuzzy match |
I was following this one microsoft/vscode#33746, but should have followed it here. +1 for this feature |
+1 on this: I'm very much used to this from PyCharm, so right now I basically have to choose between:
|
We recommend minimal work from extensions and strongly suggest simply searching for all query characters, in their order, inside "symbol strings". We'll then do the rest. Our implementation of that, for reference and copying:
export function isPatternInWord(patternLow: string, patternPos: number, patternLen: number, wordLow: string, wordPos: number, wordLen: number): boolean {
    while (patternPos < patternLen && wordPos < wordLen) {
        if (patternLow[patternPos] === wordLow[wordPos]) {
            patternPos += 1;
        }
        wordPos += 1;
    }
    return patternPos === patternLen; // pattern must be exhausted
} |
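To make the difference from the current substring behavior concrete, here is a minimal sketch that wraps the same in-order character check in a convenience function. The name `isSubsequence` and the sample symbol names are made up for illustration; neither comes from VS Code nor from the language server.

```typescript
// Hypothetical wrapper around the in-order character check quoted above.
// It lowercases both inputs, then tests whether every query character
// appears in the symbol name in the same order (not necessarily adjacent).
function isSubsequence(query: string, symbolName: string): boolean {
    const patternLow = query.toLowerCase();
    const wordLow = symbolName.toLowerCase();
    let patternPos = 0;
    let wordPos = 0;
    while (patternPos < patternLow.length && wordPos < wordLow.length) {
        if (patternLow[patternPos] === wordLow[wordPos]) {
            patternPos += 1;
        }
        wordPos += 1;
    }
    return patternPos === patternLow.length; // pattern must be exhausted
}

// Hypothetical symbol names, for illustration only.
const sampleSymbols = ["open_symbol_by_name", "SymbolIndex", "parse_config"];

// Relaxed, in-order filtering keeps "open_symbol_by_name" for the query "osbn".
const subsequenceHits = sampleSymbols.filter((s) => isSubsequence("osbn", s));

// A case-insensitive substring check finds nothing for the same query.
const substringHits = sampleSymbols.filter((s) => s.toLowerCase().indexOf("osbn") !== -1);
```

The point of keeping the server-side filter this loose is that the editor can then do its own scoring and sorting over whatever comes back.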
Ah, thanks Johannes. I was wondering why the extension had to do the fuzzy matching. Python Language Server team, I hope this issue can be addressed soon.
https://github.com/microsoft/vscode/blob/21dc66054203ab742d36be9f0ef6ecb774ae62f2/src/vs/base/common/filters.ts#L506-L514
|
Yeah, no fuzzy matching required - only very relaxed filtering (as described above). In fact, with smallish result sets (<5000), no filtering is required |
I think CamelHumps-style matching is very useful. For example, I have classes named |
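For readers unfamiliar with the term, here is a deliberately simplified sketch of CamelHumps-style matching. It only checks the query against the initials of each "hump"; real implementations (PyCharm's, or VS Code's scorer) are more permissive and also let the query spell out parts of a hump. The function names and the sample class name are hypothetical.

```typescript
// Collect the first letter of every "hump" in a symbol name. In this sketch
// a hump starts at the first character, after an underscore, or at any
// uppercase letter (so a run of capitals yields one initial per capital).
function humpInitials(symbolName: string): string {
    let initials = "";
    for (let i = 0; i < symbolName.length; i++) {
        const ch = symbolName[i];
        const isLetter = ch.toLowerCase() !== ch.toUpperCase();
        const isUpper = isLetter && ch !== ch.toLowerCase();
        const afterSeparator = i === 0 || symbolName[i - 1] === "_";
        if (isLetter && (afterSeparator || isUpper)) {
            initials += ch;
        }
    }
    return initials;
}

// Simplified rule (an assumption of this sketch): the query matches when it
// is a case-insensitive prefix of the hump initials.
function matchesCamelHumps(query: string, symbolName: string): boolean {
    const initialsLow = humpInitials(symbolName).toLowerCase();
    return initialsLow.indexOf(query.toLowerCase()) === 0;
}

// Hypothetical class name, for illustration only.
const humpExample = matchesCamelHumps("wsm", "WorkspaceSymbolManager");
```

Note that any query accepted by this rule is also accepted by the relaxed in-order character check, so a server that filters that loosely leaves the editor free to apply hump-aware scoring on top.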
@ThiefMaster I do agree - what I have described is more relaxed, to ensure that VS Code receives as many results as possible, so that we can then filter and sort them - which also includes camel-hump logic |
YES thank you! |
Works fine now! :) Is there any way to specify that you want to search for an exact name btw? For example, I have hundreds of classes named |
Type the exact name 🤓 - it filters and scores based on the input text but scoring should put better matches atop. So the closer the match, the more to the top things should be |
I tried that. Doesn't work for this particular name. Is there a minimum length (looks like 3 or 4 chars) for this to work? |
Hm, it should work with all lengths... Are you trying this with the latest insiders? @bpasero is currently reworking this, and it might be a regression or a feature request |
Fyi the new work is not yet enabled in insiders. |
Yes, latest insiders. If someone wants to try to reproduce it, this is the codebase where it happens. |
We have our own workspace symbol query limit internally (1000) to prevent sending back too many results. It's possible |
Any chance the limit could be made configurable? Or even better, returning exact (whole string) matches before applying the limit? Not finding some class that just contains a very vague search term is one thing, but not finding the class that has this exact name is quite confusing. I encountered the issue for another class as well ( |
I'd sooner just make it match exact, then substring, then fuzzy, then limit it (without configuration), but all of this is a pretty large nested structure in memory that we'd have to scan multiple times. Maybe that doesn't matter. I'll spend a bit of time on it later. |
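As a rough illustration of the "exact, then substring, then fuzzy, then limit" ordering, the sketch below buckets each candidate into a tier during a single pass and applies the limit only at the end. This is an assumption about one way to do it, not the language server's actual code; `matchTier`, `queryWorkspaceSymbols`, and the flat list of names are all hypothetical (the real index is a nested structure).

```typescript
// Tier 0 = exact match, 1 = substring match, 2 = fuzzy (in-order characters).
type MatchTier = 0 | 1 | 2;

// In-order character check, same idea as the VS Code snippet in this thread.
function fuzzyContains(queryLow: string, nameLow: string): boolean {
    let qi = 0;
    for (let ni = 0; ni < nameLow.length && qi < queryLow.length; ni++) {
        if (queryLow[qi] === nameLow[ni]) {
            qi++;
        }
    }
    return qi === queryLow.length;
}

// Classify one symbol name, or return undefined when it does not match at all.
function matchTier(query: string, symbolName: string): MatchTier | undefined {
    const q = query.toLowerCase();
    const n = symbolName.toLowerCase();
    if (n === q) return 0;
    if (n.indexOf(q) !== -1) return 1;
    if (fuzzyContains(q, n)) return 2;
    return undefined;
}

// Single pass over the candidates: bucket by tier, then concatenate the
// buckets in priority order and cut off at the limit.
function queryWorkspaceSymbols(query: string, names: string[], limit: number): string[] {
    const tiers: string[][] = [[], [], []];
    for (const symbolName of names) {
        const tier = matchTier(query, symbolName);
        if (tier !== undefined) {
            tiers[tier].push(symbolName);
        }
    }
    return tiers[0].concat(tiers[1], tiers[2]).slice(0, limit);
}
```

Bucketing in one pass avoids scanning the index once per tier, at the cost of holding all matches in memory before the limit is applied.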
0.5.43+ has the aforementioned behavior. If needed, I can modify the substring check to be a prefix only, in case the exact/substring matches are so many that fuzzy stops being useful. Or, bump the limit if we think we can handle more than 1000 of these at a time. |
Please don't match on prefixes but use this logic: #697 (comment). Then returning only 1000 elements seems very conservative, 10000 should be no problem |
I am using that method, it's what closed this issue. See #1950. The follow-up PR just prioritized exact and substring matches before hitting the limit. If there's a limit at all, then there's a chance the exact match never shows up if we're only fuzzy matching. I'll see if bumping the limit seems sane, but there are still other users for the language server than just VS Code I'd prefer not to break. Symbol values contain full paths, it can turn into an incredible amount of data. If the user's workspace has a path with 50 characters and we send 10,000 symbols, that's at least half a megabyte of data just on path prefixes alone. |
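The half-megabyte figure can be checked with a quick back-of-the-envelope calculation. The sketch assumes one byte per path character (ASCII paths encoded as UTF-8) and ignores symbol names, ranges, and JSON overhead, so it is a lower bound; the function name is hypothetical.

```typescript
// Lower-bound estimate of the bytes spent on the repeated workspace path
// prefix alone, assuming one byte per character.
function estimatePathPrefixBytes(prefixChars: number, symbolCount: number): number {
    return prefixChars * symbolCount;
}

const prefixBytes = estimatePathPrefixBytes(50, 10000); // 500000 bytes
const prefixMegabytes = prefixBytes / 1000000;          // 0.5 MB
```

At the current limit of 1000 symbols the same prefix costs about 50 KB, which is why a tenfold bump is not free for every client.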
Using the "Open symbol by name" feature in VSCode, when fuzzy searching for a symbol with Microsoft Python Language Server enabled, the list of symbols returns absolutely nothing. In contrast, when using Jedi, I'm getting results.
With Jedi off:
With Jedi on:
Searching for the symbol exactly works just fine