Symbol search matching in middle of word, or with fuzzy matching #362

rkpatel33 · 2020-09-12T01:13:32Z

In the old Python Language Server, the project-wide symbol search ("Go to Symbol in Workspace") would fuzzy match the middle of the word, e.g.:

However, Pylance only seems to match from the start of the symbol name, which makes it super limiting when you know roughly what you are looking for but not exactly:

Ideally it would match multiple words:

Or even better, fuzzy match as Sublime Text does (this image is from VSC, but would work in Sublime, which I miss a lot):

Not sure if the current behavior was intended or whether it's just an oversight.

erictraut · 2020-09-12T02:41:52Z

Thanks for the input. Pylance already applies fuzzy searching and allows for substrings in the middle, but I agree that it needs some tuning. It currently requires a pretty high "comparison score" for the item to be included in the list. It also takes into account the length of the string you have typed relative to the minimum Levenshtein distance between the typed string and the symbol name.

Here's the algorithm if you're interested in the details:

https://github.com/microsoft/pyright/blob/1b0f26713addcfab91e7f1d802f08cbd3a37bb9a/packages/pyright-internal/src/common/stringUtils.ts#L19

Let us know if you're aware of better algorithms or would like to suggest tweaks to the existing algorithm.
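
For readers who want a concrete picture of the idea described above, here is a minimal sketch of an edit-distance-based "comparison score" with a cutoff. It is not the code from the linked stringUtils.ts; the function names, the normalization by the longer string's length, and the 0.5 default cutoff are all assumptions made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character inserts, deletes and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep when equal)
            ))
        previous = current
    return previous[-1]


def comparison_score(typed: str, symbol: str) -> float:
    """Hypothetical normalization: 1.0 for identical strings, lower as edits grow."""
    typed, symbol = typed.lower(), symbol.lower()
    longest = max(len(typed), len(symbol))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(typed, symbol) / longest


def is_candidate(typed: str, symbol: str, cutoff: float = 0.5) -> bool:
    """Keep the symbol only when the score reaches the (assumed) cutoff."""
    return comparison_score(typed, symbol) >= cutoff
```

With a score like this, a short query typed against a much longer symbol name scores low even when the query is a clean substring, because every untyped character counts as an edit. That is one way a strict cutoff can hide matches in the middle of a word.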

jakebailey · 2020-09-14T18:06:19Z

In MPLS, I copied the algorithm for fuzzy-ish matching from VS Code at request: microsoft/python-language-server#697 (comment) microsoft/python-language-server#1950

erictraut · 2020-09-14T20:12:49Z

I'm not wedded to any particular fuzzy matching algorithm, so feel free to swap it out @jakebailey. Of course, whatever algorithm we choose needs to be fast because some operations can involve 10K+ of these comparisons.

ThiefMaster · 2020-09-15T11:28:43Z

This is incredibly important to have. The logic from the old language server and the logic from PyCharm are both great (with the latter being better IMO).

jakebailey · 2020-09-15T18:42:12Z

If we do copy that method, it'd probably be best to only use it for the document symbols search, I think. The algorithm recommended by the VS Code team is intended to be very loose, so that more symbols that could be similar are returned and then filtered again by VS Code with its own fuzzy matcher. Some other uses of the existing algorithm are a bit different in that they want to limit the number of matches more strictly (i.e. try not to show too many imports or spend too much time in other modules).

ThiefMaster · 2020-09-15T18:45:45Z

What's important IMO is to be able to search e.g. for EventMan or RHEvManBase and find a class named RHEventManagementBase, because in both cases I might not know the exact name but know enough to make a good fuzzy search. Or when searching for Mixin I would expect it to show me all symbols containing Mixin (in that capitalization) first.

Another case that worked really poorly with the old language server (but seems to be fine with Pylance - please don't break it when improving fuzzy search :)): When I searched for RH, I didn't always get the match for the class named just RH first. And with hundreds of RHSomething classes in my project, this made it pretty much impossible to find that class.
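
To make the ordering wishes above concrete, here is a rough sketch of a tiered ranking: exact name first, then same-capitalization substrings, then case-insensitive substrings, then letters in order with gaps. The tiers, the tie-breaking by name length, and all function names are hypothetical; neither Pylance nor VS Code is claimed to rank this way.

```python
from typing import Optional


def in_order(query: str, name: str) -> bool:
    """True when the characters of `query` occur in `name` in the same order."""
    remaining = iter(name)
    # `ch in remaining` consumes the iterator up to the first match,
    # so later characters can only match later positions.
    return all(ch in remaining for ch in query)


def tier(query: str, name: str) -> Optional[int]:
    """Lower is better; None means the name is not a match at all."""
    if name == query:
        return 0  # exact name
    if query in name:
        return 1  # contiguous, same capitalization
    if query.lower() in name.lower():
        return 2  # contiguous, ignoring case
    if in_order(query.lower(), name.lower()):
        return 3  # letters in order with gaps
    return None


def rank(query: str, names: list[str]) -> list[str]:
    """Sort matches by tier, then prefer shorter names, then alphabetical order."""
    scored = [(tier(query, n), len(n), n) for n in names]
    return [n for _, _, n in sorted(s for s in scored if s[0] is not None)]
```

Under these assumptions, a query of RH puts a class named exactly RH ahead of every RHSomething class, and a query of Mixin puts FooMixin ahead of a name that only contains "mixin" in lowercase.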

In terms of making results as useful as possible, I would also like the sorting logic to be smart. Currently VS Code's fuzzy filename search is a bit weird, see the screenshot below. I'm clearly looking for the last file in the results, but it shows me much worse matches first:

When searching for symbols a situation like this should ideally not happen :)

jakebailey · 2020-09-15T22:37:21Z

Yes, that's what the new algorithm would do; it's basically searching for the letters you typed, in that order, in any symbol's name, rather than computing an edit distance.
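
As an illustration of "the letters you typed, in that order", here is a minimal sketch of an in-order character match. It is not the VS Code or MPLS implementation; the function name and the choice to return the matched positions are assumptions for this example.

```python
from typing import Optional


def match_positions(typed: str, symbol: str) -> Optional[list[int]]:
    """Return the index in `symbol` of each typed character, or None if no match.

    Matching is case-insensitive and greedy: each typed character is matched
    at its first occurrence after the previous match.
    """
    lowered = symbol.lower()
    positions: list[int] = []
    start = 0
    for ch in typed.lower():
        index = lowered.find(ch, start)
        if index < 0:
            return None  # this character does not occur after the last match
        positions.append(index)
        start = index + 1
    return positions
```

The scan is linear in the length of the symbol name per comparison, which matters when a single query is checked against thousands of symbols.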

I'm surprised RH wouldn't have shown first in MPLS if it's an exact match or a substring; they should have been preferred. Once we return the symbols, VS Code pretty much does whatever it wants, though.

File searching is out of our control; that's entirely up to VS Code and we don't implement it. If you think the file search is weird, you probably need to file an issue on their tracker.

ThiefMaster · 2020-11-12T23:21:29Z

Any news on this?

In fact, "Pylance already applies fuzzy searching and allows for substrings in the middle" does not seem to be the case at all. While looking at the debug output for #603, I saw that a search for whitespace returns an empty result, even though there's e.g. a strip_whitespace function.

jakebailey · 2020-11-12T23:35:24Z

The code hasn't changed here yet, no (other things have taken priority), but we should be finding that result as it's a pure substring.

heejaechang · 2020-11-13T07:06:41Z

So the code is here. Currently, the cutoff is set to 0.5. We can either lower this cutoff or use what @jakebailey linked (the code from VS Code, isPatternInWord).

That being said, I think users will probably get a better experience if we move to isPatternInWord for any feature where we know the host (VS Code or VS) does its own filtering on top of what we return (e.g. workspace symbols, and completion including auto-import with a minimum pattern length).

Otherwise, keep the existing similarity check (e.g. for the add-import code action).

...

By the way, I believe these two are trying to solve slightly different problems. isPatternInWord tries to find the pattern, but not a misspelling. The similarity check tries to find a misspelling, but not the pattern.

for example,

def Helo1234():
    pass

def Hemo1234():
    pass

Hel  # <= completion requested here

For "Hel", we currently return both "Helo1234" and "Hemo1234" since "Hem" is similar to "Hel", but the user will never see the second one because the host filters it out (it doesn't match the pattern). The same goes for workspace symbols: results that don't contain the pattern will never be shown to users even if we return them.
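
The difference between the two checks can be sketched with the example above. This is only an illustration: `difflib.SequenceMatcher` stands in for the similarity check, and comparing against the symbol's prefix of the same length as the typed text is an assumption, as are the function names and the 0.5 default cutoff.

```python
from difflib import SequenceMatcher


def pattern_in_word(pattern: str, word: str) -> bool:
    """Pattern-style check: typed characters must appear in order in the word."""
    position = 0
    lowered = word.lower()
    for ch in pattern.lower():
        position = lowered.find(ch, position) + 1
        if position == 0:  # find() returned -1, so the character is missing
            return False
    return True


def looks_similar(typed: str, word: str, cutoff: float = 0.5) -> bool:
    """Similarity-style check against the word's prefix of the same length (assumed)."""
    prefix = word[: len(typed)].lower()
    return SequenceMatcher(None, typed.lower(), prefix).ratio() >= cutoff


for name in ("Helo1234", "Hemo1234"):
    print(name, pattern_in_word("Hel", name), looks_similar("Hel", name))
```

In this sketch both names pass the similarity-style check, but only Helo1234 passes the pattern-style check, which mirrors the situation where a server returns an item that the host then filters out.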

...

There is this option:
"editor.suggest.filterGraceful": false

but this one is more about character ordering ("Hel" vs "eHl"), not "Hel" vs "Hem".

jakebailey · 2020-11-13T18:31:47Z

Switching to the MPLS algorithm for symbol searching that was recommended by VSC is my preference; completions will use a different method. I meant to do it sooner but other things came up.

savannahostrowski · 2021-06-15T17:03:09Z

I think we forgot to close this when it was fixed in November. Please use the latest version of Pylance and drop a note here if you find any issues.

jakebailey · 2021-06-15T17:09:10Z

This should be different; this issue is for the workspace symbol search, and the previous change was made for completions.

jakebailey · 2021-06-15T17:30:37Z

My mistake; this was changed during a refactor and apparently I missed it.

starball5 · 2023-02-18T23:11:38Z

Possibly related VS Code issue tickets: microsoft/vscode#33746 and microsoft/vscode#156865.

Possibly related Stack Overflow Question: How can I search symbols with partial words in VS Code with IntelliSense?

github-actions bot added the triage label Sep 12, 2020

judej added the enhancement New feature or request label Sep 14, 2020

github-actions bot removed the triage label Sep 14, 2020

jakebailey self-assigned this Sep 15, 2020

heejaechang mentioned this issue Nov 13, 2020

Less aggressive filtering of code completions - class methods #608

Closed

heejaechang mentioned this issue Nov 13, 2020

Improve suggestion results microsoft/pyright#1168

Merged

savannahostrowski closed this as completed Jun 15, 2021

jakebailey reopened this Jun 15, 2021

jakebailey closed this as completed Jun 15, 2021
