Symbol search matching in middle of word, or with fuzzy matching #362

Closed
rkpatel33 opened this issue Sep 12, 2020 · 15 comments

@rkpatel33

In the old Python Language Server, the project-wide symbol search ("Go to symbol in workspace") would fuzzy match the middle of the word, e.g.:

image

However, Pylance only seems to match from the start of the symbol name, which makes it super limiting when you know roughly what you are looking for but not exactly:

image

Ideally it would match multiple words:

image

Or even better, fuzzy match as Sublime Text does (this image is from VSC, but the same query would work in Sublime, which I miss a lot):

image

Not sure if the current behavior was intended or whether it's just an oversight.

@erictraut
Contributor

Thanks for the input. Pylance already applies fuzzy searching and allows for substrings in the middle, but I agree that it needs some tuning. It currently requires a pretty high "comparison score" for the item to be included in the list. It also takes into account the length of the string you have typed relative to the minimum Levenshtein distance between the typed string and the symbol name.

Here's the algorithm if you're interested in the details:

https://github.com/microsoft/pyright/blob/1b0f26713addcfab91e7f1d802f08cbd3a37bb9a/packages/pyright-internal/src/common/stringUtils.ts#L19

Let us know if you're aware of better algorithms or would like to suggest tweaks to the existing algorithm.
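
For readers who want a feel for this kind of score, here is a minimal sketch in Python. It assumes the score is the share of typed characters that survive after subtracting the smallest Levenshtein distance between the typed string and any prefix of the symbol name. The names `levenshtein`, `comparison_score`, and `is_included`, the prefix-only comparison, and the normalization are all assumptions made for illustration; this is not a copy of the linked `stringUtils.ts` code.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def comparison_score(typed: str, symbol: str) -> float:
    """Hypothetical score in [0, 1]; 1.0 means the typed text is a prefix."""
    if not typed:
        return 0.0
    typed_l, symbol_l = typed.lower(), symbol.lower()
    # Assumption: compare against every prefix of the symbol, keep the best.
    best = min(levenshtein(symbol_l[:k], typed_l)
               for k in range(len(symbol_l) + 1))
    # The longer the typed text relative to the distance, the higher the score.
    return max(0.0, (len(typed_l) - best) / len(typed_l))


def is_included(typed: str, symbol: str, cutoff: float) -> bool:
    """Keep a symbol only if its score reaches the (hypothetical) cutoff."""
    return comparison_score(typed, symbol) >= cutoff
```

In this sketch, `comparison_score("Hel", "Helo1234")` is 1.0 and `comparison_score("Hem", "Helo1234")` is 2/3, so how many near-misses make the list depends entirely on the cutoff.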

@judej judej added the enhancement New feature or request label Sep 14, 2020
@github-actions github-actions bot removed the triage label Sep 14, 2020
@jakebailey
Member

In MPLS, I copied the fuzzy-ish matching algorithm from VS Code at their request: microsoft/python-language-server#697 (comment), microsoft/python-language-server#1950

@erictraut
Contributor

erictraut commented Sep 14, 2020

I'm not wedded to any particular fuzzy matching algorithm, so feel free to swap it out @jakebailey. Of course, whatever algorithm we choose needs to be fast because some operations can involve 10K+ of these comparisons.

@ThiefMaster

This is incredibly important to have. The logic from the old language server and the logic from PyCharm are both great (with the latter being better IMO).

@jakebailey jakebailey self-assigned this Sep 15, 2020
@jakebailey
Member

If we do copy that method, it'd probably be best to only use it for the document symbols search, I think. The algorithm recommended by the VS Code team is intentionally very loose, so that more potentially similar symbols are returned; VS Code then filters them again with its own fuzzy matcher. Some of the other uses of the existing algorithm are a bit different in that they want to limit the number of matches more strictly (i.e. try to not show too many imports or spend too much time in other modules).

@ThiefMaster

ThiefMaster commented Sep 15, 2020

What's important IMO is to be able to search e.g. for EventMan or RHEvManBase and find a class named RHEventManagementBase, because in both cases I might not know the exact name but know enough to make a good fuzzy search. Or when searching for Mixin I would expect it to show me all symbols containing Mixin (in that capitalization) first.

Another case that worked really poorly with the old language server (but seems to be fine with pylance - please don't break it when improving fuzzy search :)): When I searched for RH, I didn't always get the match for the class named just RH first. And with hundreds of RHSomething classes in my project, this made it pretty much impossible to find that class.
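
The ordering wishes above (exact name first, then names containing the query in the same capitalization, then everything else) could be expressed as a sort key. This is only a sketch of the requested behavior; `rank_key` and `rank` are hypothetical names, the tiers and tie-breakers are assumptions, and in practice the editor may reorder whatever the language server returns.

```python
def rank_key(query: str, name: str) -> tuple:
    """Lower tuples sort first: tier, then shorter names, then alphabetical."""
    if name == query:
        tier = 0  # exact match, e.g. the class named just "RH"
    elif query in name:
        tier = 1  # contains the query in the same capitalization
    elif query.lower() in name.lower():
        tier = 2  # contains the query, ignoring case
    else:
        tier = 3  # anything else the matcher let through
    return (tier, len(name), name)


def rank(query: str, names: list) -> list:
    """Return names ordered from best to worst match for query."""
    return sorted(names, key=lambda n: rank_key(query, n))


print(rank("RH", ["RHSomething", "rhino", "RHEventManagementBase", "RH"]))
# ['RH', 'RHSomething', 'RHEventManagementBase', 'rhino']
```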

In terms of making results as useful as possible, I would also like the sorting logic to be smart. Currently vscode's fuzzy filename search is a bit weird, see the screenshot below. I'm clearly looking for the last file in the results, but it shows me much worse matches first:

image

When searching for symbols a situation like this should ideally not happen :)

@jakebailey
Member

Yes, that's what the new algorithm would do; it's basically searching for the letters you typed in that order in any symbol's name, as compared to an edit distance.
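
To illustrate "the letters you typed, in that order", here is a minimal sketch of an in-order (subsequence) check. The function name `matches_in_order` and the case-insensitive comparison are assumptions for illustration; this is not the VS Code or MPLS implementation.

```python
def matches_in_order(pattern: str, name: str) -> bool:
    """True if every character of pattern appears in name, in the same order.

    The comparison is case-insensitive and characters may be skipped in name.
    """
    remaining = iter(name.lower())
    # `ch in remaining` advances the iterator, so each later character must
    # be found after the position where the previous one matched.
    return all(ch in remaining for ch in pattern.lower())


print(matches_in_order("EventMan", "RHEventManagementBase"))     # True
print(matches_in_order("RHEvManBase", "RHEventManagementBase"))  # True
print(matches_in_order("whitespace", "strip_whitespace"))        # True
print(matches_in_order("Hel", "Hemo1234"))                       # False
```

Unlike an edit-distance score, a check like this never accepts a misspelling: a single typed character that is missing from the name rejects the match.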

I'm surprised RH wouldn't have shown first in MPLS if it's an exact match or a substring; they should have been preferred. Once we return the symbols, VS Code pretty much does whatever it wants, though.

File searching is out of our control, that's entirely up to VS Code and we don't implement it. If you think the file search is weird, you probably need to file an issue on their tracker.

@ThiefMaster

Any news on this?

In fact, "Pylance already applies fuzzy searching and allows for substrings in the middle" does not seem to be the case at all. While looking at the debug output for #603, I noticed that a search for whitespace returns an empty result, even though there's e.g. a strip_whitespace function.

@jakebailey
Member

The code hasn't changed here yet, no (other things have taken priority), but we should be finding that result as it's a pure substring.

@heejaechang
Contributor

heejaechang commented Nov 13, 2020

so code is here. currently, the cutoff is set to 0.5. we can either lower this cutoff or use what @jakebailey linked (code from vscode: isPatternInWord).

that being said, I think users will probably get a better experience if we move to "isPatternInWord" for any feature where we know the host (vscode or vs) does its own filtering on top of what we return (e.g. workspace symbols, completion including auto-import with min pattern length).

otherwise, keep existing similarity check (add import code action)

...

by the way, I believe these 2 are trying to solve slightly different problems. isPatternInWord tries to find the pattern, but not a misspelling. the "similarity check" tries to find a misspelling, but not the pattern.

for example,

def Helo1234():
    pass

def Hemo1234():
    pass

Hel  # <= completion here

for "Hel", we currently return both "Helo1234" and "Hemo1234" since "Hem" is similar to "Hel", but the user will never see the latter since the host will filter it out (it doesn't match the pattern). the same goes for workspace symbols: results that don't contain the pattern will never be shown to users even if we return them.
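
The two-stage effect described here can be simulated in a few lines. This is a rough sketch under stated assumptions: `looks_similar` is a stand-in "similarity check" built on Python's `difflib.SequenceMatcher` (not the Pylance scoring code), `contains_pattern` is a stand-in for the host's pattern filter (not VS Code's `isPatternInWord`), and the 0.5 cutoff is reused only because it is the value mentioned above.

```python
from difflib import SequenceMatcher


def looks_similar(typed: str, name: str, cutoff: float = 0.5) -> bool:
    """Stand-in server check: tolerate misspellings near the start of name."""
    head = name[: len(typed)].lower()
    return SequenceMatcher(None, typed.lower(), head).ratio() >= cutoff


def contains_pattern(typed: str, name: str) -> bool:
    """Stand-in host filter: typed characters must appear in name, in order."""
    remaining = iter(name.lower())
    return all(ch in remaining for ch in typed.lower())


symbols = ["Helo1234", "Hemo1234"]

# Stage 1: the language server returns anything that looks similar to "Hel".
server_results = [s for s in symbols if looks_similar("Hel", s)]

# Stage 2: the host drops results that do not contain the typed pattern.
host_results = [s for s in server_results if contains_pattern("Hel", s)]

print(server_results)  # ['Helo1234', 'Hemo1234']
print(host_results)    # ['Helo1234']
```

The extra candidate from stage 1 costs work on the server but never reaches the user, which is the argument for matching the host's pattern semantics in the first place.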

...
There is this option
"editor.suggest.filterGraceful": false

image

but this one is more for char ordering "Hel" vs "eHl" not for "Hel" vs "Hem"

@jakebailey
Member

Switching to the MPLS algorithm for symbol searching that was recommended by VSC is my preference; completions will use a different method. I meant to do it sooner but other things came up.

@savannahostrowski
Contributor

I think we forgot to close this when it was fixed in November. Please use the latest version of Pylance and drop a note here if you find any issues.

@jakebailey
Member

jakebailey commented Jun 15, 2021

This should be different; this issue is for the workspace symbol search, and the previous change was made for completions.

@jakebailey jakebailey reopened this Jun 15, 2021
@jakebailey
Member

My mistake; this was changed during a refactor and apparently I missed it.

image

@starball5

starball5 commented Feb 18, 2023

Possibly related VS Code issue tickets: microsoft/vscode#33746 and microsoft/vscode#156865.

Possibly related Stack Overflow Question: How can I search symbols with partial words in VS Code with IntelliSense?

8 participants