Symbol search matching in middle of word, or with fuzzy matching #362
Comments
Thanks for the input. Pylance already applies fuzzy searching and allows for substrings in the middle, but I agree that it needs some tuning. It currently requires a pretty high "comparison score" for the item to be included in the list. It also takes into account the length of the string you have typed relative to the minimum Levenshtein distance between the typed string and the symbol name. Here's the algorithm if you're interested in the details: Let us know if you're aware of better algorithms or would like to suggest tweaks to the existing algorithm. |
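For readers following along, here is a minimal sketch of what a Levenshtein-based "comparison score" with a cutoff can look like. It is not Pylance's actual code: the function names, the normalization by the longer string's length, and the `CUTOFF` value are all assumptions made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the rows as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # drop a character of `a`
                current[j - 1] + 1,      # drop a character of `b`
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def similarity(typed: str, symbol: str) -> float:
    """Hypothetical normalized score in [0, 1]; 1.0 means identical."""
    longest = max(len(typed), len(symbol))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(typed.lower(), symbol.lower()) / longest


CUTOFF = 0.5  # assumed threshold, for illustration only


def passes(typed: str, symbol: str) -> bool:
    """True if the symbol would be kept under the assumed cutoff."""
    return similarity(typed, symbol) >= CUTOFF
```

Under this kind of scoring, a short query that is an exact substring of a long name still scores poorly, because every extra character in the name counts as an edit. For instance, `passes("reader", "csv_reader_factory")` is false here, which is one way a score-based filter can hide mid-word matches.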
In MPLS, I copied the fuzzy-ish matching algorithm from VS Code at their request: microsoft/python-language-server#697 (comment), microsoft/python-language-server#1950 |
I'm not wedded to any particular fuzzy matching algorithm, so feel free to swap it out @jakebailey. Of course, whatever algorithm we choose needs to be fast because some operations can involve 10K+ of these comparisons. |
This is incredibly important to have. The logic from the old language server and from PyCharm are both great (with the latter being better IMO). |
If we do copy that method, it'd probably be best to only use it for the document symbols search, I think. The loose algorithm recommended by the VS Code team is intended to be very loose, so that more potentially similar symbols are returned and then filtered again by VS Code with its own fuzzy matcher. Some of the other uses of the existing algorithm are a bit different in that they want to limit the number of matches more strictly (i.e. try to not show too many imports or spend too much time in other modules). |
What's important IMO is to be able to search e.g. for Another case that worked really poorly with the old language server (but seems to be fine with Pylance - please don't break it when improving fuzzy search :)): When I searched for In terms of making results as useful as possible, I would also like the sorting logic to be smart. Currently VS Code's fuzzy filename search is a bit weird, see the screenshot below. I'm clearly looking for the last file in the results, but it shows me much worse matches first: When searching for symbols a situation like this should ideally not happen :) |
Yes, that's what the new algorithm would do; it's basically searching for the letters you typed in that order in any symbol's name, as compared to an edit distance. I'm surprised File searching is out of our control, that's entirely up to VS Code and we don't implement it. If you think the file search is weird, you probably need to file an issue on their tracker. |
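To make "searching for the letters you typed in that order" concrete, here is a small sketch of an in-order (subsequence) match. The name `matches_in_order` and the case-insensitive comparison are assumptions for illustration; this is not the VS Code or Pylance implementation.

```python
def matches_in_order(pattern: str, name: str) -> bool:
    """Return True if every character of `pattern` occurs in `name`, in order.

    Characters do not have to be adjacent, and the match may start anywhere
    in the name. Comparison is case-insensitive (an assumption).
    """
    pattern, name = pattern.lower(), name.lower()
    pos = 0
    for ch in pattern:
        pos = name.find(ch, pos)
        if pos == -1:
            return False  # this character does not appear after the previous match
        pos += 1          # continue searching after the matched character
    return True
```

With this approach a query such as `rdr` matches `csv_reader_factory`, while a query whose letters are out of order does not match. Unlike an edit-distance score, the length of the symbol name does not count against it.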
Any news on this? In fact, "Pylance already applies fuzzy searching and allows for substrings in the middle" does not seem to be the case at all. While looking at the debug output for #603 a search for |
The code hasn't changed here yet, no (other things have taken priority), but we should be finding that result as it's a pure substring. |
The code is here. Currently, the cutoff is set to 0.5. We can either lower this cutoff or use what @jakebailey linked (the isPatternInWord code from VS Code). That being said, I think users will probably get a better experience if we move to isPatternInWord for any feature where we know the host (VS Code or VS) does its own filtering on top of what we return (e.g. workspace symbols, and completion including auto-import with a minimum pattern length), and otherwise keep the existing similarity check (e.g. the add-import code action). By the way, I believe these two are trying to solve slightly different problems. isPatternInWord tries to find the pattern, but not a misspelling, whereas the similarity check tries to find a misspelling, but not the pattern. For example:
def Helo1234():
    pass
def Hemo1234():
    pass
Hel <= completion here
For "Hel", we currently return both "Helo1234" and "Hemo1234" since "Hem" is similar to "Hel", but the user will never see the latter since the host will filter it out (it doesn't match the pattern). The same goes for workspace symbols: results that don't contain the pattern will never be shown to users even if we return them. ... but this one is more for char ordering "Hel" vs "eHl" not for "Hel" vs "Hem" |
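The two-stage behavior described above (the server returns loosely similar names, then the host filters by pattern) can be sketched roughly as follows. Everything here is an assumption for illustration: `difflib.SequenceMatcher` is a stand-in for the real similarity check, comparing against an equally long prefix of the name is a guess at how "Hem" comes to be considered similar to "Hel", and `contains_in_order` only approximates host-side filtering.

```python
from difflib import SequenceMatcher

SYMBOLS = ["Helo1234", "Hemo1234"]
CUTOFF = 0.5  # cutoff value mentioned in the thread


def similar_prefix(typed: str, name: str, cutoff: float = CUTOFF) -> bool:
    """Stand-in similarity check (assumption): compare the typed text
    with the equally long prefix of the symbol name."""
    prefix = name[: len(typed)]
    ratio = SequenceMatcher(None, typed.lower(), prefix.lower()).ratio()
    return ratio >= cutoff


def contains_in_order(pattern: str, name: str) -> bool:
    """Approximation of host-side filtering: pattern characters must
    appear in the name in the same order."""
    remaining = iter(name.lower())
    # `ch in remaining` advances the iterator, so order is enforced.
    return all(ch in remaining for ch in pattern.lower())


# Stage 1: what the language server might return for "Hel".
server_results = [s for s in SYMBOLS if similar_prefix("Hel", s)]

# Stage 2: what the host would still show after its own filtering.
shown = [s for s in server_results if contains_in_order("Hel", s)]
```

In this sketch `Hemo1234` survives the first stage but is dropped in the second, which is the wasted work the comment points out for host-filtered features.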
Switching to the MPLS algorithm for symbol searching that was recommended by VSC is my preference; completions will use a different method. I meant to do it sooner but other things came up. |
I think we forgot to close this when it was fixed in November. Please use the latest version of Pylance and drop a note here if you find any issues. |
This should be different; this issue is for the workspace symbol search, and the previous change was made for completions. |
Possibly related VS Code issue tickets: microsoft/vscode#33746 and microsoft/vscode#156865. Possibly related Stack Overflow Question: How can I search symbols with partial words in VS Code with IntelliSense? |
In the old Python Language Server, the project-wide symbol search (Go to symbol in workspace) would fuzzy match the middle of the word, e.g.:
However, Pylance only seems to match from the start of the symbol name, which is very limiting when you know roughly what you are looking for but not exactly:
Ideally it would match multiple words:
Or even better, fuzzy match as Sublime Text does (this image is from VSC, but would work in Sublime, which I miss a lot):
Not sure if the current behavior was intended or whether it's just an oversight.