Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve suggestion results #1168

Merged
merged 13 commits into from
Nov 19, 2020
Merged

Conversation

cdce8p
Copy link
Contributor

@cdce8p cdce8p commented Nov 13, 2020

A first attempt to improve the suggestion list. See: microsoft/pylance-release#608

Instead of the Levenshtein Distance I though that Longest Common Subsequence might work better. The only drawback being that that user must only enter chars that appear in the symbol he/she is looking for and they must be in the correct order.

An example result would look like this:
Screen Shot 2020-11-13 at 15 18 38

Please let me know if that's something worth exploring further.

--
Algorithm used: Similar to isPatternInWord from VS Code

@ghost
Copy link

ghost commented Nov 13, 2020

CLA assistant check
All CLA requirements met.

@erictraut
Copy link
Collaborator

We've received feedback from many other pyright/pylance users that they don't want to be restricted to only characters that appear within the symbol, so I think we'd receive a significant backlash if we went with this approach. The approach needs to accommodate small typos (different order, different capitalization, small deviations in characters).

@cdce8p
Copy link
Contributor Author

cdce8p commented Nov 13, 2020

Capitalization is already taken care of, since it compares all lower case strings.
Regarding the other points: Maybe a combined approach would work. I'll think about it.

@heejaechang
Copy link
Collaborator

heejaechang commented Nov 13, 2020

returning result that doesn't contain given pattern will be filtered out by vscode so regardless whether LS returns them user will never be able to see those?

you can try that by enabling LSP logging and see what shows up in completion. (microsoft/pylance-release#362 (comment))

by the way, I think we don't need to do anything more than what vscode recommended (https://github.com/microsoft/vscode/blob/21dc66054203ab742d36be9f0ef6ecb774ae62f2/src/vs/base/common/filters.ts#L506-L514) for vscode since it does its own filter over data returned.

not sure about other editors that support LSP such as atom, vim-lsp and etc. for those, we might use different algorithm?

@cdce8p
Copy link
Contributor Author

cdce8p commented Nov 14, 2020

I tried a few things and basically observed what @heejaechang mentioned already. VSC does exact pattern matching for the typed string. Setting "editor.suggest.filterGraceful": true (the default) allows for probably one permutation but that's it. Other results are filtered although Pyright provided them (using Levenshtein Distance). Additionally if the user provided chars that don't appear in the symbol, it is also filtered.

Having said thar, I don't see much value in sticking with the current approach. Then again it's not my decision to make.
We could also use a combined method which I suggested previously. However this would probably come with at least a small performance hit, depending on the project size.

@erictraut
Copy link
Collaborator

Yeah, if that's the case, then I agree that sticking with the current approach doesn't provide much value.

@cdce8p
Copy link
Contributor Author

cdce8p commented Nov 14, 2020

I've change the algorithm to the one @heejaechang suggested and VS Code uses: isPatternInWord (Link). Before that I also did some tests with a recursive one to allow for permutations. I'll include it below, however I'm not sure if it's worth it.

export function computeCompletionSimilarity(typedValue: string, symbolName: string): number {
    if (recurseSimilarity(typedValue.toLocaleLowerCase(), symbolName.toLocaleLowerCase())) {
        return 1;
    }
    return 0;
}

function recurseSimilarity(typedLower: string, symbolLower: string): boolean {
    if (typedLower.length === 0) {
        return true;
    }
    const index = symbolLower.indexOf(typedLower[0]);
    if (index === -1) {
        return false;
    }
    if (typedLower.length === 1) {
        return true;
    } else if (typedLower.length === 2) {
        symbolLower = symbolLower.slice(index + 1);
        return recurseSimilarity(typedLower.slice(1), symbolLower);
    } else {
        symbolLower = symbolLower.slice(index + 1);
        return (
            recurseSimilarity(typedLower.slice(1), symbolLower) ||
            recurseSimilarity(typedLower[2] + typedLower[1] + typedLower.slice(3), symbolLower)
        );
    }
}

@cdce8p cdce8p changed the title WIP: Improve suggestion results Improve suggestion results Nov 15, 2020
@heejaechang
Copy link
Collaborator

I agree that we don't do permutation for now. for me, code LGTM, but I will leave it to @erictraut to decide.

* Leave computeCompletionSimilarity in place
* Moved code to isPatternInSymbol
* Change function signature to return boolean
* Replaced references
* Moved tests for computeCompletionSimilarity and
  isPatternInWord to existing stringUtils.test.ts
* Re-added leven package
@cdce8p
Copy link
Contributor Author

cdce8p commented Nov 16, 2020

I've pushed a new commit with all requested changes, see commit description.
Should I remove all references to similarityLimit, now that it isn't used anymore?

@jakebailey
Copy link
Member

similarityLimit appears to be a local in a bunch of modules, I think, so would be safe to remove if truly not used. It's the exported code that we wouldn't want to remove (given we do reference things from pylance).

@cdce8p
Copy link
Contributor Author

cdce8p commented Nov 16, 2020

@jakebailey Should it remain in function signatures of exported classes?

@jakebailey
Copy link
Member

Oh, my mistake. I see, it's a parameter. I would just leave it and we can perform the cleanup later as it's a part of the API.

@heejaechang
Copy link
Collaborator

it looks good to me? if there is no objection, I can accept the PR.

@heejaechang
Copy link
Collaborator

@cdce8p we will take the PR soon. but probably for the next release (Pylance) due to some change we need to make for add import code actions.

@cdce8p
Copy link
Contributor Author

cdce8p commented Nov 17, 2020

@heejaechang Thanks for letting me know. Looking forward to using it when it's released!

@heejaechang heejaechang merged commit a9d2528 into microsoft:master Nov 19, 2020
@cdce8p cdce8p deleted the fuzzy-matching branch November 19, 2020 06:15
@jakebailey
Copy link
Member

Thanks for this; it's not likely to come out in Pylance next week (holiday week), but should make it afterward. 🙂

heejaechang added a commit to heejaechang/pyright that referenced this pull request Nov 3, 2021
* quick and dirty proof of concept impl

* fixed various issue

* tweaks

* adding 1 partial stub file improved ds-1 analysis time about 50%

* fix test failure

* made pylance fs to handle existing stub case and py.typed cases.

* put virtual prefix for virtual file in logging.

* added sanity pylance file system and import resolver tests

* removed createFileSystem and moved PylanceFileSystem to pyright

* moved import resolver tests to pyright

* revert back to private

* addressed PR feedbacks

* addressed PR feedback

* fixed file header comments

* revert back the change

* initial partial stub package working

* some clean ups

* don't generate new string unnecessarily

* added test for isVirtual
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants