Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add fuzzy matching algorithm for tab completions #1855

Open
wants to merge 27 commits into
base: develop
Choose a base branch
from

Conversation

Snehil-Shah
Copy link
Contributor

@Snehil-Shah Snehil-Shah commented Mar 12, 2024

Resolves #1845

Description

What is the purpose of this pull request?

This pull request:

  • adds fuzzy matching algorithm for tab completions
  • write a custom completer engine with navigation and highlighted completions

The algorithm revolves around scoring each completion based on the number of matching characters, and the distance between the matching characters in the completion & input.
It easily catches minor spelling mistakes and matches 'similar looking' prefixes (at least from how much I have tested).

Complexity (assuming startsWith takes linear time):

Before: O(n) per completion, where n = length of input string.
After: O(m+n) per completion, where n = length of input string & m = length of completion

Update: The algorithm is now updated.

Related Issues

Does this pull request have any related issues?

This pull request:

Questions

Any questions for reviewers of this pull request?

right now fuzzy match will only work for completing expressions and not for incomplete workspaces, file system, tutorial and require statements, as for some reason they are not using the abstracted filterByPrefix method. Instead they are reimplementing it (see image).

complete_workspace.js

Would do the refactoring once the algorithm is finalized, to fix this!

Update: fixed!

Other

Any other information relevant to this pull request? This may include screenshots, references, and/or implementation notes.

I tried other popular fuzzy auto-completion algorithms too (not many) but completions didn't feel as relevant as I expected from a code completer.
The Levenshtein algorithm, for example, favors suggestions based on how 'similar looking' two strings are, meaning two strings with a higher difference in length would be scored worse (as it requires more insertions), which is not ideal for code completion.

What I haven't done much yet, is explore existing 'code completions' algorithms in IDEs/Editors and see how they are implementing them. Please tell me if I should do that.

The current implementation does work really well though based on the limited cases I've tested it through.

Update: Updated the algorithm!

And AFAIK, node.js and ipython don't have fuzzy auto-completion yet. (correct me if I'm wrong)

Checklist

Please ensure the following tasks are completed before submitting this pull request.


@stdlib-js/reviewers

Signed-off-by: Snehil Shah <130062020+Snehil-Shah@users.noreply.github.com>
@kgryte kgryte added Needs Review A pull request which needs code review. REPL Issue or pull request specific to the project REPL. labels Mar 15, 2024
Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>
@Snehil-Shah
Copy link
Contributor Author

@kgryte Regarding visually highlighting the matching characters in the completions (ys -> yes).

I am a bit confused about how we can achieve this as we are relying on built-in readline module's completer for tab completions. I don't think we can alter on how the completer displays the completions.

One of the (rather hacky) ways I tried doing this is by including the bold outputs in the resulting array itself that is returned by the completer callback.
This didn't go well with readline completer's feature where say I type common and the possible completions are commonKeys and commonKeysIn, If I click TAB, instead of showing both the completions, it auto-completes it to commonKeys as its the longest common substring of all completions.
So in our case, if the first letter (say c) of input is common in all completions, {ascii-code}c{ascii-code} (our bold output) becomes the longest common substring of all completions causing completer to just auto-complete it instead of displaying all possible completions.

Is there any other way I can go around implementing this??

And regarding the algorithm, afaik this isn't based on any prior implementation/art. It's a simple algorithm that I wrote from scratch that matches based on how much the characters in the completion deviate from the input string (distance).
The magic 0.25 is a weight I have iteratively reached at, and 0.7 is the threshold score above which matches are permissible.

@kgryte
Copy link
Member

kgryte commented Mar 24, 2024

Hmm...yeah, I hadn't thought about the completer controlling the terminal output. My initial thought was escape sequences, but then you are right, I can see how that might be an issue with auto-completion.

Well...one possibility is to just to write our own completer. We don't have to use the built-in. Could write it from scratch. May be worth exploring the Node.js source implementation to see how it handles the completer logic.

@kgryte
Copy link
Member

kgryte commented Mar 24, 2024

For IPython, looks like they use jedi for auto-completion. In Node.js, fuzzy-matching is not supported.

@Snehil-Shah
Copy link
Contributor Author

For IPython, looks like they use jedi for auto-completion. In Node.js, fuzzy-matching is not supported.

Fuzzy matching is not supported in IPython either from what I know. Am I mistaken?

@kgryte
Copy link
Member

kgryte commented Mar 24, 2024

@kgryte
Copy link
Member

kgryte commented Mar 24, 2024

I haven't personally used jedi in IPython, so YMMV.

@Snehil-Shah
Copy link
Contributor Author

@kgryte understood! I'll explore writing our own completer, and studying jedi's fuzzy matching algorithm👍

@kgryte
Copy link
Member

kgryte commented Mar 24, 2024

Don't stray too far down the rabbit hole! It may be vast. 😅

@kgryte
Copy link
Member

kgryte commented Apr 2, 2024

right now fuzzy match will only work for completing expressions and not for incomplete workspaces, file system, tutorial and require statements, as for some reason they are not using the abstracted filterByPrefix method.

This was intentional, as matching is "inlined" due to additional loop logic (e.g., complete_require.js). It should, however, be possible to swap out startsWith with fuzzyMatch, with the latter being a separate module, rather than a private function.

We'll want fuzzy matching to be enabled via a setting. This is another thing which users may have preferences over.

And as a first pass, we can merge basic functionality in first and then do character highlighting in a follow-up PR, as that is likely to require yet more refactoring and is not strictly necessary in order to get the benefits of fuzzy matching.

@kgryte
Copy link
Member

kgryte commented Apr 2, 2024

In the JSDoc comment of fuzzyMatch, it would be good to explain the algorithm/approach a bit and include example values for each stage of the algorithm. It can be a bit hard to grok what you are doing just by looking at for loops.

@Snehil-Shah
Copy link
Contributor Author

@kgryte I am almost done with the character highlighting part though, can I push the changes and we can iterate over it after?

@kgryte
Copy link
Member

kgryte commented Apr 2, 2024

@Snehil-Shah Sure. In general, going forward, it would be good to split tasks which can be split into separate PRs, as any additional complexity will increase the risk of PRs getting bogged down in review and refactoring.

Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>
Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>
Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>
Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>
@Snehil-Shah Snehil-Shah marked this pull request as ready for review April 2, 2024 13:07
Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>
@Snehil-Shah Snehil-Shah requested a review from kgryte April 5, 2024 16:30
@Snehil-Shah
Copy link
Contributor Author

@kgryte Done with the implementation.
Fuzzy completions is now controllable via settings, and also added a max height to the completions output, as, in cases where the output was taller than the terminal (causing scrolling), moveCursor wasn't working as expected.

Current behaviours of TAB completions:

  • going up to the line, brings back the original line and going to any completion automatically updates the line.
undo.mp4
  • Hitting TAB is like a toggle button to bring up and hide the completions panel.

  • I tried separating the exact completions from the fuzzy ones and IMO they look worse:

image
double

It was better before, and now that we have a setting to disable it, in my opinion, we can normally display them just after the exact completions as it doesn't hurt to have more completions that the user might be potentially going for, given the exact ones are up top already. And if the user doesn't want it, they can disable it via settings.
But if you have a differing opinion, we can stick to your recommendation of using exact prefixes by default and when nothing matches, showing fuzzy completions.👍

@kgryte
Copy link
Member

kgryte commented Apr 5, 2024

Re: line separation. I don't think I agree here. By mixing the exact match and fuzzy results, you are making the user search for the exact matches. They, themselves, have to identify the boundaries between fuzzy and exact match. And that is not ideal UX, especially as the fuzzy matches create visual noise.

Take your example above. For the exact matches, all matching text is lower case, but, for the fuzzy matches, you have a mix of lowercase, camelcase, and title case. The fuzzy completions are just inherently noisier, as there is no pattern. And because of that noise, you are imposing additional cognitive load on users to identify the "correct" match and to find how their query matches any one specific fuzzy match. In short, you are making users work. And users, generally, try to avoid work.

The line separation makes it clear what is exact and what is fuzzy. And a line separation provides structure to results for which, displayed together, lack obvious structure, until you spend time understanding order and implicit grouping. IMO, grouping together is worse UX.

On another note, I am wondering if we should limit the number of results in order to avoid choice paralysis. Did a quick search (ref 1 and ref 2), and seems generally agreed to limit fuzzy results to no more than 10 items. Anything more and users spend more time searching than auto-completing.

@Snehil-Shah
Copy link
Contributor Author

@kgryte I understand.
Regarding limiting the results, are we talking about just limiting the fuzzy results or all completions in general? Because if we limit even exact prefixes, then we are losing potential completions that the user is actually trying to find as the exact prefixes are sorted lexicographically and not based on some order of relevancy. in short, on what basis will we be discarding exact prefixes by limiting overall completions to 10?
Limiting fuzzy results does make sense though as we have a basis for discarding completions (ie. score).
And what are your thoughts on showing fuzzy matches only when there are no exact prefixes? It would lead to lesser and cleaner completions and better perf..

@kgryte
Copy link
Member

kgryte commented Apr 5, 2024

Agreed. Restricting number of completion candidates only makes sense for fuzzy completions.

And yeah, I still lean toward fuzzy completions only when no exact prefix matches.

@kgryte
Copy link
Member

kgryte commented Apr 5, 2024

Also, fuzzy completions only when no exact match is the less invasive choice to start. If users start clamoring for fuzzy completions even when there are exact prefix matches, should be straightforward to refactor to match the current behavior. And we have this PR as a reference.

@Snehil-Shah
Copy link
Contributor Author

@kgryte great. Will go ahead with fuzzy completions only when there's no exact matches.
And now that we are doing this, does it make sense to limit results?
When we are displaying fuzzy completions, it means there were no exact matches. Presumably, that means the fuzzy completions too wouldn't be that much. Plus as there are no exact completions, there's a chance the user really needs help there, and limiting fuzzy completions would be counter intuitive imo. What do you think?

@kgryte
Copy link
Member

kgryte commented Apr 5, 2024

Maybe make the number of displayed fuzzy completions a setting?

@Snehil-Shah
Copy link
Contributor Author

@kgryte we can do that. although I wonder if it's that user-friendly given it's generally abstracted to just a true or false in most editors🤔

@kgryte
Copy link
Member

kgryte commented Apr 5, 2024

Another option would be to provide a setting (or settings) for users to adjust the magic parameters which are used to score results. That would achieve the same end as limiting the number of results. But, then again, we could change the underlying scoring algo in the future.

Regardless, I am not sure that users benefit all that much from having more than 10 fuzzy choices. One option to make it more "powerful" or relevant is to weight items in a user's history higher. But this makes the implementation more complex.

I am tempted to just say limit to 10. If people want more, we can easily make this adjustable.

@Snehil-Shah
Copy link
Contributor Author

@kgryte I get it. will go ahead and keep the setting a Boolean, and will limit the fuzzy results to 10. Is that right?
Regarding weighing the user's history, that's smart, maybe we can explore that in a future endeavour..

@kgryte
Copy link
Member

kgryte commented Apr 5, 2024

will go ahead and keep the setting a Boolean, and will limit the fuzzy results to 10. Is that right?

Yes.

Regarding weighing the user's history, that's smart, maybe we can explore that in a future endeavour..

Yes, but definitely not in this PR. 😅

Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>
Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>
@Snehil-Shah
Copy link
Contributor Author

Snehil-Shah commented Apr 6, 2024

@kgryte Done

Only fuzzy:

fuzzy

Only exact:

exact

Currently writing some tests for the completer

Btw I think I found a small bug in auto-closing brackets (in develop). when hitting the up arrow from inside the auto-closed pair, it removes the auto-inserted brackets on the right of the cursor automatically. And for some reason, this only happens from the 2nd command onwards.

Before:

(['<|>'])

After hitting UP:

(['<|>

Tried to debug and no, it's not caused by auto-delete-pairs, and as it is happening from the 2nd command onwards, I presume it's a node bug.. I found a similar bug when using TAB key as noted in completer_engine.js (Line 422)..
It's a small bug though, just thought I would point it out..

Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>
@Snehil-Shah
Copy link
Contributor Author

@kgryte Done with the tests..

Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>
Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>
Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>
Snehil-Shah and others added 6 commits April 26, 2024 01:15
Signed-off-by: Snehil Shah <130062020+Snehil-Shah@users.noreply.github.com>
Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>
Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>
Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>
Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>
Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>
@Snehil-Shah
Copy link
Contributor Author

@kgryte Done with this PR:

The completer works as expected with the pager and is also responsive to terminal resizes.
Did a bunch of refactoring and updated the tests too. This should be ready for review.

FYI: while running the tests, I was getting these warnings about SIGWINCH listeners.

image

I added a removeListener when closing the REPL and the warnings seem to go away:

image

Should I push this?

@kgryte
Copy link
Member

kgryte commented Apr 26, 2024

Ah, yes, forgot to add the remove listener call in the other PR. Rather than make that a part of this PR, I suggest submitting a patch PR which we can merge sooner.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs Review A pull request which needs code review. REPL Issue or pull request specific to the project REPL.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[RFC]: add fuzzy auto-completion in REPL
2 participants