feat: add fuzzy matching algorithm for tab completions #1855

Snehil-Shah · 2024-03-12T23:51:00Z

Resolves #1845

Description

What is the purpose of this pull request?

This pull request:

adds fuzzy matching algorithm for tab completions
write a custom completer engine with navigation and highlighted completions

The algorithm revolves around scoring each completion based on the number of matching characters, and the distance between the matching characters in the completion & input.
It easily catches minor spelling mistakes and matches 'similar looking' prefixes (at least from how much I have tested).

Complexity (assuming startsWith takes linear time):

Before: O(n) per completion, where n = length of input string.
After: O(m+n) per completion, where n = length of input string & m = length of completion

Update: The algorithm is now updated.

Related Issues

Does this pull request have any related issues?

This pull request:

resolves [RFC]: add fuzzy auto-completion in REPL #1845
relates to [Idea]: building a better Node.js REPL google-summer-of-code#1

Questions

Any questions for reviewers of this pull request?

right now fuzzy match will only work for completing expressions and not for incomplete workspaces, file system, tutorial and require statements, as for some reason they are not using the abstracted filterByPrefix method. Instead they are reimplementing it (see image).

Would do the refactoring once the algorithm is finalized, to fix this!

Update: fixed!

Other

Any other information relevant to this pull request? This may include screenshots, references, and/or implementation notes.

I tried other popular fuzzy auto-completion algorithms too (not many) but completions didn't feel as relevant as I expected from a code completer.
The Levenshtein algorithm, for example, favors suggestions based on how 'similar looking' two strings are, meaning two strings with a higher difference in length would be scored worse (as it requires more insertions), which is not ideal for code completion.

What I haven't done much yet, is explore existing 'code completions' algorithms in IDEs/Editors and see how they are implementing them. Please tell me if I should do that.

The current implementation does work really well though based on the limited cases I've tested it through.

Update: Updated the algorithm!

And AFAIK, node.js and ipython don't have fuzzy auto-completion yet. (correct me if I'm wrong)

Checklist

Please ensure the following tasks are completed before submitting this pull request.

Read, understood, and followed the contributing guidelines.

@stdlib-js/reviewers

Signed-off-by: Snehil Shah <130062020+Snehil-Shah@users.noreply.github.com>

lib/node_modules/@stdlib/repl/lib/filter_by_prefix.js

Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>

Snehil-Shah · 2024-03-24T09:35:20Z

@kgryte Regarding visually highlighting the matching characters in the completions (ys -> yes).

I am a bit confused about how we can achieve this as we are relying on built-in readline module's completer for tab completions. I don't think we can alter on how the completer displays the completions.

One of the (rather hacky) ways I tried doing this is by including the bold outputs in the resulting array itself that is returned by the completer callback.
This didn't go well with readline completer's feature where say I type common and the possible completions are commonKeys and commonKeysIn, If I click TAB, instead of showing both the completions, it auto-completes it to commonKeys as its the longest common substring of all completions.
So in our case, if the first letter (say c) of input is common in all completions, {ascii-code}c{ascii-code} (our bold output) becomes the longest common substring of all completions causing completer to just auto-complete it instead of displaying all possible completions.

Is there any other way I can go around implementing this??

And regarding the algorithm, afaik this isn't based on any prior implementation/art. It's a simple algorithm that I wrote from scratch that matches based on how much the characters in the completion deviate from the input string (distance).
The magic 0.25 is a weight I have iteratively reached at, and 0.7 is the threshold score above which matches are permissible.

kgryte · 2024-03-24T09:43:05Z

Hmm...yeah, I hadn't thought about the completer controlling the terminal output. My initial thought was escape sequences, but then you are right, I can see how that might be an issue with auto-completion.

Well...one possibility is to just to write our own completer. We don't have to use the built-in. Could write it from scratch. May be worth exploring the Node.js source implementation to see how it handles the completer logic.

kgryte · 2024-03-24T09:49:05Z

For IPython, looks like they use jedi for auto-completion. In Node.js, fuzzy-matching is not supported.

Snehil-Shah · 2024-03-24T10:01:34Z

For IPython, looks like they use jedi for auto-completion. In Node.js, fuzzy-matching is not supported.

Fuzzy matching is not supported in IPython either from what I know. Am I mistaken?

kgryte · 2024-03-24T10:04:01Z

For IPython, looks like you can enable jedi: https://ipython.readthedocs.io/en/stable/api/generated/IPython.core.completer.html#IPython.core.completer.Completer.use_jedi. In which case, I believe you can get fuzzy auto-complete in IPython: https://jedi.readthedocs.io/en/latest/docs/api.html?highlight=fuzzy#jedi.Script.complete.

kgryte · 2024-03-24T10:04:47Z

I haven't personally used jedi in IPython, so YMMV.

Snehil-Shah · 2024-03-24T10:14:32Z

@kgryte understood! I'll explore writing our own completer, and studying jedi's fuzzy matching algorithm👍

kgryte · 2024-03-24T10:15:42Z

Don't stray too far down the rabbit hole! It may be vast. 😅

kgryte · 2024-04-02T06:54:44Z

right now fuzzy match will only work for completing expressions and not for incomplete workspaces, file system, tutorial and require statements, as for some reason they are not using the abstracted filterByPrefix method.

This was intentional, as matching is "inlined" due to additional loop logic (e.g., complete_require.js). It should, however, be possible to swap out startsWith with fuzzyMatch, with the latter being a separate module, rather than a private function.

We'll want fuzzy matching to be enabled via a setting. This is another thing which users may have preferences over.

And as a first pass, we can merge basic functionality in first and then do character highlighting in a follow-up PR, as that is likely to require yet more refactoring and is not strictly necessary in order to get the benefits of fuzzy matching.

kgryte · 2024-04-02T06:55:58Z

In the JSDoc comment of fuzzyMatch, it would be good to explain the algorithm/approach a bit and include example values for each stage of the algorithm. It can be a bit hard to grok what you are doing just by looking at for loops.

Snehil-Shah · 2024-04-02T08:02:13Z

@kgryte I am almost done with the character highlighting part though, can I push the changes and we can iterate over it after?

kgryte · 2024-04-02T08:41:22Z

@Snehil-Shah Sure. In general, going forward, it would be good to split tasks which can be split into separate PRs, as any additional complexity will increase the risk of PRs getting bogged down in review and refactoring.

Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>

Snehil-Shah · 2024-04-05T16:30:40Z

@kgryte Done with the implementation.
Fuzzy completions is now controllable via settings, and also added a max height to the completions output, as, in cases where the output was taller than the terminal (causing scrolling), moveCursor wasn't working as expected.

Current behaviours of TAB completions:

going up to the line, brings back the original line and going to any completion automatically updates the line.

undo.mp4

Hitting TAB is like a toggle button to bring up and hide the completions panel.
I tried separating the exact completions from the fuzzy ones and IMO they look worse:

It was better before, and now that we have a setting to disable it, in my opinion, we can normally display them just after the exact completions as it doesn't hurt to have more completions that the user might be potentially going for, given the exact ones are up top already. And if the user doesn't want it, they can disable it via settings.
But if you have a differing opinion, we can stick to your recommendation of using exact prefixes by default and when nothing matches, showing fuzzy completions.👍

kgryte · 2024-04-05T19:59:48Z

Re: line separation. I don't think I agree here. By mixing the exact match and fuzzy results, you are making the user search for the exact matches. They, themselves, have to identify the boundaries between fuzzy and exact match. And that is not ideal UX, especially as the fuzzy matches create visual noise.

Take your example above. For the exact matches, all matching text is lower case, but, for the fuzzy matches, you have a mix of lowercase, camelcase, and title case. The fuzzy completions are just inherently noisier, as there is no pattern. And because of that noise, you are imposing additional cognitive load on users to identify the "correct" match and to find how their query matches any one specific fuzzy match. In short, you are making users work. And users, generally, try to avoid work.

The line separation makes it clear what is exact and what is fuzzy. And a line separation provides structure to results for which, displayed together, lack obvious structure, until you spend time understanding order and implicit grouping. IMO, grouping together is worse UX.

On another note, I am wondering if we should limit the number of results in order to avoid choice paralysis. Did a quick search (ref 1 and ref 2), and seems generally agreed to limit fuzzy results to no more than 10 items. Anything more and users spend more time searching than auto-completing.

Snehil-Shah · 2024-04-05T20:28:45Z

@kgryte I understand.
Regarding limiting the results, are we talking about just limiting the fuzzy results or all completions in general? Because if we limit even exact prefixes, then we are losing potential completions that the user is actually trying to find as the exact prefixes are sorted lexicographically and not based on some order of relevancy. in short, on what basis will we be discarding exact prefixes by limiting overall completions to 10?
Limiting fuzzy results does make sense though as we have a basis for discarding completions (ie. score).
And what are your thoughts on showing fuzzy matches only when there are no exact prefixes? It would lead to lesser and cleaner completions and better perf..

kgryte · 2024-04-05T20:50:19Z

Agreed. Restricting number of completion candidates only makes sense for fuzzy completions.

And yeah, I still lean toward fuzzy completions only when no exact prefix matches.

kgryte · 2024-04-05T20:53:03Z

Also, fuzzy completions only when no exact match is the less invasive choice to start. If users start clamoring for fuzzy completions even when there are exact prefix matches, should be straightforward to refactor to match the current behavior. And we have this PR as a reference.

Snehil-Shah · 2024-04-05T20:57:37Z

@kgryte great. Will go ahead with fuzzy completions only when there's no exact matches.
And now that we are doing this, does it make sense to limit results?
When we are displaying fuzzy completions, it means there were no exact matches. Presumably, that means the fuzzy completions too wouldn't be that much. Plus as there are no exact completions, there's a chance the user really needs help there, and limiting fuzzy completions would be counter intuitive imo. What do you think?

kgryte · 2024-04-05T21:00:36Z

Maybe make the number of displayed fuzzy completions a setting?

Snehil-Shah · 2024-04-05T21:10:14Z

@kgryte we can do that. although I wonder if it's that user-friendly given it's generally abstracted to just a true or false in most editors🤔

kgryte · 2024-04-05T21:16:52Z

Another option would be to provide a setting (or settings) for users to adjust the magic parameters which are used to score results. That would achieve the same end as limiting the number of results. But, then again, we could change the underlying scoring algo in the future.

Regardless, I am not sure that users benefit all that much from having more than 10 fuzzy choices. One option to make it more "powerful" or relevant is to weight items in a user's history higher. But this makes the implementation more complex.

I am tempted to just say limit to 10. If people want more, we can easily make this adjustable.

Snehil-Shah · 2024-04-05T21:23:02Z

@kgryte I get it. will go ahead and keep the setting a Boolean, and will limit the fuzzy results to 10. Is that right?
Regarding weighing the user's history, that's smart, maybe we can explore that in a future endeavour..

kgryte · 2024-04-05T21:26:03Z

will go ahead and keep the setting a Boolean, and will limit the fuzzy results to 10. Is that right?

Yes.

Regarding weighing the user's history, that's smart, maybe we can explore that in a future endeavour..

Yes, but definitely not in this PR. 😅

Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>

Snehil-Shah · 2024-04-06T11:03:04Z

@kgryte Done

Only fuzzy:

Only exact:

Currently writing some tests for the completer

Btw I think I found a small bug in auto-closing brackets (in develop). when hitting the up arrow from inside the auto-closed pair, it removes the auto-inserted brackets on the right of the cursor automatically. And for some reason, this only happens from the 2nd command onwards.

Before:

(['<|>'])

After hitting UP:

(['<|>

Tried to debug and no, it's not caused by auto-delete-pairs, and as it is happening from the 2nd command onwards, I presume it's a node bug.. I found a similar bug when using TAB key as noted in completer_engine.js (Line 422)..
It's a small bug though, just thought I would point it out..

Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>

Snehil-Shah · 2024-04-07T21:25:05Z

@kgryte Done with the tests..

Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>

Signed-off-by: Snehil Shah <130062020+Snehil-Shah@users.noreply.github.com>

Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>

Snehil-Shah · 2024-04-26T01:30:44Z

@kgryte Done with this PR:

The completer works as expected with the pager and is also responsive to terminal resizes.
Did a bunch of refactoring and updated the tests too. This should be ready for review.

FYI: while running the tests, I was getting these warnings about SIGWINCH listeners.

I added a removeListener when closing the REPL and the warnings seem to go away:

Should I push this?

kgryte · 2024-04-26T03:32:21Z

Ah, yes, forgot to add the remove listener call in the other PR. Rather than make that a part of this PR, I suggest submitting a patch PR which we can merge sooner.

feat: add fuzzy matching algorithm for tab completions

dc73372

Signed-off-by: Snehil Shah <130062020+Snehil-Shah@users.noreply.github.com>

kgryte added Needs Review A pull request which needs code review. REPL Issue or pull request specific to the project REPL. labels Mar 15, 2024

kgryte reviewed Mar 15, 2024

View reviewed changes

lib/node_modules/@stdlib/repl/lib/filter_by_prefix.js Outdated Show resolved Hide resolved

kgryte reviewed Mar 15, 2024

View reviewed changes

lib/node_modules/@stdlib/repl/lib/filter_by_prefix.js Outdated Show resolved Hide resolved

kgryte reviewed Mar 15, 2024

View reviewed changes

lib/node_modules/@stdlib/repl/lib/filter_by_prefix.js Outdated Show resolved Hide resolved

kgryte reviewed Mar 15, 2024

View reviewed changes

lib/node_modules/@stdlib/repl/lib/filter_by_prefix.js Outdated Show resolved Hide resolved

kgryte reviewed Mar 15, 2024

View reviewed changes

lib/node_modules/@stdlib/repl/lib/filter_by_prefix.js Outdated Show resolved Hide resolved

refactor: fix function jsdoc

3cae6a7

Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>

Snehil-Shah force-pushed the repl-fuzzy branch from d2afe68 to 3cae6a7 Compare March 24, 2024 09:03

This was referenced Mar 27, 2024

[RFC]: add fuzzy auto-completion in REPL #1845

Open

[RFC]: building a better Node.js REPL stdlib-js/google-summer-of-code#69

Closed

Merge branch 'stdlib-js:develop' into repl-fuzzy

abe7525

Snehil-Shah added 4 commits April 2, 2024 12:32

feat: rewrite the completer engine

dcb6014

Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>

feat: improve fuzzy algorithm and sort by relevancy

bef5dd3

Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>

feat: add support for highlighted completions in terminal mode

0aee34d

Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>

fix: wrong completion previews displayed for fuzzy completion

7efe1a7

Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>

Snehil-Shah marked this pull request as ready for review April 2, 2024 13:07

fix: limit height of completions & fix completions with prefixes

237fb7f

Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>

Snehil-Shah requested a review from kgryte April 5, 2024 16:30

Snehil-Shah added 2 commits April 6, 2024 10:22

feat: fuzzy completions only when no exact completions

576f402

Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>

fix: bug in completer

89b5001

Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>

test: add tests for tab completions

c464944

Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>

Snehil-Shah added 3 commits April 8, 2024 22:17

docs: fix test name and comments

6d6d7c9

Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>

fix: tune fuzzy algorithm

2c15d15

Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>

docs: document setting

0e54cf5

Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>

Snehil-Shah mentioned this pull request Apr 22, 2024

feat: add pager to allow scrolling of long outputs in the REPL #2162

Merged

1 task

Snehil-Shah and others added 6 commits April 26, 2024 01:15

Merge branch 'develop' into repl-fuzzy

7128fac

Signed-off-by: Snehil Shah <130062020+Snehil-Shah@users.noreply.github.com>

fix: abnormal completer behavior

fa22afb

Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>

refactor: move flag to the completer engine namespace

7b8074a

Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>

refactor: abstract logics into private methods

cc2f3c5

Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>

fix: make completer SIGWINCH aware

85efce4

Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>

test: update tests for TTY

023c31a

Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add fuzzy matching algorithm for tab completions #1855

feat: add fuzzy matching algorithm for tab completions #1855

Snehil-Shah commented Mar 12, 2024 •

edited

Snehil-Shah commented Mar 24, 2024

kgryte commented Mar 24, 2024 •

edited

kgryte commented Mar 24, 2024

Snehil-Shah commented Mar 24, 2024

kgryte commented Mar 24, 2024

kgryte commented Mar 24, 2024 •

edited

Snehil-Shah commented Mar 24, 2024

kgryte commented Mar 24, 2024

kgryte commented Apr 2, 2024

kgryte commented Apr 2, 2024

Snehil-Shah commented Apr 2, 2024

kgryte commented Apr 2, 2024

Snehil-Shah commented Apr 5, 2024

kgryte commented Apr 5, 2024

Snehil-Shah commented Apr 5, 2024

kgryte commented Apr 5, 2024

kgryte commented Apr 5, 2024

Snehil-Shah commented Apr 5, 2024

kgryte commented Apr 5, 2024

Snehil-Shah commented Apr 5, 2024

kgryte commented Apr 5, 2024

Snehil-Shah commented Apr 5, 2024

kgryte commented Apr 5, 2024

Snehil-Shah commented Apr 6, 2024 •

edited

Snehil-Shah commented Apr 7, 2024

Snehil-Shah commented Apr 26, 2024

kgryte commented Apr 26, 2024

feat: add fuzzy matching algorithm for tab completions #1855

Are you sure you want to change the base?

feat: add fuzzy matching algorithm for tab completions #1855

Conversation

Snehil-Shah commented Mar 12, 2024 • edited

Description

Related Issues

Questions

Other

Checklist

Snehil-Shah commented Mar 24, 2024

kgryte commented Mar 24, 2024 • edited

kgryte commented Mar 24, 2024

Snehil-Shah commented Mar 24, 2024

kgryte commented Mar 24, 2024

kgryte commented Mar 24, 2024 • edited

Snehil-Shah commented Mar 24, 2024

kgryte commented Mar 24, 2024

kgryte commented Apr 2, 2024

kgryte commented Apr 2, 2024

Snehil-Shah commented Apr 2, 2024

kgryte commented Apr 2, 2024

Snehil-Shah commented Apr 5, 2024

kgryte commented Apr 5, 2024

Snehil-Shah commented Apr 5, 2024

kgryte commented Apr 5, 2024

kgryte commented Apr 5, 2024

Snehil-Shah commented Apr 5, 2024

kgryte commented Apr 5, 2024

Snehil-Shah commented Apr 5, 2024

kgryte commented Apr 5, 2024

Snehil-Shah commented Apr 5, 2024

kgryte commented Apr 5, 2024

Snehil-Shah commented Apr 6, 2024 • edited

Snehil-Shah commented Apr 7, 2024

Snehil-Shah commented Apr 26, 2024

kgryte commented Apr 26, 2024

Snehil-Shah commented Mar 12, 2024 •

edited

kgryte commented Mar 24, 2024 •

edited

kgryte commented Mar 24, 2024 •

edited

Snehil-Shah commented Apr 6, 2024 •

edited