Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Are you open to sorting the fuzzy matches? #17

Closed
airblade opened this issue Oct 23, 2013 · 13 comments
Closed

Are you open to sorting the fuzzy matches? #17

airblade opened this issue Oct 23, 2013 · 13 comments

Comments

@airblade
Copy link

The README states that Selecta doesn't change the order of items piped in. Is this a hard and fast constraint?

I ask because I use Selecta to find files and it would be nice (for me, at least) for Selecta to sort matches by some metric of goodness of fit. For example, prioritising matches where the match includes the first letter of each path component (a/c/... hits app/controllers/... before app/helpers/search_helper.)

Quite possibly everybody would have a different idea of how they would like to sort matches so maybe this is a non-starter.

@garybernhardt
Copy link
Owner

This has made me see something that I missed before: there are (at least) two notions of sorting.

  1. There's a choice sort order that's independent of matching. At first, Selecta sorted results by length. That was removed because multiple people pointed out that the sort can be done before piping to Selecta, and the same result is achieved.
  2. There's match-quality ordering (for lack of a better term), which is what you're addressing: Selecta can decide that one match is better than another based on the matching process itself, rather than a property of the choice string.

Although I'm convinced that (1) is bad, I'm also pretty convinced that (2) could be good. The first question, then, is what is the algorithm for match sorting? Here are two implementations that are well-regarded:

Neither is enticing me to read it. :/ Any ideas about how to formalize your prioritization scheme? Is it something as simple as "prefer choices that match characters earlier in the string"?

@marcomorain
Copy link
Contributor

I can think of many different things you might want to priorities:

  • longer runs of matches so "cat" matches "catch" ahead of "chart"
  • the path example that @airblade mentions
  • you might also want to match add snake_case and CamalCase to @airblade's example - "SuperJavaFactory" should score highly for the input "sjf".

@arronmabrey
Copy link
Contributor

Here is a project I came across a while back, when looking for a type of weighted matching algorithm that behaved like Quicksilver.

https://github.com/rmm5t/liquidmetal

Example

selecta

vs
liquidmetal

@garybernhardt Plus it's javascript, your favorite ;-)

@arronmabrey
Copy link
Contributor

Also the liquidmetal readme links to Quicksilver.js which produces very similar results, with the algorithm implementation looking less complex.

@airblade
Copy link
Author

@garybernhardt I don't have any formal scheme. Intuitively I feel the following should work:

  • prefer runs of matching letters to isolated letters
  • prefer letters at the start of "words" ...where application_controller is two "words" (and AbstractFactoryFactory is three "words", though I wouldn't need this myself).

Another implementation is topfunky's PeepOpen, though it's currently closed source. (Its algorithm takes into account metadata, but that could be safely ignored here without affecting the results.)

@garybernhardt
Copy link
Owner

@airblade, that certainly sounds reasonable. The general version of your first point is probably "prefers shortest matching substring" (rather than just direct runs of matching letters).

I've spent some time digging through the two JS examples linked, but it feels like trying to decipher undocumented, super-imperative C. (Unsurprising, considering that's exactly what it was ported from.)

@airblade
Copy link
Author

@garybernhardt Agreed. I'm writing a Ruby scorer which is simple enough for me to understand. Although it's nearly ready I don't think I'll quite get there before I have to head off for a week.

@airblade
Copy link
Author

@garybernhardt Here it is: https://github.com/airblade/respecta. Were you to use it, I'd appreciate guidance on how best to make it callable from Selecta.

@garybernhardt
Copy link
Owner

I just pushed a new scoring algorithm in e2663f0. It does the general form of the first half of @airblade's proposal (small substrings matching the query score higher than large ones). It doesn't do consider "words" yet; I want to see how this new algorithm goes first.

As a bonus, this is far faster than the old regex-based method, and doesn't fall over for large queries over large files.

@garybernhardt
Copy link
Owner

@airblade, I took a quick look at respecta, but it returns 0 for empty query, and it blew up on other inputs. I'd be happy to consider replacement algorithms, of course. One thing to note: performance matters a lot here. Even adding one method call per choice can make a measurable difference. There's a new benchmark.rb script that emits runtimes for various benchmarks so you can compare any candidate algorithms against the current one.

@marcomorain
Copy link
Contributor

I just hit a case where the sorting did not work for me, so I thought I should add it to this discussion:
I am using selecta like this:

alias gr='git checkout `git recent | selecta`'

The first list below is the initial list of matches. The second is the same list after I have typed push. The branch that I wanted to checkout was the 3rd one in the list (my 2nd most recent branch). When I enter push the list puts the branch I want at the bottom of the list. This is not really a bug, but an example of how sorting can do the opposite of what you want.

marc@astrotrain ~/dev/swrve/event_processor $ gr
>
junit-4.11
release-62
4172-push-system-service
release-61
fix-flapping-dau-unit-test
4214-downcase-sync-fix
app-config-schema
4146-babble-push-notes
4146-dashboard-campaigns
4216-remove-sub-stack-restrictions
push-alpha
redesign-r61
4194-event-details-dups
3910-babble
push-demo
4146-churchill-service
4186-sentry-rack-env
3811-bdb-backups
better-dev-annotate
users-controller
> push
push-demo
push-alpha
4146-babble-push-notes
4172-push-system-service

@airblade
Copy link
Author

airblade commented Nov 4, 2013

Perhaps the sorting behaviour could be disabled with a command line flag, assuming it's on by default.

@garybernhardt
Copy link
Owner

It sucks to have command line flags that people have to remember, but it's becoming clear that some applications of Selecta really need the ranking algorithm while others really don't.

The question, then, is whether it's on by default. I can see arguments for either way: no ranking by default means that Selecta does a simple thing unless you tell it to do a complex thing. Ranking on by default means that file selection is a smoother experience (and I'm pretty sure that's the most common use case).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants