Are you open to sorting the fuzzy matches? #17

airblade · 2013-10-23T12:21:24Z

The README states that Selecta doesn't change the order of items piped in. Is this a hard and fast constraint?

I ask because I use Selecta to find files and it would be nice (for me, at least) for Selecta to sort matches by some metric of goodness of fit. For example, prioritising matches where the match includes the first letter of each path component (a/c/... hits app/controllers/... before app/helpers/search_helper.)

Quite possibly everybody would have a different idea of how they would like to sort matches so maybe this is a non-starter.

The text was updated successfully, but these errors were encountered:

garybernhardt · 2013-10-23T21:10:27Z

This has made me see something that I missed before: there are (at least) two notions of sorting.

There's a choice sort order that's independent of matching. At first, Selecta sorted results by length. That was removed because multiple people pointed out that the sort can be done before piping to Selecta, and the same result is achieved.
There's match-quality ordering (for lack of a better term), which is what you're addressing: Selecta can decide that one match is better than another based on the matching process itself, rather than a property of the choice string.

Although I'm convinced that (1) is bad, I'm also pretty convinced that (2) could be good. The first question, then, is what is the algorithm for match sorting? Here are two implementations that are well-regarded:

TextMate's (the calculate_rank function): https://github.com/textmate/textmate/blob/master/Frameworks/text/src/ranker.cc
Command-T's (the recursive_match function): http://git.wincent.com/command-t.git/blob/refs/heads/master:/ruby/command-t/match.c

Neither is enticing me to read it. :/ Any ideas about how to formalize your prioritization scheme? Is it something as simple as "prefer choices that match characters earlier in the string"?

marcomorain · 2013-10-23T21:51:27Z

I can think of many different things you might want to priorities:

longer runs of matches so "cat" matches "catch" ahead of "chart"
the path example that @airblade mentions
you might also want to match add snake_case and CamalCase to @airblade's example - "SuperJavaFactory" should score highly for the input "sjf".

arronmabrey · 2013-10-23T22:13:21Z

Here is a project I came across a while back, when looking for a type of weighted matching algorithm that behaved like Quicksilver.

https://github.com/rmm5t/liquidmetal

Example

selecta

vs
liquidmetal

@garybernhardt Plus it's javascript, your favorite ;-)

arronmabrey · 2013-10-23T22:29:44Z

Also the liquidmetal readme links to Quicksilver.js which produces very similar results, with the algorithm implementation looking less complex.

airblade · 2013-10-24T07:58:36Z

@garybernhardt I don't have any formal scheme. Intuitively I feel the following should work:

prefer runs of matching letters to isolated letters
prefer letters at the start of "words" ...where application_controller is two "words" (and AbstractFactoryFactory is three "words", though I wouldn't need this myself).

Another implementation is topfunky's PeepOpen, though it's currently closed source. (Its algorithm takes into account metadata, but that could be safely ignored here without affecting the results.)

garybernhardt · 2013-10-25T00:56:14Z

@airblade, that certainly sounds reasonable. The general version of your first point is probably "prefers shortest matching substring" (rather than just direct runs of matching letters).

I've spent some time digging through the two JS examples linked, but it feels like trying to decipher undocumented, super-imperative C. (Unsurprising, considering that's exactly what it was ported from.)

airblade · 2013-10-25T15:12:43Z

@garybernhardt Agreed. I'm writing a Ruby scorer which is simple enough for me to understand. Although it's nearly ready I don't think I'll quite get there before I have to head off for a week.

airblade · 2013-10-26T11:24:15Z

@garybernhardt Here it is: https://github.com/airblade/respecta. Were you to use it, I'd appreciate guidance on how best to make it callable from Selecta.

garybernhardt · 2013-10-29T21:08:02Z

I just pushed a new scoring algorithm in e2663f0. It does the general form of the first half of @airblade's proposal (small substrings matching the query score higher than large ones). It doesn't do consider "words" yet; I want to see how this new algorithm goes first.

As a bonus, this is far faster than the old regex-based method, and doesn't fall over for large queries over large files.

garybernhardt · 2013-10-29T21:10:57Z

@airblade, I took a quick look at respecta, but it returns 0 for empty query, and it blew up on other inputs. I'd be happy to consider replacement algorithms, of course. One thing to note: performance matters a lot here. Even adding one method call per choice can make a measurable difference. There's a new benchmark.rb script that emits runtimes for various benchmarks so you can compare any candidate algorithms against the current one.

marcomorain · 2013-11-04T10:36:03Z

I just hit a case where the sorting did not work for me, so I thought I should add it to this discussion:
I am using selecta like this:

alias gr='git checkout `git recent | selecta`'

The first list below is the initial list of matches. The second is the same list after I have typed push. The branch that I wanted to checkout was the 3rd one in the list (my 2nd most recent branch). When I enter push the list puts the branch I want at the bottom of the list. This is not really a bug, but an example of how sorting can do the opposite of what you want.

marc@astrotrain ~/dev/swrve/event_processor $ gr
>
junit-4.11
release-62
4172-push-system-service
release-61
fix-flapping-dau-unit-test
4214-downcase-sync-fix
app-config-schema
4146-babble-push-notes
4146-dashboard-campaigns
4216-remove-sub-stack-restrictions
push-alpha
redesign-r61
4194-event-details-dups
3910-babble
push-demo
4146-churchill-service
4186-sentry-rack-env
3811-bdb-backups
better-dev-annotate
users-controller

> push
push-demo
push-alpha
4146-babble-push-notes
4172-push-system-service

airblade · 2013-11-04T10:45:41Z

Perhaps the sorting behaviour could be disabled with a command line flag, assuming it's on by default.

garybernhardt · 2013-11-04T18:47:30Z

It sucks to have command line flags that people have to remember, but it's becoming clear that some applications of Selecta really need the ranking algorithm while others really don't.

The question, then, is whether it's on by default. I can see arguments for either way: no ranking by default means that Selecta does a simple thing unless you tell it to do a complex thing. Ranking on by default means that file selection is a smoother experience (and I'm pretty sure that's the most common use case).

garybernhardt closed this as completed Oct 29, 2013

davepeck mentioned this issue Nov 25, 2014

Sorting search results atom/fuzzy-finder#21

Open

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Are you open to sorting the fuzzy matches? #17

Are you open to sorting the fuzzy matches? #17

airblade commented Oct 23, 2013

garybernhardt commented Oct 23, 2013

marcomorain commented Oct 23, 2013

arronmabrey commented Oct 23, 2013

arronmabrey commented Oct 23, 2013

airblade commented Oct 24, 2013

garybernhardt commented Oct 25, 2013

airblade commented Oct 25, 2013

airblade commented Oct 26, 2013

garybernhardt commented Oct 29, 2013

garybernhardt commented Oct 29, 2013

marcomorain commented Nov 4, 2013

airblade commented Nov 4, 2013

garybernhardt commented Nov 4, 2013

Are you open to sorting the fuzzy matches? #17

Are you open to sorting the fuzzy matches? #17

Comments

airblade commented Oct 23, 2013

garybernhardt commented Oct 23, 2013

marcomorain commented Oct 23, 2013

arronmabrey commented Oct 23, 2013

arronmabrey commented Oct 23, 2013

airblade commented Oct 24, 2013

garybernhardt commented Oct 25, 2013

airblade commented Oct 25, 2013

airblade commented Oct 26, 2013

garybernhardt commented Oct 29, 2013

garybernhardt commented Oct 29, 2013

marcomorain commented Nov 4, 2013

airblade commented Nov 4, 2013

garybernhardt commented Nov 4, 2013