Are you open to sorting the fuzzy matches? #17
Comments
This has made me see something that I missed before: there are (at least) two notions of sorting.
Although I'm convinced that (1) is bad, I'm also pretty convinced that (2) could be good. The first question, then, is what is the algorithm for match sorting? Here are two implementations that are well-regarded:
Neither is enticing me to read it. :/ Any ideas about how to formalize your prioritization scheme? Is it something as simple as "prefer choices that match characters earlier in the string"?
I can think of many different things you might want to prioritise:
Here is a project I came across a while back, when looking for a type of weighted matching algorithm that behaved like Quicksilver. https://github.com/rmm5t/liquidmetal Example @garybernhardt Plus it's JavaScript, your favorite ;-)
Also, the liquidmetal README links to Quicksilver.js, which produces very similar results with a less complex-looking implementation.
@garybernhardt I don't have any formal scheme. Intuitively I feel the following should work:
Another implementation is topfunky's PeepOpen, though it's currently closed source. (Its algorithm takes into account metadata, but that could be safely ignored here without affecting the results.)
@airblade, that certainly sounds reasonable. The general version of your first point is probably "prefers shortest matching substring" (rather than just direct runs of matching letters). I've spent some time digging through the two JS examples linked, but it feels like trying to decipher undocumented, super-imperative C. (Unsurprising, considering that's exactly what it was ported from.)
@garybernhardt Agreed. I'm writing a Ruby scorer which is simple enough for me to understand. Although it's nearly ready I don't think I'll quite get there before I have to head off for a week.
@garybernhardt Here it is: https://github.com/airblade/respecta. Were you to use it, I'd appreciate guidance on how best to make it callable from Selecta.
I just pushed a new scoring algorithm in e2663f0. It does the general form of the first half of @airblade's proposal (small substrings matching the query score higher than large ones). It doesn't consider "words" yet; I want to see how this new algorithm goes first. As a bonus, this is far faster than the old regex-based method, and doesn't fall over for large queries over large files.
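For readers following along, here is a minimal sketch of what "small substrings matching the query score higher than large ones" could look like. It is not the code from e2663f0: the method names `shortest_match_length` and `score`, the case-insensitive comparison, and the `query.length / span length` formula are all assumptions made for illustration.

```ruby
# Hypothetical helper (not Selecta's actual implementation): length of the
# shortest span of `choice` containing the characters of `query` in order,
# or nil when `choice` does not match at all.
def shortest_match_length(choice, query)
  return 0 if query.empty?

  choice = choice.downcase
  query = query.downcase
  first, *rest = query.chars
  best = nil

  choice.each_char.with_index do |char, start|
    next unless char == first

    # From this start, greedily take the earliest occurrence of each
    # remaining query character; that gives the tightest span for this start.
    pos = start
    matched = rest.all? { |q| pos = choice.index(q, pos + 1) }
    next unless matched

    length = pos - start + 1
    best = length if best.nil? || length < best
  end

  best
end

# Assumed scoring formula: 1.0 for a contiguous match, approaching 0.0 as the
# matching span grows, and 0.0 when there is no match.
def score(choice, query)
  return 1.0 if query.empty?

  length = shortest_match_length(choice, query)
  return 0.0 if length.nil?

  query.length.to_f / length
end
```

With this sketch, `score("a_b_ab", "ab")` is 1.0 because the tightest span is the trailing "ab", even though an earlier, looser match exists. An empty query scores every choice equally, so nothing is filtered out before typing starts.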
@airblade, I took a quick look at respecta, but it returns 0 for an empty query, and it blew up on other inputs. I'd be happy to consider replacement algorithms, of course. One thing to note: performance matters a lot here. Even adding one method call per choice can make a measurable difference. There's a new benchmark.rb script that emits runtimes for various benchmarks so you can compare any candidate algorithms against the current one.
I just hit a case where the sorting did not work for me, so I thought I should add it to this discussion:
The first list below is the initial list of matches. The second is the same list after I have typed
Perhaps the sorting behaviour could be disabled with a command line flag, assuming it's on by default.
It sucks to have command line flags that people have to remember, but it's becoming clear that some applications of Selecta really need the ranking algorithm while others really don't. The question, then, is whether it's on by default. I can see arguments either way: no ranking by default means that Selecta does a simple thing unless you tell it to do a complex thing. Ranking on by default means that file selection is a smoother experience (and I'm pretty sure that's the most common use case).
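To make the two behaviours concrete, here is a rough sketch of a selection step with ranking switchable. Everything named here is hypothetical: the `rank:` keyword stands in for whatever flag might be chosen, `TOY_SCORER` is a placeholder that simply prefers shorter choices, and `select_matches` is not a Selecta method.

```ruby
# Placeholder scorer, assumed for illustration only: shorter choices win.
TOY_SCORER = ->(choice, _query) { 1.0 / choice.length }

# True when the characters of `query` appear in `choice` in order.
def subsequence?(choice, query)
  pos = -1
  query.each_char.all? { |q| pos = choice.index(q, pos + 1) }
end

# Hypothetical selection step. With rank: false the matches keep the order
# they were piped in; with rank: true they are ordered by descending score.
def select_matches(choices, query, rank: true, scorer: TOY_SCORER)
  matches = choices.select { |choice| subsequence?(choice, query) }
  return matches unless rank

  # Ruby's sort_by is not guaranteed to be stable, so the original index is
  # used as a tie-breaker to keep equally scored choices in input order.
  matches.each_with_index
         .sort_by { |choice, index| [-scorer.call(choice, query), index] }
         .map(&:first)
end
```

Either default falls out of the same code path, so the decision is only about which value `rank:` takes when the user says nothing.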
The README states that Selecta doesn't change the order of items piped in. Is this a hard and fast constraint?
I ask because I use Selecta to find files and it would be nice (for me, at least) for Selecta to sort matches by some metric of goodness of fit. For example, prioritising matches where the match includes the first letter of each path component (a/c/... hits app/controllers/... before app/helpers/search_helper).
Quite possibly everybody would have a different idea of how they would like to sort matches so maybe this is a non-starter.
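One possible reading of the "first letter of each path component" idea, sketched in Ruby. The method names `component_prefix_match?` and `prioritise` are made up for this example, and treating each "/"-separated piece of the query as a prefix of the path component in the same position is an assumption, not a settled rule.

```ruby
# Hypothetical check: does each "/"-separated piece of the query prefix the
# path component in the same position? "a/c" matches "app/controllers/...".
def component_prefix_match?(path, query)
  wanted = query.split("/")
  parts = path.split("/")
  return false if wanted.empty? || wanted.length > parts.length

  wanted.each_with_index.all? { |segment, i| parts[i].start_with?(segment) }
end

# Moves component-prefix matches to the front. Enumerable#partition keeps the
# relative input order within each group, so nothing else is reshuffled.
def prioritise(paths, query)
  hits, rest = paths.partition { |path| component_prefix_match?(path, query) }
  hits + rest
end
```

This only reorders paths that already matched the fuzzy filter; it does not decide what matches. A path under app/helpers still fuzzy-matches a/c through the "c" in "search", but it fails the component check and so lands after anything under app/controllers.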