Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sort perfect match first #2285

Closed
5 of 10 tasks
xeruf opened this issue Dec 14, 2020 · 14 comments
Closed
5 of 10 tasks

Sort perfect match first #2285

xeruf opened this issue Dec 14, 2020 · 14 comments

Comments

@xeruf
Copy link
Contributor

xeruf commented Dec 14, 2020

  • I have read through the manual page (man fzf)
  • I have the latest version of fzf - 0.24.4 (f502725)
  • I have searched through the existing issues

Info

  • OS
    • Linux
    • Mac OS X
    • Windows
    • Etc.
  • Shell
    • bash
    • zsh
    • fish

Problem / Steps to reproduce

image

I would expect number 2 to come out first here. My hunch is that space and minus are treated equally - whereas I think a space (or tab for that matter) is a much stronger word delimiter than a minus.

I encountered this during package search, where there is some extra info before and after the package name delimited with tabs, and thus a perfect match doesn't seem so perfect to fzf.

@junegunn
Copy link
Owner

Mine gives dragon as the top match. You have something in your $FZF_DEFAULT_OPTS?

@xeruf
Copy link
Contributor Author

xeruf commented Dec 14, 2020

Sorry, you are right:

printf 'fire-dragon 1\ndragon 2' | fzf --tiebreak=end --height=2

even though I can now understand why the algorithm selects it, it is not intuitive. The tiebreak end is there mainly for filenames, but it shouldn't prefer a longer entry because of that. Unfortunately there is no option to weigh two tiebreakers (end and length), they can only be ordered.

@junegunn
Copy link
Owner

junegunn commented Dec 14, 2020

  • --tiebreak is unintuitive and hard to understand especially with the new algorithm so I don't recommend relying on it. I even considered removing it but I kept it mostly for backward compatibility.
  • fzf was designed to be a context-free text filter rather than a path picker, so I decided that it should not be particularly optimized for file paths. It's a design choice. Maybe there are other programs that work better with file paths than fzf, but that is not a problem fzf wants to solve.
  • Consider not using --tiebreak option. The default option (length,index) is actually quite good for file paths.
  • There is a nice trick to make it prefer matches at the latter part of the path. We can take advantage of the fact that most file paths end with an extension, so simply put . at the end of your query. `abc.
    # Compare "the-end" vs. "the-end."
    fzf --height 10 --query 'the-end.' << EOF
    the-end-is-near/foo-bar.sh
    foo-bar/the-end-is-near.sh
    EOF

@xeruf
Copy link
Contributor Author

xeruf commented Dec 14, 2020

My point here is that I use fzf for paths in the overwhelming majority, and in that case it makes sense to use end rather than length as the tiebreaker (short paths such as under /etc often aren't what I'm looking for when searching for example for an rc file), that's why I set that as default option.
My fix here was to reverse the tiebreaks in that particular script.

That tip does indeed sound nice, though when searching through rc files is not particularly useful, as they rarely have an extension by convention.

The question remains whether the end-tiebreaker may be tweaked in a way that it yields the more expected output. I guess currently it goes by percentage or position from beginning, when it might just be better to count from behind, where in this example both are equal and thus the length-tiebreaker would select the correct option ;)

@junegunn
Copy link
Owner

when it might just be better to count from behind

We did that, and it didn't turn out great.

0541c0d

Sadly, there is no one-size-fits-all option. It's better to understand how the default ranking algorithm works and tweak the query string to assist this dumb filter to do a better job. fzf implements all major Readline key bindings so it's easy to move the cursor around and tweak the search terms once you get used to it.

> foo bar┃

# ALT-B
> foo ┃bar

# CTRL-U
> ┃bar

# Prepend /.
> /.┃bar

# CTRL-E
> /.bar┃

# CTRL-Y
> /.barfoo ┃

@xeruf
Copy link
Contributor Author

xeruf commented Dec 15, 2020

printf 'fire-dragon\ndragon' | fzf --tiebreak=end --height=2

Just realized that even in this case entering "dragon" will select "fire-dragon", so I can't enter anything more to get a better search result.

The behavior is the same with a space:

printf 'fire dragon\ndragon' | fzf --tiebreak=end --height=2

But removing the delimiter sorts "dragon" first as soon as I enter a single 'd':

printf 'firedragon\ndragon' | fzf --tiebreak=end --height=2

So I'm assuming you are splitting on non-word characters and the matching against the terms.
Let me suggest some tweaks to the algorithm, even before the tiebreakers come into play:

  • a split by '-' should be weaker than other non-word characters, because it often shows run-together words whereas underscore, space, tab & co show a stronger separation
  • the percentage of the term matched should be factored in

@xeruf
Copy link
Contributor Author

xeruf commented Dec 15, 2020

We did that, and it didn't turn out great.

0541c0d

What do you mean? Did you revert it?

@junegunn
Copy link
Owner

I can't enter anything more to get a better search result.

  • dragon ^d
  • dragon !f

Let me suggest some tweaks to the algorithm

Thanks for the input, but I'm not interested in fine-tuning the scoring algorithm because like I said above, fzf is a text filter and the criteria may not make sense with different types of input.

What do you mean? Did you revert it?

Yes, please read the commit message.

@xeruf
Copy link
Contributor Author

xeruf commented Dec 15, 2020

Let me suggest some tweaks to the algorithm

Thanks for the input, but I'm not interested in fine-tuning the scoring algorithm because like I said above, fzf is a text filter and the criteria may not make sense with different types of input.

I thought these changes would be pretty general.

If you don't want to do more tweaking, I would suggest to at least pick the issue title back up and always prefer a perfect match.

The percentage-match could also be an additional tiebreaker.

What do you mean? Did you revert it?

Yes, please read the commit message.

oh woops, I read it the other way around, sorry ^^

@xeruf
Copy link
Contributor Author

xeruf commented Jan 23, 2022

I am still interested in looking into improving this - look at this:
printf 'fire-dragonflame\ndragon' | fzf --tiebreak=index --height=2
image
I am wondering why the algorithm has to resort to the tiebreaker here at all?
Very similar to my my real-world issue in #2537.

@junegunn
Copy link
Owner

Because, unfortunately, the current scoring algorithm doesn't have a mechanism to prefer the latter. The scores calculated by the algorithm are just the same for the two cases.

Not saying there isn't room for improvement. We could try fine-tuning the algorithm.

@xeruf
Copy link
Contributor Author

xeruf commented Jul 26, 2022

How about taking inspiration from https://github.com/jhawthorn/fzy/blob/master/ALGORITHM.md#fzys-scoring ?
I love the scoring in fzy but the features of fzf.

@junegunn
Copy link
Owner

junegunn commented Jul 29, 2022

Very similar to my my real-world issue in #2537

So your real problem here is that the default length tiebreak is not taking --nth into account (see). If you use the default length tiebreak, fzf will prefer dragon over fire-dragonflame because it's shorter. In the context of fuzzy matching, "the perfect match" is always the shortest one, so with length tiebreak, we can say that fzf already prefers the perfect match.

It looks like the algorithm of fzy isn't very different from that of fzf. The document you linked above is severely outdated and it describes the algorithm of fzf 0.14.0 and below. fzy doesn't have --nth or --tiebreak, so the situation isn't too different if you use the default options of fzf.

@junegunn
Copy link
Owner

junegunn commented Jul 29, 2022

But we could consider adding a bonus point to a character at the end of a word so for a query dragon, dragon light is preferred over dragonflight (before --tiebreak)

image

If I'm not mistaken, fzy isn't doing that either.

image

junegunn added a commit that referenced this issue Aug 2, 2022
Favors the line with shorter matched chunk. A chunk is a set of
consecutive non-whitespace characters.

Unlike the default `length`, this new scheme works well with tabular input.

  # length prefers item #1, because the whole line is shorter,
  # chunk prefers item #2, because the matched chunk ("foo") is shorter
  fzf --height=6 --header-lines=2 --tiebreak=chunk --reverse --query=fo << "EOF"
  N | Field1 | Field2 | Field3
  - | ------ | ------ | ------
  1 | hello  | foobar | baz
  2 | world  | foo    | bazbaz
  EOF

If the input does not contain any spaces, `chunk` is equivalent to
`length`. But we're not going to set it as the default because it is
computationally more expensive.

Close #2285
Close #2537
- Not the exact solution to --tiebreak=length not taking --nth into account,
  but this should work. And the added benefit is that it works well even
  when --nth is not provided.
- Adding a bonus point to the last character of a word didn't turn out great.
  The order of the result suddenly changes when you type in the last
  character in the word producing a jarring effect.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants