Taking into account the hierarchy of inputs #3255

grothesque · 2023-04-18T18:13:46Z

I have read through the manual page (man fzf)
I have the latest version of fzf
I have searched through the existing issues

Problem / Steps to reproduce

Many thanks for this excellent tool! After using it for a while and searching the existing issues, I would like to point out a possible direction in which the matching of fzf could be improved.

Like many people, I often use fzf to filter lists of file/directory names. For example, there could be the following directories

projects/foo
projects/foo/doc
projects/foo/src
projects/foo/tests
notes/some/category/foo

When fzf is launched with the above list of choices and the user searches for “foo”, the results will be presented in the above order, i.e. the three subdirectories of projects/foo will be considered more relevant than the notes on "foo". However, it could be argued that for a hierarchy of directories and files the subdirectories of projects/foo are already covered by the match of their parent. After all, they do not provide any further reason to match.

In the above example with only five items there is no problem, but with thousands of matching items it is easy to miss top-level matches that are shown behind subitems of other top-level matches.

Is there a way to solve this issue by configuration of current fzf? If not, perhaps we could discuss here possible solutions?


dr0bz · 2023-04-25T06:52:21Z

Hi @grothesque,
What you need is to control how fzf sorts its results, which is done with fzf --tiebreak=.... In this case you should use fzf --tiebreak=end.

Just set export FZF_DEFAULT_OPTS="--tiebreak=end" in your .bashrc, .zshrc or whatever shell you are using.

man fzf excerpt:

Best regards,
dr0bz

junegunn · 2023-12-26T14:25:38Z

@dr0bz Thanks for the comment. Yes, --tiebreak=end can help in this case, but it looks like the current implementation needs improvement as it doesn't work smoothly with the above example.

For fo, it chooses notes/some/category/foo as expected.

But if we add another o to the query,

This feels quite wrong, let me see what I can do.

See #3255 (comment)

grothesque · 2024-04-16T14:56:19Z

@junegunn and @dr0bz, thanks for your suggestions.

First, I'd like to comment on the inconsistency noted by @junegunn when using --tiebreak=end.

This is with fzf 0.38.0 from Debian. I'm running the command

echo -e 'projects/foo\nprojects/foo/doc\nprojects/foo/src\nprojects/foo/tests\nnotes/some/category/foo' | fzf

Without any option, when fo has been typed, fzf suggests projects/foo (presumably because --tiebreak=length is the default). I would expect the same behavior with --tiebreak=end,length, but instead it suggests notes/some/category/foo, just like with --tiebreak=end alone. Strangely, typing the full foo selects projects/foo regardless of the tiebreak setting.
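The comparison above can also be run without typing into the interactive finder. The following is only a sketch: it uses fzf's --filter mode and assumes that filter mode ranks lines the same way the interactive finder does. The helper names sample_paths and top_match are made up for illustration.

```shell
# Print the five example paths, one per line
# (printf is more portable than echo -e).
sample_paths() {
  printf '%s\n' \
    projects/foo \
    projects/foo/doc \
    projects/foo/src \
    projects/foo/tests \
    notes/some/category/foo
}

# Show the top-ranked line for a query under a given --tiebreak setting.
# Usage: top_match QUERY TIEBREAK
# Assumption: --filter output is sorted by the same score as the
# interactive list, so the first line is what fzf would suggest.
top_match() {
  sample_paths | fzf --filter="$1" --tiebreak="$2" | head -n 1
}

# Example calls (results depend on the installed fzf version):
#   top_match fo length
#   top_match fo end,length
#   top_match foo end
```

Running the same calls against two fzf versions should make it easy to see whether the inconsistency is still present.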

The --tiebreak=end suggestion is a good start, but it's not quite a solution to the real problem that I had in mind. Let me try to demonstrate it with a real-world example:

Let's say I'm searching the filesystem for stuff that relates to "tinyarray". So I can use the excellent fdfind command like this: fdfind tinyarray. Among the many lines it outputs are the following ones:

12/tinyarray-src/
12/tinyarray-src/test_tinyarray.py

From the point of view of searching a hierarchical file system, the second match is redundant. Worse, that directory could contain hundreds of files (that may or may not contain tinyarray in their basename).

That's why fdfind has the --prune option:

       --prune
              Do not traverse into matching directories.

I would find it extremely useful if there was a way to teach fdfind | fzf (with "tinyarray" typed) to give the highest scores to the lines that are output by fdfind --prune tinyarray.

I guess that this would require some special treatment of directory separator characters on the part of fzf. However, I believe that file paths are an important enough application of fzf to justify an exception.
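Until something like this exists in fzf, the pruning can be approximated outside of it. Below is a minimal sketch, not a feature of fzf or fdfind: a filter that drops every line that has an ancestor directory elsewhere in the same list. The function name prune_descendants is hypothetical, and the sketch assumes that the input has already been narrowed to matching paths (for example by fdfind tinyarray) and that paths contain no newlines.

```shell
# Keep only the "top-level" entries of a path list: a line is dropped
# when one of its ancestor directories was already kept.
# Assumption: every input line is a path that already matched the search.
prune_descendants() {
  # Sorting in the C locale guarantees that a parent sorts before its children.
  LC_ALL=C sort | awk '
  {
    path = $0
    sub(/\/$/, "", path)            # treat "dir/" and "dir" alike
    n = split(path, parts, "/")
    ancestor = ""
    for (i = 1; i < n; i++) {       # walk all proper ancestors of this path
      ancestor = (i == 1) ? parts[1] : ancestor "/" parts[i]
      if (ancestor in kept) next    # covered by a kept parent: skip the line
    }
    kept[path] = 1
    print
  }'
}

# Possible usage:
#   fdfind tinyarray | prune_descendants | fzf
```

Note that this prunes once, before fzf starts, so it cannot react to the query typed later. It only helps when the search term is already known when the list is generated.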

junegunn · 2024-04-17T14:13:29Z

> This is with fzf 0.38.0 from Debian.

This bug we discussed above has been fixed in 0.45.0. You are using a very old version of fzf.

grothesque · 2024-04-17T14:48:04Z

> This bug we discussed above has been fixed in 0.45.0. You are using a very old version of fzf.

Well, OK, but could you please also have a look at the second (longer) part of my comment where I explain what I actually meant when I opened this issue?

junegunn · 2024-04-17T17:05:59Z

> From the point of view of searching a hierarchical file system, the second match is redundant.

I feel quite the opposite. I'm usually not looking for intermediate nodes. Anyway, in that case, the default tiebreak of length should work well because the parent nodes have shorter names. Something like a mixture of length and end? I'm not planning to implement a non-basic scoring mechanism for any particular type of requirement because fzf is just a text filter and I want to leave it that way.

FWIW, you might want to experiment with a patch I posted at #3608 (comment) and see how it works for you.

grothesque · 2024-04-18T16:19:02Z

Thanks for having a look. The patch you link to looks interesting: perhaps it's possible to implement what I have in mind by assigning a zero (or very low) score to lines whose match does not involve the last path component? (That would rely on the assumption that lines for parent directories are present independently as well.)
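The idea of only counting matches in the last path component can be approximated outside of fzf. The sketch below is not the linked patch and not fzf's scoring: it is a plain case-insensitive substring test on the last component, and the function name last_component_matches is hypothetical. fzf's existing --delimiter and --nth options (for example --delimiter / --nth -1) may be worth trying for a similar effect, although lines ending in a slash would then have an empty last field.

```shell
# Print only the lines whose last path component contains the query.
# Assumption: simple substring matching, not fzf's fuzzy matching.
# Usage: last_component_matches QUERY < paths
last_component_matches() {
  awk -v q="$1" '
  BEGIN { q = tolower(q) }
  {
    path = $0
    sub(/\/+$/, "", path)             # ignore trailing slashes on directories
    n = split(path, parts, "/")
    # parts[n] is the last path component (basename)
    if (n > 0 && index(tolower(parts[n]), q) > 0) print
  }'
}

# Possible usage:
#   fdfind fred | last_component_matches fred | fzf
```

As noted above, this only gives useful results if the parent directories appear as separate lines in the input.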

> I feel quite the opposite. I'm usually not looking for intermediate nodes.
> (...)
> I'm not planning to implement a non-basic scoring mechanism for any particular type of requirement because fzf is just a text filter and I want to leave it that way.

Sure, that's a reasonable and consistent design!

I find the "pruning" approach very useful when looking for anything related to a person or a project. If there is a directory "Pictures/fred-birthday" that contains 100 files, and I search for "fred", I don't want other results to be overshadowed by the many individual files in that directory. Length as a criterion doesn't really help: there could be a single matching line that is very long. Still, this mode of operation may not be very appropriate for fzf: the "pruned" results are a mixture of files and directories, and fzf's main application in my experience is file selection on the command line.

Please feel free to close this issue.

junegunn added a commit that referenced this issue Dec 26, 2023

Fix unexpected result of --tiebreak=end

519de7c

See #3255 (comment)
