Taking into account the hierarchy of inputs #3255

Open
5 of 10 tasks
grothesque opened this issue Apr 18, 2023 · 7 comments

@grothesque

  • I have read through the manual page (man fzf)
  • I have the latest version of fzf
  • I have searched through the existing issues

Info

  • OS
    • Linux
    • Mac OS X
    • Windows
    • Etc.
  • Shell
    • bash
    • zsh
    • fish

Problem / Steps to reproduce

Many thanks for this excellent tool! After using it for a while and searching the existing issues, I would like to point out a possible direction in which the matching of fzf could be improved.

Like many people, I often use fzf to filter lists of file/directory names. For example, there could be the following directories

projects/foo
projects/foo/doc
projects/foo/src
projects/foo/tests
notes/some/category/foo

When fzf is launched with the above list of choices and the user searches for “foo”, the results will be presented in the above order, i.e. the three subdirectories of projects/foo will be considered more relevant than the notes on "foo". However, it could be argued that for a hierarchy of directories and files the subdirectories of projects/foo are already covered by the match of their parent. After all, they do not provide any further reason to match.

In the above example with only five items there is no problem, but with thousands of matching items it is easy to miss top-level matches that are shown behind subitems of other top-level matches.

Is there a way to solve this issue by configuration of current fzf? If not, perhaps we could discuss here possible solutions?

@dr0bz

dr0bz commented Apr 25, 2023

Hi @grothesque,
what you need is to control how fzf sorts its results. This is done with fzf --tiebreak=.... In this case you should use fzf --tiebreak=end.

Just set export FZF_DEFAULT_OPTS="--tiebreak=end" in your .bashrc, .zshrc or whatever shell you are using.
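To make the effect of the tiebreak criteria concrete, here is a minimal Python sketch applied to the five directories from the original post. It is not fzf's implementation: it assumes a plain substring match and the same score for every matching line, so only the tiebreak decides the order. The helper names (`end_distance`, `rank`) are made up for illustration; the only thing modelled is the documented meaning of `length` (prefer shorter lines) and `end` (prefer a match closer to the end of the line).

```python
PATHS = [
    "projects/foo",
    "projects/foo/doc",
    "projects/foo/src",
    "projects/foo/tests",
    "notes/some/category/foo",
]


def end_distance(line: str, query: str) -> int:
    """Characters between the end of the last match and the end of the line."""
    start = line.rfind(query)
    return len(line) - (start + len(query))


def rank(lines, query, tiebreak):
    """Order matching lines by a list of tiebreak criteria (hypothetical model).

    Assumption: every match has the same score, so only the tiebreak
    criteria and the input order decide the result.
    """
    criteria = {
        "length": lambda line: len(line),
        "end": lambda line: end_distance(line, query),
    }
    matches = [line for line in lines if query in line]
    # sorted() is stable, so lines that tie on every criterion keep input order.
    return sorted(matches, key=lambda line: tuple(criteria[c](line) for c in tiebreak))


print(rank(PATHS, "foo", ["length"]))
print(rank(PATHS, "foo", ["end"]))
```

In this simplified model, `length` leaves notes/some/category/foo in last place, while `end` moves it to second place, directly after projects/foo. Real fzf compares match scores before any tiebreak is applied, so its actual ordering can differ.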

man fzf excerpt: [image]

Best regards,
dr0bz

@junegunn
Owner

@dr0bz Thanks for the comment. Yes, --tiebreak=end can help in this case, but it looks like the current implementation needs improvement as it doesn't work smoothly with the above example.

For fo, it chooses notes/some/category/foo as expected.
[image]

But if we add another o to the query,

[image]

This feels quite wrong; let me see what I can do.

@grothesque
Author

@junegunn and @dr0bz, thanks for your suggestions.

First, I'd like to comment on the inconsistency noted by @junegunn when using --tiebreak=end.

This is with fzf 0.38.0 from Debian. I'm running the command

echo -e 'projects/foo\nprojects/foo/doc\nprojects/foo/src\nprojects/foo/tests\nnotes/some/category/foo' | fzf 

Without any option, when fo has been typed, fzf suggests projects/foo (presumably because --tiebreak=length is the default). I would expect the same behavior with --tiebreak=end,length, but instead it suggests notes/some/category/foo, just like with --tiebreak=end alone. Strangely, typing the full foo selects projects/foo regardless of the tiebreak setting.


The --tiebreak=end suggestion is a good start, but it's not quite a solution to the real problem that I had in mind. Let me try to demonstrate it with a real-world example:

Let's say I'm searching the filesystem for stuff that relates to "tinyarray". So I can use the excellent fdfind command like this: fdfind tinyarray. Among the many lines it outputs are the following ones:

12/tinyarray-src/
12/tinyarray-src/test_tinyarray.py

From the point of view of searching a hierarchical file system, the second match is redundant. Worse, that directory could contain hundreds of files (that may or may not contain tinyarray in their basename).

That's why fdfind has the --prune option:

       --prune
              Do not traverse into matching directories.

I would find it extremely useful if there was a way to teach fdfind | fzf (with "tinyarray" typed) to give the highest scores to the lines that are output by fdfind --prune tinyarray.

I guess that this would require some special treatment of directory separator characters on the part of fzf. However, I believe that file paths are an important enough application of fzf to justify an exception.
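The pruning behavior described above can be approximated outside of fzf. The following Python sketch is only an illustration under stated assumptions: it uses a case-sensitive substring test on path components instead of fd's pattern matching, and the function name `prune` as well as the path `notes/tinyarray.txt` in the example are hypothetical. A line is kept when its last component matches and none of its parent components do, which roughly corresponds to the lines that fdfind --prune would print.

```python
def prune(paths, query):
    """Keep lines whose last component matches while no ancestor component does.

    Assumption: matching is a plain substring test on each path component.
    """
    kept = []
    for path in paths:
        # A trailing slash (as printed for directories) is ignored.
        *ancestors, last = path.rstrip("/").split("/")
        if query in last and not any(query in part for part in ancestors):
            kept.append(path)
    return kept


lines = [
    "12/tinyarray-src/",
    "12/tinyarray-src/test_tinyarray.py",
    "notes/tinyarray.txt",  # hypothetical extra entry for illustration
]
print(prune(lines, "tinyarray"))
```

Here 12/tinyarray-src/test_tinyarray.py is dropped because its parent directory already matches. Inside fzf, the request would be to rank such lines lower rather than to remove them.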

@junegunn
Owner

> This is with fzf 0.38.0 from Debian.

This bug we discussed above has been fixed in 0.45.0. You are using a very old version of fzf.

@grothesque
Author

> This bug we discussed above has been fixed in 0.45.0. You are using a very old version of fzf.

Well, OK, but could you please also have a look at the second (longer) part of my comment where I explain what I actually meant when I opened this issue?

@junegunn
Owner

junegunn commented Apr 17, 2024

> From the point of view of searching a hierarchical file system, the second match is redundant.

I feel quite the opposite. I'm usually not looking for intermediate nodes. Anyway, in that case, the default tiebreak of length should work well because the parent nodes have shorter names. Something like a mixture of length and end? I'm not planning to implement a non-basic scoring mechanism for any particular type of requirement because fzf is just a text filter and I want to leave it that way.

FWIW, you might want to experiment with a patch I posted at #3608 (comment) and see how it works for you.

@grothesque
Author

Thanks for having a look. The patch you link to looks interesting: perhaps it's possible to implement what I have in mind by assigning a zero (or very low) score to lines whose match does not involve the last path component? (That would rely on the assumption that lines for parent directories are present independently as well.)
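As a rough illustration of this idea (not the linked patch and not fzf's scoring), the following Python sketch gives a line a score of 1 when the query occurs in its last path component and 0 otherwise, then sorts by that score. The function names are hypothetical, and matching is again assumed to be a plain substring test.

```python
PATHS = [
    "projects/foo",
    "projects/foo/doc",
    "projects/foo/src",
    "projects/foo/tests",
    "notes/some/category/foo",
]


def last_component_score(path: str, query: str) -> int:
    """Return 1 if the query occurs in the last path component, else 0."""
    last = path.rstrip("/").rsplit("/", 1)[-1]
    return 1 if query in last else 0


def rank_by_last_component(paths, query):
    """Put lines that match in their last component first (hypothetical model)."""
    matches = [p for p in paths if query in p]
    # Stable sort: lines with the same score keep their input order.
    return sorted(matches, key=lambda p: -last_component_score(p, query))


print(rank_by_last_component(PATHS, "foo"))
```

With the directories from the original post, both projects/foo and notes/some/category/foo end up ahead of the three subdirectories. Note that this is weaker than pruning: a line such as 12/tinyarray-src/test_tinyarray.py still gets the high score, because its last component matches as well.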

> I feel quite the opposite. I'm usually not looking for intermediate nodes.
> (...)
> I'm not planning to implement a non-basic scoring mechanism for any particular type of requirement because fzf is just a text filter and I want to leave it that way.

Sure, that's a reasonable and consistent design!

I find the "pruning" approach very useful when looking for anything related to a person or a project. If there's a directory "Pictures/fred-birthday" that contains 100 files, and I search for "fred", I don't want other results to be overshadowed by the many individual files in that directory. Length as a criterion doesn't really help: there could be a single line that matches as well, but is very long. Still, this mode of operation may not be a good fit for fzf: the "pruned" results are a mixture of files and directories, and fzf's main application in my experience is file selection on the command line.

Please feel free to close this issue.
