-
-
Notifications
You must be signed in to change notification settings - Fork 171
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Optimize checking for empty directories when a directory has subdirectories #183
Conversation
#171 removes the conditional compilation of the function so as long as your changes also work on Windows, the compilation will succeed. It also adds a Windows CI target so as long as it's merged first there'll be a check that the build doesn't get broken for any OS. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to mull on this PR a bit more, but about windows support, perhaps you could keep the old behavior for non-unix systems?
There was no behavior for non-unix systems. When As for a link count of 2. The The only exception I am aware of is BTRFS does not track link counts on directories. See https://archive.kernel.org/oldwiki/btrfs.wiki.kernel.org/index.php/Project_ideas.html#Track_link_count_for_directories. Since BTRFS always returns a link count of 1 the expensive branch of listing the directory is always used. Edit: Doing some more research. It seems a CIFS filesystems mounted over the network will always return a link count of 0. CIFS will always take the expensive branch. |
Ah nice, I knew there had to be a more efficient way to do this... Good catch 👍 |
Yes, so keep the old behavior and use it for non-unix, unless it doesn't work? Am I missing something? Anyways, I think perhaps we should introduce some benchmarks so we're not doing this blindly (criterion is preferred imo, mainly because I'm familiar with it), because if checking for filesystem to determine branch turns out to be slower than this then we're just shooting ourselves in the foot. Maybe there is no need for a fs check, but if there is I'd like if we proved to ourselves that it was a trivial cost. |
There is no need for a fs check. The exceptions I mentioned above for BTRFS and CIFS under report the link count so the expensive branch of doing a read_dir is always run. |
Test setup was 100 directories each containing 100 files and 1 subdirectory. On my M1 Mac Mini. Running release build. Command Without checking nlinks (main branch): With checking nlinks (empty-directory branch): Edit - Purging filesystem cache between runs: Command: Without checking nlinks (main branch): With checking nlinks (empty-directory branch): |
Signed-off-by: Christina Sørensen <christina@cafkafk.com>
@cfxegbert could you take a look at f0989ec, I'm running out the door right now, but I tried to document these additions, feel free to reword it completely. |
I would only add that the link count on a directory on most unix filesystems is two plus the number of subdirectories. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested this as much as I could on x86_64 GNU/Linux, and it does seem to work. It's a cool performance improvement, thanks for your contribution!
On a unix system a directory will have a link count of two when it does not have any subdirectories. This adds a check to
is_empty_dir
to check the link count to avoid the expensive listing of a directory.Also noticed there is not a windows version of
is_empty_dir
so eza will most likely not compile on windows. I am not a windows developer and have no way to build or test a windows version.