Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Optimize checking for empty directories when a directory has subdirectories #183

Merged
merged 6 commits into from
Sep 1, 2023

Conversation

cfxegbert
Copy link
Contributor

On a unix system a directory will have a link count of two when it does not have any subdirectories. This adds a check to is_empty_dir to check the link count to avoid the expensive listing of a directory.

Also noticed there is not a windows version of is_empty_dir so eza will most likely not compile on windows. I am not a windows developer and have no way to build or test a windows version.

@daviessm
Copy link
Contributor

Also noticed there is not a windows version of is_empty_dir so eza will most likely not compile on windows. I am not a windows developer and have no way to build or test a windows version.

#171 removes the conditional compilation of the function so as long as your changes also work on Windows, the compilation will succeed. It also adds a Windows CI target so as long as it's merged first there'll be a check that the build doesn't get broken for any OS.

Copy link
Member

@cafkafk cafkafk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to mull on this PR a bit more, but about windows support, perhaps you could keep the old behavior for non-unix systems?

@cfxegbert
Copy link
Contributor Author

cfxegbert commented Aug 29, 2023

I have to mull on this PR a bit more, but about windows support, perhaps you could keep the old behavior for non-unix systems?

There was no behavior for non-unix systems. When is_empty_dir was added only a unix version was added. As @daviessm pointed out #171 seems to be addressing the windows compilation issue. I will add the original function as a windows conditional compilation.

As for a link count of 2. The find command and many other utilities that walk the directory tree take advantage of the link count. https://github.com/c9/node-gnu-tools/blob/master/findutils-src/find/find.c#L1286-L1295

The only exception I am aware of is BTRFS does not track link counts on directories. See https://archive.kernel.org/oldwiki/btrfs.wiki.kernel.org/index.php/Project_ideas.html#Track_link_count_for_directories. Since BTRFS always returns a link count of 1 the expensive branch of listing the directory is always used.

Edit: Doing some more research. It seems a CIFS filesystems mounted over the network will always return a link count of 0. CIFS will always take the expensive branch.

@PThorpe92
Copy link
Member

Ah nice, I knew there had to be a more efficient way to do this... Good catch 👍

@cafkafk
Copy link
Member

cafkafk commented Aug 29, 2023

I have to mull on this PR a bit more, but about windows support, perhaps you could keep the old behavior for non-unix systems?

There was no behavior for non-unix systems. When is_empty_dir was added only a unix version was added.

Yes, so keep the old behavior and use it for non-unix, unless it doesn't work? Am I missing something?

Anyways, I think perhaps we should introduce some benchmarks so we're not doing this blindly (criterion is preferred imo, mainly because I'm familiar with it), because if checking for filesystem to determine branch turns out to be slower than this then we're just shooting ourselves in the foot.

Maybe there is no need for a fs check, but if there is I'd like if we proved to ourselves that it was a trivial cost.

@cfxegbert
Copy link
Contributor Author

Maybe there is no need for a fs check, but if there is I'd like if we proved to ourselves that it was a trivial cost.

There is no need for a fs check. The exceptions I mentioned above for BTRFS and CIFS under report the link count so the expensive branch of doing a read_dir is always run.

@cfxegbert
Copy link
Contributor Author

cfxegbert commented Aug 29, 2023

Test setup was 100 directories each containing 100 files and 1 subdirectory. On my M1 Mac Mini. Running release build.

Command hyperfine -w 10 "target/release/eza --icons /tmp/testdir"

Without checking nlinks (main branch):
Time (mean ± σ): 12.8 ms ± 0.5 ms [User: 3.7 ms, System: 8.7 ms]
Range (min … max): 12.1 ms … 13.7 ms 202 runs

With checking nlinks (empty-directory branch):
Time (mean ± σ): 5.1 ms ± 0.3 ms [User: 2.0 ms, System: 2.8 ms]
Range (min … max): 4.7 ms … 6.3 ms 398 runs

Edit - Purging filesystem cache between runs:

Command: hyperfine -w 10 -p "purge; target/release/eza --icons" "target/release/eza --icons /tmp/testdir"

Without checking nlinks (main branch):
Time (mean ± σ): 20.0 ms ± 4.7 ms [User: 4.1 ms, System: 11.5 ms]
Range (min … max): 17.2 ms … 32.9 ms 11 runs

With checking nlinks (empty-directory branch):
Time (mean ± σ): 10.1 ms ± 2.4 ms [User: 2.4 ms, System: 4.9 ms]
Range (min … max): 9.0 ms … 17.3 ms 11 runs

Signed-off-by: Christina Sørensen <christina@cafkafk.com>
@cafkafk
Copy link
Member

cafkafk commented Aug 29, 2023

@cfxegbert could you take a look at f0989ec, I'm running out the door right now, but I tried to document these additions, feel free to reword it completely.

@cfxegbert
Copy link
Contributor Author

@cfxegbert could you take a look at f0989ec, I'm running out the door right now, but I tried to document these additions, feel free to reword it completely.

I would only add that the link count on a directory on most unix filesystems is two plus the number of subdirectories.

@cfxegbert cfxegbert requested a review from cafkafk August 31, 2023 19:53
Copy link
Member

@cafkafk cafkafk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested this as much as I could on x86_64 GNU/Linux, and it does seem to work. It's a cool performance improvement, thanks for your contribution!

@cafkafk cafkafk merged commit 36923df into eza-community:main Sep 1, 2023
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: ✅ Done
Development

Successfully merging this pull request may close these issues.

None yet

4 participants