
fd without pattern is much slower than find #304

Closed
vandenoever opened this issue Jun 14, 2018 · 15 comments

@vandenoever

vandenoever commented Jun 14, 2018

I often use find to list files. Doing this with fd is much slower. Here are some results for listing all files in my home directory with a warm and a cold cache. The directory is on btrfs on Linux 4.15.15. find is about twice as fast in both cases and uses about 10% of the CPU cycles that fd uses.

$ hyperfine --warmup 3 'fd -HI -t f . ~' 'find ~ -type f'
Benchmark #1: fd -HI -t f . ~

  Time (mean ± σ):      5.668 s ±  0.151 s    [User: 8.523 s, System: 12.736 s]
 
  Range (min … max):    5.282 s …  5.788 s
 
Benchmark #2: find ~ -type f

  Time (mean ± σ):      2.640 s ±  0.222 s    [User: 779.7 ms, System: 1837.9 ms]
 
  Range (min … max):    2.407 s …  3.060 s
 
Summary

  'find ~ -type f' ran
    2.15x faster than 'fd -HI -t f . ~'

$ hyperfine --prepare 'sync; echo 3 | sudo tee /proc/sys/vm/drop_caches' 'fd -HI -t f . ~' 'find ~ -type f' 
Benchmark #1: fd -HI -t f . ~

  Time (mean ± σ):     28.027 s ±  1.714 s    [User: 10.159 s, System: 34.685 s]
 
  Range (min … max):   24.161 s … 29.585 s
 
Benchmark #2: find ~ -type f

  Time (mean ± σ):     15.151 s ±  0.482 s    [User: 1.386 s, System: 4.884 s]
 
  Range (min … max):   14.414 s … 15.703 s
 
Summary

  'find ~ -type f' ran
    1.85x faster than 'fd -HI -t f . ~'

Here is a screenshot of the CPU and IO during the run with cold cache.


@BurntSushi

@vandenoever Is there any way you can reproduce these results on a public corpus?

@vandenoever
Author

I can try if you have a suggestion for a corpus.

Some statistics on my files: 1.4M files in 363 GB, so the average file size is 264 kB. See the histogram below.

[histogram: file size distribution]

@BurntSushi

I'd try the Linux kernel. If that fails to show a difference, then I'd try the chromium source tree.

@vandenoever
Author

Here are the measurements for the linux-4.17 source code. The effect with a cold cache is less pronounced, but still large.

Warm cache:

$ hyperfine --warmup 3 'fd -HI -t f . .' 'fd -HI -j1 -t f . .' 'find . -type f'
Benchmark #1: fd -HI -t f . .

  Time (mean ± σ):     156.2 ms ±   3.0 ms    [User: 270.4 ms, System: 298.3 ms]
 
  Range (min … max):   151.3 ms … 164.7 ms
 
Benchmark #2: fd -HI -j1 -t f . .

  Time (mean ± σ):     353.9 ms ±  10.5 ms    [User: 258.3 ms, System: 281.8 ms]
 
  Range (min … max):   338.9 ms … 370.6 ms
 
Benchmark #3: find . -type f

  Time (mean ± σ):      76.7 ms ±   1.3 ms    [User: 24.0 ms, System: 52.2 ms]
 
  Range (min … max):    75.3 ms …  80.7 ms
 
Summary

  'find . -type f' ran
    2.04x faster than 'fd -HI -t f . .'
    4.61x faster than 'fd -HI -j1 -t f . .'

Cold cache:

$ hyperfine --prepare 'sync; echo 3 | sudo tee /proc/sys/vm/drop_caches' 'fd -HI -t f . .' 'fd -HI -j1 -t f . .' 'find . -type f'
Benchmark #1: fd -HI -t f . .

  Time (mean ± σ):     987.8 ms ±  15.8 ms    [User: 394.4 ms, System: 1279.4 ms]
 
  Range (min … max):   962.6 ms … 1007.4 ms
 
Benchmark #2: fd -HI -j1 -t f . .

  Time (mean ± σ):      1.788 s ±  0.016 s    [User: 413.6 ms, System: 920.6 ms]
 
  Range (min … max):    1.758 s …  1.806 s
 
Benchmark #3: find . -type f

  Time (mean ± σ):     736.7 ms ±  11.2 ms    [User: 49.0 ms, System: 174.7 ms]
 
  Range (min … max):   725.2 ms … 756.9 ms
 
Summary

  'find . -type f' ran
    1.34x faster than 'fd -HI -t f . .'
    2.43x faster than 'fd -HI -j1 -t f . .'

@sharkdp
Owner

sharkdp commented Jun 14, 2018

Thank you very much for reporting this. Which version of fd are you using? There have been some performance improvements regarding "-HI searches" in 6.2 (see #191). Also, I have reported a performance regression in the ignore crate (see BurntSushi/ripgrep#835), which definitely affects fd in "-HI searches". The regression has since been fixed, but the fix has not been released yet, so I was not able to integrate it into fd.

I cannot reproduce these results on my home directory:

▶ fd --version
fd 7.0.0

▶ hyperfine --warmup 3 'fd -HI -t f . ~' 'find . ~ -type f' -i
Benchmark #1: fd -HI -t f . ~

  Time (mean ± σ):      1.395 s ±  0.045 s    [User: 4.666 s, System: 5.592 s]
 
  Range (min … max):    1.346 s …  1.469 s
 
Benchmark #2: find . ~ -type f

  Time (mean ± σ):      2.095 s ±  0.055 s    [User: 613.8 ms, System: 1455.1 ms]
 
  Range (min … max):    2.030 s …  2.187 s
 
  Warning: Ignoring non-zero exit code.
 
Summary

  'fd -HI -t f . ~' ran
    1.50x faster than 'find . ~ -type f'

▶ fd -HI -t f . ~ | wc -l
849174

Will try the Linux Kernel soon. Did you clone https://github.com/torvalds/linux or just download the 4.17 source code?

@vandenoever
Author

vandenoever commented Jun 14, 2018

I'm running 7.0.0. I got 4.17 as tar.xz from kernel.org. What's your average file size?

Perhaps it's related to number of files per directory. My home dir has 1.4M files in 150k directories so an average of 10 files per directory.

Why does your benchmark command have a non-zero exit code?

@vandenoever
Author

vandenoever commented Jun 14, 2018

@sharkdp: is fd also faster with -j1? Even though it's faster, the total CPU cost (user + system) is 5x higher (10.3 s vs 2.1 s).

@sharkdp
Owner

sharkdp commented Jun 15, 2018

Why does your benchmark command have a non-zero exit code?

This can be ignored. I have a single test folder (for fd) which is read-only and owned by root. This is why find returns a non-zero exit code. It does not affect the benchmark (fd can't enter that directory either, and it's empty anyway).

Perhaps it's related to number of files per directory. My home dir has 1.4M files in 150k directories so an average of 10 files per directory.

Mine is 920k files in 180k folders (5 files per directory). Why do you think it could be related to the files-per-directory ratio (I'm not saying it isn't)?

I got 4.17 as tar.xz from kernel.org.

I ran the benchmarks for the same dataset. While the effect is less dramatic, I can reproduce your results for a warm cache:

▶ hyperfine --warmup 3 'fd -HI -t f . .' 'fd -HI -j1 -t f . .' 'find . -type f'
Benchmark #1: fd -HI -t f . .

  Time (mean ± σ):      86.5 ms ±   7.2 ms    [User: 293.6 ms, System: 235.9 ms]
 
  Range (min … max):    73.7 ms …  96.8 ms
 
Benchmark #2: fd -HI -j1 -t f . .

  Time (mean ± σ):     285.3 ms ±   3.5 ms    [User: 191.8 ms, System: 252.3 ms]
 
  Range (min … max):   280.4 ms … 291.7 ms
 
Benchmark #3: find . -type f

  Time (mean ± σ):      68.9 ms ±   1.0 ms    [User: 25.4 ms, System: 43.0 ms]
 
  Range (min … max):    67.5 ms …  71.3 ms
 
Summary

  'find . -type f' ran
    1.25x faster than 'fd -HI -t f . .'
    4.14x faster than 'fd -HI -j1 -t f . .'

However, for a cold cache, I get the following:

▶ hyperfine --prepare 'sync; echo 3 | sudo tee /proc/sys/vm/drop_caches' 'fd -HI -t f . .' 'fd -HI -j1 -t f . .' 'find . -type f'
Benchmark #1: fd -HI -t f . .

  Time (mean ± σ):     254.3 ms ±   6.7 ms    [User: 298.8 ms, System: 612.6 ms]
 
  Range (min … max):   245.6 ms … 268.7 ms
 
Benchmark #2: fd -HI -j1 -t f . .

  Time (mean ± σ):      1.220 s ±  0.017 s    [User: 375.6 ms, System: 587.6 ms]
 
  Range (min … max):    1.186 s …  1.245 s
 
Benchmark #3: find . -type f

  Time (mean ± σ):     638.9 ms ±  26.8 ms    [User: 54.8 ms, System: 169.1 ms]
 
  Range (min … max):   604.4 ms … 684.7 ms
 
Summary

  'fd -HI -t f . .' ran
    2.51x faster than 'find . -type f'
    4.80x faster than 'fd -HI -j1 -t f . .'

Interestingly, the Linux kernel has a files/directories ratio of about 14, so there might very well be a correlation.

fd is really designed for a multi-core setup. I believe it will always be at a disadvantage compared to truly single-threaded programs in a -j1 setting, because it still spawns a separate thread for searching, which then sends results to a receiver thread that outputs them (see the sketch below). Also, personally, I don't really care about the total number of CPU cycles, but rather about the overall execution time.
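
To make the architecture described above concrete, here is a minimal sketch of the worker/receiver pattern using std::sync::mpsc. All names are hypothetical and this is not fd's actual code; the point is that even with a single worker, the extra thread and the per-result channel send cost cycles that a plain single-threaded loop like find's does not pay:

```rust
use std::sync::mpsc;
use std::thread;

fn main() {
    let (tx, rx) = mpsc::channel::<String>();

    // One or more worker threads traverse the directory tree and send
    // every match through the channel.
    let worker = thread::spawn(move || {
        for entry in ["./a.txt", "./b.txt"] {
            tx.send(entry.to_string()).unwrap();
        }
        // Dropping `tx` closes the channel and ends the receiver loop.
    });

    // The receiver thread is the only one writing to stdout, so the
    // output needs no further synchronization.
    let receiver = thread::spawn(move || {
        for path in rx {
            println!("{}", path);
        }
    });

    worker.join().unwrap();
    receiver.join().unwrap();
}
```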

Still, this seems to be a case where find is definitely faster than fd. I'm happy to investigate this further, but I will be away from home for two weeks, so this might take some time.

@vandenoever
Copy link
Author

There are two actions going on: reading a directory and reading file metadata. fd seems to be slower when there are more files per directory. That might mean that there's something expensive going on when reading those directories, e.g. allocating a larger Vec to hold the files.

I'm not in a hurry to have this fixed. It was the first thing I noticed when I tried out fd.

Since fd uses more CPU, find is probably going to be faster more often on slower machines, even if they're multi-core.

Perhaps using a Mutex to coordinate the output instead of channels would be faster, because you could then reuse buffers and avoid allocations. A sketch of that idea follows.
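
Here is a minimal sketch of that Mutex idea, hypothetical and not a patch against fd: each worker formats into a thread-local, reusable buffer and only takes the stdout lock to write it out, instead of sending an owned String through a channel:

```rust
use std::io::Write;
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    let stdout = Arc::new(Mutex::new(std::io::stdout()));
    let mut handles = Vec::new();

    for worker_id in 0..4 {
        let stdout = Arc::clone(&stdout);
        handles.push(thread::spawn(move || {
            // The buffer lives for the whole thread and is reused for
            // every entry, avoiding a fresh allocation per result.
            let mut buf = Vec::with_capacity(4096);
            for i in 0..3 {
                buf.clear();
                writeln!(buf, "./worker{}/file{}", worker_id, i).unwrap();
                let mut out = stdout.lock().unwrap();
                out.write_all(&buf).unwrap();
            }
        }));
    }

    for h in handles {
        h.join().unwrap();
    }
}
```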

@sharkdp
Owner

sharkdp commented Jun 15, 2018

Thank you for the useful tips!

I think the main issue might be the actual printing of the results in output::print_entry. Given a search pattern that limits the number of output lines (e.g. ^usbnet with 2 results), fd is suddenly much faster:

▶ hyperfine --warmup 5 'fd -HI' 'fd -HI ^usbnet'  
Benchmark #1: fd -HI

  Time (mean ± σ):      72.4 ms ±   3.8 ms    [User: 243.2 ms, System: 83.6 ms]
 
  Range (min … max):    65.1 ms …  81.0 ms
 
Benchmark #2: fd -HI ^usbnet

  Time (mean ± σ):      28.1 ms ±   1.5 ms    [User: 124.8 ms, System: 67.0 ms]
 
  Range (min … max):    27.0 ms …  37.7 ms

... while find is slightly slower with a search pattern:

Benchmark #1: find .

  Time (mean ± σ):      67.8 ms ±   1.2 ms    [User: 24.2 ms, System: 43.1 ms]
 
  Range (min … max):    66.4 ms …  70.8 ms
 
Benchmark #2: find -iname "usbnet*"

  Time (mean ± σ):      79.2 ms ±   1.5 ms    [User: 33.5 ms, System: 45.2 ms]
 
  Range (min … max):    77.1 ms …  83.0 ms

I'm pretty sure we can find some ways to optimize output::print_entry (one generic approach is sketched below). In the meantime, I hope that most of your searches actually have just a few results. After all, listing all files in a huge directory is probably not the most interesting use case 😄
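
One generic way to make per-line printing cheaper, not necessarily what fd ends up doing: lock stdout once and wrap it in a BufWriter, so each entry costs a memcpy into the buffer rather than a write syscall per line. A minimal sketch:

```rust
use std::io::{self, BufWriter, Write};

fn main() -> io::Result<()> {
    let stdout = io::stdout();
    let mut out = BufWriter::new(stdout.lock());

    for i in 0..100_000 {
        // println! goes through a line-buffered writer that flushes on
        // every newline; this batches many lines into one underlying write.
        writeln!(out, "./some/path/file{}", i)?;
    }
    out.flush()
}
```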

Edit: another thing I have noticed in the past (although it does not affect the benchmarks above): if you really have a lot of search results, the output of fd is slowed down when terminal colors are enabled. This is due to (1) fd having to do additional work, because it splits the path into multiple colored parts, and (2) the terminal emulator having more work to do. Try comparing fd and fd --color=never in a huge directory. If you want to measure terminal output speed as well, you can use hyperfine's new --show-output option. The sketch below illustrates the extra per-component work.
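
As a rough illustration of that per-component work (hypothetical code, not fd's actual print path): instead of one write per path, every component gets its own ANSI escape prefix and suffix:

```rust
use std::path::Path;

fn print_colored(path: &Path) {
    let mut first = true;
    for component in path.components() {
        if !first {
            print!("/");
        }
        // Wrap each component in ANSI color codes (here: blue).
        print!("\x1b[34m{}\x1b[0m", component.as_os_str().to_string_lossy());
        first = false;
    }
    println!();
}

fn main() {
    print_colored(Path::new("src/output/print_entry.rs"));
}
```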

@mqudsi
Contributor

mqudsi commented Jun 15, 2018

I know we've ruled it out as the bottleneck, but to share a bit of greybeard folklore: back in my sysadmin days (before microbenchmarking was the norm), I was always told to use ^ instead of . because supposedly pcre (or whichever regex library was king at the time) matched the former a tiny bit faster.

@BurntSushi

Glancing at the printing functions, I think there is definitely some room to optimize. For my own use cases, I found that most of the standard path functions in std::path were noticeable in a profile, and dropping down to raw bytes (on Unix, at least) made a difference. Generally speaking, I only ever use the standard path functions (and to_string_lossy) on Windows.
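
A minimal sketch of that byte-level approach, assuming Unix and using a hypothetical helper rather than fd's actual code: on Unix a Path is just bytes, so it can be written to stdout directly instead of going through to_string_lossy, which allocates and validates UTF-8:

```rust
use std::io::{self, Write};
#[cfg(unix)]
use std::os::unix::ffi::OsStrExt;
use std::path::Path;

#[cfg(unix)]
fn write_path(out: &mut impl Write, path: &Path) -> io::Result<()> {
    // No allocation, no UTF-8 check: the raw bytes go straight out.
    // This also handles non-UTF-8 file names losslessly.
    out.write_all(path.as_os_str().as_bytes())?;
    out.write_all(b"\n")
}

#[cfg(unix)]
fn main() -> io::Result<()> {
    let stdout = io::stdout();
    write_path(&mut stdout.lock(), Path::new("./some/file.txt"))
}

#[cfg(not(unix))]
fn main() {}
```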

@sharkdp
Owner

sharkdp commented Jun 16, 2018

@BurntSushi Exactly, that was what I was thinking about. As a bonus, this would also solve #295.

@sharkdp
Owner

sharkdp commented Aug 19, 2018

I found a very simple optimization in 50a2bab which (possibly together with some other improvements) brings the multi-core version of fd-7.1.0 up to the same speed as find in the Linux-kernel benchmark:

▶ hyperfine --warmup 3 'fd -HI -t f . .' 'fd -HI -j1 -t f . .' 'find . -type f'
Benchmark #1: fd -HI -t f . .

  Time (mean ± σ):      69.1 ms ±   2.9 ms    [User: 226.5 ms, System: 87.7 ms]
 
  Range (min … max):    61.5 ms …  76.9 ms
 
Benchmark #2: fd -HI -j1 -t f . .

  Time (mean ± σ):     102.4 ms ±   4.3 ms    [User: 93.2 ms, System: 59.5 ms]
 
  Range (min … max):    98.9 ms … 115.6 ms
 
Benchmark #3: find . -type f

  Time (mean ± σ):      68.9 ms ±   1.3 ms    [User: 25.8 ms, System: 42.7 ms]
 
  Range (min … max):    67.3 ms …  72.7 ms
 
Summary

  'find . -type f' ran
    1.00x faster than 'fd -HI -t f . .'
    1.49x faster than 'fd -HI -j1 -t f . .'

I still need to look at optimizations in output::print_entry, though.

@sharkdp
Owner

sharkdp commented Feb 13, 2019

I'm going to close this for now. The current version of fd is still on par with find on this benchmark and much faster if the user is actually searching for something (and not just listing all of the 65,598 directory entries).
