Without pattern 'find' ~8x faster than 'fd' #191
Comments
OS: Debian testing
Installed fd and tried it on my camera folder. The results are here:
Contrary to expectation, the time taken and CPU usage are very high. |
Interesting, thank you for reporting this. How many entries are found (without the search pattern)? Also, how many cores does your machine have? |
Directory size: 27G |
Machine details:
Directory size: 261 GiB
Even the numbers reported are different. |
@dtcyganok Thanks. Are you using the fd version from the official Arch repositories? Or did you build it yourself?
Please make sure to run
|
I am using the |
No change :-) This is just FYI. Hope you will be coming out with a solution for this problem soon. |
So fd is slower than find when you are not using any pattern? Is there any way to avoid that? |
I might have an idea what's causing this. @dtcyganok Could you please run your initial benchmark and add |
This should be fixed now. Would be great if someone could confirm this. |
That fix looks like trickery. Is the buffer size really the main cause of the slowness? Are you sure that, with the default buffer time, buffering mode does not usually end before the timeout is reached? |
For the cases mentioned in this ticket -- yes. If there is no search pattern and the search tree is huge, results accumulate fast (100 ms is enough to gather ~100,000 entries). In my test case (no search pattern, ~130,000 results), adding the buffer size limit speeds up the search by a factor of 10. |
@sharkdp Hmm, I think |
That could very well be the cause, thanks! Another potential cause could be the dynamic re-allocation for the buffer. |
If 1000 is a good limit for buffer size for all cases, then |
And another potential cause: sorting |
I don't think so. If there are fewer than 1000 search results,
That's not how it works. If we reach the time or size limit we just start streaming all results to the console (unsorted). |
Ah, my logic is in a tangle in the previous comment. Searching for 1000 results takes only ~20ms (less than 100ms) on my machine. Isn't the buffer time limit used for displaying sorted results when time allows? |
Correct. If the full search finishes faster than the I think with the new mechanism (N_max =
|
Yeah, for a fast search (~1000 results: < 20 ms for gathering and sorting results, and < 300 ms total run time), most of the time is spent on printing, not gathering. The buffer time is negligible, I think. UPDATE: Ah, a slow search can mean that there are too few matching files, not just that the I/O is too slow. So the time limit is still needed. |
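For readers following along, the buffering behaviour discussed above can be sketched roughly as follows. This is only an illustration in Python, not fd's actual code: the function name `emit_results`, its parameters, and the choice to check the clock only when a result arrives are all assumptions made for the example. The two limits (100 ms and 1000 entries) are the values mentioned in this thread.

```python
import time
from typing import Callable, Iterable, List

MAX_BUFFER_TIME = 0.1  # seconds; the 100 ms buffer time mentioned in the thread
MAX_BUFFER_LEN = 1000  # the buffer size limit discussed in the thread


def emit_results(results: Iterable[str],
                 out: Callable[[str], None] = print,
                 max_time: float = MAX_BUFFER_TIME,
                 max_len: int = MAX_BUFFER_LEN) -> str:
    """Hypothetical helper: buffer results, then print sorted or stream.

    Returns the mode in effect at the end: "buffered" or "streaming".
    """
    buffer: List[str] = []
    streaming = False
    start = time.monotonic()

    for entry in results:
        if streaming:
            # Limits were hit earlier: pass results straight through, unsorted.
            out(entry)
            continue

        buffer.append(entry)
        too_many = len(buffer) > max_len
        too_slow = time.monotonic() - start > max_time
        if too_many or too_slow:
            # Give up on sorting: flush what we have and switch to streaming.
            for buffered in buffer:
                out(buffered)
            buffer.clear()
            streaming = True

    if not streaming:
        # The whole search finished within both limits: print sorted output.
        for entry in sorted(buffer):
            out(entry)

    return "streaming" if streaming else "buffered"
```

With a small result set the output comes back sorted; once either limit is exceeded, everything is printed in arrival order. A real implementation would presumably also need to react to the time limit while no results are arriving, which this sketch does not do.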
Tried the new version. Looks like there is no change. The results:
|
@alaymari Thank you for taking a look at this again. A couple of comments:
|
OK. Here is what I did:
I will try later on a bigger set. Right now:
|
I ran the bench tool in a git [lfs] folder for a relatively big repo. It contains many 3D file assets. I can of course give more general details upon request. NOTE: This was run on a macOS 10.13.2 system; I am not sure whether this find differs from the one on other machines.
And another set done here with
This final set is a 1 to 1 comparison.
|
@alaymari Thanks! This issue is really about running @partounian Thanks! A comparison like this is not really fair because Here is how I run this particular type of benchmark: https://gist.github.com/sharkdp/f2dda4ea0af1563a3dbdae4e14d9496a |
@sharkdp the last example is actually what you've asked for :) |
Oh, thanks! I missed that. Good to know that it works for you. |
Sorry for the late reply. Would this help?
I am not seeing any difference with or without the pattern for |
@alaymari Could you please install bench and run this script? EDIT: Here is my run of the script
|
Installed haskell-stack.
While trying to set it up, I got this error. I know the problem does not belong here, but I am posting the error message anyway. I am completely clueless about Haskell :-(
So, I am stuck there trying to install bench. |
Sorry, I cannot help, as I am spoiled by Homebrew. Maybe Linuxbrew could help you. |
If you upgrade your stack version to the latest, you can use the latest resolver. |
If you still want to run the benchmark, you can also use my new tool hyperfine (in case that is easier to install). It features a
> export base_path=...
> hyperfine --warmup 5 'fd -HI "" $base_path' 'find $base_path' |
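If neither bench nor hyperfine is easy to install, the same warmup-then-measure idea can be approximated with a short script. The following is a minimal sketch under stated assumptions, not a replacement for either tool: `time_command` and `compare` are hypothetical names, the run counts are arbitrary, and the fd and find invocations are simply the ones from the hyperfine command above.

```python
import statistics
import subprocess
import sys
import time
from typing import List, Sequence


def time_command(cmd: Sequence[str], warmup: int = 5, runs: int = 10) -> List[float]:
    """Run cmd `warmup` times untimed, then return `runs` wall-clock timings in seconds."""
    for _ in range(warmup):
        # Warmup runs fill the disk cache so that later runs are comparable.
        subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)

    timings: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        # Output is discarded so that terminal printing is not measured.
        subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings


def compare(base_path: str) -> None:
    """Time fd (no pattern, hidden and ignored files included) against find."""
    commands = (["fd", "-HI", "", base_path], ["find", base_path])
    for cmd in commands:
        timings = time_command(cmd)
        print(f"{' '.join(cmd)!r}: mean {statistics.mean(timings):.3f} s, "
              f"min {min(timings):.3f} s")
```

Calling `compare("/path/to/dir")` requires both fd and find on the PATH. Discarding the output is a deliberate choice here, since printing to the console can dominate the run time, as noted earlier in the thread.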
OS: Linux 4.14.3-1-ARCH
FS: ext4
fd: 6.0.0
Simple test:
Also, fd with a pattern is ~8x faster than fd without a pattern: