Check the pattern before anything else, since it doesn't require metadata #434

Merged (1 commit) on May 8, 2019

tavianator
Collaborator

@tavianator tavianator commented Apr 26, 2019

This should partially address #432 by decreasing the number of stat() calls:

$ strace -c -f ./fd-before '\.h$' /usr -j1 -S +1k >/dev/null
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 15.71    8.831948           7   1192279     46059 stat
$ strace -c -f ./fd-after '\.h$' /usr -j1 -S +1k >/dev/null
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
  7.92    1.972474          10    183907     46046 stat

Though it's not as few as possible:

$ strace -c -f find /usr -iname '*.h' -size +1k >/dev/null
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 19.01    0.946500           5    161649           newfstatat
$ strace -c -f bfs /usr -iname '*.h' -size +1k >/dev/null
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 13.73    0.406565           5     69005           statx

Performance is much better when metadata is required:

$ hyperfine ./fd-{before,after}" '\.h$' /usr -j1 -S +1k"
Benchmark #1: ./fd-before '\.h$' /usr -j1 -S +1k
  Time (mean ± σ):      4.623 s ±  0.154 s    [User: 1.465 s, System: 3.354 s]
  Range (min … max):    4.327 s …  4.815 s    10 runs

Benchmark #2: ./fd-after '\.h$' /usr -j1 -S +1k
  Time (mean ± σ):      2.650 s ±  0.058 s    [User: 1.258 s, System: 1.592 s]
  Range (min … max):    2.568 s …  2.723 s    10 runs

Summary
  './fd-after '\.h$' /usr -j1 -S +1k' ran
    1.74 ± 0.07 times faster than './fd-before '\.h$' /usr -j1 -S +1k'

While remaining the same when it's not:

$ hyperfine ./fd-{before,after}" '\.h$' /usr -j1"
Benchmark #1: ./fd-before '\.h$' /usr -j1
  Time (mean ± σ):      2.382 s ±  0.038 s    [User: 1.221 s, System: 1.286 s]
  Range (min … max):    2.325 s …  2.433 s    10 runs

Benchmark #2: ./fd-after '\.h$' /usr -j1
  Time (mean ± σ):      2.362 s ±  0.034 s    [User: 1.193 s, System: 1.294 s]
  Range (min … max):    2.307 s …  2.422 s    10 runs

Summary
  './fd-after '\.h$' /usr -j1' ran
    1.01 ± 0.02 times faster than './fd-before '\.h$' /usr -j1'
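
To illustrate why the ordering matters, here is a minimal Python sketch (not fd's actual Rust code) that counts explicit `stat` calls when the name pattern is checked before versus after the size filter. The helper names `search` and `build_tree`, the file names, and the "size >= threshold" rule are all assumptions made for the example.

```python
import os
import re
import tempfile


def search(root, pattern, min_size, name_first):
    """Return (sorted matching names, number of explicit os.stat calls).

    Hypothetical helper: with name_first=True the cheap name check runs
    before any metadata is fetched, so rejected names cost no stat call.
    """
    stat_calls = 0
    matches = []
    for dirpath, _dirs, filenames in os.walk(root):
        for name in filenames:
            if name_first and not pattern.search(name):
                continue  # rejected by name alone, no metadata needed
            stat_calls += 1
            size = os.stat(os.path.join(dirpath, name)).st_size
            # Assumed size rule for the sketch: keep files of at least min_size bytes.
            if size >= min_size and pattern.search(name):
                matches.append(name)
    return sorted(matches), stat_calls


def build_tree(root):
    """Create a tiny made-up tree: two .h files and three other files."""
    sizes = {"big.h": 4096, "small.h": 10, "a.c": 4096, "b.c": 4096, "c.txt": 4096}
    for name, size in sizes.items():
        with open(os.path.join(root, name), "wb") as f:
            f.write(b"x" * size)


with tempfile.TemporaryDirectory() as root:
    build_tree(root)
    pattern = re.compile(r"\.h$")
    before = search(root, pattern, 1024, name_first=False)
    after = search(root, pattern, 1024, name_first=True)

print("metadata first:", before)
print("name first:    ", after)
```

Both orderings return the same matches; only the number of metadata lookups changes, since files rejected by name are never stat'ed.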

@sharkdp
Owner

sharkdp commented Apr 28, 2019

Thank you very much for your contribution and for the detailed benchmarks. This looks great.

I want to take a more detailed look before I merge this, but this will probably take me a few days.

A small thing I noticed: You are comparing find|bfs /usr -name '*.h' -size +1k with the following fd call:

fd '.h$' -j1 /usr -S +1k

It will probably not make a huge difference, but the . will actually match any character instead of a literal . character. So this will also find .sh files, for example. In addition, fd uses smart-case by default, so it will also match against a capital H as last character. To make the comparison fair, I think you'd have to use the --case-sensitive/-s option and escape the .:

fd --case-sensitive '\.h$' -j1 /usr -S +1k

Even easier, if you are just searching for a file extension, you can use the --extension/-e option:

fd . -e h -j1 /usr -S +1k
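
The difference between the two patterns can be checked with a small Python sketch. It uses Python's `re` module rather than fd's regex engine, and models smart-case on an all-lowercase pattern as a plain case-insensitive match; the file names are made up for illustration.

```python
import re

# Hypothetical file names, chosen to show each kind of mismatch.
names = ["util.h", "build.sh", "CONFIG.H", "notes.txt"]

# Unescaped dot, case-insensitive (assumed stand-in for smart-case
# applied to an all-lowercase pattern).
loose = re.compile(r".h$", re.IGNORECASE)

# Escaped dot, case-sensitive.
strict = re.compile(r"\.h$")

loose_hits = [n for n in names if loose.search(n)]
strict_hits = [n for n in names if strict.search(n)]

print("loose: ", loose_hits)
print("strict:", strict_hits)
```

The loose pattern also accepts `build.sh` (the `.` matches `s`) and `CONFIG.H` (case is ignored), while the strict pattern accepts only `util.h`.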

@tavianator
Collaborator Author

All good points. I updated the commit message with some improved benchmarks (on a different computer), and also moved the extension check up with the pattern check.

@sharkdp sharkdp merged commit 5cbd840 into sharkdp:master May 8, 2019
@sharkdp
Owner

sharkdp commented May 8, 2019

Thank you very much!

@sharkdp sharkdp mentioned this pull request May 31, 2019
@tavianator tavianator deleted the check-name-first branch June 8, 2020 14:33