Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

walk: Encapsulate the buffering behavior in a struct #895

Merged
merged 2 commits into from
Dec 5, 2021

Conversation

tavianator
Copy link
Collaborator

The new ReceiverBuffer struct allows us to factor out the receiver
implementation into a number of helper methods. The new implementation
uses rx.{recv,recv_timeout} instead of a for loop, which enables us to
switch to streaming mode at the right time without waiting for more
results.

Fixes #868.

The new ReceiverBuffer struct allows us to factor out the receiver
implementation into a number of helper methods.  The new implementation
uses rx.{recv,recv_timeout} instead of a for loop, which enables us to
switch to streaming mode at the right time without waiting for more
results.

Fixes sharkdp#868.
src/walk.rs Outdated Show resolved Hide resolved
}

/// Wait for a result or state change.
fn poll(&mut self) -> Result<(), ExitCode> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a little weird that this function is poll but it isn't actually part of a Future. I don't have a better name though. It's just a little unfortunate that it happens to share a name with the most important function of a Future.

}

/// Wait for a result or state change.
fn poll(&mut self) -> Result<(), ExitCode> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fn poll(&mut self) -> Result<(), ExitCode> {
fn poll(&mut self) -> ControlFlow<ExitCode> {

Would probably be a better return type here, but that would require increasing the MSRV to 1.55.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't realize ControlFlow was stabilized. I'll leave it up to @sharkdp whether that MSRV bump would be acceptable.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not worth bumping MSRV just for this, but may be worth changing when we do bump it.

@tmccombs
Copy link
Collaborator

Have you done any benchmarks on this?

@tavianator
Copy link
Collaborator Author

@tmccombs Benchmarks seem totally neutral

This lets us avoid rust-lang/rust#39364, which
could potentially be seen now that we're using recv_timeout().
@tavianator tavianator merged commit 7fe4bfa into sharkdp:master Dec 5, 2021
@tavianator tavianator deleted the receiver-buffer branch December 5, 2021 16:56
@sharkdp
Copy link
Owner

sharkdp commented Dec 28, 2021

Thank you!

I'm always usually fine with bumping MSRV, because fd is not a library. And it feels like most people use rustup with a recent version of Rust anyways (instead of distribution-package).

@sharkdp
Copy link
Owner

sharkdp commented Dec 28, 2021

This PR (specifically a4bb734) introduces a pretty severe performance regression (20-30%) for me. For example:

Simple pattern

Benchmark 1: ./fd-8.3.0 '.*[0-9]\.jpg$' '/home/shark/Informatik/'
  Time (mean ± σ):     179.6 ms ±   1.7 ms    [User: 685.0 ms, System: 653.2 ms]
  Range (min … max):   176.5 ms … 181.6 ms    16 runs
 
Benchmark 2: ./fd-a4bb734482c7b '.*[0-9]\.jpg$' '/home/shark/Informatik/'
  Time (mean ± σ):     229.4 ms ±  14.8 ms    [User: 788.6 ms, System: 930.8 ms]
  Range (min … max):   206.9 ms … 253.0 ms    11 runs
 
Summary
  './fd-8.3.0 '.*[0-9]\.jpg$' '/home/shark/Informatik/'' ran
    1.28 ± 0.08 times faster than './fd-a4bb734482c7b '.*[0-9]\.jpg$' '/home/shark/Informatik/''

I found this with git bisect, i.e. none of the other commits before showed that kind of slowdown. I would appreciate if we could roll this back.

@tmccombs: I think you have also tried crossbeam-channel in the past? I vaguely remember that I tried it as well at some point and also saw a performance regression.

@tavianator
Copy link
Collaborator Author

@sharkdp Interesting, I did re-do the benchmarks after switching to crossbeam and found there was still no measurable difference:

Benchmark 1: ./fd-std . ~/code/linux
  Time (mean ± σ):     126.4 ms ±   1.9 ms    [User: 913.5 ms, System: 3832.9 ms]
  Range (min … max):   121.4 ms … 129.4 ms    23 runs
 
Benchmark 2: ./fd-crossbeam . ~/code/linux
  Time (mean ± σ):     126.3 ms ±   1.5 ms    [User: 971.0 ms, System: 3819.8 ms]
  Range (min … max):   123.7 ms … 128.8 ms    23 runs
 
Summary
  './fd-crossbeam . ~/code/linux' ran
    1.00 ± 0.02 times faster than './fd-std . ~/code/linux'

The pattern doesn't seem to affect it, whether there's many matches or few.

I wonder if there is a platform-specific thing going on, either OS or hardware. What platform(s) do you see the regression on? My tests were on a Threadripper 3960X running Linux.

@tavianator
Copy link
Collaborator Author

I also see no perf regression on an i7-6700HQ. So it's not that crossbeam is only fast on new hardware :)

@tavianator
Copy link
Collaborator Author

Anyway I can prepare a revert until we understand what's going on so it doesn't block a new release

tavianator added a commit to tavianator/fd that referenced this pull request Dec 28, 2021
@sharkdp [noticed][1] a quite severe performance degredation due to this
change.  Switch back to std::sync::mpsc until we can fix the performance
regression.

This reverts commit a4bb734.

[1]: sharkdp#895 (comment)
@tmccombs
Copy link
Collaborator

I never did benchmarks for crossbeam. I'm actually a little surprised if that is it, since crossbeam is supposed to be faster than std mpsc (although that might depend on how it is used)

@sharkdp
Copy link
Owner

sharkdp commented Dec 30, 2021

I wonder if there is a platform-specific thing going on, either OS or hardware. What platform(s) do you see the regression on? My tests were on a Threadripper 3960X running Linux.

Hmm. Strange. Im also on Linux, Intel Core i7-4700MQ. Maybe it depends on the target folder instead? I will try to do some more comprehensive benchmarks.

Thank you for opening the revert PR.

@tavianator
Copy link
Collaborator Author

@sharkdp I'd be curious to compare cargo flamegraph output between our machines

@sharkdp
Copy link
Owner

sharkdp commented Jan 6, 2022

Ok. Let's continue the discussion in #933.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

fd waits for the whole traversal if the only matches arrive within max_buffer_time
3 participants