Better backpressure #990
Conversation
Perfect and getting better every day.
/// When should queue-based backpressure start?
queue_start: f64,
/// Maximum queue-based delay
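To illustrate the fields quoted above, here is a minimal sketch of how such queue-based backpressure parameters might map to a write delay. The struct and method names besides `queue_start`, and all of the numbers, are assumptions for illustration, not Crucible's actual implementation:

```rust
/// Hypothetical backpressure configuration: `queue_start` mirrors the
/// field quoted above; the other names are illustrative assumptions.
struct BackpressureConfig {
    /// Fraction of the maximum queue length at which delays begin
    queue_start: f64,
    /// Maximum queue-based delay, in microseconds
    queue_max_delay_us: f64,
}

impl BackpressureConfig {
    /// Delay ramps linearly from 0 (at `queue_start`) up to
    /// `queue_max_delay_us` (at a completely full queue).
    fn delay_us(&self, queue_len: usize, max_queue_len: usize) -> u64 {
        let frac = queue_len as f64 / max_queue_len as f64;
        if frac <= self.queue_start {
            0
        } else {
            let scale = (frac - self.queue_start) / (1.0 - self.queue_start);
            (scale * self.queue_max_delay_us) as u64
        }
    }
}

fn main() {
    let cfg = BackpressureConfig {
        queue_start: 0.5,
        queue_max_delay_us: 10_000.0,
    };
    assert_eq!(cfg.delay_us(100, 1000), 0); // below the threshold: no delay
    println!("{}", cfg.delay_us(750, 1000)); // halfway up the ramp: 5000
}
```

A linear ramp like this is the simplest choice; the point of `queue_start` is that writes proceed at full speed until the queue is meaningfully loaded, then slow down smoothly instead of hitting a hard limit.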
Will this queue-based start consider IO types other than Write/WriteUnwritten when deciding to apply backpressure?
I don't see backpressure checks in other IO types, but I'm wondering if their presence in the work queue would trigger writes to slow down?
Good catch – you're correct that if the queue has a bunch of non-write jobs, the queue-based backpressure will take them into account when delaying writes (and backpressure doesn't delay anything else).
This is a little weird, but I'm inclined to leave it as-is; the fundamental goal here is to avoid killing the upstairs due to hitting MAX_ACTIVE_COUNT, which counts every job in the queue.
(I'm also not sure how non-write jobs would manage to clog up the queue!)
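A sketch of the behavior described in this exchange, with hypothetical types and constants (not Crucible's actual enum, limit, or numbers): every queued job counts toward the delay calculation, but only writes are ever delayed:

```rust
use std::time::Duration;

/// Hypothetical IO job types for illustration.
#[derive(Clone)]
enum IoType {
    Write,
    WriteUnwritten,
    Read,
    Flush,
}

/// Assumed queue limit for illustration; the real value may differ.
const MAX_ACTIVE_COUNT: usize = 2600;

/// Only writes are delayed, but the delay is driven by the *total*
/// number of queued jobs, since MAX_ACTIVE_COUNT counts every job.
fn backpressure_delay(op: &IoType, queue: &[IoType]) -> Duration {
    let is_write = matches!(op, IoType::Write | IoType::WriteUnwritten);
    if !is_write {
        return Duration::ZERO; // reads, flushes, etc. are never delayed
    }
    // Ramp from zero at half-full up to 10 ms as the queue approaches
    // MAX_ACTIVE_COUNT (arbitrary illustrative numbers).
    let start = MAX_ACTIVE_COUNT / 2;
    let len = queue.len();
    if len <= start {
        Duration::ZERO
    } else {
        let us = (10_000 * (len - start) as u64) / (MAX_ACTIVE_COUNT - start) as u64;
        Duration::from_micros(us)
    }
}

fn main() {
    // A queue dominated by reads still slows down a new write...
    let queue = vec![IoType::Read; 1950];
    let d = backpressure_delay(&IoType::Write, &queue);
    println!("{}", d.as_micros()); // 5000
    // ...while a read on the same queue is submitted immediately.
    assert_eq!(backpressure_delay(&IoType::Read, &queue), Duration::ZERO);
}
```

This matches the trade-off discussed above: counting all jobs is slightly unfair to writes when the queue fills with other work, but it directly protects against the queue ever reaching MAX_ACTIVE_COUNT.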
Yeah, I also think it's fine to leave as-is; I just wanted to be sure I understood that any IO on the queue can impact backpressure.
I tried very hard to get Linux to give up on a disk and call it dead with this PR, but wasn't able to, so I think we should merge it. We should probably tune the backpressure more in a future PR: right now, on an unencrypted dataset, I can fill about 2 GiB of buffer, which is a fair chunk of data and also takes about 15 seconds to drain. You won't see buffers get that big with an encrypted dataset right now due to other encryption overhead. At any rate, the exact final values we land on are up for debate and may change as perf changes, but the main thing is that this PR fixes the hard-fail state of #902, so we should get it in.
4c8dad7 to f52d3c3
This PR makes a few changes to backpressure:
tokio::sync::Mutex: this fixes an issue where N tasks could submit writes simultaneously, which defeated the previous backpressure implementation.
Running fio tests in a VM with writes of various sizes, we see that