
Better backpressure #990

Merged: 7 commits merged into oxidecomputer:main on Oct 12, 2023

Conversation

@mkeeter (Contributor) commented Oct 6, 2023

This PR makes a few changes to backpressure:

  • Backpressure delays are only applied to writes. Every other guest operation actually waits for the operation to finish, which serves as built-in backpressure (because longer operations will stall the guest for longer)
  • The backpressure delay now requires holding a tokio::sync::Mutex. This fixes an issue where N tasks could submit writes simultaneously, which defeated the previous backpressure implementation
  • The delay is now computed as the max of two values (see the sketch after this list):
    • The current implementation, which is based on DSW queue length
    • A new delay based on write bytes in flight
  • The latter prevents a host from queueing up too many large writes, which don't take much space in the queue but take a long time to execute – in some cases, long enough that the host will kick out the disk (disk times out and linux gives up on it if you send bunch of big writes and then ask to do a read #902)
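
A minimal sketch of the max-of-two-delays computation described above. The queue_start field matches the snippet discussed in review below; the other names (queue_max, bytes_start, bytes_max, max_delay) and the linear ramp are illustrative assumptions, not necessarily the actual crucible code:

use std::time::Duration;

/// Illustrative backpressure configuration (field names are assumptions)
struct BackpressureConfig {
    /// Queue length at which queue-based backpressure starts
    queue_start: f64,
    /// Queue length at which the queue-based delay reaches its maximum
    queue_max: f64,
    /// Write bytes in flight at which byte-based backpressure starts
    bytes_start: f64,
    /// Write bytes in flight at which the byte-based delay reaches its maximum
    bytes_max: f64,
    /// Maximum delay applied to a single write
    max_delay: Duration,
}

impl BackpressureConfig {
    fn delay(&self, jobs_in_queue: u64, write_bytes_in_flight: u64) -> Duration {
        // Scale a signal into [0, 1] once it passes its start threshold
        let frac =
            |v: f64, start: f64, max: f64| ((v - start) / (max - start)).clamp(0.0, 1.0);

        let queue_frac = frac(jobs_in_queue as f64, self.queue_start, self.queue_max);
        let bytes_frac = frac(write_bytes_in_flight as f64, self.bytes_start, self.bytes_max);

        // Take whichever signal asks for the longer delay
        self.max_delay.mul_f64(queue_frac.max(bytes_frac))
    }
}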

Running fio tests in a VM with writes of various sizes, we see that

  • small writes are limited by queue length (the black line shows where queue length backpressure kicks in)
  • larger writes stabilize with fewer writes in the queue, due to the bytes-in-flight limitation
[Screenshot: fio results plot (2023-10-06); the black line marks where queue-length backpressure kicks in]

@mkeeter mentioned this pull request Oct 6, 2023
@leftwo (Contributor) left a comment

Perfect and getting better every day.

upstairs/src/lib.rs

/// When should queue-based backpressure start?
queue_start: f64,
/// Maximum queue-based delay
Contributor:

Will this queue-based start consider IO types other than Write/WriteUnwritten when deciding to apply backpressure?

I don't see backpressure checks in other IO types, but I'm wondering if their presence in the work queue would trigger writes to slow down.

Contributor (Author):

Good catch – you're correct that if the queue has a bunch of non-write jobs, the queue-based backpressure will take them into account when delaying writes (and backpressure doesn't delay anything else).

This is a little weird, but I'm inclined to leave it as-is; the fundamental goal here is to avoid killing the upstairs due to hitting MAX_ACTIVE_COUNT, which counts every job in the queue.

(I'm also not sure how non-write jobs would manage to clog up the queue!)
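
To make that split concrete, here is a minimal sketch, assuming illustrative names (Guest, backpressure_lock, backpressure_sleep are not necessarily the actual crucible types): only writes take the delay, serialized behind a tokio::sync::Mutex so N simultaneous writers can't all skip the wait, while reads and flushes never sleep even though they still count toward the queue length that feeds the delay.

use std::sync::Arc;
use std::time::Duration;
use tokio::sync::Mutex;

/// Illustrative guest handle (names are assumptions, not the crucible types)
struct Guest {
    /// Serializes backpressure delays so concurrent writers wait in turn
    backpressure_lock: Arc<Mutex<()>>,
}

impl Guest {
    /// Called only on the Write / WriteUnwritten path; reads, flushes, etc.
    /// skip this entirely and rely on waiting for completion instead.
    async fn backpressure_sleep(&self, delay: Duration) {
        if delay > Duration::ZERO {
            // Hold the lock across the sleep so simultaneous write
            // submissions are delayed one after another, not all at once.
            let _guard = self.backpressure_lock.lock().await;
            tokio::time::sleep(delay).await;
        }
    }
}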

Contributor:

Yeah, I also think it's fine to leave this as-is; I just wanted to be sure I understood that any IO on the queue can impact backpressure.

@morlandi7 added this to the 3 milestone Oct 9, 2023
@faithanalog self-requested a review October 12, 2023 00:37
@faithanalog (Contributor) left a comment

I tried very hard to get Linux to give up on a disk and call it dead with this PR, but wasn't able to. I think we should merge it. We should probably tune the backpressure more in a future PR: right now on an unencrypted dataset I can fill about 2 gigs of buffer, which is a fair chunk of data and also takes about 15 seconds to drain. You won't see buffers get that big with an encrypted dataset right now due to other encryption overhead. At any rate, the exact final values we land on are up for debate and may change as perf changes, but the main thing is that this PR fixes the hard-fail state of #902, so we should get it in.

@mkeeter merged commit 37c89cc into oxidecomputer:main Oct 12, 2023
18 checks passed
@mkeeter deleted the better-backpressure branch October 12, 2023 14:26
Successfully merging this pull request may close these issues.

disk times out and linux gives up on it if you send bunch of big writes and then ask to do a read (#902)
4 participants