
Better backpressure #990

Merged: 7 commits merged into oxidecomputer:main on Oct 12, 2023

Conversation

@mkeeter (Contributor) commented Oct 6, 2023

This PR makes a few changes to backpressure:

  • Backpressure delays are only applied to writes. Every other guest operation actually waits for the operation to finish, which serves as built-in backpressure (because longer operations will stall the guest for longer)
  • The backpressure delay now requires holding a tokio::sync::Mutex. This fixes an issue where N tasks could submit writes simultaneously, which defeated the previous backpressure implementation
  • The delay is now computed as the max of two values (see the sketch after this list):
    • The current implementation, which is based on DSW queue length
    • A new delay based on write bytes in flight
  • The latter prevents a host from queueing up too many large writes, which don't take much space in the queue but take a long time to execute – in some cases, long enough that the host will kick out the disk (disk times out and linux gives up on it if you send bunch of big writes and then ask to do a read #902)
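
A minimal sketch of the max-of-two-delays computation described above. The queue_start field matches the snippet discussed in review below; the other names (queue_max, bytes_start, bytes_max, max_delay) and the linear ramp are illustrative assumptions, not necessarily the actual crucible code:

use std::time::Duration;

/// Illustrative backpressure configuration (field names are assumptions)
struct BackpressureConfig {
    /// Queue length at which queue-based backpressure starts
    queue_start: f64,
    /// Queue length at which the queue-based delay reaches its maximum
    queue_max: f64,
    /// Write bytes in flight at which byte-based backpressure starts
    bytes_start: f64,
    /// Write bytes in flight at which the byte-based delay reaches its maximum
    bytes_max: f64,
    /// Maximum delay applied to a single write
    max_delay: Duration,
}

impl BackpressureConfig {
    fn delay(&self, jobs_in_queue: u64, write_bytes_in_flight: u64) -> Duration {
        // Scale a signal into [0, 1] once it passes its start threshold
        let frac =
            |v: f64, start: f64, max: f64| ((v - start) / (max - start)).clamp(0.0, 1.0);

        let queue_frac = frac(jobs_in_queue as f64, self.queue_start, self.queue_max);
        let bytes_frac = frac(write_bytes_in_flight as f64, self.bytes_start, self.bytes_max);

        // Take whichever signal asks for the longer delay
        self.max_delay.mul_f64(queue_frac.max(bytes_frac))
    }
}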

Running fio tests in a VM with writes of various sizes, we see that

  • small writes are limited by queue length (the black line shows where queue length backpressure kicks in)
  • larger writes stabilize with fewer writes in the queue, due to the bytes-in-flight limitation
[Screenshot: fio results plot (2023-10-06); the black line marks where queue-length backpressure kicks in]

@mkeeter mentioned this pull request Oct 6, 2023
@leftwo (Contributor) left a comment

Perfect and getting better every day.

upstairs/src/lib.rs

/// When should queue-based backpressure start?
queue_start: f64,
/// Maximum queue-based delay
Contributor:

Will this queue-based start consider IO types other than Write/WriteUnwritten when deciding to apply backpressure?

I don't see backpressure checks in other IO types, but I'm wondering if their presence in the work queue would trigger writes to slow down.

Contributor (Author):

Good catch – you're correct that if the queue has a bunch of non-write jobs, the queue-based backpressure will take them into account when delaying writes (and backpressure doesn't delay anything else).

This is a little weird, but I'm inclined to leave it as-is; the fundamental goal here is to avoid killing the upstairs due to hitting MAX_ACTIVE_COUNT, which counts every job in the queue.

(I'm also not sure how non-write jobs would manage to clog up the queue!)
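
To make that split concrete, here is a minimal sketch, assuming illustrative names (Guest, backpressure_lock, backpressure_sleep are not necessarily the actual crucible types): only writes take the delay, serialized behind a tokio::sync::Mutex so N simultaneous writers can't all skip the wait, while reads and flushes never sleep even though they still count toward the queue length that feeds the delay.

use std::sync::Arc;
use std::time::Duration;
use tokio::sync::Mutex;

/// Illustrative guest handle (names are assumptions, not the crucible types)
struct Guest {
    /// Serializes backpressure delays so concurrent writers wait in turn
    backpressure_lock: Arc<Mutex<()>>,
}

impl Guest {
    /// Called only on the Write / WriteUnwritten path; reads, flushes, etc.
    /// skip this entirely and rely on waiting for completion instead.
    async fn backpressure_sleep(&self, delay: Duration) {
        if delay > Duration::ZERO {
            // Hold the lock across the sleep so simultaneous write
            // submissions are delayed one after another, not all at once.
            let _guard = self.backpressure_lock.lock().await;
            tokio::time::sleep(delay).await;
        }
    }
}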

Contributor:

Yeah, I also think it's fine to leave this as-is; I just wanted to be sure I understood that any IO on the queue can impact backpressure.

@morlandi7 added this to the 3 milestone Oct 9, 2023
@faithanalog self-requested a review October 12, 2023 00:37
@faithanalog (Contributor) left a comment

I tried very hard to get Linux to give up on a disk and call it dead with this PR, but wasn't able to. I think we should merge it. We should probably tune the backpressure more in a future PR: right now on an unencrypted dataset I can fill about 2 gigs of buffer, which is a fair chunk of data and also takes about 15 seconds to drain. You won't see buffers get that big with an encrypted dataset right now due to other encryption overhead. At any rate, the exact final values we land on are up for debate and may change as perf changes, but the main thing is that this PR fixes the hard-fail state of #902, so we should get it in.

@mkeeter merged commit 37c89cc into oxidecomputer:main Oct 12, 2023
18 checks passed
@mkeeter deleted the better-backpressure branch October 12, 2023 14:26
Successfully merging this pull request may close these issues.

disk times out and linux gives up on it if you send bunch of big writes and then ask to do a read (#902)
4 participants