-
Notifications
You must be signed in to change notification settings - Fork 552
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
io: add scheduler #16633
io: add scheduler #16633
Conversation
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
This is done for consistency with other test helpers. Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
3a19d2e
to
618ea33
Compare
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/45102#018dbe0b-a897-4940-aefe-a041bf3a7fbc ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/45104#018dbf38-b7db-410e-bccc-5ec8bd6cb5ff ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/45173#018dc9a5-bdbd-432b-9643-f87be3bd89c5 ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/45267#018dd39c-9af5-4561-ba4d-8eedb9f4c26a |
Converting to draft while I fix a memory usage issue with a large test. |
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
e9b14de
to
1375e4f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking pretty good! I found it a bit tough to dive into, but having stared for a while I think it mostly makes sense. I added some naming suggestions that I felt would've helped.
size_t waiters_{0}; | ||
seastar::semaphore nofiles_; | ||
intrusive_list<queue, &queue::cache_hook_> lru_; | ||
seastar::future<> monitor(queue*) noexcept; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: mind moving this above the members? Not sure if others feel this way but I always find it way easier to navigate a header if the methods are above the members.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no preference at all. fixed. i wonder if this should be added to our style-guide.md ?
src/v/io/scheduler.h
Outdated
* detailed description. | ||
*/ | ||
size_t waiters_{0}; | ||
seastar::semaphore nofiles_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question/nit: is this "number of files/no. files" or "no files"? A comment describing what the semaphore protects would be appreciated. Or maybe name it available_file_units_
or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also it might be worth considering using ssx::semaphore
for improved debuggability
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its number of open files. i've renamed it to be clearler.
switched to ssx::semaphore. forgot it exists!
src/v/io/scheduler.h
Outdated
* see scheduler::monitor implementation for more details. | ||
*/ | ||
seastar::future<> monitor_{seastar::make_ready_future<>()}; | ||
seastar::semaphore sem_{0}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: could you add a comment describing what this semaphore is meant to be used for (and maybe any special considerations to be made when thinking about the nofile_ semaphore, if any)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one thought: perhaps this should be named has_work_sem
or somesuch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addressed both comments
src/v/io/scheduler.h
Outdated
*/ | ||
seastar::future<> monitor_{seastar::make_ready_future<>()}; | ||
seastar::semaphore sem_{0}; | ||
seastar::semaphore_units<> units_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: at least jumping into this code for the first time, I think it'd be helpful to name this scheduler_units_
or nofiles_units_
or somesuch to set readers' expectations on what usage should look like
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
src/v/io/scheduler.cc
Outdated
co_return; | ||
} | ||
|
||
co_await queue->sem_.wait(std::max<size_t>(queue->sem_.current(), 1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little confused by this. What's the significance of using a size-0 semaphore vs a condition variable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added a comment into the code to explain. basically a condition variable will drop wake-ups if there are no waiters. fix that with a predicate, but in this case the predicate isn't a simple flag or two to check. so we can use a semaphore to ensure that the loop will run again if its signaled while the worker is busy doing other things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah got it, thanks for clarifying. A while ago we found that a similar (maybe the same) concern had addressed in seastar recently.
Is this the scenario you're thinking about?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, that's interesting. Seems they addressed that 2 years ago and it hasn't yet solidified in my brain. Part of that is because I don't think pthread condition variable has this property. Anyway, that's good to know. I'll likely switch this back to condition variable then in a later PR.
src/v/io/scheduler.h
Outdated
* a queue to close in order to enforce the maximum open queue limit. | ||
*/ | ||
intrusive_list_hook sched_hook_; | ||
intrusive_list_hook cache_hook_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: again might just be a first-time-reader thing, but I think it'd be more easily followable naming this lru_hook_
or renaming lru_
to cache_
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
src/v/io/scheduler.h
Outdated
*/ | ||
size_t waiters_{0}; | ||
seastar::semaphore nofiles_; | ||
intrusive_list<queue, &queue::cache_hook_> lru_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment suggestion: "LRU-ordered list of currently opened queues"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
/* | ||
* this case is possible, if for example, `nofiles_` was | ||
* initialized with 0 units. | ||
*/ | ||
waiters_++; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused by this: I suspect a subsequent change will allow changing the semaphore to be adjustable dynamically, so for now, it's odd to see. Assuming adjusting the semaphore will be the way we'd be able to unblock the get_units
call, shouldn't we also be decrementing waiters_
once we add ourselves to lru_
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i added a comment about waiters being used to handle cases that arise when the semaphore units are dynamically adjusted.
shouldn't we also be decrementing waiters_ once we add ourselves to lru_?
I think that the net effect is that this is happening, but we can't easily have a worker decrement uncondtionally when it gets added to the lru list (even if it knows it contributed a +1) because its hard to know if another worker woekup and woke up some waiters (the cooperative wake up at the top is a while loop that wakes up as much as possible). not being careful would have us double count decrements.
i'm not convinced its 100% correct, but here is a simple example:
say we have open file limit as 0 and three queues show up. lru is empty, so waiters will bump to 3. then we deposit one unit into the semaphpore and a waiter wakes up. it'll add itself to the lru and loop back around and wake up waiters. the waiters are still 3, but the lru only has 1 entry, so so that limits the evictions to 1 (correctly, the currently running worker is the only open file and now itll self-close so another worker runs). waiters is decremented to 2 reflecting reality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah thanks for the example! Very helpful. I'd skipped over the fact that we can still proceed with opening without waiters_
being decremented, as long as open_file_limit_
is added to. I initially thought another monitor had to pop from the lru_
for us to proceed, which was confusing because lru_
here is empty!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right. i guess waiters has kind of a optimistic property in that it's incremented before the waiter waits, and decremented before the waiter releases.
src/v/io/tests/scheduler_test.cc
Outdated
for (auto& driver : drivers) { | ||
try { | ||
seastar::remove_file(driver->queue.path().string()).get(); | ||
} catch (...) { | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe stick this in a deferred block so it runs even if the test fails?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed!
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
*/ | ||
while (waiters_ > 0 && !lru_.empty()) { | ||
waiters_--; | ||
auto& queue = lru_.front(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
queue
local var shadows the parameter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, gross. good catch! at least they are different types. i wonder if that is why clangd wasn't flagging this in my ide.
queue->sem_.signal(); | ||
auto monitor = std::exchange( | ||
queue->monitor_, seastar::make_ready_future<>()); | ||
co_await std::move(monitor); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: it feels like this could be easier to read if the gate is used to control the lifetime of the background fiber
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you mean use a gate instead?
src/v/io/scheduler.cc
Outdated
lru_.pop_front(); | ||
queue.sem_.signal(); | ||
} | ||
units = co_await seastar::get_units(nofiles_, 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do I understand correctly that the invariant is that if the queue is in the lru_
list it should have units so if we are waking up the queue it's guaranteed that we will eventually acquire units on this line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes. but i think as you point out, a queue may still hold on to the units even if it isn't on the lru
list. this is a temporary state, and not being on the list is a signal to the queue monitor that when it runs it should release its units after stopping the io_queue and closing the file.
Adds a simple scheduler to the I/O subsystem. The purpose of the scheduler is to manage all of the active I/O queues (e.g. implementing global policies like preventing starvation or priority). However, this initial version of the scheduler only implements a policy that limits the total number of open file handles.
Backports Required
Release Notes