
scx_rustland: reduce scheduler overhead #56

Merged
merged 2 commits into sched-ext:main from scx-rustland-reduce-scheduler-overhead on Dec 29, 2023

Conversation


@arighi arighi commented Dec 29, 2023

  • always consider the CPU where the scheduler is running as idle
  • fix a bug that was causing the user-space scheduler to spin and improve the logic to activate the user-space scheduler, preventing unnecessary activations

These changes make scx_rustland more reliable and significantly reduce the scheduler overhead.


@htejun htejun left a comment


The commit messages are titled scx_userland. Can you please update them?

@@ -489,7 +505,7 @@ void BPF_STRUCT_OPS(rustland_update_idle, s32 cpu, bool idle)
 	 * Moreover, kick the CPU to make it immediately ready to accept
 	 * dispatched tasks.
 	 */
-	if (__sync_fetch_and_add(&nr_queued, 0)) {
+	if (nr_queued || nr_scheduled) {
htejun (Contributor) commented on the diff:
It may be worth explaining how this is interlocked with userland component and a situation where userland component is not invoked even while there are tasks to process can't happen. nr_queued is basically used as a boolean here indicating "something happened since userland looked at it, so better call them at least once". And then while the userland is running it updates its internal state through nr_scheduled. As long as update_idle is called after each userland run, this should be safe, right?

@arighi arighi (Collaborator, Author) commented Dec 29, 2023

@htejun comment updated adding more details. Let me know if it seems clear enough. Thanks!

Edit: ...and changed scx_userland -> scx_rustland in the commit messages, thanks for noticing!

@arighi arighi force-pushed the scx-rustland-reduce-scheduler-overhead branch from 249b557 to e79b5fa on December 29, 2023 20:09
Considering the CPU where the user-space scheduler is running as busy
doesn't provide any benefit, as the scheduler consistently dispatches
tasks equal to the number of idle CPUs and then yields (therefore its
own CPU should be considered idle).

This also reduces the overall user-space scheduler CPU utilization, especially when the system is mostly idle, without introducing any measurable performance regression.

Measuring the average CPU utilization of a (mostly) idle system over a
time period of 60 sec:

 - without this patch: 5.41% avg cpu util
 - with this patch:   2.26% avg cpu util

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
We want to activate the user-space scheduler only when there are pending
tasks that require scheduling actions.

To do so, we keep track of queued tasks via nr_queued, which is
incremented in .enqueue() when a task is sent to the user-space
scheduler and decremented in .dispatch() when a task is dispatched.

However, an imbalance can occur if the same pid is sent to the
scheduler multiple times, because the scheduler stores all tasks by
their unique pid.

When this happens, nr_queued is never decremented back to 0, causing
the user-space scheduler to spin constantly, even when there is no work
to do.

To prevent this, split nr_queued into nr_queued and nr_scheduled. The
former is updated by the BPF component every time a task is sent to the
scheduler, and it is up to the user-space scheduler to reset the
counter when the queue is fully drained. The latter is maintained by
the user-space scheduler and represents the number of tasks still being
processed by the scheduler and waiting to be dispatched.

The sum nr_queued + nr_scheduled is called nr_waiting, and we can rely
on this metric to determine whether the user-space scheduler has
pending work to do.

This change makes scx_rustland more reliable and significantly reduces
the CPU usage of the user-space scheduler by eliminating many
unnecessary activations.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
@arighi arighi force-pushed the scx-rustland-reduce-scheduler-overhead branch from e79b5fa to e90bc92 on December 29, 2023 20:15
@htejun htejun merged commit 474a149 into sched-ext:main Dec 29, 2023
1 check passed
2 participants