
scx_rustland: clarify and improve BPF / userspace interlocking #48

Merged: 4 commits into sched-ext:main on Dec 27, 2023

Conversation

@arighi (Collaborator) commented Dec 27, 2023

A set of interlocking improvements for scx_rustland:

  • clarify interlocking between the BPF dispatcher and the user-space scheduler
  • introduce get/set_cpu_owner() primitives as a baseline for implementing better idle tracking in the future
  • move user-space scheduler activation from .enqueue() to .stopping() (and activate only when needed) to reduce unnecessary user-space activations
  • always bypass user-space scheduling for kthreads

I have also explored the possibility of moving the user-space scheduler activation from .stopping() to .update_idle(), but the scheduler performs quite poorly with this change: it seems that CPUs are not fully utilized even in the presence of CPU-intensive workloads, as if the user-space scheduler isn't feeding the dispatcher fast enough. My guess is that activating the user-space scheduler in .update_idle() is a bit too late and we get bubbles in the whole scheduling pipeline, compared to doing the activation in .stopping(). But this is something that I'd like to explore more in the next set of improvements for scx_rustland.

BPF doesn't have a full memory model yet, and while strict atomicity might
not be necessary in this context, it is advisable to make the interlocking
model clearer.

To achieve this, provide the following primitives to operate on
usersched_needed:

  static void set_usersched_needed(void)

  static bool test_and_clear_usersched_needed(void)

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
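
For illustration, a minimal sketch of how these two primitives could be implemented, assuming a single global usersched_needed flag and the __sync_*() atomic builtins supported by clang's BPF backend; the exact code in the commit may differ:

  /* Assumes the usual sched_ext BPF scaffolding (vmlinux.h / scx common headers). */

  /* Set when the dispatcher needs the user-space scheduler to run. */
  static volatile u32 usersched_needed;

  /* Request an activation of the user-space scheduler. */
  static void set_usersched_needed(void)
  {
          __sync_fetch_and_or(&usersched_needed, 1);
  }

  /* Consume a pending activation request, returning true if one was set. */
  static bool test_and_clear_usersched_needed(void)
  {
          return __sync_fetch_and_and(&usersched_needed, 0) == 1;
  }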
Provide primitives to get and set CPU ownership (get_cpu_owner() /
set_cpu_owner()) in the BPF part. This improves code readability, and these
primitives can be used by the BPF part as a baseline for implementing better
CPU idle tracking in the future.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
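
For illustration, one possible shape for these primitives, using a plain BPF array map indexed by CPU id; the cpu_owner map name and the MAX_CPUS bound are assumptions of this sketch, not necessarily what the commit uses:

  /* Assumes the usual sched_ext BPF scaffolding (vmlinux.h / scx common headers). */
  #define MAX_CPUS 1024   /* illustrative upper bound on possible CPUs */

  /* pid of the task currently owning each CPU, or 0 if the CPU is idle. */
  struct {
          __uint(type, BPF_MAP_TYPE_ARRAY);
          __uint(max_entries, MAX_CPUS);
          __type(key, u32);
          __type(value, u32);
  } cpu_owner SEC(".maps");

  static u32 get_cpu_owner(s32 cpu)
  {
          u32 key = cpu, *owner;

          owner = bpf_map_lookup_elem(&cpu_owner, &key);
          return owner ? *owner : 0;
  }

  static void set_cpu_owner(s32 cpu, u32 pid)
  {
          u32 key = cpu, *owner;

          owner = bpf_map_lookup_elem(&cpu_owner, &key);
          if (owner)
                  *owner = pid;
  }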
Trigger the user-space scheduler only upon a task's CPU release event
(avoiding its activation during each enqueue event) and only if there
are tasks waiting to be processed by the user-space scheduler.

This should avoid unnecessary calls to the user-space scheduler, reducing
the overall overhead of the scheduler.

Moreover, rename nr_enqueues to nr_queued and use it to store the number of
tasks currently queued to the user-space scheduler (i.e., waiting to be
dispatched).

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
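
For illustration, a sketch of the .stopping() activation described above; rustland_stopping is an assumed callback name, nr_queued is the counter introduced by this commit (updated elsewhere on enqueue/dispatch), and set_usersched_needed() is the primitive from the first commit:

  /* Tasks queued to the user-space scheduler, waiting to be dispatched. */
  static volatile u64 nr_queued;

  void BPF_STRUCT_OPS(rustland_stopping, struct task_struct *p, bool runnable)
  {
          /*
           * Wake up the user-space scheduler only when it actually has
           * queued tasks to process; otherwise releasing the CPU needs
           * no action.
           */
          if (nr_queued)
                  set_usersched_needed();
  }

The dispatch path can then consume the flag via test_and_clear_usersched_needed() and run the user-space scheduler task only when needed.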
Adding extra overhead to any kthread can potentially slow down the entire
system, so make sure this never happens by dispatching all kthreads (not
just the per-CPU ones) directly on their local CPU, bypassing the
user-space scheduler.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
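
For illustration, a sketch of the enqueue-path bypass described above, assuming an is_kthread() helper (e.g. checking p->flags & PF_KTHREAD, as visible in the review diff below) and the standard scx_bpf_dispatch() kfunc with the SCX_DSQ_LOCAL destination:

  void BPF_STRUCT_OPS(rustland_enqueue, struct task_struct *p, u64 enq_flags)
  {
          /*
           * Dispatch all kthreads directly on the local DSQ of the CPU
           * they are enqueued on, bypassing the user-space scheduler, so
           * that no extra scheduling latency is added to kernel threads.
           */
          if (is_kthread(p)) {
                  scx_bpf_dispatch(p, SCX_DSQ_LOCAL, SCX_SLICE_DFL, enq_flags);
                  return;
          }
          /* ... otherwise queue the task to the user-space scheduler ... */
  }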
@htejun (Contributor) left a comment


For the update_idle bubbles, I wonder whether what you need is scx_bpf_kick_cpu() on self. There's nothing triggering immediate dispatch otherwise. Another thing which might be interesting is that as the BPF and usersched parts get more tightly interlocked, it may make sense to have per-cpu usersched threads. Otherwise, some CPUs may go idle while waiting for usersched thread. If there's a dedicated usersched thread per cpu, every cpu can jump into the usersched thread when it runs out of things to do and then usersched can figure out synchronization (using raw spinlocks) and so on. While conceptually neat in that it really just moves per-cpu scheduling logic to userspace, this might be too much complexity for the purpose of this scheduler, so this is more dumping of the train of thought than anything.

  - if (is_kthread(p) && p->nr_cpus_allowed == 1)
  + if (is_kthread(p))
@htejun (Contributor) commented on the diff above:

Not an immediate blocker but this might lead to not-so-great behaviors for things like unbound workqueue workers that can consume a lot of CPU cycles (e.g. encryption workers). They actually need scheduling.

@arighi (Collaborator, Author) replied:

> Not an immediate blocker but this might lead to not-so-great behaviors for things like unbound workqueue workers that can consume a lot of CPU cycles (e.g. encryption workers). They actually need scheduling.

Good point! Yes, in that case they definitely need scheduling, I'll revert that in the next set of changes.
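
For reference, the revert being discussed would simply restore the stricter per-CPU condition visible in the diff above in the enqueue path, so that only kthreads pinned to a single CPU bypass the user-space scheduler:

  /* Bypass the user-space scheduler only for per-CPU kthreads. */
  if (is_kthread(p) && p->nr_cpus_allowed == 1) {
          scx_bpf_dispatch(p, SCX_DSQ_LOCAL, SCX_SLICE_DFL, enq_flags);
          return;
  }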

@htejun merged commit 990cd05 into sched-ext:main on Dec 27, 2023
1 check passed
@arighi (Collaborator, Author) commented Dec 28, 2023

> For the update_idle bubbles, I wonder whether what you need is scx_bpf_kick_cpu() on self. There's nothing triggering immediate dispatch otherwise. Another thing which might be interesting is that as the BPF and usersched parts get more tightly interlocked, it may make sense to have per-cpu usersched threads. Otherwise, some CPUs may go idle while waiting for usersched thread. If there's a dedicated usersched thread per cpu, every cpu can jump into the usersched thread when it runs out of things to do and then usersched can figure out synchronization (using raw spinlocks) and so on. While conceptually neat in that it really just moves per-cpu scheduling logic to userspace, this might be too much complexity for the purpose of this scheduler, so this is more dumping of the train of thought than anything.

I'll do some experiments with scx_bpf_kick_cpu(); it should be enough to prevent the bubbles and it shouldn't add too much complexity. The per-CPU usersched is an interesting idea, and I'll try some experiments with that as well, but unless it provides consistent performance improvements I'd stick with simplicity. :)
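
For reference, a minimal sketch of the kind of .update_idle() kick being discussed; rustland_update_idle is an assumed callback name, nr_queued comes from the commits above, and passing 0 as the flags argument to scx_bpf_kick_cpu() simply asks the target CPU to go through the scheduler again:

  void BPF_STRUCT_OPS(rustland_update_idle, s32 cpu, bool idle)
  {
          /*
           * When a CPU goes idle while tasks are still queued to the
           * user-space scheduler, request an activation and kick the CPU
           * so it doesn't sit idle waiting for the next scheduling event.
           */
          if (idle && nr_queued) {
                  set_usersched_needed();
                  scx_bpf_kick_cpu(cpu, 0);
          }
  }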
