
scx_rustland: clarify and improve BPF / userspace interlocking #48

Merged: 4 commits into sched-ext:main on Dec 27, 2023

Conversation

@arighi (Collaborator) commented Dec 27, 2023

A set of interlocking improvements for scx_rustland:

  • clarify interlocking between the BPF dispatcher and the user-space scheduler
  • introduce get/set_cpu_owner() primitives as a baseline for implementing better idle tracking in the future
  • move user-space scheduler activation from .enqueue() to .stopping() (and activate only when needed) to reduce unnecessary user-space activations
  • always bypass user-space scheduling for kthreads

I have also explored the possibility of moving the user-space scheduler activation from .stopping() to .update_idle(), but the scheduler performs quite poorly with this change: it seems that CPUs are not fully utilized even in the presence of CPU-intensive workloads, as if the user-space scheduler isn't feeding the dispatcher fast enough. My guess is that activating the user-space scheduler in .update_idle() is a bit too late and we get bubbles in the whole scheduling pipeline, compared to doing the activation in .stopping(). But this is something that I'd like to explore more in the next set of improvements for scx_rustland.

BPF doesn't have a full memory model yet, and while strict atomicity might
not be necessary in this context, it is advisable to make the interlocking
model clearer.

To achieve this, provide the following primitives to operate on
usersched_needed:

  static void set_usersched_needed(void)

  static bool test_and_clear_usersched_needed(void)

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
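
For illustration, a minimal sketch of how these two primitives could be implemented, assuming a single global usersched_needed flag and the __sync_*() atomic builtins supported by clang's BPF backend; the exact code in the commit may differ:

  /* Assumes the usual sched_ext BPF scaffolding (vmlinux.h / scx common headers). */

  /* Set when the dispatcher needs the user-space scheduler to run. */
  static volatile u32 usersched_needed;

  /* Request an activation of the user-space scheduler. */
  static void set_usersched_needed(void)
  {
          __sync_fetch_and_or(&usersched_needed, 1);
  }

  /* Consume a pending activation request, returning true if one was set. */
  static bool test_and_clear_usersched_needed(void)
  {
          return __sync_fetch_and_and(&usersched_needed, 0) == 1;
  }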
Provide primitives to get and set CPU ownership (get_cpu_owner() /
set_cpu_owner()) in the BPF part. This improves code readability, and these
primitives can be used by the BPF part as a baseline for implementing better
CPU idle tracking in the future.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
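
For illustration, one possible shape for these primitives, using a plain BPF array map indexed by CPU id; the cpu_owner map name and the MAX_CPUS bound are assumptions of this sketch, not necessarily what the commit uses:

  /* Assumes the usual sched_ext BPF scaffolding (vmlinux.h / scx common headers). */
  #define MAX_CPUS 1024   /* illustrative upper bound on possible CPUs */

  /* pid of the task currently owning each CPU, or 0 if the CPU is idle. */
  struct {
          __uint(type, BPF_MAP_TYPE_ARRAY);
          __uint(max_entries, MAX_CPUS);
          __type(key, u32);
          __type(value, u32);
  } cpu_owner SEC(".maps");

  static u32 get_cpu_owner(s32 cpu)
  {
          u32 key = cpu, *owner;

          owner = bpf_map_lookup_elem(&cpu_owner, &key);
          return owner ? *owner : 0;
  }

  static void set_cpu_owner(s32 cpu, u32 pid)
  {
          u32 key = cpu, *owner;

          owner = bpf_map_lookup_elem(&cpu_owner, &key);
          if (owner)
                  *owner = pid;
  }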
Trigger the user-space scheduler only upon a task's CPU release event
(avoiding its activation during each enqueue event) and only if there
are tasks waiting to be processed by the user-space scheduler.

This should avoid unnecessary calls to the user-space scheduler, reducing
the overall overhead of the scheduler.

Moreover, rename nr_enqueues to nr_queued and use it to store the number of
tasks currently queued to the user-space scheduler (i.e., waiting to be
dispatched).

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
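
For illustration, a sketch of the .stopping() activation described above; rustland_stopping is an assumed callback name, nr_queued is the counter introduced by this commit (updated elsewhere on enqueue/dispatch), and set_usersched_needed() is the primitive from the first commit:

  /* Tasks queued to the user-space scheduler, waiting to be dispatched. */
  static volatile u64 nr_queued;

  void BPF_STRUCT_OPS(rustland_stopping, struct task_struct *p, bool runnable)
  {
          /*
           * Wake up the user-space scheduler only when it actually has
           * queued tasks to process; otherwise releasing the CPU needs
           * no action.
           */
          if (nr_queued)
                  set_usersched_needed();
  }

The dispatch path can then consume the flag via test_and_clear_usersched_needed() and run the user-space scheduler task only when needed.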
Adding extra overhead to any kthread can potentially slow down the entire
system, so make sure this never happens by dispatching all kthreads (not
just the per-CPU ones) directly on their local CPU, bypassing the
user-space scheduler.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
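
For illustration, a sketch of the enqueue-path bypass described above, assuming an is_kthread() helper (e.g. checking p->flags & PF_KTHREAD, as visible in the review diff below) and the standard scx_bpf_dispatch() kfunc with the SCX_DSQ_LOCAL destination:

  void BPF_STRUCT_OPS(rustland_enqueue, struct task_struct *p, u64 enq_flags)
  {
          /*
           * Dispatch all kthreads directly on the local DSQ of the CPU
           * they are enqueued on, bypassing the user-space scheduler, so
           * that no extra scheduling latency is added to kernel threads.
           */
          if (is_kthread(p)) {
                  scx_bpf_dispatch(p, SCX_DSQ_LOCAL, SCX_SLICE_DFL, enq_flags);
                  return;
          }
          /* ... otherwise queue the task to the user-space scheduler ... */
  }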
@htejun (Contributor) left a comment


For the update_idle bubbles, I wonder whether what you need is scx_bpf_kick_cpu() on self. There's nothing triggering immediate dispatch otherwise. Another thing which might be interesting is that as the BPF and usersched parts get more tightly interlocked, it may make sense to have per-cpu usersched threads. Otherwise, some CPUs may go idle while waiting for usersched thread. If there's a dedicated usersched thread per cpu, every cpu can jump into the usersched thread when it runs out of things to do and then usersched can figure out synchronization (using raw spinlocks) and so on. While conceptually neat in that it really just moves per-cpu scheduling logic to userspace, this might be too much complexity for the purpose of this scheduler, so this is more dumping of the train of thought than anything.

  - if (is_kthread(p) && p->nr_cpus_allowed == 1)
  + if (is_kthread(p))
@htejun (Contributor) commented on the diff above:

Not an immediate blocker but this might lead to not-so-great behaviors for things like unbound workqueue workers that can consume a lot of CPU cycles (e.g. encryption workers). They actually need scheduling.

@arighi (Collaborator, Author) replied:

> Not an immediate blocker but this might lead to not-so-great behaviors for things like unbound workqueue workers that can consume a lot of CPU cycles (e.g. encryption workers). They actually need scheduling.

Good point! Yes, in that case they definitely need scheduling, I'll revert that in the next set of changes.
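
For reference, the revert being discussed would simply restore the stricter per-CPU condition visible in the diff above in the enqueue path, so that only kthreads pinned to a single CPU bypass the user-space scheduler:

  /* Bypass the user-space scheduler only for per-CPU kthreads. */
  if (is_kthread(p) && p->nr_cpus_allowed == 1) {
          scx_bpf_dispatch(p, SCX_DSQ_LOCAL, SCX_SLICE_DFL, enq_flags);
          return;
  }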

@htejun merged commit 990cd05 into sched-ext:main on Dec 27, 2023
1 check passed
@arighi (Collaborator, Author) commented Dec 28, 2023

> For the update_idle bubbles, I wonder whether what you need is scx_bpf_kick_cpu() on self. There's nothing triggering immediate dispatch otherwise. Another thing which might be interesting is that as the BPF and usersched parts get more tightly interlocked, it may make sense to have per-cpu usersched threads. Otherwise, some CPUs may go idle while waiting for usersched thread. If there's a dedicated usersched thread per cpu, every cpu can jump into the usersched thread when it runs out of things to do and then usersched can figure out synchronization (using raw spinlocks) and so on. While conceptually neat in that it really just moves per-cpu scheduling logic to userspace, this might be too much complexity for the purpose of this scheduler, so this is more dumping of the train of thought than anything.

I'll do some experiments with scx_bpf_kick_cpu(); it should be enough to prevent the bubbles and it shouldn't add too much complexity. The per-CPU usersched is an interesting idea, and I'll try some experiments with that as well, but unless it provides consistent performance improvements I'd stick with simplicity. :)
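
For reference, a minimal sketch of the kind of .update_idle() kick being discussed; rustland_update_idle is an assumed callback name, nr_queued comes from the commits above, and passing 0 as the flags argument to scx_bpf_kick_cpu() simply asks the target CPU to go through the scheduler again:

  void BPF_STRUCT_OPS(rustland_update_idle, s32 cpu, bool idle)
  {
          /*
           * When a CPU goes idle while tasks are still queued to the
           * user-space scheduler, request an activation and kick the CPU
           * so it doesn't sit idle waiting for the next scheduling event.
           */
          if (idle && nr_queued) {
                  set_usersched_needed();
                  scx_bpf_kick_cpu(cpu, 0);
          }
  }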
