
scx_rustland: refactoring #68

Merged: 5 commits into main on Jan 4, 2024

Conversation

@arighi (Collaborator) commented on Jan 4, 2024

Minor improvements:

  • scx_rustland: make dispatcher more robust
  • scx_rustland: remove unnecessary scx_bpf_dispatch_nr_slots() check
  • scx_rustland: introduce nr_failed_dispatches

Code refactoring:

  • scx_rustland: provide an abstraction layer for the BPF component
  • scx_rustland: update comments and documentation in the BPF part

There is basically no functional change in this pull request, just some code refactoring. The important part is the new abstraction over the BPF component, which makes the scheduler code much more readable and understandable.

In the future we could also consider moving this BPF abstraction to a more generic place, with proper API documentation, to make it possible to implement scx user-space schedulers in a more high-level, abstracted way.

scx_rustland: update comments and documentation in the BPF part

No functional change, only a little polishing, including updates to
comments and documentation to align with the latest changes in the code.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
scx_rustland: remove unnecessary scx_bpf_dispatch_nr_slots() check

In the dispatch callback we can dispatch tasks to any CPU, according to
the scheduler's decisions, so there is no reason to check the available
dispatch slots only on the current CPU to determine whether we need to
stop dispatching tasks.

Since the scheduler is aware of the idle state of the CPUs (via the CPU
ownership map) it has all the information to automatically regulate the
flow of dispatched tasks and not overflow the dispatch slots, therefore
it is safe to remove this check.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
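
As an aside, here is a minimal Rust sketch (not code from this PR) of how the user-space side can use the CPU ownership map to bound how many tasks it dispatches per iteration; it assumes the get_cpu_pid() accessor introduced later in this series, that a returned PID of 0 means the CPU is idle, and a caller-provided nr_cpus count:

```rust
/// Count the CPUs currently not owned by any task, according to the CPU
/// ownership map exposed by the BPF component (assumption: get_cpu_pid()
/// returns 0 for an idle CPU).
fn count_idle_cpus(bpf: &BpfScheduler, nr_cpus: i32) -> usize {
    (0..nr_cpus).filter(|&cpu| bpf.get_cpu_pid(cpu) == 0).count()
}

/// Upper bound on how many tasks to dispatch in one batch: never push more
/// tasks than there are idle CPUs, so the per-CPU dispatch slots in the BPF
/// component are not overflowed.
fn dispatch_budget(bpf: &BpfScheduler, nr_cpus: i32) -> usize {
    count_idle_cpus(bpf, nr_cpus).max(1)
}
```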
scx_rustland: make dispatcher more robust

We always try to use the current CPU (from the .dispatch() callback) to
run the user-space scheduler itself, and if the current CPU is not usable
(according to the cpumask) we just re-use the previously used CPU.

However, if the previously used CPU is also not usable, we may trigger
the following error:

 sched_ext: runtime error (SCX_DSQ_LOCAL[_ON] verdict target cpu 4 not allowed for scx_rustland[256201])

Potentially this can also happen with any task, so improve the dispatch
logic as follows:

 - dispatch on the target CPU, if usable
 - otherwise dispatch on the previously used CPU, if usable
 - otherwise dispatch on the global DSQ

Moreover, rename dispatch_on_cpu() -> dispatch_task() for better
clarity.

This should be enough to handle all the possible decisions made by the
user-space scheduler, making the dispatcher more robust.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>

@Byte-Lab (Contributor) left a comment:

Looks great, thanks! I left a couple of nits, but this LGTM. Please feel free to merge once they're addressed.

Comment on lines 294 to 310
dbg_msg("%s: pid=%d global", __func__, p->pid);
scx_bpf_dispatch(p, SCX_DSQ_GLOBAL, slice_ns, enq_flags);

@Byte-Lab (Contributor) commented:

I wonder if this should eventually be an error condition in the scheduler? Unfortunately we don't yet have an ergonomic API for looking at a task's cpumask in user space so it might be a bit premature.

Also, on the other hand, maybe it would make the scheduler unnecessarily brittle in that a task could exit or something while it's being dispatched from user space. So this fallback is probably fine, but maybe we should add a comment explaining why this could happen?

@arighi (Collaborator, Author) replied:

> I wonder if this should eventually be an error condition in the scheduler? Unfortunately we don't yet have an ergonomic API for looking at a task's cpumask in user space so it might be a bit premature.

Oh yes! Having an API to access the cpumask would be super useful... we could use that in the scheduler to avoid dispatching tasks on CPUs that can't be used.

Right now we could report that as a warning, maybe? I don't think we should error out the scheduler; it doesn't seem to be too critical a condition, considering that in the worst-case scenario we can simply use the global DSQ and everything will work just fine.

Moreover, even if we have access to the cpumask in the future, we may still do something similar, unless we also provide some mechanism to lock the cpumask.

> Also, on the other hand, maybe it would make the scheduler unnecessarily brittle in that a task could exit or something while it's being dispatched from user space. So this fallback is probably fine, but maybe we should add a comment explaining why this could happen?

Absolutely, I'll add a comment. Maybe even introduce an extra counter, like nr_dispatch_failures or something similar?

@Byte-Lab (Contributor) replied:

> Right now we could report that as a warning, maybe? I don't think we should error out the scheduler; it doesn't seem to be too critical a condition, considering that in the worst-case scenario we can simply use the global DSQ and everything will work just fine.

Yeah, agreed. And I don't think it necessarily indicates a bad / corrupted scheduler or anything anyways given that everything is async, so this is inherently a bit racy.

> Absolutely, I'll add a comment. Maybe even introduce an extra counter, like nr_dispatch_failures or something similar?

SGTM!

@@ -596,6 +624,12 @@ void BPF_STRUCT_OPS(rustland_disable, struct task_struct *p)

/*
* Heartbeat scheduler timer callback.
*
* If the system is completely idle the sched-ext wathcdog may incorrectly

@Byte-Lab (Contributor) commented:

watchdog

scx_rustland: provide an abstraction layer for the BPF component

Move the code responsible for interfacing with the BPF component into
its own module and provide high-level abstractions for the user-space
scheduler, hiding all the internal BPF implementation details.

This makes the user-space scheduler code much more readable and allows
potential developers/contributors who want to focus on the pure
scheduling details to modify the scheduler in a generic way, without
having to worry about the internal BPF details.

In the future we may even decide to provide the BPF abstraction as a
separate crate that could be used as a baseline to implement user-space
schedulers in Rust.

API overview
============

The main BPF interface is provided by BpfScheduler(). When this object
is initialized it will take care of registering and initializing the BPF
component.

Then the scheduler can use the BpfScheduler() instance to receive tasks
(in the form of QueuedTask objects) and dispatch tasks (in the form of
DispatchedTask objects), using respectively the methods dequeue_task()
and dispatch_task().

The CPU ownership map can be accessed using the method get_cpu_pid();
this also makes it possible to keep track of the idle and busy CPUs,
with the corresponding PIDs associated to them.

BPF counters and statistics can be accessed using the nr_*_mut()
methods; in particular, nr_queued_mut() and nr_scheduled_mut() can be
updated to notify the BPF component whether the user-space scheduler has
pending work to do.

Finally, the methods read_bpf_exit_kind() and report_bpf_exit_kind() can
be used, respectively, to read the exit code and the exit message from
the BPF component when the scheduler is unregistered.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
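
For illustration only, here is a minimal Rust sketch of how a scheduler might drive the abstraction described above; BpfScheduler, QueuedTask, DispatchedTask and the method names come from this commit message, while the exact signatures, return types and field names below are assumptions rather than the real API:

```rust
// Hypothetical driver loop for the BpfScheduler abstraction (sketch only).
fn schedule_loop(bpf: &mut BpfScheduler) {
    loop {
        // Drain the tasks queued by the BPF component (assumed signature:
        // dequeue_task() -> Result<Option<QueuedTask>, ...>).
        while let Ok(Some(task)) = bpf.dequeue_task() {
            // Turn the QueuedTask into a DispatchedTask, pick a target CPU
            // (here we simply keep the task's previous CPU) and hand it back
            // to the BPF component.
            let mut dispatched = DispatchedTask::new(&task); // assumed constructor
            dispatched.cpu = task.cpu;                       // assumed fields
            bpf.dispatch_task(&dispatched).expect("dispatch failed");
        }

        // Tell the BPF component whether the user-space scheduler still has
        // pending work, so it knows when the scheduler can be parked.
        *bpf.nr_queued_mut() = 0;
        *bpf.nr_scheduled_mut() = 0;

        // Stop once the BPF component has been unregistered (assumption:
        // read_bpf_exit_kind() returns a non-zero exit code at that point).
        if bpf.read_bpf_exit_kind() != 0 {
            break;
        }
    }
}
```

In a real scheduler the target CPU would of course be chosen using the CPU ownership map (get_cpu_pid()) and the scheduling policy, rather than by blindly reusing the previous CPU.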
Copy link
Contributor

@Byte-Lab Byte-Lab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Just one small typo

cpu = scx_bpf_task_cpu(p);
if (!bpf_cpumask_test_cpu(cpu, p->cpus_ptr)) {
/*
* Both the designaged CPU and the previously used CPU

@Byte-Lab (Contributor) commented:

designated

@arighi (Collaborator, Author) replied:

...and fixed! I should really run a syntax checker. And something that checks when I confuse scx_rustland with scx_userland as well. 🤣

Thanks!

scx_rustland: introduce nr_failed_dispatches

Introduce a new counter to report the number of failed dispatches: if
the scheduler designates a target CPU for a task, and both the chosen
CPU and the previously utilized one are unavailable when the task is
dispatched, the task will be sent to the global DSQ, and the counter
will be incremented.

Also mark all the methods to access these statistics counters as
optional. In the future we may also provide a "verbose" option and show
these statistics only when the scheduler runs in verbose mode.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
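
As a rough sketch (not part of this PR), reporting the new counter alongside the existing ones could look like the following; the accessor name nr_failed_dispatches_mut() simply mirrors the nr_*_mut() convention described above and is an assumption:

```rust
// Sketch: print scheduler statistics, e.g. only when running in a
// hypothetical verbose mode; nr_failed_dispatches_mut() is an assumed
// accessor named after the nr_*_mut() convention.
fn print_stats(bpf: &mut BpfScheduler, verbose: bool) {
    if !verbose {
        return;
    }
    println!(
        "nr_queued={} nr_scheduled={} nr_failed_dispatches={}",
        *bpf.nr_queued_mut(),
        *bpf.nr_scheduled_mut(),
        *bpf.nr_failed_dispatches_mut(),
    );
}
```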

@arighi merged commit 96f3eb4 into main on Jan 4, 2024 (2 checks passed).
@htejun deleted the scx-rustland-refactoring branch on January 10, 2024.