scx_rustland: enhancements #72
Conversation
Overall looks good to me. Left one optional nit, and one suggestion for how we can make `_get_task_cpu_slow()` a lot faster.
```c
int i;

for (i = 0; i < num_possible_cpus; i++) {
	cpu = (cpu + i) % num_possible_cpus;
	if (bpf_cpumask_test_cpu(cpu, p->cpus_ptr))
		return cpu;
}

return -1;
```
Something else you could do is call `bpf_cpumask_any_distribute(p->cpus_ptr)` to choose a random CPU from `p->cpus_ptr`. Note that it returns a cpu >= num_cpus if none of the bits are set, so you'd have to do something like:

```c
cpu = bpf_cpumask_any_distribute(p->cpus_ptr);
return cpu < num_possible_cpus ? cpu : -1;
```
Nice suggestion! Thanks! I'll apply it and run some tests. I think we should still keep the logic that tries `prev_cpu` before picking a random one using `bpf_cpumask_any_distribute()`; that should reduce some unnecessary migrations. But I'll test/measure both and check whether one shows a clear benefit over the other.
Yep, agreed. I was thinking that `bpf_cpumask_any_distribute()` could replace that for loop, but trying `prev_cpu` first seems like a good idea.
```c
@@ -375,6 +411,14 @@ static bool is_task_cpu_available(struct task_struct *p, u64 enq_flags)
	if (is_kthread(p) && p->nr_cpus_allowed == 1)
		return true;

	/*
	 * Do not bypass the user-space scheduler if there are pending
	 * activity, otherwise, we may risk to disrupt the scheduler's
```
Optional suggestion: `s/we may risk to disrupt/we may risk disrupting/`
Moreover, I'm not yet fully convinced about the last two commits (changing the default `timeout_ms` and `slice_ns`). While they can definitely provide some benefits for interactive workloads, I think there's also value in keeping them as they are, because that helps to better measure future improvements to the scheduler. Maybe I can add a command line option to set them instead.
Change `TaskTree.push()` to accept a `Task` object directly, rather than each individual attribute. Moreover, `Task` attributes don't need to be public, since both `TaskTree` and `Task` are only used locally. This makes the code more elegant and more readable.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Assign `min_vruntime` to the task before the weighted time slice is evaluated, then add the time slice. This way we still ensure that the task's vruntime is in the range `(min_vruntime + 1, min_vruntime + max_slice_ns]`, but we don't nullify the effect of the evaluated time slice if the task's starting vruntime is too small.

Also change `update_enqueued()` to return the evaluated weighted time slice (which can be used in the future).

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Instead of just trying the target CPU and the previously used CPU, cycle among all the available CPUs (when neither of those can be used) before falling back to the global DSQ. This avoids overly de-prioritizing tasks that can't be scheduled on the CPU selected by the scheduler (or on their previously used CPU), and we can still dispatch them using `SCX_DSQ_LOCAL_ON`, like any other task.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
When we fail to push a task to the queued BPF map we fall back to direct dispatch, but we can't use `SCX_DSQ_LOCAL_ON`. So, make sure to use `SCX_DSQ_GLOBAL` in this case to prevent scheduler crashes.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
We allow tasks to bypass the user-space scheduler and be dispatched directly via a shortcut in the enqueue path, if their running CPU is immediately available or if the task is a per-CPU kthread. However, the shortcut is disabled while the user-space scheduler has pending activity (to avoid disrupting its decisions too much). In that case the shortcut is disabled also for per-CPU kthreads, which may cause priority-inversion problems in the system, triggering stalls of some per-CPU kthreads (such as rcuog/N) and short system lockups when the system is overloaded.

Prevent this by always enabling the dispatch shortcut for per-CPU kthreads.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Force-pushed from bd5024c to 295873a
Alright, I dropped some commits and updated this PR with only the "safe" fixes. For the more experimental ones I'd like to do more testing (even if the other changes make the scheduler more responsive, they also make it more unpredictable at times, so I'd like to investigate that further).
Always report `comm`, `nr_queued` and `nr_scheduled` in the log messages. Moreover, also report the task name (`comm`) and CPU when possible. All this extra information can be really helpful to trace and debug scheduling issues.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Cache `slice_ns` in the main scheduler class to avoid accessing it via `self.bpf.skel.rodata().slice_ns` every single time. This also makes the scheduler code clearer and more abstracted from the BPF details.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Some fixes and enhancements to make scx_rustland more usable on a desktop / interactive system.