New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
scx-rustland: SMT improvements #94
Conversation
* Kick the CPU to make it immediately ready to accept | ||
* dispatched tasks. | ||
*/ | ||
scx_bpf_kick_cpu(cpu, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... I wonder why this is. It might be useful to add an option to re-enable this, so that if ppl report stalls, they can try the option as a debug step.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, looks like a good idea, and I'd also like to bisect it a bit and figure out when exactly this stopped being useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'Hm... even with v0.1.5 dropping scx_bpf_kick_cpu()
doesn't seem to make any difference, any chance that maybe something changed in the kernel? (Going before v0.1.5 is a bit tricky because of the ABI change)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also wondering if the fact that now I'm always dispatching from .dispatch()
with SCX_DSQ_LOCAL_ON | cpu
(using the cpu selected by the user-space scheduler) implicitly kicks the target cpu, therefore the kick in .update_idle()
is redundant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kernel side hasn't changed. Yeah, LOCAL_ON|cpu
will wake up the CPU.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I think everything makes sense now. I was using SCX_DSQ_GLOBAL
for the scheduler and it wasn't waking up without the explicit kick cpu. Then I changed the scheduler to also use LOCAL_ON|cpu
, so if that implicitly kicks the cpu we don't need the kick in update_idle anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds great. No need to add the option then. That sounds like it should be reliable. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll update the commit message, so it's more clear why we dropped that.
Prior to commit 676bd88 ("bpf_rustland: do not dispatch the scheduler to the global DSQ"), the user-space scheduler was dispatched using SCX_DSQ_GLOBAL and we needed to explicitly kick idle CPUs from update_idle() to ensure that at least one CPU was available to run the user-space scheduler. Now that we are using SCX_DSQ_LOCAL_ON|cpu to dispatch the user-space scheduler, the target CPU is implicitly kicked. Therefore, the call to scx_bpf_kick_cpu() within .update_idle() becomes redundant and we can get rid of it. Fixes: 676bd88 ("bpf_rustland: do not dispatch the scheduler to the global DSQ") Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
The user-space scheduler dispatches tasks in batches, with the batch size matching the number of idle CPUs. Commit 791bdbe ("scx_rustland: introduce SMT support") changed the order of idle CPUs, prioritizing dispatching tasks on the least busy cores (those with the most idle CPUs) before moving on to busier cores (those with the least idle CPUs). While this approach works well for a small number of tasks, it can lead to uneven performance as the number of tasks increases and all cores are saturated. Such uneven performance can be attributed to SMT interactions causing potential short lags and erratic system performance. In some cases, disabling SMT entirely results in better system responsiveness. To address this issue, instruct the scheduler to implicitly disable SMT and consistently dispatch tasks only on the first (or last) CPU of each core. This approach ensures an equal distribution of tasks among the available cores, preventing SMT disturbances and aligning with non-SMT performance, also when a significant amount of tasks are running. Additionally, the unused sibling CPUs within each core can be used as "spare" CPUs for the BPF dispatcher. This is particularly beneficial for tasks that cannot be dispatched on the target CPU selected by the scheduler, due to cpumask restrictions or congestion conditions. Therefore, this new approach allows to enhance system responsiveness on SMT systems, while simultaneously improving scheduler stability. Some preliminary results on an AMD Ryzen 7 5800X 8-Cores (SMT enabled): running my usual benchmark of measuring the fps of a videogame (Counter-Strike 2) during a parallel kernel build-induced system overload, shows an improvement of approximately 2x (from 8-10fps to 15-25fps vs 1-2fps with EEVDF). Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
770e9b5
to
be1cb87
Compare
This PR contains an improvement in the SMT logic, it's actually a simplification and, performance-wise, it seem to deliver better results, so it's a win-win IMHO.
As mentioned in the git commit:
The other commit is more like a clean-up, apparently now we can drop the
scx_bpf_kick_cpu()
fromupdate_idle()
without any noticeable performance drop, so let's just get rid of it.