The user-space scheduler dispatches tasks in batches, with the batch
size matching the number of idle CPUs.
Commit 791bdbe ("scx_rustland: introduce SMT support") changed the order
of idle CPUs, prioritizing dispatching tasks on the least busy cores
(those with the most idle CPUs) before moving on to busier cores (those
with the least idle CPUs).
While this approach works well for a small number of tasks, it can lead
to uneven performance as the number of tasks increases and all cores are
saturated. Such uneven performance can be attributed to SMT interactions
causing potential short lags and erratic system performance. In some
cases, disabling SMT entirely results in better system responsiveness.
To address this issue, instruct the scheduler to implicitly disable SMT
and consistently dispatch tasks only on the first (or last) CPU of each
core. This approach ensures an equal distribution of tasks among the
available cores, preventing SMT disturbances and aligning with non-SMT
performance, also when a significant amount of tasks are running.
Additionally, the unused sibling CPUs within each core can be used as
"spare" CPUs for the BPF dispatcher. This is particularly beneficial for
tasks that cannot be dispatched on the target CPU selected by the
scheduler, due to cpumask restrictions or congestion conditions.
Therefore, this new approach allows to enhance system responsiveness on
SMT systems, while simultaneously improving scheduler stability.
Some preliminary results on an AMD Ryzen 7 5800X 8-Cores (SMT enabled):
running my usual benchmark of measuring the fps of a videogame
(Counter-Strike 2) during a parallel kernel build-induced system
overload, shows an improvement of approximately 2x (from 8-10fps to
15-25fps vs 1-2fps with EEVDF).
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>