scx_rustland: voluntary context switch boost #85
Conversation
Provide the number of voluntary context switches (nvcsw) for each task to the user-space scheduler. This extra information can then be used by the scheduler to enhance its decision-making process when scheduling tasks. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Improve priority boosting using the voluntary context switches metric.

Overview
========

The current criterion for applying the time slice boost (option `-b`) is to distinguish between newly created tasks and tasks that are already running: in order to prioritize interactive applications (games, multimedia, etc.) we apply a time slice usage penalty to newly created tasks, indirectly boosting the priority of tasks that are already running, which are likely to be the interactive applications that we aim to prioritize.

Problem
=======

This approach works well when the background workload forks a bunch of short-lived tasks (e.g., a parallel kernel build), but it fails to properly classify CPU-intensive background tasks (i.e., video/3D rendering, encryption, large data analysis, etc.), because these applications typically do not generate many short-lived processes. In the presence of such workloads the time slice penalty is not enforced, resulting in a lack of any boost for interactive applications.

Solution
========

A more effective criterion for distinguishing interactive applications from background CPU-intensive applications is to examine the voluntary context switches: an application that periodically releases the CPU voluntarily is very likely to be interactive. Therefore, change the time slice boost logic to apply a bonus (scale down the accounted used time slice) to tasks that show an increase in their voluntary context switch counter over a time frame of 10 sec.

Based on experimental results, this simple heuristic appears to be quite effective in classifying interactive tasks and prioritizing them over potential background CPU-intensive tasks.

Additionally, having a better criterion to identify interactive tasks makes it possible to also prioritize newly created tasks, thereby enhancing the responsiveness of interactive shell sessions.
This always ensures the prompt execution of system commands, even when the system is massively overloaded, unlike the previous time slice boost logic, which made interactive shell sessions less responsive by deprioritizing newly created tasks.

Results
=======

With this new logic in place it is possible to play a video game (e.g., Terraria) without experiencing any frame rate drop (60 fps), while a parallel CPU stress test (`stress-ng -c 32`) is running in the background. The same result can also be obtained with a parallel kernel build (`make -j 32`). Thus, there is no regression compared to the previous "ideal" test case.

Even when mixing both workloads (`make -j 16` + `stress-ng -c 16`), Terraria can still be played without noticeable lag in the audio or video, maintaining a consistent 60 fps.

In addition to that, shell commands are also very responsive. Following are the results (average and standard deviation of 10 runs) of two simple interactive shell commands, while both the `make -j 16` and `stress-ng -c 16` workloads are running in the background:

 avg time       "uname -r"   "ps axuw > /dev/null"
 =========================================================
 EEVDF            11.1ms          231.8ms
 scx_rustland      2.6ms          212.0ms

 stdev          "uname -r"   "ps axuw > /dev/null"
 =========================================================
 EEVDF             2.28            23.41
 scx_rustland      0.70             9.11

Tests conducted on an 8-core laptop (11th Gen Intel i7-1195G7 @ 4.800GHz) with 16GB of RAM.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
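The heuristic described above can be sketched in Rust as follows. This is a minimal illustration, not the actual scx_rustland code: all names (`Scheduler`, `TaskInfo`, `update_interactive`, `account_slice`) and the `SLICE_BOOST` value are hypothetical; it only shows the shape of the idea — refresh an "interactive" flag when the nvcsw counter grows over a 10-second window, and scale down the accounted time slice for flagged tasks.

```rust
use std::collections::HashMap;

const NVCSW_WINDOW_NS: u64 = 10_000_000_000; // 10 sec evaluation window
const SLICE_BOOST: u64 = 4; // slice scaling factor (hypothetical value)

#[derive(Default)]
struct TaskInfo {
    nvcsw: u64,        // nvcsw counter sampled at the last window boundary
    nvcsw_ts: u64,     // timestamp (ns) of the last window boundary
    interactive: bool, // did nvcsw grow during the previous window?
}

struct Scheduler {
    tasks: HashMap<i32, TaskInfo>, // pid -> per-task state
}

impl Scheduler {
    // Refresh the interactive flag once the 10 sec window has expired.
    // `now` is assumed to come from a monotonic clock.
    fn update_interactive(&mut self, pid: i32, nvcsw: u64, now: u64) {
        let ti = self.tasks.entry(pid).or_default();
        if now - ti.nvcsw_ts >= NVCSW_WINDOW_NS {
            ti.interactive = nvcsw > ti.nvcsw;
            ti.nvcsw = nvcsw;
            ti.nvcsw_ts = now;
        }
    }

    // Scale down the accounted used time slice for interactive tasks,
    // implicitly boosting their priority in a vruntime-ordered queue.
    fn account_slice(&self, pid: i32, used_ns: u64) -> u64 {
        match self.tasks.get(&pid) {
            Some(ti) if ti.interactive => used_ns / SLICE_BOOST,
            _ => used_ns,
        }
    }
}
```

The key property is that a purely CPU-bound task never increases its nvcsw counter, so it keeps paying the full time slice, while a task that periodically sleeps (games, shells, media players) gets its accounted usage divided down.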
Add a brief troubleshooting section to the command line help. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
pub weight: u64,           // Task static priority
pub sum_exec_runtime: u64, // Total cpu time
pub nvcsw: u64,            // Voluntary context switches
Another way to communicate this to the userspace scheduler would be to send how the task went off CPU along with the wakeup message and let userland figure out the more detailed statistics, which will likely make it easier to get a more detailed understanding (e.g. runtime percentiles and whatnot).
Ah, that's a good idea. Maybe I can store that info in a BPF_MAP_TYPE_TASK_STORAGE and update some counters in .running() and .stopping(), then pass this info to user space, so everything is self-contained and I don't have to rely on any other kernel statistics for that...
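The user-space half of that idea might look like the sketch below. This is purely illustrative: the `StopEvent` struct is a hypothetical message that the BPF side (updating its counters in `.running()`/`.stopping()`) could forward to userland, reporting how long the task ran and whether it went off CPU voluntarily; `TaskStats` and `voluntary_ratio` are invented names showing how userland could derive more detailed statistics from such events.

```rust
// Hypothetical event forwarded by the BPF component from .stopping():
// how long the task ran and how it went off CPU.
struct StopEvent {
    runtime_ns: u64, // on-CPU time for this run
    voluntary: bool, // true if the task released the CPU voluntarily
}

#[derive(Default)]
struct TaskStats {
    nvcsw: u64,         // voluntary context switches observed
    nivcsw: u64,        // involuntary context switches (preemptions)
    runtimes: Vec<u64>, // per-run on-CPU times, usable for percentiles
}

impl TaskStats {
    // Accumulate one off-CPU event into the per-task statistics.
    fn update(&mut self, ev: &StopEvent) {
        if ev.voluntary {
            self.nvcsw += 1;
        } else {
            self.nivcsw += 1;
        }
        self.runtimes.push(ev.runtime_ns);
    }

    // Fraction of off-CPU events that were voluntary: a value close
    // to 1.0 suggests an interactive task, close to 0.0 a CPU hog.
    fn voluntary_ratio(&self) -> f64 {
        let total = self.nvcsw + self.nivcsw;
        if total == 0 {
            return 0.0;
        }
        self.nvcsw as f64 / total as f64
    }
}
```

Keeping the raw per-run runtimes around is what makes the richer statistics (percentiles, tail latencies) possible in userland without relying on other kernel counters.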
A significant improvement in scheduler responsiveness for low-latency applications: in short, the idea is to take into account the number of voluntary context switches to distinguish interactive tasks from CPU-intensive background tasks, and use this criterion to reduce the accounted time slice of interactive tasks (implicitly boosting their priority).
This simple heuristic works much better than the previous approach, which de-prioritized newly created tasks: it makes it possible to distinguish CPU-intensive workloads that do not spawn many tasks from the actual interactive low-latency applications. It also avoids de-prioritizing interactive applications that need to fork short-lived tasks, e.g., interactive shell sessions, which receive a big responsiveness improvement with this change.
In conclusion:
- Terraria runs at a constant 60 fps with `stress-ng -c 32` running in the background
- the same result holds with `make -j 32` running in the background
- interactive shell commands remain very responsive (faster than the default Linux scheduler)