scx_rustland: voluntary context switch boost #85
Conversation
Provide the number of voluntary context switches (nvcsw) for each task to the user-space scheduler. This extra information can then be used by the scheduler to enhance its decision-making process when scheduling tasks. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Improve priority boosting using the voluntary context switches metric.

Overview
========

The current criterion for applying the time slice boost (option `-b`) is to distinguish between newly created tasks and tasks that are already running: in order to prioritize interactive applications (games, multimedia, etc.) we apply a time slice usage penalty to newly created tasks, indirectly boosting the priority of tasks that are already running, which are likely to be the interactive applications that we aim to prioritize.

Problem
=======

This approach works well when the background workload forks a bunch of short-lived tasks (e.g., a parallel kernel build), but it fails to properly classify CPU-intensive background tasks (i.e., video/3D rendering, encryption, large data analysis, etc.), because these applications typically do not generate many short-lived processes. In the presence of such workloads the time slice penalty is not enforced, resulting in a lack of any boost for interactive applications.

Solution
========

A more effective criterion for distinguishing interactive applications from background CPU-intensive applications is to examine the voluntary context switches: an application that periodically releases the CPU voluntarily is very likely to be interactive. Therefore, change the time slice boost logic to apply a bonus (scale down the accounted used time slice) to tasks that show an increase in their voluntary context switch counter over a time frame of 10 sec.

Based on experimental results, this simple heuristic appears to be quite effective in classifying interactive tasks and prioritizing them over potential background CPU-intensive tasks.

Additionally, having a better criterion to identify interactive tasks makes it possible to also prioritize newly created tasks, thereby enhancing the responsiveness of interactive shell sessions.
This always ensures the prompt execution of system commands, even when the system is massively overloaded, unlike the previous time slice boost logic, which made interactive shell sessions less responsive by deprioritizing newly created tasks.

Results
=======

With this new logic in place it is possible to play a video game (e.g., Terraria) without experiencing any frame rate drop (60 fps), while a parallel CPU stress test (`stress-ng -c 32`) is running in the background. The same result can also be obtained with a parallel kernel build (`make -j 32`). Thus, there is no regression compared to the previous "ideal" test case.

Even when mixing both workloads (`make -j 16` + `stress-ng -c 16`), Terraria can still be played without noticeable lag in the audio or video, maintaining a consistent 60 fps.

In addition to that, shell commands are also very responsive. Following are the results (average and standard deviation of 10 runs) of two simple interactive shell commands, while both the `make -j 16` and `stress-ng -c 16` workloads are running in the background:

 avg time       "uname -r"   "ps axuw > /dev/null"
 =========================================================
 EEVDF            11.1ms          231.8ms
 scx_rustland      2.6ms          212.0ms

 stdev          "uname -r"   "ps axuw > /dev/null"
 =========================================================
 EEVDF             2.28            23.41
 scx_rustland      0.70             9.11

Tests conducted on an 8-core laptop (11th Gen Intel i7-1195G7 @ 4.800GHz) with 16GB of RAM.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
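The heuristic described above can be sketched in Rust as follows. This is a minimal illustration, not the actual scx_rustland code: all names (`Scheduler`, `TaskInfo`, `update_interactive`, `account_slice`) and the `SLICE_BOOST` value are hypothetical; it only shows the shape of the idea — refresh an "interactive" flag when the nvcsw counter grows over a 10-second window, and scale down the accounted time slice for flagged tasks.

```rust
use std::collections::HashMap;

const NVCSW_WINDOW_NS: u64 = 10_000_000_000; // 10 sec evaluation window
const SLICE_BOOST: u64 = 4; // slice scaling factor (hypothetical value)

#[derive(Default)]
struct TaskInfo {
    nvcsw: u64,        // nvcsw counter sampled at the last window boundary
    nvcsw_ts: u64,     // timestamp (ns) of the last window boundary
    interactive: bool, // did nvcsw grow during the previous window?
}

struct Scheduler {
    tasks: HashMap<i32, TaskInfo>, // pid -> per-task state
}

impl Scheduler {
    // Refresh the interactive flag once the 10 sec window has expired.
    // `now` is assumed to come from a monotonic clock.
    fn update_interactive(&mut self, pid: i32, nvcsw: u64, now: u64) {
        let ti = self.tasks.entry(pid).or_default();
        if now - ti.nvcsw_ts >= NVCSW_WINDOW_NS {
            ti.interactive = nvcsw > ti.nvcsw;
            ti.nvcsw = nvcsw;
            ti.nvcsw_ts = now;
        }
    }

    // Scale down the accounted used time slice for interactive tasks,
    // implicitly boosting their priority in a vruntime-ordered queue.
    fn account_slice(&self, pid: i32, used_ns: u64) -> u64 {
        match self.tasks.get(&pid) {
            Some(ti) if ti.interactive => used_ns / SLICE_BOOST,
            _ => used_ns,
        }
    }
}
```

The key property is that a purely CPU-bound task never increases its nvcsw counter, so it keeps paying the full time slice, while a task that periodically sleeps (games, shells, media players) gets its accounted usage divided down.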
Add a brief troubleshooting section to the command line help. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
pub weight: u64,           // Task static priority
pub sum_exec_runtime: u64, // Total cpu time
pub nvcsw: u64,            // Voluntary context switches
Another way to communicate this to the userspace scheduler would be to send how the task went off CPU along with the wakeup message and let userland figure out the more detailed statistics, which will likely make it easier to get a more detailed understanding (e.g. runtime percentiles and whatnot).
Ah, that's a good idea. Maybe I can store that info in a BPF_MAP_TYPE_TASK_STORAGE and update some counters in .running() and .stopping(), then pass this info to user space, so everything is self-contained and I don't have to rely on any other kernel statistics for that...
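The user-space half of that idea might look like the sketch below. This is purely illustrative: the `StopEvent` struct is a hypothetical message that the BPF side (updating its counters in `.running()`/`.stopping()`) could forward to userland, reporting how long the task ran and whether it went off CPU voluntarily; `TaskStats` and `voluntary_ratio` are invented names showing how userland could derive more detailed statistics from such events.

```rust
// Hypothetical event forwarded by the BPF component from .stopping():
// how long the task ran and how it went off CPU.
struct StopEvent {
    runtime_ns: u64, // on-CPU time for this run
    voluntary: bool, // true if the task released the CPU voluntarily
}

#[derive(Default)]
struct TaskStats {
    nvcsw: u64,         // voluntary context switches observed
    nivcsw: u64,        // involuntary context switches (preemptions)
    runtimes: Vec<u64>, // per-run on-CPU times, usable for percentiles
}

impl TaskStats {
    // Accumulate one off-CPU event into the per-task statistics.
    fn update(&mut self, ev: &StopEvent) {
        if ev.voluntary {
            self.nvcsw += 1;
        } else {
            self.nivcsw += 1;
        }
        self.runtimes.push(ev.runtime_ns);
    }

    // Fraction of off-CPU events that were voluntary: a value close
    // to 1.0 suggests an interactive task, close to 0.0 a CPU hog.
    fn voluntary_ratio(&self) -> f64 {
        let total = self.nvcsw + self.nivcsw;
        if total == 0 {
            return 0.0;
        }
        self.nvcsw as f64 / total as f64
    }
}
```

Keeping the raw per-run runtimes around is what makes the richer statistics (percentiles, tail latencies) possible in userland without relying on other kernel counters.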
A significant improvement in scheduler responsiveness for low-latency applications: in short, the idea is to take into account the number of voluntary context switches to distinguish interactive tasks from CPU-intensive background tasks, and use this criterion to reduce the accounted time slice of interactive tasks (implicitly boosting their priority).
This simple heuristic works much better than the previous approach, which de-prioritized newly created tasks: it makes it possible to distinguish CPU-intensive workloads that do not spawn many tasks from the actual interactive low-latency applications. It also avoids de-prioritizing interactive applications that need to fork short-lived tasks, e.g., interactive shell sessions, which receive a big responsiveness improvement with this change.
In conclusion:
- Terraria runs at a constant 60 fps with `stress-ng -c 32` running in the background
- the same result holds with `make -j 32` running in the background
- interactive shell commands remain very responsive (faster than the default Linux scheduler)