scx_rustland: voluntary context switch boost #85

Merged
merged 3 commits into from
Jan 11, 2024

@arighi arighi commented Jan 11, 2024

A significant improvement in scheduler responsiveness for low-latency applications: in short, the idea is to use the number of voluntary context switches to distinguish interactive tasks from CPU-intensive background tasks, and to use this criterion to reduce the accounted time slice of interactive tasks (implicitly boosting their priority).

This simple heuristic works much better than the previous approach of de-prioritizing newly created tasks, because it can tell CPU-intensive workloads that do not spawn many tasks apart from actual interactive low-latency applications. It also avoids de-prioritizing interactive applications that need to fork short-lived tasks, e.g., interactive shell sessions, which get a big responsiveness improvement with this change.

In conclusion:

  • I can now play Terraria at 60 fps even with `stress-ng -c 32` running in the background
  • I can run shell commands faster while a kernel build (`make -j 32`) is running in the background (faster than with the default Linux scheduler)
  • For my typical personal workload (reading emails, browsing the internet, using git, vim, and running shell commands while I recompile kernels in the background) this scheduler now seems to perform nearly as well as the default Linux scheduler, and in some cases even better!

Provide the number of voluntary context switches (nvcsw) for each task
to the user-space scheduler.

This extra information can then be used by the scheduler to enhance its
decision-making process when scheduling tasks.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Improve priority boosting using voluntary context switches metric.

Overview
========

The current criterion for applying the time slice boost (option `-b`) is to
distinguish between newly created tasks and tasks that are already
running: in order to prioritize interactive applications (games,
multimedia, etc.) we apply a time slice usage penalty on newly created
tasks, indirectly boosting the priority of tasks that are already
running, which are likely to be the interactive applications that we
aim to prioritize.

Problem
=======

This approach works well when the background workload forks a bunch of
short-lived tasks (e.g., a parallel kernel build), but it fails to
properly classify CPU-intensive background tasks (e.g., video/3D
rendering, encryption, large data analysis, etc.), because these
applications, typically, do not generate many short-lived processes.

In the presence of such workloads the time slice penalty is never
applied, so interactive applications do not receive any boost.

Solution
========

A more effective criterion for distinguishing between interactive
applications and background CPU-intensive applications is to examine the
voluntary context switches: an application that periodically releases
the CPU voluntarily is very likely to be interactive.

Therefore, change the time slice boost logic to apply a bonus (scale down
the accounted used time slice) to tasks that show an increase in their
voluntary context switches counter over a time frame of 10 sec.
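
The boost logic described above can be sketched roughly as follows. This
is only an illustration, not the code merged in this PR: `TaskInfo`,
`Classifier`, `account()`, and the way `slice_boost` divides the used
time slice are hypothetical names and assumptions; only the `nvcsw`
counter and the 10 sec window come from the description.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Window over which the nvcsw counter is compared (10 sec, as described above).
const NVCSW_WINDOW: Duration = Duration::from_secs(10);

/// Hypothetical per-task state kept by the user-space scheduler.
struct TaskInfo {
    nvcsw: u64,           // voluntary context switches seen at the last refresh
    nvcsw_ts: Duration,   // time of the last refresh
    is_interactive: bool, // result of the last classification
}

/// Hypothetical classifier; `slice_boost` is assumed to be a plain divisor.
struct Classifier {
    tasks: HashMap<i32, TaskInfo>,
    slice_boost: u64,
}

impl Classifier {
    fn new(slice_boost: u64) -> Self {
        Classifier { tasks: HashMap::new(), slice_boost: slice_boost.max(1) }
    }

    /// Return the time slice (in ns) to account for `pid`, scaled down if
    /// the task increased its voluntary context switches in the last window.
    fn account(&mut self, pid: i32, nvcsw: u64, now: Duration, used_ns: u64) -> u64 {
        let task = self.tasks.entry(pid).or_insert(TaskInfo {
            nvcsw,
            nvcsw_ts: now,
            is_interactive: false,
        });

        // Re-classify the task once per window.
        if now.saturating_sub(task.nvcsw_ts) >= NVCSW_WINDOW {
            task.is_interactive = nvcsw > task.nvcsw;
            task.nvcsw = nvcsw;
            task.nvcsw_ts = now;
        }

        if task.is_interactive {
            used_ns / self.slice_boost
        } else {
            used_ns
        }
    }
}
```

In this sketch a task that stops releasing the CPU voluntarily loses its
bonus at the next window boundary, so a CPU-bound task cannot keep a
boost it earned earlier.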

Based on experimental results, this simple heuristic appears to be quite
effective in classifying interactive tasks and prioritizing them over
potential background CPU-intensive tasks.

Additionally, having a better criterion to identify interactive tasks
also makes it possible to prioritize newly created tasks, thereby
enhancing the responsiveness of interactive shell sessions.

This ensures that system commands always execute promptly, even when
the system is massively overloaded, unlike the previous time slice boost
logic, which made interactive shell sessions less responsive by
deprioritizing newly created tasks.

Results
=======

With this new logic in place it is possible to play a video game (e.g.,
Terraria) without experiencing any frame rate drop (60 fps), while a
parallel CPU stress test (`stress-ng -c 32`) is running in the
background. The same result can also be obtained with a parallel kernel
build (`make -j 32`). Thus, there is no regression compared to the
previous "ideal" test case.

Even when mixing both workloads (`make -j 16` + `stress-ng -c 16`),
Terraria can still be played without noticeable lag in the audio or
video, maintaining a consistent 60 fps.

In addition to that, shell commands are also very responsive.

The following are the results (average and standard deviation of 10 runs) of
two simple interactive shell commands, while both the `make -j 16` and
`stress-ng -c 16` workloads are running in background:

  avg time           "uname -r"       "ps axuw > /dev/null"
  =========================================================
  EEVDF                 11.1ms                     231.8ms
  scx_rustland           2.6ms                     212.0ms

  stdev              "uname -r"       "ps axuw > /dev/null"
  =========================================================
  EEVDF                   2.28                       23.41
  scx_rustland            0.70                        9.11

Tests conducted on an 8-core laptop (11th Gen Intel i7-1195G7 @
4.800GHz) with 16GB of RAM.
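
For anyone repeating the measurement, the two statistics in the tables
above can be computed from the per-run timings as sketched below. The
commit message does not say how the timings were collected or whether
the sample or population standard deviation was used; this sketch
assumes the sample form (dividing by n - 1), and the helper names
`mean` and `stdev` are illustrative only.

```rust
/// Arithmetic mean of the samples (e.g., per-run times in ms).
fn mean(samples: &[f64]) -> f64 {
    samples.iter().sum::<f64>() / samples.len() as f64
}

/// Sample standard deviation (n - 1 in the denominator, an assumption).
/// Returns 0.0 when fewer than two samples are available.
fn stdev(samples: &[f64]) -> f64 {
    if samples.len() < 2 {
        return 0.0;
    }
    let m = mean(samples);
    let sum_sq: f64 = samples.iter().map(|x| (x - m).powi(2)).sum();
    (sum_sq / (samples.len() - 1) as f64).sqrt()
}
```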

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Add a brief troubleshooting section to the command line help.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
pub sum_exec_runtime: u64, // Total cpu time */
pub weight: u64, // Task static priority */
pub sum_exec_runtime: u64, // Total cpu time
pub nvcsw: u64, // Voluntary context switches
Contributor

Another way to communicate this to the userspace scheduler would be sending how the task went off CPU along with the wakeup message and let userland figure out the more detailed statistics, which likely will make it easier to get more detailed understanding (e.g. runtime percentiles and whatnot).

Collaborator Author

Ah, that's a good idea. Maybe I can store that info in a BPF_MAP_TYPE_TASK_STORAGE and update some counters in .running() and .stopping(), then pass this info to user-space, so everything is self-contained and I don't have to rely on any other kernel statistics for that...

@arighi arighi merged commit e0bf232 into main Jan 11, 2024
2 checks passed
@Byte-Lab Byte-Lab deleted the scx-rustland-voluntary-context-switch-boost branch March 14, 2024 18:21