scx_lavd: overhaul the virtual deadline algorithm #443

multics69 · 2024-07-20T13:40:12Z

This PR contains the major overhaul of the virtual deadline algorithm and relevant code cleanup.

Instead of stretching the time-space upon enqueue, we now advance the current virtual clock at a rate inversely proportional to the system load. Under overload, the clock advances more slowly, so latency-critical tasks have more chances to cut into the timeline.
Instead of estimating service time (vruntime in CFS) from runtime and run frequency, we now directly measure the service time for eligibility enforcement.
We drop the runtime factor in calculating latency criticality since it is already considered in calculating the task's virtual deadline. Instead, we additionally consider the task's starvation factor (i.e., how far a task's service time has fallen behind the average service time) in calculating the latency criticality. By incorporating the starvation factor, we can systematically avoid the watchdog time-out error from the scx framework.
Instead of inheriting the parent's properties, a forked task is now treated as a greedy task until the scheduler learns its true properties. This helps to avoid stalling under a fork bomb.
After the overhaul, we removed a lot of unnecessary code and optimizations. Notably, we dropped the sched_prio_to_slice_weight[] table and now use p->scx.weight directly.
Note that we first tried to maintain ineligible runnable tasks separately. However, we later removed this because it became unnecessary after the overhaul.
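
To make the starvation factor concrete, here is a minimal sketch in plain C of the ratio quoted from main.bpf.c in the review comments of this PR. The value of `STARVATION_FT` and the zero guard are assumptions for illustration only; this page does not show the real value of LAVD_LC_STARVATION_FT or how the BPF code handles a zero service time.

```c
#include <stdint.h>

/* Hypothetical value; stands in for LAVD_LC_STARVATION_FT. */
#define STARVATION_FT 30ULL

/*
 * The less service time a task has received relative to the system-wide
 * average, the larger the returned ratio. The result is always >= 1.
 */
static uint64_t starvation_ratio(uint64_t avg_svc_time, uint64_t svc_time)
{
	/* Assumed guard against division by zero for a brand-new task. */
	if (svc_time == 0)
		svc_time = 1;

	return (STARVATION_FT * avg_svc_time) / svc_time + 1;
}
```

A task that has received three times the average service time gets a much smaller ratio than a task sitting exactly at the average, so it contributes less to latency criticality.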

Estimating the service time from run time and frequency is not incorrect. However, it reacts slowly to sudden changes since it relies on a moving average. Hence, we directly measure the service time to enforce fairness. Signed-off-by: Changwoo Min <changwoo@igalia.com>
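
The difference between the two approaches can be sketched as follows. This is an illustration in plain C, not the scx_lavd code: the struct, the field names, and the 3/4 decay weight are all assumptions.

```c
#include <stdint.h>

/* Hypothetical per-task accounting; not the real task context layout. */
struct task_acct {
	uint64_t avg_runtime;	/* moving average of the runtime per run */
	uint64_t svc_time;	/* directly measured, accumulated service time */
};

/* Moving average that gives the new sample a weight of 1/4 (assumed). */
static uint64_t ewma(uint64_t old_avg, uint64_t sample)
{
	return (old_avg * 3 + sample) / 4;
}

/* Account one run of ran_ns nanoseconds both ways. */
static void account_run(struct task_acct *t, uint64_t ran_ns)
{
	t->avg_runtime = ewma(t->avg_runtime, ran_ns);	/* lags behind bursts */
	t->svc_time += ran_ns;				/* reflects the burst fully */
}
```

After a sudden long run, the averaged value moves only part of the way toward the new sample, while the accumulated service time records all of it immediately.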


This is a prep to add a global ineligible dsq. Signed-off-by: Changwoo Min <changwoo@igalia.com>

This is a prep for adding an ineligible DSQ. Signed-off-by: Changwoo Min <changwoo@igalia.com>

We now maintain two run queues, an eligible run queue (DSQ) and an ineligible run queue (rbtree), both sorted by the task's virtual deadline. When the eligible run queue is empty, or the ineligible run queue has not been consumed for too long (e.g., 15 msec), a task in the ineligible run queue is moved to the eligible run queue for execution. With these two queues, we get better admission control. Signed-off-by: Changwoo Min <changwoo@igalia.com>
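
The promotion rule in this commit message can be sketched as a small predicate. This is plain C for illustration; the function name, its parameters, and the constant name are assumptions, and only the 15 msec figure comes from the text.

```c
#include <stdbool.h>
#include <stdint.h>

/* 15 msec in nanoseconds, the example threshold from the commit message. */
#define MAX_INEL_WAIT_NS (15ULL * 1000 * 1000)

/*
 * Decide whether to move a task from the ineligible queue to the eligible
 * queue: either nothing eligible is waiting, or the ineligible queue has
 * not been consumed for too long.
 */
static bool should_promote_ineligible(uint64_t nr_eligible,
				      uint64_t nr_ineligible,
				      uint64_t now_ns,
				      uint64_t last_consumed_ns)
{
	if (nr_ineligible == 0)
		return false;	/* nothing to promote */
	if (nr_eligible == 0)
		return true;	/* avoid idling while tasks are runnable */

	return now_ns - last_consumed_ns >= MAX_INEL_WAIT_NS;
}
```

Note that the last commit in this PR removes the separate ineligible queue again, so this reflects an intermediate design only.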

Advancing the clock more slowly when overloaded gives latency-critical tasks more opportunities to cut into the run queue. Controlling the clock reflects the actual load better than the prior approach of stretching the time-space when overloaded. Signed-off-by: Changwoo Min <changwoo@igalia.com>
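
A rough sketch of the idea, in plain C rather than BPF: the clock moves toward the running task's deadline, and the step shrinks as the load rises above full utilization. The permille load scale and the function name are assumptions; the actual scx_lavd formula is not shown on this page.

```c
#include <stdint.h>

/* Hypothetical load scale: 1000 means the system is exactly fully loaded. */
#define LOAD_SCALE 1000ULL

/*
 * Advance the virtual clock up to the task's deadline. When overloaded
 * (load above LOAD_SCALE), scale the step down in inverse proportion to
 * the load so the clock moves more slowly. The clock never moves backward.
 */
static uint64_t advance_vclock(uint64_t cur_clk, uint64_t deadline,
			       uint64_t load_permille)
{
	uint64_t delta;

	if (deadline <= cur_clk)
		return cur_clk;

	delta = deadline - cur_clk;
	if (load_permille > LOAD_SCALE)
		delta = delta * LOAD_SCALE / load_permille;

	return cur_clk + delta;
}
```

With the clock held back under overload, a newly enqueued latency-critical task is more likely to get a deadline that sorts ahead of tasks already waiting.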

If a newly forked task inherits its parent's properties, it tends to be over-prioritized, because many parent processes, such as `make`, are a bit more latency-critical than average. Signed-off-by: Changwoo Min <changwoo@igalia.com>

That is okay since the runtime is considered in calculating a virtual deadline. A shorter runtime will result in a tighter deadline linearly. Signed-off-by: Changwoo Min <changwoo@igalia.com>


Use p->scx.weight instead. Signed-off-by: Changwoo Min <changwoo@igalia.com>


In theory, sys_load_factor should not be necessary since we do not stretch the time space anymore. Signed-off-by: Changwoo Min <changwoo@igalia.com>


These are no longer necessary after removing the load factor calculation. Signed-off-by: Changwoo Min <changwoo@igalia.com>

These are no longer necessary after directly using latency criticality. Signed-off-by: Changwoo Min <changwoo@igalia.com>

LAVD_VDL_LOOSENESS_FT represents how loose the deadline is. A smaller value means a tighter deadline. While it is unlikely to be tuned, let's keep it as a tunable for now. Signed-off-by: Changwoo Min <changwoo@igalia.com>

Further depenalize above-average latency-critical tasks and further penalize below-average latency-critical tasks in the ineligibility duration. Signed-off-by: Changwoo Min <changwoo@igalia.com>

With all the other optimizations and tunings, it turns out that maintaining two run queues does more harm than good. Signed-off-by: Changwoo Min <changwoo@igalia.com>

htejun · 2024-07-20T18:36:44Z

scheds/rust/scx_lavd/src/bpf/main.bpf.c

-	taskc_run = try_get_task_ctx(p_run);
-	if (taskc_run && p_run->scx.slice != 0)
-		try_yield_current_cpu(p_run, cpuc, taskc_run);
+	t = bpf_obj_new(typeof(*t));


I think it should be possible to pre-allocate this node in task_init() and keep it on taskc so that the enqueue path doesn't have to do dynamic allocations. At the same time, bpf_obj_new() might be cheap enough for this not to matter.

nvm, this gets removed later.

htejun · 2024-07-20T18:42:27Z

scheds/rust/scx_lavd/src/bpf/main.bpf.c

+
+	/*
+	 * Advance the clock up to the task's deadline. When overloaded,
+	 * advnace the clock slower so other can jump in the run queue.


typo: advnace

htejun · 2024-07-20T18:46:27Z

scheds/rust/scx_lavd/src/bpf/main.bpf.c

+	 */
+	ratio = (LAVD_LC_STARVATION_FT * stat_cur->avg_svc_time) /
+		taskc->svc_time;
+	return ratio + 1;


Shouldn't starvation avoidance be part of the virtual timeline management rather than implemented through boosting interactivity? Is this an artifact of eligible and ineligible timelines being managed separately?

Oh that gets removed later. I'm curious why deadline in itself isn't sufficient for starvation avoidance.


multics69 · 2024-07-21T08:59:28Z

Thanks @htejun for the review. I will merge it to the main.

Changwoo Min added 20 commits July 16, 2024 23:48

scx_lavd: tuning the max ineligible duration

adfbf39

Signed-off-by: Changwoo Min <changwoo@igalia.com>

scx_lavd: pretty formatting for ineligible duration

971bb2e

Signed-off-by: Changwoo Min <changwoo@igalia.com>

scx_lavd: rename LAVD_GLOBAL_DSQ to LAVD_ELIGIBLE_DSQ

c84b73e

This is a prep to add a global ineligible dsq. Signed-off-by: Changwoo Min <changwoo@igalia.com>

scx_lavd: do not prioritize a wake-up task in ops.select_cpu()

55e19ea

This is a prep for adding an ineligible DSQ. Signed-off-by: Changwoo Min <changwoo@igalia.com>

scx_lavd: do not inherit parent's properties

b90599e

If a newly forked task inherits its parent's properties, it tends to be over-prioritized, because many parent processes, such as `make`, are a bit more latency-critical than average. Signed-off-by: Changwoo Min <changwoo@igalia.com>

scx_lavd: drop the runtime factor in calculating latency criticality

99e0d21

That is okay since the runtime is considered in calculating a virtual deadline. A shorter runtime will result in a tighter deadline linearly. Signed-off-by: Changwoo Min <changwoo@igalia.com>

scx_lavd: consider starvation factor in determining latency criticality

034303f

Signed-off-by: Changwoo Min <changwoo@igalia.com>

scx_lavd: drop sched_prio_to_slice_weight[] table

6f10d69

Use p->scx.weight instead. Signed-off-by: Changwoo Min <changwoo@igalia.com>

scx_lavd: use lat_cri instead of lat_prio universally

67a6deb

Signed-off-by: Changwoo Min <changwoo@igalia.com>

scx_lavd: drop sys_load_factor

c955cae

In theory, sys_load_factor should not be necessary since we do not stretch the time space anymore. Signed-off-by: Changwoo Min <changwoo@igalia.com>

scx_lavd: directly use p->scx.weight instead load_ideal

02ad43d

Signed-off-by: Changwoo Min <changwoo@igalia.com>

scx_lavd: properly synchronize taskc->vdeadline_log_clk

3924eba

Signed-off-by: Changwoo Min <changwoo@igalia.com>

scx_lavd: removed unused LAVD_LOAD_FACTOR_*

43f0fcb

These are no longer necessary after removing the load factor calculation. Signed-off-by: Changwoo Min <changwoo@igalia.com>

scx_lavd: remove LAVD_BOOST_*

e94070d

These are no longer necessary after directly using latency criticality. Signed-off-by: Changwoo Min <changwoo@igalia.com>

scx_lavd: adjust ineligible duration according to task's lat_cri

827187d

Further depenalize above-average latency-critical tasks and further penalize below-average latency-critical tasks in the ineligibility duration. Signed-off-by: Changwoo Min <changwoo@igalia.com>

scx_lavd: do not maintain ineligible runnable tasks separately

add96f0

With all the other optimizations and tunings, it turns out that maintaining two run queues does more harm than good. Signed-off-by: Changwoo Min <changwoo@igalia.com>

multics69 requested review from htejun and Byte-Lab July 20, 2024 13:40

htejun approved these changes Jul 20, 2024


scx_lavd: fix typo

a9aab6b

Signed-off-by: Changwoo Min <changwoo@igalia.com>

multics69 merged commit af75d14 into sched-ext:main Jul 21, 2024
1 check passed

multics69 deleted the lavd-vtime branch July 21, 2024 09:01
