scx_lavd: replenish time slice at ops.running() only when necessary #250

multics69 · 2024-04-29T03:24:14Z

The current code replenishes a task's time slice every time ops.running() is called for it. However, this behavior can starve other tasks and trigger the watchdog timeout error. One such case, if not the only one, is when a running task is preempted by a higher scheduler class (e.g., RT, DL). The task then cycles through ops.running() -> ops.stopping() -> ops.running() -> and so on. Each time it resumes, it is placed at the head of the local DSQ and ops.running() renews its time slice. Hence, in the worst case, the task can run forever because its time slice is never exhausted. The fix is to assign the time slice only once, by checking whether it has already been calculated.

Suggested-by: Tejun Heo <tj@kernel.org>

Signed-off-by: Changwoo Min <changwoo@igalia.com>
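For readers who want to see the starvation cycle in isolation, here is a minimal userspace model of it. This is a sketch under stated assumptions, not the scx_lavd code: `task_model`, `SLICE_UNDECIDED`, `SLICE_NS`, `charge_runtime()` and the two `running_*()` helpers are hypothetical names invented for illustration, and the 3 ms slice is an arbitrary value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; not the kernel's or scx_lavd's definitions. */
#define SLICE_UNDECIDED UINT64_MAX  /* marker: no slice calculated yet */
#define SLICE_NS        3000000ULL  /* illustrative 3 ms time slice */

struct task_model {
	uint64_t slice;  /* remaining time slice in ns */
};

/* Old behavior: every ops.running() hands out a fresh slice. */
static void running_always_replenish(struct task_model *t)
{
	t->slice = SLICE_NS;
}

/* Fixed behavior: assign a slice only if none has been calculated yet. */
static void running_replenish_once(struct task_model *t)
{
	if (t->slice == SLICE_UNDECIDED)
		t->slice = SLICE_NS;
}

/* Charge the time the task ran before it was preempted. */
static void charge_runtime(struct task_model *t, uint64_t ran_ns)
{
	t->slice = ran_ns >= t->slice ? 0 : t->slice - ran_ns;
}

/*
 * Model a task that is preempted by a higher scheduler class after every
 * ran_ns of runtime and then resumes immediately. Returns the round in
 * which the slice reaches zero, or 0 if it never does within max_rounds.
 */
static unsigned int rounds_until_exhausted(bool fixed, uint64_t ran_ns,
					   unsigned int max_rounds)
{
	struct task_model t = { .slice = SLICE_UNDECIDED };
	unsigned int round;

	assert(ran_ns > 0 && ran_ns < SLICE_NS);

	for (round = 1; round <= max_rounds; round++) {
		if (fixed)
			running_replenish_once(&t);
		else
			running_always_replenish(&t);
		charge_runtime(&t, ran_ns);
		if (t.slice == 0)
			return round;
	}
	return 0;
}
```

With a 3 ms slice and a preemption every 1 ms, `rounds_until_exhausted(false, 1000000ULL, 1000)` never sees the slice reach zero, while `rounds_until_exhausted(true, 1000000ULL, 1000)` exhausts it in the third round.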

htejun · 2024-04-29T17:56:20Z

scheds/rust/scx_lavd/src/bpf/intf.h

@@ -49,15 +49,16 @@ enum consts {
 	NSEC_PER_USEC			= 1000ULL,
 	NSEC_PER_MSEC			= (1000ULL * NSEC_PER_USEC),
 	LAVD_TIME_ONE_SEC		= (1000ULL * NSEC_PER_MSEC),
+	LAVD_TIME_INFINITY_NS		= 0xFFFFFFFFFFFFFFFFULL,


You should be able to use SCX_SLICE_INF, which is the same u64 max value. Note that if a running task has this slice value, the tick is stopped. I don't think lavd ever actually ends up running tasks with this value though, so it's not really a concern, just something to note.

multics69 · 2024-05-01

Thanks! I will update it accordingly.
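As a quick sanity check of the point above, the following sketch shows that the constant added in the diff and a u64 max value are the same number, so swapping one for the other does not change any comparison. The `MODEL_*` names and `slice_is_infinite()` are hypothetical stand-ins written for this example; the real definitions live in the kernel headers and in intf.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for illustration only; not the actual header definitions. */
#define MODEL_LAVD_TIME_INFINITY_NS 0xFFFFFFFFFFFFFFFFULL  /* value from the diff */
#define MODEL_SCX_SLICE_INF         UINT64_MAX             /* u64 max, per the review */

/* Compile-time proof that the two constants are interchangeable. */
_Static_assert(MODEL_LAVD_TIME_INFINITY_NS == MODEL_SCX_SLICE_INF,
	       "both constants must be the u64 max value");

/* True if a slice value is the "infinite" marker rather than a real duration. */
static bool slice_is_infinite(uint64_t slice)
{
	return slice == MODEL_SCX_SLICE_INF;
}
```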

htejun · 2024-04-29T18:03:45Z

scheds/rust/scx_lavd/src/bpf/main.bpf.c

+	 * the task can run forever.
+	 */
+	return p->scx.slice == LAVD_SLICE_UNDECIDED ||
+	       p->scx.slice == SCX_SLICE_DFL;


So, the only time the kernel assigns SCX_SLICE_DFL is when the currently running task is the only runnable task on the CPU. When the task's slice expires, the kernel sets the slice to the default value and keeps running the task. This is a convenience feature that can be disabled by setting SCX_OPS_ENQ_LAST in ops.flags. When the flag is set, the task is always enqueued when its slice expires, whether or not it's the last runnable task on the CPU. When the last task is enqueued, ops.enqueue() is called with the SCX_ENQ_LAST flag:

	/*
	 * The task being enqueued is the only task available for the cpu. By
	 * default, ext core keeps executing such tasks but when
	 * %SCX_OPS_ENQ_LAST is specified, they're ops.enqueue()'d with the
	 * %SCX_ENQ_LAST flag set.
	 *
	 * If the BPF scheduler wants to continue executing the task,
	 * ops.enqueue() should dispatch the task to %SCX_DSQ_LOCAL immediately.
	 * If the task gets queued on a different dsq or the BPF side, the BPF
	 * scheduler is responsible for triggering a follow-up scheduling event.
	 * Otherwise, Execution may stall.
	 */
	SCX_ENQ_LAST		= 1LLU << 41,

multics69 · 2024-05-01

For now, I will keep the code as it is, but I will change it later once the per-tick preemption code is ready.
Thank you!
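The slice-expiry behavior described in the review comment above can be summarized with a small userspace decision model. This is only a sketch of the three cases as described there, not kernel code: `expiry_result`, `on_slice_expired()` and the `MODEL_*` constants are hypothetical, the default slice length and the bit chosen for the ops flag are illustrative assumptions, and only the `1 << 41` bit is taken from the quoted definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; only MODEL_ENQ_LAST mirrors a value quoted above. */
#define MODEL_SLICE_DFL    20000000ULL   /* assumed default slice length in ns */
#define MODEL_OPS_ENQ_LAST (1ULL << 1)   /* illustrative bit for the ops flag */
#define MODEL_ENQ_LAST     (1ULL << 41)  /* same bit as the quoted SCX_ENQ_LAST */

struct expiry_result {
	bool     enqueued;   /* was ops.enqueue() invoked for the task? */
	uint64_t enq_flags;  /* flags passed to ops.enqueue(), if invoked */
	uint64_t slice;      /* slice the task holds right after expiry */
};

/* Decide what happens when the running task's slice expires. */
static struct expiry_result on_slice_expired(uint64_t ops_flags,
					     unsigned int nr_other_runnable)
{
	struct expiry_result r = { .enqueued = true, .enq_flags = 0, .slice = 0 };

	/* Other tasks are runnable: the expired task is enqueued as usual. */
	if (nr_other_runnable > 0)
		return r;

	/* Last runnable task and the scheduler opted in: enqueue with ENQ_LAST. */
	if (ops_flags & MODEL_OPS_ENQ_LAST) {
		r.enq_flags = MODEL_ENQ_LAST;
		return r;
	}

	/* Last runnable task, default: keep running with the default slice. */
	r.enqueued = false;
	r.slice = MODEL_SLICE_DFL;
	return r;
}
```

In this model, the default-slice value reaches the task only in the last branch, which is why a check against SCX_SLICE_DFL in ops.running() identifies a task that kept running as the sole runnable task on its CPU.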

htejun · 2024-04-29T19:01:02Z

As we haven't cut releases for a while, imma land this now, bump versions and cut a new one.

multics69 requested a review from htejun April 29, 2024 03:24

bunkbail mentioned this pull request Apr 29, 2024

[scx_lavd] Getting large pauses and stutters in-game #234

Closed

htejun approved these changes Apr 29, 2024

htejun merged commit 3e7ef35 into sched-ext:main Apr 29, 2024
1 check passed
