Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scx_lavd: replesih time slice at ops.running() only when necessary #250

Merged
merged 1 commit into from
Apr 29, 2024

Conversation

multics69
Copy link
Contributor

The current code replenishes the task's time slice whenever the task becomes ops.running(). However, there is a case where such behavior can starve the other tasks, causing the watchdog timeout error. One (if not all) such case is when a task is preempted while running by the higher scheduler class (e.g., RT, DL). In such a case, the task will be transit in a cycle of ops.running() -> ops.stopping() -> ops.running() -> etc. Whenever it becomes re-running, it will be placed at the head of local DSQ and ops.running() will renew its time slice. Hence, in the worst case, the task can run forever since its time slice is never exhausted. The fix is assigning the time slice only once by checking if the time slice is calculated before.

Suggested-by: Tejun Heo tj@kernel.org

The current code replenishes the task's time slice whenever the task
becomes ops.running(). However, there is a case where such behavior can
starve the other tasks, causing the watchdog timeout error. One (if not
all) such case is when a task is preempted while running by the higher
scheduler class (e.g., RT, DL). In such a case, the task will be transit
in a cycle of ops.running() -> ops.stopping() -> ops.running() -> etc.
Whenever it becomes re-running, it will be placed at the head of local
DSQ and ops.running() will renew its time slice. Hence, in the worst
case, the task can run forever since its time slice is never exhausted.
The fix is assigning the time slice only once by checking if the time
slice is calculated before.

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Changwoo Min <changwoo@igalia.com>
@@ -49,15 +49,16 @@ enum consts {
NSEC_PER_USEC = 1000ULL,
NSEC_PER_MSEC = (1000ULL * NSEC_PER_USEC),
LAVD_TIME_ONE_SEC = (1000ULL * NSEC_PER_MSEC),
LAVD_TIME_INFINITY_NS = 0xFFFFFFFFFFFFFFFFULL,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be able to sue SCX_SLICE_INF which is the same u64 max value. Note that if a running task has this slice value, the tick is stopped. I don't think lavd ever actually ends up running tasks with this value tho, so not really a concern but just something to note.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank! I will update it accordingly.

* the task can run forever.
*/
return p->scx.slice == LAVD_SLICE_UNDECIDED ||
p->scx.slice == SCX_SLICE_DFL;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, the only time the kernel assigns SCX_SLICE_DFL is when the the currently running task is the only runnable task on the CPU. When the task's slice expires, the kernel sets slice to the default value and keeps running the task. This is a convenience feature which can be disabled by setting SCX_OPS_ENQ_LAST in ops.flags. When the flag is set, the task will always be enqueued when the slice expires whether it's the last runnable task on the CPU or not. When the last task is enqueued, ops.enqueue() is called with SCX_ENQ_LAST flag:

        /*                                                                                                                                                                                                             
         * The task being enqueued is the only task available for the cpu. By                                                                                                                                          
         * default, ext core keeps executing such tasks but when                                                                                                                                                       
         * %SCX_OPS_ENQ_LAST is specified, they're ops.enqueue()'d with the                                                                                                                                            
         * %SCX_ENQ_LAST flag set.                                                                                                                                                                                     
         *                                                                                                                                                                                                             
         * If the BPF scheduler wants to continue executing the task,                                                                                                                                                  
         * ops.enqueue() should dispatch the task to %SCX_DSQ_LOCAL immediately.                                                                                                                                       
         * If the task gets queued on a different dsq or the BPF side, the BPF                                                                                                                                         
         * scheduler is responsible for triggering a follow-up scheduling event.                                                                                                                                       
         * Otherwise, Execution may stall.                                                                                                                                                                             
         */                                                                                                                                                                                                            
        SCX_ENQ_LAST            = 1LLU << 41,                                                                                                                                                                          

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, I will keep the code as it is but later when the preemption code per tick is ready, I will change it.
Thank you!

@htejun
Copy link
Contributor

htejun commented Apr 29, 2024

As we haven't cut releases for a while, imma land this now, bump versions and cut a new one.

@htejun htejun merged commit 3e7ef35 into sched-ext:main Apr 29, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants