scx_lavd: overhaul the virtual deadline algorithm #443

Merged
multics69 merged 21 commits into sched-ext:main from lavd-vtime
Jul 21, 2024

Conversation

multics69
Contributor

This PR contains a major overhaul of the virtual deadline algorithm and the related code cleanup.

  • Instead of stretching the time-space upon enqueue, we now advance the current virtual clock at a rate inversely proportional to the system load. Under overload, the clock advances more slowly, so latency-critical tasks have more chances to cut into the timeline.

  • Instead of estimating service time (vruntime in CFS) from runtime and run frequency, we now directly measure the service time for eligibility enforcement.

  • We drop the runtime factor from the latency criticality calculation since it is already considered in calculating the task's virtual deadline. Instead, we additionally consider the task's starvation factor (i.e., how much a task has been starved relative to the average service time) in calculating the latency criticality. By incorporating the starvation factor, we can systematically avoid the watchdog time-out error from the scx framework.

  • Instead of inheriting the parent's properties for a forked task, a forked task will be treated as a greedy task until the scheduler knows its true properties. This helps to avoid stalling under a fork bomb.

  • After the overhaul, we cleaned up a lot of unnecessary code and optimizations. Notably, we dropped the sched_prio_to_slice_weight[] table and now use p->scx.weight directly.

  • Note that we first tried to maintain ineligible runnable tasks separately. However, we later removed this because it became unnecessary after the overhaul.

Changwoo Min added 20 commits July 16, 2024 23:48
Estimating the service time from run time and frequency is not
incorrect. However, it reacts slowly to sudden changes since it relies
on the moving average. Hence, we directly measure the service time to
enforce fairness.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
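
For illustration only, the difference between the two approaches can be sketched in plain C. This is not code from scx_lavd: `struct task_stat`, `ewma()`, and `account_run()` are hypothetical names, the 1/4 averaging weight is an assumption, and any weighting by task priority is left out.

```c
#include <stdint.h>

/* Hypothetical per-task state; field names are illustrative only. */
struct task_stat {
	uint64_t avg_runtime;	/* moving average of per-run runtime (ns) */
	uint64_t svc_time;	/* directly accumulated service time (ns) */
};

/* Moving average that gives the new sample a weight of 1/4 (assumed). */
static uint64_t ewma(uint64_t old_avg, uint64_t sample)
{
	return (old_avg * 3 + sample) / 4;
}

/* Called when a task stops running after consuming ran_ns of CPU time. */
static void account_run(struct task_stat *ts, uint64_t ran_ns)
{
	/* The average only moves part of the way toward a sudden burst. */
	ts->avg_runtime = ewma(ts->avg_runtime, ran_ns);
	/* The measured service time reflects the burst immediately. */
	ts->svc_time += ran_ns;
}
```

With an average of 1000 ns and a sudden 9000 ns run, the averaged value only rises to 3000 ns, while the measured service time grows by the full 9000 ns.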
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Signed-off-by: Changwoo Min <changwoo@igalia.com>
This is a prep to add a global ineligible dsq.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
This is a prep for adding an ineligible DSQ.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
We now maintain two run queues—an eligible run queue (DSQ) and an
ineligible run queue (rbtree)—sorted by the task's virtual deadline.
When the eligible run queue is empty, or the ineligible run queue has
not been consumed for too long (e.g., 15 msec), a task in the ineligible
run queue is moved to the eligible run queue for execution. With these
two queues, we have a better admission control.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
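
A rough sketch of the promotion rule described in this commit message, in plain C rather than BPF. The function name, its parameters, and the constant are hypothetical; the 15 msec value is taken from the "e.g." in the message and may not match the actual tunable. Note that this two-queue design is dropped again later in the series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed threshold, based on the "e.g., 15 msec" in the commit message. */
#define INELIGIBLE_MAX_WAIT_NS (15ULL * 1000 * 1000)

/*
 * Decide whether the earliest-deadline task in the ineligible queue
 * should be moved to the eligible queue.
 */
static bool should_promote_ineligible(uint64_t nr_eligible,
				      uint64_t nr_ineligible,
				      uint64_t now_ns,
				      uint64_t last_consumed_ns)
{
	if (nr_ineligible == 0)
		return false;	/* nothing to promote */
	if (nr_eligible == 0)
		return true;	/* eligible queue is empty */
	/* The ineligible queue has not been consumed for too long. */
	return now_ns - last_consumed_ns >= INELIGIBLE_MAX_WAIT_NS;
}
```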
Advancing the clock slower when overloaded gives more opportunities for
latency-critical tasks to cut in the run queue. Controlling the clock
better reflects the actual load than the prior approach of stretching
the time-space when overloaded.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
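
As a minimal sketch of the idea (not the actual scx_lavd code), the clock advance can be scaled down by the load once the system is overloaded. `advance_vclock()` and `load_pct` are hypothetical; the sketch assumes a load figure where 100 means fully loaded and anything above 100 means overloaded, and it assumes the clock runs at full speed below that point.

```c
#include <stdint.h>

/*
 * Advance a virtual clock by delta_ns of elapsed time.
 * load_pct is an assumed load figure: 100 = fully loaded, >100 = overloaded.
 */
static uint64_t advance_vclock(uint64_t vclock, uint64_t delta_ns,
			       uint64_t load_pct)
{
	if (load_pct <= 100)
		return vclock + delta_ns;	/* not overloaded: full speed */
	/* Overloaded: advance inversely proportionally to the load. */
	return vclock + (delta_ns * 100) / load_pct;
}
```

At 200% load the clock moves at half speed, so a task whose deadline is slightly ahead of the clock stays ahead of it for longer and has more opportunities to be enqueued in front of already-waiting tasks.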
When it inherits the parent's properties, a newly forked task tends to be
over-prioritized, because many parent processes, such as `make`, are a bit
more latency-critical than average.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
That is okay since the runtime is considered in calculating a virtual
deadline. A shorter runtime will result in a tighter deadline linearly.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Use p->scx.weight instead.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
Signed-off-by: Changwoo Min <changwoo@igalia.com>
In theory, sys_load_factor should not be necessary since we do not
stretch the time space anymore.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Signed-off-by: Changwoo Min <changwoo@igalia.com>
These are no longer necessary after removing load factor calculation.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
These are no longer necessary after directly using latency criticality.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
LAVD_VDL_LOOSENESS_FT represents how loose the deadline is. The smaller
value means the deadline is tighter. While it is unlikely to be tuned,
let's keep it as a tunable for now.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
Further depenalize above-average latency-critical tasks and further
penalize below-average latency-critical tasks in ineligibility duration.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
With all the other optimizations and tunings, it turns out that maintaining
two runqueues does more harm than good.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
taskc_run = try_get_task_ctx(p_run);
if (taskc_run && p_run->scx.slice != 0)
	try_yield_current_cpu(p_run, cpuc, taskc_run);
t = bpf_obj_new(typeof(*t));

I think it should be possible to pre-allocate this node in task_init() and keep it on taskc so that the enqueue path doesn't have to do dynamic allocations, but at the same time bpf_obj_new() might be cheap enough for this not to matter.

nvm, this gets removed later.


/*
* Advance the clock up to the task's deadline. When overloaded,
* advnace the clock slower so other can jump in the run queue.

typo: advnace

*/
ratio = (LAVD_LC_STARVATION_FT * stat_cur->avg_svc_time) /
	taskc->svc_time;
return ratio + 1;

Shouldn't starvation avoidance be part of the virtual timeline management rather than implemented through boosting interactivity? Is this an artifact of eligible and ineligible timelines being managed separately?

Oh that gets removed later. I'm curious why deadline in itself isn't sufficient for starvation avoidance.
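
For readers following the thread, the starvation ratio from the hunk above can be reproduced as a standalone C function. This is only a sketch: `STARVATION_FT` is a placeholder because the real value of LAVD_LC_STARVATION_FT is not shown here, and the zero guard is an addition for the sketch, not something taken from the PR.

```c
#include <stdint.h>

/* Placeholder; the real LAVD_LC_STARVATION_FT value is not shown here. */
#define STARVATION_FT 1024ULL

/*
 * Ratio of the average service time to this task's service time.
 * The less service a task has received relative to the average,
 * the larger the result.
 */
static uint64_t starvation_factor(uint64_t avg_svc_time, uint64_t svc_time)
{
	uint64_t ratio;

	/* Guard added for this sketch: avoid dividing by zero. */
	if (svc_time == 0)
		svc_time = 1;
	ratio = (STARVATION_FT * avg_svc_time) / svc_time;
	return ratio + 1;	/* the result is never zero */
}
```

With the placeholder factor, a task that has received a quarter of the average service time gets 4097, while a task that has received four times the average gets 257.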

Signed-off-by: Changwoo Min <changwoo@igalia.com>
@multics69
Contributor Author

Thanks @htejun for the review. I will merge it to the main.

@multics69 multics69 merged commit af75d14 into sched-ext:main Jul 21, 2024
1 check passed
@multics69 multics69 deleted the lavd-vtime branch July 21, 2024 09:01