sched: add sched_dl documentation.
Add in Documentation/scheduler/ some hints about the design
choices, the usage and the future possible developments of the
sched_dl scheduling class and of the SCHED_DEADLINE policy.

Signed-off-by: Dario Faggioli <raistlin@linux.it>
Signed-off-by: Juri Lelli <juri.lelli@gmail.com>
dfaggioli authored and jlelli committed Apr 25, 2012
1 parent f04fd4a commit 02b4cc9
146 changes: 146 additions & 0 deletions Documentation/scheduler/sched-deadline.txt
@@ -0,0 +1,146 @@
Deadline Task and Group Scheduling
----------------------------------

CONTENTS
========

0. WARNING
1. Overview
2. Task scheduling
3. Bandwidth management
  3.1 System-wide settings
  3.2 Task interface
  3.3 Default behavior
4. Future plans


0. WARNING
==========

Fiddling with these settings can result in unpredictable or even unstable
system behavior. As with -rt (group) scheduling, it is assumed that root
users know what they are doing.


1. Overview
===========

The SCHED_DEADLINE policy, contained inside the sched_dl scheduling class, is
basically an implementation of the Earliest Deadline First (EDF) scheduling
algorithm, augmented with a mechanism (called Constant Bandwidth Server, CBS)
that makes it possible to isolate the behavior of tasks from each other.


2. Task scheduling
==================

The typical -deadline task is made up of a computation phase (instance)
which is activated in a periodic or sporadic fashion. The expected (maximum)
duration of such a computation is called the task's runtime; the time
interval within which each instance must be completed is called the task's
relative deadline. The task's absolute deadline is dynamically calculated as
the time instant at which a task (or, better, an instance) activates plus
its relative deadline.

The EDF algorithm selects the task with the smallest absolute deadline as
the one to be executed first, while the CBS ensures that each task runs for
at most its runtime every period, avoiding any interference between different
tasks (bandwidth isolation).
Thanks to this feature, even tasks that do not strictly comply with the
computational model described above can effectively use the new policy.
In other words, there is no limitation on what kind of task can exploit this
new scheduling discipline, although it must be said that it is particularly
suited for periodic or sporadic tasks that need guarantees on their timing
behavior, e.g., multimedia, streaming and control applications.
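
As an illustrative example, a video decoding task that needs at most 5 ms
of CPU time to handle a frame arriving every 40 ms would declare a runtime
of 5 ms, a period of 40 ms and, if each frame has to be processed before
the next one arrives, a relative deadline of 40 ms as well.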


3. Bandwidth management
=======================

In order for -deadline scheduling to be effective and useful, it is
important to have some method of keeping the allocation of the available CPU
bandwidth to the tasks under control.
This is usually called "admission control"; if it is not performed at all,
no guarantee can be given on the actual scheduling of the -deadline tasks.

Since RT-throttling was introduced, each task group has had an associated
bandwidth, calculated as a certain amount of runtime over a period.
Moreover, to make it possible to manipulate such bandwidth, readable/writable
controls have been added to both procfs (for system-wide settings) and
cgroupfs (for per-group settings).
Therefore, the same interface is used for controlling the bandwidth
distribution to -deadline tasks and task groups, i.e., new controls with
similar names, equivalent meaning and the same usage paradigm are added.

However, more discussion is needed in order to figure out how we want to manage
SCHED_DEADLINE bandwidth at the task group level. Therefore, SCHED_DEADLINE
uses (for now) a less sophisticated, but actually very sensible, mechanism to
ensure that a certain utilization cap is not exceeded in each root_domain.

Another main difference between deadline bandwidth management and RT-throttling
is that -deadline tasks have bandwidth of their own (while -rt ones don't!),
and thus we don't need a higher-level throttling mechanism to enforce the
desired bandwidth.

3.1 System-wide settings
------------------------

The system-wide settings are configured under the /proc virtual file system:

* /proc/sys/kernel/sched_dl_runtime_us,
* /proc/sys/kernel/sched_dl_period_us.

They accept (if written) and provide (if read) the new runtime and period,
respectively, for each CPU in each root_domain.

This means that, for a root_domain comprising M CPUs, -deadline tasks
can be created until the sum of their bandwidths stays below:

M * (sched_dl_runtime_us / sched_dl_period_us)
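
For example, on a root_domain comprising 4 CPUs and with the default
values described below (sched_dl_runtime_us = 500000 and
sched_dl_period_us = 1000000), -deadline tasks can be admitted until the
sum of their bandwidths reaches 4 * (500000 / 1000000) = 2, i.e., the
equivalent of two fully utilized CPUs.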

It is also possible to disable this bandwidth management logic, and thus be
free to oversubscribe the system up to any arbitrary level.
This is done by writing -1 to /proc/sys/kernel/sched_dl_runtime_us.
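
As a small illustration, the knob can also be written from a C program;
nothing beyond ordinary file I/O on procfs (and root privileges) is needed:

  #include <stdio.h>

  int main(void)
  {
          /* Disable -deadline admission control by writing -1, as above. */
          FILE *f = fopen("/proc/sys/kernel/sched_dl_runtime_us", "w");

          if (!f) {
                  perror("fopen");
                  return 1;
          }
          fprintf(f, "-1\n");
          return fclose(f) ? 1 : 0;
  }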


3.2 Task interface
------------------

Specifying a periodic/sporadic task that executes for a given amount of
runtime at each instance, and that is scheduled according to the urgency of
its own timing constraints, needs, in general, a way of declaring:
- a (maximum/typical) instance execution time,
- a minimum interval between consecutive instances,
- a time constraint by which each instance must be completed.

Therefore:
* a new struct sched_param2, containing all the necessary fields, is
  provided;
* the new scheduling-related syscalls that manipulate it, i.e.,
  sched_setscheduler2(), sched_setparam2() and sched_getparam2(), are
  implemented.
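
What follows is a minimal, illustrative sketch of how a task might use this
interface. Only the structure and syscall names come from this document: the
layout and field names of struct sched_param2, the SCHED_DEADLINE policy
value and the syscall number are assumptions made for the example, and times
are assumed to be expressed in nanoseconds.

  #define _GNU_SOURCE
  #include <stdint.h>
  #include <stdio.h>
  #include <unistd.h>

  #define SCHED_DEADLINE           6      /* assumed policy number      */
  #define __NR_sched_setscheduler2 -1     /* placeholder syscall number */

  struct sched_param2 {                   /* assumed layout             */
          int          sched_priority;    /* unused by -deadline tasks  */
          unsigned int sched_flags;
          uint64_t     sched_runtime;     /* max runtime per instance   */
          uint64_t     sched_deadline;    /* relative deadline          */
          uint64_t     sched_period;      /* (minimum) inter-arrival    */
  };

  /* Thin wrapper; the real syscall number is a placeholder here. */
  static int sched_setscheduler2(pid_t pid, int policy,
                                 const struct sched_param2 *param)
  {
          return syscall(__NR_sched_setscheduler2, pid, policy, param);
  }

  int main(void)
  {
          struct sched_param2 p = {
                  .sched_runtime  =  10000000ULL, /*  10 ms of CPU ...  */
                  .sched_deadline = 100000000ULL, /* ... due within ... */
                  .sched_period   = 100000000ULL, /* ... every 100 ms   */
          };

          if (sched_setscheduler2(0, SCHED_DEADLINE, &p) < 0)
                  perror("sched_setscheduler2");

          /* The periodic work loop of the task would go here. */
          return 0;
  }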


3.3 Default behavior
--------------------

The default values for SCHED_DEADLINE bandwidth are dl_runtime equal to
500000 and dl_period equal to 1000000. This means -deadline tasks can use
at most 50%, multiplied by the number of CPUs that compose the root_domain,
for each root_domain.

When a -deadline task forks a child, the child's dl_runtime is set to 0,
which means someone must call sched_setscheduler2() on it, or it won't even
start.


4. Future plans
===============

Still Missing:

- refinements to deadline inheritance, especially regarding the possibility
  of retaining bandwidth isolation among non-interacting tasks. This is
  being studied from both theoretical and practical points of view, and
  hopefully we can have some demonstrative code soon.
