V6.2 timerslack+cgroups #1

Draft
wants to merge 9 commits into
base: master


randombtree
Owner

Make timer slack useful

Hrtimer background and problems

Linux has long had a concept of "timer slack", but in its current implementation it only means delaying a timer by the slack time (current default 50 us). If a process wants to behave nicely and sets a larger slack, e.g. 1 second, every timer in that process will effectively be delayed by 1 second. This is partly because the hrtimers are sorted by the hard timeout (time + slack), and it would be expensive to scan all timers to find those that have soft-expired (i.e. time < now).

As a result, timer slack currently has no positive effect on power savings: hrtimers can't be expired before their hard timeout, so each one still results in a timer interrupt.
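For context, user space sets a per-thread slack through `prctl(2)`. The following is a minimal sketch of that existing interface, not part of this PR; the helper names `set_timer_slack_ns`, `get_timer_slack_ns` and `use_one_ms_slack` are made up for illustration.

```c
#include <assert.h>
#include <sys/prctl.h>

/* Set the calling thread's timer slack in nanoseconds; returns 0 on success.
 * Per prctl(2), a value of 0 resets the slack to the thread's default. */
static int set_timer_slack_ns(unsigned long slack_ns)
{
    return prctl(PR_SET_TIMERSLACK, slack_ns, 0, 0, 0);
}

/* Return the calling thread's current timer slack in nanoseconds. */
static long get_timer_slack_ns(void)
{
    return prctl(PR_GET_TIMERSLACK, 0, 0, 0, 0);
}

/* Hypothetical helper: a "nice" thread asking for 1 ms of slack. */
static void use_one_ms_slack(void)
{
    int rc = set_timer_slack_ns(1000000UL);
    assert(rc == 0);
}
```

With the current hrtimer implementation, a thread that calls `use_one_ms_slack()` simply has its timers fire up to 1 ms late, which is the behaviour this PR sets out to improve.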

Solution

The solution is to augment the rbtree used to store hrtimers: the timers are stored in soft-expiry order (i.e. the time without the slack), and the slack propagates through the augmented rbtree, letting us find the lowest hard timeout (i.e. time + slack) in the tree. This lowest hard timeout is used to program the timer hardware, but now we can opportunistically execute timers that have lower soft timeouts, reducing timer interrupts.
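The idea can be sketched in user space as follows. This is only an illustration under simplifying assumptions: it uses a plain unbalanced binary search tree instead of the kernel rbtree and its augmented callbacks, and `struct timer_node` and the `timer_*` helpers are hypothetical names, not the code in this PR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a timer; not the kernel's struct hrtimer. */
struct timer_node {
    uint64_t soft;      /* soft expiry: requested time, without slack */
    uint64_t hard;      /* hard expiry: soft + slack */
    uint64_t min_hard;  /* augmented value: lowest hard expiry in this subtree */
    struct timer_node *left, *right;
};

static uint64_t min_u64(uint64_t a, uint64_t b) { return a < b ? a : b; }

/* Recompute the augmented value from the node and its children. */
static void timer_update_min_hard(struct timer_node *n)
{
    n->min_hard = n->hard;
    if (n->left)
        n->min_hard = min_u64(n->min_hard, n->left->min_hard);
    if (n->right)
        n->min_hard = min_u64(n->min_hard, n->right->min_hard);
}

/* Insert ordered by soft expiry; returns the (possibly new) subtree root. */
static struct timer_node *timer_insert(struct timer_node *root,
                                       struct timer_node *n)
{
    if (!root) {
        n->left = n->right = NULL;
        n->min_hard = n->hard;
        return n;
    }
    if (n->soft < root->soft)
        root->left = timer_insert(root->left, n);
    else
        root->right = timer_insert(root->right, n);
    timer_update_min_hard(root);  /* propagate the slack information upwards */
    return root;
}

/* The value the timer hardware would be programmed with. */
static uint64_t timer_next_event(const struct timer_node *root)
{
    assert(root != NULL);
    return root->min_hard;
}

/* The leftmost node is the timer with the earliest soft expiry. */
static const struct timer_node *timer_first(const struct timer_node *root)
{
    while (root && root->left)
        root = root->left;
    return root;
}
```

The root's `min_hard` is available in constant time, while the leftmost node gives the cheapest candidate for opportunistic execution. A real implementation would also have to keep the augmented value correct across rotations and removals, which is what the augmented rbtree callbacks are for.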

And another thing (possibly split it out to own PR)...

As timer slack becomes useful, changing the global default timer slack can give some power savings. The other part adds cgroup support for setting a per-cgroup timer slack. The timer slack is inherited from parent cgroups and can be changed at any point to only affect parts of the cgroup hierarchy.

The rb_add_augmented* functions, like their equivalents in rbtree.h, remove a
bit of the necessary boilerplate code when implementing augmented rbtrees.

The addition also affects the augmented callbacks as an insert callback has
to be added, slightly changing the augmented rbtree API.

Signed-off-by: Roger Blomgren <roger.blomgren@iki.fi>
Augmented rbtrees can be used for e.g. specifying timeout ranges.

Signed-off-by: Roger Blomgren <roger.blomgren@iki.fi>
Previously, hrtimers mostly expire at timeout + slack, as the rbtree is sorted
on that value. Now, keep the hrtimer rbtree sorted on the "soft" expiry time,
i.e. without the slack. The optimal timeout value for the rbtree is kept as an
augmented value, thus allowing an idle system to still wait for a timer
up until timeout + slack.

This patch will make the timer slack (at large values) more useful as timer
timeouts can truly be merged to happen at the same timer interrupt.

This work is based on patches from Venkatesh Pallipadi, albeit heavily modified.

Originally-by: Venkatesh Pallipadi https://lkml.org/lkml/2011/9/23/261

Signed-off-by: Roger Blomgren <roger.blomgren@iki.fi>
This patch doesn't introduce any behavioural changes, but is a
preparatory patch for a dynamic timer slack.

Conversion mostly done by Coccinelle (and some by hand):

@ replace_ts @
expression F;
expression list EL1, EL2;
struct task_struct *T;
symbol current;
@@
(
-F(EL1, T->timer_slack_ns, EL2)
+F(EL1, get_task_timer_slack_ns(T), EL2)
|
-F(T->timer_slack_ns, EL2)
+F(get_task_timer_slack_ns(T), EL2)
|
-F(T->timer_slack_ns, EL2)
+F(get_task_timer_slack_ns(T), EL2)
|
-F(T->timer_slack_ns)
+F(get_task_timer_slack_ns(T))
|
-F = T->timer_slack_ns
+F = get_task_timer_slack_ns(T)
|
-F(EL1, current->timer_slack_ns, EL2)
+F(EL1, get_task_timer_slack_ns(current), EL2)
|
-F(current->timer_slack_ns, EL2)
+F(get_task_timer_slack_ns(current), EL2)
|
-F(current->timer_slack_ns, EL2)
+F(get_task_timer_slack_ns(current), EL2)
|
-F(current->timer_slack_ns)
+F(get_task_timer_slack_ns(current))
|
-F = current->timer_slack_ns
+F = get_task_timer_slack_ns(current)
)
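To illustrate why the conversion is behaviour-preserving, here is a hedged sketch of what such an accessor could look like at this stage. The actual definition is not shown on this page, so the body below is an assumption, and `struct task_model` is a stand-in for the kernel's `struct task_struct`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct task_struct; only the relevant field is modelled. */
struct task_model {
    uint64_t timer_slack_ns;
};

/* Assumed shape of the accessor: a plain read of the field, so callers
 * behave exactly as before. A later patch could compute the value here. */
static inline uint64_t get_task_timer_slack_ns(const struct task_model *t)
{
    assert(t != NULL);
    return t->timer_slack_ns;
}
```

Routing every read through one function gives later patches a single place to make the slack dynamic.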

Signed-off-by: Roger Blomgren <roger.blomgren@iki.fi>
This patch shouldn't change the behaviour of timer slack at all, but is a
preparatory patch for cgroup-based timer slack.

Signed-off-by: Roger Blomgren <roger.blomgren@iki.fi>
@randombtree randombtree marked this pull request as draft April 6, 2023 13:31
…eout.

The softirq_expires_next is the least hard timeout value (timeout + slack) for
the base, but there can be timers where timeout (sans slack) < now. As the
timers are now sorted in softexpires order, we get the next timer cheaply and
might as well run it if it's available, possibly avoiding a wakeup from idle
later.
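A small user-space sketch of the effect, assuming a sorted array in place of the hrtimer tree; `struct soft_timer`, `earliest_hard` and `count_soft_expired` are illustrative names only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical timer model: requested expiry plus allowed slack. */
struct soft_timer {
    uint64_t soft;   /* timeout without slack */
    uint64_t slack;  /* allowed delay */
};

/* Lowest hard timeout (soft + slack): when the interrupt must fire. */
static uint64_t earliest_hard(const struct soft_timer *timers, size_t n)
{
    uint64_t best = UINT64_MAX;

    assert(timers != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        uint64_t hard = timers[i].soft + timers[i].slack;
        if (hard < best)
            best = hard;
    }
    return best;
}

/* timers[] must be sorted by soft expiry. Returns how many leading timers
 * have soft-expired at 'now' and could run in the same interrupt. */
static size_t count_soft_expired(const struct soft_timer *timers, size_t n,
                                 uint64_t now)
{
    size_t ran = 0;

    while (ran < n && timers[ran].soft <= now)
        ran++;
    return ran;
}
```

For timers with (soft, slack) of (100, 1000), (300, 50), (340, 500) and (2000, 0), the earliest hard timeout is 350. At that point three timers have already soft-expired, so a single interrupt can serve all three instead of waking up again at 840 and 1100.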

Signed-off-by: Roger Blomgren <roger.blomgren@iki.fi>
…re idle.

With hrtimer storing timers in a soft-expires order, it's cheap to look ahead
if there are soft-expired timers that could be run before idling the CPU. This
COULD result in power saving when using large-enough timer slack values in user
space.

Signed-off-by: Roger Blomgren <roger.blomgren@iki.fi>
css_filter_for_each_descendant_pre behaves like its unfiltered sibling, except
that a filter function is applied on each node. If the filter returns false
for a CSS node, the node and its descendants will be left out from the
iterator.
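The pruning behaviour can be sketched on a generic tree. This is a recursive user-space illustration, not the cgroup iterator itself; the node layout, the callback types and the choice to apply the filter to the starting node as well are all assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tree node using first-child / next-sibling links. */
struct node {
    int id;
    struct node *first_child;
    struct node *next_sibling;
};

typedef bool (*node_filter_fn)(const struct node *n, void *data);
typedef void (*node_visit_fn)(struct node *n, void *data);

/* Pre-order walk: a node rejected by the filter is skipped together with
 * its whole subtree; siblings of the rejected node are still visited. */
static void filtered_walk_pre(struct node *root, node_filter_fn filter,
                              node_visit_fn visit, void *data)
{
    assert(filter != NULL && visit != NULL);
    if (!root || !filter(root, data))
        return;
    visit(root, data);
    for (struct node *c = root->first_child; c; c = c->next_sibling)
        filtered_walk_pre(c, filter, visit, data);
}

/* Example callbacks: keep nodes with non-negative ids, record visit order. */
struct id_log {
    int ids[16];
    size_t count;
};

static bool keep_non_negative(const struct node *n, void *data)
{
    (void)data;
    return n->id >= 0;
}

static void record_id(struct node *n, void *data)
{
    struct id_log *log = data;

    if (log->count < 16)
        log->ids[log->count++] = n->id;
}
```

In the timer slack case, a filter like this would presumably stop propagation at cgroups that override the inherited value, leaving their subtrees untouched.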

Signed-off-by: Roger Blomgren <roger.blomgren@iki.fi>
Cgroups can now have different timer slack values (cgroup.timer_slack_ns). The
timer slack is inherited down to the descendant cgroups, which can override the
inherited value for their own subtree of descendants, if necessary. A process
that hasn't changed its timer slack value through the appropriate prctl will
use the cgroup-provided one, which can be either shorter or longer than the
default 50 us timer slack previously used in Linux. The 50 us timer slack will
still remain the default timer slack if the cgroup values are left untouched.

Example inheritance in a cg-hierarchy that has a new timer slack set as R on
the root cgroup, in addition to N and O set in the corresponding descendants:

       {R,s}
      /     \
   {N,r}   {_,R}
    /        \
  {_,N}     {O,r}
    |         |
  {_,N}     {_,O}

where {X,y} denotes the timer slack (X) and the inherited slack (y). The
effective timer slack is in upper case, e.g. {_,Y} means the default inherited
timer slack (Y) is used. Underscore (_) denotes a default timer slack, in which
case the inherited timer slack is used.
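The inheritance rule in the diagram can be modelled as below. This is a sketch under assumptions: `struct cg`, the `SLACK_UNSET` sentinel and both helpers are hypothetical, and the lookup walks up the hierarchy for brevity, whereas the {X,y} notation suggests the patch stores the inherited value in each cgroup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEFAULT_TIMER_SLACK_NS 50000ULL  /* the 50 us system default */
#define SLACK_UNSET 0ULL                 /* hypothetical "_" marker */

/* Hypothetical cgroup model: parent link plus an optional own slack. */
struct cg {
    const struct cg *parent;
    uint64_t slack_ns;  /* SLACK_UNSET means "use the inherited value" */
};

/* Effective slack: the nearest explicitly set value on the path to the
 * root, or the system default if no cgroup on the path sets one. */
static uint64_t cg_effective_slack(const struct cg *cg)
{
    for (; cg; cg = cg->parent)
        if (cg->slack_ns != SLACK_UNSET)
            return cg->slack_ns;
    return DEFAULT_TIMER_SLACK_NS;
}

/* A task that set its own slack via prctl keeps it; otherwise the
 * cgroup-provided value applies. */
static uint64_t task_effective_slack(bool task_has_own, uint64_t task_slack_ns,
                                     const struct cg *cg)
{
    return task_has_own ? task_slack_ns : cg_effective_slack(cg);
}
```

Building the hierarchy from the diagram with R on the root, N on the left child and O on the right grandchild gives N for the whole left branch, R for the right child and O from the right grandchild downwards.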

Signed-off-by: Roger Blomgren <roger.blomgren@iki.fi>