Park unused CPU threads after suspend-to-RAM

This commit introduces nine commits that let Xen properly "park" unused CPU threads when resuming from suspend-to-RAM state. Without this commit and with "smt=0" on Xen's command line, my laptop's CPU consumes more power after resuming from a suspend-to-RAM operation, as measured with powertop 2.13 in dom0, compared to the case with this commit. Below are the averages of power measurements before/after suspend-to-RAM (S2R) operations without and with this patch, obtained with the command "sudo powertop --time=30 --csv=log.csv" invoked before and after the S2R operation. In particular, powertop's estimation of the "system baseline power" was used as the power consumption metric. Please note that the CPU cores were forced to run at their lowest frequencies with "xenpm" before and after the suspend (as the setting appears to be lost during the suspend operation), and cron was stopped after every boot-up, to ensure valid comparisons. * Without this commit, after boot-up, but before S2R: 874 mW (average of 24 measurements in 8 experiment iterations, where each iteration resulted in 3 measurements) * Without this commit, after resuming from S2R: 2.26 W (average of 24 measurements in 8 experiment iterations, with 3 measurements per iteration) * With this commit, after boot-up, but before S2R: 826 mW (average of 27 measurements in 9 experiment iterations, with 3 measurements per iteration) * With this commit, after resuming from S2R: 1.16 W (average of 27 measurements in 9 experiment iterations, with 3 measurements per iteration) As can be observed from the results, the reduction of power consumption after resuming from S2R is significant when the cases without and with this commit are compared: 2.26 W vs. 1.16 W. Please note that I have not been able to explain why there is still a difference before and after the S2R operation with this commit, albeit a smaller one: 826 mW vs. 1.16 W. Also note that, of the 9 new patches introduced by this commit, the last patch is not cherry-picked from upstream, but it is rather a new patch that resolves intermittent segmentation faults encountered by systemd-sleep in dom0 during the resume from S2R. For further details about the issue resolved by the 9th patch, please see its commit message.
m-v-b · Jul 11, 2020 · 9678563 · 9678563
1 parent 46a8496
commit 9678563
Show file tree

Hide file tree

Showing 10 changed files with 914 additions and 0 deletions.
diff --git a/patch-0001-xen-sched-call-cpu_disable_scheduler-via-cpu-notifie.patch b/patch-0001-xen-sched-call-cpu_disable_scheduler-via-cpu-notifie.patch
@@ -0,0 +1,125 @@
+From 07ce5bd2e892dfd2d6931d67caac714cf0c23e77 Mon Sep 17 00:00:00 2001
+From: Juergen Gross <jgross@suse.com>
+Date: Tue, 2 Apr 2019 18:19:05 +0200
+Subject: [PATCH 1/9] xen/sched: call cpu_disable_scheduler() via cpu notifier
+
+cpu_disable_scheduler() is being called from __cpu_disable() today.
+There is no need to execute it on the cpu just being disabled, so use
+the CPU_DEAD case of the cpu notifier chain. Moving the call out of
+stop_machine() context is fine, as we just need to hold the domain RCU
+lock and need the scheduler percpu data to be still allocated.
+
+Add another hook for CPU_DOWN_PREPARE to bail out early in case
+cpu_disable_scheduler() would fail. This will avoid crashes in rare
+cases for cpu hotplug or suspend.
+
+Signed-off-by: Juergen Gross <jgross@suse.com>
+Acked-by: Julien Grall <julien.grall@arm.com>
+Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
+(cherry picked from commit 9b28696502d400832c012c3b3fecfebee6c75648)
+---
+ xen/arch/arm/smpboot.c |  2 --
+ xen/arch/x86/smpboot.c |  3 ---
+ xen/common/schedule.c  | 36 +++++++++++++++++++++++++++++-------
+ 3 files changed, 29 insertions(+), 12 deletions(-)
+
+diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c
+index 32e87221c096..92980bda1f27 100644
+--- a/xen/arch/arm/smpboot.c
++++ b/xen/arch/arm/smpboot.c
+@@ -341,8 +341,6 @@ void __cpu_disable(void)
+     /* It's now safe to remove this processor from the online map */
+     cpumask_clear_cpu(cpu, &cpu_online_map);
+
+-    if ( cpu_disable_scheduler(cpu) )
+-        BUG();
+     smp_mb();
+
+     /* Return to caller; eventually the IPI mechanism will unwind and the 
+diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
+index ef7858fdeebc..afac724d1d33 100644
+--- a/xen/arch/x86/smpboot.c
++++ b/xen/arch/x86/smpboot.c
+@@ -1211,9 +1211,6 @@ void __cpu_disable(void)
+     cpumask_clear_cpu(cpu, &cpu_online_map);
+     fixup_irqs(&cpu_online_map, 1);
+     fixup_eoi();
+-
+-    if ( cpu_disable_scheduler(cpu) )
+-        BUG();
+ }
+
+ void __cpu_die(unsigned int cpu)
+diff --git a/xen/common/schedule.c b/xen/common/schedule.c
+index 6069b3278cc2..e92faa40959b 100644
+--- a/xen/common/schedule.c
++++ b/xen/common/schedule.c
+@@ -745,8 +745,9 @@ void restore_vcpu_affinity(struct domain *d)
+ }
+
+ /*
+- * This function is used by cpu_hotplug code from stop_machine context
++ * This function is used by cpu_hotplug code via cpu notifier chain
+  * and from cpupools to switch schedulers on a cpu.
++ * Caller must get domlist_read_lock.
+  */
+ int cpu_disable_scheduler(unsigned int cpu)
+ {
+@@ -761,12 +762,6 @@ int cpu_disable_scheduler(unsigned int cpu)
+     if ( c == NULL )
+         return ret;
+
+-    /*
+-     * We'd need the domain RCU lock, but:
+-     *  - when we are called from cpupool code, it's acquired there already;
+-     *  - when we are called for CPU teardown, we're in stop-machine context,
+-     *    so that's not be a problem.
+-     */
+     for_each_domain_in_cpupool ( d, c )
+     {
+         for_each_vcpu ( d, v )
+@@ -865,6 +860,24 @@ int cpu_disable_scheduler(unsigned int cpu)
+     return ret;
+ }
+
++static int cpu_disable_scheduler_check(unsigned int cpu)
++{
++    struct domain *d;
++    struct vcpu *v;
++    struct cpupool *c;
++
++    c = per_cpu(cpupool, cpu);
++    if ( c == NULL )
++        return 0;
++
++    for_each_domain_in_cpupool ( d, c )
++        for_each_vcpu ( d, v )
++            if ( v->affinity_broken )
++                return -EADDRINUSE;
++
++    return 0;
++}
++
+ static int vcpu_set_affinity(
+     struct vcpu *v, const cpumask_t *affinity, cpumask_t *which)
+ {
+@@ -1676,7 +1689,16 @@ static int cpu_schedule_callback(
+     case CPU_UP_PREPARE:
+         rc = cpu_schedule_up(cpu);
+         break;
++    case CPU_DOWN_PREPARE:
++        rcu_read_lock(&domlist_read_lock);
++        rc = cpu_disable_scheduler_check(cpu);
++        rcu_read_unlock(&domlist_read_lock);
++        break;
+     case CPU_DEAD:
++        rcu_read_lock(&domlist_read_lock);
++        rc = cpu_disable_scheduler(cpu);
++        BUG_ON(rc);
++        rcu_read_unlock(&domlist_read_lock);
+         SCHED_OP(sched, deinit_pdata, sd->sched_priv, cpu);
+         /* Fallthrough */
+     case CPU_UP_CANCELED:
+-- 
+2.26.2
+
diff --git a/patch-0002-xen-add-helper-for-calling-notifier_call_chain-to-co.patch b/patch-0002-xen-add-helper-for-calling-notifier_call_chain-to-co.patch
@@ -0,0 +1,155 @@
+From 3816d82fba6c67b599c970dfe7bc8e024b7e2042 Mon Sep 17 00:00:00 2001
+From: Juergen Gross <jgross@suse.com>
+Date: Tue, 2 Apr 2019 07:34:53 +0200
+Subject: [PATCH 2/9] xen: add helper for calling notifier_call_chain() to
+ common/cpu.c
+
+Add a helper cpu_notifier_call_chain() to call notifier_call_chain()
+for a cpu with a specified action, returning an errno value.
+
+This avoids coding the same pattern multiple times.
+
+While at it avoid side effects from using BUG_ON() by not using
+cpu_online(cpu) as a parameter.
+
+Signed-off-by: Juergen Gross <jgross@suse.com>
+Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
+Acked-by: Jan Beulich <jbeulich@suse.com>
+(cherry picked from commit 8d88eacb3b390984e2c483d75d8f40b3ec4be67c)
+---
+ xen/common/cpu.c | 56 ++++++++++++++++++++++--------------------------
+ 1 file changed, 26 insertions(+), 30 deletions(-)
+
+diff --git a/xen/common/cpu.c b/xen/common/cpu.c
+index ca415babf8d2..ccf62b5e013e 100644
+--- a/xen/common/cpu.c
++++ b/xen/common/cpu.c
+@@ -68,11 +68,21 @@ void __init register_cpu_notifier(struct notifier_block *nb)
+     spin_unlock(&cpu_add_remove_lock);
+ }
+
++static int cpu_notifier_call_chain(unsigned int cpu, unsigned long action,
++                                   struct notifier_block **nb, bool nofail)
++{
++    void *hcpu = (void *)(long)cpu;
++    int notifier_rc = notifier_call_chain(&cpu_chain, action, hcpu, nb);
++    int ret = (notifier_rc == NOTIFY_DONE) ? 0 : notifier_to_errno(notifier_rc);
++
++    BUG_ON(ret && nofail);
++
++    return ret;
++}
++
+ static void _take_cpu_down(void *unused)
+ {
+-    void *hcpu = (void *)(long)smp_processor_id();
+-    int notifier_rc = notifier_call_chain(&cpu_chain, CPU_DYING, hcpu, NULL);
+-    BUG_ON(notifier_rc != NOTIFY_DONE);
++    cpu_notifier_call_chain(smp_processor_id(), CPU_DYING, NULL, true);
+     __cpu_disable();
+ }
+
+@@ -84,8 +94,7 @@ static int take_cpu_down(void *arg)
+
+ int cpu_down(unsigned int cpu)
+ {
+-    int err, notifier_rc;
+-    void *hcpu = (void *)(long)cpu;
++    int err;
+     struct notifier_block *nb = NULL;
+
+     if ( !cpu_hotplug_begin() )
+@@ -97,12 +106,9 @@ int cpu_down(unsigned int cpu)
+         return -EINVAL;
+     }
+
+-    notifier_rc = notifier_call_chain(&cpu_chain, CPU_DOWN_PREPARE, hcpu, &nb);
+-    if ( notifier_rc != NOTIFY_DONE )
+-    {
+-        err = notifier_to_errno(notifier_rc);
++    err = cpu_notifier_call_chain(cpu, CPU_DOWN_PREPARE, &nb, false);
++    if ( err )
+         goto fail;
+-    }
+
+     if ( unlikely(system_state < SYS_STATE_active) )
+         on_selected_cpus(cpumask_of(cpu), _take_cpu_down, NULL, true);
+@@ -110,26 +116,24 @@ int cpu_down(unsigned int cpu)
+         goto fail;
+
+     __cpu_die(cpu);
+-    BUG_ON(cpu_online(cpu));
++    err = cpu_online(cpu);
++    BUG_ON(err);
+
+-    notifier_rc = notifier_call_chain(&cpu_chain, CPU_DEAD, hcpu, NULL);
+-    BUG_ON(notifier_rc != NOTIFY_DONE);
++    cpu_notifier_call_chain(cpu, CPU_DEAD, NULL, true);
+
+     send_global_virq(VIRQ_PCPU_STATE);
+     cpu_hotplug_done();
+     return 0;
+
+  fail:
+-    notifier_rc = notifier_call_chain(&cpu_chain, CPU_DOWN_FAILED, hcpu, &nb);
+-    BUG_ON(notifier_rc != NOTIFY_DONE);
++    cpu_notifier_call_chain(cpu, CPU_DOWN_FAILED, &nb, true);
+     cpu_hotplug_done();
+     return err;
+ }
+
+ int cpu_up(unsigned int cpu)
+ {
+-    int notifier_rc, err = 0;
+-    void *hcpu = (void *)(long)cpu;
++    int err;
+     struct notifier_block *nb = NULL;
+
+     if ( !cpu_hotplug_begin() )
+@@ -141,19 +145,15 @@ int cpu_up(unsigned int cpu)
+         return -EINVAL;
+     }
+
+-    notifier_rc = notifier_call_chain(&cpu_chain, CPU_UP_PREPARE, hcpu, &nb);
+-    if ( notifier_rc != NOTIFY_DONE )
+-    {
+-        err = notifier_to_errno(notifier_rc);
++    err = cpu_notifier_call_chain(cpu, CPU_UP_PREPARE, &nb, false);
++    if ( err )
+         goto fail;
+-    }
+
+     err = __cpu_up(cpu);
+     if ( err < 0 )
+         goto fail;
+
+-    notifier_rc = notifier_call_chain(&cpu_chain, CPU_ONLINE, hcpu, NULL);
+-    BUG_ON(notifier_rc != NOTIFY_DONE);
++    cpu_notifier_call_chain(cpu, CPU_ONLINE, NULL, true);
+
+     send_global_virq(VIRQ_PCPU_STATE);
+
+@@ -161,18 +161,14 @@ int cpu_up(unsigned int cpu)
+     return 0;
+
+  fail:
+-    notifier_rc = notifier_call_chain(&cpu_chain, CPU_UP_CANCELED, hcpu, &nb);
+-    BUG_ON(notifier_rc != NOTIFY_DONE);
++    cpu_notifier_call_chain(cpu, CPU_UP_CANCELED, &nb, true);
+     cpu_hotplug_done();
+     return err;
+ }
+
+ void notify_cpu_starting(unsigned int cpu)
+ {
+-    void *hcpu = (void *)(long)cpu;
+-    int notifier_rc = notifier_call_chain(
+-        &cpu_chain, CPU_STARTING, hcpu, NULL);
+-    BUG_ON(notifier_rc != NOTIFY_DONE);
++    cpu_notifier_call_chain(cpu, CPU_STARTING, NULL, true);
+ }
+
+ static cpumask_t frozen_cpus;
+-- 
+2.26.2
+
diff --git a/patch-0003-xen-add-new-cpu-notifier-action-CPU_RESUME_FAILED.patch b/patch-0003-xen-add-new-cpu-notifier-action-CPU_RESUME_FAILED.patch
@@ -0,0 +1,95 @@
+From b14779c4ada96c52d0c1964fcf6ef20416012478 Mon Sep 17 00:00:00 2001
+From: Juergen Gross <jgross@suse.com>
+Date: Tue, 2 Apr 2019 07:34:54 +0200
+Subject: [PATCH 3/9] xen: add new cpu notifier action CPU_RESUME_FAILED
+
+Add a new cpu notifier action CPU_RESUME_FAILED which is called for all
+cpus which failed to come up at resume. The calls will be done after
+all other cpus are already up in order to know which resources are
+available then.
+
+Signed-off-by: Juergen Gross <jgross@suse.com>
+Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
+Reviewed-by: George Dunlap <george.dunlap@citrix.com>
+(cherry picked from commit 51c79e943fb3f9a746181f8b8415cf2baa5d26bd)
+---
+ xen/common/cpu.c      |  5 +++++
+ xen/include/xen/cpu.h | 29 ++++++++++++++++++-----------
+ 2 files changed, 23 insertions(+), 11 deletions(-)
+
+diff --git a/xen/common/cpu.c b/xen/common/cpu.c
+index ccf62b5e013e..8dca74cbb0bd 100644
+--- a/xen/common/cpu.c
++++ b/xen/common/cpu.c
+@@ -215,7 +215,12 @@ void enable_nonboot_cpus(void)
+             printk("Error bringing CPU%d up: %d\n", cpu, error);
+             BUG_ON(error == -EBUSY);
+         }
++        else
++            __cpumask_clear_cpu(cpu, &frozen_cpus);
+     }
+
++    for_each_cpu ( cpu, &frozen_cpus )
++        cpu_notifier_call_chain(cpu, CPU_RESUME_FAILED, NULL, true);
++
+     cpumask_clear(&frozen_cpus);
+ }
+diff --git a/xen/include/xen/cpu.h b/xen/include/xen/cpu.h
+index 2fe3ec05d851..4638c509e271 100644
+--- a/xen/include/xen/cpu.h
++++ b/xen/include/xen/cpu.h
+@@ -22,33 +22,40 @@ void register_cpu_notifier(struct notifier_block *nb);
+  *  CPU_UP_PREPARE -> CPU_STARTING -> CPU_ONLINE -- successful CPU up
+  *  CPU_DOWN_PREPARE -> CPU_DOWN_FAILED          -- failed CPU down
+  *  CPU_DOWN_PREPARE -> CPU_DYING -> CPU_DEAD    -- successful CPU down
+- * 
++ * in the resume case we have additionally:
++ *  CPU_UP_PREPARE -> CPU_UP_CANCELLED -> CPU_RESUME_FAILED -- CPU not resumed
++ *  with the CPU_RESUME_FAILED handler called only after all CPUs have been
++ *  tried to put online again in order to know which CPUs did restart
++ *  successfully.
++ *
+  * Hence note that only CPU_*_PREPARE handlers are allowed to fail. Also note
+  * that once CPU_DYING is delivered, an offline action can no longer fail.
+- * 
++ *
+  * Notifiers are called highest-priority-first when:
+  *  (a) A CPU is coming up; or (b) CPU_DOWN_FAILED
+  * Notifiers are called lowest-priority-first when:
+  *  (a) A CPU is going down; or (b) CPU_UP_CANCELED
+  */
+ /* CPU_UP_PREPARE: Preparing to bring CPU online. */
+-#define CPU_UP_PREPARE   (0x0001 | NOTIFY_FORWARD)
++#define CPU_UP_PREPARE    (0x0001 | NOTIFY_FORWARD)
+ /* CPU_UP_CANCELED: CPU is no longer being brought online. */
+-#define CPU_UP_CANCELED  (0x0002 | NOTIFY_REVERSE)
++#define CPU_UP_CANCELED   (0x0002 | NOTIFY_REVERSE)
+ /* CPU_STARTING: CPU nearly online. Runs on new CPU, irqs still disabled. */
+-#define CPU_STARTING     (0x0003 | NOTIFY_FORWARD)
++#define CPU_STARTING      (0x0003 | NOTIFY_FORWARD)
+ /* CPU_ONLINE: CPU is up. */
+-#define CPU_ONLINE       (0x0004 | NOTIFY_FORWARD)
++#define CPU_ONLINE        (0x0004 | NOTIFY_FORWARD)
+ /* CPU_DOWN_PREPARE: CPU is going down. */
+-#define CPU_DOWN_PREPARE (0x0005 | NOTIFY_REVERSE)
++#define CPU_DOWN_PREPARE  (0x0005 | NOTIFY_REVERSE)
+ /* CPU_DOWN_FAILED: CPU is no longer going down. */
+-#define CPU_DOWN_FAILED  (0x0006 | NOTIFY_FORWARD)
++#define CPU_DOWN_FAILED   (0x0006 | NOTIFY_FORWARD)
+ /* CPU_DYING: CPU is nearly dead (in stop_machine context). */
+-#define CPU_DYING        (0x0007 | NOTIFY_REVERSE)
++#define CPU_DYING         (0x0007 | NOTIFY_REVERSE)
+ /* CPU_DEAD: CPU is dead. */
+-#define CPU_DEAD         (0x0008 | NOTIFY_REVERSE)
++#define CPU_DEAD          (0x0008 | NOTIFY_REVERSE)
+ /* CPU_REMOVE: CPU was removed. */
+-#define CPU_REMOVE       (0x0009 | NOTIFY_REVERSE)
++#define CPU_REMOVE        (0x0009 | NOTIFY_REVERSE)
++/* CPU_RESUME_FAILED: CPU failed to come up in resume, all other CPUs up. */
++#define CPU_RESUME_FAILED (0x000a | NOTIFY_REVERSE)
+
+ /* Perform CPU hotplug. May return -EAGAIN. */
+ int cpu_down(unsigned int cpu);
+-- 
+2.26.2
+