Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

riscv: Add Native/Paravirt qspinlock support #420

Closed
wants to merge 15 commits into from

Conversation

bjoto
Copy link

@bjoto bjoto commented Dec 25, 2023

Pull request for series with
subject: riscv: Add Native/Paravirt qspinlock support
version: 12
url: https://patchwork.kernel.org/project/linux-riscv/list/?series=812817

@bjoto
Copy link
Author

bjoto commented Dec 25, 2023

Upstream branch: f352a28
series: https://patchwork.kernel.org/project/linux-riscv/list/?series=812817
version: 12

The arch_spinlock_t of qspinlock has contained the atomic_t val, which
satisfies the ticket-lock requirement. Thus, unify the arch_spinlock_t
into qspinlock_types.h. This is the preparation for the next combo
spinlock.

Reviewed-by: Leonardo Bras <leobras@redhat.com>
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/linux-riscv/CAK8P3a2rnz9mQqhN6-e0CGUUv9rntRELFdxt_weiD7FxH7fkfQ@mail.gmail.com/
Signed-off-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Add a separate ticket-lock.h to include multiple spinlock versions and
select one at compile time or runtime.

Reviewed-by: Leonardo Bras <leobras@redhat.com>
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/linux-riscv/CAK8P3a2rnz9mQqhN6-e0CGUUv9rntRELFdxt_weiD7FxH7fkfQ@mail.gmail.com/
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Move errata vendor func-id definitions from errata_list into
vendorid_list.h. Unifying these definitions is also for following
rwonce errata implementation.

Suggested-by: Leonardo Bras <leobras@redhat.com>
Link: https://lore.kernel.org/linux-riscv/ZQLFJ1cmQ8PAoMHm@redhat.com/
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Leonardo Bras <leobras@redhat.com>
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
The early version of T-Head C9xx cores has a store merge buffer
delay problem. The store merge buffer could improve the store queue
performance by merging multi-store requests, but when there are not
continued store requests, the prior single store request would be
waiting in the store queue for a long time. That would cause
significant problems for communication between multi-cores. This
problem was found on sg2042 & th1520 platforms with the qspinlock
lock torture test.

So appending a fence w.o could immediately flush the store merge
buffer and let other cores see the write result.

This will apply the WRITE_ONCE errata to handle the non-standard
behavior via appending a fence w.o instruction for WRITE_ONCE().

Reviewed-by: Leonardo Bras <leobras@redhat.com>
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
The requirements of qspinlock have been documented by commit:
a8ad07e ("asm-generic: qspinlock: Indicate the use of mixed-size
atomics").

Although RISC-V ISA gives out a weaker forward guarantee LR/SC, which
doesn't satisfy the requirements of qspinlock above, it won't prevent
some riscv vendors from implementing a strong fwd guarantee LR/SC in
microarchitecture to match xchg_tail requirement. T-HEAD C9xx processor
is the one.

We've tested the patch on SOPHGO sg2042 & th1520 and passed the stress
test on Fedora & Ubuntu & OpenEuler ... Here is the performance
comparison between qspinlock and ticket_lock on sg2042 (64 cores):

sysbench test=threads threads=32 yields=100 lock=8 (+13.8%):
  queued_spinlock 0.5109/0.00
  ticket_spinlock 0.5814/0.00

perf futex/hash (+6.7%):
  queued_spinlock 1444393 operations/sec (+- 0.09%)
  ticket_spinlock 1353215 operations/sec (+- 0.15%)

perf futex/wake-parallel (+8.6%):
  queued_spinlock (waking 1/64 threads) in 0.0253 ms (+-2.90%)
  ticket_spinlock (waking 1/64 threads) in 0.0275 ms (+-3.12%)

perf futex/requeue (+4.2%):
  queued_spinlock Requeued 64 of 64 threads in 0.0785 ms (+-0.55%)
  ticket_spinlock Requeued 64 of 64 threads in 0.0818 ms (+-4.12%)

System Benchmarks (+6.4%)
  queued_spinlock:
    System Benchmarks Index Values               BASELINE       RESULT    INDEX
    Dhrystone 2 using register variables         116700.0  628613745.4  53865.8
    Double-Precision Whetstone                       55.0     182422.8  33167.8
    Execl Throughput                                 43.0      13116.6   3050.4
    File Copy 1024 bufsize 2000 maxblocks          3960.0    7762306.2  19601.8
    File Copy 256 bufsize 500 maxblocks            1655.0    3417556.8  20649.9
    File Copy 4096 bufsize 8000 maxblocks          5800.0    7427995.7  12806.9
    Pipe Throughput                               12440.0   23058600.5  18535.9
    Pipe-based Context Switching                   4000.0    2835617.7   7089.0
    Process Creation                                126.0      12537.3    995.0
    Shell Scripts (1 concurrent)                     42.4      57057.4  13456.9
    Shell Scripts (8 concurrent)                      6.0       7367.1  12278.5
    System Call Overhead                          15000.0   33308301.3  22205.5
                                                                       ========
    System Benchmarks Index Score                                       12426.1

  ticket_spinlock:
    System Benchmarks Index Values               BASELINE       RESULT    INDEX
    Dhrystone 2 using register variables         116700.0  626541701.9  53688.2
    Double-Precision Whetstone                       55.0     181921.0  33076.5
    Execl Throughput                                 43.0      12625.1   2936.1
    File Copy 1024 bufsize 2000 maxblocks          3960.0    6553792.9  16550.0
    File Copy 256 bufsize 500 maxblocks            1655.0    3189231.6  19270.3
    File Copy 4096 bufsize 8000 maxblocks          5800.0    7221277.0  12450.5
    Pipe Throughput                               12440.0   20594018.7  16554.7
    Pipe-based Context Switching                   4000.0    2571117.7   6427.8
    Process Creation                                126.0      10798.4    857.0
    Shell Scripts (1 concurrent)                     42.4      57227.5  13497.1
    Shell Scripts (8 concurrent)                      6.0       7329.2  12215.3
    System Call Overhead                          15000.0   30766778.4  20511.2
                                                                       ========
    System Benchmarks Index Score                                       11670.7

The qspinlock has a significant improvement on SOPHGO SG2042 64
cores platform than the ticket_lock.

Reviewed-by: Leonardo Bras <leobras@redhat.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Combo spinlock could support queued and ticket in one Linux Image and
select them during boot time via command line. Here is the func
size (Bytes) comparison table below:

TYPE			: COMBO | TICKET | QUEUED
arch_spin_lock		: 106	| 60     | 50
arch_spin_unlock	: 54    | 36     | 26
arch_spin_trylock	: 110   | 72     | 54
arch_spin_is_locked	: 48    | 34     | 20
arch_spin_is_contended	: 56    | 40     | 24
rch_spin_value_unlocked	: 48    | 34     | 24

One example of disassemble combo arch_spin_unlock:
  <+14>:    nop                # detour slot
  <+18>:    fence   rw,w       --+-> queued_spin_unlock
  <+22>:    sb      zero,0(a4) --+   (2 instructions)
  <+26>:    ld      s0,8(sp)
  <+28>:    addi    sp,sp,16
  <+30>:    ret
  <+32>:    lw      a5,0(a4)   --+-> ticket_spin_unlock
  <+34>:    sext.w  a5,a5        |   (7 instructions)
  <+36>:    fence   rw,w         |
  <+40>:    addiw   a5,a5,1      |
  <+42>:    slli    a5,a5,0x30   |
  <+44>:    srli    a5,a5,0x30   |
  <+46>:    sh      a5,0(a4)   --+
  <+50>:    ld      s0,8(sp)
  <+52>:    addi    sp,sp,16
  <+54>:    ret
The qspinlock is smaller and faster than ticket-lock when all are in a
fast path.

The combo spinlock could provide a compatible Linux Image for different
micro-arch designs that have/haven't forward progress guarantee. Use
command line options to select between qspinlock and ticket-lock, and
the default is ticket-lock.

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Add a static key controlling whether virt_spin_lock() should be
called or not. When running on bare metal set the new key to
false.

The VM guests should fall back to a Test-and-Set spinlock,
because fair locks have horrible lock 'holder' preemption issues.
The virt_spin_lock_key would shortcut for the queued_spin_lock_-
slowpath() function that allow virt_spin_lock to hijack it.

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Leonardo Bras <leobras@redhat.com>
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Force to enable virt_spin_lock when KVM guest, because fair locks
have horrible lock 'holder' preemption issues.

Suggested-by: Leonardo Bras <leobras@redhat.com>
Link: https://lkml.kernel.org/kvm/ZQK9-tn2MepXlY1u@redhat.com/
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Leonardo Bras <leobras@redhat.com>
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Add the files functions needed to support the SBI PVLOCK (paravirt
qspinlock kick_cpu) extension. Implement kvm_sbi_ext_pvlock_kick_-
cpu(), and we only need to call the kvm_vcpu_kick() and bring
target_vcpu from the halt state. No irq raised, no other request,
just a pure vcpu_kick.

Reviewed-by: Leonardo Bras <leobras@redhat.com>
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Using static_call to switch between:
  native_queued_spin_lock_slowpath()    __pv_queued_spin_lock_slowpath()
  native_queued_spin_unlock()           __pv_queued_spin_unlock()

Finish the pv_wait implementation, but pv_kick needs the SBI
definition of the next patches.

Reviewed-by: Leonardo Bras <leobras@redhat.com>
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Implement pv_kick with SBI guest implementation, and add
SBI_EXT_PVLOCK extension detection. The backend part is
in the KVM pvqspinlock patch.

Reviewed-by: Leonardo Bras <leobras@redhat.com>
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Disables the qspinlock slow path using PV optimizations which
allow the hypervisor to 'idle' the guest on lock contention.

Reviewed-by: Leonardo Bras <leobras@redhat.com>
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Add kconfig entry for paravirt_spinlock, an unfair qspinlock
virtualization-friendly backend, by halting the virtual CPU rather
than spinning.

Reviewed-by: Leonardo Bras <leobras@redhat.com>
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Add trace point for pv_kick&wait, here is the output:

ls /sys/kernel/debug/tracing/events/paravirt/
 enable   filter   pv_kick  pv_wait

cat /sys/kernel/debug/tracing/trace
 entries-in-buffer/entries-written: 33927/33927   #P:12

                                _-----=> irqs-off/BH-disabled
                               / _----=> need-resched
                              | / _---=> hardirq/softirq
                              || / _--=> preempt-depth
                              ||| / _-=> migrate-disable
                              |||| /     delay
           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
              | |         |   |||||     |         |
             sh-100     [001] d..2.    28.312294: pv_wait: cpu 1 out of wfi
         <idle>-0       [000] d.h4.    28.322030: pv_kick: cpu 0 kick target cpu 1
             sh-100     [001] d..2.    30.982631: pv_wait: cpu 1 out of wfi
         <idle>-0       [000] d.h4.    30.993289: pv_kick: cpu 0 kick target cpu 1
             sh-100     [002] d..2.    44.987573: pv_wait: cpu 2 out of wfi
         <idle>-0       [000] d.h4.    44.989000: pv_kick: cpu 0 kick target cpu 2
         <idle>-0       [003] d.s3.    51.593978: pv_kick: cpu 3 kick target cpu 4
      rcu_sched-15      [004] d..2.    51.595192: pv_wait: cpu 4 out of wfi
lock_torture_wr-115     [004] ...2.    52.656482: pv_kick: cpu 4 kick target cpu 2
lock_torture_wr-113     [002] d..2.    52.659146: pv_wait: cpu 2 out of wfi
lock_torture_wr-114     [008] d..2.    52.659507: pv_wait: cpu 8 out of wfi
lock_torture_wr-114     [008] d..2.    52.663503: pv_wait: cpu 8 out of wfi
lock_torture_wr-113     [002] ...2.    52.666128: pv_kick: cpu 2 kick target cpu 8
lock_torture_wr-114     [008] d..2.    52.667261: pv_wait: cpu 8 out of wfi
lock_torture_wr-114     [009] .n.2.    53.141515: pv_kick: cpu 9 kick target cpu 11
lock_torture_wr-113     [002] d..2.    53.143339: pv_wait: cpu 2 out of wfi
lock_torture_wr-116     [007] d..2.    53.143412: pv_wait: cpu 7 out of wfi
lock_torture_wr-118     [000] d..2.    53.143457: pv_wait: cpu 0 out of wfi
lock_torture_wr-115     [008] d..2.    53.143481: pv_wait: cpu 8 out of wfi
lock_torture_wr-117     [011] d..2.    53.143522: pv_wait: cpu 11 out of wfi
lock_torture_wr-117     [011] ...2.    53.143987: pv_kick: cpu 11 kick target cpu 8
lock_torture_wr-115     [008] ...2.    53.144269: pv_kick: cpu 8 kick target cpu 7

Reviewed-by: Leonardo Bras <leobras@redhat.com>
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
@bjoto
Copy link
Author

bjoto commented Jan 5, 2024

Upstream branch: f352a28
series: https://patchwork.kernel.org/project/linux-riscv/list/?series=812817
version: 12

@bjoto
Copy link
Author

bjoto commented Jan 5, 2024

Upstream branch: 5a2cf77
series: https://patchwork.kernel.org/project/linux-riscv/list/?series=812817
version: 12

Pull request is NOT updated. Failed to apply https://patchwork.kernel.org/project/linux-riscv/list/?series=812817
error message:

Cmd('git') failed due to: exit code(128)
  cmdline: git am -s --3way
  stdout: 'Applying: asm-generic: ticket-lock: Reuse arch_spinlock_t of qspinlock
Applying: asm-generic: ticket-lock: Add separate ticket-lock.h
Applying: riscv: errata: Move errata vendor func-id into vendorid_list.h
Applying: riscv: qspinlock: errata: Add ERRATA_THEAD_WRITE_ONCE fixup
Applying: riscv: qspinlock: Add basic queued_spinlock support
Applying: riscv: qspinlock: Introduce combo spinlock
Applying: riscv: qspinlock: Add virt_spin_lock() support for VM guest
Using index info to reconstruct a base tree...
M	arch/riscv/kernel/setup.c
Falling back to patching base and 3-way merge...
Auto-merging arch/riscv/kernel/setup.c
CONFLICT (content): Merge conflict in arch/riscv/kernel/setup.c
Patch failed at 0007 riscv: qspinlock: Add virt_spin_lock() support for VM guest
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".'
  stderr: 'error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch'

conflict:

diff --cc arch/riscv/kernel/setup.c
index 729e4361f13c,0bafb9fd6ea3..000000000000
--- a/arch/riscv/kernel/setup.c
+++ b/arch/riscv/kernel/setup.c
@@@ -26,6 -26,8 +26,11 @@@
  #include <asm/alternative.h>
  #include <asm/cacheflush.h>
  #include <asm/cpufeature.h>
++<<<<<<< HEAD
++=======
+ #include <asm/cpu_ops.h>
+ #include <asm/cpufeature.h>
++>>>>>>> riscv: qspinlock: Add virt_spin_lock() support for VM guest
  #include <asm/early_ioremap.h>
  #include <asm/pgtable.h>
  #include <asm/setup.h>

@bjoto
Copy link
Author

bjoto commented Jan 8, 2024

At least one diff in series https://patchwork.kernel.org/project/linux-riscv/list/?series=812817 expired. Closing PR.

@bjoto bjoto deleted the series/782738=>for-next branch January 10, 2024 09:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants