
Avoid ping-pong in spinlock::lock #1944

Merged: 1 commit merged into scylladb:master on Dec 24, 2023

Conversation

kirillgrachoff (Contributor)

When two threads wait on this atomic:
1: the underlying cache line bounces between their L1 caches
2: the cache line has to be brought into the E (exclusive) state, which hurts performance under cache-coherence protocols such as MESIF, MESI and so on

The patch therefore adds a relaxed-load wait loop, so waiters only retry the exchange once the lock looks free:

```cpp
internal::cpu_relax();
while (_busy.load(std::memory_order_relaxed)) {
    internal::cpu_relax();
}
```
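For context, here is a sketch of how the whole lock() reads with the added loop in place, assuming `_busy` is a `std::atomic<bool>` and `internal::cpu_relax()` is the existing CPU pause hint. The outer exchange loop is the pre-existing code; only the inner loop is added by this patch. Treat this as an illustration, not the exact diff.

```cpp
// Sketch of the patched lock(): a test-and-test-and-set (TTAS) spinlock.
void lock() noexcept {
    // exchange() always pulls the cache line into the exclusive/modified state.
    while (_busy.exchange(true, std::memory_order_acquire)) {
        internal::cpu_relax();
        // While the lock is held, spin on a plain load: the line can stay
        // shared in every waiter's L1, so waiters add no coherence traffic
        // until the owner stores false in unlock().
        while (_busy.load(std::memory_order_relaxed)) {
            internal::cpu_relax();
        }
    }
}
```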
Member
Patch looks okay, but did you see a problem? We use the spinlock very sparingly.

Also note: if there is no contention, the new code is slower, since it will need two cache transactions (one to move the line to the shared state, the other to move it to the modified/exclusive state). But it's probably better overall.

Contributor Author

No, I haven't seen a problem. Maybe some (micro)benchmarks are needed (2 and 3 threads).

You are right: whether the new implementation slows execution down or speeds it up needs to be shown by a benchmark. I will write some (probably next week). My hypothesis is that the unlock operation is affected by the busy exchange operation (because of MESI).

When several threads (>= 2) try to lock an already locked spinlock, this patch improves execution time (it reduces the number of MESI transactions).

kirillgrachoff commented Nov 17, 2023

Benchmark

Note: "ms" in the numbers below means microseconds (µs), i.e. op/ms = operations per microsecond.

I've implemented a benchmark (not committed because of style inconsistency, but it could be added to this PR).
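Something along these lines would reproduce the setup: N workers repeatedly lock, bump a shared counter, and unlock, and throughput is reported in op/µs. This is only an illustrative sketch, not the benchmark used for the numbers below; the worker/iteration counts and the `ttas_spinlock` stand-in are made up for the example.

```cpp
#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>
#include <vector>

// Toy stand-in for the spinlock under test (TTAS variant as in this PR).
struct ttas_spinlock {
    std::atomic<bool> _busy{false};
    void lock() noexcept {
        while (_busy.exchange(true, std::memory_order_acquire)) {
            while (_busy.load(std::memory_order_relaxed)) {
                // a cpu_relax()/pause hint would go here
            }
        }
    }
    void unlock() noexcept { _busy.store(false, std::memory_order_release); }
};

int main() {
    constexpr int workers = 2;            // matches the 2-worker scenario below
    constexpr long iterations = 1'000'000;
    ttas_spinlock lock;
    long counter = 0;                     // protected by the lock

    auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> threads;
    for (int i = 0; i < workers; ++i) {
        threads.emplace_back([&] {
            for (long j = 0; j < iterations; ++j) {
                lock.lock();
                ++counter;                // the critical section
                lock.unlock();
            }
        });
    }
    for (auto& t : threads) {
        t.join();
    }
    auto us = std::chrono::duration_cast<std::chrono::microseconds>(
            std::chrono::steady_clock::now() - start).count();
    std::printf("speed: ~%f op/us\n", double(counter) / double(us));
}
```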

Results

I've checked several scenarios. Scenarios with >= 3 workers are less interesting because the new spinlock is known to be better there. The 2-worker scenario is more interesting: I don't know of any proof that the new spinlock beats the old one in that case.

Mac M1 (aarch64)

Apple clang version 15.0.0 (clang-1500.0.40.1)
Target: arm64-apple-darwin23.1.0
Thread model: posix

Params: workers: 2
old spinlock:                                      speed: ~31.973932 op/ms
add `while` loop (as in PR):                       speed: ~32.615116 op/ms
replace `exchange` with `compare_exchange_strong`: speed: ~16.080979 op/ms
replace `exchange` with `compare_exchange_weak`:   speed: ~16.220595 op/ms

So, `locked.exchange(true)` is dramatically faster than `locked.compare_exchange(false, true)` on an aarch64 (M1) machine.
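For reference, the `compare_exchange`-based variant mentioned above presumably looks something like the sketch below. This is an assumed shape for illustration; the exact benchmarked code is not shown in the PR.

```cpp
// Assumed shape of the compare_exchange-based lock() variant (illustrative only).
void lock() noexcept {
    bool expected = false;
    while (!_busy.compare_exchange_weak(expected, true,
                                        std::memory_order_acquire,
                                        std::memory_order_relaxed)) {
        // compare_exchange_weak overwrites `expected` with the value it observed.
        expected = false;
        while (_busy.load(std::memory_order_relaxed)) {
            internal::cpu_relax();
        }
    }
}
```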

Intel Ice Lake (x86_64)

Ubuntu clang version 14.0.0-1ubuntu1.1
Target: x86_64-pc-linux-gnu
Thread model: posix

Params: workers: 2
old spinlock: speed: ~17.170028 op/ms (not stable: from 16 op/ms to 75.151604 op/ms)
new spinlock: speed: ~18.011552 op/ms

The other spinlock variants were not tested further because of the dramatic performance drop they showed on the aarch64 machine.

Conclusion

This patch speeds up the spinlock.

kirillgrachoff commented Nov 17, 2023

Results with >= 3 workers

(As above, "ms" means microseconds.)

(less detailed)

Intel

| workers | old | new |
| --- | --- | --- |
| 3 | 11.696168 op/ms | 12.375299 op/ms |

M1

| workers | old | new |
| --- | --- | --- |
| 3 | 22.410138 op/ms | 25.528729 op/ms |
| 4 | 17.768904 op/ms | 20.809818 op/ms |
| 5 | 7.859780 op/ms | 9.702729 op/ms |
| 10 | 4.589271 op/ms | 6.093093 op/ms |
| 15 | 2.462706 op/ms | 3.656569 op/ms |
| 16 | 2.646841 op/ms | 3.743264 op/ms |
| 17 | 2.753661 op/ms | 3.845345 op/ms |

@avikivity (Member)

Please fold the two patches together; we don't merge patches that fix a problem introduced earlier in the same series.

When 2 threads wait on this atomic:
1: the underlying cache line is moved between L1 caches
2: the cache line needs to be in the E (exclusive) state, which
affects performance due to the cache-coherence protocol:
[MESIF](https://en.wikipedia.org/wiki/MESIF_protocol), MESI and so on

Plus: Do chmod -x on .cc file
@kirillgrachoff (Contributor Author)
(I've squashed 2 commits)

@avikivity merged commit 3c24968 into scylladb:master on Dec 24, 2023
12 checks passed