
Avoid ping-pong in spinlock::lock #1944

Merged: 1 commit merged into scylladb:master on Dec 24, 2023

Conversation

kirillgrachoff (Contributor)

When two threads wait on this atomic:
1: the underlying cache line bounces between their L1 caches
2: the cache line has to be brought into the E (exclusive) state, which hurts performance under cache-coherence protocols such as MESIF, MESI and so on

The patch therefore adds a relaxed-load wait loop, so waiters only retry the exchange once the lock looks free:

```cpp
internal::cpu_relax();
while (_busy.load(std::memory_order_relaxed)) {
    internal::cpu_relax();
}
```
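For context, here is a sketch of how the whole lock() reads with the added loop in place, assuming `_busy` is a `std::atomic<bool>` and `internal::cpu_relax()` is the existing CPU pause hint. The outer exchange loop is the pre-existing code; only the inner loop is added by this patch. Treat this as an illustration, not the exact diff.

```cpp
// Sketch of the patched lock(): a test-and-test-and-set (TTAS) spinlock.
void lock() noexcept {
    // exchange() always pulls the cache line into the exclusive/modified state.
    while (_busy.exchange(true, std::memory_order_acquire)) {
        internal::cpu_relax();
        // While the lock is held, spin on a plain load: the line can stay
        // shared in every waiter's L1, so waiters add no coherence traffic
        // until the owner stores false in unlock().
        while (_busy.load(std::memory_order_relaxed)) {
            internal::cpu_relax();
        }
    }
}
```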
Member
Patch looks okay, but did you see a problem? We use the spinlock very sparingly.

Also note: if there is no contention, the new code is slower, since it will need two cache transactions (one to move the line to the shared state, the other to move it to the modified/exclusive state). But it's probably better overall.

Contributor Author

No, I haven't seen a problem. Maybe some (micro)benchmarks are needed (2 and 3 threads).

You are right: whether the new implementation slows execution down or speeds it up needs to be shown by a benchmark. I will write some (probably next week). My hypothesis is that the unlock operation is affected by the busy exchange operation (because of MESI).

When several threads (>= 2) try to lock an already locked spinlock, this patch improves execution time (it reduces the number of MESI transactions).

kirillgrachoff commented Nov 17, 2023

Benchmark

Note: "ms" in the numbers below means microseconds (µs), i.e. op/ms = operations per microsecond.

I've implemented a benchmark (not committed because of style inconsistency, but it could be added to this PR).
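Something along these lines would reproduce the setup: N workers repeatedly lock, bump a shared counter, and unlock, and throughput is reported in op/µs. This is only an illustrative sketch, not the benchmark used for the numbers below; the worker/iteration counts and the `ttas_spinlock` stand-in are made up for the example.

```cpp
#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>
#include <vector>

// Toy stand-in for the spinlock under test (TTAS variant as in this PR).
struct ttas_spinlock {
    std::atomic<bool> _busy{false};
    void lock() noexcept {
        while (_busy.exchange(true, std::memory_order_acquire)) {
            while (_busy.load(std::memory_order_relaxed)) {
                // a cpu_relax()/pause hint would go here
            }
        }
    }
    void unlock() noexcept { _busy.store(false, std::memory_order_release); }
};

int main() {
    constexpr int workers = 2;            // matches the 2-worker scenario below
    constexpr long iterations = 1'000'000;
    ttas_spinlock lock;
    long counter = 0;                     // protected by the lock

    auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> threads;
    for (int i = 0; i < workers; ++i) {
        threads.emplace_back([&] {
            for (long j = 0; j < iterations; ++j) {
                lock.lock();
                ++counter;                // the critical section
                lock.unlock();
            }
        });
    }
    for (auto& t : threads) {
        t.join();
    }
    auto us = std::chrono::duration_cast<std::chrono::microseconds>(
            std::chrono::steady_clock::now() - start).count();
    std::printf("speed: ~%f op/us\n", double(counter) / double(us));
}
```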

Results

I've checked several scenarios. Scenarios with >= 3 workers are less interesting because the new spinlock is known to be better there. The 2-worker scenario is more interesting: I don't know of any proof that the new spinlock beats the old one in that case.

Mac M1 (aarch64)

Apple clang version 15.0.0 (clang-1500.0.40.1)
Target: arm64-apple-darwin23.1.0
Thread model: posix

Params: workers: 2
old spinlock:                                      speed: ~31.973932 op/ms
add `while` loop (as in PR):                       speed: ~32.615116 op/ms
replace `exchange` with `compare_exchange_strong`: speed: ~16.080979 op/ms
replace `exchange` with `compare_exchange_weak`:   speed: ~16.220595 op/ms

So, `locked.exchange(true)` is dramatically faster than `locked.compare_exchange(false, true)` on an aarch64 (M1) machine.
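For reference, the `compare_exchange`-based variant mentioned above presumably looks something like the sketch below. This is an assumed shape for illustration; the exact benchmarked code is not shown in the PR.

```cpp
// Assumed shape of the compare_exchange-based lock() variant (illustrative only).
void lock() noexcept {
    bool expected = false;
    while (!_busy.compare_exchange_weak(expected, true,
                                        std::memory_order_acquire,
                                        std::memory_order_relaxed)) {
        // compare_exchange_weak overwrites `expected` with the value it observed.
        expected = false;
        while (_busy.load(std::memory_order_relaxed)) {
            internal::cpu_relax();
        }
    }
}
```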

Intel Ice Lake (x86_64)

Ubuntu clang version 14.0.0-1ubuntu1.1
Target: x86_64-pc-linux-gnu
Thread model: posix

Params: workers: 2
old spinlock: speed: ~17.170028 op/ms (not stable: from 16 op/ms to 75.151604 op/ms)
new spinlock: speed: ~18.011552 op/ms

The other spinlock variants were not tested further because of the dramatic performance drop they showed on the aarch64 machine.

Conclusion

This patch speeds up the spinlock.

kirillgrachoff commented Nov 17, 2023

Results with >= 3 workers

(As above, "ms" means microseconds.)

(less detailed)

Intel

| workers | old | new |
| --- | --- | --- |
| 3 | 11.696168 op/ms | 12.375299 op/ms |

M1

| workers | old | new |
| --- | --- | --- |
| 3 | 22.410138 op/ms | 25.528729 op/ms |
| 4 | 17.768904 op/ms | 20.809818 op/ms |
| 5 | 7.859780 op/ms | 9.702729 op/ms |
| 10 | 4.589271 op/ms | 6.093093 op/ms |
| 15 | 2.462706 op/ms | 3.656569 op/ms |
| 16 | 2.646841 op/ms | 3.743264 op/ms |
| 17 | 2.753661 op/ms | 3.845345 op/ms |

@avikivity (Member)

Please fold the two patches together; we don't merge patches that fix a problem introduced earlier in the same series.

When 2 threads wait on this atomic:
1: the underlying cache line is moved between L1 caches
2: the cache line needs to be in the E (exclusive) state, which
affects performance due to the cache-coherence protocol:
[MESIF](https://en.wikipedia.org/wiki/MESIF_protocol), MESI and so on

Plus: Do chmod -x on .cc file
@kirillgrachoff (Contributor Author)
(I've squashed 2 commits)

@avikivity merged commit 3c24968 into scylladb:master on Dec 24, 2023
12 checks passed