Avoid ping-pong in spinlock::lock #1944
Conversation
```cpp
internal::cpu_relax();
while (_busy.load(std::memory_order_relaxed)) {
    internal::cpu_relax();
}
```
Patch looks okay, but did you see a problem? We use the spinlock very sparingly.
Also note: if there is no contention, the new code is slower, since it needs two cache transactions (one to move the line to the shared state, another to move it to the modified/exclusive state). But it's probably better overall.
No, I haven't seen a problem. Maybe some (micro)benchmarks are needed (2 and 3 threads).
You are right: the hypothesis that the new implementation slows down or speeds up execution needs to be proved by a benchmark. I will write some (probably next week). My hypothesis is that the unlock operation is affected by this busy exchange operation (because of MESI).
When many threads (>= 2) try to lock an already locked spinlock, this patch improves execution time (it reduces the number of MESI transactions).
**Benchmark**

ms here means microseconds. I've implemented a benchmark (not committed because of style inconsistency, but it is possible to add a benchmark to this PR).

**Results**

I've checked some scenarios. Scenarios with >= 3 workers were not interesting because this spinlock is known to be better there. The scenario with 2 workers is more interesting: I don't know of any proof that the new spinlock is better than the old one in this case.

- Mac M1 (aarch64): Apple clang version 15.0.0 (clang-1500.0.40.1)
- Intel Ice Lake (x86_64): Ubuntu clang version 14.0.0-1ubuntu1.1

Other spinlocks were not tested because of a dramatic performance reduction on the aarch64 machine.

**Conclusion**

This patch speeds up the spinlock.

**Results with >= 3 workers**

ms here means microseconds (less detailed): Intel, M1.
force-pushed from 6b66443 to 66d67ea
Please fold the two patches together; we don't merge patches that fix a problem introduced earlier in the same series.
When 2 threads wait on this atomic:

1. the underlying cache line is moved between L1 caches;
2. the cache line needs to be in the E (exclusive) state, which affects performance due to the cache coherence protocol: [MESIF](https://en.wikipedia.org/wiki/MESIF_protocol), MESI, and so on.

Plus: do chmod -x on the .cc file.
force-pushed from 66d67ea to a77d101
(I've squashed the 2 commits)