-
Notifications
You must be signed in to change notification settings - Fork 407
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
optimize min/max atomics with early exit on no-op case #3265
Conversation
@dalg24 asked about performance (#3190 (review)). The early-exit change reduces the cost of atomic min/max by ~5x with 1 thread and ~6x with 4 threads (on 4-core Intel Core i7 Kaby Lake processor). One can estimate the value for a real workload by considering how often min/max has no effect for a given distribution of inputs. 1 thread
4 threads
|
@dhollman Did I apply your suggested implementation properly? |
e8f693c
to
83bda33
Compare
Based on the prototype written by @dhollman in https://godbolt.org/z/zqP8ox
c72e11b
to
c7a26f9
Compare
If this passes we can cleanup history after.
I pushed the clang-format here. We gonna rewrite history to fold it into the prior commit if this works and otherwise passes testing. Btw. its 5x and 6x respectivly :-) |
(Failure on Jenkins is not related to this PR) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but don't merge just on my review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo in the makefile
Co-authored-by: Damien L-G <dalg24+github@gmail.com>
I decided to default the length to 1'000'000 rather than waiting for #3279 to land. diff --git a/core/perf_test/CMakeLists.txt b/core/perf_test/CMakeLists.txt
index 49741ed8..3b7d154f 100644
--- a/core/perf_test/CMakeLists.txt
+++ b/core/perf_test/CMakeLists.txt
@@ -93,6 +93,7 @@ KOKKOS_ADD_EXECUTABLE_AND_TEST(
PerformanceTest_Atomic_MinMax
SOURCES test_atomic_minmax_simple.cpp
CATEGORIES PERFORMANCE
+ ARGS 1000000
)
KOKKOS_ADD_EXECUTABLE_AND_TEST( |
@@ -165,6 +202,7 @@ KOKKOS_INLINE_FUNCTION T atomic_fetch_oper( | |||
} oldval, assume, newval; | |||
|
|||
oldval.t = *dest; | |||
if (check_early_exit(Oper{}, oldval.t, val)) return oldval.t; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs to be in the while loop
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? The purpose of this check is to determine if there is no need to even attempt an update.
I assumed load atomicity is architectural here, so if you are seeing issues on PowerPC, the fix is not to put the check in the loop but to add a relaxed atomic load of oldval.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not asking you to change anything. @crtrott mentioned yesterday there might be an issue with the current implementation. I commented as a reminder for us.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fine with not doing anything, but I would like to understand why anyone thinks this check belongs in the loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, @crtrott I'm not sure why the early exit check needs to be in the while loop. Probably the fewer branches the better in the CAS loop itself
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
desul/desul#13 has our discussion of this topic. It may be easier to have it over there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Based on the prototype written by @dhollman in https://godbolt.org/z/zqP8ox
Fixes #3144