-
Notifications
You must be signed in to change notification settings - Fork 10.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AMDGPU] Fix scan of atomicFSub in AtomicOptimizer. #66082
Conversation
D156301 introduced atomic optimizations for FAdd/FSub. For FSub, scan need to be performed using add operation (not sub) and memory location will be updated by reduced value using atomic sub later by only one lane.
@llvm/pr-subscribers-backend-amdgpu ChangesD156301 introduced atomic optimizations for FAdd/FSub. For FSub, reduction/scan needs to be performed using add operation (
|
@@ -451,7 +451,7 @@ define amdgpu_ps float @global_atomic_fsub_uni_address_div_value_agent_scope_str | |||
; IR-ITERATIVE-NEXT: [[TMP26:%.*]] = bitcast float [[OLDVALUEPHI]] to i32 | |||
; IR-ITERATIVE-NEXT: [[TMP27:%.*]] = call i32 @llvm.amdgcn.writelane(i32 [[TMP25]], i32 [[TMP21]], i32 [[TMP26]]) #[[ATTR7]] | |||
; IR-ITERATIVE-NEXT: [[TMP28]] = bitcast i32 [[TMP27]] to float | |||
; IR-ITERATIVE-NEXT: [[TMP29]] = call float @llvm.experimental.constrained.fsub.f32(float [[ACCUMULATOR]], float [[TMP24]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]] | |||
; IR-ITERATIVE-NEXT: [[TMP29]] = call float @llvm.experimental.constrained.fadd.f32(float [[ACCUMULATOR]], float [[TMP24]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're doing the reduction with fadd then you need to start with the identity for fadd, i.e. -0.0.
(For that reason the code changes might be very slightly simpler if you do the reduction with fsub followed by a final atomic fadd.)
AtomicRMWInst::BinOp ScanOp; | ||
if(Op == AtomicRMWInst::Sub ) { | ||
// For atomic sub, perform scan with add operation and allow one lane to | ||
// substract the reduced value later. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo "subtract"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
[D156301](https://reviews.llvm.org/D156301) introduced atomic optimizations for FAdd/FSub. For FSub, reduction/scan needs to be performed using add operation (`not sub`) and memory location will be updated by reduced value using atomic sub later by only one lane. --------- Authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
[D156301](https://reviews.llvm.org/D156301) introduced atomic optimizations for FAdd/FSub. For FSub, reduction/scan needs to be performed using add operation (`not sub`) and memory location will be updated by reduced value using atomic sub later by only one lane. --------- Authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com> Change-Id: If2562ccd3b49345eb1f4dca25cd93ca8c5f7393a
D156301 introduced atomic optimizations for FAdd/FSub. For FSub, reduction/scan needs to be performed using add operation (
not sub
) and memory location will be updated by reduced value using atomic sub later by only one lane.