[AMDGPU] Support double type in atomic optimizer. (#84307)
Presently, the atomic optimizer supports only 32-bit operations. The plan
is to extend it to 64-bit operations for both compute and graphics. This
patch adds support for the double type, for uniform values only; support
for divergent values will follow. Adding support for divergent values
requires extending/legalizing the readfirstlane, readlane, writelane, and
similar intrinsics for 64-bit operands, to avoid the `bitcast` noise we
currently have.
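
For a uniform value, every active lane applies the same operand, so a whole wavefront's updates can be folded into one atomic. The C++ code below is a hypothetical host-side model of that idea for `fadd` and `fmin`, not the pass's actual IR. It assumes a 64-lane wavefront represented by a 64-bit active mask. `activeLanesBelow`, `uniformFAdd`, and `uniformFMin` are illustrative names, not LLVM APIs. In this model, fsub and fmax would follow the same two patterns with subtraction and max.

```cpp
#include <bitset>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical host-side model, not LLVM code: a wavefront is 64 lanes and
// `mask` has bit i set when lane i is active.

// Count of active lanes with a lower index than `lane` (the role mbcnt plays
// on the GPU).
unsigned activeLanesBelow(std::uint64_t mask, unsigned lane) {
  assert(lane < 64 && "lane index out of range for a 64-bit mask");
  std::uint64_t below = mask & ((std::uint64_t{1} << lane) - 1);
  return static_cast<unsigned>(std::bitset<64>(below).count());
}

// Uniform fadd: every active lane adds the same `v` to `mem`. One combined
// update adds v * (number of active lanes); each lane then rebuilds the value
// it would have read as old + v * (active lanes before it).
std::vector<double> uniformFAdd(double &mem, double v, std::uint64_t mask) {
  const double n = static_cast<double>(std::bitset<64>(mask).count());
  const double old = mem;  // what the single issuing lane reads
  mem = old + v * n;       // the one combined update
  std::vector<double> results(64, 0.0);  // inactive lanes stay 0.0
  for (unsigned lane = 0; lane < 64; ++lane)
    if ((mask >> lane) & 1)
      results[lane] = old + v * activeLanesBelow(mask, lane);
  return results;
}

// Uniform fmin: one update applies fmin(old, v). The first active lane reads
// `old`; every later active lane reads fmin(old, v).
std::vector<double> uniformFMin(double &mem, double v, std::uint64_t mask) {
  const double old = mem;
  mem = std::fmin(old, v);
  std::vector<double> results(64, 0.0);  // inactive lanes stay 0.0
  for (unsigned lane = 0; lane < 64; ++lane)
    if ((mask >> lane) & 1)
      results[lane] =
          activeLanesBelow(mask, lane) == 0 ? old : std::fmin(old, v);
  return results;
}
```

With mask `0b1011` (lanes 0, 1 and 3 active), `mem = 10.0` and `v = 1.5`, the model returns 10.0, 11.5 and 13.0 for those lanes and leaves `mem` at 14.5. This matches the three lanes performing their atomics one after another in lane order. For arbitrary doubles, `v * n` and `n` sequential additions of `v` need not round identically; the values here are chosen to be exact.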

---------

Authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
pravinjagtap committed Mar 22, 2024
1 parent c67ed2f commit e1a8120
Showing 9 changed files with 20,497 additions and 112 deletions.
11 changes: 7 additions & 4 deletions llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp
@@ -209,8 +209,9 @@ void AMDGPUAtomicOptimizerImpl::visitAtomicRMWInst(AtomicRMWInst &I) {
     break;
   }
 
-  // Only 32-bit floating point atomic ops are supported.
-  if (AtomicRMWInst::isFPOperation(Op) && !I.getType()->isFloatTy()) {
+  // Only 32 and 64 bit floating point atomic ops are supported.
+  if (AtomicRMWInst::isFPOperation(Op) &&
+      !(I.getType()->isFloatTy() || I.getType()->isDoubleTy())) {
     return;
   }
 
@@ -920,8 +921,10 @@ void AMDGPUAtomicOptimizerImpl::optimizeAtomic(Instruction &I,
     Value *BroadcastI = nullptr;
 
     if (TyBitWidth == 64) {
-      Value *const ExtractLo = B.CreateTrunc(PHI, Int32Ty);
-      Value *const ExtractHi = B.CreateTrunc(B.CreateLShr(PHI, 32), Int32Ty);
+      Value *CastedPhi = B.CreateBitCast(PHI, IntNTy);
+      Value *const ExtractLo = B.CreateTrunc(CastedPhi, Int32Ty);
+      Value *const ExtractHi =
+          B.CreateTrunc(B.CreateLShr(CastedPhi, 32), Int32Ty);
       CallInst *const ReadFirstLaneLo =
           B.CreateIntrinsic(Intrinsic::amdgcn_readfirstlane, {}, ExtractLo);
       CallInst *const ReadFirstLaneHi =
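
With double support, `PHI` in the 64-bit branch can now have a floating-point type. LLVM IR's `lshr` and `trunc` accept only integer operands, which is why the patch first bitcasts it to `IntNTy`. The halves passed to `amdgcn_readfirstlane` are 32-bit, so the value is split into a low and a high word, broadcast, and presumably recombined after the lines shown. The C++ code below mimics that split and reassembly on the host, assuming an IEEE-754 binary64 `double`. `Halves`, `splitDouble`, and `joinDouble` are illustrative names, and `std::memcpy` stands in for IR `bitcast` (C++20 code could use `std::bit_cast`). The shift-and-or reassembly is an assumption of this model, not taken from the hunk.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Illustrative host-side analogue of the IR in the 64-bit branch; the names
// below are hypothetical and nothing here is LLVM API.
struct Halves {
  std::uint32_t lo;
  std::uint32_t hi;
};

// bitcast double -> i64, then trunc (low word) and lshr 32 + trunc (high word).
Halves splitDouble(double d) {
  static_assert(sizeof(double) == sizeof(std::uint64_t),
                "model assumes a 64-bit double");
  std::uint64_t bits;
  std::memcpy(&bits, &d, sizeof bits);  // plays the role of `bitcast`
  return {static_cast<std::uint32_t>(bits),         // `trunc`
          static_cast<std::uint32_t>(bits >> 32)};  // `lshr` + `trunc`
}

// Rebuild the 64-bit pattern from the two (broadcast) halves and bitcast back.
double joinDouble(Halves h) {
  const std::uint64_t bits = (static_cast<std::uint64_t>(h.hi) << 32) | h.lo;
  double d;
  std::memcpy(&d, &bits, sizeof d);
  return d;
}
```

For `1.0`, whose bit pattern is `0x3FF0000000000000`, the high word is `0x3FF00000` and the low word is `0`; joining the halves restores the original value exactly.
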
274 changes: 223 additions & 51 deletions llvm/test/CodeGen/AMDGPU/GlobalISel/fp64-atomics-gfx90a.ll

270 changes: 213 additions & 57 deletions llvm/test/CodeGen/AMDGPU/fp64-atomics-gfx90a.ll

560 changes: 560 additions & 0 deletions llvm/test/CodeGen/AMDGPU/global_atomic_optimizer_fp_rtn.ll

420 changes: 420 additions & 0 deletions llvm/test/CodeGen/AMDGPU/global_atomics_optimizer_fp_no_rtn.ll

5,578 changes: 5,578 additions & 0 deletions llvm/test/CodeGen/AMDGPU/global_atomics_scan_fadd.ll

3,960 changes: 3,960 additions & 0 deletions llvm/test/CodeGen/AMDGPU/global_atomics_scan_fmax.ll

3,960 changes: 3,960 additions & 0 deletions llvm/test/CodeGen/AMDGPU/global_atomics_scan_fmin.ll

5,576 changes: 5,576 additions & 0 deletions llvm/test/CodeGen/AMDGPU/global_atomics_scan_fsub.ll
