Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AMDGPU] Produce better memoperand for LDS DMA #75247

Merged
merged 2 commits into from
Dec 18, 2023

Conversation

rampitec
Copy link
Collaborator

  1. It was marked as volatile. This is not needed and the only reason
    it was done is because it is both load and store and handled
    together with atomics. Global load to LDS was marked as volatile
    just because buffer load was done that way.
  2. Preserve at least LDS (store) pointer which we always have with
    the intrinsics.
  3. Use PoisonValue instead of nullptr for load memop as a Value.

1) It was marked as volatile. This is not needed and the only reason
   it was done is because it is both load and store and handled
   together with atomics. Global load to LDS was marked as volatile
   just because buffer load was done that way.
2) Preserve at least LDS (store) pointer which we always have with
   the intrinsics.
3) Use PoisonValue instead of nullptr for load memop as a Value.
@llvmbot
Copy link

llvmbot commented Dec 12, 2023

@llvm/pr-subscribers-backend-amdgpu

Author: Stanislav Mekhanoshin (rampitec)

Changes
  1. It was marked as volatile. This is not needed and the only reason
    it was done is because it is both load and store and handled
    together with atomics. Global load to LDS was marked as volatile
    just because buffer load was done that way.
  2. Preserve at least LDS (store) pointer which we always have with
    the intrinsics.
  3. Use PoisonValue instead of nullptr for load memop as a Value.

Full diff: https://github.com/llvm/llvm-project/pull/75247.diff

1 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SIISelLowering.cpp (+10-6)
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index 85cc3cfec19cd..2549e41ee527d 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -1128,11 +1128,10 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info,
                     MachineMemOperand::MOStore |
                     MachineMemOperand::MODereferenceable;
 
-      // XXX - Should this be volatile without known ordering?
-      Info.flags |= MachineMemOperand::MOVolatile;
-
       switch (IntrID) {
       default:
+        // XXX - Should this be volatile without known ordering?
+        Info.flags |= MachineMemOperand::MOVolatile;
         break;
       case Intrinsic::amdgcn_raw_buffer_load_lds:
       case Intrinsic::amdgcn_raw_ptr_buffer_load_lds:
@@ -1140,6 +1139,7 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info,
       case Intrinsic::amdgcn_struct_ptr_buffer_load_lds: {
         unsigned Width = cast<ConstantInt>(CI.getArgOperand(2))->getZExtValue();
         Info.memVT = EVT::getIntegerVT(CI.getContext(), Width * 8);
+        Info.ptrVal = CI.getArgOperand(1);
         return true;
       }
       }
@@ -1268,8 +1268,8 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info,
     Info.opc = ISD::INTRINSIC_VOID;
     unsigned Width = cast<ConstantInt>(CI.getArgOperand(2))->getZExtValue();
     Info.memVT = EVT::getIntegerVT(CI.getContext(), Width * 8);
-    Info.flags |= MachineMemOperand::MOLoad | MachineMemOperand::MOStore |
-                  MachineMemOperand::MOVolatile;
+    Info.ptrVal = CI.getArgOperand(1);
+    Info.flags |= MachineMemOperand::MOLoad | MachineMemOperand::MOStore;
     return true;
   }
   case Intrinsic::amdgcn_ds_bvh_stack_rtn: {
@@ -9095,7 +9095,9 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op,
     MachinePointerInfo LoadPtrI = LoadMMO->getPointerInfo();
 
     MachinePointerInfo StorePtrI = LoadPtrI;
-    StorePtrI.V = nullptr;
+    LoadPtrI.V = PoisonValue::get(
+        PointerType::get(*DAG.getContext(), AMDGPUAS::GLOBAL_ADDRESS));
+    LoadPtrI.AddrSpace = AMDGPUAS::GLOBAL_ADDRESS;
     StorePtrI.AddrSpace = AMDGPUAS::LOCAL_ADDRESS;
 
     auto F = LoadMMO->getFlags() &
@@ -9173,6 +9175,8 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op,
     MachinePointerInfo LoadPtrI = LoadMMO->getPointerInfo();
     LoadPtrI.Offset = Op->getConstantOperandVal(5);
     MachinePointerInfo StorePtrI = LoadPtrI;
+    LoadPtrI.V = PoisonValue::get(
+        PointerType::get(*DAG.getContext(), AMDGPUAS::GLOBAL_ADDRESS));
     LoadPtrI.AddrSpace = AMDGPUAS::GLOBAL_ADDRESS;
     StorePtrI.AddrSpace = AMDGPUAS::LOCAL_ADDRESS;
     auto F = LoadMMO->getFlags() &

rampitec added a commit to rampitec/llvm-project that referenced this pull request Dec 12, 2023
SIInstrInfo::areMemAccessesTriviallyDisjoint does a DS offset
checks, but does not account for LDS DMA instructions. Added
these checks. Without it code falls through and returns true
which is wrong. As a result mayAlias would always return false
for LDS DMA and a regular LDS instruction or 2 LDS DMA instructions.

At the moment this is NFCI because we do not use this AA in a
context which may touch LDS DMA instructions. This is also
unreacheable now because of the ordered memory ref checks just
above in the function and LDS DMA is marked as volatile. This
volatile marking is removed in PR llvm#75247, therefore I'd submit
this check before llvm#75247.
rampitec added a commit that referenced this pull request Dec 18, 2023
SIInstrInfo::areMemAccessesTriviallyDisjoint does a DS offset checks,
but does not account for LDS DMA instructions. Added these checks.
Without it code falls through and returns true which is wrong. As a
result mayAlias would always return false for LDS DMA and a regular LDS
instruction or 2 LDS DMA instructions.

At the moment this is NFCI because we do not use this AA in a context
which may touch LDS DMA instructions. This is also unreacheable now
because of the ordered memory ref checks just above in the function and
LDS DMA is marked as volatile. This volatile marking is removed in PR
#75247, therefore I'd submit this check before #75247.
@rampitec rampitec merged commit e5c523e into llvm:main Dec 18, 2023
4 checks passed
@rampitec rampitec deleted the lds-dma-memop branch December 18, 2023 23:02
@jayfoad
Copy link
Contributor

jayfoad commented Dec 19, 2023

Use PoisonValue instead of nullptr for load memop as a Value.

What is the effect of that? I thought nullptr was supposed to represent an unknown value, so you have to conservatively assume it might alias with anything.

The effect is that MI.mayAlias does not bail before checking AA. It needs at least something as a Value, and that something must be not a PseudoValue. There is alias.scope info attached specifically to work after lowering, when no Values exist, but even to get there you need to have a Value. This is just a general distrust of the llvm in the AA info past lowering.

And no, this is exactly the point: it may not alias with anything. It has alias.scope info, created just for that.

searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Feb 6, 2024
1) It was marked as volatile. This is not needed and the only reason
   it was done is because it is both load and store and handled
   together with atomics. Global load to LDS was marked as volatile
   just because buffer load was done that way.
2) Preserve at least LDS (store) pointer which we always have with
   the intrinsics.
3) Use PoisonValue instead of nullptr for load memop as a Value.

Change-Id: I3f937cff3280a411653712357a144d5ce819fea6
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants