
[NVPTX] Add Volta Load/Store Atomics (.relaxed, .acquire, .release) and Volatile (.mmio/.volatile) support #98022

Merged
merged 7 commits into llvm:main from gonzalobg:nvptx_volta_relaxed_volatile on Jul 15, 2024

Conversation

gonzalobg (Contributor) commented Jul 8, 2024

This PR adds initial support for some of Volta's (sm_70) load/store atomic and volatile/MMIO operations, hopefully without breaking any preexisting code. Only relaxed, acquire, and release operations, plus volatile, are handled.

This PR does not aim to add support for any of the following:

  • syncscope support
  • read atomic ops to const, param, grid param
  • local memory atomics
  • sequentially consistent atomics
  • atomicrmw
  • ...

This PR builds on top of #96436 which has not been reviewed yet (not sure if there is anything for me to do there; opened it 2 weeks ago and have heard nothing).

llvmbot (Collaborator) commented Jul 8, 2024

@llvm/pr-subscribers-backend-nvptx

Author: None (gonzalobg)


Patch is 105.27 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/98022.diff

6 Files Affected:

  • (modified) llvm/lib/Target/NVPTX/MCTargetDesc/NVPTXInstPrinter.cpp (+26-3)
  • (modified) llvm/lib/Target/NVPTX/NVPTX.h (+8)
  • (modified) llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp (+125-60)
  • (modified) llvm/lib/Target/NVPTX/NVPTXInstrInfo.td (+72-72)
  • (added) llvm/test/CodeGen/NVPTX/load-store-sm-70.ll (+951)
  • (modified) llvm/test/CodeGen/NVPTX/load-store.ll (+577-6)
diff --git a/llvm/lib/Target/NVPTX/MCTargetDesc/NVPTXInstPrinter.cpp b/llvm/lib/Target/NVPTX/MCTargetDesc/NVPTXInstPrinter.cpp
index b7a20c351f5ff6..c96e5a8878fcba 100644
--- a/llvm/lib/Target/NVPTX/MCTargetDesc/NVPTXInstPrinter.cpp
+++ b/llvm/lib/Target/NVPTX/MCTargetDesc/NVPTXInstPrinter.cpp
@@ -224,9 +224,32 @@ void NVPTXInstPrinter::printLdStCode(const MCInst *MI, int OpNum,
   if (Modifier) {
     const MCOperand &MO = MI->getOperand(OpNum);
     int Imm = (int) MO.getImm();
-    if (!strcmp(Modifier, "volatile")) {
-      if (Imm)
-        O << ".volatile";
+    if (!strcmp(Modifier, "sem")) {
+      switch (Imm) {
+      case NVPTX::PTXLdStInstCode::NotAtomic:
+          break;
+      case NVPTX::PTXLdStInstCode::Volatile:
+          O << ".volatile";
+          break;
+      case NVPTX::PTXLdStInstCode::Relaxed:
+          O << ".relaxed.sys";
+          break;
+      case NVPTX::PTXLdStInstCode::Acquire:
+          O << ".acquire.sys";
+          break;
+      case NVPTX::PTXLdStInstCode::Release:
+          O << ".release.sys";
+          break;
+      case NVPTX::PTXLdStInstCode::RelaxedMMIO:
+          O << ".mmio.relaxed.sys";
+          break;
+      default:
+        SmallString<256> Msg;
+        raw_svector_ostream OS(Msg);
+        OS << "NVPTX LdStCode Printer does not support \"" << Imm << "\" sem modifier.";
+        report_fatal_error(OS.str());
+        break;
+      }
     } else if (!strcmp(Modifier, "addsp")) {
       switch (Imm) {
       case NVPTX::PTXLdStInstCode::GLOBAL:
diff --git a/llvm/lib/Target/NVPTX/NVPTX.h b/llvm/lib/Target/NVPTX/NVPTX.h
index b0cb24c63c3ceb..3c7167b1570254 100644
--- a/llvm/lib/Target/NVPTX/NVPTX.h
+++ b/llvm/lib/Target/NVPTX/NVPTX.h
@@ -107,6 +107,14 @@ enum LoadStore {
 };
 
 namespace PTXLdStInstCode {
+enum MemorySemantic {
+  NotAtomic = 0, // PTX calls these: "Weak"
+  Volatile = 1,
+  Relaxed = 2,
+  Acquire = 3,
+  Release = 4,
+  RelaxedMMIO = 5
+};
 enum AddressSpace {
   GENERIC = 0,
   GLOBAL = 1,
diff --git a/llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp b/llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp
index 1e1cbb15e33d4a..8c25c79db41ebd 100644
--- a/llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp
@@ -700,6 +700,109 @@ static unsigned int getCodeAddrSpace(MemSDNode *N) {
   return NVPTX::PTXLdStInstCode::GENERIC;
 }
 
+static unsigned int getCodeMemorySemantic(MemSDNode *N, const NVPTXSubtarget *Subtarget) {
+  AtomicOrdering Ordering = N->getSuccessOrdering();
+  auto CodeAddrSpace = getCodeAddrSpace(N);
+
+  // Supports relaxed, acquire, release, weak:
+  bool hasAtomics = Subtarget->getPTXVersion() >= 60 && Subtarget->getSmVersion() >= 70;
+  // Supports mmio:
+  bool hasRelaxedMMIO = Subtarget->getPTXVersion() >= 82 && Subtarget->getSmVersion() >= 70;
+
+  // TODO: lowering for SequentiallyConsistent Operations: for now, we error.
+  // TODO: lowering for AcquireRelease Operations: for now, we error.
+  //
+  // Lowering for non-SequentiallyConsistent Operations
+  // 
+  // | Atomic  | Volatile | Statespace                    | Lowering sm_60- | Lowering sm_70+                                      |
+  // |---------|----------|-------------------------------|-----------------|------------------------------------------------------|
+  // | No      | No       | All                           | plain           | .weak                                                |
+  // | No      | Yes      | Generic / Shared / Global [0] | .volatile       | .volatile                                            |
+  // | No      | Yes      | Local / Const / Param         | plain [1]       | .weak [1]                                            |
+  // | Relaxed | No       | Generic / Shared / Global [0] | .volatile       | <atomic sem>                                         |
+  // | Other   | No       | Generic / Shared / Global [0] | Error [2]       | <atomic sem>                                         |
+  // | Yes     | No       | Local / Const / Param         | plain [1]       | .weak [1]                                            |
+  // | Relaxed | Yes      | Generic / Shared [0]          | .volatile       | .volatile                                            |
+  // | Relaxed | Yes      | Global [0]                    | .volatile       | .mmio.relaxed.sys (PTX 8.2+) or .volatile (PTX 8.1-) |
+  // | Relaxed | Yes      | Local / Const / Param         | plain [1]       | .weak [1]                                            |
+  // | Other   | Yes      | Generic / Shared / Global [0] | Error [4]       | <atomic sem> [3]                                     |
+  //
+  // [0]: volatile and atomics are only supported on generic addressing to shared or global, or shared, or global.
+  //      MMIO requires generic addressing to global or global, but (TODO) we only implement it for global.
+  // [1]: TODO: this implementation exhibits PTX Undefined Behavior; it fails to preserve the side-effects of atomics and volatile
+  //      accesses in LLVM IR to local / const / param, causing well-formed LLVM-IR & CUDA C++ programs to be miscompiled in sm_70+.
+  if (CodeAddrSpace == NVPTX::PTXLdStInstCode::LOCAL || CodeAddrSpace == NVPTX::PTXLdStInstCode::CONSTANT || CodeAddrSpace == NVPTX::PTXLdStInstCode::PARAM) {
+    return NVPTX::PTXLdStInstCode::NotAtomic;
+  }
+
+  // [2]: Atomics with Ordering different than Relaxed are not supported on sm_60 and older.
+  if (!(Ordering == AtomicOrdering::NotAtomic || Ordering == AtomicOrdering::Monotonic) && !hasAtomics) {
+      SmallString<256> Msg;
+      raw_svector_ostream OS(Msg);
+      OS << "PTX does not support \"atomic\" for orderings different than \"NotAtomic\" or \"Monotonic\" for sm_60 or older, but order is: \"" << toIRString(Ordering) << "\".";
+      report_fatal_error(OS.str());
+  }
+
+  // [3]: TODO: these should eventually use .mmio<.atomic sem>; for now we drop the volatile semantics and preserve the atomic ones.
+  // [4]: TODO: volatile atomics with order stronger than relaxed are currently unimplemented in sm_60 and older..
+  if (!hasAtomics && N->isVolatile() && !(Ordering == AtomicOrdering::NotAtomic || Ordering == AtomicOrdering::Monotonic)) {
+      SmallString<256> Msg;
+      raw_svector_ostream OS(Msg);
+      OS << "PTX does not support \"volatile atomic\" for orderings different than \"NotAtomic\" or \"Monotonic\" for sm_60 and older, but order is: \"" << toIRString(Ordering) << "\".";
+      report_fatal_error(OS.str());
+  }
+
+  // PTX volatile and PTX atomics are not available for statespace that differ from .generic, .global, or .shared.
+  // The behavior of PTX volatile and PTX atomics is undefined if the generic address does not refer to a .global or .shared memory location.
+  bool addrGenericOrGlobalOrShared = (CodeAddrSpace == NVPTX::PTXLdStInstCode::GENERIC
+                                      || CodeAddrSpace == NVPTX::PTXLdStInstCode::GLOBAL
+                                      || CodeAddrSpace == NVPTX::PTXLdStInstCode::SHARED);
+  bool useRelaxedMMIO = hasRelaxedMMIO && CodeAddrSpace == NVPTX::PTXLdStInstCode::GLOBAL;
+
+  switch (Ordering) {
+    case AtomicOrdering::NotAtomic: return N->isVolatile() && addrGenericOrGlobalOrShared? NVPTX::PTXLdStInstCode::Volatile: NVPTX::PTXLdStInstCode::NotAtomic;
+    case AtomicOrdering::Monotonic:
+      if (N->isVolatile()) return useRelaxedMMIO? NVPTX::PTXLdStInstCode::RelaxedMMIO: addrGenericOrGlobalOrShared? NVPTX::PTXLdStInstCode::Volatile: NVPTX::PTXLdStInstCode::NotAtomic;
+      else return hasAtomics? NVPTX::PTXLdStInstCode::Relaxed: addrGenericOrGlobalOrShared? NVPTX::PTXLdStInstCode::Volatile: NVPTX::PTXLdStInstCode::NotAtomic;
+    case AtomicOrdering::Acquire:
+      if (!N->readMem()) {
+        SmallString<256> Msg;
+        raw_svector_ostream OS(Msg);
+        OS << "PTX only supports Acquire Ordering on reads: " << N->getOperationName();
+        N->print(OS);
+        report_fatal_error(OS.str());
+      }
+      return addrGenericOrGlobalOrShared? NVPTX::PTXLdStInstCode::Acquire: NVPTX::PTXLdStInstCode::NotAtomic;
+    case AtomicOrdering::Release:
+      if (!N->writeMem()) {
+        SmallString<256> Msg;
+        raw_svector_ostream OS(Msg);
+        OS << "PTX only supports Release Ordering on writes: " << N->getOperationName();
+        N->print(OS);
+        report_fatal_error(OS.str());
+      }
+      return addrGenericOrGlobalOrShared? NVPTX::PTXLdStInstCode::Release: NVPTX::PTXLdStInstCode::NotAtomic;
+    case AtomicOrdering::AcquireRelease: {
+      SmallString<256> Msg;
+      raw_svector_ostream OS(Msg);
+      OS << "PTX only supports AcquireRelease Ordering on read-modify-write: " << N->getOperationName();
+      N->print(OS);
+      report_fatal_error(OS.str());
+    }
+    case AtomicOrdering::SequentiallyConsistent:
+    case AtomicOrdering::Unordered:
+    default: {
+      // TODO: support AcquireRelease and SequentiallyConsistent
+      SmallString<256> Msg;
+      raw_svector_ostream OS(Msg);
+      OS << "NVPTX backend does not support AtomicOrdering \"" << toIRString(Ordering) << "\" yet.";
+      report_fatal_error(OS.str());
+    }
+  }
+
+  report_fatal_error("unreachable");
+}
+
 static bool canLowerToLDG(MemSDNode *N, const NVPTXSubtarget &Subtarget,
                           unsigned CodeAddrSpace, MachineFunction *F) {
   // We use ldg (i.e. ld.global.nc) for invariant loads from the global address
@@ -902,32 +1005,18 @@ bool NVPTXDAGToDAGISel::tryLoad(SDNode *N) {
   if (!LoadedVT.isSimple())
     return false;
 
-  AtomicOrdering Ordering = LD->getSuccessOrdering();
-  // In order to lower atomic loads with stronger guarantees we would need to
-  // use load.acquire or insert fences. However these features were only added
-  // with PTX ISA 6.0 / sm_70.
-  // TODO: Check if we can actually use the new instructions and implement them.
-  if (isStrongerThanMonotonic(Ordering))
-    return false;
-
   // Address Space Setting
   unsigned int CodeAddrSpace = getCodeAddrSpace(LD);
   if (canLowerToLDG(LD, *Subtarget, CodeAddrSpace, MF)) {
     return tryLDGLDU(N);
   }
 
+  // Memory Semantic Setting
+  unsigned int CodeMemorySem = getCodeMemorySemantic(LD, Subtarget);
+
   unsigned int PointerSize =
       CurDAG->getDataLayout().getPointerSizeInBits(LD->getAddressSpace());
 
-  // Volatile Setting
-  // - .volatile is only available for .global and .shared
-  // - .volatile has the same memory synchronization semantics as .relaxed.sys
-  bool isVolatile = LD->isVolatile() || Ordering == AtomicOrdering::Monotonic;
-  if (CodeAddrSpace != NVPTX::PTXLdStInstCode::GLOBAL &&
-      CodeAddrSpace != NVPTX::PTXLdStInstCode::SHARED &&
-      CodeAddrSpace != NVPTX::PTXLdStInstCode::GENERIC)
-    isVolatile = false;
-
   // Type Setting: fromType + fromTypeWidth
   //
   // Sign   : ISD::SEXTLOAD
@@ -968,7 +1057,7 @@ bool NVPTXDAGToDAGISel::tryLoad(SDNode *N) {
                              NVPTX::LD_f32_avar, NVPTX::LD_f64_avar);
     if (!Opcode)
       return false;
-    SDValue Ops[] = { getI32Imm(isVolatile, dl), getI32Imm(CodeAddrSpace, dl),
+    SDValue Ops[] = { getI32Imm(CodeMemorySem, dl), getI32Imm(CodeAddrSpace, dl),
                       getI32Imm(vecType, dl), getI32Imm(fromType, dl),
                       getI32Imm(fromTypeWidth, dl), Addr, Chain };
     NVPTXLD = CurDAG->getMachineNode(*Opcode, dl, TargetVT, MVT::Other, Ops);
@@ -979,7 +1068,7 @@ bool NVPTXDAGToDAGISel::tryLoad(SDNode *N) {
                              NVPTX::LD_f32_asi, NVPTX::LD_f64_asi);
     if (!Opcode)
       return false;
-    SDValue Ops[] = { getI32Imm(isVolatile, dl), getI32Imm(CodeAddrSpace, dl),
+    SDValue Ops[] = { getI32Imm(CodeMemorySem, dl), getI32Imm(CodeAddrSpace, dl),
                       getI32Imm(vecType, dl), getI32Imm(fromType, dl),
                       getI32Imm(fromTypeWidth, dl), Base, Offset, Chain };
     NVPTXLD = CurDAG->getMachineNode(*Opcode, dl, TargetVT, MVT::Other, Ops);
@@ -996,7 +1085,7 @@ bool NVPTXDAGToDAGISel::tryLoad(SDNode *N) {
                                NVPTX::LD_f32_ari, NVPTX::LD_f64_ari);
     if (!Opcode)
       return false;
-    SDValue Ops[] = { getI32Imm(isVolatile, dl), getI32Imm(CodeAddrSpace, dl),
+    SDValue Ops[] = { getI32Imm(CodeMemorySem, dl), getI32Imm(CodeAddrSpace, dl),
                       getI32Imm(vecType, dl), getI32Imm(fromType, dl),
                       getI32Imm(fromTypeWidth, dl), Base, Offset, Chain };
     NVPTXLD = CurDAG->getMachineNode(*Opcode, dl, TargetVT, MVT::Other, Ops);
@@ -1012,7 +1101,7 @@ bool NVPTXDAGToDAGISel::tryLoad(SDNode *N) {
                                NVPTX::LD_f32_areg, NVPTX::LD_f64_areg);
     if (!Opcode)
       return false;
-    SDValue Ops[] = { getI32Imm(isVolatile, dl), getI32Imm(CodeAddrSpace, dl),
+    SDValue Ops[] = { getI32Imm(CodeMemorySem, dl), getI32Imm(CodeAddrSpace, dl),
                       getI32Imm(vecType, dl), getI32Imm(fromType, dl),
                       getI32Imm(fromTypeWidth, dl), N1, Chain };
     NVPTXLD = CurDAG->getMachineNode(*Opcode, dl, TargetVT, MVT::Other, Ops);
@@ -1051,13 +1140,8 @@ bool NVPTXDAGToDAGISel::tryLoadVector(SDNode *N) {
   unsigned int PointerSize =
       CurDAG->getDataLayout().getPointerSizeInBits(MemSD->getAddressSpace());
 
-  // Volatile Setting
-  // - .volatile is only availalble for .global and .shared
-  bool IsVolatile = MemSD->isVolatile();
-  if (CodeAddrSpace != NVPTX::PTXLdStInstCode::GLOBAL &&
-      CodeAddrSpace != NVPTX::PTXLdStInstCode::SHARED &&
-      CodeAddrSpace != NVPTX::PTXLdStInstCode::GENERIC)
-    IsVolatile = false;
+  // Memory Semantic Setting
+  unsigned int CodeMemorySem = getCodeMemorySemantic(MemSD, Subtarget);
 
   // Vector Setting
   MVT SimpleVT = LoadedVT.getSimpleVT();
@@ -1124,7 +1208,7 @@ bool NVPTXDAGToDAGISel::tryLoadVector(SDNode *N) {
     }
     if (!Opcode)
       return false;
-    SDValue Ops[] = { getI32Imm(IsVolatile, DL), getI32Imm(CodeAddrSpace, DL),
+    SDValue Ops[] = { getI32Imm(CodeMemorySem, DL), getI32Imm(CodeAddrSpace, DL),
                       getI32Imm(VecType, DL), getI32Imm(FromType, DL),
                       getI32Imm(FromTypeWidth, DL), Addr, Chain };
     LD = CurDAG->getMachineNode(*Opcode, DL, N->getVTList(), Ops);
@@ -1149,7 +1233,7 @@ bool NVPTXDAGToDAGISel::tryLoadVector(SDNode *N) {
     }
     if (!Opcode)
       return false;
-    SDValue Ops[] = { getI32Imm(IsVolatile, DL), getI32Imm(CodeAddrSpace, DL),
+    SDValue Ops[] = { getI32Imm(CodeMemorySem, DL), getI32Imm(CodeAddrSpace, DL),
                       getI32Imm(VecType, DL), getI32Imm(FromType, DL),
                       getI32Imm(FromTypeWidth, DL), Base, Offset, Chain };
     LD = CurDAG->getMachineNode(*Opcode, DL, N->getVTList(), Ops);
@@ -1194,7 +1278,7 @@ bool NVPTXDAGToDAGISel::tryLoadVector(SDNode *N) {
     }
     if (!Opcode)
       return false;
-    SDValue Ops[] = { getI32Imm(IsVolatile, DL), getI32Imm(CodeAddrSpace, DL),
+    SDValue Ops[] = { getI32Imm(CodeMemorySem, DL), getI32Imm(CodeAddrSpace, DL),
                       getI32Imm(VecType, DL), getI32Imm(FromType, DL),
                       getI32Imm(FromTypeWidth, DL), Base, Offset, Chain };
 
@@ -1239,7 +1323,7 @@ bool NVPTXDAGToDAGISel::tryLoadVector(SDNode *N) {
     }
     if (!Opcode)
       return false;
-    SDValue Ops[] = { getI32Imm(IsVolatile, DL), getI32Imm(CodeAddrSpace, DL),
+    SDValue Ops[] = { getI32Imm(CodeMemorySem, DL), getI32Imm(CodeAddrSpace, DL),
                       getI32Imm(VecType, DL), getI32Imm(FromType, DL),
                       getI32Imm(FromTypeWidth, DL), Op1, Chain };
     LD = CurDAG->getMachineNode(*Opcode, DL, N->getVTList(), Ops);
@@ -1684,27 +1768,13 @@ bool NVPTXDAGToDAGISel::tryStore(SDNode *N) {
   if (!StoreVT.isSimple())
     return false;
 
-  AtomicOrdering Ordering = ST->getSuccessOrdering();
-  // In order to lower atomic loads with stronger guarantees we would need to
-  // use store.release or insert fences. However these features were only added
-  // with PTX ISA 6.0 / sm_70.
-  // TODO: Check if we can actually use the new instructions and implement them.
-  if (isStrongerThanMonotonic(Ordering))
-    return false;
-
   // Address Space Setting
   unsigned int CodeAddrSpace = getCodeAddrSpace(ST);
   unsigned int PointerSize =
       CurDAG->getDataLayout().getPointerSizeInBits(ST->getAddressSpace());
 
-  // Volatile Setting
-  // - .volatile is only available for .global and .shared
-  // - .volatile has the same memory synchronization semantics as .relaxed.sys
-  bool isVolatile = ST->isVolatile() || Ordering == AtomicOrdering::Monotonic;
-  if (CodeAddrSpace != NVPTX::PTXLdStInstCode::GLOBAL &&
-      CodeAddrSpace != NVPTX::PTXLdStInstCode::SHARED &&
-      CodeAddrSpace != NVPTX::PTXLdStInstCode::GENERIC)
-    isVolatile = false;
+  // Memory Semantic Setting
+  unsigned int CodeMemorySem = getCodeMemorySemantic(ST, Subtarget);
 
   // Vector Setting
   MVT SimpleVT = StoreVT.getSimpleVT();
@@ -1741,7 +1811,7 @@ bool NVPTXDAGToDAGISel::tryStore(SDNode *N) {
     if (!Opcode)
       return false;
     SDValue Ops[] = {Value,
-                     getI32Imm(isVolatile, dl),
+                     getI32Imm(CodeMemorySem, dl),
                      getI32Imm(CodeAddrSpace, dl),
                      getI32Imm(vecType, dl),
                      getI32Imm(toType, dl),
@@ -1758,7 +1828,7 @@ bool NVPTXDAGToDAGISel::tryStore(SDNode *N) {
     if (!Opcode)
       return false;
     SDValue Ops[] = {Value,
-                     getI32Imm(isVolatile, dl),
+                     getI32Imm(CodeMemorySem, dl),
                      getI32Imm(CodeAddrSpace, dl),
                      getI32Imm(vecType, dl),
                      getI32Imm(toType, dl),
@@ -1783,7 +1853,7 @@ bool NVPTXDAGToDAGISel::tryStore(SDNode *N) {
       return false;
 
     SDValue Ops[] = {Value,
-                     getI32Imm(isVolatile, dl),
+                     getI32Imm(CodeMemorySem, dl),
                      getI32Imm(CodeAddrSpace, dl),
                      getI32Imm(vecType, dl),
                      getI32Imm(toType, dl),
@@ -1805,7 +1875,7 @@ bool NVPTXDAGToDAGISel::tryStore(SDNode *N) {
     if (!Opcode)
       return false;
     SDValue Ops[] = {Value,
-                     getI32Imm(isVolatile, dl),
+                     getI32Imm(CodeMemorySem, dl),
                      getI32Imm(CodeAddrSpace, dl),
                      getI32Imm(vecType, dl),
                      getI32Imm(toType, dl),
@@ -1844,13 +1914,8 @@ bool NVPTXDAGToDAGISel::tryStoreVector(SDNode *N) {
   unsigned int PointerSize =
       CurDAG->getDataLayout().getPointerSizeInBits(MemSD->getAddressSpace());
 
-  // Volatile Setting
-  // - .volatile is only availalble for .global and .shared
-  bool IsVolatile = MemSD->isVolatile();
-  if (CodeAddrSpace != NVPTX::PTXLdStInstCode::GLOBAL &&
-      CodeAddrSpace != NVPTX::PTXLdStInstCode::SHARED &&
-      CodeAddrSpace != NVPTX::PTXLdStInstCode::GENERIC)
-    IsVolatile = false;
+  // Memory Semantic Setting
+  unsigned int CodeMemorySem = getCodeMemorySemantic(MemSD, Subtarget);
 
   // Type Setting: toType + toTypeWidth
   // - for integer type, always use 'u'
@@ -1892,7 +1957,7 @@ bool NVPTXDAGToDAGISel::tryStoreVector(SDNode *N) {
     ToTypeWidth = 32;
   }
 
-  StOps.push_back(getI32Imm(IsVolatile, DL));
+  StOps.push_back(getI32Imm(CodeMemorySem, DL));
   StOps.push_back(getI32Imm(CodeAddrSpace, DL));
   StOps.push_back(getI32Imm(VecType, DL));
   StOps.push_back(getI32Imm(ToType, DL));
diff --git a/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td b/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td
index c4c35a1f74ba93..9be8fe5d4e6efb 100644
--- a/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td
+++ b/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td
@@ -2939,39 +2939,39 @@ foreach vt = [v2f16, v2bf16, v2i16, v4i8] in {
 multiclass LD<NVPTXRegClass regclass> {
   def _avar : NVPTXInst<
     (outs regclass:$dst),
-    (ins LdStCode:$isVol, LdStCode:$addsp, LdStCode:$Vec, LdStCode:$Sign,
+    (ins LdStCode:$sem, LdStCode:$addsp, LdStCode:$Vec, LdStCode:$Sign,
          i32imm:$fromWidth, imem:$addr),
-    "ld${isVol:volatile}${addsp:addsp}${Vec:vec}.${Sign:sign}$fromWidth "
+    "ld${sem:sem}${addsp:addsp}${Vec:vec}.${Sign:sign}$fromWidth "
     "\t$dst, [$addr];", []>;
   def _areg : NVPTXInst<
     (outs regclass:$dst),
-    (ins LdStCode:$isVol, LdStCode:$addsp, LdStCode:$Vec, LdStCode:$Sign,
+    (ins LdStCode:$sem, LdStCode:$addsp, Ld...
[truncated]

github-actions bot commented Jul 8, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

Artem-B (Member) left a comment

LGTM in principle, with a few comments and nits.

Review threads (since resolved) on:

  • llvm/test/CodeGen/NVPTX/load-store-sm-70.ll
  • llvm/lib/Target/NVPTX/NVPTXSubtarget.h
  • llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp
@gonzalobg gonzalobg force-pushed the nvptx_volta_relaxed_volatile branch 8 times, most recently from af3b119 to d298f16 Compare July 15, 2024 12:19
@gonzalobg gonzalobg force-pushed the nvptx_volta_relaxed_volatile branch from d298f16 to 46ab54a Compare July 15, 2024 19:26
@Artem-B Artem-B merged commit 2da30f8 into llvm:main Jul 15, 2024
4 of 6 checks passed