[NVPTX] Add a few more missing fence intrinsics #166352

schwarzschild-radius · 2025-11-04T11:37:15Z

This commit adds the below fence intrinsics:

llvm.nvvm.fence.acquire.sync_restrict.space.cluster.scope.cluster
llvm.nvvm.fence.release.sync_restrict.space.cta.scope.cluster
llvm.nvvm.fence.mbarrier_init.release.cluster
llvm.nvvm.fence.proxy.async_generic.acquire.sync_restrict.space.cluster.scope.cluster
llvm.nvvm.fence.proxy.async_generic.release.sync_restrict.space.cta.scope.cluster
llvm.nvvm.fence.proxy.alias
llvm.nvvm.fence.proxy.async
llvm.nvvm.fence.proxy.async.global
llvm.nvvm.fence.proxy.async.shared_cluster
llvm.nvvm.fence.proxy.async.shared_cta

For more information, please refer to the PTX ISA.
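
A minimal usage sketch, assuming the intrinsic names listed above. The function name `@fence_examples` is hypothetical, and the PTX shown in the comments is taken from the CHECK lines of the tests added in this PR. Based on those tests' RUN lines, compiling with `llc -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx80` should lower each call to a single fence instruction.

```llvm
; Declarations for three of the new nullary fence intrinsics.
declare void @llvm.nvvm.fence.mbarrier_init.release.cluster()
declare void @llvm.nvvm.fence.proxy.async.shared_cta()
declare void @llvm.nvvm.fence.proxy.alias()

; Hypothetical function, for illustration only.
define void @fence_examples() {
  ; Expected PTX: fence.mbarrier_init.release.cluster;
  call void @llvm.nvvm.fence.mbarrier_init.release.cluster()
  ; Expected PTX: fence.proxy.async.shared::cta;
  call void @llvm.nvvm.fence.proxy.async.shared_cta()
  ; Expected PTX: fence.proxy.alias;
  call void @llvm.nvvm.fence.proxy.alias()
  ret void
}
```

The `sync_restrict` variants are left out of this sketch because, per the predicates in NVPTXIntrinsics.td, they require PTX 8.6 rather than PTX 8.0.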

llvmbot · 2025-11-04T11:37:46Z

@llvm/pr-subscribers-backend-nvptx

Author: Pradeep Kumar (schwarzschild-radius)

Changes

This commit adds the below fence intrinsics:

llvm.nvvm.fence.acquire.sync_restrict.space.shared_cluster.scope.cluster
llvm.nvvm.fence.release.sync_restrict.space.shared_cta.scope.cluster
llvm.nvvm.fence.mbarrier_init.release.cluster
llvm.nvvm.fence.proxy.async_generic.acquire.sync_restrict.space.shared_cluster.scope.cluster
llvm.nvvm.fence.proxy.async_generic.release.sync_restrict.space.shared_cta.scope.cluster
llvm.nvvm.fence.proxy.alias
llvm.nvvm.fence.proxy.async
llvm.nvvm.fence.proxy.async.global
llvm.nvvm.fence.proxy.async.shared_cluster
llvm.nvvm.fence.proxy.async.shared_cta

For more information, please refer to the PTX ISA.

Full diff: https://github.com/llvm/llvm-project/pull/166352.diff

8 Files Affected:

(modified) llvm/docs/NVPTXUsage.rst (+106)
(modified) llvm/include/llvm/IR/IntrinsicsNVVM.td (+41-4)
(modified) llvm/lib/Target/NVPTX/NVPTXIntrinsics.td (+35)
(added) llvm/test/CodeGen/NVPTX/fence-proxy-sm90-ptx86.ll (+27)
(added) llvm/test/CodeGen/NVPTX/fence-proxy-sm90.ll (+51)
(added) llvm/test/CodeGen/NVPTX/fence-proxy.ll (+15)
(added) llvm/test/CodeGen/NVPTX/op-fence.ll (+17)
(added) llvm/test/CodeGen/NVPTX/thread-fence.ll (+31)

diff --git a/llvm/docs/NVPTXUsage.rst b/llvm/docs/NVPTXUsage.rst
index 5ad8f9ab07e40..978a5035017b9 100644
--- a/llvm/docs/NVPTXUsage.rst
+++ b/llvm/docs/NVPTXUsage.rst
@@ -796,6 +796,112 @@ every time. For more information, refer PTX ISA
 Membar/Fences
 -------------
 
+'``llvm.nvvm.fence.acquire/release.sync_restrict.*``'
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+.. code-block:: llvm
+
+  declare void @llvm.nvvm.fence.acquire.sync_restrict.space.shared_cluster.scope.cluster()
+  declare void @llvm.nvvm.fence.release.sync_restrict.space.shared_cta.scope.cluster()
+
+Overview:
+"""""""""
+
+The `nvvm.fence.{semantics}.sync_restrict.*` intrinsics restrict the class of memory
+operations for which the fence instruction provides the memory ordering guarantees.
+When `.sync_restrict` is restricted to `shared_cta`, then memory semantics must
+be `release` and the effect of the fence operation only applies to operations
+performed on objects in `shared_cta` space. Likewise, when `.sync_restrict` is
+restricted to `shared_cluster`, then memory semantics must be `acquire` and the
+effect of the fence operation only applies to operations performed on objects in
+`shared_cluster` memory space. The scope for both operations is `cluster`. For more details,
+please refer to the `PTX ISA <https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-membar>`__.
+
+'``llvm.nvvm.fence.mbarrier_init.release.cluster``'
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+.. code-block:: llvm
+
+  declare void @llvm.nvvm.fence.mbarrier_init.release.cluster()
+
+Overview:
+"""""""""
+
+The `nvvm.fence.mbarrier_init.release.cluster` intrinsic restricts the class of
+memory operations for which the fence instruction provides the memory ordering
+guarantees. The `mbarrier_init` modifier restricts the synchronizing effect to
+the prior `mbarrier_init` operation executed by the same thread on mbarrier objects
+in `shared_cta` memory space. For more details, please refer to the `PTX ISA <https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-membar>`__.
+
+'``llvm.nvvm.fence.proxy.async_generic.acquire/release.sync_restrict``'
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+.. code-block:: llvm
+
+  declare void @llvm.nvvm.fence.proxy.async_generic.acquire.sync_restrict.space.shared_cluster.scope.cluster()
+  declare void @llvm.nvvm.fence.proxy.async_generic.release.sync_restrict.space.shared_cta.scope.cluster()
+
+Overview:
+"""""""""
+
+The `nvvm.fence.proxy.async_generic.{semantics}.sync_restrict` intrinsics are used to establish
+ordering between a prior memory access performed via the `async proxy <https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#proxies>`__
+and a subsequent memory access performed via the generic proxy.
+``nvvm.fence.proxy.async_generic.release.sync_restrict`` can form a release
+sequence that synchronizes with an acquire sequence that contains the
+``nvvm.fence.proxy.async_generic.acquire.sync_restrict`` proxy fence. When
+`.sync_restrict` is restricted to `shared_cta`, then memory semantics must
+be `release` and the effect of the fence operation only applies to operations
+performed on objects in `shared_cta` space. Likewise, when `.sync_restrict` is
+restricted to `shared_cluster`, then memory semantics must be `acquire` and the
+effect of the fence operation only applies to operations performed on objects in
+`shared_cluster` memory space. The scope for both operations is `cluster`.
+For more details, please refer to the `PTX ISA <https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-membar>`__.
+
+'``llvm.nvvm.fence.proxy.<proxykind>``'
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+.. code-block:: llvm
+
+  declare void @llvm.nvvm.fence.proxy.alias()
+  declare void @llvm.nvvm.fence.proxy.async()
+  declare void @llvm.nvvm.fence.proxy.async.global()
+  declare void @llvm.nvvm.fence.proxy.async.shared_cluster()
+  declare void @llvm.nvvm.fence.proxy.async.shared_cta()
+
+Overview:
+"""""""""
+
+The `nvvm.fence.proxy.{proxykind}` intrinsics represent a fence with bi-directional
+proxy ordering that is established between the memory accesses performed via the
+`generic proxy <https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#proxies>`__
+and the proxy specified by `proxykind`. A `bi-directional proxy` ordering between
+two proxykinds establishes two `uni-directional` proxy orderings: one from the
+first proxykind to the second proxykind and the other from the second proxykind
+to the first proxykind.
+
+The `alias` proxykind refers to memory accesses performed using virtually aliased
+addresses to the same memory location.
+
+The `async` proxykind specifies that the memory ordering is established between the
+`async proxy` and the `generic proxy`. The memory ordering is limited only to
+operations performed on objects in the state space specified (`generic`, `global`,
+`shared_cluster`, `shared_cta`). If no state space is specified, then the memory
+ordering applies to all state spaces. For more details, please refer to the
+`PTX ISA <https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-membar>`__.
+
 '``llvm.nvvm.fence.proxy.tensormap_generic.*``'
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
diff --git a/llvm/include/llvm/IR/IntrinsicsNVVM.td b/llvm/include/llvm/IR/IntrinsicsNVVM.td
index 2710853e17688..e288d2075cbb4 100644
--- a/llvm/include/llvm/IR/IntrinsicsNVVM.td
+++ b/llvm/include/llvm/IR/IntrinsicsNVVM.td
@@ -1747,18 +1747,43 @@ let TargetPrefix = "nvvm" in {
   }
 
   //
-  // Membar
+  // Membar / Fence
   //
   let IntrProperties = [IntrNoCallback] in {
     def int_nvvm_membar_cta : NVVMBuiltin, Intrinsic<[]>;
     def int_nvvm_membar_gl : NVVMBuiltin, Intrinsic<[]>;
     def int_nvvm_membar_sys : NVVMBuiltin, Intrinsic<[]>;
     def int_nvvm_fence_sc_cluster : Intrinsic<[]>;
+
+    // Operation fence
+    def int_nvvm_fence_mbarrier_init_release_cluster: NVVMBuiltin,
+          Intrinsic<[], [], [],
+            "llvm.nvvm.fence.mbarrier_init.release.cluster">;
+
+    // Thread fence
+    def int_nvvm_fence_acquire_sync_restrict_space_shared_cluster_scope_cluster :
+          NVVMBuiltin, Intrinsic<[], [], [],
+            "llvm.nvvm.fence.acquire.sync_restrict.space.shared_cluster.scope.cluster">;
+
+    def int_nvvm_fence_release_sync_restrict_space_shared_cta_scope_cluster :
+          NVVMBuiltin, Intrinsic<[], [], [],
+            "llvm.nvvm.fence.release.sync_restrict.space.shared_cta.scope.cluster">;
   }
 
-  //
-  // Proxy fence (uni-directional)
-  //
+//
+// Proxy fence (uni-directional)
+//
+
+let IntrProperties = [IntrNoCallback] in {
+  def int_nvvm_fence_proxy_async_generic_acquire_sync_restrict_space_shared_cluster_scope_cluster :
+        NVVMBuiltin, Intrinsic<[], [], [],
+          "llvm.nvvm.fence.proxy.async_generic.acquire.sync_restrict.space.shared_cluster.scope.cluster">;
+
+  def int_nvvm_fence_proxy_async_generic_release_sync_restrict_space_shared_cta_scope_cluster :
+        NVVMBuiltin, Intrinsic<[], [], [],
+          "llvm.nvvm.fence.proxy.async_generic.release.sync_restrict.space.shared_cta.scope.cluster">;
+}
+
   foreach scope = ["cta", "cluster", "gpu", "sys"] in {
 
     def int_nvvm_fence_proxy_tensormap_generic_release_ # scope :
@@ -1773,6 +1798,18 @@ let TargetPrefix = "nvvm" in {
                     "llvm.nvvm.fence.proxy.tensormap_generic.acquire." # scope>;
   }
 
+//
+// Proxy fence (bi-directional)
+//
+
+let IntrProperties = [IntrNoCallback] in {
+  foreach proxykind = ["alias", "async", "async.global", "async.shared_cta",
+                        "async.shared_cluster"] in {
+    defvar Intr = IntrinsicName<"llvm.nvvm.fence.proxy." # proxykind>;
+    def Intr.record_name: Intrinsic<[], [], [], Intr.intr_name>;
+  }
+}
+
 //
 // Async Copy
 //
diff --git a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
index 50827bd548ad5..193aca11cba74 100644
--- a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
+++ b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
@@ -364,7 +364,42 @@ def INT_FENCE_SC_CLUSTER:
        NullaryInst<"fence.sc.cluster", int_nvvm_fence_sc_cluster>,
        Requires<[hasPTX<78>, hasSM<90>]>;
 
+def INT_FENCE_MBARRIER_INIT_RELEASE_CLUSTER:
+       NullaryInst<"fence.mbarrier_init.release.cluster",
+        int_nvvm_fence_mbarrier_init_release_cluster>,
+       Requires<[hasPTX<80>, hasSM<90>]>;
+
+let Predicates = [hasPTX<86>, hasSM<90>] in {
+def INT_FENCE_ACQUIRE_SYNC_RESTRICT_SHARED_CLUSTER_CLUSTER:
+       NullaryInst<"fence.acquire.sync_restrict::shared::cluster.cluster",
+        int_nvvm_fence_acquire_sync_restrict_space_shared_cluster_scope_cluster>;
+
+def INT_FENCE_RELEASE_SYNC_RESTRICT_SHARED_CTA_CLUSTER:
+       NullaryInst<"fence.release.sync_restrict::shared::cta.cluster",
+        int_nvvm_fence_release_sync_restrict_space_shared_cta_scope_cluster>;
+}
+
 // Proxy fence (uni-directional)
+let Predicates = [hasPTX<86>, hasSM<90>] in {
+def INT_NVVM_FENCE_PROXY_ASYNC_GENERIC_ACQUIRE_SYNC_RESTRICT_SPACE_SHARED_CLUSTER_SCOPE_CLUSTER:
+       NullaryInst<"fence.proxy.async::generic.acquire.sync_restrict::shared::cluster.cluster",
+        int_nvvm_fence_proxy_async_generic_acquire_sync_restrict_space_shared_cluster_scope_cluster>;
+
+def INT_NVVM_FENCE_PROXY_ASYNC_GENERIC_RELEASE_SYNC_RESTRICT_SPACE_SHARED_CTA_SCOPE_CLUSTER:
+       NullaryInst<"fence.proxy.async::generic.release.sync_restrict::shared::cta.cluster",
+        int_nvvm_fence_proxy_async_generic_release_sync_restrict_space_shared_cta_scope_cluster>;
+}
+
+// Proxy fence (bi-directional)
+foreach proxykind = ["alias", "async", "async.global", "async.shared_cta",
+                      "async.shared_cluster"] in {
+  defvar Preds = !if(!eq(proxykind, "alias"), [hasPTX<75>, hasSM<70>],
+                                              [hasPTX<80>, hasSM<90>]);
+  defvar Intr = IntrinsicName<"llvm.nvvm.fence.proxy." # proxykind>;
+  def : NullaryInst<"fence.proxy." # !subst("_", "::", proxykind),
+          !cast<Intrinsic>(Intr.record_name)>, Requires<Preds>;
+}
+
 class FENCE_PROXY_TENSORMAP_GENERIC_RELEASE<string Scope, Intrinsic Intr> :
         NullaryInst<"fence.proxy.tensormap::generic.release." # Scope, Intr>,
         Requires<[hasPTX<83>, hasSM<90>]>;
diff --git a/llvm/test/CodeGen/NVPTX/fence-proxy-sm90-ptx86.ll b/llvm/test/CodeGen/NVPTX/fence-proxy-sm90-ptx86.ll
new file mode 100644
index 0000000000000..3f758b58f24c4
--- /dev/null
+++ b/llvm/test/CodeGen/NVPTX/fence-proxy-sm90-ptx86.ll
@@ -0,0 +1,27 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 6
+; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx86 | FileCheck --check-prefixes=CHECK %s
+; RUN: %if ptxas-sm_90 && ptxas-isa-8.6 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx86 | %ptxas-verify -arch=sm_90 %}
+
+define void @test_nvvm_fence_proxy_async_generic_acquire_sync_restrict() {
+; CHECK-LABEL: test_nvvm_fence_proxy_async_generic_acquire_sync_restrict(
+; CHECK:       {
+; CHECK-EMPTY:
+; CHECK-EMPTY:
+; CHECK-NEXT:  // %bb.0:
+; CHECK-NEXT:    fence.proxy.async::generic.acquire.sync_restrict::shared::cluster.cluster;
+; CHECK-NEXT:    ret;
+  call void @llvm.nvvm.fence.proxy.async_generic.acquire.sync_restrict.space.shared_cluster.scope.cluster()
+  ret void
+}
+
+define void @test_nvvm_fence_proxy_async_generic_release_sync_restrict() {
+; CHECK-LABEL: test_nvvm_fence_proxy_async_generic_release_sync_restrict(
+; CHECK:       {
+; CHECK-EMPTY:
+; CHECK-EMPTY:
+; CHECK-NEXT:  // %bb.0:
+; CHECK-NEXT:    fence.proxy.async::generic.release.sync_restrict::shared::cta.cluster;
+; CHECK-NEXT:    ret;
+  call void @llvm.nvvm.fence.proxy.async_generic.release.sync_restrict.space.shared_cta.scope.cluster()
+  ret void
+}
diff --git a/llvm/test/CodeGen/NVPTX/fence-proxy-sm90.ll b/llvm/test/CodeGen/NVPTX/fence-proxy-sm90.ll
new file mode 100644
index 0000000000000..896c624602a60
--- /dev/null
+++ b/llvm/test/CodeGen/NVPTX/fence-proxy-sm90.ll
@@ -0,0 +1,51 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 6
+; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx80 | FileCheck --check-prefixes=CHECK %s
+; RUN: %if ptxas-sm_90 && ptxas-isa-8.0 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx80 | %ptxas-verify -arch=sm_90 %}
+
+define void @test_nvvm_fence_proxy_async() {
+; CHECK-LABEL: test_nvvm_fence_proxy_async(
+; CHECK:       {
+; CHECK-EMPTY:
+; CHECK-EMPTY:
+; CHECK-NEXT:  // %bb.0:
+; CHECK-NEXT:    fence.proxy.async;
+; CHECK-NEXT:    ret;
+  call void @llvm.nvvm.fence.proxy.async()
+  ret void
+}
+
+define void @test_nvvm_fence_proxy_async_global() {
+; CHECK-LABEL: test_nvvm_fence_proxy_async_global(
+; CHECK:       {
+; CHECK-EMPTY:
+; CHECK-EMPTY:
+; CHECK-NEXT:  // %bb.0:
+; CHECK-NEXT:    fence.proxy.async.global;
+; CHECK-NEXT:    ret;
+  call void @llvm.nvvm.fence.proxy.async.global()
+  ret void
+}
+
+define void @test_nvvm_fence_proxy_async_shared_cluster() {
+; CHECK-LABEL: test_nvvm_fence_proxy_async_shared_cluster(
+; CHECK:       {
+; CHECK-EMPTY:
+; CHECK-EMPTY:
+; CHECK-NEXT:  // %bb.0:
+; CHECK-NEXT:    fence.proxy.async.shared::cluster;
+; CHECK-NEXT:    ret;
+  call void @llvm.nvvm.fence.proxy.async.shared_cluster()
+  ret void
+}
+
+define void @test_nvvm_fence_proxy_async_shared_cta() {
+; CHECK-LABEL: test_nvvm_fence_proxy_async_shared_cta(
+; CHECK:       {
+; CHECK-EMPTY:
+; CHECK-EMPTY:
+; CHECK-NEXT:  // %bb.0:
+; CHECK-NEXT:    fence.proxy.async.shared::cta;
+; CHECK-NEXT:    ret;
+  call void @llvm.nvvm.fence.proxy.async.shared_cta()
+  ret void
+}
diff --git a/llvm/test/CodeGen/NVPTX/fence-proxy.ll b/llvm/test/CodeGen/NVPTX/fence-proxy.ll
new file mode 100644
index 0000000000000..cb5679e68944d
--- /dev/null
+++ b/llvm/test/CodeGen/NVPTX/fence-proxy.ll
@@ -0,0 +1,15 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 6
+; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_70 -mattr=+ptx75 | FileCheck --check-prefixes=CHECK %s
+; RUN: %if ptxas-sm_70 && ptxas-isa-7.5 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_70 -mattr=+ptx75 | %ptxas-verify -arch=sm_70 %}
+
+define void @test_nvvm_fence_proxy_alias() {
+; CHECK-LABEL: test_nvvm_fence_proxy_alias(
+; CHECK:       {
+; CHECK-EMPTY:
+; CHECK-EMPTY:
+; CHECK-NEXT:  // %bb.0:
+; CHECK-NEXT:    fence.proxy.alias;
+; CHECK-NEXT:    ret;
+  call void @llvm.nvvm.fence.proxy.alias()
+  ret void
+}
diff --git a/llvm/test/CodeGen/NVPTX/op-fence.ll b/llvm/test/CodeGen/NVPTX/op-fence.ll
new file mode 100644
index 0000000000000..629b702742afb
--- /dev/null
+++ b/llvm/test/CodeGen/NVPTX/op-fence.ll
@@ -0,0 +1,17 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 6
+; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx80 | FileCheck --check-prefixes=CHECK %s
+; RUN: %if ptxas-sm_90 && ptxas-isa-8.0 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx80 | %ptxas-verify -arch=sm_90 %}
+
+; CHECK-LABEL: test_fence_mbarrier_init
+define void @test_fence_mbarrier_init() {
+; CHECK-LABEL: test_fence_mbarrier_init(
+; CHECK:       {
+; CHECK-EMPTY:
+; CHECK-EMPTY:
+; CHECK-NEXT:  // %bb.0:
+; CHECK-NEXT:    fence.mbarrier_init.release.cluster;
+; CHECK-NEXT:    ret;
+  call void @llvm.nvvm.fence.mbarrier_init.release.cluster();
+
+  ret void
+}
diff --git a/llvm/test/CodeGen/NVPTX/thread-fence.ll b/llvm/test/CodeGen/NVPTX/thread-fence.ll
new file mode 100644
index 0000000000000..bd92d18bc147e
--- /dev/null
+++ b/llvm/test/CodeGen/NVPTX/thread-fence.ll
@@ -0,0 +1,31 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 6
+; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx86 | FileCheck --check-prefixes=CHECK %s
+; RUN: %if ptxas-sm_90 && ptxas-isa-8.6 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx86 | %ptxas-verify -arch=sm_90 %}
+
+; CHECK-LABEL: test_fence_acquire
+define void @test_fence_acquire() {
+; CHECK-LABEL: test_fence_acquire(
+; CHECK:       {
+; CHECK-EMPTY:
+; CHECK-EMPTY:
+; CHECK-NEXT:  // %bb.0:
+; CHECK-NEXT:    fence.acquire.sync_restrict::shared::cluster.cluster;
+; CHECK-NEXT:    ret;
+  call void @llvm.nvvm.fence.acquire.sync_restrict.space.shared_cluster.scope.cluster();
+
+  ret void
+}
+
+; CHECK-LABEL: test_fence_release
+define void @test_fence_release() {
+; CHECK-LABEL: test_fence_release(
+; CHECK:       {
+; CHECK-EMPTY:
+; CHECK-EMPTY:
+; CHECK-NEXT:  // %bb.0:
+; CHECK-NEXT:    fence.release.sync_restrict::shared::cta.cluster;
+; CHECK-NEXT:    ret;
+  call void @llvm.nvvm.fence.release.sync_restrict.space.shared_cta.scope.cluster();
+
+  ret void
+}

llvmbot · 2025-11-04T11:37:46Z

@llvm/pr-subscribers-llvm-ir

Author: Pradeep Kumar (schwarzschild-radius)

+; CHECK:       {
+; CHECK-EMPTY:
+; CHECK-EMPTY:
+; CHECK-NEXT:  // %bb.0:
+; CHECK-NEXT:    fence.proxy.async.shared::cta;
+; CHECK-NEXT:    ret;
+  call void @llvm.nvvm.fence.proxy.async.shared_cta()
+  ret void
+}
diff --git a/llvm/test/CodeGen/NVPTX/fence-proxy.ll b/llvm/test/CodeGen/NVPTX/fence-proxy.ll
new file mode 100644
index 0000000000000..cb5679e68944d
--- /dev/null
+++ b/llvm/test/CodeGen/NVPTX/fence-proxy.ll
@@ -0,0 +1,15 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 6
+; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_70 -mattr=+ptx75 | FileCheck --check-prefixes=CHECK %s
+; RUN: %if ptxas-sm_70 && ptxas-isa-7.5 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_70 -mattr=+ptx75 | %ptxas-verify -arch=sm_70 %}
+
+define void @test_nvvm_fence_proxy_alias() {
+; CHECK-LABEL: test_nvvm_fence_proxy_alias(
+; CHECK:       {
+; CHECK-EMPTY:
+; CHECK-EMPTY:
+; CHECK-NEXT:  // %bb.0:
+; CHECK-NEXT:    fence.proxy.alias;
+; CHECK-NEXT:    ret;
+  call void @llvm.nvvm.fence.proxy.alias()
+  ret void
+}
diff --git a/llvm/test/CodeGen/NVPTX/op-fence.ll b/llvm/test/CodeGen/NVPTX/op-fence.ll
new file mode 100644
index 0000000000000..629b702742afb
--- /dev/null
+++ b/llvm/test/CodeGen/NVPTX/op-fence.ll
@@ -0,0 +1,17 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 6
+; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx80 | FileCheck --check-prefixes=CHECK %s
+; RUN: %if ptxas-sm_90 && ptxas-isa-8.0 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx80 | %ptxas-verify -arch=sm_90 %}
+
+; CHECK-LABEL: test_fence_mbarrier_init
+define void @test_fence_mbarrier_init() {
+; CHECK-LABEL: test_fence_mbarrier_init(
+; CHECK:       {
+; CHECK-EMPTY:
+; CHECK-EMPTY:
+; CHECK-NEXT:  // %bb.0:
+; CHECK-NEXT:    fence.mbarrier_init.release.cluster;
+; CHECK-NEXT:    ret;
+  call void @llvm.nvvm.fence.mbarrier_init.release.cluster();
+
+  ret void
+}
diff --git a/llvm/test/CodeGen/NVPTX/thread-fence.ll b/llvm/test/CodeGen/NVPTX/thread-fence.ll
new file mode 100644
index 0000000000000..bd92d18bc147e
--- /dev/null
+++ b/llvm/test/CodeGen/NVPTX/thread-fence.ll
@@ -0,0 +1,31 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 6
+; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx86 | FileCheck --check-prefixes=CHECK %s
+; RUN: %if ptxas-sm_90 && ptxas-isa-8.6 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx86 | %ptxas-verify -arch=sm_90 %}
+
+; CHECK-LABEL: test_fence_acquire
+define void @test_fence_acquire() {
+; CHECK-LABEL: test_fence_acquire(
+; CHECK:       {
+; CHECK-EMPTY:
+; CHECK-EMPTY:
+; CHECK-NEXT:  // %bb.0:
+; CHECK-NEXT:    fence.acquire.sync_restrict::shared::cluster.cluster;
+; CHECK-NEXT:    ret;
+  call void @llvm.nvvm.fence.acquire.sync_restrict.space.shared_cluster.scope.cluster();
+
+  ret void
+}
+
+; CHECK-LABEL: test_fence_release
+define void @test_fence_release() {
+; CHECK-LABEL: test_fence_release(
+; CHECK:       {
+; CHECK-EMPTY:
+; CHECK-EMPTY:
+; CHECK-NEXT:  // %bb.0:
+; CHECK-NEXT:    fence.release.sync_restrict::shared::cta.cluster;
+; CHECK-NEXT:    ret;
+  call void @llvm.nvvm.fence.release.sync_restrict.space.shared_cta.scope.cluster();
+
+  ret void
+}

durga4github

The latest revision LGTM

Artem-B

LGTM with a few nits.

This commit adds the following fence intrinsics:

llvm.nvvm.fence.acquire.sync_restrict.space.shared_cluster.scope.cluster
llvm.nvvm.fence.release.sync_restrict.space.shared_cta.scope.cluster
llvm.nvvm.fence.mbarrier_init.release.cluster
llvm.nvvm.fence.proxy.async.generic.acquire.sync_restrict.space.shared_cluster.scope.cluster
llvm.nvvm.fence.proxy.async.generic.release.sync_restrict.space.shared_cta.scope.cluster
llvm.nvvm.fence.proxy.alias
llvm.nvvm.fence.proxy.async
llvm.nvvm.fence.proxy.async.global
llvm.nvvm.fence.proxy.async.shared_cluster
llvm.nvvm.fence.proxy.async.shared_cta

For more information, please refer to the [PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/#parallel-synchronization-and-communication-instructions-membar).
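For readers less familiar with TableGen, the `foreach proxykind` loop that this patch adds to NVPTXIntrinsics.td (the bi-directional proxy fences) can be restated as a small Python sketch. It is only an illustration of the name mapping and predicates visible in the diff, not code from LLVM: the function names `intrinsic_name`, `ptx_mnemonic`, and `requirements` are hypothetical, and the `(PTX version, SM version)` tuples simply mirror the `hasPTX<...>` / `hasSM<...>` predicates.

```python
# Hypothetical Python mirror of the TableGen `foreach proxykind` loop in
# NVPTXIntrinsics.td. None of these helper names exist in LLVM.

PROXY_KINDS = [
    "alias",
    "async",
    "async.global",
    "async.shared_cta",
    "async.shared_cluster",
]


def intrinsic_name(proxykind):
    # Mirrors: IntrinsicName<"llvm.nvvm.fence.proxy." # proxykind>
    return "llvm.nvvm.fence.proxy." + proxykind


def ptx_mnemonic(proxykind):
    # Mirrors: "fence.proxy." # !subst("_", "::", proxykind)
    return "fence.proxy." + proxykind.replace("_", "::")


def requirements(proxykind):
    # Mirrors: !if(!eq(proxykind, "alias"), [hasPTX<75>, hasSM<70>],
    #                                       [hasPTX<80>, hasSM<90>])
    # Returned as (PTX version, SM version).
    return (75, 70) if proxykind == "alias" else (80, 90)


for kind in PROXY_KINDS:
    ptx, sm = requirements(kind)
    print(f"{intrinsic_name(kind)} -> {ptx_mnemonic(kind)}; "
          f"requires ptx{ptx}, sm_{sm}")
```

The mnemonics this produces for the `shared_cta` and `shared_cluster` kinds agree with the `CHECK-NEXT` lines in fence-proxy-sm90.ll, and the `alias` case matches the `sm_70` / `ptx75` RUN line in fence-proxy.ll.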

schwarzschild-radius requested a review from Artem-B November 4, 2025 11:37

llvmbot added backend:NVPTX llvm:ir labels Nov 4, 2025

schwarzschild-radius requested a review from durga4github November 4, 2025 11:37

durga4github reviewed Nov 5, 2025

durga4github reviewed Nov 5, 2025

durga4github mentioned this pull request Nov 6, 2025

[mlir][NVVM] Add nvvm.membar operation #166698

Merged

schwarzschild-radius force-pushed the fence_bw_nvptx_support branch 2 times, most recently from 954cad4 to d6da4c8 Compare November 10, 2025 12:57

durga4github approved these changes Nov 10, 2025

Artem-B approved these changes Nov 11, 2025

schwarzschild-radius force-pushed the fence_bw_nvptx_support branch from d6da4c8 to 7cde127 Compare November 11, 2025 04:41

Artem-B approved these changes Nov 13, 2025

schwarzschild-radius merged commit 522177c into llvm:main Nov 19, 2025
11 checks passed
