Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[NVPTX] Add builtin support for 'nanosleep' PTX instrunction #79888

Merged
merged 1 commit into from
Jan 29, 2024

Conversation

jhuber6
Copy link
Contributor

@jhuber6 jhuber6 commented Jan 29, 2024

Summary:
This patch adds a builtin for the nanosleep PTX function. It takes
either an immediate or a register and sleeps for [0, 2t] nanoseconds
given t. More information at the documentation:
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#miscellaneous-instructions-nanosleep

Summary:
This patch adds a builtin for the `nanosleep` PTX function. It takes
either an immediate or a register and sleeps for [0, 2t] nanoseconds
given t. More information at the documentation:
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#miscellaneous-instructions-nanosleep
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" llvm:ir labels Jan 29, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented Jan 29, 2024

@llvm/pr-subscribers-llvm-ir

@llvm/pr-subscribers-clang

Author: Joseph Huber (jhuber6)

Changes

Summary:
This patch adds a builtin for the nanosleep PTX function. It takes
either an immediate or a register and sleeps for [0, 2t] nanoseconds
given t. More information at the documentation:
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#miscellaneous-instructions-nanosleep


Full diff: https://github.com/llvm/llvm-project/pull/79888.diff

5 Files Affected:

  • (modified) clang/include/clang/Basic/BuiltinsNVPTX.def (+1)
  • (modified) clang/test/CodeGen/builtins-nvptx.c (+11)
  • (modified) llvm/include/llvm/IR/IntrinsicsNVVM.td (+4)
  • (modified) llvm/lib/Target/NVPTX/NVPTXIntrinsics.td (+6)
  • (added) llvm/test/CodeGen/NVPTX/nanosleep.ll (+20)
diff --git a/clang/include/clang/Basic/BuiltinsNVPTX.def b/clang/include/clang/Basic/BuiltinsNVPTX.def
index 0f2e8260143be78..ef3a37c8753d162 100644
--- a/clang/include/clang/Basic/BuiltinsNVPTX.def
+++ b/clang/include/clang/Basic/BuiltinsNVPTX.def
@@ -155,6 +155,7 @@ BUILTIN(__nvvm_read_ptx_sreg_pm3, "i", "n")
 // MISC
 
 BUILTIN(__nvvm_prmt, "UiUiUiUi", "")
+TARGET_BUILTIN(__nvvm_nanosleep, "vi", "n", AND(SM_70, PTX63))
 
 // Min Max
 
diff --git a/clang/test/CodeGen/builtins-nvptx.c b/clang/test/CodeGen/builtins-nvptx.c
index 353f3ebb608c2b1..b209e2fbad98fb0 100644
--- a/clang/test/CodeGen/builtins-nvptx.c
+++ b/clang/test/CodeGen/builtins-nvptx.c
@@ -810,6 +810,17 @@ __device__ void nvvm_vote(int pred) {
   // CHECK: ret void
 }
 
+// CHECK-LABEL: nvvm_nanosleep
+__device__ void nvvm_nanosleep(int d) {
+#if __CUDA_ARCH__ >= 700
+  // CHECK_PTX70_SM80: call void @llvm.nvvm.nanosleep
+  __nvvm_nanosleep(d);
+
+  // CHECK_PTX70_SM80: call void @llvm.nvvm.nanosleep
+  __nvvm_nanosleep(1);
+#endif
+}
+
 // CHECK-LABEL: nvvm_mbarrier
 __device__ void nvvm_mbarrier(long long* addr, __attribute__((address_space(3))) long long* sharedAddr, int count, long long state) {
   #if __CUDA_ARCH__ >= 800
diff --git a/llvm/include/llvm/IR/IntrinsicsNVVM.td b/llvm/include/llvm/IR/IntrinsicsNVVM.td
index 5a5ba2592e1467e..5d863b283d0466e 100644
--- a/llvm/include/llvm/IR/IntrinsicsNVVM.td
+++ b/llvm/include/llvm/IR/IntrinsicsNVVM.td
@@ -557,6 +557,10 @@ let TargetPrefix = "nvvm" in {
       DefaultAttrsIntrinsic<[llvm_i32_ty], [llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
         [IntrNoMem, IntrSpeculatable]>;
 
+  def int_nvvm_nanosleep : ClangBuiltin<"__nvvm_nanosleep">,
+      DefaultAttrsIntrinsic<[], [llvm_i32_ty],
+                            [IntrConvergent, IntrNoMem, IntrHasSideEffects]>;
+
 //
 // Min Max
 //
diff --git a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
index 33f1e4a43e072af..133514f4f48024e 100644
--- a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
+++ b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
@@ -634,6 +634,12 @@ class F_MATH_3<string OpcStr, NVPTXRegClass t_regclass,
 def INT_NVVM_PRMT : F_MATH_3<"prmt.b32 \t$dst, $src0, $src1, $src2;", Int32Regs,
   Int32Regs, Int32Regs, Int32Regs, int_nvvm_prmt>;
 
+def INT_NVVM_NANOSLEEP_I : NVPTXInst<(outs), (ins i32imm:$i), "nanosleep.u32 \t$i;",
+                             [(int_nvvm_nanosleep imm:$i)]>,
+        Requires<[hasPTX<63>, hasSM<70>]>;
+def INT_NVVM_NANOSLEEP_R : NVPTXInst<(outs), (ins Int32Regs:$i), "nanosleep.u32 \t$i;",
+                             [(int_nvvm_nanosleep Int32Regs:$i)]>,
+        Requires<[hasPTX<63>, hasSM<70>]>;
 //
 // Min Max
 //
diff --git a/llvm/test/CodeGen/NVPTX/nanosleep.ll b/llvm/test/CodeGen/NVPTX/nanosleep.ll
new file mode 100644
index 000000000000000..1b2a7bf9476cf5f
--- /dev/null
+++ b/llvm/test/CodeGen/NVPTX/nanosleep.ll
@@ -0,0 +1,20 @@
+; RUN: llc < %s -march=nvptx64 -O2 -mcpu=sm_70 -mattr=+ptx63 | FileCheck %s
+; RUN: %if ptxas %{ llc < %s -march=nvptx64 -mcpu=sm_70 -mattr=+ptx63 | %ptxas-verify %}
+
+declare void @llvm.nvvm.nanosleep(i32)
+
+; CHECK-LABEL: test_nanosleep_r
+define void @test_nanosleep_r(i32 noundef %d) {
+entry:
+; CHECK: nanosleep.u32   %[[REG:.+]];
+  call void @llvm.nvvm.nanosleep(i32 %d)
+  ret void
+}
+
+; CHECK-LABEL: test_nanosleep_i
+define void @test_nanosleep_i() {
+entry:
+; CHECK: nanosleep.u32   42;
+  call void @llvm.nvvm.nanosleep(i32 42)
+  ret void
+}

@jhuber6 jhuber6 merged commit 5f12cc9 into llvm:main Jan 29, 2024
6 of 7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category llvm:ir
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants