Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[NVPTX] Fix code generation for trap-unreachable. #67478

Merged
merged 6 commits into from
Oct 1, 2023

Conversation

chsigg
Copy link
Contributor

@chsigg chsigg commented Sep 26, 2023

https://reviews.llvm.org/D152789 added an exit op before each unreachable. This means we never get to the trap instruction.

This change limits the insertion of exit instructions to the cases where unreachable is not lowered to trap. Trap itself is changed to be emitted as trap; exit; to convey to ptxas that it exits the CFG.

@chsigg
Copy link
Contributor Author

chsigg commented Sep 26, 2023

I think we would then want skip lowering ISD::TRAP, so that we don't end up with trap; exit; trap; sequences.

Actually, no. Because ISD::TRAP can also be generated elsewhere, not just from unreachable (e.g. from Intrinsic::trap).

https://reviews.llvm.org/D152789 added an `exit` op before each `unreachable`. This means we never get to the `trap` instruction.

- When `trap-unreachable` is enabled and `no-trap-after-noreturn` is not, don't insert `exit` before each `unreachable`.
- Lower ISD::TRAP to both `trap` and `exit` instead of just the former.

The fix doesn't work with `no-trap-after-noreturn`, because the `unreachable`s not following a `noreturn` are lowered to `exit; trap; exit;`.

An alternative approach would be to insert `trap`s in `NVPTXLowerUnreachablePass`, depending on the `trap-unreachable` and `no-trap-after-noreturn` settings. I think we would then want skip lowering ISD::TRAP, so that we don't end up with `trap; exit; trap;` sequences.
@llvmbot llvmbot added the llvm:SelectionDAG SelectionDAGISel as well label Sep 27, 2023
@llvmbot
Copy link
Collaborator

llvmbot commented Sep 27, 2023

@llvm/pr-subscribers-llvm-selectiondag

Changes

https://reviews.llvm.org/D152789 added an exit op before each unreachable. This means we never get to the trap instruction.

  • When trap-unreachable is enabled and no-trap-after-noreturn is not, don't insert exit before each unreachable.
  • Lower ISD::TRAP to both trap and exit instead of just the former.

The fix doesn't work with no-trap-after-noreturn, because the unreachables not following a noreturn are lowered to exit; trap; exit;.

An alternative approach would be to insert traps in NVPTXLowerUnreachablePass, depending on the trap-unreachable and no-trap-after-noreturn settings. I think we would then want skip lowering ISD::TRAP, so that we don't end up with trap; exit; trap; sequences.


Full diff: https://github.com/llvm/llvm-project/pull/67478.diff

5 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp (+1-6)
  • (modified) llvm/lib/Target/NVPTX/NVPTX.h (+2-1)
  • (modified) llvm/lib/Target/NVPTX/NVPTXLowerUnreachable.cpp (+38-7)
  • (modified) llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp (+3-9)
  • (modified) llvm/test/CodeGen/NVPTX/unreachable.ll (+12-3)
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index f39b62abdd87790..ba0ab3586f75825 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -3226,14 +3226,9 @@ void SelectionDAGBuilder::visitUnreachable(const UnreachableInst &I) {
 
   // We may be able to ignore unreachable behind a noreturn call.
   if (DAG.getTarget().Options.NoTrapAfterNoreturn) {
-    const BasicBlock &BB = *I.getParent();
-    if (&I != &BB.front()) {
-      BasicBlock::const_iterator PredI =
-        std::prev(BasicBlock::const_iterator(&I));
-      if (const CallInst *Call = dyn_cast<CallInst>(&*PredI)) {
+    if (const CallInst *Call = dyn_cast_or_null<CallInst>(I.getPrevNode())) {
         if (Call->doesNotReturn())
           return;
-      }
     }
   }
 
diff --git a/llvm/lib/Target/NVPTX/NVPTX.h b/llvm/lib/Target/NVPTX/NVPTX.h
index c5816b9266dfd9e..8dc68911fff0c05 100644
--- a/llvm/lib/Target/NVPTX/NVPTX.h
+++ b/llvm/lib/Target/NVPTX/NVPTX.h
@@ -47,7 +47,8 @@ MachineFunctionPass *createNVPTXReplaceImageHandlesPass();
 FunctionPass *createNVPTXImageOptimizerPass();
 FunctionPass *createNVPTXLowerArgsPass();
 FunctionPass *createNVPTXLowerAllocaPass();
-FunctionPass *createNVPTXLowerUnreachablePass();
+FunctionPass *createNVPTXLowerUnreachablePass(bool TrapUnreachable,
+                                              bool NoTrapAfterNoreturn);
 MachineFunctionPass *createNVPTXPeephole();
 MachineFunctionPass *createNVPTXProxyRegErasurePass();
 
diff --git a/llvm/lib/Target/NVPTX/NVPTXLowerUnreachable.cpp b/llvm/lib/Target/NVPTX/NVPTXLowerUnreachable.cpp
index 1d312f82e6c061c..efafd909b93be37 100644
--- a/llvm/lib/Target/NVPTX/NVPTXLowerUnreachable.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXLowerUnreachable.cpp
@@ -72,6 +72,7 @@
 #include "llvm/IR/Function.h"
 #include "llvm/IR/InlineAsm.h"
 #include "llvm/IR/Instructions.h"
+#include "llvm/IR/Intrinsics.h"
 #include "llvm/IR/Type.h"
 #include "llvm/Pass.h"
 
@@ -83,14 +84,19 @@ void initializeNVPTXLowerUnreachablePass(PassRegistry &);
 
 namespace {
 class NVPTXLowerUnreachable : public FunctionPass {
+  StringRef getPassName() const override;
   bool runOnFunction(Function &F) override;
+  bool shouldEmitTrap(const UnreachableInst &I) const;
 
 public:
   static char ID; // Pass identification, replacement for typeid
-  NVPTXLowerUnreachable() : FunctionPass(ID) {}
-  StringRef getPassName() const override {
-    return "add an exit instruction before every unreachable";
-  }
+  NVPTXLowerUnreachable(bool TrapUnreachable, bool NoTrapAfterNoreturn)
+      : FunctionPass(ID), TrapUnreachable(TrapUnreachable),
+        NoTrapAfterNoreturn(NoTrapAfterNoreturn) {}
+
+private:
+  bool TrapUnreachable;
+  bool NoTrapAfterNoreturn;
 };
 } // namespace
 
@@ -99,6 +105,24 @@ char NVPTXLowerUnreachable::ID = 1;
 INITIALIZE_PASS(NVPTXLowerUnreachable, "nvptx-lower-unreachable",
                 "Lower Unreachable", false, false)
 
+StringRef NVPTXLowerUnreachable::getPassName() const {
+  return "add an exit instruction before every unreachable";
+}
+
+// =============================================================================
+// Returns whether a `trap` intrinsic should be emitted before I.
+//
+// This is a copy of the logic in SelectionDAGBuilder::visitUnreachable().
+// =============================================================================
+bool NVPTXLowerUnreachable::shouldEmitTrap(const UnreachableInst &I) const {
+  if (!TrapUnreachable)
+    return false;
+  if (!NoTrapAfterNoreturn)
+    return true;
+  const CallInst *Call = dyn_cast_or_null<CallInst>(I.getPrevNode());
+  return Call && Call->doesNotReturn();
+}
+
 // =============================================================================
 // Main function for this pass.
 // =============================================================================
@@ -109,18 +133,25 @@ bool NVPTXLowerUnreachable::runOnFunction(Function &F) {
   LLVMContext &C = F.getContext();
   FunctionType *ExitFTy = FunctionType::get(Type::getVoidTy(C), false);
   InlineAsm *Exit = InlineAsm::get(ExitFTy, "exit;", "", true);
+  Function *Trap = nullptr;
 
   bool Changed = false;
   for (auto &BB : F)
     for (auto &I : BB) {
       if (auto unreachableInst = dyn_cast<UnreachableInst>(&I)) {
-        Changed = true;
+        if (shouldEmitTrap(*unreachableInst)) {
+          if (!Trap)
+            Trap = Intrinsic::getDeclaration(F.getParent(), Intrinsic::trap);
+          CallInst::Create(Trap, "", unreachableInst);
+        }
         CallInst::Create(ExitFTy, Exit, "", unreachableInst);
+        Changed = true;
       }
     }
   return Changed;
 }
 
-FunctionPass *llvm::createNVPTXLowerUnreachablePass() {
-  return new NVPTXLowerUnreachable();
+FunctionPass *llvm::createNVPTXLowerUnreachablePass(bool TrapUnreachable,
+                                                    bool NoTrapAfterNoreturn) {
+  return new NVPTXLowerUnreachable(TrapUnreachable, NoTrapAfterNoreturn);
 }
diff --git a/llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp b/llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp
index cad97b1f14eb2b9..8d895762fbe1d9d 100644
--- a/llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp
@@ -63,13 +63,6 @@ static cl::opt<bool> UseShortPointersOpt(
         "Use 32-bit pointers for accessing const/local/shared address spaces."),
     cl::init(false), cl::Hidden);
 
-// FIXME: intended as a temporary debugging aid. Should be removed before it
-// makes it into the LLVM-17 release.
-static cl::opt<bool>
-    ExitOnUnreachable("nvptx-exit-on-unreachable",
-                      cl::desc("Lower 'unreachable' as 'exit' instruction."),
-                      cl::init(true), cl::Hidden);
-
 namespace llvm {
 
 void initializeGenericToNVVMLegacyPassPass(PassRegistry &);
@@ -410,8 +403,9 @@ void NVPTXPassConfig::addIRPasses() {
     addPass(createSROAPass());
   }
 
-  if (ExitOnUnreachable)
-    addPass(createNVPTXLowerUnreachablePass());
+  const auto &Options = getNVPTXTargetMachine().Options;
+  addPass(createNVPTXLowerUnreachablePass(Options.TrapUnreachable,
+                                          Options.NoTrapAfterNoreturn));
 }
 
 bool NVPTXPassConfig::addInstSelector() {
diff --git a/llvm/test/CodeGen/NVPTX/unreachable.ll b/llvm/test/CodeGen/NVPTX/unreachable.ll
index 742089df1bd4533..011497c4e23401a 100644
--- a/llvm/test/CodeGen/NVPTX/unreachable.ll
+++ b/llvm/test/CodeGen/NVPTX/unreachable.ll
@@ -1,5 +1,11 @@
-; RUN: llc < %s -march=nvptx -mcpu=sm_20 -verify-machineinstrs | FileCheck %s
-; RUN: llc < %s -march=nvptx64 -mcpu=sm_20 -verify-machineinstrs | FileCheck %s
+; RUN: llc < %s -march=nvptx -mcpu=sm_20 -verify-machineinstrs \
+; RUN:     | FileCheck %s  --check-prefix=CHECK --check-prefix=CHECK-NOTRAP
+; RUN: llc < %s -march=nvptx64 -mcpu=sm_20 -verify-machineinstrs \
+; RUN:     | FileCheck %s  --check-prefix=CHECK --check-prefix=CHECK-NOTRAP
+; RUN: llc < %s -march=nvptx -mcpu=sm_20 -verify-machineinstrs -trap-unreachable \
+; RUN:     | FileCheck %s --check-prefix=CHECK --check-prefix=CHECK-TRAP
+; RUN: llc < %s -march=nvptx64 -mcpu=sm_20 -verify-machineinstrs -trap-unreachable \
+; RUN:     | FileCheck %s --check-prefix=CHECK --check-prefix=CHECK-TRAP
 ; RUN: %if ptxas && !ptxas-12.0 %{ llc < %s -march=nvptx -mcpu=sm_20 -verify-machineinstrs | %ptxas-verify %}
 ; RUN: %if ptxas %{ llc < %s -march=nvptx64 -mcpu=sm_20 -verify-machineinstrs | %ptxas-verify %}
 
@@ -11,7 +17,10 @@ define void @kernel_func() {
 ; CHECK: call.uni
 ; CHECK: throw,
   call void @throw()
-; CHECK: exit
+; CHECK-TRAP-NOT: exit;
+; CHECK-TRAP: trap;
+; CHECK-NOTRAP-NOT: trap;
+; CHECK: exit;
   unreachable
 }
 

@chsigg chsigg requested a review from Artem-B September 27, 2023 07:43
@chsigg
Copy link
Contributor Author

chsigg commented Sep 27, 2023

I moved the implementation to NVPTXLowerUnreachablePass, which is better because it also fixes the no-trap-after-noreturn case.

I'm wondering though if we should convey to ptxas that trap exits the CFG by always emitting ISD::TRAP as trap; exit; (under the assumption that we don't support predicated trap). This would avoid the slightly weird trap; exit; trap; for trap-unreachable (i.e., NVPTXLowerUnreachablePass would only add exit iff unsupported is dropped later on).

@Artem-B
Copy link
Member

Artem-B commented Sep 27, 2023

always emitting ISD::TRAP as trap; exit;

That would be fine.

@Artem-B
Copy link
Member

Artem-B commented Sep 27, 2023

@maleadt FYI. For some reason I can't add you as a reviewer on this PR.

Copy link
Member

@Artem-B Artem-B left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM in general.

vwbaker added a commit to vwbaker/triton that referenced this pull request Sep 28, 2023
It was possible for multiDimWarpId[1] to be 0 which then gets translated
into a `urem 0, 0` and results in an unreachable when going through
llvm, an empty kernel, and nans.

chsigg is working on a fix to lower the unreachable in llvm to a trap
(llvm/llvm-project#67478).
vwbaker added a commit to vwbaker/triton that referenced this pull request Sep 28, 2023
It was possible for multiDimWarpId[1] to be 0 which then gets translated
into a `urem 0, 0` and results in an unreachable when going through
llvm, an empty kernel, and nans.

chsigg is working on a fix to lower the unreachable in llvm to a trap
(llvm/llvm-project#67478).
@github-actions
Copy link

github-actions bot commented Sep 28, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

@chsigg
Copy link
Contributor Author

chsigg commented Sep 28, 2023

always emitting ISD::TRAP as trap; exit;

That would be fine.

Ok, I changed it to that. It seems a little cleaner because it a) avoids trap; exit; trap; and b) conveys to ptxas that traps from intrinsics exit the CFG as well.

Copy link
Contributor

@maleadt maleadt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

It didn't occur to me to preserve the trap as in Julia we're mostly avoiding that instruction since it 'breaks' the GPU for the current process, without a way to reset.

EDIT: just to be sure, I ran our GPU testsuite on an older device where JuliaGPU/CUDA.jl#1746 is an issue, and using trap; exit as proposed here doesn't seem to cause any issues.

ptillet pushed a commit to triton-lang/triton that referenced this pull request Sep 30, 2023
It was possible for multiDimWarpId[1] to be 0 which then gets translated
into a `urem 0, 0` and results in an unreachable when going through
llvm, an empty kernel, and nans. This PR uses ceiling to clamp the
result to be >=1.

chsigg is working on a fix to lower the unreachable in llvm to a trap
(llvm/llvm-project#67478).
@chsigg chsigg merged commit 5b7a7ec into llvm:main Oct 1, 2023
2 of 3 checks passed
Guzhu-AMD pushed a commit to GPUOpen-Drivers/llvm-project that referenced this pull request Oct 12, 2023
Local branch amd-gfx 5546a04 Merged main:18461dc45483 into amd-gfx:2f63a363aacc
Remote branch main 5b7a7ec [NVPTX] Fix code generation for `trap-unreachable`. (llvm#67478)
@chsigg chsigg deleted the piper_export_cl_568482300 branch February 15, 2024 09:19
@Artem-B
Copy link
Member

Artem-B commented Feb 15, 2024

We've removed nvptx-exit-on-unreachable knob which used to disable generation of exit.

#72641 tries to disable it for some unreachable branches, which I'm not convinced we should do, but we do not seem to have an escape hatch option any more, as an alternative.

pingzhuu pushed a commit to siliconflow/triton that referenced this pull request Apr 2, 2024
It was possible for multiDimWarpId[1] to be 0 which then gets translated
into a `urem 0, 0` and results in an unreachable when going through
llvm, an empty kernel, and nans. This PR uses ceiling to clamp the
result to be >=1.

chsigg is working on a fix to lower the unreachable in llvm to a trap
(llvm/llvm-project#67478).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
llvm:SelectionDAG SelectionDAGISel as well
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants