Skip to content

Conversation

skc7
Copy link
Contributor

@skc7 skc7 commented Sep 9, 2025

In OpenMPToLLVMIRTranslation, If host_eval vars are empty, LoopTripCount cannot be calculated.

This PR fixes this issue by adding a check to presence of host_eval vars before calculating the loopTripCount.

@llvmbot
Copy link
Member

llvmbot commented Sep 9, 2025

@llvm/pr-subscribers-mlir-openmp
@llvm/pr-subscribers-mlir-llvm

@llvm/pr-subscribers-mlir

Author: Chaitanya (skc7)

Changes

In OpenMPToLLVMIRTranslation, If host_eval vars are empty, LoopTripCount cannot be calculated.

This PR fixes this issue by adding a check to presence of host_eval vars before calculating the loopTripCount.


Full diff: https://github.com/llvm/llvm-project/pull/157717.diff

2 Files Affected:

  • (modified) mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp (+7-2)
  • (added) mlir/test/Target/LLVMIR/omptarget-loopnest-no-host-eval.mlir (+28)
diff --git a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
index 6694de8383534..019ce626903b2 100644
--- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
@@ -5393,8 +5393,11 @@ initTargetRuntimeAttrs(llvm::IRBuilderBase &builder,
   Value numThreads, numTeamsLower, numTeamsUpper, teamsThreadLimit;
   llvm::SmallVector<Value> lowerBounds(numLoops), upperBounds(numLoops),
       steps(numLoops);
-  extractHostEvalClauses(targetOp, numThreads, numTeamsLower, numTeamsUpper,
-                         teamsThreadLimit, &lowerBounds, &upperBounds, &steps);
+  if (!targetOp.getHostEvalVars().empty()) {
+    extractHostEvalClauses(targetOp, numThreads, numTeamsLower, numTeamsUpper,
+                           teamsThreadLimit, &lowerBounds, &upperBounds,
+                           &steps);
+  }
 
   // TODO: Handle constant 'if' clauses.
   if (Value targetThreadLimit = targetOp.getThreadLimit())
@@ -5424,6 +5427,8 @@ initTargetRuntimeAttrs(llvm::IRBuilderBase &builder,
     // here, since we're only interested in the trip count.
     for (auto [loopLower, loopUpper, loopStep] :
          llvm::zip_equal(lowerBounds, upperBounds, steps)) {
+      if (!loopLower || !loopUpper || !loopStep)
+        break;
       llvm::Value *lowerBound = moduleTranslation.lookupValue(loopLower);
       llvm::Value *upperBound = moduleTranslation.lookupValue(loopUpper);
       llvm::Value *step = moduleTranslation.lookupValue(loopStep);
diff --git a/mlir/test/Target/LLVMIR/omptarget-loopnest-no-host-eval.mlir b/mlir/test/Target/LLVMIR/omptarget-loopnest-no-host-eval.mlir
new file mode 100644
index 0000000000000..8628c115364f9
--- /dev/null
+++ b/mlir/test/Target/LLVMIR/omptarget-loopnest-no-host-eval.mlir
@@ -0,0 +1,28 @@
+// RUN: mlir-translate -mlir-to-llvmir %s | FileCheck %s
+// Ensure that the -mlir-to-llmvir pass doesn't crash.
+
+// CHECK: define void @_QQmain()
+
+module attributes {llvm.target_triple = "x86_64-unknown-linux-gnu", omp.is_gpu = false, omp.is_target_device = false} {
+  llvm.func @_QQmain()  {
+    omp.target {
+      %0 = llvm.mlir.constant(1000 : i32) : i32
+      %1 = llvm.mlir.constant(1 : i32) : i32
+      omp.teams {
+        omp.parallel {
+          omp.distribute {
+            omp.wsloop {
+              omp.loop_nest (%arg0) : i32 = (%1) to (%0) inclusive step (%1) {
+                omp.yield
+              }
+            } {omp.composite}
+          } {omp.composite}
+          omp.terminator
+        } {omp.composite}
+        omp.terminator
+      }
+      omp.terminator
+    }
+    llvm.return
+  }
+}

Copy link
Member

@skatrak skatrak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this change is correct. Having host-evaluated loop bounds is a hard requirement for SPMD kernels (i.e. target teams distribute parallel do). Even if they are constants, they must be passed as host_eval arguments to omp.target when compiling for the host. See for example the unit test at "flang/test/Lower/OpenMP/target-spmd.f90".

It is only when compiling for a target device (i.e. the module attribute omp.is_target_device=true) that we don't use host_eval.

What is the motivation for this change, is there any code currently producing such invalid MLIR?

@skc7
Copy link
Contributor Author

skc7 commented Sep 10, 2025

I don't think this change is correct. Having host-evaluated loop bounds is a hard requirement for SPMD kernels (i.e. target teams distribute parallel do). Even if they are constants, they must be passed as host_eval arguments to omp.target when compiling for the host. See for example the unit test at "flang/test/Lower/OpenMP/target-spmd.f90".

It is only when compiling for a target device (i.e. the module attribute omp.is_target_device=true) that we don't use host_eval.

What is the motivation for this change, is there any code currently producing such invalid MLIR?

This change is required for #140523 workdistribute construct lowering, where in a fir.do_loop unordered is identified under target teams workdistribute region and then lowered to target{teams{parallel{distribute{wsloop{loop_nest}}}.

The test mentioned in this PR passes the verifier, but crashes at lowering from openmp to llvm ir.
I'm unaware that host_eval clause needs to have loop bounds information even though they are constants.

@skatrak
Copy link
Member

skatrak commented Sep 11, 2025

This change is required for #140523 workdistribute construct lowering, where in a fir.do_loop unordered is identified under target teams workdistribute region and then lowered to target{teams{parallel{distribute{wsloop{loop_nest}}}.

The test mentioned in this PR passes the verifier, but crashes at lowering from openmp to llvm ir. I'm unaware that host_eval clause needs to have loop bounds information even though they are constants.

If that's the case, then I think some changes would probably have to be done to workdistribute lowering. I think it should conform to the rules of the dialect, so we can properly codegen for the resulting target region. If you look at the current work being done for automatic offloading of do-concurrent (#155987), you can see it is able to follow these rules and produce the expected MLIR.

I think the core issue here is that, whenever omp.workdistribute is replaced with the corresponding set of nested operations, we need to check whether it's located in a target region and, in that case, find their value outside of the target region (or copy the constant) and pass that as host_eval. Ideally, I would try to do this earlier in Flang lowering, rather than as a transformation pass, because that's where we have all of the host_eval infrastructure prepared to do exactly this (see processHostEvalClauses and HostEvalInfo in flang/lib/Lower/OpenMP/OpenMP.cpp).

@skc7
Copy link
Contributor Author

skc7 commented Sep 15, 2025

Closing this PR, since host_eval is needed in host compilation.

@skc7 skc7 closed this Sep 15, 2025
skc7 added a commit to skc7/llvm-project that referenced this pull request Sep 29, 2025
skc7 added a commit to skc7/llvm-project that referenced this pull request Oct 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants