
Conversation

@ergawy
Member

@ergawy ergawy commented Dec 10, 2025

Having more than one descriptor (allocatable or array) on the same `private` clause currently triggers a runtime crash on GPUs.

For SPMD kernels, the issue happens because the initialization logic includes:

  • Allocating a number of temporary structs (these are emitted by flang when `fir` is lowered to `mlir.llvm`).
  • A conditional branch that determines whether we allocate storage for the private descriptor and initialize its array bounds from the original descriptor, or initialize the private descriptor to null.

Because of these two things, the temporary allocations needed for descriptors beyond the first are preceded by branching, which causes the observed runtime crash (see the sketch below).

This PR solves the issue by hoisting these static `alloca` instructions to the alloca insertion point (IP) of the parent construct.
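For illustration, here is a hand-written sketch of the IR shape before and after the change (function names, block names, struct layouts, and the initialization details are invented for the example; they are not actual flang output):

```llvm
; Sketch only: names and layouts are illustrative, not actual flang output.

; Before: the temporary descriptor alloca for the second private variable is
; only reachable through the conditional branch emitted while initializing the
; first one.
define void @before(ptr %orig1, ptr %orig2) {
entry:
  %tmp.desc1 = alloca { ptr, i64 }, align 8   ; temp for the 1st descriptor
  %is.alloc = icmp ne ptr %orig1, null
  br i1 %is.alloc, label %init1.alloc, label %init1.null

init1.alloc:                                  ; copy bounds from %orig1
  br label %init1.cont

init1.null:                                   ; leave the descriptor null
  br label %init1.cont

init1.cont:
  ; temp for the 2nd descriptor is now preceded by branching -> GPU crash
  %tmp.desc2 = alloca { ptr, i64 }, align 8
  ret void
}

; After: both static allocas live at the parent construct's alloca insertion
; point and execute unconditionally.
define void @after(ptr %orig1, ptr %orig2) {
entry:
  %tmp.desc1 = alloca { ptr, i64 }, align 8
  %tmp.desc2 = alloca { ptr, i64 }, align 8
  %is.alloc = icmp ne ptr %orig1, null
  br i1 %is.alloc, label %init1.alloc, label %init1.null

init1.alloc:
  br label %init1.cont

init1.null:
  br label %init1.cont

init1.cont:
  ret void
}
```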

@llvmbot
Member

llvmbot commented Dec 10, 2025

@llvm/pr-subscribers-mlir-openmp
@llvm/pr-subscribers-mlir-llvm

@llvm/pr-subscribers-mlir

Author: Kareem Ergawy (ergawy)

Changes

Having more than one descriptor (allocatable or array) on the same `private` clause currently triggers a runtime crash on GPUs.

For SPMD kernels, the issue happens because the initialization logic includes:

  • Allocating a number of temporary structs (these are emitted by flang when `fir` is lowered to `mlir.llvm`).
  • A conditional branch that determines whether we allocate storage for the private descriptor and initialize its array bounds from the original descriptor, or initialize the private descriptor to null.

Because of these two things, the temporary allocations needed for descriptors beyond the first are preceded by branching, which causes the observed runtime crash.

This PR solves the issue by hoisting these static `alloca` instructions to the alloca insertion point (IP) of the parent construct.


Full diff: https://github.com/llvm/llvm-project/pull/171597.diff

2 Files Affected:

  • (modified) mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp (+66-8)
  • (added) mlir/test/Target/LLVMIR/openmp-private-allloca-hoisting.mlir (+79)
diff --git a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
index 03d67a52853f6..350763fac0d22 100644
--- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
@@ -1596,10 +1596,64 @@ static llvm::Expected<llvm::Value *> initPrivateVar(
   return phis[0];
 }
 
+/// Beginning with \p startBlock, this function visits all reachable successor
+/// blocks. For each such block, static alloca instructions (i.e. non-array
+/// allocas) are collected. Then, these collected alloca instructions are moved
+/// to the \p allocaIP insertion point.
+///
+/// This is useful in cases where, for example, more than one allocatable or
+/// array is privatized. In such cases, we allocate a number of temporary
+/// descriptors to handle the initialization logic. Additionally, for each
+/// private value, there is branching logic based on the original private
+/// variable's allocation state. Therefore, we end up with descriptor alloca
+/// instructions preceded by conditional branches, which causes runtime issues,
+/// at least on the GPU.
+static void hoistStaticAllocasToAllocaIP(
+    llvm::BasicBlock *startBlock,
+    const llvm::OpenMPIRBuilder::InsertPointTy &allocaIP) {
+  llvm::SmallVector<llvm::BasicBlock *> inlinedBlocks{startBlock};
+  llvm::SmallPtrSet<llvm::BasicBlock *, 4> seenBlocks;
+  llvm::SmallVector<llvm::Instruction *> staticAllocas;
+
+  while (!inlinedBlocks.empty()) {
+    llvm::BasicBlock *curBlock = inlinedBlocks.front();
+    inlinedBlocks.erase(inlinedBlocks.begin());
+    llvm::Instruction *terminator = curBlock->getTerminator();
+
+    for (llvm::Instruction &inst : *curBlock) {
+      if (auto *allocaInst = mlir::dyn_cast<llvm::AllocaInst>(&inst)) {
+        if (!allocaInst->isArrayAllocation()) {
+#ifdef EXPENSIVE_CHECKS
+          assert(llvm::count(staticAllocas, allocaInst) == 0);
+#endif
+          staticAllocas.push_back(allocaInst);
+        }
+      }
+    }
+
+    if (!terminator || !terminator->isTerminator() ||
+        terminator->getNumSuccessors() == 0)
+      continue;
+
+    for (unsigned i = 0; i < terminator->getNumSuccessors(); ++i) {
+      llvm::BasicBlock *successor = terminator->getSuccessor(i);
+
+      if (!seenBlocks.contains(successor)) {
+        inlinedBlocks.push_back(successor);
+        seenBlocks.insert(successor);
+      }
+    }
+  }
+
+  for (llvm::Instruction *staticAlloca : staticAllocas)
+    staticAlloca->moveBefore(allocaIP.getPoint());
+}
+
 static llvm::Error
 initPrivateVars(llvm::IRBuilderBase &builder,
                 LLVM::ModuleTranslation &moduleTranslation,
                 PrivateVarsInfo &privateVarsInfo,
+                const llvm::OpenMPIRBuilder::InsertPointTy &allocaIP,
                 llvm::DenseMap<Value, Value> *mappedPrivateVars = nullptr) {
   if (privateVarsInfo.blockArgs.empty())
     return llvm::Error::success();
@@ -1624,6 +1678,8 @@ initPrivateVars(llvm::IRBuilderBase &builder,
     setInsertPointForPossiblyEmptyBlock(builder);
   }
 
+  hoistStaticAllocasToAllocaIP(privInitBlock, allocaIP);
+
   return llvm::Error::success();
 }
 
@@ -2575,7 +2631,8 @@ convertOmpWsloop(Operation &opInst, llvm::IRBuilderBase &builder,
                                 deferredStores, isByRef)))
     return failure();
 
-  if (handleError(initPrivateVars(builder, moduleTranslation, privateVarsInfo),
+  if (handleError(initPrivateVars(builder, moduleTranslation, privateVarsInfo,
+                                  allocaIP),
                   opInst)
           .failed())
     return failure();
@@ -2764,9 +2821,9 @@ convertOmpParallel(omp::ParallelOp opInst, llvm::IRBuilderBase &builder,
     assert(afterAllocas.get()->getSinglePredecessor());
     builder.restoreIP(codeGenIP);
 
-    if (handleError(
-            initPrivateVars(builder, moduleTranslation, privateVarsInfo),
-            *opInst)
+    if (handleError(initPrivateVars(builder, moduleTranslation, privateVarsInfo,
+                                    allocaIP),
+                    *opInst)
             .failed())
       return llvm::make_error<PreviouslyReportedError>();
 
@@ -2967,7 +3024,7 @@ convertOmpSimd(Operation &opInst, llvm::IRBuilderBase &builder,
                                 deferredStores, isByRef)))
     return failure();
 
-  if (handleError(initPrivateVars(builder, moduleTranslation, privateVarsInfo),
+  if (handleError(initPrivateVars(builder, moduleTranslation, privateVarsInfo, allocaIP),
                   opInst)
           .failed())
     return failure();
@@ -5288,8 +5345,9 @@ convertOmpDistribute(Operation &opInst, llvm::IRBuilderBase &builder,
     if (handleError(afterAllocas, opInst).failed())
       return llvm::make_error<PreviouslyReportedError>();
 
-    if (handleError(initPrivateVars(builder, moduleTranslation, privVarsInfo),
-                    opInst)
+    if (handleError(
+            initPrivateVars(builder, moduleTranslation, privVarsInfo, allocaIP),
+            opInst)
             .failed())
       return llvm::make_error<PreviouslyReportedError>();
 
@@ -6090,7 +6148,7 @@ convertOmpTarget(Operation &opInst, llvm::IRBuilderBase &builder,
 
     builder.restoreIP(codeGenIP);
     if (handleError(initPrivateVars(builder, moduleTranslation, privateVarsInfo,
-                                    &mappedPrivateVars),
+                                    allocaIP, &mappedPrivateVars),
                     *targetOp)
             .failed())
       return llvm::make_error<PreviouslyReportedError>();
diff --git a/mlir/test/Target/LLVMIR/openmp-private-allloca-hoisting.mlir b/mlir/test/Target/LLVMIR/openmp-private-allloca-hoisting.mlir
new file mode 100644
index 0000000000000..5371f2814085e
--- /dev/null
+++ b/mlir/test/Target/LLVMIR/openmp-private-allloca-hoisting.mlir
@@ -0,0 +1,79 @@
+// Tests that static allocas in `omp.private ... init` regions are hoisted to
+// the parent construct's alloca IP.
+// RUN: mlir-translate -mlir-to-llvmir -split-input-file %s | FileCheck %s
+
+llvm.func @foo1()
+llvm.func @foo2()
+llvm.func @foo3()
+llvm.func @foo4()
+
+omp.private {type = private} @multi_block.privatizer : f32 init {
+^bb0(%arg0: !llvm.ptr, %arg1: !llvm.ptr):
+  %0 = llvm.mlir.constant(1 : i32) : i32
+  %alloca1 = llvm.alloca %0 x !llvm.struct<(i64)> {alignment = 8 : i64} : (i32) -> !llvm.ptr
+
+  %1 = llvm.load %arg0 : !llvm.ptr -> f32
+
+  %c1 = llvm.mlir.constant(1 : i32) : i32
+  %c2 = llvm.mlir.constant(2 : i32) : i32
+  %cond1 = llvm.icmp "eq" %c1, %c2 : i32
+  llvm.cond_br %cond1, ^bb1, ^bb2
+
+^bb1:
+  llvm.call @foo1() : () -> ()
+  llvm.br ^bb3
+
+^bb2:
+  llvm.call @foo2() : () -> ()
+  llvm.br ^bb3
+
+^bb3:
+  llvm.store %1, %arg1 : f32, !llvm.ptr
+
+  omp.yield(%arg1 : !llvm.ptr)
+}
+
+omp.private {type = private} @multi_block.privatizer2 : f32 init {
+^bb0(%arg0: !llvm.ptr, %arg1: !llvm.ptr):
+  %0 = llvm.mlir.constant(1 : i32) : i32
+  %alloca1 = llvm.alloca %0 x !llvm.struct<(ptr)> {alignment = 8 : i64} : (i32) -> !llvm.ptr
+
+  %1 = llvm.load %arg0 : !llvm.ptr -> f32
+
+  %c1 = llvm.mlir.constant(1 : i32) : i32
+  %c2 = llvm.mlir.constant(2 : i32) : i32
+  %cond1 = llvm.icmp "eq" %c1, %c2 : i32
+  llvm.cond_br %cond1, ^bb1, ^bb2
+
+^bb1:
+  llvm.call @foo3() : () -> ()
+  llvm.br ^bb3
+
+^bb2:
+  llvm.call @foo4() : () -> ()
+  llvm.br ^bb3
+
+^bb3:
+  llvm.store %1, %arg1 : f32, !llvm.ptr
+
+  omp.yield(%arg1 : !llvm.ptr)
+}
+
+llvm.func @parallel_op_private_multi_block(%arg0: !llvm.ptr, %arg1: !llvm.ptr) {
+  omp.parallel private(@multi_block.privatizer %arg0 -> %arg2,
+                       @multi_block.privatizer2 %arg1 -> %arg3 : !llvm.ptr, !llvm.ptr) {
+    %0 = llvm.load %arg2 : !llvm.ptr -> f32
+    %1 = llvm.load %arg3 : !llvm.ptr -> f32
+    omp.terminator
+  }
+  llvm.return
+}
+
+// CHECK: define internal void @parallel_op_private_multi_block..omp_par({{.*}}) {{.*}} {
+// CHECK: omp.par.entry:
+// Verify that both allocas were hoisted to the parallel region's entry block.
+// CHECK:        %{{.*}} = alloca { i64 }, align 8
+// CHECK-NEXT:   %{{.*}} = alloca { ptr }, align 8
+// CHECK-NEXT:   br label %omp.region.after_alloca
+// CHECK: omp.region.after_alloca:
+// CHECK: }

@github-actions

github-actions bot commented Dec 10, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@ergawy ergawy force-pushed the users/ergawy/hoist_init_allocas branch from 873d0eb to 9e64fef Compare December 10, 2025 10:54
@ergawy
Member Author

ergawy commented Dec 10, 2025

We probably need to do something similar for reductions as well, but this PR focuses on `private` clauses.

Contributor

@tblah tblah left a comment


Thanks for the fix


    for (llvm::Instruction &inst : *curBlock) {
      if (auto *allocaInst = mlir::dyn_cast<llvm::AllocaInst>(&inst)) {
        if (!allocaInst->isArrayAllocation()) {
Contributor


Why are array allocations special? I would have thought the case worth worrying about is array allocations which have a size determined dynamically - a statically sized array allocation should work okay.

Would multiple of these dynamically sized arrays still crash the GPU code?

Member Author

@ergawy ergawy Dec 11, 2025


For descriptors, the problematic allocations are those of the temporary descriptor structures used to initialize the private storage. All such allocations are static because they are just structs.

The dynamic arrays are allocated only when the original value is allocated, so those allocations have to stay inside the proper branch since we read the shape from the original value.

The problem when we have many descriptors is that:

  1. We inline the init region of descriptor number 1, which includes temp allocations + the if-else branch for initialization.
  2. We do the same for descriptor number 2.
  3. …

Because of that, such temp allocations end up emitted between if-else branches.

I think the backend handles the dynamic allocations fine since they are clearly protected/tucked inside a branch, while the static allocations that come after the branches rejoin are problematic since they are supposed to be unconditional (see the sketch below).
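To make that distinction concrete, here is a hand-written LLVM IR sketch (the function name, struct layout, and field indices are invented for illustration, not actual flang output) of why only the static descriptor temp is hoistable while the dynamically sized data allocation stays behind the branch:

```llvm
define void @static_vs_dynamic(ptr %orig) {
entry:
  ; static struct temp for the descriptor: isArrayAllocation() is false,
  ; so it is safe to hoist to the construct's alloca insertion point
  %tmp.desc = alloca { ptr, i64 }, align 8
  %is.alloc = icmp ne ptr %orig, null
  br i1 %is.alloc, label %alloc.storage, label %done

alloc.storage:
  ; the extent is only known when the original variable is allocated
  %extent.addr = getelementptr { ptr, i64 }, ptr %orig, i32 0, i32 1
  %extent = load i64, ptr %extent.addr, align 8
  ; dynamically sized allocation: isArrayAllocation() is true,
  ; so it must stay inside this branch
  %data = alloca float, i64 %extent, align 4
  br label %done

done:
  ret void
}
```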

…ns to the allocation IP of the construct

Having more than one descriptor (allocatable or array) on the same `private` clause currently triggers a runtime crash on GPUs.

For SPMD kernels, the issue happens because the initialization logic includes:
* Allocating a number of temporary structs (these are emitted by flang when `fir` is lowered to `mlir.llvm`).
* A conditional branch that determines whether we allocate storage for the private descriptor and initialize its array bounds from the original descriptor, or initialize the private descriptor to null.

Because of these two things, the temporary allocations needed for descriptors beyond the first are preceded by branching, which causes the observed runtime crash.

This PR solves the issue by hoisting these static `alloca` instructions to the alloca insertion point of the parent construct.
@ergawy ergawy force-pushed the users/ergawy/hoist_init_allocas branch from 9e64fef to c9c432f Compare December 11, 2025 09:16