[flang][cuda][NFC] Use NVVM barrier op with reduction #167940

clementval · 2025-11-13T19:59:15Z

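Simplify the lowering of SYNCTHREADS_AND, SYNCTHREADS_COUNT, and SYNCTHREADS_OR by emitting the NVVM barrier op with a reduction attribute (the op was updated in #167036) instead of declaring and calling the `llvm.nvvm.barrier0.*` functions by name.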
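
For readers unfamiliar with the three intrinsics, here is a rough host-side model of the value each reduction mode hands back to every thread in a block. It is only a sketch: `Reduction` and `barrierReduce` are hypothetical names made up for this illustration, not part of MLIR, NVVM, or flang. It ignores the synchronization itself and returns 1/0 for the `and`/`or` cases, where CUDA only specifies a nonzero or zero result.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical names for illustration only; not an MLIR or flang API.
enum class Reduction { And, Or, Popc };

// Models the result each thread would see after a block-wide barrier with
// the given reduction, given every thread's i32 predicate value.
std::int32_t barrierReduce(Reduction kind,
                           const std::vector<std::int32_t> &predicates) {
  // Count the threads whose predicate is nonzero.
  std::int32_t nonZero = 0;
  for (std::int32_t p : predicates)
    if (p != 0)
      ++nonZero;

  switch (kind) {
  case Reduction::And: // true only if every thread's predicate is nonzero
    return nonZero == static_cast<std::int32_t>(predicates.size()) ? 1 : 0;
  case Reduction::Or: // true if any thread's predicate is nonzero
    return nonZero > 0 ? 1 : 0;
  case Reduction::Popc: // number of threads with a nonzero predicate
    return nonZero;
  }
  return 0; // not reached for valid enum values
}
```

In this model, SYNCTHREADS_AND corresponds to `Reduction::And`, SYNCTHREADS_COUNT to `Reduction::Popc`, and SYNCTHREADS_OR to `Reduction::Or`, matching the `<and>`, `<popc>`, and `<or>` forms checked in the updated test.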

llvmbot · 2025-11-13T19:59:51Z

@llvm/pr-subscribers-flang-fir-hlfir

Author: Valentin Clement (バレンタインクレメン) (clementval)

Full diff: https://github.com/llvm/llvm-project/pull/167940.diff

2 Files Affected:

(modified) flang/lib/Optimizer/Builder/CUDAIntrinsicCall.cpp (+21-24)
(modified) flang/test/Lower/CUDA/cuda-device-proc.cuf (+9-9)

diff --git a/flang/lib/Optimizer/Builder/CUDAIntrinsicCall.cpp b/flang/lib/Optimizer/Builder/CUDAIntrinsicCall.cpp
index 323d1ef78e65d..f67129dfa6730 100644
--- a/flang/lib/Optimizer/Builder/CUDAIntrinsicCall.cpp
+++ b/flang/lib/Optimizer/Builder/CUDAIntrinsicCall.cpp
@@ -1080,42 +1080,39 @@ void CUDAIntrinsicLibrary::genSyncThreads(
 mlir::Value
 CUDAIntrinsicLibrary::genSyncThreadsAnd(mlir::Type resultType,
                                         llvm::ArrayRef<mlir::Value> args) {
-  constexpr llvm::StringLiteral funcName = "llvm.nvvm.barrier0.and";
-  mlir::MLIRContext *context = builder.getContext();
-  mlir::Type i32 = builder.getI32Type();
-  mlir::FunctionType ftype =
-      mlir::FunctionType::get(context, {resultType}, {i32});
-  auto funcOp = builder.createFunction(loc, funcName, ftype);
-  mlir::Value arg = builder.createConvert(loc, i32, args[0]);
-  return fir::CallOp::create(builder, loc, funcOp, {arg}).getResult(0);
+  mlir::Value arg = builder.createConvert(loc, builder.getI32Type(), args[0]);
+  return mlir::NVVM::BarrierOp::create(
+             builder, loc, resultType, {}, {},
+             mlir::NVVM::BarrierReductionAttr::get(
+                 builder.getContext(), mlir::NVVM::BarrierReduction::AND),
+             arg)
+      .getResult(0);
 }
 
 // SYNCTHREADS_COUNT
 mlir::Value
 CUDAIntrinsicLibrary::genSyncThreadsCount(mlir::Type resultType,
                                           llvm::ArrayRef<mlir::Value> args) {
-  constexpr llvm::StringLiteral funcName = "llvm.nvvm.barrier0.popc";
-  mlir::MLIRContext *context = builder.getContext();
-  mlir::Type i32 = builder.getI32Type();
-  mlir::FunctionType ftype =
-      mlir::FunctionType::get(context, {resultType}, {i32});
-  auto funcOp = builder.createFunction(loc, funcName, ftype);
-  mlir::Value arg = builder.createConvert(loc, i32, args[0]);
-  return fir::CallOp::create(builder, loc, funcOp, {arg}).getResult(0);
+  mlir::Value arg = builder.createConvert(loc, builder.getI32Type(), args[0]);
+  return mlir::NVVM::BarrierOp::create(
+             builder, loc, resultType, {}, {},
+             mlir::NVVM::BarrierReductionAttr::get(
+                 builder.getContext(), mlir::NVVM::BarrierReduction::POPC),
+             arg)
+      .getResult(0);
 }
 
 // SYNCTHREADS_OR
 mlir::Value
 CUDAIntrinsicLibrary::genSyncThreadsOr(mlir::Type resultType,
                                        llvm::ArrayRef<mlir::Value> args) {
-  constexpr llvm::StringLiteral funcName = "llvm.nvvm.barrier0.or";
-  mlir::MLIRContext *context = builder.getContext();
-  mlir::Type i32 = builder.getI32Type();
-  mlir::FunctionType ftype =
-      mlir::FunctionType::get(context, {resultType}, {i32});
-  auto funcOp = builder.createFunction(loc, funcName, ftype);
-  mlir::Value arg = builder.createConvert(loc, i32, args[0]);
-  return fir::CallOp::create(builder, loc, funcOp, {arg}).getResult(0);
+  mlir::Value arg = builder.createConvert(loc, builder.getI32Type(), args[0]);
+  return mlir::NVVM::BarrierOp::create(
+             builder, loc, resultType, {}, {},
+             mlir::NVVM::BarrierReductionAttr::get(
+                 builder.getContext(), mlir::NVVM::BarrierReduction::OR),
+             arg)
+      .getResult(0);
 }
 
 // SYNCWARP
diff --git a/flang/test/Lower/CUDA/cuda-device-proc.cuf b/flang/test/Lower/CUDA/cuda-device-proc.cuf
index 3a255afd59263..ef15bf8d7726d 100644
--- a/flang/test/Lower/CUDA/cuda-device-proc.cuf
+++ b/flang/test/Lower/CUDA/cuda-device-proc.cuf
@@ -103,24 +103,24 @@ end
 ! CHECK-LABEL: func.func @_QPdevsub() attributes {cuf.proc_attr = #cuf.cuda_proc<global>}
 ! CHECK: nvvm.barrier0
 ! CHECK: nvvm.bar.warp.sync %c1{{.*}} : i32 
-! CHECK: %{{.*}} = fir.call @llvm.nvvm.barrier0.and(%c1{{.*}}) fastmath<contract> : (i32) -> i32
+! CHECK: %{{.*}} = nvvm.barrier <and> %c1{{.*}} -> i32
 ! CHECK: %[[A:.*]] = fir.load %{{.*}} : !fir.ref<i32>
 ! CHECK: %[[B:.*]] = fir.load %{{.*}} : !fir.ref<i32>
 ! CHECK: %[[CMP:.*]] = arith.cmpi sgt, %[[A]], %[[B]] : i32
 ! CHECK: %[[CONV:.*]] = fir.convert %[[CMP]] : (i1) -> i32
-! CHECK: %{{.*}} = fir.call @llvm.nvvm.barrier0.and(%[[CONV]])
-! CHECK: %{{.*}} = fir.call @llvm.nvvm.barrier0.popc(%c1{{.*}}) fastmath<contract> : (i32) -> i32
+! CHECK: %{{.*}} = nvvm.barrier <and> %[[CONV]] -> i32
+! CHECK: %{{.*}} = nvvm.barrier <popc> %c1{{.*}} -> i32
 ! CHECK: %[[A:.*]] = fir.load %{{.*}} : !fir.ref<i32>
 ! CHECK: %[[B:.*]] = fir.load %{{.*}} : !fir.ref<i32>
 ! CHECK: %[[CMP:.*]] = arith.cmpi sgt, %[[A]], %[[B]] : i32
 ! CHECK: %[[CONV:.*]] = fir.convert %[[CMP]] : (i1) -> i32
-! CHECK: %{{.*}} = fir.call @llvm.nvvm.barrier0.popc(%[[CONV]]) fastmath<contract> : (i32) -> i32
-! CHECK: %{{.*}} = fir.call @llvm.nvvm.barrier0.or(%c1{{.*}}) fastmath<contract> : (i32) -> i32
+! CHECK: %{{.*}} = nvvm.barrier <popc> %[[CONV]] -> i32
+! CHECK: %{{.*}} = nvvm.barrier <or> %c1{{.*}} -> i32
 ! CHECK: %[[A:.*]] = fir.load %{{.*}} : !fir.ref<i32>
 ! CHECK: %[[B:.*]] = fir.load %{{.*}} : !fir.ref<i32>
 ! CHECK: %[[CMP:.*]] = arith.cmpi sgt, %[[A]], %[[B]] : i32
 ! CHECK: %[[CONV:.*]] = fir.convert %[[CMP]] : (i1) -> i32
-! CHECK: %{{.*}} = fir.call @llvm.nvvm.barrier0.or(%[[CONV]]) fastmath<contract> : (i32) -> i32
+! CHECK: %{{.*}} = nvvm.barrier <or> %[[CONV]] -> i32
 ! CHECK: %{{.*}} = llvm.atomicrmw add  %{{.*}}, %{{.*}} seq_cst : !llvm.ptr, i32
 ! CHECK: %{{.*}} = llvm.atomicrmw add  %{{.*}}, %{{.*}} seq_cst : !llvm.ptr, i64
 ! CHECK: %{{.*}} = llvm.atomicrmw fadd %{{.*}}, %{{.*}} seq_cst : !llvm.ptr, f32
@@ -214,9 +214,9 @@ end
 ! CHECK: cuf.kernel
 ! CHECK: nvvm.barrier0
 ! CHECK: nvvm.bar.warp.sync %c1{{.*}} : i32 
-! CHECK: fir.call @llvm.nvvm.barrier0.and(%c1{{.*}}) fastmath<contract> : (i32) -> i32
-! CHECK: fir.call @llvm.nvvm.barrier0.popc(%c1{{.*}}) fastmath<contract> : (i32) -> i32
-! CHECK: fir.call @llvm.nvvm.barrier0.or(%c1{{.*}}) fastmath<contract> : (i32) -> i32
+! CHECK: nvvm.barrier <and> %c1{{.*}} -> i32
+! CHECK: nvvm.barrier <popc> %c1{{.*}} -> i32
+! CHECK: nvvm.barrier <or> %c1{{.*}} -> i32
 
 attributes(device) subroutine testMatch()
   integer :: a, ipred, mask, v32

razvanlupusoru

Nice improvement! Thank you!

vzakhari

Looks great!

[flang][cuda][NFC] Use NVVM barrier op with reduction

171d357

clementval requested review from razvanlupusoru and vzakhari November 13, 2025 19:59

llvmbot added the flang and flang:fir-hlfir labels Nov 13, 2025

razvanlupusoru approved these changes Nov 13, 2025

clementval enabled auto-merge (squash) November 13, 2025 20:05

vzakhari approved these changes Nov 13, 2025

clementval merged commit 606a0c2 into llvm:main Nov 13, 2025
11 of 12 checks passed