Skip to content

Conversation

@clementval
Copy link
Contributor

Just use a single templated function to generate the 3 kind of thread fence so we can remove duplicated code.

@clementval clementval requested a review from wangzpgi November 7, 2025 19:52
@llvmbot llvmbot added flang Flang issues not falling into any other category flang:fir-hlfir labels Nov 7, 2025
@llvmbot
Copy link
Member

llvmbot commented Nov 7, 2025

@llvm/pr-subscribers-flang-fir-hlfir

Author: Valentin Clement (バレンタイン クレメン) (clementval)

Changes

Just use a single templated function to generate the 3 kind of thread fence so we can remove duplicated code.


Full diff: https://github.com/llvm/llvm-project/pull/167009.diff

2 Files Affected:

  • (modified) flang/include/flang/Optimizer/Builder/CUDAIntrinsicCall.h (+1-2)
  • (modified) flang/lib/Optimizer/Builder/CUDAIntrinsicCall.cpp (+6-19)
diff --git a/flang/include/flang/Optimizer/Builder/CUDAIntrinsicCall.h b/flang/include/flang/Optimizer/Builder/CUDAIntrinsicCall.h
index d735ce95a83dc..ae7d566920656 100644
--- a/flang/include/flang/Optimizer/Builder/CUDAIntrinsicCall.h
+++ b/flang/include/flang/Optimizer/Builder/CUDAIntrinsicCall.h
@@ -63,9 +63,8 @@ struct CUDAIntrinsicLibrary : IntrinsicLibrary {
   mlir::Value genThisGrid(mlir::Type, llvm::ArrayRef<mlir::Value>);
   mlir::Value genThisThreadBlock(mlir::Type, llvm::ArrayRef<mlir::Value>);
   mlir::Value genThisWarp(mlir::Type, llvm::ArrayRef<mlir::Value>);
+  template <mlir::NVVM::MemScopeKind scope>
   void genThreadFence(llvm::ArrayRef<fir::ExtendedValue>);
-  void genThreadFenceBlock(llvm::ArrayRef<fir::ExtendedValue>);
-  void genThreadFenceSystem(llvm::ArrayRef<fir::ExtendedValue>);
   void genTMABulkCommitGroup(llvm::ArrayRef<fir::ExtendedValue>);
   void genTMABulkG2S(llvm::ArrayRef<fir::ExtendedValue>);
   void genTMABulkLoadC4(llvm::ArrayRef<fir::ExtendedValue>);
diff --git a/flang/lib/Optimizer/Builder/CUDAIntrinsicCall.cpp b/flang/lib/Optimizer/Builder/CUDAIntrinsicCall.cpp
index 18b56d384b479..323d1ef78e65d 100644
--- a/flang/lib/Optimizer/Builder/CUDAIntrinsicCall.cpp
+++ b/flang/lib/Optimizer/Builder/CUDAIntrinsicCall.cpp
@@ -472,17 +472,17 @@ static constexpr IntrinsicHandler cudaHandlers[]{
      /*isElemental=*/false},
     {"threadfence",
      static_cast<CUDAIntrinsicLibrary::SubroutineGenerator>(
-         &CI::genThreadFence),
+         &CI::genThreadFence<mlir::NVVM::MemScopeKind::GPU>),
      {},
      /*isElemental=*/false},
     {"threadfence_block",
      static_cast<CUDAIntrinsicLibrary::SubroutineGenerator>(
-         &CI::genThreadFenceBlock),
+         &CI::genThreadFence<mlir::NVVM::MemScopeKind::CTA>),
      {},
      /*isElemental=*/false},
     {"threadfence_system",
      static_cast<CUDAIntrinsicLibrary::SubroutineGenerator>(
-         &CI::genThreadFenceSystem),
+         &CI::genThreadFence<mlir::NVVM::MemScopeKind::SYS>),
      {},
      /*isElemental=*/false},
     {"tma_bulk_commit_group",
@@ -1306,25 +1306,12 @@ CUDAIntrinsicLibrary::genThisWarp(mlir::Type resultType,
   return res;
 }
 
-// THREADFENCE
+// THREADFENCE, THREADFENCE_BLOCK, THREADFENCE_SYSTEM
+template <mlir::NVVM::MemScopeKind scope>
 void CUDAIntrinsicLibrary::genThreadFence(
     llvm::ArrayRef<fir::ExtendedValue> args) {
   assert(args.size() == 0);
-  mlir::NVVM::MembarOp::create(builder, loc, mlir::NVVM::MemScopeKind::GPU);
-}
-
-// THREADFENCE_BLOCK
-void CUDAIntrinsicLibrary::genThreadFenceBlock(
-    llvm::ArrayRef<fir::ExtendedValue> args) {
-  assert(args.size() == 0);
-  mlir::NVVM::MembarOp::create(builder, loc, mlir::NVVM::MemScopeKind::CTA);
-}
-
-// THREADFENCE_SYSTEM
-void CUDAIntrinsicLibrary::genThreadFenceSystem(
-    llvm::ArrayRef<fir::ExtendedValue> args) {
-  assert(args.size() == 0);
-  mlir::NVVM::MembarOp::create(builder, loc, mlir::NVVM::MemScopeKind::SYS);
+  mlir::NVVM::MembarOp::create(builder, loc, scope);
 }
 
 // TMA_BULK_COMMIT_GROUP

@clementval clementval merged commit f00d353 into llvm:main Nov 7, 2025
13 checks passed
@clementval clementval deleted the cuf_refact_genthread branch November 7, 2025 21:12
vinay-deshmukh pushed a commit to vinay-deshmukh/llvm-project that referenced this pull request Nov 8, 2025
Just use a single templated function to generate the 3 kind of thread
fence so we can remove duplicated code.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

flang:fir-hlfir flang Flang issues not falling into any other category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants