[MLIR][NVVM][Refactor] Refactor intrinsic lowering for NVVM Ops #157079

Wolfram70 · 2025-09-05T11:41:38Z

This change:

Standardizes the usage of getIntrinsicIDAndArgsMaybeWithTypes
across NVVM Ops for intrinsic lowering which returns the intrinsic ID, arguments,
and in the case of overloaded intrinsics, the types of the arguments as well.
Moves the get*Intrinsic* functions defined in NVVMToLLVMIRTranslation.cpp
to NVVMDialect.cpp and refactors them to be of the same signature as other
getIntrinsicIDAndArgsMaybeWithTypes functions to keep consistency and allow for a
future refactor.
Replaces the use of undef with poison in the cp.async.bulk.tensor.reduce Op.

This change adds standardizes the usage of getIntrinsicIDAndArgsMaybeWithTypes across NVVM Ops for intrinsic lowering which returns the intrinsic ID, arguments, and in the case of overloaded intrinsics, the types of the arguments as well.

llvmbot · 2025-09-05T11:42:10Z

@llvm/pr-subscribers-mlir-llvm

Author: Srinivasa Ravi (Wolfram70)

Changes

This change adds standardizes the usage of getIntrinsicIDAndArgsMaybeWithTypes
across NVVM Ops for intrinsic lowering which returns the intrinsic ID, arguments,
and in the case of overloaded intrinsics, the types of the arguments as well.

It also moves the get*Intrinsic* functions defined in NVVMToLLVMIRTranslation.cpp
to NVVMDialect.cpp and refactors them to be of the same signature as other
getIntrinsicIDAndArgsMaybeWithTypes functions to keep consistency and allow for a
future refactor.

Patch is 84.29 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/157079.diff

4 Files Affected:

(modified) mlir/include/mlir/Dialect/LLVMIR/NVVMDialect.h (+8-4)
(modified) mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td (+161-163)
(modified) mlir/lib/Dialect/LLVMIR/IR/NVVMDialect.cpp (+781-97)
(modified) mlir/lib/Target/LLVMIR/Dialect/NVVM/NVVMToLLVMIRTranslation.cpp (-374)

diff --git a/mlir/include/mlir/Dialect/LLVMIR/NVVMDialect.h b/mlir/include/mlir/Dialect/LLVMIR/NVVMDialect.h
index 6137bb087c576..3c463db548011 100644
--- a/mlir/include/mlir/Dialect/LLVMIR/NVVMDialect.h
+++ b/mlir/include/mlir/Dialect/LLVMIR/NVVMDialect.h
@@ -55,10 +55,14 @@ enum NVVMMemorySpace {
   kSharedClusterMemorySpace = 7,
 };
 
-/// A pair type of LLVM's Intrinsic ID and args (which are llvm values).
-/// This type is returned by the getIntrinsicIDAndArgs() methods.
-using IDArgPair =
-    std::pair<llvm::Intrinsic::ID, llvm::SmallVector<llvm::Value *>>;
+/// A tuple type of LLVM's Intrinsic ID, args (which are llvm values),
+/// and args types (which are llvm types).
+/// Args types are only required for overloaded intrinsics to provide the
+/// correct argument types to the createIntrinsicCall() method.
+/// This type is returned by the getIIDAndArgsWithTypes() methods.
+using IIDArgsWithTypes =
+    std::tuple<llvm::Intrinsic::ID, llvm::SmallVector<llvm::Value *>,
+               llvm::SmallVector<llvm::Type *>>;
 
 /// Return the element type and number of elements associated with a wmma matrix
 /// of given chracteristics. This matches the logic in IntrinsicsNVVM.td
diff --git a/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td b/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
index 9d93b4efe7a5b..168060aae2c3e 100644
--- a/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
+++ b/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
@@ -410,9 +410,16 @@ def NVVM_ReduxOp :
 
     [For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/#parallel-synchronization-and-communication-instructions-redux-sync)
   }];
+  let extraClassDeclaration = [{
+    static NVVM::IIDArgsWithTypes
+    getIIDAndArgsWithTypes(Operation &op,
+      LLVM::ModuleTranslation &mt, llvm::IRBuilderBase &builder);
+  }];
   string llvmBuilder = [{
-      auto intId = getReduxIntrinsicId($_resultType, $kind, $abs, $nan);
-      $res = createIntrinsicCall(builder, intId, {$val, $mask_and_clamp});
+      auto [id, args, types] = 
+      NVVM::ReduxOp::getIIDAndArgsWithTypes(
+                        *op, moduleTranslation, builder);
+      $res = createIntrinsicCall(builder, id, args);
   }];
   let assemblyFormat = [{
     $kind $val `,` $mask_and_clamp  attr-dict `:` type($val) `->` type($res)
@@ -876,11 +883,17 @@ def NVVM_FenceProxyAcquireOp : NVVM_Op<"fence.proxy.acquire">,
   }];
 
   let assemblyFormat = "$scope $addr `,` $size (`from_proxy` `=` $fromProxy^)? (`to_proxy` `=` $toProxy^)? attr-dict";
+  
+  let extraClassDeclaration = [{
+    static NVVM::IIDArgsWithTypes
+    getIIDAndArgsWithTypes(Operation &op,
+      LLVM::ModuleTranslation &mt, llvm::IRBuilderBase &builder);
+  }];
   let llvmBuilder = [{
-    createIntrinsicCall(
-        builder,
-        getUnidirectionalFenceProxyID($fromProxy, $toProxy, $scope, false),
-        {$addr, $size});
+    auto [intId, args, types] = 
+    NVVM::FenceProxyAcquireOp::getIIDAndArgsWithTypes(
+      *op, moduleTranslation, builder);
+    createIntrinsicCall(builder, intId, args);
   }];
 
   let hasVerifier = 1;
@@ -904,9 +917,16 @@ def NVVM_FenceProxyReleaseOp : NVVM_Op<"fence.proxy.release">,
   }];
 
   let assemblyFormat = "$scope (`from_proxy` `=` $fromProxy^)? (`to_proxy` `=` $toProxy^)? attr-dict";
+
+  let extraClassDeclaration = [{
+    static NVVM::IIDArgsWithTypes
+    getIIDAndArgsWithTypes(Operation &op,
+      LLVM::ModuleTranslation &mt, llvm::IRBuilderBase &builder);
+  }];
   let llvmBuilder = [{
-    createIntrinsicCall(builder, getUnidirectionalFenceProxyID(
-                                     $fromProxy, $toProxy, $scope, true));
+    auto [intId, args, types] = NVVM::FenceProxyReleaseOp::getIIDAndArgsWithTypes(
+      *op, moduleTranslation, builder);
+    createIntrinsicCall(builder, intId, args);
   }];
 
   let hasVerifier = 1;
@@ -985,11 +1005,15 @@ def NVVM_ShflOp :
     
     [For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-shfl-sync)
   }];
+  let extraClassDeclaration = [{
+    static NVVM::IIDArgsWithTypes
+    getIIDAndArgsWithTypes(Operation &op,
+      LLVM::ModuleTranslation &mt, llvm::IRBuilderBase &builder);
+  }];
   string llvmBuilder = [{
-      auto intId = getShflIntrinsicId(
-          $_resultType, $kind, static_cast<bool>($return_value_and_is_valid));
-      $res = createIntrinsicCall(builder,
-          intId, {$thread_mask, $val, $offset, $mask_and_clamp});
+      auto [intId, args, types] = NVVM::ShflOp::getIIDAndArgsWithTypes(
+          *op, moduleTranslation, builder);
+      $res = createIntrinsicCall(builder, intId, args);
   }];
   let assemblyFormat = [{
     $kind $thread_mask `,` $val `,` $offset `,` $mask_and_clamp  attr-dict
@@ -1035,9 +1059,16 @@ def NVVM_VoteSyncOp
 
     [For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/#parallel-synchronization-and-communication-instructions-vote-sync)
   }];
+  let extraClassDeclaration = [{
+    static NVVM::IIDArgsWithTypes
+      getIIDAndArgsWithTypes(Operation &op,
+        LLVM::ModuleTranslation &mt, llvm::IRBuilderBase &builder);
+  }];
   string llvmBuilder = [{
-    auto intId = getVoteSyncIntrinsicId($kind);
-    $res = createIntrinsicCall(builder, intId, {$mask, $pred});
+    auto [intId, args, types] = 
+    NVVM::VoteSyncOp::getIIDAndArgsWithTypes(
+      *op, moduleTranslation, builder);
+    $res = createIntrinsicCall(builder, intId, args);
   }];
   let assemblyFormat = "$kind $mask `,` $pred attr-dict `->` type($res)";
   let hasVerifier = 1;
@@ -1108,15 +1139,14 @@ def NVVM_CpAsyncOp : NVVM_Op<"cp.async.shared.global">,
   let assemblyFormat = "$dst `,` $src `,` $size `,` `cache` `=` $modifier (`,` $cpSize^)? attr-dict `:` type(operands)";
   let hasVerifier = 1;
   let extraClassDeclaration = [{
-    static llvm::Intrinsic::ID
-      getIntrinsicIDAndArgs(Operation &op, LLVM::ModuleTranslation &mt,
-                            llvm::SmallVector<llvm::Value *> &args);
+    static NVVM::IIDArgsWithTypes
+      getIIDAndArgsWithTypes(Operation &op, LLVM::ModuleTranslation &mt,
+                            llvm::IRBuilderBase &builder);
   }];
   string llvmBuilder = [{
-    llvm::SmallVector<llvm::Value *> translatedOperands;
-    auto id = NVVM::CpAsyncOp::getIntrinsicIDAndArgs(
-      *op, moduleTranslation, translatedOperands);
-    createIntrinsicCall(builder, id, translatedOperands);
+    auto [id, args, types] = NVVM::CpAsyncOp::getIIDAndArgsWithTypes(
+      *op, moduleTranslation, builder);
+    createIntrinsicCall(builder, id, args);
   }];
 }
 
@@ -2107,10 +2137,16 @@ def NVVM_StMatrixOp: NVVM_Op<"stmatrix">,
     
     [For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-store-instruction-stmatrix)
   }];
+  let extraClassDeclaration = [{
+    static NVVM::IIDArgsWithTypes
+    getIIDAndArgsWithTypes(Operation &op,
+      LLVM::ModuleTranslation &mt, llvm::IRBuilderBase &builder);
+  }];
   string llvmBuilder = [{
-      auto operands = moduleTranslation.lookupValues(opInst.getOperands());
-      auto intId = getStMatrixIntrinsicId($layout, $sources.size(), $shape, $eltType);
-      createIntrinsicCall(builder, intId, operands, operands[0]->getType());
+      auto [intId, args, types] = 
+      NVVM::StMatrixOp::getIIDAndArgsWithTypes(
+        *op, moduleTranslation, builder);
+      createIntrinsicCall(builder, intId, args, types);
   }];
   let assemblyFormat = "$ptr `,` $sources attr-dict `:` type(operands)";
   let hasVerifier = 1;
@@ -2125,10 +2161,16 @@ def NVVM_LdMatrixOp: NVVM_Op<"ldmatrix">,
 
   let summary = "cooperative matrix load";
 
+  let extraClassDeclaration = [{
+    static NVVM::IIDArgsWithTypes
+      getIIDAndArgsWithTypes(Operation &op,
+        LLVM::ModuleTranslation &mt, llvm::IRBuilderBase &builder);
+  }];
   string llvmBuilder = [{
-      auto operands = moduleTranslation.lookupValues(opInst.getOperands());
-      auto intId = getLdMatrixIntrinsicId($layout, $num, $shape, $eltType);
-      $res = createIntrinsicCall(builder, intId, operands, {operands[0]->getType()});
+      auto [intId, args, types] = 
+      NVVM::LdMatrixOp::getIIDAndArgsWithTypes(
+        *op, moduleTranslation, builder);
+      $res = createIntrinsicCall(builder, intId, args, types);
   }];
 
   string baseDescription = [{
@@ -2543,8 +2585,8 @@ def NVVM_CpAsyncBulkTensorSharedCTAToGlobalOp :
   let extraClassDeclaration = [{
     bool hasIntrinsic() { return !getPredicate(); }
 
-    static mlir::NVVM::IDArgPair
-    getIntrinsicIDAndArgs(Operation &op, LLVM::ModuleTranslation &mt,
+    static mlir::NVVM::IIDArgsWithTypes
+    getIIDAndArgsWithTypes(Operation &op, LLVM::ModuleTranslation &mt,
                           llvm::IRBuilderBase& builder);
   }];
 
@@ -2565,7 +2607,7 @@ def NVVM_CpAsyncBulkTensorSharedCTAToGlobalOp :
   let hasVerifier = 1;
 
   string llvmBuilder = [{
-    auto [id, args] = NVVM::CpAsyncBulkTensorSharedCTAToGlobalOp::getIntrinsicIDAndArgs(
+    auto [id, args, types] = NVVM::CpAsyncBulkTensorSharedCTAToGlobalOp::getIIDAndArgsWithTypes(
                       *op, moduleTranslation, builder);
     createIntrinsicCall(builder, id, args);
   }];
@@ -2631,8 +2673,8 @@ def NVVM_PrefetchOp : NVVM_Op<"prefetch",
   let hasVerifier = 1;
 
   let extraClassDeclaration = [{
-    static NVVM::IDArgPair
-    getIntrinsicIDAndArgs(NVVM::PrefetchOp &op,LLVM::ModuleTranslation &mt,
+    static NVVM::IIDArgsWithTypes
+    getIIDAndArgsWithTypes(Operation &op, LLVM::ModuleTranslation &mt,
                           llvm::IRBuilderBase &builder);
     bool hasIntrinsic() { return !getPredicate() || !getTensormap(); }
   }];
@@ -2643,7 +2685,7 @@ def NVVM_PrefetchOp : NVVM_Op<"prefetch",
     }
   }];
   let llvmBuilder = [{
-    auto [id, args] = NVVM::PrefetchOp::getIntrinsicIDAndArgs(op,
+    auto [id, args, types] = NVVM::PrefetchOp::getIIDAndArgsWithTypes(*op,
                                           moduleTranslation, builder);
 
     if(op.getTensormap())
@@ -2685,13 +2727,13 @@ def NVVM_CpAsyncBulkPrefetchOp : NVVM_Op<"cp.async.bulk.prefetch"> {
   }];
 
   let extraClassDeclaration = [{
-    static mlir::NVVM::IDArgPair
-    getIntrinsicIDAndArgs(Operation &op, LLVM::ModuleTranslation &mt,
+    static mlir::NVVM::IIDArgsWithTypes
+    getIIDAndArgsWithTypes(Operation &op, LLVM::ModuleTranslation &mt,
                           llvm::IRBuilderBase& builder);
   }];
 
   string llvmBuilder = [{
-    auto [id, args] = NVVM::CpAsyncBulkPrefetchOp::getIntrinsicIDAndArgs(
+    auto [id, args, types] = NVVM::CpAsyncBulkPrefetchOp::getIIDAndArgsWithTypes(
                       *op, moduleTranslation, builder);
     createIntrinsicCall(builder, id, args);
   }];
@@ -2726,15 +2768,15 @@ def NVVM_CpAsyncBulkTensorPrefetchOp :
   }];
 
   let extraClassDeclaration = [{
-    static mlir::NVVM::IDArgPair
-    getIntrinsicIDAndArgs(Operation &op, LLVM::ModuleTranslation &mt,
+    static mlir::NVVM::IIDArgsWithTypes
+    getIIDAndArgsWithTypes(Operation &op, LLVM::ModuleTranslation &mt,
                           llvm::IRBuilderBase& builder);
   }];
 
   let hasVerifier = 1;
 
   string llvmBuilder = [{
-    auto [id, args] = NVVM::CpAsyncBulkTensorPrefetchOp::getIntrinsicIDAndArgs(
+    auto [id, args, types] = NVVM::CpAsyncBulkTensorPrefetchOp::getIIDAndArgsWithTypes(
                       *op, moduleTranslation, builder);
     createIntrinsicCall(builder, id, args);
   }];
@@ -2795,35 +2837,17 @@ def NVVM_CpAsyncBulkTensorReduceOp :
   }];
 
   let extraClassDeclaration = [{
-    static llvm::Intrinsic::ID getIntrinsicID(int tensorDims,
-                                              NVVM::TMAReduxKind kind,
-                                              bool isIm2Col);
+    static mlir::NVVM::IIDArgsWithTypes
+    getIIDAndArgsWithTypes(Operation &op,
+      LLVM::ModuleTranslation &mt, llvm::IRBuilderBase& builder);
   }];
 
   let hasVerifier = 1;
 
   string llvmBuilder = [{
-    // Arguments to the intrinsic:
-    // shared_mem_ptr, tmaDesc, tensorDims
-    // cache_hint(if applicable) and flag(boolean)
-    llvm::SmallVector<llvm::Value *> translatedOperands;
-    translatedOperands.push_back($srcMem);
-    translatedOperands.push_back($tmaDescriptor);
-
-    for (auto v : op.getCoordinates())
-      translatedOperands.push_back(moduleTranslation.lookupValue(v));
-
-    llvm::LLVMContext &ctx = moduleTranslation.getLLVMContext();
-    auto *i64Undef = llvm::UndefValue::get(llvm::IntegerType::get(ctx, 64));
-
-    bool isCacheHint = op.getL2CacheHint() ? true : false;
-    translatedOperands.push_back(isCacheHint ? $l2CacheHint : i64Undef);
-    translatedOperands.push_back(builder.getInt1(isCacheHint));
-
-    auto intId = NVVM::CpAsyncBulkTensorReduceOp::getIntrinsicID(
-                 op.getCoordinates().size(), $redKind,
-                 (op.getMode() == NVVM::TMAStoreMode::IM2COL));
-    createIntrinsicCall(builder, intId, translatedOperands);
+    auto [id, args, types] = NVVM::CpAsyncBulkTensorReduceOp::getIIDAndArgsWithTypes(
+                      *op, moduleTranslation, builder);
+    createIntrinsicCall(builder, id, args);
   }];
 }
 
@@ -2860,36 +2884,17 @@ def NVVM_CpAsyncBulkGlobalToSharedClusterOp :
     (`l2_cache_hint` `=` $l2CacheHint^ )?
     attr-dict  `:` type($dstMem) `,` type($srcMem)
   }];
+  
+  let extraClassDeclaration = [{
+    static mlir::NVVM::IIDArgsWithTypes
+    getIIDAndArgsWithTypes(Operation &op,
+      LLVM::ModuleTranslation &mt, llvm::IRBuilderBase& builder);
+  }];
 
   string llvmBuilder = [{
-    // Arguments to the intrinsic:
-    // dst, mbar, src, size
-    // multicast_mask, cache_hint,
-    // flag for multicast_mask,
-    // flag for cache_hint
-    llvm::SmallVector<llvm::Value *> translatedOperands;
-    translatedOperands.push_back($dstMem);
-    translatedOperands.push_back($mbar);
-    translatedOperands.push_back($srcMem);
-    translatedOperands.push_back($size);
-
-    // Multicast, if available
-    llvm::LLVMContext &ctx = moduleTranslation.getLLVMContext();
-    auto *i16Unused = llvm::ConstantInt::get(llvm::Type::getInt16Ty(ctx), 0);
-    bool isMulticast = op.getMulticastMask() ? true : false;
-    translatedOperands.push_back(isMulticast ? $multicastMask : i16Unused);
-
-    // Cachehint, if available
-    auto *i64Unused = llvm::ConstantInt::get(llvm::Type::getInt64Ty(ctx), 0);
-    bool isCacheHint = op.getL2CacheHint() ? true : false;
-    translatedOperands.push_back(isCacheHint ? $l2CacheHint : i64Unused);
-
-    // Flag arguments for multicast and cachehint
-    translatedOperands.push_back(builder.getInt1(isMulticast));
-    translatedOperands.push_back(builder.getInt1(isCacheHint));
-
-    createIntrinsicCall(builder,
-      llvm::Intrinsic::nvvm_cp_async_bulk_global_to_shared_cluster, translatedOperands);
+    auto [id, args, types] = NVVM::CpAsyncBulkGlobalToSharedClusterOp::getIIDAndArgsWithTypes(
+                      *op, moduleTranslation, builder);
+    createIntrinsicCall(builder, id, args);
   }];
 }
 
@@ -2971,12 +2976,12 @@ def NVVM_CpAsyncBulkSharedCTAToGlobalOp :
   }];
 
   let extraClassDeclaration = [{
-    static mlir::NVVM::IDArgPair
-    getIntrinsicIDAndArgs(Operation &op, LLVM::ModuleTranslation &mt,
+    static mlir::NVVM::IIDArgsWithTypes
+    getIIDAndArgsWithTypes(Operation &op, LLVM::ModuleTranslation &mt,
                           llvm::IRBuilderBase& builder);
   }];
   string llvmBuilder = [{
-    auto [id, args] = NVVM::CpAsyncBulkSharedCTAToGlobalOp::getIntrinsicIDAndArgs(
+    auto [id, args, types] = NVVM::CpAsyncBulkSharedCTAToGlobalOp::getIIDAndArgsWithTypes(
                       *op, moduleTranslation, builder);
     createIntrinsicCall(builder, id, args);
   }];
@@ -3276,11 +3281,16 @@ def NVVM_MatchSyncOp : NVVM_Op<"match.sync">,
 
     [For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/#parallel-synchronization-and-communication-instructions-match-sync)
   }];
+  let extraClassDeclaration = [{
+    static NVVM::IIDArgsWithTypes
+    getIIDAndArgsWithTypes(Operation &op,
+      LLVM::ModuleTranslation &mt, llvm::IRBuilderBase &builder);
+  }];
   string llvmBuilder = [{
-    auto intId = getMatchSyncIntrinsicId(
-        op.getVal().getType(), $kind);
-    $res = createIntrinsicCall(builder,
-        intId, {$thread_mask, $val});
+    auto [intId, args, types] = 
+    NVVM::MatchSyncOp::getIIDAndArgsWithTypes(
+      *op, moduleTranslation, builder);
+    $res = createIntrinsicCall(builder, intId, args);
   }];
   let assemblyFormat = "$kind $thread_mask `,` $val attr-dict `:` type($val) `->` type($res)";
   let hasVerifier = 1;
@@ -3304,11 +3314,16 @@ def NVVM_BulkStoreOp: NVVM_Op<"st.bulk"> {
     [For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-st-bulk)
   }];
 
+  let extraClassDeclaration = [{
+    static NVVM::IIDArgsWithTypes
+    getIIDAndArgsWithTypes(Operation &op,
+      LLVM::ModuleTranslation &mt, llvm::IRBuilderBase &builder);
+  }];
   string llvmBuilder = [{
-    auto intId = getStBulkIntrinsicId(
-          llvm::cast<LLVM::LLVMPointerType>(op.getAddr().getType()));
-    createIntrinsicCall(builder, intId,
-                      {$addr, $size, builder.getInt64($initVal)});
+    auto [intId, args, types] = 
+    NVVM::BulkStoreOp::getIIDAndArgsWithTypes(
+      *op, moduleTranslation, builder);
+    createIntrinsicCall(builder, intId, args);
   }];
   
   let assemblyFormat = "$addr `,` `size` `=` $size (`,` `init` `=` $initVal^)? attr-dict `:` type($addr)";
@@ -3392,14 +3407,13 @@ def NVVM_Tcgen05AllocOp : NVVM_Op<"tcgen05.alloc", [NVVMRequiresSMa<[100, 101]>]
   let assemblyFormat = "$addr `,` $nCols attr-dict `:` type(operands)";
 
   let extraClassDeclaration = [{
-    static llvm::Intrinsic::ID
-      getIntrinsicIDAndArgs(Operation &op, LLVM::ModuleTranslation &mt,
-                            llvm::SmallVector<llvm::Value *> &args);
+    static NVVM::IIDArgsWithTypes
+      getIIDAndArgsWithTypes(Operation &op, LLVM::ModuleTranslation &mt,
+                            llvm::IRBuilderBase &builder);
   }];
   string llvmBuilder = [{
-    llvm::SmallVector<llvm::Value *> args;
-    auto id = NVVM::Tcgen05AllocOp::getIntrinsicIDAndArgs(
-      *op, moduleTranslation, args);
+    auto [id, args, types] = NVVM::Tcgen05AllocOp::getIIDAndArgsWithTypes(
+      *op, moduleTranslation, builder);
     createIntrinsicCall(builder, id, args);
   }];
 }
@@ -3420,14 +3434,13 @@ def NVVM_Tcgen05DeallocOp : NVVM_Op<"tcgen05.dealloc", [NVVMRequiresSMa<[100, 10
   let assemblyFormat = "$taddr `,` $nCols attr-dict `:` type(operands)";
 
   let extraClassDeclaration = [{
-    static llvm::Intrinsic::ID
-      getIntrinsicIDAndArgs(Operation &op, LLVM::ModuleTranslation &mt,
-                            llvm::SmallVector<llvm::Value *> &args);
+    static NVVM::IIDArgsWithTypes
+      getIIDAndArgsWithTypes(Operation &op, LLVM::ModuleTranslation &mt,
+                            llvm::IRBuilderBase &builder);
   }];
   string llvmBuilder = [{
-    llvm::SmallVector<llvm::Value *> args;
-    auto id = NVVM::Tcgen05DeallocOp::getIntrinsicIDAndArgs(
-      *op, moduleTranslation, args);
+    auto [id, args, types] = NVVM::Tcgen05DeallocOp::getIIDAndArgsWithTypes(
+      *op, moduleTranslation, builder);
     createIntrinsicCall(builder, id, args);
   }];
 }
@@ -3524,15 +3537,14 @@ def NVVM_Tcgen05CommitOp : NVVM_Op<"tcgen05.commit", [NVVMRequiresSMa<[100, 101]
   }];
 
   let extraClassDeclaration = [{
-    static llvm::Intrinsic::ID
-      getIntrinsicIDAndArgs(Operation &op, LLVM::ModuleTranslation &mt,
-                            llvm::SmallVector<llvm::Value *> &args);
+    static NVVM::IIDArgsWithTypes
+      getIIDAndArgsWithTypes(Operation &op, LLVM::ModuleTranslation &mt,
+                            llvm::IRBuilderBase &builder);
   }];
 
   string llvmBuilder = [{
-    llvm::SmallVector<llvm::Value *> args;
-    auto id = NVVM::Tcgen05CommitOp::getIntrinsicIDAndArgs(
-      *op, moduleTranslation, args);
+    auto [id, args, types] = NVVM::Tcgen05CommitOp::getIIDAndArgsWithTypes(
+      *op, moduleTranslation, builder);
     createIntrinsicCall(builder, id, args);
   }];
 }
@@ -3636,12 +3648,14 @@ def NVVM_Tcgen05CpOp : NVVM_Op<"tcgen05.cp", [NVVMRequiresSMa<[100,...
[truncated]

llvmbot · 2025-09-05T11:42:10Z

@llvm/pr-subscribers-mlir

Author: Srinivasa Ravi (Wolfram70)

Changes

This change adds standardizes the usage of getIntrinsicIDAndArgsMaybeWithTypes
across NVVM Ops for intrinsic lowering which returns the intrinsic ID, arguments,
and in the case of overloaded intrinsics, the types of the arguments as well.

It also moves the get*Intrinsic* functions defined in NVVMToLLVMIRTranslation.cpp
to NVVMDialect.cpp and refactors them to be of the same signature as other
getIntrinsicIDAndArgsMaybeWithTypes functions to keep consistency and allow for a
future refactor.

Patch is 84.29 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/157079.diff

4 Files Affected:

(modified) mlir/include/mlir/Dialect/LLVMIR/NVVMDialect.h (+8-4)
(modified) mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td (+161-163)
(modified) mlir/lib/Dialect/LLVMIR/IR/NVVMDialect.cpp (+781-97)
(modified) mlir/lib/Target/LLVMIR/Dialect/NVVM/NVVMToLLVMIRTranslation.cpp (-374)

diff --git a/mlir/include/mlir/Dialect/LLVMIR/NVVMDialect.h b/mlir/include/mlir/Dialect/LLVMIR/NVVMDialect.h
index 6137bb087c576..3c463db548011 100644
--- a/mlir/include/mlir/Dialect/LLVMIR/NVVMDialect.h
+++ b/mlir/include/mlir/Dialect/LLVMIR/NVVMDialect.h
@@ -55,10 +55,14 @@ enum NVVMMemorySpace {
   kSharedClusterMemorySpace = 7,
 };
 
-/// A pair type of LLVM's Intrinsic ID and args (which are llvm values).
-/// This type is returned by the getIntrinsicIDAndArgs() methods.
-using IDArgPair =
-    std::pair<llvm::Intrinsic::ID, llvm::SmallVector<llvm::Value *>>;
+/// A tuple type of LLVM's Intrinsic ID, args (which are llvm values),
+/// and args types (which are llvm types).
+/// Args types are only required for overloaded intrinsics to provide the
+/// correct argument types to the createIntrinsicCall() method.
+/// This type is returned by the getIIDAndArgsWithTypes() methods.
+using IIDArgsWithTypes =
+    std::tuple<llvm::Intrinsic::ID, llvm::SmallVector<llvm::Value *>,
+               llvm::SmallVector<llvm::Type *>>;
 
 /// Return the element type and number of elements associated with a wmma matrix
 /// of given chracteristics. This matches the logic in IntrinsicsNVVM.td
diff --git a/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td b/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
index 9d93b4efe7a5b..168060aae2c3e 100644
--- a/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
+++ b/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
@@ -410,9 +410,16 @@ def NVVM_ReduxOp :
 
     [For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/#parallel-synchronization-and-communication-instructions-redux-sync)
   }];
+  let extraClassDeclaration = [{
+    static NVVM::IIDArgsWithTypes
+    getIIDAndArgsWithTypes(Operation &op,
+      LLVM::ModuleTranslation &mt, llvm::IRBuilderBase &builder);
+  }];
   string llvmBuilder = [{
-      auto intId = getReduxIntrinsicId($_resultType, $kind, $abs, $nan);
-      $res = createIntrinsicCall(builder, intId, {$val, $mask_and_clamp});
+      auto [id, args, types] = 
+      NVVM::ReduxOp::getIIDAndArgsWithTypes(
+                        *op, moduleTranslation, builder);
+      $res = createIntrinsicCall(builder, id, args);
   }];
   let assemblyFormat = [{
     $kind $val `,` $mask_and_clamp  attr-dict `:` type($val) `->` type($res)
@@ -876,11 +883,17 @@ def NVVM_FenceProxyAcquireOp : NVVM_Op<"fence.proxy.acquire">,
   }];
 
   let assemblyFormat = "$scope $addr `,` $size (`from_proxy` `=` $fromProxy^)? (`to_proxy` `=` $toProxy^)? attr-dict";
+  
+  let extraClassDeclaration = [{
+    static NVVM::IIDArgsWithTypes
+    getIIDAndArgsWithTypes(Operation &op,
+      LLVM::ModuleTranslation &mt, llvm::IRBuilderBase &builder);
+  }];
   let llvmBuilder = [{
-    createIntrinsicCall(
-        builder,
-        getUnidirectionalFenceProxyID($fromProxy, $toProxy, $scope, false),
-        {$addr, $size});
+    auto [intId, args, types] = 
+    NVVM::FenceProxyAcquireOp::getIIDAndArgsWithTypes(
+      *op, moduleTranslation, builder);
+    createIntrinsicCall(builder, intId, args);
   }];
 
   let hasVerifier = 1;
@@ -904,9 +917,16 @@ def NVVM_FenceProxyReleaseOp : NVVM_Op<"fence.proxy.release">,
   }];
 
   let assemblyFormat = "$scope (`from_proxy` `=` $fromProxy^)? (`to_proxy` `=` $toProxy^)? attr-dict";
+
+  let extraClassDeclaration = [{
+    static NVVM::IIDArgsWithTypes
+    getIIDAndArgsWithTypes(Operation &op,
+      LLVM::ModuleTranslation &mt, llvm::IRBuilderBase &builder);
+  }];
   let llvmBuilder = [{
-    createIntrinsicCall(builder, getUnidirectionalFenceProxyID(
-                                     $fromProxy, $toProxy, $scope, true));
+    auto [intId, args, types] = NVVM::FenceProxyReleaseOp::getIIDAndArgsWithTypes(
+      *op, moduleTranslation, builder);
+    createIntrinsicCall(builder, intId, args);
   }];
 
   let hasVerifier = 1;
@@ -985,11 +1005,15 @@ def NVVM_ShflOp :
     
     [For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-shfl-sync)
   }];
+  let extraClassDeclaration = [{
+    static NVVM::IIDArgsWithTypes
+    getIIDAndArgsWithTypes(Operation &op,
+      LLVM::ModuleTranslation &mt, llvm::IRBuilderBase &builder);
+  }];
   string llvmBuilder = [{
-      auto intId = getShflIntrinsicId(
-          $_resultType, $kind, static_cast<bool>($return_value_and_is_valid));
-      $res = createIntrinsicCall(builder,
-          intId, {$thread_mask, $val, $offset, $mask_and_clamp});
+      auto [intId, args, types] = NVVM::ShflOp::getIIDAndArgsWithTypes(
+          *op, moduleTranslation, builder);
+      $res = createIntrinsicCall(builder, intId, args);
   }];
   let assemblyFormat = [{
     $kind $thread_mask `,` $val `,` $offset `,` $mask_and_clamp  attr-dict
@@ -1035,9 +1059,16 @@ def NVVM_VoteSyncOp
 
     [For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/#parallel-synchronization-and-communication-instructions-vote-sync)
   }];
+  let extraClassDeclaration = [{
+    static NVVM::IIDArgsWithTypes
+      getIIDAndArgsWithTypes(Operation &op,
+        LLVM::ModuleTranslation &mt, llvm::IRBuilderBase &builder);
+  }];
   string llvmBuilder = [{
-    auto intId = getVoteSyncIntrinsicId($kind);
-    $res = createIntrinsicCall(builder, intId, {$mask, $pred});
+    auto [intId, args, types] = 
+    NVVM::VoteSyncOp::getIIDAndArgsWithTypes(
+      *op, moduleTranslation, builder);
+    $res = createIntrinsicCall(builder, intId, args);
   }];
   let assemblyFormat = "$kind $mask `,` $pred attr-dict `->` type($res)";
   let hasVerifier = 1;
@@ -1108,15 +1139,14 @@ def NVVM_CpAsyncOp : NVVM_Op<"cp.async.shared.global">,
   let assemblyFormat = "$dst `,` $src `,` $size `,` `cache` `=` $modifier (`,` $cpSize^)? attr-dict `:` type(operands)";
   let hasVerifier = 1;
   let extraClassDeclaration = [{
-    static llvm::Intrinsic::ID
-      getIntrinsicIDAndArgs(Operation &op, LLVM::ModuleTranslation &mt,
-                            llvm::SmallVector<llvm::Value *> &args);
+    static NVVM::IIDArgsWithTypes
+      getIIDAndArgsWithTypes(Operation &op, LLVM::ModuleTranslation &mt,
+                            llvm::IRBuilderBase &builder);
   }];
   string llvmBuilder = [{
-    llvm::SmallVector<llvm::Value *> translatedOperands;
-    auto id = NVVM::CpAsyncOp::getIntrinsicIDAndArgs(
-      *op, moduleTranslation, translatedOperands);
-    createIntrinsicCall(builder, id, translatedOperands);
+    auto [id, args, types] = NVVM::CpAsyncOp::getIIDAndArgsWithTypes(
+      *op, moduleTranslation, builder);
+    createIntrinsicCall(builder, id, args);
   }];
 }
 
@@ -2107,10 +2137,16 @@ def NVVM_StMatrixOp: NVVM_Op<"stmatrix">,
     
     [For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-store-instruction-stmatrix)
   }];
+  let extraClassDeclaration = [{
+    static NVVM::IIDArgsWithTypes
+    getIIDAndArgsWithTypes(Operation &op,
+      LLVM::ModuleTranslation &mt, llvm::IRBuilderBase &builder);
+  }];
   string llvmBuilder = [{
-      auto operands = moduleTranslation.lookupValues(opInst.getOperands());
-      auto intId = getStMatrixIntrinsicId($layout, $sources.size(), $shape, $eltType);
-      createIntrinsicCall(builder, intId, operands, operands[0]->getType());
+      auto [intId, args, types] = 
+      NVVM::StMatrixOp::getIIDAndArgsWithTypes(
+        *op, moduleTranslation, builder);
+      createIntrinsicCall(builder, intId, args, types);
   }];
   let assemblyFormat = "$ptr `,` $sources attr-dict `:` type(operands)";
   let hasVerifier = 1;
@@ -2125,10 +2161,16 @@ def NVVM_LdMatrixOp: NVVM_Op<"ldmatrix">,
 
   let summary = "cooperative matrix load";
 
+  let extraClassDeclaration = [{
+    static NVVM::IIDArgsWithTypes
+      getIIDAndArgsWithTypes(Operation &op,
+        LLVM::ModuleTranslation &mt, llvm::IRBuilderBase &builder);
+  }];
   string llvmBuilder = [{
-      auto operands = moduleTranslation.lookupValues(opInst.getOperands());
-      auto intId = getLdMatrixIntrinsicId($layout, $num, $shape, $eltType);
-      $res = createIntrinsicCall(builder, intId, operands, {operands[0]->getType()});
+      auto [intId, args, types] = 
+      NVVM::LdMatrixOp::getIIDAndArgsWithTypes(
+        *op, moduleTranslation, builder);
+      $res = createIntrinsicCall(builder, intId, args, types);
   }];
 
   string baseDescription = [{
@@ -2543,8 +2585,8 @@ def NVVM_CpAsyncBulkTensorSharedCTAToGlobalOp :
   let extraClassDeclaration = [{
     bool hasIntrinsic() { return !getPredicate(); }
 
-    static mlir::NVVM::IDArgPair
-    getIntrinsicIDAndArgs(Operation &op, LLVM::ModuleTranslation &mt,
+    static mlir::NVVM::IIDArgsWithTypes
+    getIIDAndArgsWithTypes(Operation &op, LLVM::ModuleTranslation &mt,
                           llvm::IRBuilderBase& builder);
   }];
 
@@ -2565,7 +2607,7 @@ def NVVM_CpAsyncBulkTensorSharedCTAToGlobalOp :
   let hasVerifier = 1;
 
   string llvmBuilder = [{
-    auto [id, args] = NVVM::CpAsyncBulkTensorSharedCTAToGlobalOp::getIntrinsicIDAndArgs(
+    auto [id, args, types] = NVVM::CpAsyncBulkTensorSharedCTAToGlobalOp::getIIDAndArgsWithTypes(
                       *op, moduleTranslation, builder);
     createIntrinsicCall(builder, id, args);
   }];
@@ -2631,8 +2673,8 @@ def NVVM_PrefetchOp : NVVM_Op<"prefetch",
   let hasVerifier = 1;
 
   let extraClassDeclaration = [{
-    static NVVM::IDArgPair
-    getIntrinsicIDAndArgs(NVVM::PrefetchOp &op,LLVM::ModuleTranslation &mt,
+    static NVVM::IIDArgsWithTypes
+    getIIDAndArgsWithTypes(Operation &op, LLVM::ModuleTranslation &mt,
                           llvm::IRBuilderBase &builder);
     bool hasIntrinsic() { return !getPredicate() || !getTensormap(); }
   }];
@@ -2643,7 +2685,7 @@ def NVVM_PrefetchOp : NVVM_Op<"prefetch",
     }
   }];
   let llvmBuilder = [{
-    auto [id, args] = NVVM::PrefetchOp::getIntrinsicIDAndArgs(op,
+    auto [id, args, types] = NVVM::PrefetchOp::getIIDAndArgsWithTypes(*op,
                                           moduleTranslation, builder);
 
     if(op.getTensormap())
@@ -2685,13 +2727,13 @@ def NVVM_CpAsyncBulkPrefetchOp : NVVM_Op<"cp.async.bulk.prefetch"> {
   }];
 
   let extraClassDeclaration = [{
-    static mlir::NVVM::IDArgPair
-    getIntrinsicIDAndArgs(Operation &op, LLVM::ModuleTranslation &mt,
+    static mlir::NVVM::IIDArgsWithTypes
+    getIIDAndArgsWithTypes(Operation &op, LLVM::ModuleTranslation &mt,
                           llvm::IRBuilderBase& builder);
   }];
 
   string llvmBuilder = [{
-    auto [id, args] = NVVM::CpAsyncBulkPrefetchOp::getIntrinsicIDAndArgs(
+    auto [id, args, types] = NVVM::CpAsyncBulkPrefetchOp::getIIDAndArgsWithTypes(
                       *op, moduleTranslation, builder);
     createIntrinsicCall(builder, id, args);
   }];
@@ -2726,15 +2768,15 @@ def NVVM_CpAsyncBulkTensorPrefetchOp :
   }];
 
   let extraClassDeclaration = [{
-    static mlir::NVVM::IDArgPair
-    getIntrinsicIDAndArgs(Operation &op, LLVM::ModuleTranslation &mt,
+    static mlir::NVVM::IIDArgsWithTypes
+    getIIDAndArgsWithTypes(Operation &op, LLVM::ModuleTranslation &mt,
                           llvm::IRBuilderBase& builder);
   }];
 
   let hasVerifier = 1;
 
   string llvmBuilder = [{
-    auto [id, args] = NVVM::CpAsyncBulkTensorPrefetchOp::getIntrinsicIDAndArgs(
+    auto [id, args, types] = NVVM::CpAsyncBulkTensorPrefetchOp::getIIDAndArgsWithTypes(
                       *op, moduleTranslation, builder);
     createIntrinsicCall(builder, id, args);
   }];
@@ -2795,35 +2837,17 @@ def NVVM_CpAsyncBulkTensorReduceOp :
   }];
 
   let extraClassDeclaration = [{
-    static llvm::Intrinsic::ID getIntrinsicID(int tensorDims,
-                                              NVVM::TMAReduxKind kind,
-                                              bool isIm2Col);
+    static mlir::NVVM::IIDArgsWithTypes
+    getIIDAndArgsWithTypes(Operation &op,
+      LLVM::ModuleTranslation &mt, llvm::IRBuilderBase& builder);
   }];
 
   let hasVerifier = 1;
 
   string llvmBuilder = [{
-    // Arguments to the intrinsic:
-    // shared_mem_ptr, tmaDesc, tensorDims
-    // cache_hint(if applicable) and flag(boolean)
-    llvm::SmallVector<llvm::Value *> translatedOperands;
-    translatedOperands.push_back($srcMem);
-    translatedOperands.push_back($tmaDescriptor);
-
-    for (auto v : op.getCoordinates())
-      translatedOperands.push_back(moduleTranslation.lookupValue(v));
-
-    llvm::LLVMContext &ctx = moduleTranslation.getLLVMContext();
-    auto *i64Undef = llvm::UndefValue::get(llvm::IntegerType::get(ctx, 64));
-
-    bool isCacheHint = op.getL2CacheHint() ? true : false;
-    translatedOperands.push_back(isCacheHint ? $l2CacheHint : i64Undef);
-    translatedOperands.push_back(builder.getInt1(isCacheHint));
-
-    auto intId = NVVM::CpAsyncBulkTensorReduceOp::getIntrinsicID(
-                 op.getCoordinates().size(), $redKind,
-                 (op.getMode() == NVVM::TMAStoreMode::IM2COL));
-    createIntrinsicCall(builder, intId, translatedOperands);
+    auto [id, args, types] = NVVM::CpAsyncBulkTensorReduceOp::getIIDAndArgsWithTypes(
+                      *op, moduleTranslation, builder);
+    createIntrinsicCall(builder, id, args);
   }];
 }
 
@@ -2860,36 +2884,17 @@ def NVVM_CpAsyncBulkGlobalToSharedClusterOp :
     (`l2_cache_hint` `=` $l2CacheHint^ )?
     attr-dict  `:` type($dstMem) `,` type($srcMem)
   }];
+  
+  let extraClassDeclaration = [{
+    static mlir::NVVM::IIDArgsWithTypes
+    getIIDAndArgsWithTypes(Operation &op,
+      LLVM::ModuleTranslation &mt, llvm::IRBuilderBase& builder);
+  }];
 
   string llvmBuilder = [{
-    // Arguments to the intrinsic:
-    // dst, mbar, src, size
-    // multicast_mask, cache_hint,
-    // flag for multicast_mask,
-    // flag for cache_hint
-    llvm::SmallVector<llvm::Value *> translatedOperands;
-    translatedOperands.push_back($dstMem);
-    translatedOperands.push_back($mbar);
-    translatedOperands.push_back($srcMem);
-    translatedOperands.push_back($size);
-
-    // Multicast, if available
-    llvm::LLVMContext &ctx = moduleTranslation.getLLVMContext();
-    auto *i16Unused = llvm::ConstantInt::get(llvm::Type::getInt16Ty(ctx), 0);
-    bool isMulticast = op.getMulticastMask() ? true : false;
-    translatedOperands.push_back(isMulticast ? $multicastMask : i16Unused);
-
-    // Cachehint, if available
-    auto *i64Unused = llvm::ConstantInt::get(llvm::Type::getInt64Ty(ctx), 0);
-    bool isCacheHint = op.getL2CacheHint() ? true : false;
-    translatedOperands.push_back(isCacheHint ? $l2CacheHint : i64Unused);
-
-    // Flag arguments for multicast and cachehint
-    translatedOperands.push_back(builder.getInt1(isMulticast));
-    translatedOperands.push_back(builder.getInt1(isCacheHint));
-
-    createIntrinsicCall(builder,
-      llvm::Intrinsic::nvvm_cp_async_bulk_global_to_shared_cluster, translatedOperands);
+    auto [id, args, types] = NVVM::CpAsyncBulkGlobalToSharedClusterOp::getIIDAndArgsWithTypes(
+                      *op, moduleTranslation, builder);
+    createIntrinsicCall(builder, id, args);
   }];
 }
 
@@ -2971,12 +2976,12 @@ def NVVM_CpAsyncBulkSharedCTAToGlobalOp :
   }];
 
   let extraClassDeclaration = [{
-    static mlir::NVVM::IDArgPair
-    getIntrinsicIDAndArgs(Operation &op, LLVM::ModuleTranslation &mt,
+    static mlir::NVVM::IIDArgsWithTypes
+    getIIDAndArgsWithTypes(Operation &op, LLVM::ModuleTranslation &mt,
                           llvm::IRBuilderBase& builder);
   }];
   string llvmBuilder = [{
-    auto [id, args] = NVVM::CpAsyncBulkSharedCTAToGlobalOp::getIntrinsicIDAndArgs(
+    auto [id, args, types] = NVVM::CpAsyncBulkSharedCTAToGlobalOp::getIIDAndArgsWithTypes(
                       *op, moduleTranslation, builder);
     createIntrinsicCall(builder, id, args);
   }];
@@ -3276,11 +3281,16 @@ def NVVM_MatchSyncOp : NVVM_Op<"match.sync">,
 
     [For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/#parallel-synchronization-and-communication-instructions-match-sync)
   }];
+  let extraClassDeclaration = [{
+    static NVVM::IIDArgsWithTypes
+    getIIDAndArgsWithTypes(Operation &op,
+      LLVM::ModuleTranslation &mt, llvm::IRBuilderBase &builder);
+  }];
   string llvmBuilder = [{
-    auto intId = getMatchSyncIntrinsicId(
-        op.getVal().getType(), $kind);
-    $res = createIntrinsicCall(builder,
-        intId, {$thread_mask, $val});
+    auto [intId, args, types] = 
+    NVVM::MatchSyncOp::getIIDAndArgsWithTypes(
+      *op, moduleTranslation, builder);
+    $res = createIntrinsicCall(builder, intId, args);
   }];
   let assemblyFormat = "$kind $thread_mask `,` $val attr-dict `:` type($val) `->` type($res)";
   let hasVerifier = 1;
@@ -3304,11 +3314,16 @@ def NVVM_BulkStoreOp: NVVM_Op<"st.bulk"> {
     [For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-st-bulk)
   }];
 
+  let extraClassDeclaration = [{
+    static NVVM::IIDArgsWithTypes
+    getIIDAndArgsWithTypes(Operation &op,
+      LLVM::ModuleTranslation &mt, llvm::IRBuilderBase &builder);
+  }];
   string llvmBuilder = [{
-    auto intId = getStBulkIntrinsicId(
-          llvm::cast<LLVM::LLVMPointerType>(op.getAddr().getType()));
-    createIntrinsicCall(builder, intId,
-                      {$addr, $size, builder.getInt64($initVal)});
+    auto [intId, args, types] = 
+    NVVM::BulkStoreOp::getIIDAndArgsWithTypes(
+      *op, moduleTranslation, builder);
+    createIntrinsicCall(builder, intId, args);
   }];
   
   let assemblyFormat = "$addr `,` `size` `=` $size (`,` `init` `=` $initVal^)? attr-dict `:` type($addr)";
@@ -3392,14 +3407,13 @@ def NVVM_Tcgen05AllocOp : NVVM_Op<"tcgen05.alloc", [NVVMRequiresSMa<[100, 101]>]
   let assemblyFormat = "$addr `,` $nCols attr-dict `:` type(operands)";
 
   let extraClassDeclaration = [{
-    static llvm::Intrinsic::ID
-      getIntrinsicIDAndArgs(Operation &op, LLVM::ModuleTranslation &mt,
-                            llvm::SmallVector<llvm::Value *> &args);
+    static NVVM::IIDArgsWithTypes
+      getIIDAndArgsWithTypes(Operation &op, LLVM::ModuleTranslation &mt,
+                            llvm::IRBuilderBase &builder);
   }];
   string llvmBuilder = [{
-    llvm::SmallVector<llvm::Value *> args;
-    auto id = NVVM::Tcgen05AllocOp::getIntrinsicIDAndArgs(
-      *op, moduleTranslation, args);
+    auto [id, args, types] = NVVM::Tcgen05AllocOp::getIIDAndArgsWithTypes(
+      *op, moduleTranslation, builder);
     createIntrinsicCall(builder, id, args);
   }];
 }
@@ -3420,14 +3434,13 @@ def NVVM_Tcgen05DeallocOp : NVVM_Op<"tcgen05.dealloc", [NVVMRequiresSMa<[100, 10
   let assemblyFormat = "$taddr `,` $nCols attr-dict `:` type(operands)";
 
   let extraClassDeclaration = [{
-    static llvm::Intrinsic::ID
-      getIntrinsicIDAndArgs(Operation &op, LLVM::ModuleTranslation &mt,
-                            llvm::SmallVector<llvm::Value *> &args);
+    static NVVM::IIDArgsWithTypes
+      getIIDAndArgsWithTypes(Operation &op, LLVM::ModuleTranslation &mt,
+                            llvm::IRBuilderBase &builder);
   }];
   string llvmBuilder = [{
-    llvm::SmallVector<llvm::Value *> args;
-    auto id = NVVM::Tcgen05DeallocOp::getIntrinsicIDAndArgs(
-      *op, moduleTranslation, args);
+    auto [id, args, types] = NVVM::Tcgen05DeallocOp::getIIDAndArgsWithTypes(
+      *op, moduleTranslation, builder);
     createIntrinsicCall(builder, id, args);
   }];
 }
@@ -3524,15 +3537,14 @@ def NVVM_Tcgen05CommitOp : NVVM_Op<"tcgen05.commit", [NVVMRequiresSMa<[100, 101]
   }];
 
   let extraClassDeclaration = [{
-    static llvm::Intrinsic::ID
-      getIntrinsicIDAndArgs(Operation &op, LLVM::ModuleTranslation &mt,
-                            llvm::SmallVector<llvm::Value *> &args);
+    static NVVM::IIDArgsWithTypes
+      getIIDAndArgsWithTypes(Operation &op, LLVM::ModuleTranslation &mt,
+                            llvm::IRBuilderBase &builder);
   }];
 
   string llvmBuilder = [{
-    llvm::SmallVector<llvm::Value *> args;
-    auto id = NVVM::Tcgen05CommitOp::getIntrinsicIDAndArgs(
-      *op, moduleTranslation, args);
+    auto [id, args, types] = NVVM::Tcgen05CommitOp::getIIDAndArgsWithTypes(
+      *op, moduleTranslation, builder);
     createIntrinsicCall(builder, id, args);
   }];
 }
@@ -3636,12 +3648,14 @@ def NVVM_Tcgen05CpOp : NVVM_Op<"tcgen05.cp", [NVVMRequiresSMa<[100,...
[truncated]

github-actions · 2025-09-05T11:44:14Z

✅ With the latest revision this PR passed the C/C++ code formatter.

github-actions · 2025-09-05T11:44:15Z

✅ With the latest revision this PR passed the undef deprecator.

mlir/test/Target/LLVMIR/nvvm/tma_store_reduce.mlir

rajatbajpai · 2025-09-16T10:41:55Z

mlir/include/mlir/Dialect/LLVMIR/NVVMDialect.h

+/// correct argument types to the createIntrinsicCall() method.
+/// This type is returned by the getIIDAndArgsWithTypes() methods.
+struct IIDArgsWithTypes {
+  IIDArgsWithTypes(llvm::Intrinsic::ID id,


Why we need a constructor?

This is because when returning this struct with the ternary ? operator, aggregate initialization doesn't work and gives an error, so I added this constructor to explicitly construct the object in those cases instead.

I see, can't we use below syntax?

return cond == some_cond ? S1{a1, b1} : S1{a2, b2};

mlir/lib/Dialect/LLVMIR/IR/NVVMDialect.cpp

rajatbajpai · 2025-09-16T11:12:09Z

mlir/lib/Dialect/LLVMIR/IR/NVVMDialect.cpp

    args.push_back(mt.lookupValue(cpAsyncOp.getCpSize()));

-  return id;
+  return {id, std::move(args), {}};


is it possible to drop {}?

rajatbajpai · 2025-09-16T11:22:31Z

mlir/lib/Dialect/LLVMIR/IR/NVVMDialect.cpp

+    // the latter as that's the variant exposed by CUDA API.
+    id = valType.isInteger(32) ? llvm::Intrinsic::nvvm_match_all_sync_i32p
+                               : llvm::Intrinsic::nvvm_match_all_sync_i64p;
+    break;


Please add default case with llvm unreachable. See other places as well.

Actually this results in a warning (treated as an error) due to the use of default when all the possible values of the enum are already added as cases.

Great, then it is fine. Would there be an error if we don't handle any possible values?

I'm not sure but I think if we're missing any enum value case, we'll need to include the default case with llvm_unreachable.

durga4github · 2025-09-16T11:58:01Z

mlir/lib/Target/LLVMIR/Dialect/NVVM/NVVMToLLVMIRTranslation.cpp

-    return GET_REDUX_F32_ID(max, hasAbs, hasNaN);
-  }
-  llvm_unreachable("unknown redux kind");
-}


ok, I feel moving functions from here to the other file, should be a separate change/PR in itself.
Would leave it up to you to do it before or after the refactor. But let us not mix the refactor with this NFC code-movement.

I kept that change in this PR as well since we are changing the signatures of these functions as well and making them static member functions of the Op (instead of static to this file) so that fits within the theme of the refactor and is not an NFC per se.
But please let me know if you think it is better to do this refactor + move in a seperate PR to do this refactor basically in 2 parts and reduce the footprint of this PR.

Yes, this PR is accumulating many things. It is better to split this into at least two PRs.

Wolfram70 added 4 commits September 5, 2025 17:05

fix typo

81d3f9a

move functions in NVVMToLLVMIRTranslation.cpp to NVVMDialect.cpp

3fe6daa

simplify names

8e6b622

Wolfram70 requested a review from durga4github September 5, 2025 11:41

Wolfram70 self-assigned this Sep 5, 2025

Wolfram70 requested a review from grypp as a code owner September 5, 2025 11:41

llvmbot added mlir:llvm mlir labels Sep 5, 2025

Wolfram70 added 5 commits September 5, 2025 17:28

fix format

04310fc

fix warnings

a4d09da

use struct instead of tuple

f6a90df

replace unused variables with _

0ae224f

replace undef with poison

84b8e76

Wolfram70 force-pushed the dev/Wolfram70/mlir-nvvm-refactor-intrins branch from e9a892f to 97b7206 Compare September 8, 2025 04:51

fix warnings

78c6e07

Wolfram70 force-pushed the dev/Wolfram70/mlir-nvvm-refactor-intrins branch from 97b7206 to 78c6e07 Compare September 8, 2025 05:24

fix tma_store_reduce test

5697c92

schwarzschild-radius reviewed Sep 8, 2025

View reviewed changes

mlir/test/Target/LLVMIR/nvvm/tma_store_reduce.mlir Show resolved Hide resolved

Wolfram70 requested a review from rajatbajpai September 8, 2025 10:42

rajatbajpai reviewed Sep 16, 2025

View reviewed changes

durga4github reviewed Sep 16, 2025

View reviewed changes

[MLIR][NVVM][Refactor] Refactor intrinsic lowering for NVVM Ops #157079

Are you sure you want to change the base?

[MLIR][NVVM][Refactor] Refactor intrinsic lowering for NVVM Ops #157079

Conversation

Wolfram70 commented Sep 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Sep 5, 2025

Uh oh!

llvmbot commented Sep 5, 2025

Uh oh!

github-actions bot commented Sep 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Sep 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Wolfram70 Sep 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

durga4github Sep 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Wolfram70 commented Sep 5, 2025 •

edited

Loading

github-actions bot commented Sep 5, 2025 •

edited

Loading

github-actions bot commented Sep 5, 2025 •

edited

Loading

Wolfram70 Sep 16, 2025 •

edited

Loading

durga4github Sep 16, 2025 •

edited

Loading