[mlir][NVVM] Add InferTypeOpInterface to NVVM MBarrier ops with deterministic result types#188173
Conversation
|
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers. If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
|
@llvm/pr-subscribers-mlir @llvm/pr-subscribers-mlir-gpu Author: Bastian Hagedorn (bastianhagedorn) ChangesAdd result type inference to 10 NVVM ops whose result types can be fully determined from their operands and attributes. This enables the Python binding generator to emit Ops with always-present results (using
Ops with optional results (using
The optional-result ops use a permissive Also updates the NVGPUToNVVM conversion to work with the new inferred result types for This is a source-breaking change for Python callers that pass result types positionally (e.g. Patch is 35.96 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/188173.diff 6 Files Affected:
diff --git a/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td b/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
index 57a0c67e82c47..fccab166ac5ae 100644
--- a/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
+++ b/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
@@ -720,7 +720,8 @@ def NVVM_MBarrierCompleteTxOp : NVVM_VoidIntrinsicOp<"mbarrier.complete_tx"> {
let hasVerifier = 1;
}
-def NVVM_MBarrierArriveOp : NVVM_SingleResultIntrinsicOp<"mbarrier.arrive"> {
+def NVVM_MBarrierArriveOp : NVVM_SingleResultIntrinsicOp<"mbarrier.arrive",
+ [InferTypeOpAdaptorWithIsCompatible]> {
let summary = "MBarrier Arrive Operation";
let description = [{
The `nvvm.mbarrier.arrive` operation performs an arrive-on operation on the
@@ -768,7 +769,8 @@ def NVVM_MBarrierArriveOp : NVVM_SingleResultIntrinsicOp<"mbarrier.arrive"> {
let hasVerifier = 1;
}
-def NVVM_MBarrierArriveDropOp : NVVM_SingleResultIntrinsicOp<"mbarrier.arrive_drop"> {
+def NVVM_MBarrierArriveDropOp : NVVM_SingleResultIntrinsicOp<"mbarrier.arrive_drop",
+ [InferTypeOpAdaptorWithIsCompatible]> {
let summary = "MBarrier Arrive-Drop Operation";
let description = [{
The `nvvm.mbarrier.arrive_drop` operation decrements the expected arrival
@@ -844,7 +846,8 @@ def NVVM_MBarrierArriveDropNocompleteOp : NVVM_SingleResultIntrinsicOp<"mbarrier
let assemblyFormat = "$addr `,` $count attr-dict `:` type(operands) `->` type($res)";
}
-def NVVM_MBarrierArriveExpectTxOp : NVVM_PTXBuilder_Op<"mbarrier.arrive.expect_tx"> {
+def NVVM_MBarrierArriveExpectTxOp : NVVM_PTXBuilder_Op<"mbarrier.arrive.expect_tx",
+ [InferTypeOpAdaptorWithIsCompatible]> {
let summary = "MBarrier Arrive with Expected Transaction Count";
let description = [{
The `nvvm.mbarrier.arrive.expect_tx` operation performs an expect-tx operation
@@ -910,7 +913,8 @@ def NVVM_MBarrierArriveExpectTxOp : NVVM_PTXBuilder_Op<"mbarrier.arrive.expect_t
}];
}
-def NVVM_MBarrierArriveDropExpectTxOp : NVVM_SingleResultIntrinsicOp<"mbarrier.arrive_drop.expect_tx"> {
+def NVVM_MBarrierArriveDropExpectTxOp : NVVM_SingleResultIntrinsicOp<"mbarrier.arrive_drop.expect_tx",
+ [InferTypeOpAdaptorWithIsCompatible]> {
let summary = "MBarrier arrive_drop with expected transaction count";
let description = [{
The `nvvm.mbarrier.arrive_drop.expect_tx` operation is similar to the
@@ -1123,7 +1127,8 @@ def BarrierReductionAttr
let assemblyFormat = "`<` $value `>`";
}
-def NVVM_BarrierOp : NVVM_SingleResultIntrinsicOp<"barrier", [AttrSizedOperandSegments]> {
+def NVVM_BarrierOp : NVVM_SingleResultIntrinsicOp<"barrier",
+ [AttrSizedOperandSegments, InferTypeOpAdaptorWithIsCompatible]> {
let summary = "CTA Barrier Synchronization Op";
let description = [{
The `nvvm.barrier` operation performs barrier synchronization and communication
@@ -1504,7 +1509,7 @@ def ShflKind : I32EnumAttr<"ShflKind", "NVVM shuffle kind",
def ShflKindAttr : EnumAttr<NVVM_Dialect, ShflKind, "shfl_kind">;
def NVVM_ShflOp :
- NVVM_Op<"shfl.sync", [NVVMRequiresSM<30>]>,
+ NVVM_Op<"shfl.sync", [NVVMRequiresSM<30>, InferTypeOpAdaptor]>,
Results<(outs AnyTypeOf<[I32, F32, LLVMStructType]>:$res)>,
Arguments<(ins I32:$thread_mask,
AnyTypeOf<[I32, F32]>:$val,
@@ -1558,7 +1563,7 @@ def VoteSyncKind : I32EnumAttr<"VoteSyncKind", "NVVM vote sync kind",
def VoteSyncKindAttr : EnumAttr<NVVM_Dialect, VoteSyncKind, "vote_sync_kind">;
def NVVM_VoteSyncOp
- : NVVM_Op<"vote.sync">,
+ : NVVM_Op<"vote.sync", [InferTypeOpAdaptor]>,
Results<(outs AnyTypeOf<[I32, I1]>:$res)>,
Arguments<(ins I32:$mask, I1:$pred, VoteSyncKindAttr:$kind)> {
let summary = "Vote across thread group";
@@ -3028,7 +3033,7 @@ def NVVM_StMatrixOp: NVVM_Op<"stmatrix">,
let hasVerifier = 1;
}
-def NVVM_LdMatrixOp: NVVM_Op<"ldmatrix">,
+def NVVM_LdMatrixOp: NVVM_Op<"ldmatrix", [InferTypeOpAdaptor]>,
Results<(outs AnyType:$res)>,
Arguments<(ins LLVM_PointerShared:$ptr, I32Attr:$num,
MMALayoutAttr:$layout,
@@ -4783,7 +4788,7 @@ def MatchSyncKind : I32EnumAttr<"MatchSyncKind", "NVVM match sync kind",
def MatchSyncKindAttr : EnumAttr<NVVM_Dialect, MatchSyncKind, "match_sync_kind">;
-def NVVM_MatchSyncOp : NVVM_Op<"match.sync">,
+def NVVM_MatchSyncOp : NVVM_Op<"match.sync", [InferTypeOpAdaptor]>,
Results<(outs AnyTypeOf<[I32, LLVMStructType]>:$res)>,
Arguments<(ins I32:$thread_mask,
AnyTypeOf<[I32, I64]>:$val,
@@ -5723,7 +5728,8 @@ def ClusterLaunchControlQueryTypeAttr
}
def NVVM_ClusterLaunchControlQueryCancelOp
- : NVVM_Op<"clusterlaunchcontrol.query.cancel", [NVVMRequiresSM<100>]> {
+ : NVVM_Op<"clusterlaunchcontrol.query.cancel",
+ [NVVMRequiresSM<100>, InferTypeOpAdaptor]> {
let summary = "Query the response of a clusterlaunchcontrol.try.cancel operation";
let description = [{
`clusterlaunchcontrol.query.cancel` queries the response of a
diff --git a/mlir/lib/Conversion/NVGPUToNVVM/NVGPUToNVVM.cpp b/mlir/lib/Conversion/NVGPUToNVVM/NVGPUToNVVM.cpp
index 6edc8f5c86dd3..956019f490c4c 100644
--- a/mlir/lib/Conversion/NVGPUToNVVM/NVGPUToNVVM.cpp
+++ b/mlir/lib/Conversion/NVGPUToNVVM/NVGPUToNVVM.cpp
@@ -852,9 +852,7 @@ struct NVGPUMBarrierArriveLowering
Value barrier =
getMbarrierPtr(b, op.getBarriers().getType(), adaptor.getBarriers(),
adaptor.getMbarId(), rewriter);
- Type tokenType = getTypeConverter()->convertType(
- nvgpu::MBarrierTokenType::get(op->getContext()));
- rewriter.replaceOpWithNewOp<NVVM::MBarrierArriveOp>(op, tokenType, barrier);
+ rewriter.replaceOpWithNewOp<NVVM::MBarrierArriveOp>(op, barrier);
return success();
}
};
@@ -911,12 +909,17 @@ struct NVGPUMBarrierArriveExpectTxLowering
getMbarrierPtr(b, op.getBarriers().getType(), adaptor.getBarriers(),
adaptor.getMbarId(), rewriter);
Value txcount = truncToI32(b, adaptor.getTxcount());
- rewriter.replaceOpWithNewOp<NVVM::MBarrierArriveExpectTxOp>(
- op, Type{}, // return-value is optional and is void by default
+ // Use create+eraseOp instead of replaceOpWithNewOp because the nvgpu op
+ // has 0 results, but the NVVM op now infers an i64 result (via
+ // InferTypeOpInterface) for non-cluster pointers. The create() call
+ // triggers InferTypeOpInterface automatically when no result types are
+ // provided; the inferred i64 result is simply unused.
+ NVVM::MBarrierArriveExpectTxOp::create(rewriter, op->getLoc(),
barrier, txcount, // barrier and txcount
NVVM::MemScopeKind::CTA, // default scope is CTA
false, // relaxed-semantics is false
adaptor.getPredicate());
+ rewriter.eraseOp(op);
return success();
}
};
diff --git a/mlir/lib/Dialect/LLVMIR/IR/NVVMDialect.cpp b/mlir/lib/Dialect/LLVMIR/IR/NVVMDialect.cpp
index 7d49aa3878ebe..edf142cfdca58 100644
--- a/mlir/lib/Dialect/LLVMIR/IR/NVVMDialect.cpp
+++ b/mlir/lib/Dialect/LLVMIR/IR/NVVMDialect.cpp
@@ -302,6 +302,88 @@ LogicalResult MBarrierArriveDropExpectTxOp::verify() {
getRes());
}
+//===----------------------------------------------------------------------===//
+// inferReturnTypes for mbarrier arrive-like ops
+//===----------------------------------------------------------------------===//
+
+/// Shared inference: shared_cluster addr -> no result; otherwise -> i64.
+/// Generic pointers (address space 0) are treated as shared::cta per the PTX
+/// ISA: "If no state space is specified then Generic Addressing is used. If the
+/// address does not fall within .shared::cta then the behavior is undefined."
+/// Therefore only ptr<7> (shared_cluster) produces zero results.
+static LogicalResult inferMBarrierArriveResultTypes(
+ MLIRContext *context, Value addr,
+ SmallVectorImpl<Type> &inferredReturnTypes) {
+ auto ptrTy = llvm::dyn_cast<LLVM::LLVMPointerType>(addr.getType());
+ if (!ptrTy)
+ return failure();
+ if (ptrTy.getAddressSpace() !=
+ static_cast<unsigned>(NVVMMemorySpace::SharedCluster))
+ inferredReturnTypes.push_back(IntegerType::get(context, 64));
+ return success();
+}
+
+LogicalResult MBarrierArriveOp::inferReturnTypes(
+ MLIRContext *context, std::optional<Location> location,
+ MBarrierArriveOp::Adaptor adaptor,
+ SmallVectorImpl<Type> &inferredReturnTypes) {
+ return inferMBarrierArriveResultTypes(context, adaptor.getAddr(),
+ inferredReturnTypes);
+}
+
+LogicalResult MBarrierArriveDropOp::inferReturnTypes(
+ MLIRContext *context, std::optional<Location> location,
+ MBarrierArriveDropOp::Adaptor adaptor,
+ SmallVectorImpl<Type> &inferredReturnTypes) {
+ return inferMBarrierArriveResultTypes(context, adaptor.getAddr(),
+ inferredReturnTypes);
+}
+
+LogicalResult MBarrierArriveExpectTxOp::inferReturnTypes(
+ MLIRContext *context, std::optional<Location> location,
+ MBarrierArriveExpectTxOp::Adaptor adaptor,
+ SmallVectorImpl<Type> &inferredReturnTypes) {
+ // Predicate forces no return value (inline PTX path).
+ // Note: predicate + shared_cluster is rejected by the verifier separately.
+ if (adaptor.getPredicate())
+ return success();
+ return inferMBarrierArriveResultTypes(context, adaptor.getAddr(),
+ inferredReturnTypes);
+}
+
+LogicalResult MBarrierArriveDropExpectTxOp::inferReturnTypes(
+ MLIRContext *context, std::optional<Location> location,
+ MBarrierArriveDropExpectTxOp::Adaptor adaptor,
+ SmallVectorImpl<Type> &inferredReturnTypes) {
+ return inferMBarrierArriveResultTypes(context, adaptor.getAddr(),
+ inferredReturnTypes);
+}
+
+/// For ops with optional results, allow the user to omit the result even when
+/// inference would produce one. This preserves backward compatibility: the
+/// result can be silently discarded (e.g., for fire-and-forget arrive ops).
+static bool isCompatibleReturnTypesOptionalResult(TypeRange inferred,
+ TypeRange actual) {
+ if (actual.empty())
+ return true;
+ return inferred == actual;
+}
+
+bool MBarrierArriveOp::isCompatibleReturnTypes(TypeRange l, TypeRange r) {
+ return isCompatibleReturnTypesOptionalResult(l, r);
+}
+bool MBarrierArriveDropOp::isCompatibleReturnTypes(TypeRange l, TypeRange r) {
+ return isCompatibleReturnTypesOptionalResult(l, r);
+}
+bool MBarrierArriveExpectTxOp::isCompatibleReturnTypes(TypeRange l,
+ TypeRange r) {
+ return isCompatibleReturnTypesOptionalResult(l, r);
+}
+bool MBarrierArriveDropExpectTxOp::isCompatibleReturnTypes(TypeRange l,
+ TypeRange r) {
+ return isCompatibleReturnTypesOptionalResult(l, r);
+}
+
LogicalResult MBarrierExpectTxOp::verify() {
return verifyMBarrierArriveLikeOp(getOperation(), getAddr(), getScope());
}
@@ -2244,6 +2326,23 @@ LogicalResult ShflOp::verify() {
return success();
}
+LogicalResult ShflOp::inferReturnTypes(
+ MLIRContext *context, std::optional<Location> location,
+ ShflOp::Adaptor adaptor,
+ SmallVectorImpl<Type> &inferredReturnTypes) {
+ Type valType = adaptor.getVal().getType();
+ if (adaptor.getReturnValueAndIsValid()) {
+ // struct<(val_type, i1)>
+ SmallVector<Type> body = {valType, IntegerType::get(context, 1)};
+ inferredReturnTypes.push_back(
+ LLVM::LLVMStructType::getLiteral(context, body));
+ } else {
+ // same type as val
+ inferredReturnTypes.push_back(valType);
+ }
+ return success();
+}
+
std::pair<mlir::Type, unsigned> NVVM::inferMMAType(NVVM::MMATypes type,
NVVM::MMAFrag frag, int nRow,
int nCol,
@@ -2450,6 +2549,26 @@ LogicalResult NVVM::LdMatrixOp::verify() {
return success();
}
+LogicalResult LdMatrixOp::inferReturnTypes(
+ MLIRContext *context, std::optional<Location> location,
+ LdMatrixOp::Adaptor adaptor,
+ SmallVectorImpl<Type> &inferredReturnTypes) {
+ uint32_t num = adaptor.getNum();
+ uint32_t m = adaptor.getShape().getM();
+ uint32_t n = adaptor.getShape().getN();
+ uint32_t numElements = (m == 16 && n == 16) ? num * 2 : num;
+
+ Type i32 = IntegerType::get(context, 32);
+ if (numElements == 1) {
+ inferredReturnTypes.push_back(i32);
+ } else {
+ inferredReturnTypes.push_back(
+ LLVM::LLVMStructType::getLiteral(context,
+ SmallVector<Type>(numElements, i32)));
+ }
+ return success();
+}
+
LogicalResult NVVM::StMatrixOp::verify() {
int numMatrix = getSources().size();
if (numMatrix != 1 && numMatrix != 2 && numMatrix != 4)
@@ -2798,6 +2917,19 @@ LogicalResult NVVM::BarrierOp::verify() {
return success();
}
+LogicalResult BarrierOp::inferReturnTypes(
+ MLIRContext *context, std::optional<Location> location,
+ BarrierOp::Adaptor adaptor,
+ SmallVectorImpl<Type> &inferredReturnTypes) {
+ if (adaptor.getReductionOp())
+ inferredReturnTypes.push_back(IntegerType::get(context, 32));
+ return success();
+}
+
+bool BarrierOp::isCompatibleReturnTypes(TypeRange l, TypeRange r) {
+ return isCompatibleReturnTypesOptionalResult(l, r);
+}
+
LogicalResult NVVM::Tcgen05CpOp::verify() {
auto mc = getMulticast();
@@ -2840,6 +2972,23 @@ LogicalResult NVVM::MatchSyncOp::verify() {
return success();
}
+LogicalResult MatchSyncOp::inferReturnTypes(
+ MLIRContext *context, std::optional<Location> location,
+ MatchSyncOp::Adaptor adaptor,
+ SmallVectorImpl<Type> &inferredReturnTypes) {
+ if (adaptor.getKind() == NVVM::MatchSyncKind::all) {
+ // all: returns struct<(i32, i1)>
+ SmallVector<Type> body = {IntegerType::get(context, 32),
+ IntegerType::get(context, 1)};
+ inferredReturnTypes.push_back(
+ LLVM::LLVMStructType::getLiteral(context, body));
+ } else {
+ // any: returns i32
+ inferredReturnTypes.push_back(IntegerType::get(context, 32));
+ }
+ return success();
+}
+
LogicalResult NVVM::VoteSyncOp::verify() {
if (getKind() == NVVM::VoteSyncKind::ballot) {
if (!getType().isInteger(32)) {
@@ -2853,6 +3002,17 @@ LogicalResult NVVM::VoteSyncOp::verify() {
return success();
}
+LogicalResult VoteSyncOp::inferReturnTypes(
+ MLIRContext *context, std::optional<Location> location,
+ VoteSyncOp::Adaptor adaptor,
+ SmallVectorImpl<Type> &inferredReturnTypes) {
+ if (adaptor.getKind() == NVVM::VoteSyncKind::ballot)
+ inferredReturnTypes.push_back(IntegerType::get(context, 32));
+ else
+ inferredReturnTypes.push_back(IntegerType::get(context, 1));
+ return success();
+}
+
LogicalResult NVVM::PrefetchOp::verify() {
using MemSpace = NVVM::NVVMMemorySpace;
using CacheLevel = NVVM::PrefetchCacheLevel;
@@ -2946,6 +3106,18 @@ LogicalResult NVVM::ClusterLaunchControlQueryCancelOp::verify() {
return success();
}
+LogicalResult ClusterLaunchControlQueryCancelOp::inferReturnTypes(
+ MLIRContext *context, std::optional<Location> location,
+ ClusterLaunchControlQueryCancelOp::Adaptor adaptor,
+ SmallVectorImpl<Type> &inferredReturnTypes) {
+ if (adaptor.getQueryType() ==
+ NVVM::ClusterLaunchControlQueryType::IS_CANCELED)
+ inferredReturnTypes.push_back(IntegerType::get(context, 1));
+ else
+ inferredReturnTypes.push_back(IntegerType::get(context, 32));
+ return success();
+}
+
LogicalResult NVVM::ReduxOp::verify() {
mlir::Type reduxType = getType();
diff --git a/mlir/test/Dialect/LLVMIR/nvvm.mlir b/mlir/test/Dialect/LLVMIR/nvvm.mlir
index b11ba944fe4ac..fad9c66bc694a 100644
--- a/mlir/test/Dialect/LLVMIR/nvvm.mlir
+++ b/mlir/test/Dialect/LLVMIR/nvvm.mlir
@@ -443,6 +443,13 @@ llvm.func private @mbarrier_arrive_shared(%barrier: !llvm.ptr<3>) {
llvm.return
}
+// CHECK-LABEL: @mbarrier_arrive_cluster
+llvm.func private @mbarrier_arrive_cluster(%barrier: !llvm.ptr<7>) {
+ // CHECK: nvvm.mbarrier.arrive %{{.*}} : !llvm.ptr<7>{{$}}
+ nvvm.mbarrier.arrive %barrier : !llvm.ptr<7>
+ llvm.return
+}
+
llvm.func private @mbarrier_arrive_nocomplete(%barrier: !llvm.ptr) {
%count = nvvm.read.ptx.sreg.ntid.x : i32
// CHECK: nvvm.mbarrier.arrive.nocomplete %{{.*}} : !llvm.ptr
@@ -457,6 +464,55 @@ llvm.func private @mbarrier_arrive_nocomplete_shared(%barrier: !llvm.ptr<3>) {
llvm.return
}
+// CHECK-LABEL: @mbarrier_arrive_drop_shared
+llvm.func private @mbarrier_arrive_drop_shared(%barrier: !llvm.ptr<3>) {
+ // CHECK: nvvm.mbarrier.arrive_drop %{{.*}} : !llvm.ptr<3> -> i64
+ %0 = nvvm.mbarrier.arrive_drop %barrier : !llvm.ptr<3> -> i64
+ llvm.return
+}
+
+// CHECK-LABEL: @mbarrier_arrive_drop_cluster
+llvm.func private @mbarrier_arrive_drop_cluster(%barrier: !llvm.ptr<7>) {
+ // CHECK: nvvm.mbarrier.arrive_drop %{{.*}} : !llvm.ptr<7>{{$}}
+ nvvm.mbarrier.arrive_drop %barrier : !llvm.ptr<7>
+ llvm.return
+}
+
+// CHECK-LABEL: @mbarrier_arrive_expect_tx_shared
+llvm.func private @mbarrier_arrive_expect_tx_shared(%barrier: !llvm.ptr<3>, %txcount: i32) {
+ // CHECK: nvvm.mbarrier.arrive.expect_tx %{{.*}}, %{{.*}} : !llvm.ptr<3>, i32 -> i64
+ %0 = nvvm.mbarrier.arrive.expect_tx %barrier, %txcount : !llvm.ptr<3>, i32 -> i64
+ llvm.return
+}
+
+// CHECK-LABEL: @mbarrier_arrive_expect_tx_cluster
+llvm.func private @mbarrier_arrive_expect_tx_cluster(%barrier: !llvm.ptr<7>, %txcount: i32) {
+ // CHECK: nvvm.mbarrier.arrive.expect_tx %{{.*}}, %{{.*}} : !llvm.ptr<7>, i32{{$}}
+ nvvm.mbarrier.arrive.expect_tx %barrier, %txcount : !llvm.ptr<7>, i32
+ llvm.return
+}
+
+// CHECK-LABEL: @mbarrier_arrive_expect_tx_predicate
+llvm.func private @mbarrier_arrive_expect_tx_predicate(%barrier: !llvm.ptr<3>, %txcount: i32, %pred: i1) {
+ // CHECK: nvvm.mbarrier.arrive.expect_tx %{{.*}}, %{{.*}}, predicate = %{{.*}} : !llvm.ptr<3>, i32, i1{{$}}
+ nvvm.mbarrier.arrive.expect_tx %barrier, %txcount, predicate = %pred : !llvm.ptr<3>, i32, i1
+ llvm.return
+}
+
+// CHECK-LABEL: @mbarrier_arrive_drop_expect_tx_shared
+llvm.func private @mbarrier_arrive_drop_expect_tx_shared(%barrier: !llvm.ptr<3>, %txcount: i32) {
+ // CHECK: nvvm.mbarrier.arrive_drop.expect_tx %{{.*}}, %{{.*}} : !llvm.ptr<3>, i32 -> i64
+ %0 = nvvm.mbarrier.arrive_drop.expect_tx %barrier, %txcount : !llvm.ptr<3>, i32 -> i64
+ llvm.return
+}
+
+// CHECK-LABEL: @mbarrier_arrive_drop_expect_tx_cluster
+llvm.func private @mbarrier_arrive_drop_expect_tx_cluster(%barrier: !llvm.ptr<7>, %txcount: i32) {
+ // CHECK: nvvm.mbarrier.arrive_drop.expect_tx %{{.*}}, %{{.*}} : !llvm.ptr<7>, i32{{$}}
+ nvvm.mbarrier.arrive_drop.expect_tx %barrier, %txcount : !llvm.ptr<7>, i32
+ llvm.return
+}
+
// CHECK-LABEL: @wgmma_fence_aligned
func.func @wgmma_fence_aligned() {
// CHECK: nvvm.wgmma.fence.aligned
@@ -616,6 +672,135 @@ llvm.func @kernel_func(%arg0: !llvm.ptr {nvvm.grid_constant}) attributes {nvvm.k
// -----
+// CHECK-LABEL: @vote_sync_infer_type
+func.func @vote_sync_infer_type(%mask : i32, %pred : i1) {
+ // Ballot infers i32 result
+ // CHECK: nvvm.vote.sync ballot %{{.*}}, %{{.*}} -> i32
+ %0 = nvvm.vote.sync ballot %mask, %pred -> i32
+ // All infers i1 result
+ // CHECK: nvvm.vote.sync all %{{.*}}, %{{.*}} -> i1
+ %1 = nvvm.vote.sync all %mask, %pred -> i1
+ // Any infers i1 result
+ // CHECK: nvvm.vote.sync any %{{.*}}, %{{.*}} -> i1
+ %2 = nvvm.vote.sync any %mask, %pred -> i1
+ // Uni infers i1 result
+ // CHECK: nvvm.vote.sync uni %{{.*}}, %{{.*}} -> i1
+ %3 = nvvm.vote.sync uni %mask, %pred -> i1
+ llvm.return
+}
+
+// -----
+
+// CHECK-LABEL: @clusterlaunchcontrol_query_cancel_infer_type
+llvm.func @clusterlaunchcontrol_query_cancel_infer_type(%response : i128) {
+ // is_canceled infers i1
+ // CHECK: nvvm.clusterlaunchcontrol.query.cancel query = is_canceled, %{{.*}} : i1
+ %0 = nvvm.clusterlaunchcontrol.query.cancel query = is_canceled, %response : i1
+ // get_first_cta_id_x infers i32
+ // CHECK: nvvm.clusterlaunchcontrol.query.cancel query = get_first_cta_id_x, %{{.*}} : i32
+ %1 = nvvm.clusterlaunchcontrol.query.cancel query = get_first_cta_id_x, %response : i32
+ // get_first_cta_id_y infers i32
+ // CHECK: nvvm.clusterlaunchcontrol.query.cancel query = get_first_cta_id_y, %{{.*}} : i32
+ %2 = nvvm.clusterlaunchco...
[truncated]
|
72dfd8d to
66b36cc
Compare
|
Could you please run the GPU/CUDA integration tests locally and confirm if they are still good after this change? Generally, whenever we updated bar/mbar items, we have had a few issues to iron out. So, I request to split this into two parts: 1) All the mbar updates 2) Others to keep triage simpler. |
07b6471 to
5a03833
Compare
Also take a look at the NVDSL examples. |
70eaad1 to
3fc7c8a
Compare
|
The MR is split now. This one only contains the barrier changes. This one contains the rest: #188238 |
The NVDSL examples use nvgpu ops (nvgpu.mbarrier_arrive, nvgpu.mbarrier_arrive_expect_tx), not NVVM ops directly. The NVGPUToNVVM lowering for these ops is already updated and tested in this PR, the nvgpu-to-nvvm.mlir conversion test and the sm_90 CUDA integration tests both pass. |
63ad0f2 to
e228838
Compare
Thanks for checking sm_90 integration tests! |
Sure, thanks for splitting this out. Let us update the commit message too. |
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
Confirmed — all 6 NVGPU examples (Ch0–Ch5) pass on both H200 and B200. |
e228838 to
d4c4660
Compare
Great, thanks a ton! |
durga4github
left a comment
There was a problem hiding this comment.
The latest revision LGTM. Let us wait to hear from Guray.
d4c4660 to
eb5f291
Compare
schwarzschild-radius
left a comment
There was a problem hiding this comment.
Just a minor nit! otherwise LGTM, Thanks!
1d772d6 to
365478a
Compare
Add result type inference to the 5 NVVM barrier/mbarrier ops with optional results, using InferTypeOpAdaptorWithIsCompatible. - MBarrierArriveOp, MBarrierArriveDropOp, MBarrierArriveDropExpectTxOp: i64 result for non-shared_cluster pointers, no result otherwise - MBarrierArriveExpectTxOp: same, plus no result when predicate is set - BarrierOp: i32 result when reductionOp attribute is present A permissive isCompatibleReturnTypes allows omitting the result, preserving backward compatibility with the existing zero-result assembly form. This causes the Python binding generator to emit `results=None` as a default parameter instead of requiring explicit result types. Also updates the NVGPUToNVVM conversion to work with the new inferred result types for MBarrierArriveOp and MBarrierArriveExpectTxOp. Note: this is a source-breaking change for Python callers that pass result types positionally. Co-Authored-By: Claude <noreply@anthropic.com>
365478a to
be7ade0
Compare
|
@bastianhagedorn Congratulations on having your first Pull Request (PR) merged into the LLVM Project! Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR. Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues. How to do this, and the rest of the post-merge process, is covered in detail here. If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again. If you don't get any reports, no action is required from you. Your changes are working as expected, well done! |
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/129/builds/42390 Here is the relevant piece of the build log for the reference |
Add result type inference to 5 NVVM ops whose result types can be fully determined from their operands and attributes. This enables the Python binding generator to emit
results=Noneas a default parameter, removing the need for callers to pass explicit result types.Ops with optional results (using
InferTypeOpAdaptorWithIsCompatible):MBarrierArriveOp: i64 for non-cluster pointers, no result for shared_clusterMBarrierArriveDropOp: same as aboveMBarrierArriveExpectTxOp: same, plus no result when predicate is setMBarrierArriveDropExpectTxOp: same as MBarrierArriveOpBarrierOp: i32 when reductionOp is present, no result otherwiseThe optional-result ops use a permissive
isCompatibleReturnTypesthat allows omitting the result, preserving backward compatibility with the existing zero-result assembly form.Also updates the NVGPUToNVVM conversion to work with the new inferred result types for
MBarrierArriveOpandMBarrierArriveExpectTxOp.This is a source-breaking change for Python callers that pass result types positionally (e.g.
mbarrier_arrive(res, addr, ...)becomesmbarrier_arrive(addr, *, ..., results=None, ...)). Existing MLIR assembly is fully backward compatible.