Conversation

@grypp grypp commented Sep 4, 2025

No description provided.

llvmbot commented Sep 4, 2025

@llvm/pr-subscribers-mlir

@llvm/pr-subscribers-mlir-llvm

Author: Guray Ozen (grypp)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/156807.diff

1 file affected:

  • (modified) mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td (+42-2)
diff --git a/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td b/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
index 8537c7030aa8f..3dd6baa058d55 100644
--- a/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
+++ b/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
@@ -35,6 +35,46 @@ def LLVM_PointerSharedCluster : LLVM_PointerInAddressSpace<7>;
 //===----------------------------------------------------------------------===//
 
 def NVVM_Dialect : Dialect {
+  let summary = "The NVVM dialect that models NVIDIA's public ISA";
+
+  let description = [{
+  The NVVM dialect is MLIR's LLVM-IR-based, NVIDIA-specific backend dialect. It
+  models NVVM intrinsics and public ISA functionality and introduces NVIDIA
+  extensions to the LLVM type system and address spaces (e.g., global, shared,
+  and cluster memory), enabling faithful lowering of GPU kernels to the NVPTX
+  toolchain. Many ops have a one-to-many mapping to NVVM/PTX: a single overloaded op
+  emits one intrinsic, selected by its operand types.
+
+  **Scope and capabilities:** The dialect covers core GPU features such as
+  thread/block builtins, barriers and atomics, warp-level collectives (e.g.,
+  shuffle/vote), matrix/tensor core operations (e.g., `mma.sync`, `wgmma`),
+  tensor memory accelerator (TMA) operations, asynchronous copies (`cp.async`,
+  bulk/tensor variants) with memory barriers, cache and prefetch controls, and
+  NVVM-specific attributes and enums (e.g., FP rounding modes, memory scopes, and
+  MMA types/layouts).
+
+  **Non-goals:** NVVM is not a place for convenience or “wrapper” ops. It is not
+  intended to introduce high-level ops that expand into multiple unrelated NVVM
+  intrinsics or that lower to no intrinsic at all. Such abstractions belong in
+  higher-level dialects (e.g., `nvgpu`, `gpu`, or project-specific dialects). The
+  design intent is a thin, predictable, low-level surface with near-mechanical
+  lowering to NVVM/LLVM IR.
+
+  **Placement in the lowering pipeline:** NVVM sits below target-agnostic dialects
+  like `gpu` and NVIDIA's `nvgpu`. Typical pipelines convert `gpu`/`nvgpu` ops
+  into NVVM using `-convert-gpu-to-nvvm` and `-convert-nvgpu-to-nvvm`, then lower
+  to the LLVM dialect for final code generation via LLVM's NVPTX backend.
+
+  **Target configuration and serialization:** NVVM provides a `#nvvm.target`
+  attribute to describe the GPU target (SM, features, and flags). In conjunction
+  with `gpu` serialization (e.g., `gpu-module-to-binary`), this enables producing
+  architecture-specific GPU binaries (such as CUBIN) from nested GPU modules.
+
+  **Inline PTX:** When an intrinsic is unavailable or a performance-critical sequence
+  must be expressed directly, NVVM provides an `nvvm.inline_ptx` op to embed PTX
+  inline as a last-resort escape hatch, with explicit operands and results.
+  }];
+
   let name = "nvvm";
   let cppNamespace = "::mlir::NVVM";
   let dependentDialects = ["LLVM::LLVMDialect"];
@@ -976,7 +1016,7 @@ def NVVM_ShflOp :
   let description = [{
     The `shfl.sync` Op implements data shuffle within threads of a warp.
     The `thread_mask` denotes the threads participating in the Op where
-    the bit position corresponds to a particular thread’s laneid.
+    the bit position corresponds to a particular thread's laneid.
     The `offset` specifies a source lane or source lane offset
     (depending on `kind`). The `val` is the input value to be copied from
     the source. The `mask_and_clamp` contains two packed values specifying
@@ -1031,7 +1071,7 @@ def NVVM_VoteSyncOp
     - `ballot`: In the ballot form, the destination result is a 32 bit integer.
       In this form, the predicate from each thread in membermask are copied into
       the corresponding bit position of the result, where the bit position
-      corresponds to the thread’s lane id.
+      corresponds to the thread's lane id.
 
     [For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/#parallel-synchronization-and-communication-instructions-vote-sync)
   }];

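To make the description above concrete, here is a rough illustrative sketch of what the dialect's pieces look like together: a `gpu.module` carrying an `#nvvm.target` attribute (as used by `gpu-module-to-binary`) containing a kernel that reads a thread builtin through an NVVM op. The module and function names are hypothetical, and the exact attribute fields and op syntax may vary across MLIR versions:

```mlir
// Hypothetical example: a gpu.module annotated with an #nvvm.target
// attribute, containing a kernel that reads the thread-id builtin.
gpu.module @kernels [#nvvm.target<chip = "sm_90">] {
  llvm.func @tid_kernel() attributes {nvvm.kernel} {
    // Reads the PTX special register %tid.x via the NVVM dialect op.
    %tid = nvvm.read.ptx.sreg.tid.x : i32
    llvm.return
  }
}
```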
@Copilot Copilot AI left a comment


Pull Request Overview

This PR adds comprehensive documentation for the NVVM dialect in MLIR, including a detailed description of its purpose, scope, capabilities, and position in the lowering pipeline. It also fixes minor whitespace issues in existing documentation.

  • Adds detailed summary and description fields to the NVVM_Dialect definition
  • Documents the dialect's role as MLIR's NVIDIA-specific backend dialect for GPU programming
  • Fixes indentation issues in two existing operation descriptions

@joker-eph joker-eph left a comment


LG overall, but I wonder if @fabianmcg has thoughts on this as well?

**Placement in the lowering pipeline:** NVVM sits below target-agnostic dialects
like `gpu` and NVIDIA's `nvgpu`. Typical pipelines convert `gpu`/`nvgpu` ops
into NVVM using `-convert-gpu-to-nvvm` and `-convert-nvgpu-to-nvvm`, then lower
to the LLVM dialect for final code generation via LLVM's NVPTX backend.
Collaborator

Actually that's a bit confusing: this reads as if the NVVM dialect lowers to the LLVM dialect, whereas I expect it to translate to LLVM IR.

Another thing: I'm not sure how we should integrate with convert-to-llvm, but that's another story.

Member Author

changed the definition

Contributor

Another thing: I'm not sure how we should integrate with convert-to-llvm, but that's another story.

Technically, it's already integrated in the `dynamic=true` case.
https://github.com/llvm/llvm-project/blob/main/mlir/test/Conversion/GPUToNVVM/gpu-to-nvvm-target-attr.mlir

Comment on lines 48 to 71
**Scope and capabilities:** The dialect covers core GPU features such as
thread/block builtins, barriers and atomics, warp-level collectives (e.g.,
shuffle/vote), matrix/tensor core operations (e.g., `mma.sync`, `wgmma`),
tensor memory accelerator (TMA) operations, asynchronous copies (`cp.async`,
bulk/tensor variants) with memory barriers, cache and prefetch controls, and
NVVM-specific attributes and enums (e.g., FP rounding modes, memory scopes, and
MMA types/layouts).

**Non-goals:** NVVM is not a place for convenience or “wrapper” ops. It is not
intended to introduce high-level ops that expand into multiple unrelated NVVM
intrinsics or that lower to no intrinsic at all. Such abstractions belong in
higher-level dialects (e.g., `nvgpu`, `gpu`, or project-specific dialects). The
design intent is a thin, predictable, low-level surface with near-mechanical
lowering to NVVM/LLVM IR.

**Placement in the lowering pipeline:** NVVM sits below target-agnostic dialects
like `gpu` and NVIDIA's `nvgpu`. Typical pipelines convert `gpu`/`nvgpu` ops
into NVVM using `-convert-gpu-to-nvvm` and `-convert-nvgpu-to-nvvm`, then lower
to the LLVM dialect for final code generation via LLVM's NVPTX backend.

**Target configuration and serialization:** NVVM provides a `#nvvm.target`
attribute to describe the GPU target (SM, features, and flags). In conjunction
with `gpu` serialization (e.g., `gpu-module-to-binary`), this enables producing
architecture-specific GPU binaries (such as CUBIN) from nested GPU modules.
@fabianmcg fabianmcg Sep 4, 2025


In general it looks good to me. However, I'm wondering if these paragraphs would be better suited in the dialect docs page, instead of having them in the dialect description.

Member Author

Do you mean markdown files?

Contributor

Yes.

Member Author

AFAIK, there are no .md files today for NVVM. I can create one if this description grows further.

grypp and others added 2 commits September 4, 2025 18:25
Co-authored-by: Mehdi Amini <joker.eph@gmail.com>
let description = [{
The NVVM dialect is MLIR's LLVM-IR-based, NVIDIA-specific backend dialect. It
models NVVM intrinsics and public ISA functionality and introduces NVIDIA
extensions to the LLVM type system and address spaces (e.g., global, shared,
Contributor

LLVM/MLIR type system...

and cluster memory), enabling faithful lowering of GPU kernels to the NVPTX
toolchain. While a NVVM op maps to a single LLVM IR intrinsics, the NVVM dialect uses type
polymorphism and other attributes that make it so that a NVVM op can map to different LLVM
intrinsics.
Contributor

nit: slight wording updates:

While an NVVM op usually maps to a single LLVM IR intrinsic, the NVVM dialect uses type
polymorphism and other attributes so that a single NVVM op can map to different LLVM 
intrinsics.

**Placement in the lowering pipeline:** NVVM sits below target-agnostic dialects
like `gpu` and NVIDIA's `nvgpu`. Typical pipelines convert `gpu`/`nvgpu` ops
into NVVM using `-convert-gpu-to-nvvm` and `-convert-nvgpu-to-nvvm`, then translate
into the LLVM for final code generation via NVPTX backend.
Contributor

nit: into LLVM for ...

@durga4github durga4github left a comment


LGTM with a few minor suggestions.

@grypp grypp merged commit 6ecbed8 into llvm:main Sep 5, 2025
9 checks passed
8 participants