Skip to content

Conversation

@grypp
Copy link
Member

@grypp grypp commented Nov 14, 2025

No description provided.

@llvmbot
Copy link
Member

llvmbot commented Nov 14, 2025

@llvm/pr-subscribers-mlir

@llvm/pr-subscribers-mlir-llvm

Author: Guray Ozen (grypp)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/168059.diff

1 Files Affected:

  • (modified) mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td (+39)
diff --git a/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td b/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
index 1cc5b74a3cb67..5992abc8efcfd 100644
--- a/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
+++ b/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
@@ -79,6 +79,45 @@ def NVVM_Dialect : Dialect {
     sequence must be expressed directly, NVVM provides an `nvvm.inline_ptx` op to
     embed PTX inline as a last-resort escape hatch, with explicit operands and
     results.
+
+
+    **Memory Spaces:** The NVVM dialect introduces the following memory spaces,
+    each with distinct scopes and lifetimes:
+
+    | Memory Space      | Scope                | Lifetime          |
+    |-------------------|----------------------|-------------------|
+    | `generic`         | All threads          | Context-dependent |
+    | `global`          | All threads (device) | Application       |
+    | `shared`          | Thread block (CTA)   | Kernel execution  |
+    | `constant`        | All threads (RO)     | Application       |
+    | `local`           | Single thread        | Kernel execution  |
+    | `tensor`          | Thread block (CTA)   | Kernel execution  |
+    | `shared_cluster`  | Thread block cluster | Kernel execution  |
+
+    **Memory Space Details:**
+    - **generic**: Can point to any memory space; requires runtime resolution of
+      actual address space. Use when pointer origin is unknown at compile time.
+      Performance varies based on the underlying memory space.
+    - **global**: Accessible by all threads across all blocks; persists across
+      kernel launches. Highest latency (~400-800 cycles) but largest capacity
+      (device memory). Best for large data and inter-kernel communication.
+    - **shared**: Shared within a thread block (CTA); very fast on-chip memory
+      (~20-40 cycles) for cooperation between threads in the same block. Limited
+      capacity (48-164KB depending on architecture). Ideal for block-level
+      collaboration, caching, and reducing global memory traffic.
+    - **constant**: Read-only memory cached per SM; optimized for broadcast
+      patterns where all threads access the same location. Fast access when cached
+      (~20 cycles). Size typically limited to 64KB. Best for read-only data and
+      uniform values accessed by all threads.
+    - **local**: Private to each thread; used for stack frames and register spills.
+      Actually resides in global memory but cached in L1 (~100-200 cycles). Use for
+      per-thread private data and automatic variables that don't fit in registers.
+    - **tensor**: Special memory space for Tensor Memory Accelerator (TMA)
+      operations on SM 80+ architectures; used with async tensor operations and
+      wgmma instructions. Provides very fast access for matrix operations.
+    - **shared_cluster**: Shared across thread blocks within a cluster (SM 90+);
+      enables collaboration beyond single-block scope with distributed shared
+      memory. Fast access (~40-80 cycles) across cluster threads.
   }];
 
   let name = "nvvm";

@durga4github durga4github changed the title [MLIR][NVVM] Explain memory spaces [MLIR][NVVM][Docs] Explain memory spaces Nov 14, 2025
Copy link
Contributor

@durga4github durga4github left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latest revision LGTM

Copy link
Contributor

@schwarzschild-radius schwarzschild-radius left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, Thanks!

@grypp grypp merged commit 35ae515 into llvm:main Nov 17, 2025
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants