[mlir][gpu] `gpu-module-to-binary`: add option to dump intermediate files #170016

Hardcode84 · 2025-11-30T00:43:42Z

Add option to specify dir to dump intermediate files during gpu binaries generation for debugging.

Also fix ROCDL lowering bug where callbacks weren't propagated.

…iles Add option to specify dir to dump inetrmediate files during gpu binaries generation for debug. Also fix ROCDL lowering bug where callbacks weren't propagated.

llvmbot · 2025-11-30T00:44:10Z

@llvm/pr-subscribers-mlir

@llvm/pr-subscribers-mlir-gpu

Author: Ivan Butygin (Hardcode84)

Changes

Add option to specify dir to dump intermediate files during gpu binaries generation for debugging.

Also fix ROCDL lowering bug where callbacks weren't propagated.

Full diff: https://github.com/llvm/llvm-project/pull/170016.diff

6 Files Affected:

(modified) mlir/include/mlir/Dialect/GPU/Transforms/Passes.td (+3-1)
(modified) mlir/lib/Dialect/GPU/Transforms/ModuleToBinary.cpp (+62-1)
(modified) mlir/lib/Target/LLVM/ModuleToObject.cpp (+5-1)
(modified) mlir/lib/Target/LLVM/ROCDL/Target.cpp (+9-1)
(added) mlir/test/Dialect/GPU/module-to-binary-nvvm-intermediates.mlir (+17)
(added) mlir/test/Dialect/GPU/module-to-binary-rocdl-intermediates.mlir (+17)

diff --git a/mlir/include/mlir/Dialect/GPU/Transforms/Passes.td b/mlir/include/mlir/Dialect/GPU/Transforms/Passes.td
index 0c8a0c7a677ab..bfb407b3d7907 100644
--- a/mlir/include/mlir/Dialect/GPU/Transforms/Passes.td
+++ b/mlir/include/mlir/Dialect/GPU/Transforms/Passes.td
@@ -113,7 +113,9 @@ def GpuModuleToBinaryPass
     Option<"compilationTarget", "format", "std::string", [{"fatbin"}],
            "The target representation of the compilation process.">,
     Option<"elfSection", "section", "std::string", [{""}],
-           "ELF section where binary is to be located.">
+           "ELF section where binary is to be located.">,
+    Option<"dumpIntermediates", "dump-intermediates", "std::string", [{""}],
+           "Directory to dump intermediate artifacts (LLVM IR, device assembly).">
   ];
 }
 
diff --git a/mlir/lib/Dialect/GPU/Transforms/ModuleToBinary.cpp b/mlir/lib/Dialect/GPU/Transforms/ModuleToBinary.cpp
index 95d5cadbd4e1a..c55cfa25c3482 100644
--- a/mlir/lib/Dialect/GPU/Transforms/ModuleToBinary.cpp
+++ b/mlir/lib/Dialect/GPU/Transforms/ModuleToBinary.cpp
@@ -17,6 +17,12 @@
 
 #include "llvm/ADT/STLExtras.h"
 #include "llvm/ADT/StringSwitch.h"
+#include "llvm/Support/Debug.h"
+#include "llvm/Support/FileSystem.h"
+#include "llvm/Support/Path.h"
+#include "llvm/Support/ToolOutputFile.h"
+
+#define DEBUG_TYPE "gpu-module-to-binary"
 
 using namespace mlir;
 using namespace mlir::gpu;
@@ -26,6 +32,27 @@ namespace mlir {
 #include "mlir/Dialect/GPU/Transforms/Passes.h.inc"
 } // namespace mlir
 
+static void dumpToFile(StringRef dumpDir, const llvm::Twine &filename,
+                       function_ref<void(llvm::raw_ostream &)> writeContent) {
+  if (dumpDir.empty())
+    return;
+
+  llvm::SmallString<128> path(dumpDir);
+  llvm::sys::path::append(path, filename);
+
+  std::error_code ec;
+  llvm::ToolOutputFile output(path, ec, llvm::sys::fs::OF_None);
+  if (ec) {
+    LLVM_DEBUG(llvm::dbgs() << "Failed to create file '" << path
+                            << "': " << ec.message() << "\n");
+    return;
+  }
+
+  writeContent(output.os());
+  output.keep();
+  LLVM_DEBUG(llvm::dbgs() << "Dumped intermediate to: " << path << "\n");
+}
+
 namespace {
 class GpuModuleToBinaryPass
     : public impl::GpuModuleToBinaryPassBase<GpuModuleToBinaryPass> {
@@ -64,8 +91,42 @@ void GpuModuleToBinaryPass::runOnOperation() {
   SmallVector<Attribute> librariesToLink;
   for (const std::string &path : linkFiles)
     librariesToLink.push_back(StringAttr::get(&getContext(), path));
+
+  // Create dump directory if specified
+  if (!dumpIntermediates.empty()) {
+    if (std::error_code ec =
+            llvm::sys::fs::create_directories(dumpIntermediates)) {
+      getOperation()->emitError() << "Failed to create dump directory '"
+                                  << dumpIntermediates << "': " << ec.message();
+      return signalPassFailure();
+    }
+  }
+
+  // Create callbacks for dumping intermediate artifacts if requested
+  auto initialIRCallback = [&](llvm::Module &module) {
+    dumpToFile(dumpIntermediates, module.getName() + ".initial.ll",
+               [&](llvm::raw_ostream &os) { module.print(os, nullptr); });
+  };
+
+  auto linkedIRCallback = [&](llvm::Module &module) {
+    dumpToFile(dumpIntermediates, module.getName() + ".linked.ll",
+               [&](llvm::raw_ostream &os) { module.print(os, nullptr); });
+  };
+
+  auto optimizedIRCallback = [&](llvm::Module &module) {
+    dumpToFile(dumpIntermediates, module.getName() + ".opt.ll",
+               [&](llvm::raw_ostream &os) { module.print(os, nullptr); });
+  };
+
+  auto isaCallback = [&](StringRef isa) {
+    dumpToFile(dumpIntermediates, "kernel.isa",
+               [&](llvm::raw_ostream &os) { os << isa; });
+  };
+
   TargetOptions targetOptions(toolkitPath, librariesToLink, cmdOptions,
-                              elfSection, *targetFormat, lazyTableBuilder);
+                              elfSection, *targetFormat, lazyTableBuilder,
+                              initialIRCallback, linkedIRCallback,
+                              optimizedIRCallback, isaCallback);
   if (failed(transformGpuModulesToBinaries(
           getOperation(), OffloadingLLVMTranslationAttrInterface(nullptr),
           targetOptions)))
diff --git a/mlir/lib/Target/LLVM/ModuleToObject.cpp b/mlir/lib/Target/LLVM/ModuleToObject.cpp
index 4098ccc548dc1..ddbf568bc6568 100644
--- a/mlir/lib/Target/LLVM/ModuleToObject.cpp
+++ b/mlir/lib/Target/LLVM/ModuleToObject.cpp
@@ -143,7 +143,11 @@ LogicalResult ModuleToObject::loadBitcodeFilesFromList(
 
 std::unique_ptr<llvm::Module>
 ModuleToObject::translateToLLVMIR(llvm::LLVMContext &llvmContext) {
-  return translateModuleToLLVMIR(&getOperation(), llvmContext);
+  Operation &op = getOperation();
+  // Try to get nicer name from the operation.
+  auto nameAttr = op.getAttrOfType<StringAttr>("sym_name");
+  StringRef name = nameAttr ? nameAttr.getValue() : "LLVMDialectModule";
+  return translateModuleToLLVMIR(&op, llvmContext, name);
 }
 
 LogicalResult
diff --git a/mlir/lib/Target/LLVM/ROCDL/Target.cpp b/mlir/lib/Target/LLVM/ROCDL/Target.cpp
index f813f8db8fc94..6b3cbbddcea08 100644
--- a/mlir/lib/Target/LLVM/ROCDL/Target.cpp
+++ b/mlir/lib/Target/LLVM/ROCDL/Target.cpp
@@ -95,7 +95,11 @@ SerializeGPUModuleBase::SerializeGPUModuleBase(
     Operation &module, ROCDLTargetAttr target,
     const gpu::TargetOptions &targetOptions)
     : ModuleToObject(module, target.getTriple(), target.getChip(),
-                     target.getFeatures(), target.getO()),
+                     target.getFeatures(), target.getO(),
+                     targetOptions.getInitialLlvmIRCallback(),
+                     targetOptions.getLinkedLlvmIRCallback(),
+                     targetOptions.getOptimizedLlvmIRCallback(),
+                     targetOptions.getISACallback()),
       target(target), toolkitPath(targetOptions.getToolkitPath()),
       librariesToLink(targetOptions.getLibrariesToLink()) {
 
@@ -428,6 +432,10 @@ std::optional<SmallVector<char, 0>> SerializeGPUModuleBase::moduleToObjectImpl(
     getOperation().emitError() << "failed translating the module to ISA";
     return std::nullopt;
   }
+
+  if (isaCallback)
+    isaCallback(serializedISA.value());
+
 #define DEBUG_TYPE "serialize-to-isa"
   LLVM_DEBUG({
     llvm::dbgs() << "ISA for module: "
diff --git a/mlir/test/Dialect/GPU/module-to-binary-nvvm-intermediates.mlir b/mlir/test/Dialect/GPU/module-to-binary-nvvm-intermediates.mlir
new file mode 100644
index 0000000000000..af3e42cf346bb
--- /dev/null
+++ b/mlir/test/Dialect/GPU/module-to-binary-nvvm-intermediates.mlir
@@ -0,0 +1,17 @@
+// REQUIRES: host-supports-nvptx
+// RUN: rm -rf %t || true
+// RUN: mlir-opt %s --gpu-module-to-binary='format=isa dump-intermediates=%t' | FileCheck %s
+// RUN: test -f %t/kernel_module.initial.ll
+// RUN: test -f %t/kernel_module.linked.ll
+// RUN: test -f %t/kernel_module.opt.ll
+// RUN: test -f %t/kernel.isa
+
+module attributes {gpu.container_module} {
+  // CHECK-LABEL: gpu.binary @kernel_module
+
+  gpu.module @kernel_module [#nvvm.target<chip = "sm_70">] {
+    llvm.func @kernel(%arg0: f32) {
+      llvm.return
+    }
+  }
+}
diff --git a/mlir/test/Dialect/GPU/module-to-binary-rocdl-intermediates.mlir b/mlir/test/Dialect/GPU/module-to-binary-rocdl-intermediates.mlir
new file mode 100644
index 0000000000000..ad5af5e9742e4
--- /dev/null
+++ b/mlir/test/Dialect/GPU/module-to-binary-rocdl-intermediates.mlir
@@ -0,0 +1,17 @@
+// REQUIRES: host-supports-amdgpu
+// RUN: rm -rf %t || true
+// RUN: mlir-opt %s --gpu-module-to-binary='format=isa dump-intermediates=%t' | FileCheck %s
+// RUN: test -f %t/kernel_module.initial.ll
+// RUN: test -f %t/kernel_module.linked.ll
+// RUN: test -f %t/kernel_module.opt.ll
+// RUN: test -f %t/kernel.isa
+
+module attributes {gpu.container_module} {
+  // CHECK-LABEL: gpu.binary @kernel_module
+
+  gpu.module @kernel_module [#rocdl.target<chip = "gfx942">] {
+    llvm.func @kernel(%arg0: f32) {
+      llvm.return
+    }
+  }
+}

mlir/lib/Dialect/GPU/Transforms/ModuleToBinary.cpp

mlir/test/Dialect/GPU/module-to-binary-nvvm-intermediates.mlir

fabianmcg

LGTM in general, just fix the sym_name comment before merging.

fabianmcg · 2025-11-30T16:55:56Z

mlir/lib/Target/LLVM/ModuleToObject.cpp

+  // Try to get nicer name from the operation.
+  auto nameAttr = op.getAttrOfType<StringAttr>("sym_name");
+  StringRef name = nameAttr ? nameAttr.getValue() : "LLVMDialectModule";


Let's not use sym_name directly, this omits ops using properties.

Suggested change

// Try to get nicer name from the operation.

auto nameAttr = op.getAttrOfType<StringAttr>("sym_name");

StringRef name = nameAttr ? nameAttr.getValue() : "LLVMDialectModule";

StringRef name = "LLVMDialectModule";

// Try to get nicer name from the operation.

if (auto symOp = dyn_cast<SymbolOpInterface>(op); symOp && symOp.getNameAttr())

name = symOp.getNameAttr().getValue();

joker-eph · 2025-12-01T09:58:44Z

mlir/lib/Dialect/GPU/Transforms/ModuleToBinary.cpp

+  llvm::ToolOutputFile output(path, ec, llvm::sys::fs::OF_None);
+  if (ec) {
+    LDBG() << "Failed to create file '" << path << "': " << ec.message()
+           << "\n";


Looks to me like the kind of things that should be a user-visible diagnostic.

Updated. We cannot signal errors from these callbacks directly so I have to use signalPassFailure to make pass fail later.

mlir/lib/Dialect/GPU/Transforms/ModuleToBinary.cpp

fabianmcg · 2025-12-01T10:58:51Z

mlir/lib/Dialect/GPU/Transforms/ModuleToBinary.cpp

+    if (failed(
+            dumpToFile(op, dumpIntermediates, mod.getName() + ".initial.ll",
+                       [&](llvm::raw_ostream &os) { mod.print(os, nullptr); })))
+      signalPassFailure();


Either we make callbacks always successful, ie. no signalPassFailure(), or we need to make callbacks return a LogicalResult and update everywhere, eg. https://github.com/llvm/llvm-project/blob/main/mlir/include/mlir/Dialect/GPU/IR/CompilationInterfaces.h#L130-L133

IMO it's better making them return a LogicalResult.

Either we make callbacks always successful, ie. no signalPassFailure(),

I don't quite see why? The LogicalResult does not necessarily seem relevant to the ModuleToObject::run logic right now.
The callbacks are setup here and the failure can be handled out-of-band.

Ok, this actually looks straightforward enough change, I can do it.

The assumption of ModuleToObject logic is that these callbacks don't create invalid/failed states.
IMO if we allow them to trigger a signalPassFailure, ModuleToObject should fail, otherwise ModuleToObject could continue when it should abort.

In this particular case, while failing to dump it's unlikely to crash ModuleToObject, It's a waste of resources to let it continue if we know much earlier that the pass it's going to fail.

The assumption of ModuleToObject logic is that these callbacks don't create invalid/failed states.
IMO if we allow them to trigger a signalPassFailure, ModuleToObject should fail, otherwise ModuleToObject could continue when it should abort.

I don't agree: from the perspective other the ModuleToObject logic. we want to preserve this assumption that these callbacks don't create invalid states.
The failure is unrelated to the ModuleToObject logic here, hence it should be transparent and handled in the pass.

How does the pass knows if ModuleToObject succeeds?

ModuleToObject isn't failable right now, but I'm failing to link the question to the current discussion actually?

Also, in a frail system (take at least disk almost full, low ram, SWAP full...), the initial callback can fail, put the system in a an even more fragile state. Since ModuleToObject never became aware, it will continue, create some temp files (both NVVM and ROCDL targets create files), and fail again, but with a higher potential to crash, and it will be harder to debug where the problem started.

I don't quite see a concern with this, LLVM isn't resilient to system issues such as "swap filing up" or "machine running out of memory" in general.

It is failable: https://github.com/llvm/llvm-project/blob/main/mlir/include/mlir/Target/LLVM/ModuleToObject.h#L44

(TODO for me, use FailureOr instead of std::optional)

While LLVM it's not resilient to the issues I mentioned. It makes for a better debugging/user experience to fail at the earliest.

Actually, I think making these callbacks fallible is good improvement regardless of this PR, and it doesn't looks like particularly complicated change, I can do it as separate PR.

I may have objections with the change depending on how it looks, I'm not convinced that these should be failable conceptually (in terms of what these APIs are meant to provide).

Hardcode84 · 2025-12-01T11:16:18Z

FYI, IREE dump-intermediates against which this option was modeled is actually ignores errors.

[mlir][gpu] gpu-module-to-binary: add option to dump intermediate f…

8ce37a2

…iles Add option to specify dir to dump inetrmediate files during gpu binaries generation for debug. Also fix ROCDL lowering bug where callbacks weren't propagated.

Hardcode84 requested review from ftynse, joker-eph and kuhar November 30, 2025 00:43

Hardcode84 requested a review from fabianmcg as a code owner November 30, 2025 00:43

llvmbot added mlir:llvm mlir:gpu mlir labels Nov 30, 2025

kuhar reviewed Nov 30, 2025

View reviewed changes

review comments

b102b57

Hardcode84 force-pushed the dump-inter branch from beed7ae to b102b57 Compare November 30, 2025 16:19

fabianmcg approved these changes Nov 30, 2025

View reviewed changes

Hardcode84 added 2 commits November 30, 2025 18:08

SymbolOpInterface

526849e

update test

30abee7

kuhar approved these changes Nov 30, 2025

View reviewed changes

joker-eph reviewed Dec 1, 2025

View reviewed changes

Hardcode84 added 2 commits December 1, 2025 11:46

report error if fgalied to create intermediate files

64a0a28

return emitError()

4ba95d3

fabianmcg requested changes Dec 1, 2025

View reviewed changes

Hardcode84 mentioned this pull request Dec 1, 2025

[mlir][gpu] Propagate errors from ModuleToObject callbacks #170134

Open

[mlir][gpu] gpu-module-to-binary: add option to dump intermediate files #170016

Are you sure you want to change the base?

[mlir][gpu] gpu-module-to-binary: add option to dump intermediate files #170016

Conversation

Hardcode84 commented Nov 30, 2025

Uh oh!

llvmbot commented Nov 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

fabianmcg left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

fabianmcg Dec 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Hardcode84 commented Dec 1, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

[mlir][gpu] `gpu-module-to-binary`: add option to dump intermediate files #170016

[mlir][gpu] `gpu-module-to-binary`: add option to dump intermediate files #170016

llvmbot commented Nov 30, 2025 •

edited

Loading

fabianmcg Dec 1, 2025 •

edited

Loading