Skip to content

Conversation

@Hardcode84
Copy link
Contributor

Add option to specify dir to dump intermediate files during gpu binaries generation for debugging.

Also fix ROCDL lowering bug where callbacks weren't propagated.

…iles

Add option to specify dir to dump inetrmediate files during gpu binaries generation for debug.

Also fix ROCDL lowering bug where callbacks weren't propagated.
@llvmbot
Copy link
Member

llvmbot commented Nov 30, 2025

@llvm/pr-subscribers-mlir

@llvm/pr-subscribers-mlir-gpu

Author: Ivan Butygin (Hardcode84)

Changes

Add option to specify dir to dump intermediate files during gpu binaries generation for debugging.

Also fix ROCDL lowering bug where callbacks weren't propagated.


Full diff: https://github.com/llvm/llvm-project/pull/170016.diff

6 Files Affected:

  • (modified) mlir/include/mlir/Dialect/GPU/Transforms/Passes.td (+3-1)
  • (modified) mlir/lib/Dialect/GPU/Transforms/ModuleToBinary.cpp (+62-1)
  • (modified) mlir/lib/Target/LLVM/ModuleToObject.cpp (+5-1)
  • (modified) mlir/lib/Target/LLVM/ROCDL/Target.cpp (+9-1)
  • (added) mlir/test/Dialect/GPU/module-to-binary-nvvm-intermediates.mlir (+17)
  • (added) mlir/test/Dialect/GPU/module-to-binary-rocdl-intermediates.mlir (+17)
diff --git a/mlir/include/mlir/Dialect/GPU/Transforms/Passes.td b/mlir/include/mlir/Dialect/GPU/Transforms/Passes.td
index 0c8a0c7a677ab..bfb407b3d7907 100644
--- a/mlir/include/mlir/Dialect/GPU/Transforms/Passes.td
+++ b/mlir/include/mlir/Dialect/GPU/Transforms/Passes.td
@@ -113,7 +113,9 @@ def GpuModuleToBinaryPass
     Option<"compilationTarget", "format", "std::string", [{"fatbin"}],
            "The target representation of the compilation process.">,
     Option<"elfSection", "section", "std::string", [{""}],
-           "ELF section where binary is to be located.">
+           "ELF section where binary is to be located.">,
+    Option<"dumpIntermediates", "dump-intermediates", "std::string", [{""}],
+           "Directory to dump intermediate artifacts (LLVM IR, device assembly).">
   ];
 }
 
diff --git a/mlir/lib/Dialect/GPU/Transforms/ModuleToBinary.cpp b/mlir/lib/Dialect/GPU/Transforms/ModuleToBinary.cpp
index 95d5cadbd4e1a..c55cfa25c3482 100644
--- a/mlir/lib/Dialect/GPU/Transforms/ModuleToBinary.cpp
+++ b/mlir/lib/Dialect/GPU/Transforms/ModuleToBinary.cpp
@@ -17,6 +17,12 @@
 
 #include "llvm/ADT/STLExtras.h"
 #include "llvm/ADT/StringSwitch.h"
+#include "llvm/Support/Debug.h"
+#include "llvm/Support/FileSystem.h"
+#include "llvm/Support/Path.h"
+#include "llvm/Support/ToolOutputFile.h"
+
+#define DEBUG_TYPE "gpu-module-to-binary"
 
 using namespace mlir;
 using namespace mlir::gpu;
@@ -26,6 +32,27 @@ namespace mlir {
 #include "mlir/Dialect/GPU/Transforms/Passes.h.inc"
 } // namespace mlir
 
+static void dumpToFile(StringRef dumpDir, const llvm::Twine &filename,
+                       function_ref<void(llvm::raw_ostream &)> writeContent) {
+  if (dumpDir.empty())
+    return;
+
+  llvm::SmallString<128> path(dumpDir);
+  llvm::sys::path::append(path, filename);
+
+  std::error_code ec;
+  llvm::ToolOutputFile output(path, ec, llvm::sys::fs::OF_None);
+  if (ec) {
+    LLVM_DEBUG(llvm::dbgs() << "Failed to create file '" << path
+                            << "': " << ec.message() << "\n");
+    return;
+  }
+
+  writeContent(output.os());
+  output.keep();
+  LLVM_DEBUG(llvm::dbgs() << "Dumped intermediate to: " << path << "\n");
+}
+
 namespace {
 class GpuModuleToBinaryPass
     : public impl::GpuModuleToBinaryPassBase<GpuModuleToBinaryPass> {
@@ -64,8 +91,42 @@ void GpuModuleToBinaryPass::runOnOperation() {
   SmallVector<Attribute> librariesToLink;
   for (const std::string &path : linkFiles)
     librariesToLink.push_back(StringAttr::get(&getContext(), path));
+
+  // Create dump directory if specified
+  if (!dumpIntermediates.empty()) {
+    if (std::error_code ec =
+            llvm::sys::fs::create_directories(dumpIntermediates)) {
+      getOperation()->emitError() << "Failed to create dump directory '"
+                                  << dumpIntermediates << "': " << ec.message();
+      return signalPassFailure();
+    }
+  }
+
+  // Create callbacks for dumping intermediate artifacts if requested
+  auto initialIRCallback = [&](llvm::Module &module) {
+    dumpToFile(dumpIntermediates, module.getName() + ".initial.ll",
+               [&](llvm::raw_ostream &os) { module.print(os, nullptr); });
+  };
+
+  auto linkedIRCallback = [&](llvm::Module &module) {
+    dumpToFile(dumpIntermediates, module.getName() + ".linked.ll",
+               [&](llvm::raw_ostream &os) { module.print(os, nullptr); });
+  };
+
+  auto optimizedIRCallback = [&](llvm::Module &module) {
+    dumpToFile(dumpIntermediates, module.getName() + ".opt.ll",
+               [&](llvm::raw_ostream &os) { module.print(os, nullptr); });
+  };
+
+  auto isaCallback = [&](StringRef isa) {
+    dumpToFile(dumpIntermediates, "kernel.isa",
+               [&](llvm::raw_ostream &os) { os << isa; });
+  };
+
   TargetOptions targetOptions(toolkitPath, librariesToLink, cmdOptions,
-                              elfSection, *targetFormat, lazyTableBuilder);
+                              elfSection, *targetFormat, lazyTableBuilder,
+                              initialIRCallback, linkedIRCallback,
+                              optimizedIRCallback, isaCallback);
   if (failed(transformGpuModulesToBinaries(
           getOperation(), OffloadingLLVMTranslationAttrInterface(nullptr),
           targetOptions)))
diff --git a/mlir/lib/Target/LLVM/ModuleToObject.cpp b/mlir/lib/Target/LLVM/ModuleToObject.cpp
index 4098ccc548dc1..ddbf568bc6568 100644
--- a/mlir/lib/Target/LLVM/ModuleToObject.cpp
+++ b/mlir/lib/Target/LLVM/ModuleToObject.cpp
@@ -143,7 +143,11 @@ LogicalResult ModuleToObject::loadBitcodeFilesFromList(
 
 std::unique_ptr<llvm::Module>
 ModuleToObject::translateToLLVMIR(llvm::LLVMContext &llvmContext) {
-  return translateModuleToLLVMIR(&getOperation(), llvmContext);
+  Operation &op = getOperation();
+  // Try to get nicer name from the operation.
+  auto nameAttr = op.getAttrOfType<StringAttr>("sym_name");
+  StringRef name = nameAttr ? nameAttr.getValue() : "LLVMDialectModule";
+  return translateModuleToLLVMIR(&op, llvmContext, name);
 }
 
 LogicalResult
diff --git a/mlir/lib/Target/LLVM/ROCDL/Target.cpp b/mlir/lib/Target/LLVM/ROCDL/Target.cpp
index f813f8db8fc94..6b3cbbddcea08 100644
--- a/mlir/lib/Target/LLVM/ROCDL/Target.cpp
+++ b/mlir/lib/Target/LLVM/ROCDL/Target.cpp
@@ -95,7 +95,11 @@ SerializeGPUModuleBase::SerializeGPUModuleBase(
     Operation &module, ROCDLTargetAttr target,
     const gpu::TargetOptions &targetOptions)
     : ModuleToObject(module, target.getTriple(), target.getChip(),
-                     target.getFeatures(), target.getO()),
+                     target.getFeatures(), target.getO(),
+                     targetOptions.getInitialLlvmIRCallback(),
+                     targetOptions.getLinkedLlvmIRCallback(),
+                     targetOptions.getOptimizedLlvmIRCallback(),
+                     targetOptions.getISACallback()),
       target(target), toolkitPath(targetOptions.getToolkitPath()),
       librariesToLink(targetOptions.getLibrariesToLink()) {
 
@@ -428,6 +432,10 @@ std::optional<SmallVector<char, 0>> SerializeGPUModuleBase::moduleToObjectImpl(
     getOperation().emitError() << "failed translating the module to ISA";
     return std::nullopt;
   }
+
+  if (isaCallback)
+    isaCallback(serializedISA.value());
+
 #define DEBUG_TYPE "serialize-to-isa"
   LLVM_DEBUG({
     llvm::dbgs() << "ISA for module: "
diff --git a/mlir/test/Dialect/GPU/module-to-binary-nvvm-intermediates.mlir b/mlir/test/Dialect/GPU/module-to-binary-nvvm-intermediates.mlir
new file mode 100644
index 0000000000000..af3e42cf346bb
--- /dev/null
+++ b/mlir/test/Dialect/GPU/module-to-binary-nvvm-intermediates.mlir
@@ -0,0 +1,17 @@
+// REQUIRES: host-supports-nvptx
+// RUN: rm -rf %t || true
+// RUN: mlir-opt %s --gpu-module-to-binary='format=isa dump-intermediates=%t' | FileCheck %s
+// RUN: test -f %t/kernel_module.initial.ll
+// RUN: test -f %t/kernel_module.linked.ll
+// RUN: test -f %t/kernel_module.opt.ll
+// RUN: test -f %t/kernel.isa
+
+module attributes {gpu.container_module} {
+  // CHECK-LABEL: gpu.binary @kernel_module
+
+  gpu.module @kernel_module [#nvvm.target<chip = "sm_70">] {
+    llvm.func @kernel(%arg0: f32) {
+      llvm.return
+    }
+  }
+}
diff --git a/mlir/test/Dialect/GPU/module-to-binary-rocdl-intermediates.mlir b/mlir/test/Dialect/GPU/module-to-binary-rocdl-intermediates.mlir
new file mode 100644
index 0000000000000..ad5af5e9742e4
--- /dev/null
+++ b/mlir/test/Dialect/GPU/module-to-binary-rocdl-intermediates.mlir
@@ -0,0 +1,17 @@
+// REQUIRES: host-supports-amdgpu
+// RUN: rm -rf %t || true
+// RUN: mlir-opt %s --gpu-module-to-binary='format=isa dump-intermediates=%t' | FileCheck %s
+// RUN: test -f %t/kernel_module.initial.ll
+// RUN: test -f %t/kernel_module.linked.ll
+// RUN: test -f %t/kernel_module.opt.ll
+// RUN: test -f %t/kernel.isa
+
+module attributes {gpu.container_module} {
+  // CHECK-LABEL: gpu.binary @kernel_module
+
+  gpu.module @kernel_module [#rocdl.target<chip = "gfx942">] {
+    llvm.func @kernel(%arg0: f32) {
+      llvm.return
+    }
+  }
+}

Copy link
Contributor

@fabianmcg fabianmcg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM in general, just fix the sym_name comment before merging.

Comment on lines 147 to 149
// Try to get nicer name from the operation.
auto nameAttr = op.getAttrOfType<StringAttr>("sym_name");
StringRef name = nameAttr ? nameAttr.getValue() : "LLVMDialectModule";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not use sym_name directly, this omits ops using properties.

Suggested change
// Try to get nicer name from the operation.
auto nameAttr = op.getAttrOfType<StringAttr>("sym_name");
StringRef name = nameAttr ? nameAttr.getValue() : "LLVMDialectModule";
StringRef name = "LLVMDialectModule";
// Try to get nicer name from the operation.
if (auto symOp = dyn_cast<SymbolOpInterface>(op); symOp && symOp.getNameAttr())
name = symOp.getNameAttr().getValue();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

llvm::ToolOutputFile output(path, ec, llvm::sys::fs::OF_None);
if (ec) {
LDBG() << "Failed to create file '" << path << "': " << ec.message()
<< "\n";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks to me like the kind of things that should be a user-visible diagnostic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. We cannot signal errors from these callbacks directly so I have to use signalPassFailure to make pass fail later.

Comment on lines +107 to +110
if (failed(
dumpToFile(op, dumpIntermediates, mod.getName() + ".initial.ll",
[&](llvm::raw_ostream &os) { mod.print(os, nullptr); })))
signalPassFailure();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either we make callbacks always successful, ie. no signalPassFailure(), or we need to make callbacks return a LogicalResult and update everywhere, eg. https://github.com/llvm/llvm-project/blob/main/mlir/include/mlir/Dialect/GPU/IR/CompilationInterfaces.h#L130-L133

IMO it's better making them return a LogicalResult.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either we make callbacks always successful, ie. no signalPassFailure(),

I don't quite see why? The LogicalResult does not necessarily seem relevant to the ModuleToObject::run logic right now.
The callbacks are setup here and the failure can be handled out-of-band.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, this actually looks straightforward enough change, I can do it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assumption of ModuleToObject logic is that these callbacks don't create invalid/failed states.
IMO if we allow them to trigger a signalPassFailure, ModuleToObject should fail, otherwise ModuleToObject could continue when it should abort.

In this particular case, while failing to dump it's unlikely to crash ModuleToObject, It's a waste of resources to let it continue if we know much earlier that the pass it's going to fail.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assumption of ModuleToObject logic is that these callbacks don't create invalid/failed states.
IMO if we allow them to trigger a signalPassFailure, ModuleToObject should fail, otherwise ModuleToObject could continue when it should abort.

I don't agree: from the perspective other the ModuleToObject logic. we want to preserve this assumption that these callbacks don't create invalid states.
The failure is unrelated to the ModuleToObject logic here, hence it should be transparent and handled in the pass.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does the pass knows if ModuleToObject succeeds?

ModuleToObject isn't failable right now, but I'm failing to link the question to the current discussion actually?

Also, in a frail system (take at least disk almost full, low ram, SWAP full...), the initial callback can fail, put the system in a an even more fragile state. Since ModuleToObject never became aware, it will continue, create some temp files (both NVVM and ROCDL targets create files), and fail again, but with a higher potential to crash, and it will be harder to debug where the problem started.

I don't quite see a concern with this, LLVM isn't resilient to system issues such as "swap filing up" or "machine running out of memory" in general.

Copy link
Contributor

@fabianmcg fabianmcg Dec 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is failable: https://github.com/llvm/llvm-project/blob/main/mlir/include/mlir/Target/LLVM/ModuleToObject.h#L44

(TODO for me, use FailureOr instead of std::optional)

While LLVM it's not resilient to the issues I mentioned. It makes for a better debugging/user experience to fail at the earliest.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think making these callbacks fallible is good improvement regardless of this PR, and it doesn't looks like particularly complicated change, I can do it as separate PR.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may have objections with the change depending on how it looks, I'm not convinced that these should be failable conceptually (in terms of what these APIs are meant to provide).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Hardcode84
Copy link
Contributor Author

FYI, IREE dump-intermediates against which this option was modeled is actually ignores errors.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants