pabloantoniom
Contributor

@pabloantoniom pabloantoniom commented Sep 8, 2025

This PR deletes the handwritten createLowerGpuOpsToROCDLOpsPass constructor
from the .td file, so that TableGen generates createConvertGpuOpsToROCDLOps
and makes it available to users. This has the following effects:

  1. createLowerGpuOpsToROCDLOpsPass is no longer available. Instead,
    createConvertGpuOpsToROCDLOps should be used. This makes the interface
    consistent with ConvertGpuOpsToNVVMOps.

  2. To call createConvertGpuOpsToROCDLOps, the options must be passed
    via ConvertGpuOpsToROCDLOpsOptions. This has the side effect of
    making the allowed-dialects option available, which was not
    accessible via C++ before.

The `convert-gpu-to-rocdl` pass provides the option `allowed-dialects`,
which allows users to control which dialects can be used to populate
conversions.
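
As a rough illustration of what such an allow-list does, the sketch below filters a list of dialect names using only standard-library types. It is not the pass's code: the namespace `sketch`, the helper `selectDialects`, and the rule that an empty allow-list means "no restriction" are all assumptions made for this example.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

namespace sketch {

// Hypothetical helper (not an MLIR API): given the names of the loaded
// dialects and an allow-list, return the dialects whose conversion patterns
// would be populated. Treating an empty allow-list as "no restriction" is an
// assumption of this sketch.
inline std::vector<std::string>
selectDialects(const std::vector<std::string> &loaded,
               const std::vector<std::string> &allowed) {
  if (allowed.empty())
    return loaded;
  const std::set<std::string> allowSet(allowed.begin(), allowed.end());
  std::vector<std::string> selected;
  for (const std::string &name : loaded)
    if (allowSet.count(name) != 0)
      selected.push_back(name);
  return selected;
}

// Usage example: only "math" survives the filter.
inline void selectDialectsExample() {
  const std::vector<std::string> loaded = {"arith", "math", "vector"};
  const std::vector<std::string> picked = selectDialects(loaded, {"math"});
  assert(picked.size() == 1 && picked.front() == "math");
}

} // namespace sketch
```

The filter keeps the order of the loaded dialects, so the allow-list only restricts which dialects contribute patterns and does not reorder them.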

This PR adds a C++ argument to createLowerGpuOpsToROCDLOpsPass, so
that this option can also be controlled programmatically when creating
the pass.
@llvmbot
Member

llvmbot commented Sep 8, 2025

@llvm/pr-subscribers-mlir-gpu

@llvm/pr-subscribers-mlir

Author: Pablo Antonio Martinez (pabloantoniom)

cc: @dhernandez0


Full diff: https://github.com/llvm/llvm-project/pull/157402.diff

2 Files Affected:

  • (modified) mlir/include/mlir/Conversion/GPUToROCDL/GPUToROCDLPass.h (+5-1)
  • (modified) mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp (+15-8)
diff --git a/mlir/include/mlir/Conversion/GPUToROCDL/GPUToROCDLPass.h b/mlir/include/mlir/Conversion/GPUToROCDL/GPUToROCDLPass.h
index 291b809071ce9..a6099bde2a70e 100644
--- a/mlir/include/mlir/Conversion/GPUToROCDL/GPUToROCDLPass.h
+++ b/mlir/include/mlir/Conversion/GPUToROCDL/GPUToROCDLPass.h
@@ -10,6 +10,8 @@
 
 #include "mlir/Conversion/GPUToROCDL/Runtimes.h"
 #include "mlir/Conversion/LLVMCommon/LoweringOptions.h"
+#include "llvm/ADT/DenseSet.h"
+#include <cstddef>
 #include <memory>
 
 namespace mlir {
@@ -50,7 +52,9 @@ createLowerGpuOpsToROCDLOpsPass(
     const std::string &chipset = "gfx900",
     unsigned indexBitwidth = kDeriveIndexBitwidthFromDataLayout,
     bool useBarePtrCallConv = false,
-    gpu::amd::Runtime runtime = gpu::amd::Runtime::Unknown);
+    gpu::amd::Runtime runtime = gpu::amd::Runtime::Unknown,
+    const std::optional<llvm::SmallDenseSet<llvm::StringRef>> &allowedDialects =
+        std::nullopt);
 
 } // namespace mlir
 
diff --git a/mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp b/mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp
index 807d1f52ee69b..965089df0303e 100644
--- a/mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp
+++ b/mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp
@@ -288,9 +288,10 @@ struct GPUShuffleOpLowering : public ConvertOpToLLVMPattern<gpu::ShuffleOp> {
 struct LowerGpuOpsToROCDLOpsPass final
     : public impl::ConvertGpuOpsToROCDLOpsBase<LowerGpuOpsToROCDLOpsPass> {
   LowerGpuOpsToROCDLOpsPass() = default;
-  LowerGpuOpsToROCDLOpsPass(const std::string &chipset, unsigned indexBitwidth,
-                            bool useBarePtrCallConv,
-                            gpu::amd::Runtime runtime) {
+  LowerGpuOpsToROCDLOpsPass(
+      const std::string &chipset, unsigned indexBitwidth,
+      bool useBarePtrCallConv, gpu::amd::Runtime runtime,
+      std::optional<llvm::SmallDenseSet<StringRef>> allowedDialects) {
     if (this->chipset.getNumOccurrences() == 0)
       this->chipset = chipset;
     if (this->indexBitwidth.getNumOccurrences() == 0)
@@ -299,6 +300,12 @@ struct LowerGpuOpsToROCDLOpsPass final
       this->useBarePtrCallConv = useBarePtrCallConv;
     if (this->runtime.getNumOccurrences() == 0)
       this->runtime = runtime;
+    if (this->allowedDialects.getNumOccurrences() == 0 &&
+        allowedDialects.has_value()) {
+      for (auto &str : allowedDialects.value()) {
+        this->allowedDialects.push_back(str.str());
+      }
+    }
   }
 
   void getDependentDialects(DialectRegistry &registry) const override {
@@ -501,10 +508,10 @@ void mlir::populateGpuToROCDLConversionPatterns(
 }
 
 std::unique_ptr<OperationPass<gpu::GPUModuleOp>>
-mlir::createLowerGpuOpsToROCDLOpsPass(const std::string &chipset,
-                                      unsigned indexBitwidth,
-                                      bool useBarePtrCallConv,
-                                      gpu::amd::Runtime runtime) {
+mlir::createLowerGpuOpsToROCDLOpsPass(
+    const std::string &chipset, unsigned indexBitwidth, bool useBarePtrCallConv,
+    gpu::amd::Runtime runtime,
+    const std::optional<llvm::SmallDenseSet<StringRef>> &allowedDialects) {
   return std::make_unique<LowerGpuOpsToROCDLOpsPass>(
-      chipset, indexBitwidth, useBarePtrCallConv, runtime);
+      chipset, indexBitwidth, useBarePtrCallConv, runtime, allowedDialects);
 }

-    gpu::amd::Runtime runtime = gpu::amd::Runtime::Unknown);
+    gpu::amd::Runtime runtime = gpu::amd::Runtime::Unknown,
+    const std::optional<llvm::SmallDenseSet<llvm::StringRef>> &allowedDialects =
+        std::nullopt);
Collaborator

TableGen already generates all the suitable creation functions; we should be able to remove this entirely and use the generated one instead (what is missing?)

Contributor Author

> TableGen already generates all the suitable creation functions; we should be able to remove this entirely and use the generated one instead (what is missing?)

Sorry, I'm not sure I understand your suggestion. Prior to this PR, if you called createLowerGpuOpsToROCDLOpsPass from C++ (e.g., when adding it to a pass manager), there was no way to specify allowedDialects, so this PR adds support for that (unless I'm missing some TableGen functionality that I don't know about).

Contributor

Mehdi is suggesting that you remove https://github.com/llvm/llvm-project/blob/main/mlir/include/mlir/Conversion/Passes.td#L627 and let TableGen generate the create* calls.

Contributor Author

Right, I understood that tablegen was supposed to generate the same create* function as the handwritten one, thanks @fabianmcg for the clarification. I have pushed a commit to address this.

@fabianmcg
Contributor

fabianmcg commented Sep 8, 2025

@pabloantoniom do you have a use case for this, or is it just an improvement? I'm wondering whether it's the latter, because I'm thinking of working on removing* these passes and keeping only convert-to-llvm.

*There would be a transition period where the pass names would be available as pipelines.

@pabloantoniom
Contributor Author

> @pabloantoniom do you have a use case for this, or is it just an improvement? I'm wondering whether it's the latter, because I'm thinking of working on removing* these passes and keeping only convert-to-llvm.
>
> *There would be a transition period where the pass names would be available as pipelines.

Yes, there is a use case for this. You can check this downstream user.

Comment on lines 289 to 296
LowerGpuOpsToROCDLOpsPass() = default;
LowerGpuOpsToROCDLOpsPass(const std::string &chipset, unsigned indexBitwidth,
bool useBarePtrCallConv,
gpu::amd::Runtime runtime) {
LowerGpuOpsToROCDLOpsPass(ConvertGpuOpsToROCDLOpsOptions options)
: ConvertGpuOpsToROCDLOpsBase(options) {}
LowerGpuOpsToROCDLOpsPass(
const std::string &chipset, unsigned indexBitwidth,
bool useBarePtrCallConv, gpu::amd::Runtime runtime,
std::optional<llvm::SmallDenseSet<StringRef>> allowedDialects) {
if (this->chipset.getNumOccurrences() == 0)
Contributor

Do you need these? Can't we remove them?

Contributor Author

Absolutely, thanks for the suggestion. By removing this we also match what LowerGpuOpsToNVVMOpsPass does so it should be easier for you if you want to do some cleanup later.
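
To make the idea concrete, here is a self-contained sketch of a pass class that drops its handwritten constructor and inherits the options-taking constructor from its base instead. Everything in it is a stand-in: `ExampleOptions`, `ExamplePassBase`, `ExamplePass`, and `createExamplePass` are hypothetical names that only imitate the shape of a generated pass base, and the two fields are an illustrative subset of the options seen in the diff.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

namespace sketch {

// Stand-in for a generated options struct (illustrative field subset).
struct ExampleOptions {
  std::string chipset = "gfx900";
  bool useBarePtrCallConv = false;
};

// Stand-in for a generated CRTP pass base that copies the options into
// its own members.
template <typename DerivedT>
class ExamplePassBase {
public:
  ExamplePassBase() = default;
  explicit ExamplePassBase(ExampleOptions options)
      : chipset(std::move(options.chipset)),
        useBarePtrCallConv(options.useBarePtrCallConv) {}

  std::string chipset = "gfx900";
  bool useBarePtrCallConv = false;
};

// The concrete pass needs no handwritten constructor: it inherits the
// base-class constructors with a using-declaration.
struct ExamplePass final : ExamplePassBase<ExamplePass> {
  using ExamplePassBase<ExamplePass>::ExamplePassBase;
};

// Stand-in for a generated create* function taking the options struct.
inline std::unique_ptr<ExamplePass> createExamplePass(ExampleOptions options) {
  return std::make_unique<ExamplePass>(std::move(options));
}

// Usage example: set one field, leave the other at its default.
inline void createExamplePassExample() {
  ExampleOptions options;
  options.chipset = "gfx942";
  std::unique_ptr<ExamplePass> pass = createExamplePass(options);
  assert(pass->chipset == "gfx942" && !pass->useBarePtrCallConv);
}

} // namespace sketch
```

With this layout, adding an option touches only the options struct and the base class, not the concrete pass.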

@fabianmcg fabianmcg self-requested a review September 8, 2025 16:25
Contributor

@fabianmcg fabianmcg left a comment

Preemptively blocking while these get addressed:

Please remove the username from the description:
https://discourse.llvm.org/t/forbidding-username-in-commits/86997

Fix description, and the PR title.

And see my other comment.

#include "mlir/Conversion/LLVMCommon/LoweringOptions.h"
#include "mlir/Pass/Pass.h"
#include "llvm/ADT/DenseSet.h"
#include <cstddef>
Collaborator

Now you're only removing something from the header, so I'm not sure why you need to add new includes?

Contributor Author

Ah, I forgot to remove those after deleting the constructor in the .td. Only Pass is needed, since TableGen now generates the create* function, which uses mlir::Pass. Rather than keeping the include, I have removed it and added a forward declaration of Pass to the namespace below.
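
The reason a declaration can stand in for the include is sketched below with stand-in names: `sketch::Pass`, `createExamplePass`, and `isExample` are hypothetical and not MLIR's. Declaring a function that returns `std::unique_ptr<Pass>` only requires `Pass` to be declared. The complete type is needed where the function is defined or where the returned pointer is used and destroyed.

```cpp
#include <cassert>
#include <memory>

// "Header" part: a forward declaration is enough to declare the factory.
namespace sketch {
class Pass;
std::unique_ptr<Pass> createExamplePass();
} // namespace sketch

// "Source" part: the full definition is visible where the factory is
// implemented and where the pass object is actually used.
namespace sketch {
class Pass {
public:
  virtual ~Pass() = default;
  virtual bool isExample() const { return true; }
};

std::unique_ptr<Pass> createExamplePass() { return std::make_unique<Pass>(); }

// Usage example: the caller sees the complete type here.
inline void forwardDeclarationExample() {
  std::unique_ptr<Pass> pass = createExamplePass();
  assert(pass && pass->isExample());
}
} // namespace sketch
```

Keeping the include out of the header avoids pulling the full pass infrastructure into every file that only needs the declaration.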

@joker-eph
Collaborator

> > @pabloantoniom do you have a use case for this, or is it just an improvement? I'm wondering whether it's the latter, because I'm thinking of working on removing* these passes and keeping only convert-to-llvm.
> > *There would be a transition period where the pass names would be available as pipelines.
>
> Yes, there is a use case for this. You can check this downstream user.

I think the question behind the question is: what can you do with this pass that you can't do with convert-to-llvm and what would it take to make it possible to do with convert-to-llvm?

@krzysz00
Contributor

krzysz00 commented Sep 8, 2025

> I think the question behind the question is: what can you do with this pass that you can't do with convert-to-llvm and what would it take to make it possible to do with convert-to-llvm?

  1. The pass loads target-specific patterns: amdgpu-to-rocdl and the gpu-to-rocdl patterns themselves. Those could maybe be run before the general convert-to-llvm, but they depend on the LLVM conversion infrastructure. However, the AMDGPU-to-ROCDL conversions add entries to the LLVM type converter (for pointer address-space attributes), so they need to run at the same time as the rest of the LLVM conversion patterns, and convert-to-llvm has no mechanism for that.
  2. This pass sets up other address space handling. There's a call to populateGpuMemorySpaceAttributeConversions in here, which maps memory spaces like #gpu.address_space<workgroup> to their correct platform-specific values. There's no generic mechanism that I know of for populating that mapping in a convert-to-llvm usage.
  3. There's the section that starts
    // Manually rewrite known block size attributes so the LLVMIR translation
    // infrastructure can pick them up.
    
    which compensates for limitations of the gpu-to-llvm rewrites being generic.

In short, convert-gpu-to-rocdl is a pass that does a bunch of non-trivial target-specific setup (and, in one case, post-processing, though maybe that's a bit of a hack) work and adds extra conversion patterns so that we're converting to AMDGPU LLVM, not generic LLVM.

If convert-to-llvm were set up in a way that let us put this sort of setup on some entity in the context (e.g., a target attribute), and that were plumbed through reliably, we wouldn't need this pass. For now, we do.
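
For point 2 above, the kind of mapping involved can be sketched as follows. This is a stand-alone illustration, not the pass's code: the enum `sketch::AddressSpace` stands in for the GPU dialect's address-space enum, the function name is hypothetical, and the numeric values assume the AMDGPU address-space numbering (1 for global, 3 for local/LDS, 5 for private).

```cpp
#include <cassert>

namespace sketch {

// Stand-in for the GPU dialect's address-space enum.
enum class AddressSpace { Global, Workgroup, Private };

// Hypothetical mapping from a generic GPU memory space to a target-specific
// numeric address space, assuming the AMDGPU numbering.
inline unsigned toAmdgpuAddressSpace(AddressSpace space) {
  switch (space) {
  case AddressSpace::Global:
    return 1; // global memory
  case AddressSpace::Workgroup:
    return 3; // local memory (LDS), shared within a workgroup
  case AddressSpace::Private:
    return 5; // per-thread private (scratch) memory
  }
  return 0; // not reached for valid enumerators; 0 is the flat space
}

// Usage example: workgroup memory maps to address space 3.
inline void addressSpaceExample() {
  assert(toAmdgpuAddressSpace(AddressSpace::Workgroup) == 3);
}

} // namespace sketch
```

A generic conversion cannot know these numbers without target information, which is why the mapping is installed by the target-specific pass.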

@krzysz00
Contributor

krzysz00 commented Sep 8, 2025

Oh, and 4., it sets the data layout to the relevant AMDGPU string.

(or, to give the flippant answer, see LowerGpuOpsToROCDLOpsPass::runOnOperation() )

@fabianmcg
Contributor

fabianmcg commented Sep 8, 2025

@krzysz00 almost all the underlying technical issues to support what you describe have been solved for a while:
https://github.com/llvm/llvm-project/blob/main/mlir/test/Conversion/GPUToNVVM/gpu-to-nvvm-target-attr.mlir#L5-L24

It's just that I haven't had the time to add it for ROCDL, and a couple of lingering issues on the data layout side prevented me from doing it, but some of those were solved in #145899.

My question was more about whether there is some extra complication I need to be aware of when removing GPU to ROCDL, or whether this patch was mostly an NFC quality-of-life improvement.

@krzysz00
Contributor

krzysz00 commented Sep 8, 2025

This reads as NFC to me

@pabloantoniom
Contributor Author

> Preemptively blocking while these get addressed:
>
> Please remove the username from the description: https://discourse.llvm.org/t/forbidding-username-in-commits/86997
>
> Fix description, and the PR title.
>
> And see my other comment.

Good catch, thanks. Removed username from description, PR title is fine.

@pabloantoniom
Contributor Author

> @krzysz00 almost all the underlying technical issues to support what you describe have been solved for a while: https://github.com/llvm/llvm-project/blob/main/mlir/test/Conversion/GPUToNVVM/gpu-to-nvvm-target-attr.mlir#L5-L24
>
> It's just that I haven't had the time to add it for ROCDL, and a couple of lingering issues on the data layout side prevented me from doing it, but some of those were solved in #145899.
>
> My question was more about whether there is some extra complication I need to be aware of when removing GPU to ROCDL, or whether this patch was mostly an NFC quality-of-life improvement.

In my opinion, my original commit was indeed NFC. However, after Mehdi's suggestion, I'm not sure anymore, since it changes the interface: users of createLowerGpuOpsToROCDLOpsPass (with 4 arguments) are forced to use createConvertGpuOpsToROCDLOps with 1 argument (ConvertGpuOpsToROCDLOpsOptions). It also has the benefit of making the ROCDL pass consistent with the NVVM one, as the latter does not have the constructor in TableGen, whereas the former had it prior to this commit. I will make this clear in the commit description so that everyone is aware.
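
For downstream callers, the migration amounts to packing the former positional arguments into the options struct. The sketch below shows that step with stand-in types so that it compiles on its own. The struct mirrors the option names visible in the diff, but its field types, the `Runtime` enumerators other than `Unknown`, the value 0 used for the index-bitwidth default, and the helper `makeOptions` are assumptions made for this example.

```cpp
#include <cassert>
#include <string>
#include <vector>

namespace sketch {

// Stand-in for gpu::amd::Runtime; only Unknown appears in the diff.
enum class Runtime { Unknown, HIP, OpenCL };

// Stand-in mirroring the option names from the diff. The real struct is
// generated by TableGen and its exact field types may differ.
struct ConvertGpuOpsToROCDLOpsOptions {
  std::string chipset = "gfx900";
  unsigned indexBitwidth = 0; // placeholder for kDeriveIndexBitwidthFromDataLayout
  bool useBarePtrCallConv = false;
  Runtime runtime = Runtime::Unknown;
  std::vector<std::string> allowedDialects;
};

// Hypothetical downstream shim: packs the four former positional arguments
// into the options struct and leaves allowedDialects empty.
inline ConvertGpuOpsToROCDLOpsOptions
makeOptions(const std::string &chipset, unsigned indexBitwidth,
            bool useBarePtrCallConv, Runtime runtime) {
  ConvertGpuOpsToROCDLOpsOptions options;
  options.chipset = chipset;
  options.indexBitwidth = indexBitwidth;
  options.useBarePtrCallConv = useBarePtrCallConv;
  options.runtime = runtime;
  return options;
}

// Usage example: the extra option is now reachable from C++ as well.
inline void makeOptionsExample() {
  ConvertGpuOpsToROCDLOpsOptions options =
      makeOptions("gfx942", 32, true, Runtime::HIP);
  options.allowedDialects.push_back("math");
  assert(options.chipset == "gfx942" && options.allowedDialects.size() == 1);
}

} // namespace sketch
```

With the real MLIR headers, the resulting struct would be the single argument passed to createConvertGpuOpsToROCDLOps.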

Coming back to the NFC discussion, I guess it depends on where you draw the boundary of NFC. I looked it up here, but the text does not give enough context; maybe this is an opportunity to improve it?

@pabloantoniom pabloantoniom changed the title from "[mlir][gpu] GPUToROCDL: Add C++ argument to populate allowedDialects" to "[mlir][gpu] Refactor GpuOpsToROCDLOps pass interface" on Sep 9, 2025
@pabloantoniom
Contributor Author

Thank you all for the reviews. I have updated the PR title and description to make them consistent with the latest changes. I hope this matches what you were expecting, @fabianmcg.

Contributor

@fabianmcg fabianmcg left a comment

LGTM, thanks for the cleanup!

@joker-eph
Collaborator

> In my opinion, my original commit was indeed NFC. However, after Mehdi's suggestion, I'm not sure anymore, since it's changing the interface,

NFC is about the compiler behavior, not the API changes. Our APIs change all the time, but if there is no test change, it had better be NFC.

Contributor

@krzysz00 krzysz00 left a comment

LGTM here too

@pabloantoniom pabloantoniom changed the title from "[mlir][gpu] Refactor GpuOpsToROCDLOps pass interface" to "[mlir][gpu] Refactor GpuOpsToROCDLOps pass interface (NFC)" on Sep 10, 2025
@pabloantoniom pabloantoniom merged commit dd04668 into llvm:main Sep 10, 2025
9 checks passed