-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[mlir][gpu] Refactor GpuOpsToROCDLOps pass interface (NFC) #157402
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[mlir][gpu] Refactor GpuOpsToROCDLOps pass interface (NFC) #157402
Conversation
The `convert-gpu-to-rocdl` pass provides the option `allowed-dialects`, which allows users to control which dialects can be used to populate conversions. This PR adds a C++ argument to createLowerGpuOpsToROCDLOpsPass, so that this option can also be controlled programatically when creating the pass.
@llvm/pr-subscribers-mlir-gpu @llvm/pr-subscribers-mlir Author: Pablo Antonio Martinez (pabloantoniom) ChangesThe This PR adds a C++ argument to createLowerGpuOpsToROCDLOpsPass, so that this option can also be controlled programatically when creating the pass. cc: @dhernandez0 Full diff: https://github.com/llvm/llvm-project/pull/157402.diff 2 Files Affected:
diff --git a/mlir/include/mlir/Conversion/GPUToROCDL/GPUToROCDLPass.h b/mlir/include/mlir/Conversion/GPUToROCDL/GPUToROCDLPass.h
index 291b809071ce9..a6099bde2a70e 100644
--- a/mlir/include/mlir/Conversion/GPUToROCDL/GPUToROCDLPass.h
+++ b/mlir/include/mlir/Conversion/GPUToROCDL/GPUToROCDLPass.h
@@ -10,6 +10,8 @@
#include "mlir/Conversion/GPUToROCDL/Runtimes.h"
#include "mlir/Conversion/LLVMCommon/LoweringOptions.h"
+#include "llvm/ADT/DenseSet.h"
+#include <cstddef>
#include <memory>
namespace mlir {
@@ -50,7 +52,9 @@ createLowerGpuOpsToROCDLOpsPass(
const std::string &chipset = "gfx900",
unsigned indexBitwidth = kDeriveIndexBitwidthFromDataLayout,
bool useBarePtrCallConv = false,
- gpu::amd::Runtime runtime = gpu::amd::Runtime::Unknown);
+ gpu::amd::Runtime runtime = gpu::amd::Runtime::Unknown,
+ const std::optional<llvm::SmallDenseSet<llvm::StringRef>> &allowedDialects =
+ std::nullopt);
} // namespace mlir
diff --git a/mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp b/mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp
index 807d1f52ee69b..965089df0303e 100644
--- a/mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp
+++ b/mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp
@@ -288,9 +288,10 @@ struct GPUShuffleOpLowering : public ConvertOpToLLVMPattern<gpu::ShuffleOp> {
struct LowerGpuOpsToROCDLOpsPass final
: public impl::ConvertGpuOpsToROCDLOpsBase<LowerGpuOpsToROCDLOpsPass> {
LowerGpuOpsToROCDLOpsPass() = default;
- LowerGpuOpsToROCDLOpsPass(const std::string &chipset, unsigned indexBitwidth,
- bool useBarePtrCallConv,
- gpu::amd::Runtime runtime) {
+ LowerGpuOpsToROCDLOpsPass(
+ const std::string &chipset, unsigned indexBitwidth,
+ bool useBarePtrCallConv, gpu::amd::Runtime runtime,
+ std::optional<llvm::SmallDenseSet<StringRef>> allowedDialects) {
if (this->chipset.getNumOccurrences() == 0)
this->chipset = chipset;
if (this->indexBitwidth.getNumOccurrences() == 0)
@@ -299,6 +300,12 @@ struct LowerGpuOpsToROCDLOpsPass final
this->useBarePtrCallConv = useBarePtrCallConv;
if (this->runtime.getNumOccurrences() == 0)
this->runtime = runtime;
+ if (this->allowedDialects.getNumOccurrences() == 0 &&
+ allowedDialects.has_value()) {
+ for (auto &str : allowedDialects.value()) {
+ this->allowedDialects.push_back(str.str());
+ }
+ }
}
void getDependentDialects(DialectRegistry ®istry) const override {
@@ -501,10 +508,10 @@ void mlir::populateGpuToROCDLConversionPatterns(
}
std::unique_ptr<OperationPass<gpu::GPUModuleOp>>
-mlir::createLowerGpuOpsToROCDLOpsPass(const std::string &chipset,
- unsigned indexBitwidth,
- bool useBarePtrCallConv,
- gpu::amd::Runtime runtime) {
+mlir::createLowerGpuOpsToROCDLOpsPass(
+ const std::string &chipset, unsigned indexBitwidth, bool useBarePtrCallConv,
+ gpu::amd::Runtime runtime,
+ const std::optional<llvm::SmallDenseSet<StringRef>> &allowedDialects) {
return std::make_unique<LowerGpuOpsToROCDLOpsPass>(
- chipset, indexBitwidth, useBarePtrCallConv, runtime);
+ chipset, indexBitwidth, useBarePtrCallConv, runtime, allowedDialects);
}
|
gpu::amd::Runtime runtime = gpu::amd::Runtime::Unknown); | ||
gpu::amd::Runtime runtime = gpu::amd::Runtime::Unknown, | ||
const std::optional<llvm::SmallDenseSet<llvm::StringRef>> &allowedDialects = | ||
std::nullopt); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TableGen already generates all the suitable creation function, we should be able to remove this entirely and use the generated one instead (what is missing?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TableGen already generates all the suitable creation function, we should be able to remove this entirely and use the generated one instead (what is missing?)
I'm not sure I understand your suggestion, sorry. If you try to call createLowerGpuOpsToROCDLOpsPass
(i.e., from the pass manager) in C++, prior to this PR, there was no way to specify allowedDialects
, so this PR adds support for that (Unless I'm missing some TableGen functionality that I did not know about)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mehdi is suggesting that you remove https://github.com/llvm/llvm-project/blob/main/mlir/include/mlir/Conversion/Passes.td#L627 and let tablegen generate the create* calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I understood that tablegen was supposed to generate the same create* function as the handwritten one, thanks @fabianmcg for the clarification. I have pushed a commit to address this.
@pabloantoniom do you have an use case for this? Or is just an improvement? I'm wondering if it's the latter, because I'm thinking on working in removing* these passes and only keep convert-to-llvm. *There would be a transition period where the pass names would be available as pipelines. |
…legen and delete handwritten version
Yes, there is an use case for this. You can check this downstream user. |
LowerGpuOpsToROCDLOpsPass() = default; | ||
LowerGpuOpsToROCDLOpsPass(const std::string &chipset, unsigned indexBitwidth, | ||
bool useBarePtrCallConv, | ||
gpu::amd::Runtime runtime) { | ||
LowerGpuOpsToROCDLOpsPass(ConvertGpuOpsToROCDLOpsOptions options) | ||
: ConvertGpuOpsToROCDLOpsBase(options) {} | ||
LowerGpuOpsToROCDLOpsPass( | ||
const std::string &chipset, unsigned indexBitwidth, | ||
bool useBarePtrCallConv, gpu::amd::Runtime runtime, | ||
std::optional<llvm::SmallDenseSet<StringRef>> allowedDialects) { | ||
if (this->chipset.getNumOccurrences() == 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need these? Can't we remove?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely, thanks for the suggestion. By removing this we also match what LowerGpuOpsToNVVMOpsPass
does so it should be easier for you if you want to do some cleanup later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Preemptively blocking while these get addressed:
Please remove the username from the description:
https://discourse.llvm.org/t/forbidding-username-in-commits/86997
Fix description, and the PR title.
And see my other comment.
#include "mlir/Conversion/LLVMCommon/LoweringOptions.h" | ||
#include "mlir/Pass/Pass.h" | ||
#include "llvm/ADT/DenseSet.h" | ||
#include <cstddef> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now you're only removing something from the header, so I'm not sure why you need to add new includes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I forgot to remove those after deleting the constructor in the .td. Only Pass is needed, since now TableGen is creating the create* function which uses mlir::Pass
, but I have removed the include and added Pass to the namespace below instead.
I think the question behind the question is: what can you do with this pass that you can't do with convert-to-llvm and what would it take to make it possible to do with convert-to-llvm? |
In short, convert-gpu-to-rocdl is a pass that does a bunch of non-trivial target-specific setup (and, in one case, post-processing, though maybe that's a bit of a hack) work and adds extra conversion patterns so that we're converting to AMDGPU LLVM, not generic LLVM. If |
Oh, and 4., it sets the data layout to the relevant AMDGPU string. (or, to give the flippant answer, see LowerGpuOpsToROCDLOpsPass::runOnOperation() ) |
@krzysz00 almost all the underlying technical issues to support what you describe have been solved for a while: It's just I haven't had the time to add it for ROCDL, and there were a couple of lingering issues on the data layout side that prevented me from doing it, but some of those were solved here #145899 . My question was directed more towards, do I need to be aware of some extra complication I need to deal when removing GPU to ROCDL, or was this patch mostly a NFC quality of life improvement. |
This reads as NFC to me |
Good catch, thanks. Removed username from description, PR title is fine. |
In my opinion, my original commit was indeed NFC. However, after Mehdi's suggestion, I'm not sure anymore, since it's changing the interface, thus forcing users of Coming back to the NFC discussion, I guess it depends on where you draw the boundary of NFC. I looked up here, but it does not give enough context, maybe an opportunity to improve on this? |
Thank you all for the reviews. I have updated PR title and description to make it more consistent with the latest changes. Hope this is compatible with what you were expecting @fabianmcg |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for the cleanup!
NFC is about the compiler behavior, not the API changes. Our APIs change all the time, but if there is no test change, it better be NFC. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM here too
This PR deletes the
createLowerGpuOpsToROCDLOpsPass
constructor fromthe .td file, making the
createConvertGpuOpsToROCDLOps
pass available tousers. This has the following effects:
createLowerGpuOpsToROCDLOpsPass
is not available anymore. Instead,createConvertGpuOpsToROCDLOps
should be used. This makes the interfaceconsistent with ConvertGpuOpsToNVVMOps.
To call
createConvertGpuOpsToROCDLOps
, the options must be passedvia ConvertGpuOpsToROCDLOpsOptions. This has the side effect of
making the
allowed-dialects
option available, which was notaccessible via C++ before.