[flang] support fir.alloca operations inside of omp reduction ops #84952
@llvm/pr-subscribers-flang-openmp @llvm/pr-subscribers-flang-codegen
Author: Tom Eccles (tblah)
Changes
Advise to place the alloca at the start of the first block of whichever region (init or combiner) we are currently inside.
It probably isn't safe to put an alloca inside of a combiner region because this will be executed multiple times. But that would be a bug to fix in Lower/OpenMP.cpp, not here.
OpenMP array reductions 1/6
Full diff: https://github.com/llvm/llvm-project/pull/84952.diff
2 Files Affected:
diff --git a/flang/lib/Optimizer/Builder/FIRBuilder.cpp b/flang/lib/Optimizer/Builder/FIRBuilder.cpp
index 12da7412888a3b..f7327a299d9a5e 100644
--- a/flang/lib/Optimizer/Builder/FIRBuilder.cpp
+++ b/flang/lib/Optimizer/Builder/FIRBuilder.cpp
@@ -208,6 +208,8 @@ mlir::Block *fir::FirOpBuilder::getAllocaBlock() {
               .getParentOfType<mlir::omp::OutlineableOpenMPOpInterface>()) {
     return ompOutlineableIface.getAllocaBlock();
   }
+  if (mlir::isa<mlir::omp::ReductionDeclareOp>(getRegion().getParentOp()))
+    return &getRegion().front();
   if (auto accRecipeIface =
           getRegion().getParentOfType<mlir::acc::RecipeInterface>()) {
     return accRecipeIface.getAllocaBlock(getRegion());
diff --git a/flang/lib/Optimizer/CodeGen/CodeGen.cpp b/flang/lib/Optimizer/CodeGen/CodeGen.cpp
index f81a08388da722..123eb6e4e6a255 100644
--- a/flang/lib/Optimizer/CodeGen/CodeGen.cpp
+++ b/flang/lib/Optimizer/CodeGen/CodeGen.cpp
@@ -410,8 +410,15 @@ class FIROpConversion : public mlir::ConvertOpToLLVMPattern<FromOp> {
               mlir::ConversionPatternRewriter &rewriter) const {
     auto thisPt = rewriter.saveInsertionPoint();
     mlir::Operation *parentOp = rewriter.getInsertionBlock()->getParentOp();
-    mlir::Block *insertBlock = getBlockForAllocaInsert(parentOp);
-    rewriter.setInsertionPointToStart(insertBlock);
+    if (mlir::isa<mlir::omp::ReductionDeclareOp>(parentOp)) {
+      // ReductionDeclareOp has multiple child regions. We want to get the first
+      // block of whichever of those regions we are currently in
+      mlir::Region *parentRegion = rewriter.getInsertionBlock()->getParent();
+      rewriter.setInsertionPointToStart(&parentRegion->front());
+    } else {
+      mlir::Block *insertBlock = getBlockForAllocaInsert(parentOp);
+      rewriter.setInsertionPointToStart(insertBlock);
+    }
     auto size = genI32Constant(loc, rewriter, 1);
     unsigned allocaAs = getAllocaAddressSpace(rewriter);
     unsigned programAs = getProgramAddressSpace(rewriter);
Is it not sufficient to modify getBlockForAllocaInsert to handle ReductionDeclareOp?
That function doesn't have access to the rewriter. I could have added it as an argument, but I felt that it reads a bit weird there because it isn't obvious out of context what the conditions on the rewriter are.
The awkwardness comes from the use of multiple regions in the reduction declare op. We need to make sure the alloca ends up in the same region that we are currently inserting into.
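To illustrate the point about multiple regions, here is a minimal self-contained sketch. It uses toy stand-ins (ToyOp, ToyRegion, ToyBlock and the two lookup helpers are all hypothetical names, not the MLIR API) and assumes an op with an init region and a combiner region. A lookup that only receives the parent op has to pick a fixed region, while a lookup that starts from the current insertion block can return the first block of the region it is actually in.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Toy model only: these types mimic the op/region/block nesting, nothing more.
struct ToyRegion;

struct ToyBlock {
  std::string name;
  ToyRegion *parent; // region that owns this block
};

struct ToyRegion {
  std::vector<std::unique_ptr<ToyBlock>> blocks;
};

struct ToyOp {
  std::vector<std::unique_ptr<ToyRegion>> regions;
};

ToyBlock *addBlock(ToyRegion &region, const std::string &name) {
  region.blocks.push_back(
      std::unique_ptr<ToyBlock>(new ToyBlock{name, &region}));
  return region.blocks.back().get();
}

// Hypothetical reduction-like op: region 0 is "init", region 1 is "combiner".
ToyOp makeToyReduction() {
  ToyOp op;
  op.regions.push_back(std::unique_ptr<ToyRegion>(new ToyRegion()));
  op.regions.push_back(std::unique_ptr<ToyRegion>(new ToyRegion()));
  addBlock(*op.regions[0], "init.entry");
  addBlock(*op.regions[1], "combiner.entry");
  addBlock(*op.regions[1], "combiner.body");
  return op;
}

// Op-only lookup: without an insertion point it can only choose a fixed region.
ToyBlock *entryBlockOfFirstRegion(ToyOp &op) {
  assert(!op.regions.empty() && !op.regions.front()->blocks.empty());
  return op.regions.front()->blocks.front().get();
}

// Insertion-point lookup: first block of whichever region holds `current`.
ToyBlock *entryBlockOfCurrentRegion(ToyBlock &current) {
  assert(current.parent && !current.parent->blocks.empty());
  return current.parent->blocks.front().get();
}
```

In this toy, starting from "combiner.body" the insertion-point lookup yields "combiner.entry", whereas the op-only lookup yields "init.entry", which is a different region from the one being inserted into.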
LG. Can we add a test?
Most FIR passes only look for FIR operations inside of functions (either because they run only on func.func or they run on the module but iterate over functions internally). But there can also be FIR operations inside of fir.global, some OpenMP and OpenACC container operations.
This has worked so far for fir.global and OpenMP reductions because they only contained very simple FIR code which doesn't need most passes to be lowered into LLVM IR. I am not sure how OpenACC works.
In the long run, I hope to see a more systematic approach to making sure that every pass runs on all of these container operations. I will write an RFC for this soon.
In the meantime, this pass duplicates the CFG conversion pass to also run on omp reduction operations. This is similar to how the AbstractResult pass is already duplicated for fir.global operations.
OpenMP array reductions 2/6
Previous PR: #84952
Next PR: #84954
Co-authored-by: Mats Petersson <mats.petersson@arm.com>
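As a rough illustration of the problem described above, the following toy sketch models a module as a flat list of top-level container ops and a "pass" that only visits the container kinds it is told about. ToyTopLevelOp, runOnKinds and makeToyModule are hypothetical names, and the kind strings are just labels; this is not flang's pass infrastructure.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy model only: a top-level container op and whether a pass has touched it.
struct ToyTopLevelOp {
  std::string kind; // label such as "func.func" or "fir.global"
  bool converted;
};

// Apply the toy "conversion" to every top-level op whose kind is listed.
// Returns how many ops were visited.
int runOnKinds(std::vector<ToyTopLevelOp> &module,
               const std::set<std::string> &kinds) {
  assert(!kinds.empty());
  int visited = 0;
  for (ToyTopLevelOp &op : module) {
    if (kinds.count(op.kind)) {
      op.converted = true;
      ++visited;
    }
  }
  return visited;
}

// A module with one function, one global, and one reduction declaration.
std::vector<ToyTopLevelOp> makeToyModule() {
  return {{"func.func", false},
          {"fir.global", false},
          {"omp.reduction.declare", false}};
}
```

Running the toy pass with only "func.func" leaves the reduction container untouched; running it a second time with the reduction kind added mirrors the idea of duplicating a pass for another container operation.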
Next PR: #84953