-
Notifications
You must be signed in to change notification settings - Fork 11.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Flang] Minloc elemental intrinsic lowering #74828
Conversation
@llvm/pr-subscribers-flang-fir-hlfir Author: David Green (davemgreen) ChangesCurrently the lowering of a minloc intrinsic with a mask will look something like
The elemental will be expanded into a temporary+loop, the minloc into a This patch attempt to do that, adding a hlfir version of the expansion code It is currently added as port of the OptimizedBufferizationPass. I originally Patch is 148.67 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/74828.diff 15 Files Affected:
diff --git a/flang/include/flang/Optimizer/HLFIR/HLFIROps.td b/flang/include/flang/Optimizer/HLFIR/HLFIROps.td
index f4933785a8caa6..1f5bc42c43e65c 100644
--- a/flang/include/flang/Optimizer/HLFIR/HLFIROps.td
+++ b/flang/include/flang/Optimizer/HLFIR/HLFIROps.td
@@ -458,6 +458,32 @@ def hlfir_MinvalOp : hlfir_Op<"minval", [AttrSizedOperandSegments,
let hasVerifier = 1;
}
+def hlfir_MinlocOp : hlfir_Op<"minloc", [AttrSizedOperandSegments,
+ DeclareOpInterfaceMethods<ArithFastMathInterface>,
+ DeclareOpInterfaceMethods<MemoryEffectsOpInterface>]> {
+ let summary = "MINLOC transformational intrinsic";
+ let description = [{
+ Minlocs of an array.
+ }];
+
+ let arguments = (ins
+ AnyFortranArrayObject:$array,
+ Optional<AnyIntegerType>:$dim,
+ Optional<AnyFortranLogicalOrI1ArrayObject>:$mask,
+ Optional<Type<AnyLogicalLike.predicate>>:$back,
+ DefaultValuedAttr<Arith_FastMathAttr,
+ "::mlir::arith::FastMathFlags::none">:$fastmath
+ );
+
+ let results = (outs AnyFortranValue);
+
+ let assemblyFormat = [{
+ $array (`dim` $dim^)? (`mask` $mask^)? (`back` $back^)? attr-dict `:` functional-type(operands, results)
+ }];
+
+ let hasVerifier = 1;
+}
+
def hlfir_ProductOp : hlfir_Op<"product", [AttrSizedOperandSegments,
DeclareOpInterfaceMethods<ArithFastMathInterface>,
DeclareOpInterfaceMethods<MemoryEffectsOpInterface>]> {
diff --git a/flang/include/flang/Optimizer/Support/Utils.h b/flang/include/flang/Optimizer/Support/Utils.h
index 34c8e79173bcd4..93caa8b23d320c 100644
--- a/flang/include/flang/Optimizer/Support/Utils.h
+++ b/flang/include/flang/Optimizer/Support/Utils.h
@@ -133,6 +133,22 @@ inline void intrinsicTypeTODO(fir::FirOpBuilder &builder, mlir::Type type,
fir::numericMlirTypeToFortran(builder, type, loc, intrinsicName) +
" in " + intrinsicName);
}
+
+using MinlocBodyOpGeneratorTy = llvm::function_ref<mlir::Value(
+ fir::FirOpBuilder &, mlir::Location, const mlir::Type &, mlir::Value,
+ mlir::Value, mlir::Value, const llvm::SmallVectorImpl<mlir::Value> &)>;
+using InitValGeneratorTy = llvm::function_ref<mlir::Value(
+ fir::FirOpBuilder &, mlir::Location, const mlir::Type &)>;
+
+// Produces a loop nest for a Minloc intrinsic.
+void genMinlocReductionLoop(fir::FirOpBuilder &builder, mlir::Value array,
+ InitValGeneratorTy initVal,
+ MinlocBodyOpGeneratorTy genBody, unsigned rank,
+ mlir::Type elementType, mlir::Location loc,
+ mlir::Type maskElemType, mlir::Value resultArr,
+ bool maskMayBeLogicalScalar);
+
+
} // namespace fir
#endif // FORTRAN_OPTIMIZER_SUPPORT_UTILS_H
diff --git a/flang/lib/Lower/HlfirIntrinsics.cpp b/flang/lib/Lower/HlfirIntrinsics.cpp
index 9f764b61425226..6e5ba92bee86a7 100644
--- a/flang/lib/Lower/HlfirIntrinsics.cpp
+++ b/flang/lib/Lower/HlfirIntrinsics.cpp
@@ -93,6 +93,19 @@ using HlfirMinvalLowering = HlfirReductionIntrinsic<hlfir::MinvalOp, true>;
using HlfirAnyLowering = HlfirReductionIntrinsic<hlfir::AnyOp, false>;
using HlfirAllLowering = HlfirReductionIntrinsic<hlfir::AllOp, false>;
+template <typename OP>
+class HlfirMinMaxLocIntrinsic : public HlfirTransformationalIntrinsic {
+public:
+ using HlfirTransformationalIntrinsic::HlfirTransformationalIntrinsic;
+
+protected:
+ mlir::Value
+ lowerImpl(const Fortran::lower::PreparedActualArguments &loweredActuals,
+ const fir::IntrinsicArgumentLoweringRules *argLowering,
+ mlir::Type stmtResultType) override;
+};
+using HlfirMinlocLowering = HlfirMinMaxLocIntrinsic<hlfir::MinlocOp>;
+
template <typename OP>
class HlfirProductIntrinsic : public HlfirTransformationalIntrinsic {
public:
@@ -180,6 +193,31 @@ mlir::Value HlfirTransformationalIntrinsic::loadBoxAddress(
return boxOrAbsent;
}
+static mlir::Value loadOptionalValue(
+ mlir::Location loc, fir::FirOpBuilder &builder,
+ const std::optional<Fortran::lower::PreparedActualArgument> &arg,
+ hlfir::Entity actual) {
+ if (!arg->handleDynamicOptional())
+ return hlfir::loadTrivialScalar(loc, builder, actual);
+
+ mlir::Value isPresent = arg->getIsPresent();
+ mlir::Type eleType = hlfir::getFortranElementType(actual.getType());
+ return builder
+ .genIfOp(loc, {eleType}, isPresent,
+ /*withElseRegion=*/true)
+ .genThen([&]() {
+ assert(actual.isScalar() && fir::isa_trivial(eleType) &&
+ "must be a numerical or logical scalar");
+ hlfir::Entity val = hlfir::loadTrivialScalar(loc, builder, actual);
+ builder.create<fir::ResultOp>(loc, val);
+ })
+ .genElse([&]() {
+ mlir::Value zero = fir::factory::createZeroValue(builder, loc, eleType);
+ builder.create<fir::ResultOp>(loc, zero);
+ })
+ .getResults()[0];
+}
+
llvm::SmallVector<mlir::Value> HlfirTransformationalIntrinsic::getOperandVector(
const Fortran::lower::PreparedActualArguments &loweredActuals,
const fir::IntrinsicArgumentLoweringRules *argLowering) {
@@ -206,6 +244,9 @@ llvm::SmallVector<mlir::Value> HlfirTransformationalIntrinsic::getOperandVector(
else if (!argRules.handleDynamicOptional &&
argRules.lowerAs != fir::LowerIntrinsicArgAs::Inquired)
valArg = hlfir::derefPointersAndAllocatables(loc, builder, actual);
+ else if (argRules.handleDynamicOptional &&
+ argRules.lowerAs == fir::LowerIntrinsicArgAs::Value)
+ valArg = loadOptionalValue(loc, builder, arg, actual);
else if (argRules.handleDynamicOptional)
TODO(loc, "hlfir transformational intrinsic dynamically optional "
"argument without box lowering");
@@ -260,6 +301,27 @@ mlir::Value HlfirReductionIntrinsic<OP, HAS_MASK>::lowerImpl(
return op;
}
+template <typename OP>
+mlir::Value HlfirMinMaxLocIntrinsic<OP>::lowerImpl(
+ const Fortran::lower::PreparedActualArguments &loweredActuals,
+ const fir::IntrinsicArgumentLoweringRules *argLowering,
+ mlir::Type stmtResultType) {
+ auto operands = getOperandVector(loweredActuals, argLowering);
+ mlir::Value array = operands[0];
+ mlir::Value dim = operands[1];
+ mlir::Value mask = operands[2];
+ mlir::Value back = operands[4];
+ // dim, mask and back can be NULL if these arguments are not given.
+ if (dim)
+ dim = hlfir::loadTrivialScalar(loc, builder, hlfir::Entity{dim});
+ if (back)
+ back = hlfir::loadTrivialScalar(loc, builder, hlfir::Entity{back});
+
+ mlir::Type resultTy = computeResultType(array, stmtResultType);
+
+ return createOp<OP>(resultTy, array, dim, mask, back);
+}
+
template <typename OP>
mlir::Value HlfirProductIntrinsic<OP>::lowerImpl(
const Fortran::lower::PreparedActualArguments &loweredActuals,
@@ -364,6 +426,9 @@ std::optional<hlfir::EntityWithAttributes> Fortran::lower::lowerHlfirIntrinsic(
if (name == "minval")
return HlfirMinvalLowering{builder, loc}.lower(loweredActuals, argLowering,
stmtResultType);
+ if (name == "minloc")
+ return HlfirMinlocLowering{builder, loc}.lower(loweredActuals, argLowering,
+ stmtResultType);
if (mlir::isa<fir::CharacterType>(stmtResultType)) {
if (name == "min")
return HlfirCharExtremumLowering{builder, loc,
diff --git a/flang/lib/Optimizer/HLFIR/IR/HLFIROps.cpp b/flang/lib/Optimizer/HLFIR/IR/HLFIROps.cpp
index a276e5fc65dd59..94a2213306bfd5 100644
--- a/flang/lib/Optimizer/HLFIR/IR/HLFIROps.cpp
+++ b/flang/lib/Optimizer/HLFIR/IR/HLFIROps.cpp
@@ -870,6 +870,81 @@ void hlfir::MinvalOp::getEffects(
getIntrinsicEffects(getOperation(), effects);
}
+//===----------------------------------------------------------------------===//
+// MinlocOp
+//===----------------------------------------------------------------------===//
+
+mlir::LogicalResult hlfir::MinlocOp::verify() {
+ mlir::Operation *op = getOperation();
+
+ auto results = op->getResultTypes();
+ assert(results.size() == 1);
+ mlir::Value array = getArray();
+ mlir::Value dim = getDim();
+ mlir::Value mask = getMask();
+
+ fir::SequenceType arrayTy =
+ hlfir::getFortranElementOrSequenceType(array.getType())
+ .cast<fir::SequenceType>();
+ llvm::ArrayRef<int64_t> arrayShape = arrayTy.getShape();
+
+ if (mask) {
+ fir::SequenceType maskSeq =
+ hlfir::getFortranElementOrSequenceType(mask.getType())
+ .dyn_cast<fir::SequenceType>();
+ llvm::ArrayRef<int64_t> maskShape;
+
+ if (maskSeq)
+ maskShape = maskSeq.getShape();
+
+ if (!maskShape.empty()) {
+ if (maskShape.size() != arrayShape.size())
+ return emitWarning("MASK must be conformable to ARRAY");
+ static_assert(fir::SequenceType::getUnknownExtent() ==
+ hlfir::ExprType::getUnknownExtent());
+ constexpr int64_t unknownExtent = fir::SequenceType::getUnknownExtent();
+ for (std::size_t i = 0; i < arrayShape.size(); ++i) {
+ int64_t arrayExtent = arrayShape[i];
+ int64_t maskExtent = maskShape[i];
+ if ((arrayExtent != maskExtent) && (arrayExtent != unknownExtent) &&
+ (maskExtent != unknownExtent))
+ return emitWarning("MASK must be conformable to ARRAY");
+ }
+ }
+ }
+
+ mlir::Type resultType = results[0];
+ if (dim && arrayShape.size() == 1) {
+ if (!fir::isa_integer(resultType))
+ return emitOpError("result must be scalar integer");
+ } else if (auto resultExpr =
+ mlir::dyn_cast_or_null<hlfir::ExprType>(resultType)) {
+ if (!resultExpr.isArray())
+ return emitOpError("result must be an array");
+
+ if (!fir::isa_integer(resultExpr.getEleTy()))
+ return emitOpError("result must have integer elements");
+
+ llvm::ArrayRef<int64_t> resultShape = resultExpr.getShape();
+ // With dim the result has rank n-1
+ if (dim && resultShape.size() != (arrayShape.size() - 1))
+ return emitOpError("result rank must be one less than ARRAY");
+ // With dim the result has rank n
+ if (!dim && resultShape.size() != 1)
+ return emitOpError("result rank must be 1");
+ } else {
+ return emitOpError("result must be of numerical expr type");
+ }
+ return mlir::success();
+}
+
+void hlfir::MinlocOp::getEffects(
+ llvm::SmallVectorImpl<
+ mlir::SideEffects::EffectInstance<mlir::MemoryEffects::Effect>>
+ &effects) {
+ getIntrinsicEffects(getOperation(), effects);
+}
+
//===----------------------------------------------------------------------===//
// SetLengthOp
//===----------------------------------------------------------------------===//
diff --git a/flang/lib/Optimizer/HLFIR/Transforms/LowerHLFIRIntrinsics.cpp b/flang/lib/Optimizer/HLFIR/Transforms/LowerHLFIRIntrinsics.cpp
index f2628fcb970bc4..bfebe26fe1d532 100644
--- a/flang/lib/Optimizer/HLFIR/Transforms/LowerHLFIRIntrinsics.cpp
+++ b/flang/lib/Optimizer/HLFIR/Transforms/LowerHLFIRIntrinsics.cpp
@@ -201,6 +201,23 @@ class HlfirReductionIntrinsicConversion : public HlfirIntrinsicConversion<OP> {
return lowerArguments(operation, inArgs, rewriter, argLowering);
};
+ auto buildMinMaxLocArgs(OP operation, mlir::Type i32, mlir::Type logicalType,
+ mlir::PatternRewriter &rewriter, std::string opName,
+ fir::FirOpBuilder builder) const {
+ llvm::SmallVector<IntrinsicArgument, 3> inArgs;
+ inArgs.push_back({operation.getArray(), operation.getArray().getType()});
+ inArgs.push_back({operation.getDim(), i32});
+ inArgs.push_back({operation.getMask(), logicalType});
+ mlir::Type T = hlfir::getFortranElementType(operation.getType());
+ unsigned width = T.cast<mlir::IntegerType>().getWidth();
+ mlir::Value kind =
+ builder.createIntegerConstant(operation->getLoc(), i32, width / 8);
+ inArgs.push_back({kind, i32});
+ inArgs.push_back({operation.getBack(), i32});
+ auto *argLowering = fir::getIntrinsicArgumentLowering(opName);
+ return lowerArguments(operation, inArgs, rewriter, argLowering);
+ };
+
auto buildLogicalArgs(OP operation, mlir::Type i32, mlir::Type logicalType,
mlir::PatternRewriter &rewriter,
std::string opName) const {
@@ -224,6 +241,8 @@ class HlfirReductionIntrinsicConversion : public HlfirIntrinsicConversion<OP> {
opName = "maxval";
} else if constexpr (std::is_same_v<OP, hlfir::MinvalOp>) {
opName = "minval";
+ } else if constexpr (std::is_same_v<OP, hlfir::MinlocOp>) {
+ opName = "minloc";
} else if constexpr (std::is_same_v<OP, hlfir::AnyOp>) {
opName = "any";
} else if constexpr (std::is_same_v<OP, hlfir::AllOp>) {
@@ -246,6 +265,9 @@ class HlfirReductionIntrinsicConversion : public HlfirIntrinsicConversion<OP> {
std::is_same_v<OP, hlfir::MaxvalOp> ||
std::is_same_v<OP, hlfir::MinvalOp>) {
args = buildNumericalArgs(operation, i32, logicalType, rewriter, opName);
+ } else if constexpr (std::is_same_v<OP, hlfir::MinlocOp>) {
+ args = buildMinMaxLocArgs(operation, i32, logicalType, rewriter, opName,
+ builder);
} else {
args = buildLogicalArgs(operation, i32, logicalType, rewriter, opName);
}
@@ -269,6 +291,8 @@ using MaxvalOpConversion = HlfirReductionIntrinsicConversion<hlfir::MaxvalOp>;
using MinvalOpConversion = HlfirReductionIntrinsicConversion<hlfir::MinvalOp>;
+using MinlocOpConversion = HlfirReductionIntrinsicConversion<hlfir::MinlocOp>;
+
using AnyOpConversion = HlfirReductionIntrinsicConversion<hlfir::AnyOp>;
using AllOpConversion = HlfirReductionIntrinsicConversion<hlfir::AllOp>;
@@ -441,12 +465,12 @@ class LowerHLFIRIntrinsics
mlir::ModuleOp module = this->getOperation();
mlir::MLIRContext *context = &getContext();
mlir::RewritePatternSet patterns(context);
- patterns
- .insert<MatmulOpConversion, MatmulTransposeOpConversion,
- AllOpConversion, AnyOpConversion, SumOpConversion,
- ProductOpConversion, TransposeOpConversion, CountOpConversion,
- DotProductOpConversion, MaxvalOpConversion, MinvalOpConversion>(
- context);
+ patterns.insert<MatmulOpConversion, MatmulTransposeOpConversion,
+ AllOpConversion, AnyOpConversion, SumOpConversion,
+ ProductOpConversion, TransposeOpConversion,
+ CountOpConversion, DotProductOpConversion,
+ MaxvalOpConversion, MinvalOpConversion, MinlocOpConversion>(
+ context);
mlir::ConversionTarget target(*context);
target.addLegalDialect<mlir::BuiltinDialect, mlir::arith::ArithDialect,
mlir::func::FuncDialect, fir::FIROpsDialect,
@@ -454,7 +478,7 @@ class LowerHLFIRIntrinsics
target.addIllegalOp<hlfir::MatmulOp, hlfir::MatmulTransposeOp, hlfir::SumOp,
hlfir::ProductOp, hlfir::TransposeOp, hlfir::AnyOp,
hlfir::AllOp, hlfir::DotProductOp, hlfir::CountOp,
- hlfir::MaxvalOp, hlfir::MinvalOp>();
+ hlfir::MaxvalOp, hlfir::MinvalOp, hlfir::MinlocOp>();
target.markUnknownOpDynamicallyLegal(
[](mlir::Operation *) { return true; });
if (mlir::failed(
diff --git a/flang/lib/Optimizer/HLFIR/Transforms/OptimizedBufferization.cpp b/flang/lib/Optimizer/HLFIR/Transforms/OptimizedBufferization.cpp
index 7abfa20493c736..218ddd2a6a7b7e 100644
--- a/flang/lib/Optimizer/HLFIR/Transforms/OptimizedBufferization.cpp
+++ b/flang/lib/Optimizer/HLFIR/Transforms/OptimizedBufferization.cpp
@@ -20,6 +20,7 @@
#include "flang/Optimizer/HLFIR/HLFIRDialect.h"
#include "flang/Optimizer/HLFIR/HLFIROps.h"
#include "flang/Optimizer/HLFIR/Passes.h"
+#include "flang/Optimizer/Support/Utils.h"
#include "mlir/Dialect/Func/IR/FuncOps.h"
#include "mlir/IR/Dominance.h"
#include "mlir/IR/PatternMatch.h"
@@ -98,7 +99,8 @@ class ElementalAssignBufferization
/// the same block. If any operations with unknown effects are found,
/// std::nullopt is returned
static std::optional<mlir::SmallVector<mlir::MemoryEffects::EffectInstance>>
-getEffectsBetween(mlir::Operation *start, mlir::Operation *end) {
+getEffectsBetween(mlir::Operation *start, mlir::Operation *end,
+ mlir::Operation *ignoring) {
mlir::SmallVector<mlir::MemoryEffects::EffectInstance> ret;
if (start == end)
return ret;
@@ -108,6 +110,10 @@ getEffectsBetween(mlir::Operation *start, mlir::Operation *end) {
mlir::Operation *nextOp = start;
while (nextOp && nextOp != end) {
+ if (nextOp == ignoring) {
+ nextOp = nextOp->getNextNode();
+ continue;
+ }
std::optional<mlir::SmallVector<mlir::MemoryEffects::EffectInstance>>
effects = mlir::getEffectsRecursively(nextOp);
if (!effects)
@@ -293,80 +299,10 @@ static bool areIdenticalOrDisjointSlices(mlir::Value ref1, mlir::Value ref2) {
return false;
}
-std::optional<ElementalAssignBufferization::MatchInfo>
-ElementalAssignBufferization::findMatch(hlfir::ElementalOp elemental) {
- mlir::Operation::user_range users = elemental->getUsers();
- // the only uses of the elemental should be the assignment and the destroy
- if (std::distance(users.begin(), users.end()) != 2) {
- LLVM_DEBUG(llvm::dbgs() << "Too many uses of the elemental\n");
- return std::nullopt;
- }
-
- // If the ElementalOp must produce a temporary (e.g. for
- // finalization purposes), then we cannot inline it.
- if (hlfir::elementalOpMustProduceTemp(elemental)) {
- LLVM_DEBUG(llvm::dbgs() << "ElementalOp must produce a temp\n");
- return std::nullopt;
- }
-
- MatchInfo match;
- for (mlir::Operation *user : users)
- mlir::TypeSwitch<mlir::Operation *, void>(user)
- .Case([&](hlfir::AssignOp op) { match.assign = op; })
- .Case([&](hlfir::DestroyOp op) { match.destroy = op; });
-
- if (!match.assign || !match.destroy) {
- LLVM_DEBUG(llvm::dbgs() << "Couldn't find assign or destroy\n");
- return std::nullopt;
- }
-
- // the array is what the elemental is assigned into
- // TODO: this could be extended to also allow hlfir.expr by first bufferizing
- // the incoming expression
- match.array = match.assign.getLhs();
- mlir::Type arrayType = mlir::dyn_cast<fir::SequenceType>(
- fir::unwrapPassByRefType(match.array.getType()));
- if (!arrayType)
- return std::nullopt;
-
- // require that the array elements are trivial
- // TODO: this is just to make the pass easier to think about. Not an inherent
- // limitation
- mlir::Type eleTy = hlfir::getFortranElementType(arrayType);
- if (!fir::isa_trivial(eleTy))
- return std::nullopt;
-
- // the array must have the same shape as the elemental. CSE should have
- // deduplicated the fir.shape operations where they are provably the same
- // so we just have to check for the same ssa value
- // TODO: add more ways of getting the shape of the array
- mlir::Value arrayShape;
- if (match.array.getDefiningOp())
- arrayShape =
- mlir::TypeSwitch<mlir::Operation *, mlir::Value>(
- match.array.getDefiningOp())
- .Case([](hlfir::DesignateOp designate) {
- return designate.getShape();
- })
- .Case([](hlfir::DeclareOp declare) { return declare.getShape(); })
- .Default([](mlir::Operation *) { return mlir::Value{}; });
- if (!arrayShape) {
- LLVM_DEBUG(llvm::dbgs() << "Can't get shape of " << match.array << " at "
- << elemental->getLoc() << "\n");
- return std::nullopt;
- }
- if (arrayShape != elemental.getShape()) {
- // f2018 10.2.1.2 (3) requires the lhs and rhs of an assignment to be
- // conformable unless the lhs is an allocatable array. In HLFIR we can
- // see this from the presence or absence of the realloc attribute on
- // hlfir.assign. If it is not a realloc assignment, we can trust that
- // the shapes do conform
- if (match.assign.getRealloc())
- return std::nullopt;
- }
-
- // the transformation wants to apply the elemental in a do-loop at the
- // hlfir.assign, check there are no effects which make this unsafe
+static bool checkForElementalEffectsBetween(hlfir::ElementalOp elemental,
+ ...
[truncated]
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! Really nice work. Just two comments
|
||
for (unsigned int i = 0; i < rank; ++i) { | ||
mlir::Value index = builder.createIntegerConstant(loc, idxTy, i); | ||
mlir::Value resultElemAddr = builder.create<fir::CoordinateOp>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use hlfir::DesignateOp for indexing in hlfir code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is still one use of Coordicate left from the shared code. Do you think it is worth trying to change it over too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure. I guess not. @jeanPerier do you have a strong feeling on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess if CoordinateOp works, that is because we already bufferized the hlfir.expr. Which I guess makes sense in OptimizedBufferization, but not if you follow Jean's suggestion to move this to SimplifyHLFIRIntrinsics
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was there a lot of duplication rewriting it to an hlfir.elemental in in SimplifyHLFIRIntrinsics?
I would favor doing it there because I think having it as an hlfir.elemental is more powerful in general since it can then be composed if it is used as an operand to some other array operation.
Think: array1 = offset + minloc(array2, dim=dim)
That is the direction I wanted to go to implement https://github.com/llvm/llvm-project/blob/main/flang/docs/ArrayComposition.md and avoid array temporaries as much as possible.
That being said, I think only the case with |
Yeah the implementation in SimplifyHLFIRIntrinsics was the same code, just called from a different place (the only real differences was it needed to be more careful about how it was called, because of it using applyFullConversion). With it in OptimizedBufferization it should be after InlineElementals, so should get the benefits of inlining elementals into each other at least. It sounds like for cases that are reductions like |
bbdb55b
to
d3e4951
Compare
2f9bc65
to
381097d
Compare
flang/lib/Optimizer/HLFIR/Transforms/OptimizedBufferization.cpp
Outdated
Show resolved
Hide resolved
flang/lib/Optimizer/HLFIR/Transforms/OptimizedBufferization.cpp
Outdated
Show resolved
Hide resolved
flang/lib/Optimizer/HLFIR/Transforms/OptimizedBufferization.cpp
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've built and tested this patch successfully. I ran both check-flang and NVIDIA's internal tests, and all looks good.
I don't understand the code very well, so someone who understands the code should approve.
Thanks. That's good to hear. I'll try and get back to addressing the review comments soon, some of the comments here need more work to not produce worse performance. I was hoping to focus on #75774 first. |
This adds a ReductionElementalConversion transform to OptimizedBufferizationPass, taking hlfir::count(hlfir::elemental) and generating the inline loop to perform the count of true elements. This lets us generate a single loop instead of ending up as two plus a temporary. This is currently part of OptimizedBufferization, similar to llvm#74828. I attempted to move it to LowerHLFIRIntrinsics to make it part of the existing lowering, but it hit problems with inlining elementals that contain operations that are being legalized by the same pass. Any and All should be able to share the same code with a different function/initial value.
15d65d5
to
571d4a0
Compare
Ping, thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for this
Currently the lowering of a minloc intrinsic with a mask will look something like: %e = hlfir.elemental %shape ({ ... }) %m = hlfir.minloc %array mask %e hlfir.assign %m to %result hlfir.destroy %m The elemental will be expanded into a temporary+loop, the minloc into a FortranAMinloc call (which hopefully gets simplified to a specialized call that can be inlined at the call site), and the assign might get expanded to a FortranAAssign. It would be better to generate the entire construct as single loop if we can - one that performs the minloc calculation with the mask elemental computed inline. This patch attempt to do that, adding a hlfir version of the expansion code from SimplifyIntrinsics that turns an minloc+elemental into a single combined loop nest. It attempts to reuse the methods in genMinlocReductionLoop for constructing the loop with a modified loop body. The declaration for the function is curently in Optimizer/Support/Utils.h, but there might be a better place for it. It is currently added as port of the OptimizedBufferizationPass, like the similar count/any/all that have been added recently.
571d4a0
to
5f79431
Compare
Thanks. I gave this another test and it looks like it did OK. I will push this now, let me know if any problems come up or anything needs adjusting. I'll try and expand this to maxloc too, to keep them more symmetrical. |
I needed to move genMinMaxlocReductionLoop because of the shared-libraries build, in 202917f |
This is an extension to llvm#74828 to handle maxloc too, to keep the minloc and maxloc symmetric.
This is an extension to llvm#74828 to handle maxloc too, to keep the minloc and maxloc symmetric.
This is an extension to #74828 to handle maxloc too, to keep the minloc and maxloc symmetric.
Currently the lowering of a minloc intrinsic with a mask will look something like
The elemental will be expanded into a temporary+loop, the minloc into a
FortranAMinloc call (which hopefully gets simplified to a specialized call that
can be inlined at the call site), and the assign might get expanded to a
FortranAAssign. The assign we could inline too, but it would be better to
generate the entire construct as single loop if we can - one that performs the
minloc calculation with the mask elemental computed inline and assigns directly
to the output array.
This patch attempt to do that, adding a hlfir version of the expansion code
from SimplifyIntrinsics that turns an assign+minloc+elemental into a single
combined loop nest. It attempts to reuse the methods in genMinlocReductionLoop
for constructing the loop with a modified loop body. The declaration for the
function is currently in Optimizer/Support/Utils.h, but there might be a better
place for it.
It is currently added as port of the OptimizedBufferizationPass. I originally
had it as part of the SimplifyHLFIRIntrinsics pass, but there were already some
methods doing similar things in OptimizedBufferization. It just needs to happen
before the elementals are expanded. I think I would like to do a similar thing
for maxloc and any/all/count too if this looks OK. I will rebase over #74436
once that goes in.