-
Notifications
You must be signed in to change notification settings - Fork 15.6k
[flang] Added LoopInvariantCodeMotion pass for [HL]FIR. #173438
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
We may recurse into `HasRecursiveMemoryEffects` operations and use `getModRef` recursively to get more precise results for regions with `fir.call` operations. This patch also modifies `AliasAnalysis` to set the instantiation point for cases where the tracked data is accessed through a load from `!fir.ref<!fir.box<>>`: without this change the mod-ref analysis was not able to recognize user pointer/allocatable variables.
The new pass allows hoisting some `fir.load` operations early in MLIR. For example, many descriptor load might be hoisted out of the loops, though it does not make much difference in performance, because LLVM is able to optimize such loads (which are lowered as `llvm.memcpy` into temporary descriptors), given that proper TBAA information is generated by Flang. Further hoisting improvements are possible in [HL]FIR LICM, e.g. getting proper mod-ref results for Fortran runtime calls may allow hoisting loads from global variables, which LLVM cannot do due to lack of alias information.
|
@llvm/pr-subscribers-flang-driver Author: Slava Zakharin (vzakhari) ChangesThe new pass allows hoisting some Further hoisting improvements are possible in [HL]FIR LICM, This patch also contains improvements for FIR mod-ref analysis: This patch also modifies Patch is 198.99 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/173438.diff 16 Files Affected:
diff --git a/flang/include/flang/Optimizer/Transforms/Passes.td b/flang/include/flang/Optimizer/Transforms/Passes.td
index f50202784e2dc..b6b78ec7b21b0 100644
--- a/flang/include/flang/Optimizer/Transforms/Passes.td
+++ b/flang/include/flang/Optimizer/Transforms/Passes.td
@@ -596,4 +596,15 @@ def MIFOpConversion : Pass<"mif-convert", "mlir::ModuleOp"> {
let dependentDialects = ["fir::FIROpsDialect", "mlir::LLVM::LLVMDialect"];
}
+def LoopInvariantCodeMotion : Pass<"flang-licm", "::mlir::func::FuncOp"> {
+ let summary = "Hoist invariants from loops";
+ let description = [{
+ Hoist invariants from loops. This is a FIR-specific version of loop
+ invariant code motion, which relies on FIR types, operations (such as
+ fir.declare) and interfaces such as FortranObjectViewOpInterface.
+ The pass only moves existing operations, so there are no dependent
+ dialects.
+ }];
+}
+
#endif // FLANG_OPTIMIZER_TRANSFORMS_PASSES
diff --git a/flang/lib/Optimizer/Analysis/AliasAnalysis.cpp b/flang/lib/Optimizer/Analysis/AliasAnalysis.cpp
index d2656fa7b8ea8..0dc9dda3aa0f5 100644
--- a/flang/lib/Optimizer/Analysis/AliasAnalysis.cpp
+++ b/flang/lib/Optimizer/Analysis/AliasAnalysis.cpp
@@ -468,7 +468,8 @@ static ModRefResult getCallModRef(fir::CallOp call, mlir::Value var) {
// TODO: limit to Fortran functions??
// 1. Detect variables that can be accessed indirectly.
fir::AliasAnalysis aliasAnalysis;
- fir::AliasAnalysis::Source varSrc = aliasAnalysis.getSource(var);
+ fir::AliasAnalysis::Source varSrc =
+ aliasAnalysis.getSource(var, /*getLastInstantiationPoint=*/true);
// If the variable is not a user variable, we cannot safely assume that
// Fortran semantics apply (e.g., a bare alloca/allocmem result may very well
// be placed in an allocatable/pointer descriptor and escape).
@@ -498,6 +499,7 @@ static ModRefResult getCallModRef(fir::CallOp call, mlir::Value var) {
// At that stage, it has been ruled out that local (including the saved ones)
// and dummy cannot be indirectly accessed in the call.
if (varSrc.kind != fir::AliasAnalysis::SourceKind::Allocate &&
+ varSrc.kind != fir::AliasAnalysis::SourceKind::Argument &&
!varSrc.isDummyArgument()) {
if (varSrc.kind != fir::AliasAnalysis::SourceKind::Global ||
!isSavedLocal(varSrc))
@@ -523,19 +525,36 @@ static ModRefResult getCallModRef(fir::CallOp call, mlir::Value var) {
/// flow analysis to come 2) Allocate and Free effects are considered
/// modifying
ModRefResult AliasAnalysis::getModRef(Operation *op, Value location) {
- MemoryEffectOpInterface interface = dyn_cast<MemoryEffectOpInterface>(op);
- if (!interface) {
- if (auto call = llvm::dyn_cast<fir::CallOp>(op))
- return getCallModRef(call, location);
- return ModRefResult::getModAndRef();
- }
+ if (auto call = llvm::dyn_cast<fir::CallOp>(op))
+ return getCallModRef(call, location);
// Build a ModRefResult by merging the behavior of the effects of this
// operation.
+ ModRefResult result = ModRefResult::getNoModRef();
+ MemoryEffectOpInterface interface = dyn_cast<MemoryEffectOpInterface>(op);
+ if (op->hasTrait<mlir::OpTrait::HasRecursiveMemoryEffects>()) {
+ for (mlir::Region ®ion : op->getRegions()) {
+ result = result.merge(getModRef(region, location));
+ if (result.isModAndRef())
+ break;
+ }
+
+ // In MLIR, RecursiveMemoryEffects can be combined with
+ // MemoryEffectOpInterface to describe extra effects on top of the
+ // effects of the nested operations. However, the presence of
+ // RecursiveMemoryEffects and the absence of MemoryEffectOpInterface
+ // implies the operation has no other memory effects than the one of its
+ // nested operations.
+ if (!interface)
+ return result;
+ }
+
+ if (!interface || result.isModAndRef())
+ return ModRefResult::getModAndRef();
+
SmallVector<MemoryEffects::EffectInstance> effects;
interface.getEffects(effects);
- ModRefResult result = ModRefResult::getNoModRef();
for (const MemoryEffects::EffectInstance &effect : effects) {
// Check for an alias between the effect and our memory location.
@@ -563,22 +582,6 @@ ModRefResult AliasAnalysis::getModRef(mlir::Region ®ion,
mlir::Value location) {
ModRefResult result = ModRefResult::getNoModRef();
for (mlir::Operation &op : region.getOps()) {
- if (op.hasTrait<mlir::OpTrait::HasRecursiveMemoryEffects>()) {
- for (mlir::Region &subRegion : op.getRegions()) {
- result = result.merge(getModRef(subRegion, location));
- // Fast return is already mod and ref.
- if (result.isModAndRef())
- return result;
- }
- // In MLIR, RecursiveMemoryEffects can be combined with
- // MemoryEffectOpInterface to describe extra effects on top of the
- // effects of the nested operations. However, the presence of
- // RecursiveMemoryEffects and the absence of MemoryEffectOpInterface
- // implies the operation has no other memory effects than the one of its
- // nested operations.
- if (!mlir::isa<mlir::MemoryEffectOpInterface>(op))
- continue;
- }
result = result.merge(getModRef(&op, location));
if (result.isModAndRef())
return result;
@@ -674,6 +677,13 @@ AliasAnalysis::Source AliasAnalysis::getSource(mlir::Value v,
isCapturedInInternalProcedure |=
boxSrc.isCapturedInInternalProcedure;
+ if (getLastInstantiationPoint) {
+ if (!instantiationPoint)
+ instantiationPoint = boxSrc.origin.instantiationPoint;
+ } else {
+ instantiationPoint = boxSrc.origin.instantiationPoint;
+ }
+
global = llvm::dyn_cast<mlir::SymbolRefAttr>(boxSrc.origin.u);
if (global) {
type = SourceKind::Global;
diff --git a/flang/lib/Optimizer/Passes/Pipelines.cpp b/flang/lib/Optimizer/Passes/Pipelines.cpp
index 103e736accca0..038ceb1d4bd75 100644
--- a/flang/lib/Optimizer/Passes/Pipelines.cpp
+++ b/flang/lib/Optimizer/Passes/Pipelines.cpp
@@ -206,6 +206,10 @@ void createDefaultFIROptimizerPassPipeline(mlir::PassManager &pm,
pm.addPass(fir::createSimplifyRegionLite());
pm.addPass(mlir::createCSEPass());
+ // Run LICM after CSE, which may reduce the number of operations to hoist.
+ if (pc.OptLevel.isOptimizingForSpeed())
+ pm.addPass(fir::createLoopInvariantCodeMotion());
+
// Polymorphic types
pm.addPass(fir::createPolymorphicOpConversion());
pm.addPass(fir::createAssumedRankOpConversion());
diff --git a/flang/lib/Optimizer/Transforms/CMakeLists.txt b/flang/lib/Optimizer/Transforms/CMakeLists.txt
index 619f3adc67c85..fad6f34f478ba 100644
--- a/flang/lib/Optimizer/Transforms/CMakeLists.txt
+++ b/flang/lib/Optimizer/Transforms/CMakeLists.txt
@@ -1,43 +1,44 @@
add_flang_library(FIRTransforms
AbstractResult.cpp
AddAliasTags.cpp
- AffinePromotion.cpp
+ AddDebugInfo.cpp
AffineDemotion.cpp
+ AffinePromotion.cpp
+ AlgebraicSimplification.cpp
AnnotateConstant.cpp
+ ArrayValueCopy.cpp
AssumedRankOpConversion.cpp
- CharacterConversion.cpp
- CompilerGeneratedNames.cpp
- ConstantArgumentGlobalisation.cpp
- ControlFlowConverter.cpp
CUDA/CUFAllocationConversion.cpp
CUFAddConstructor.cpp
+ CUFComputeSharedMemoryOffsetsAndSize.cpp
CUFDeviceGlobal.cpp
- CUFOpConversion.cpp
CUFGPUToLLVMConversion.cpp
- CUFComputeSharedMemoryOffsetsAndSize.cpp
- ArrayValueCopy.cpp
+ CUFOpConversion.cpp
+ CharacterConversion.cpp
+ CompilerGeneratedNames.cpp
+ ConstantArgumentGlobalisation.cpp
+ ControlFlowConverter.cpp
+ ConvertComplexPow.cpp
+ DebugTypeGenerator.cpp
ExternalNameConversion.cpp
FIRToSCF.cpp
- MemoryUtils.cpp
- MemoryAllocation.cpp
- StackArrays.cpp
+ FunctionAttr.cpp
+ GenRuntimeCallsForTest.cpp
+ LoopInvariantCodeMotion.cpp
+ LoopVersioning.cpp
+ MIFOpConversion.cpp
MemRefDataFlowOpt.cpp
- SimplifyRegionLite.cpp
- AlgebraicSimplification.cpp
- SimplifyIntrinsics.cpp
- AddDebugInfo.cpp
+ MemoryAllocation.cpp
+ MemoryUtils.cpp
+ OptimizeArrayRepacking.cpp
PolymorphicOpConversion.cpp
- LoopVersioning.cpp
- StackReclaim.cpp
- VScaleAttr.cpp
- FunctionAttr.cpp
- DebugTypeGenerator.cpp
SetRuntimeCallAttributes.cpp
- GenRuntimeCallsForTest.cpp
SimplifyFIROperations.cpp
- OptimizeArrayRepacking.cpp
- ConvertComplexPow.cpp
- MIFOpConversion.cpp
+ SimplifyIntrinsics.cpp
+ SimplifyRegionLite.cpp
+ StackArrays.cpp
+ StackReclaim.cpp
+ VScaleAttr.cpp
DEPENDS
CUFAttrs
@@ -63,12 +64,14 @@ add_flang_library(FIRTransforms
MLIR_LIBS
MLIRAffineUtils
+ MLIRAnalysis
MLIRFuncDialect
MLIRGPUDialect
- MLIRLLVMDialect
MLIRLLVMCommonConversion
+ MLIRLLVMDialect
MLIRMathTransforms
MLIROpenACCDialect
MLIROpenACCToLLVMIRTranslation
MLIROpenMPDialect
+ MLIRTransformUtils
)
diff --git a/flang/lib/Optimizer/Transforms/LoopInvariantCodeMotion.cpp b/flang/lib/Optimizer/Transforms/LoopInvariantCodeMotion.cpp
new file mode 100644
index 0000000000000..c033d5e278c8d
--- /dev/null
+++ b/flang/lib/Optimizer/Transforms/LoopInvariantCodeMotion.cpp
@@ -0,0 +1,233 @@
+//===- LoopInvariantCodeMotion.cpp ----------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+/// \file
+/// FIR-specific Loop Invariant Code Motion pass.
+/// The pass relies on FIR types and interfaces to prove the safety
+/// of hoisting invariant operations out of loop-like operations.
+/// It may be run on both HLFIR and FIR representations.
+//===----------------------------------------------------------------------===//
+
+#include "flang/Optimizer/Analysis/AliasAnalysis.h"
+#include "flang/Optimizer/Dialect/FortranVariableInterface.h"
+#include "flang/Optimizer/Transforms/Passes.h"
+#include "mlir/Pass/Pass.h"
+#include "mlir/Transforms/LoopInvariantCodeMotionUtils.h"
+#include "llvm/Support/DebugLog.h"
+
+namespace fir {
+#define GEN_PASS_DEF_LOOPINVARIANTCODEMOTION
+#include "flang/Optimizer/Transforms/Passes.h.inc"
+} // namespace fir
+
+#define DEBUG_TYPE "flang-licm"
+
+// Temporary engineering option for triaging LICM.
+static llvm::cl::opt<bool> disableFlangLICM(
+ "disable-flang-licm", llvm::cl::init(false), llvm::cl::Hidden,
+ llvm::cl::desc("Disable Flang's loop invariant code motion"));
+
+namespace {
+
+using namespace mlir;
+
+/// The pass tries to hoist loop invariant operations with only
+/// MemoryEffects::Read effects (MemoryEffects::Write support
+/// may be added later).
+/// The safety of hoisting is proven by:
+/// * Proving that the loop runs at least one iteration.
+/// * Proving that is is always safe to load from this location
+/// (see isSafeToHoistLoad() comments below).
+struct LoopInvariantCodeMotion
+ : fir::impl::LoopInvariantCodeMotionBase<LoopInvariantCodeMotion> {
+ void runOnOperation() override;
+};
+
+} // namespace
+
+/// 'location' is a memory reference used by a memory access.
+/// The type of 'location' defines the data type of the access
+/// (e.g. it is considered to be invalid to access 'i64'
+/// data using '!fir.ref<i32>`).
+/// For the given location, this function returns true iff
+/// the Fortran object being accessed is a scalar that
+/// may not be OPTIONAL.
+///
+/// Note that the '!fir.ref<!fir.box<>>' accesses are considered
+/// to be scalar, even if the underlying data is an array.
+///
+/// Note that an access of '!fir.ref<scalar>' may access
+/// an array object. For example:
+/// real :: x(:)
+/// do i=...
+/// = x(10)
+/// 'x(10)' accesses array 'x', and it may be unsafe to hoist
+/// it without proving that '10' is a valid index for the array.
+/// The fact that 'x' is not OPTIONAL does not allow hoisting
+/// on its own.
+static bool isNonOptionalScalar(Value location) {
+ while (true) {
+ LDBG() << "Checking location:\n" << location;
+ Type dataType = fir::unwrapRefType(location.getType());
+ if (!isa<fir::BaseBoxType>(location.getType()) &&
+ (!dataType ||
+ (!isa<fir::BaseBoxType>(dataType) && !fir::isa_trivial(dataType) &&
+ !fir::isa_derived(dataType)))) {
+ LDBG() << "Failure: data access is not scalar";
+ return false;
+ }
+ Operation *defOp = location.getDefiningOp();
+ if (!defOp) {
+ LDBG() << "Failure: no defining operation";
+ return false;
+ }
+ if (auto varIface = dyn_cast<fir::FortranVariableOpInterface>(defOp)) {
+ bool result = !varIface.isOptional();
+ if (result)
+ LDBG() << "Success: is non optional scalar";
+ else
+ LDBG() << "Failure: is not non optional scalar";
+ return result;
+ }
+ if (auto viewIface = dyn_cast<fir::FortranObjectViewOpInterface>(defOp)) {
+ location = viewIface.getViewSource(cast<OpResult>(location));
+ } else {
+ LDBG() << "Failure: unknown operation:\n" << *defOp;
+ return false;
+ }
+ }
+}
+
+/// Returns true iff it is safe to hoist the given load-like operation 'op',
+/// which access given memory 'locations', out of the operation 'loopLike'.
+/// The current safety conditions are:
+/// * The loop runs at least one iteration, OR
+/// * all the accessed locations are inside scalar non-OPTIONAL
+/// Fortran objects (Fortran descriptors are considered to be scalars).
+static bool isSafeToHoistLoad(Operation *op, ArrayRef<Value> locations,
+ LoopLikeOpInterface loopLike,
+ AliasAnalysis &aliasAnalysis) {
+ for (Value location : locations)
+ if (aliasAnalysis.getModRef(loopLike.getOperation(), location)
+ .isModAndRef()) {
+ LDBG() << "Failure: reads location:\n"
+ << location << "\nwhich is modified inside the loop";
+ return false;
+ }
+
+ // Check that it is safe to read from all the locations before the loop.
+ std::optional<llvm::APInt> tripCount = loopLike.getStaticTripCount();
+ if (tripCount && !tripCount->isZero()) {
+ // Loop executes at least one iteration, so it is safe to hoist.
+ LDBG() << "Success: loop has non-zero iterations";
+ return true;
+ }
+
+ // Check whether the access must always be valid.
+ return llvm::all_of(
+ locations, [&](Value location) { return isNonOptionalScalar(location); });
+ // TODO: consider hoisting under condition of the loop's trip count
+ // being non-zero.
+}
+
+/// Returns true iff the given 'op' is a load-like operation,
+/// and it can be hoisted out of 'loopLike' operation.
+static bool canHoistLoad(Operation *op, LoopLikeOpInterface loopLike,
+ AliasAnalysis &aliasAnalysis) {
+ LDBG() << "Checking operation:\n" << *op;
+ if (auto effectInterface = dyn_cast<MemoryEffectOpInterface>(op)) {
+ SmallVector<MemoryEffects::EffectInstance> effects;
+ effectInterface.getEffects(effects);
+ if (effects.empty()) {
+ LDBG() << "Failure: not a load";
+ return false;
+ }
+ llvm::SetVector<Value> locations;
+ for (const MemoryEffects::EffectInstance &effect : effects) {
+ Value location = effect.getValue();
+ if (!isa<MemoryEffects::Read>(effect.getEffect())) {
+ LDBG() << "Failure: has unsupported effects";
+ return false;
+ } else if (!location) {
+ LDBG() << "Failure: reads from unknown location";
+ return false;
+ }
+ locations.insert(location);
+ }
+ return isSafeToHoistLoad(op, locations.getArrayRef(), loopLike,
+ aliasAnalysis);
+ }
+ LDBG() << "Failure: has unknown effects";
+ return false;
+}
+
+void LoopInvariantCodeMotion::runOnOperation() {
+ if (disableFlangLICM) {
+ LDBG() << "Skipping [HL]FIR LoopInvariantCodeMotion()";
+ return;
+ }
+
+ LDBG() << "Enter [HL]FIR LoopInvariantCodeMotion()";
+
+ auto &aliasAnalysis = getAnalysis<AliasAnalysis>();
+ aliasAnalysis.addAnalysisImplementation(fir::AliasAnalysis{});
+
+ std::function<bool(Operation *, LoopLikeOpInterface loopLike)>
+ shouldMoveOutOfLoop = [&](Operation *op, LoopLikeOpInterface loopLike) {
+ if (isPure(op)) {
+ LDBG() << "Pure operation: " << *op;
+ return true;
+ }
+
+ // Handle RecursivelySpeculatable operations that have
+ // RecursiveMemoryEffects by checking if all their
+ // nested operations can be hoisted.
+ auto iface = dyn_cast<ConditionallySpeculatable>(op);
+ if (iface && iface.getSpeculatability() ==
+ Speculation::RecursivelySpeculatable) {
+ if (op->hasTrait<OpTrait::HasRecursiveMemoryEffects>()) {
+ LDBG() << "Checking recursive operation:\n" << *op;
+ llvm::SmallVector<Operation *> nestedOps;
+ for (Region ®ion : op->getRegions())
+ for (Block &block : region)
+ for (Operation &nestedOp : block)
+ nestedOps.push_back(&nestedOp);
+
+ bool result = llvm::all_of(nestedOps, [&](Operation *nestedOp) {
+ return shouldMoveOutOfLoop(nestedOp, loopLike);
+ });
+ LDBG() << "Recursive operation can" << (result ? "" : "not")
+ << " be hoisted";
+
+ // If nested operations cannot be hoisted, there is nothing
+ // else to check. Also if the operation itself does not have
+ // any memory effects, we can return the result now.
+ // Otherwise, we have to check the operation itself below.
+ if (!result || !isa<MemoryEffectOpInterface>(op))
+ return result;
+ }
+ }
+ return canHoistLoad(op, loopLike, aliasAnalysis);
+ };
+
+ getOperation()->walk([&](LoopLikeOpInterface loopLike) {
+ moveLoopInvariantCode(
+ loopLike.getLoopRegions(),
+ /*isDefinedOutsideRegion=*/
+ [&](Value value, Region *) {
+ return loopLike.isDefinedOutsideOfLoop(value);
+ },
+ /*shouldMoveOutOfRegion=*/
+ [&](Operation *op, Region *) {
+ return shouldMoveOutOfLoop(op, loopLike);
+ },
+ /*moveOutOfRegion=*/
+ [&](Operation *op, Region *) { loopLike.moveOutOfLoop(op); });
+ });
+
+ LDBG() << "Exit [HL]FIR LoopInvariantCodeMotion()";
+}
diff --git a/flang/test/Analysis/AliasAnalysis/gen_mod_ref_test.py b/flang/test/Analysis/AliasAnalysis/gen_mod_ref_test.py
index ce7d9b1700bf7..953f1a2545781 100755
--- a/flang/test/Analysis/AliasAnalysis/gen_mod_ref_test.py
+++ b/flang/test/Analysis/AliasAnalysis/gen_mod_ref_test.py
@@ -7,11 +7,14 @@
This will insert mod ref test hook:
- to any fir.call to a function which name starts with "test_effect_"
- to any hlfir.declare for variable which name starts with "test_var_"
+ - to any fir.box_addr - they are assigned box_addr_# hooks
"""
import sys
import re
+box_addr_counter=0
+
for line in sys.stdin:
line = re.sub(
r"(fir.call @_\w*P)(test_effect_\w*)(\(.*) : ",
@@ -23,4 +26,11 @@
r'\1\2", test.ptr ="\2"',
line,
)
+ line, count = re.subn(
+ r'(fir.box_addr.*) :',
+ rf'\1 {{test.ptr ="box_addr_{box_addr_counter}"}} :',
+ line,
+ )
+ if count > 0:
+ box_addr_counter += 1
sys.stdout.write(line)
diff --git a/flang/test/Analysis/AliasAnalysis/modref-call-dummies.f90 b/flang/test/Analysis/AliasAnalysis/modref-call-dummies.f90
index a4c57cff70927..a81f266b87979 100644
--- a/flang/test/Analysis/AliasAnalysis/modref-call-dummies.f90
+++ b/flang/test/Analysis/AliasAnalysis/modref-call-dummies.f90
@@ -20,7 +20,7 @@ subroutine test_dummy(test_var_x)
use somemod, only : may_capture
implicit none
real :: test_var_x
- ! Capture is invalid after the call because test_var_xsaved does not have the
+ ! Capture is invalid after the call because test_var_x does not have the
! target attribute.
call may_capture(test_var_x)
call test_effect_external()
@@ -50,4 +50,61 @@ subroutine test_dummy_pointer(p)
end associate
end subroutine
! CHECK-LABEL: Testing : "_QPtest_dummy_pointer"
-! CHECK: test_effect_external -> test_var_p_target#0: ModRef
+! CHECK-DAG: test_effect_external ...
[truncated]
|
|
@llvm/pr-subscribers-flang-fir-hlfir Author: Slava Zakharin (vzakhari) ChangesThe new pass allows hoisting some Further hoisting improvements are possible in [HL]FIR LICM, This patch also contains improvements for FIR mod-ref analysis: This patch also modifies Patch is 198.99 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/173438.diff 16 Files Affected:
diff --git a/flang/include/flang/Optimizer/Transforms/Passes.td b/flang/include/flang/Optimizer/Transforms/Passes.td
index f50202784e2dc..b6b78ec7b21b0 100644
--- a/flang/include/flang/Optimizer/Transforms/Passes.td
+++ b/flang/include/flang/Optimizer/Transforms/Passes.td
@@ -596,4 +596,15 @@ def MIFOpConversion : Pass<"mif-convert", "mlir::ModuleOp"> {
let dependentDialects = ["fir::FIROpsDialect", "mlir::LLVM::LLVMDialect"];
}
+def LoopInvariantCodeMotion : Pass<"flang-licm", "::mlir::func::FuncOp"> {
+ let summary = "Hoist invariants from loops";
+ let description = [{
+ Hoist invariants from loops. This is a FIR-specific version of loop
+ invariant code motion, which relies on FIR types, operations (such as
+ fir.declare) and interfaces such as FortranObjectViewOpInterface.
+ The pass only moves existing operations, so there are no dependent
+ dialects.
+ }];
+}
+
#endif // FLANG_OPTIMIZER_TRANSFORMS_PASSES
diff --git a/flang/lib/Optimizer/Analysis/AliasAnalysis.cpp b/flang/lib/Optimizer/Analysis/AliasAnalysis.cpp
index d2656fa7b8ea8..0dc9dda3aa0f5 100644
--- a/flang/lib/Optimizer/Analysis/AliasAnalysis.cpp
+++ b/flang/lib/Optimizer/Analysis/AliasAnalysis.cpp
@@ -468,7 +468,8 @@ static ModRefResult getCallModRef(fir::CallOp call, mlir::Value var) {
// TODO: limit to Fortran functions??
// 1. Detect variables that can be accessed indirectly.
fir::AliasAnalysis aliasAnalysis;
- fir::AliasAnalysis::Source varSrc = aliasAnalysis.getSource(var);
+ fir::AliasAnalysis::Source varSrc =
+ aliasAnalysis.getSource(var, /*getLastInstantiationPoint=*/true);
// If the variable is not a user variable, we cannot safely assume that
// Fortran semantics apply (e.g., a bare alloca/allocmem result may very well
// be placed in an allocatable/pointer descriptor and escape).
@@ -498,6 +499,7 @@ static ModRefResult getCallModRef(fir::CallOp call, mlir::Value var) {
// At that stage, it has been ruled out that local (including the saved ones)
// and dummy cannot be indirectly accessed in the call.
if (varSrc.kind != fir::AliasAnalysis::SourceKind::Allocate &&
+ varSrc.kind != fir::AliasAnalysis::SourceKind::Argument &&
!varSrc.isDummyArgument()) {
if (varSrc.kind != fir::AliasAnalysis::SourceKind::Global ||
!isSavedLocal(varSrc))
@@ -523,19 +525,36 @@ static ModRefResult getCallModRef(fir::CallOp call, mlir::Value var) {
/// flow analysis to come 2) Allocate and Free effects are considered
/// modifying
ModRefResult AliasAnalysis::getModRef(Operation *op, Value location) {
- MemoryEffectOpInterface interface = dyn_cast<MemoryEffectOpInterface>(op);
- if (!interface) {
- if (auto call = llvm::dyn_cast<fir::CallOp>(op))
- return getCallModRef(call, location);
- return ModRefResult::getModAndRef();
- }
+ if (auto call = llvm::dyn_cast<fir::CallOp>(op))
+ return getCallModRef(call, location);
// Build a ModRefResult by merging the behavior of the effects of this
// operation.
+ ModRefResult result = ModRefResult::getNoModRef();
+ MemoryEffectOpInterface interface = dyn_cast<MemoryEffectOpInterface>(op);
+ if (op->hasTrait<mlir::OpTrait::HasRecursiveMemoryEffects>()) {
+ for (mlir::Region ®ion : op->getRegions()) {
+ result = result.merge(getModRef(region, location));
+ if (result.isModAndRef())
+ break;
+ }
+
+ // In MLIR, RecursiveMemoryEffects can be combined with
+ // MemoryEffectOpInterface to describe extra effects on top of the
+ // effects of the nested operations. However, the presence of
+ // RecursiveMemoryEffects and the absence of MemoryEffectOpInterface
+ // implies the operation has no other memory effects than the one of its
+ // nested operations.
+ if (!interface)
+ return result;
+ }
+
+ if (!interface || result.isModAndRef())
+ return ModRefResult::getModAndRef();
+
SmallVector<MemoryEffects::EffectInstance> effects;
interface.getEffects(effects);
- ModRefResult result = ModRefResult::getNoModRef();
for (const MemoryEffects::EffectInstance &effect : effects) {
// Check for an alias between the effect and our memory location.
@@ -563,22 +582,6 @@ ModRefResult AliasAnalysis::getModRef(mlir::Region ®ion,
mlir::Value location) {
ModRefResult result = ModRefResult::getNoModRef();
for (mlir::Operation &op : region.getOps()) {
- if (op.hasTrait<mlir::OpTrait::HasRecursiveMemoryEffects>()) {
- for (mlir::Region &subRegion : op.getRegions()) {
- result = result.merge(getModRef(subRegion, location));
- // Fast return is already mod and ref.
- if (result.isModAndRef())
- return result;
- }
- // In MLIR, RecursiveMemoryEffects can be combined with
- // MemoryEffectOpInterface to describe extra effects on top of the
- // effects of the nested operations. However, the presence of
- // RecursiveMemoryEffects and the absence of MemoryEffectOpInterface
- // implies the operation has no other memory effects than the one of its
- // nested operations.
- if (!mlir::isa<mlir::MemoryEffectOpInterface>(op))
- continue;
- }
result = result.merge(getModRef(&op, location));
if (result.isModAndRef())
return result;
@@ -674,6 +677,13 @@ AliasAnalysis::Source AliasAnalysis::getSource(mlir::Value v,
isCapturedInInternalProcedure |=
boxSrc.isCapturedInInternalProcedure;
+ if (getLastInstantiationPoint) {
+ if (!instantiationPoint)
+ instantiationPoint = boxSrc.origin.instantiationPoint;
+ } else {
+ instantiationPoint = boxSrc.origin.instantiationPoint;
+ }
+
global = llvm::dyn_cast<mlir::SymbolRefAttr>(boxSrc.origin.u);
if (global) {
type = SourceKind::Global;
diff --git a/flang/lib/Optimizer/Passes/Pipelines.cpp b/flang/lib/Optimizer/Passes/Pipelines.cpp
index 103e736accca0..038ceb1d4bd75 100644
--- a/flang/lib/Optimizer/Passes/Pipelines.cpp
+++ b/flang/lib/Optimizer/Passes/Pipelines.cpp
@@ -206,6 +206,10 @@ void createDefaultFIROptimizerPassPipeline(mlir::PassManager &pm,
pm.addPass(fir::createSimplifyRegionLite());
pm.addPass(mlir::createCSEPass());
+ // Run LICM after CSE, which may reduce the number of operations to hoist.
+ if (pc.OptLevel.isOptimizingForSpeed())
+ pm.addPass(fir::createLoopInvariantCodeMotion());
+
// Polymorphic types
pm.addPass(fir::createPolymorphicOpConversion());
pm.addPass(fir::createAssumedRankOpConversion());
diff --git a/flang/lib/Optimizer/Transforms/CMakeLists.txt b/flang/lib/Optimizer/Transforms/CMakeLists.txt
index 619f3adc67c85..fad6f34f478ba 100644
--- a/flang/lib/Optimizer/Transforms/CMakeLists.txt
+++ b/flang/lib/Optimizer/Transforms/CMakeLists.txt
@@ -1,43 +1,44 @@
add_flang_library(FIRTransforms
AbstractResult.cpp
AddAliasTags.cpp
- AffinePromotion.cpp
+ AddDebugInfo.cpp
AffineDemotion.cpp
+ AffinePromotion.cpp
+ AlgebraicSimplification.cpp
AnnotateConstant.cpp
+ ArrayValueCopy.cpp
AssumedRankOpConversion.cpp
- CharacterConversion.cpp
- CompilerGeneratedNames.cpp
- ConstantArgumentGlobalisation.cpp
- ControlFlowConverter.cpp
CUDA/CUFAllocationConversion.cpp
CUFAddConstructor.cpp
+ CUFComputeSharedMemoryOffsetsAndSize.cpp
CUFDeviceGlobal.cpp
- CUFOpConversion.cpp
CUFGPUToLLVMConversion.cpp
- CUFComputeSharedMemoryOffsetsAndSize.cpp
- ArrayValueCopy.cpp
+ CUFOpConversion.cpp
+ CharacterConversion.cpp
+ CompilerGeneratedNames.cpp
+ ConstantArgumentGlobalisation.cpp
+ ControlFlowConverter.cpp
+ ConvertComplexPow.cpp
+ DebugTypeGenerator.cpp
ExternalNameConversion.cpp
FIRToSCF.cpp
- MemoryUtils.cpp
- MemoryAllocation.cpp
- StackArrays.cpp
+ FunctionAttr.cpp
+ GenRuntimeCallsForTest.cpp
+ LoopInvariantCodeMotion.cpp
+ LoopVersioning.cpp
+ MIFOpConversion.cpp
MemRefDataFlowOpt.cpp
- SimplifyRegionLite.cpp
- AlgebraicSimplification.cpp
- SimplifyIntrinsics.cpp
- AddDebugInfo.cpp
+ MemoryAllocation.cpp
+ MemoryUtils.cpp
+ OptimizeArrayRepacking.cpp
PolymorphicOpConversion.cpp
- LoopVersioning.cpp
- StackReclaim.cpp
- VScaleAttr.cpp
- FunctionAttr.cpp
- DebugTypeGenerator.cpp
SetRuntimeCallAttributes.cpp
- GenRuntimeCallsForTest.cpp
SimplifyFIROperations.cpp
- OptimizeArrayRepacking.cpp
- ConvertComplexPow.cpp
- MIFOpConversion.cpp
+ SimplifyIntrinsics.cpp
+ SimplifyRegionLite.cpp
+ StackArrays.cpp
+ StackReclaim.cpp
+ VScaleAttr.cpp
DEPENDS
CUFAttrs
@@ -63,12 +64,14 @@ add_flang_library(FIRTransforms
MLIR_LIBS
MLIRAffineUtils
+ MLIRAnalysis
MLIRFuncDialect
MLIRGPUDialect
- MLIRLLVMDialect
MLIRLLVMCommonConversion
+ MLIRLLVMDialect
MLIRMathTransforms
MLIROpenACCDialect
MLIROpenACCToLLVMIRTranslation
MLIROpenMPDialect
+ MLIRTransformUtils
)
diff --git a/flang/lib/Optimizer/Transforms/LoopInvariantCodeMotion.cpp b/flang/lib/Optimizer/Transforms/LoopInvariantCodeMotion.cpp
new file mode 100644
index 0000000000000..c033d5e278c8d
--- /dev/null
+++ b/flang/lib/Optimizer/Transforms/LoopInvariantCodeMotion.cpp
@@ -0,0 +1,233 @@
+//===- LoopInvariantCodeMotion.cpp ----------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+/// \file
+/// FIR-specific Loop Invariant Code Motion pass.
+/// The pass relies on FIR types and interfaces to prove the safety
+/// of hoisting invariant operations out of loop-like operations.
+/// It may be run on both HLFIR and FIR representations.
+//===----------------------------------------------------------------------===//
+
+#include "flang/Optimizer/Analysis/AliasAnalysis.h"
+#include "flang/Optimizer/Dialect/FortranVariableInterface.h"
+#include "flang/Optimizer/Transforms/Passes.h"
+#include "mlir/Pass/Pass.h"
+#include "mlir/Transforms/LoopInvariantCodeMotionUtils.h"
+#include "llvm/Support/DebugLog.h"
+
+namespace fir {
+#define GEN_PASS_DEF_LOOPINVARIANTCODEMOTION
+#include "flang/Optimizer/Transforms/Passes.h.inc"
+} // namespace fir
+
+#define DEBUG_TYPE "flang-licm"
+
+// Temporary engineering option for triaging LICM.
+static llvm::cl::opt<bool> disableFlangLICM(
+ "disable-flang-licm", llvm::cl::init(false), llvm::cl::Hidden,
+ llvm::cl::desc("Disable Flang's loop invariant code motion"));
+
+namespace {
+
+using namespace mlir;
+
+/// The pass tries to hoist loop invariant operations with only
+/// MemoryEffects::Read effects (MemoryEffects::Write support
+/// may be added later).
+/// The safety of hoisting is proven by:
+/// * Proving that the loop runs at least one iteration.
+/// * Proving that is is always safe to load from this location
+/// (see isSafeToHoistLoad() comments below).
+struct LoopInvariantCodeMotion
+ : fir::impl::LoopInvariantCodeMotionBase<LoopInvariantCodeMotion> {
+ void runOnOperation() override;
+};
+
+} // namespace
+
+/// 'location' is a memory reference used by a memory access.
+/// The type of 'location' defines the data type of the access
+/// (e.g. it is considered to be invalid to access 'i64'
+/// data using '!fir.ref<i32>`).
+/// For the given location, this function returns true iff
+/// the Fortran object being accessed is a scalar that
+/// may not be OPTIONAL.
+///
+/// Note that the '!fir.ref<!fir.box<>>' accesses are considered
+/// to be scalar, even if the underlying data is an array.
+///
+/// Note that an access of '!fir.ref<scalar>' may access
+/// an array object. For example:
+/// real :: x(:)
+/// do i=...
+/// = x(10)
+/// 'x(10)' accesses array 'x', and it may be unsafe to hoist
+/// it without proving that '10' is a valid index for the array.
+/// The fact that 'x' is not OPTIONAL does not allow hoisting
+/// on its own.
+static bool isNonOptionalScalar(Value location) {
+ while (true) {
+ LDBG() << "Checking location:\n" << location;
+ Type dataType = fir::unwrapRefType(location.getType());
+ if (!isa<fir::BaseBoxType>(location.getType()) &&
+ (!dataType ||
+ (!isa<fir::BaseBoxType>(dataType) && !fir::isa_trivial(dataType) &&
+ !fir::isa_derived(dataType)))) {
+ LDBG() << "Failure: data access is not scalar";
+ return false;
+ }
+ Operation *defOp = location.getDefiningOp();
+ if (!defOp) {
+ LDBG() << "Failure: no defining operation";
+ return false;
+ }
+ if (auto varIface = dyn_cast<fir::FortranVariableOpInterface>(defOp)) {
+ bool result = !varIface.isOptional();
+ if (result)
+ LDBG() << "Success: is non optional scalar";
+ else
+ LDBG() << "Failure: is not non optional scalar";
+ return result;
+ }
+ if (auto viewIface = dyn_cast<fir::FortranObjectViewOpInterface>(defOp)) {
+ location = viewIface.getViewSource(cast<OpResult>(location));
+ } else {
+ LDBG() << "Failure: unknown operation:\n" << *defOp;
+ return false;
+ }
+ }
+}
+
+/// Returns true iff it is safe to hoist the given load-like operation 'op',
+/// which access given memory 'locations', out of the operation 'loopLike'.
+/// The current safety conditions are:
+/// * The loop runs at least one iteration, OR
+/// * all the accessed locations are inside scalar non-OPTIONAL
+/// Fortran objects (Fortran descriptors are considered to be scalars).
+static bool isSafeToHoistLoad(Operation *op, ArrayRef<Value> locations,
+ LoopLikeOpInterface loopLike,
+ AliasAnalysis &aliasAnalysis) {
+ for (Value location : locations)
+ if (aliasAnalysis.getModRef(loopLike.getOperation(), location)
+ .isModAndRef()) {
+ LDBG() << "Failure: reads location:\n"
+ << location << "\nwhich is modified inside the loop";
+ return false;
+ }
+
+ // Check that it is safe to read from all the locations before the loop.
+ std::optional<llvm::APInt> tripCount = loopLike.getStaticTripCount();
+ if (tripCount && !tripCount->isZero()) {
+ // Loop executes at least one iteration, so it is safe to hoist.
+ LDBG() << "Success: loop has non-zero iterations";
+ return true;
+ }
+
+ // Check whether the access must always be valid.
+ return llvm::all_of(
+ locations, [&](Value location) { return isNonOptionalScalar(location); });
+ // TODO: consider hoisting under condition of the loop's trip count
+ // being non-zero.
+}
+
+/// Returns true iff the given 'op' is a load-like operation,
+/// and it can be hoisted out of 'loopLike' operation.
+static bool canHoistLoad(Operation *op, LoopLikeOpInterface loopLike,
+ AliasAnalysis &aliasAnalysis) {
+ LDBG() << "Checking operation:\n" << *op;
+ if (auto effectInterface = dyn_cast<MemoryEffectOpInterface>(op)) {
+ SmallVector<MemoryEffects::EffectInstance> effects;
+ effectInterface.getEffects(effects);
+ if (effects.empty()) {
+ LDBG() << "Failure: not a load";
+ return false;
+ }
+ llvm::SetVector<Value> locations;
+ for (const MemoryEffects::EffectInstance &effect : effects) {
+ Value location = effect.getValue();
+ if (!isa<MemoryEffects::Read>(effect.getEffect())) {
+ LDBG() << "Failure: has unsupported effects";
+ return false;
+ } else if (!location) {
+ LDBG() << "Failure: reads from unknown location";
+ return false;
+ }
+ locations.insert(location);
+ }
+ return isSafeToHoistLoad(op, locations.getArrayRef(), loopLike,
+ aliasAnalysis);
+ }
+ LDBG() << "Failure: has unknown effects";
+ return false;
+}
+
+void LoopInvariantCodeMotion::runOnOperation() {
+ if (disableFlangLICM) {
+ LDBG() << "Skipping [HL]FIR LoopInvariantCodeMotion()";
+ return;
+ }
+
+ LDBG() << "Enter [HL]FIR LoopInvariantCodeMotion()";
+
+ auto &aliasAnalysis = getAnalysis<AliasAnalysis>();
+ aliasAnalysis.addAnalysisImplementation(fir::AliasAnalysis{});
+
+ std::function<bool(Operation *, LoopLikeOpInterface loopLike)>
+ shouldMoveOutOfLoop = [&](Operation *op, LoopLikeOpInterface loopLike) {
+ if (isPure(op)) {
+ LDBG() << "Pure operation: " << *op;
+ return true;
+ }
+
+ // Handle RecursivelySpeculatable operations that have
+ // RecursiveMemoryEffects by checking if all their
+ // nested operations can be hoisted.
+ auto iface = dyn_cast<ConditionallySpeculatable>(op);
+ if (iface && iface.getSpeculatability() ==
+ Speculation::RecursivelySpeculatable) {
+ if (op->hasTrait<OpTrait::HasRecursiveMemoryEffects>()) {
+ LDBG() << "Checking recursive operation:\n" << *op;
+ llvm::SmallVector<Operation *> nestedOps;
+ for (Region ®ion : op->getRegions())
+ for (Block &block : region)
+ for (Operation &nestedOp : block)
+ nestedOps.push_back(&nestedOp);
+
+ bool result = llvm::all_of(nestedOps, [&](Operation *nestedOp) {
+ return shouldMoveOutOfLoop(nestedOp, loopLike);
+ });
+ LDBG() << "Recursive operation can" << (result ? "" : "not")
+ << " be hoisted";
+
+ // If nested operations cannot be hoisted, there is nothing
+ // else to check. Also if the operation itself does not have
+ // any memory effects, we can return the result now.
+ // Otherwise, we have to check the operation itself below.
+ if (!result || !isa<MemoryEffectOpInterface>(op))
+ return result;
+ }
+ }
+ return canHoistLoad(op, loopLike, aliasAnalysis);
+ };
+
+ getOperation()->walk([&](LoopLikeOpInterface loopLike) {
+ moveLoopInvariantCode(
+ loopLike.getLoopRegions(),
+ /*isDefinedOutsideRegion=*/
+ [&](Value value, Region *) {
+ return loopLike.isDefinedOutsideOfLoop(value);
+ },
+ /*shouldMoveOutOfRegion=*/
+ [&](Operation *op, Region *) {
+ return shouldMoveOutOfLoop(op, loopLike);
+ },
+ /*moveOutOfRegion=*/
+ [&](Operation *op, Region *) { loopLike.moveOutOfLoop(op); });
+ });
+
+ LDBG() << "Exit [HL]FIR LoopInvariantCodeMotion()";
+}
diff --git a/flang/test/Analysis/AliasAnalysis/gen_mod_ref_test.py b/flang/test/Analysis/AliasAnalysis/gen_mod_ref_test.py
index ce7d9b1700bf7..953f1a2545781 100755
--- a/flang/test/Analysis/AliasAnalysis/gen_mod_ref_test.py
+++ b/flang/test/Analysis/AliasAnalysis/gen_mod_ref_test.py
@@ -7,11 +7,14 @@
This will insert mod ref test hook:
- to any fir.call to a function which name starts with "test_effect_"
- to any hlfir.declare for variable which name starts with "test_var_"
+ - to any fir.box_addr - they are assigned box_addr_# hooks
"""
import sys
import re
+box_addr_counter=0
+
for line in sys.stdin:
line = re.sub(
r"(fir.call @_\w*P)(test_effect_\w*)(\(.*) : ",
@@ -23,4 +26,11 @@
r'\1\2", test.ptr ="\2"',
line,
)
+ line, count = re.subn(
+ r'(fir.box_addr.*) :',
+ rf'\1 {{test.ptr ="box_addr_{box_addr_counter}"}} :',
+ line,
+ )
+ if count > 0:
+ box_addr_counter += 1
sys.stdout.write(line)
diff --git a/flang/test/Analysis/AliasAnalysis/modref-call-dummies.f90 b/flang/test/Analysis/AliasAnalysis/modref-call-dummies.f90
index a4c57cff70927..a81f266b87979 100644
--- a/flang/test/Analysis/AliasAnalysis/modref-call-dummies.f90
+++ b/flang/test/Analysis/AliasAnalysis/modref-call-dummies.f90
@@ -20,7 +20,7 @@ subroutine test_dummy(test_var_x)
use somemod, only : may_capture
implicit none
real :: test_var_x
- ! Capture is invalid after the call because test_var_xsaved does not have the
+ ! Capture is invalid after the call because test_var_x does not have the
! target attribute.
call may_capture(test_var_x)
call test_effect_external()
@@ -50,4 +50,61 @@ subroutine test_dummy_pointer(p)
end associate
end subroutine
! CHECK-LABEL: Testing : "_QPtest_dummy_pointer"
-! CHECK: test_effect_external -> test_var_p_target#0: ModRef
+! CHECK-DAG: test_effect_external ...
[truncated]
|
|
✅ With the latest revision this PR passed the Python code formatter. |
LICM (llvm#173438) may insert new operations at the beginning of `fir.do_concurrent`'s block and they cannot be always hoisted to the alloca-block of the parent operation. This patch only moves `fir.alloca`s into the alloca-block, and moves all other operations right before fir.do_concurrent.
|
This should be merged after #173502 to avoid new test failures. |
tblah
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting stuff - again!
First pass looking at this, I will have a second look this week.
| // An example of something currently problematic is the allocmem generated for | ||
| // ALLOCATE of allocatable target. It currently does not have the target | ||
| // attribute, which would lead this analysis to believe it cannot escape. | ||
| if (!varSrc.isFortranUserVariable() || !isCallToFortranUserProcedure(call)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean that the above TODOs:
- Limit to fortran functions
- Skip compiler generated temps
Are complete and can be removed?
Not fully related to your patch: I am trying to understand how ready getCallModRef is for use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review, Tom!
I do not think this TODO is resolved, so the check for isFortranUserVariable should be necessary still.
I would like to look further into isCallToFortranUserProcedure check so that we can handle Fortran runtime calls better (e.g. based on the memroy access attributes that we are able to figure out in SetRuntimeCallAttributesPass).
I beleive there is a room for improvement in the mod-ref analysis, but it already works for some cases that Jean addressed when he initially added the implementation.
|
|
||
| // Run LICM after CSE, which may reduce the number of operations to hoist. | ||
| if (pc.OptLevel.isOptimizingForSpeed()) | ||
| pm.addPass(fir::createLoopInvariantCodeMotion()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this doesn't currently lead to performance improvements, do we need to enable it by default? I'm worried this might be a relatively expensive pass.
I'm curious what the motivation is here more than anything. If there is a plan and you haven't observed serious compile time regressions then I am happy to see this on by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still need to measure performance, but my expectation is that this patch won't change anything. You have a good point about the compilation time, and I will measure it.
There is one example related to private copies of descriptors in OpenACC, which LICM can help to address (though, I guess this is not the only way to solve it). In general, I would like to be able to optimize code at MLIR level, which otherwise would require more detailed alias-info representation in LLVM IR. In addition, such early MLIR optimizations may help with enabling more MLIR optimizations (like the ones related to Affine transformations).
I agree with you that it is probably too early to enable this by default, and I will move it under an option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
I'm not really opposed to having this on by default so long as you don't find serious compilation time regressions (I agree this pass looks useful in the long term), but feel free to disable by default for now if you prefer.
LICM (#173438) may insert new operations at the beginning of `fir.do_concurrent`'s block and they cannot be always hoisted to the alloca-block of the parent operation. This patch only moves `fir.alloca`s into the alloca-block, and moves all other operations right before fir.do_concurrent.
| return canHoistLoad(op, loopLike, aliasAnalysis); | ||
| }; | ||
|
|
||
| getOperation()->walk([&](LoopLikeOpInterface loopLike) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please could you check for mlir::omp::OutlinableOpenMPOpInterface and avoid LICM in that case. These are operations whose body will be outlined (and could be run in a different thread). I'm not sure that we will ever have a LoopLikeOp which is outlinable but it is better to be safe.
Similarly, please could you make sure not to outline operations out of loops and into something that is a mlir::omp::LoopWrapperInterface. The loop wrappers are assumed to contain only a nested loop wrapper or a loop, and a terminator. Any other operations would currently crash the compiler. For example in,
omp.private {type = private} @_QFtestEi_private_i32 : i32
func.func @_QPtest() {
%0 = fir.dummy_scope : !fir.dscope
%1 = fir.alloca i32 {bindc_name = "i", uniq_name = "_QFtestEi"}
%2:2 = hlfir.declare %1 {uniq_name = "_QFtestEi"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
%c1_i32 = arith.constant 1 : i32
%c100_i32 = arith.constant 100 : i32
%c1_i32_0 = arith.constant 1 : i32
omp.wsloop private(@_QFtestEi_private_i32 %2#0 -> %arg0 : !fir.ref<i32>) {
omp.loop_nest (%arg1) : i32 = (%c1_i32) to (%c100_i32) inclusive step (%c1_i32_0) {
%3:2 = hlfir.declare %arg0 {uniq_name = "_QFtestEi"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
hlfir.assign %arg1 to %3#0 : i32, !fir.ref<i32>
fir.call @_QPdo_something(%3#0) fastmath<contract> : (!fir.ref<i32>) -> ()
omp.yield
}
}
return
}
func.func private @_QPdo_something(!fir.ref<i32>)
The omp.wsloop operation must contain only the omp.loop_nest and the (implicit) terminator. It would be valid to keep going and move to the first non-loop wrapper parent (beware nested loop wrappers). But again, make sure nothing is ever moved out of a outlinable operation.
| /// Returns true iff it is safe to hoist the given load-like operation 'op', | ||
| /// which access given memory 'locations', out of the operation 'loopLike'. | ||
| /// The current safety conditions are: | ||
| /// * The loop runs at least one iteration, OR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about something like
integer, allocatable :: a
integer, pointer :: p
! a is not allocated
! p is not associated
do i=1,2
if (.false.)
load_like(a)
load_like(p)
endif
enddo
| return result; | ||
| } | ||
| } | ||
| return canHoistLoad(op, loopLike, aliasAnalysis); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be another PR, but looking at #62419, I think it could be useful to hoist operations with allocation effects too.
The new pass allows hoisting some
fir.loadoperations earlyin MLIR. For example, many descriptor load might be hoisted
out of the loops, though it does not make much difference
in performance, because LLVM is able to optimize such loads
(which are lowered as
llvm.memcpyinto temporary descriptors),given that proper TBAA information is generated by Flang.
Further hoisting improvements are possible in [HL]FIR LICM,
e.g. getting proper mod-ref results for Fortran runtime calls
may allow hoisting loads from global variables, which LLVM
cannot do due to lack of alias information.
This patch also contains improvements for FIR mod-ref analysis:
We may recurse into
HasRecursiveMemoryEffectsoperations anduse
getModRefrecursively to get more precise results forregions with
fir.calloperations.This patch also modifies
AliasAnalysisto set the instantiationpoint for cases where the tracked data is accessed through a load
from
!fir.ref<!fir.box<>>: without this change the mod-refanalysis was not able to recognize user pointer/allocatable variables.