Skip to content

Conversation

@vzakhari
Copy link
Contributor

The new pass allows hoisting some fir.load operations early
in MLIR. For example, many descriptor load might be hoisted
out of the loops, though it does not make much difference
in performance, because LLVM is able to optimize such loads
(which are lowered as llvm.memcpy into temporary descriptors),
given that proper TBAA information is generated by Flang.

Further hoisting improvements are possible in [HL]FIR LICM,
e.g. getting proper mod-ref results for Fortran runtime calls
may allow hoisting loads from global variables, which LLVM
cannot do due to lack of alias information.

This patch also contains improvements for FIR mod-ref analysis:
We may recurse into HasRecursiveMemoryEffects operations and
use getModRef recursively to get more precise results for
regions with fir.call operations.

This patch also modifies AliasAnalysis to set the instantiation
point for cases where the tracked data is accessed through a load
from !fir.ref<!fir.box<>>: without this change the mod-ref
analysis was not able to recognize user pointer/allocatable variables.

We may recurse into `HasRecursiveMemoryEffects` operations and
use `getModRef` recursively to get more precise results for
regions with `fir.call` operations.

This patch also modifies `AliasAnalysis` to set the instantiation
point for cases where the tracked data is accessed through a load
from `!fir.ref<!fir.box<>>`: without this change the mod-ref
analysis was not able to recognize user pointer/allocatable variables.
The new pass allows hoisting some `fir.load` operations early
in MLIR. For example, many descriptor load might be hoisted
out of the loops, though it does not make much difference
in performance, because LLVM is able to optimize such loads
(which are lowered as `llvm.memcpy` into temporary descriptors),
given that proper TBAA information is generated by Flang.

Further hoisting improvements are possible in [HL]FIR LICM,
e.g. getting proper mod-ref results for Fortran runtime calls
may allow hoisting loads from global variables, which LLVM
cannot do due to lack of alias information.
@llvmbot llvmbot added flang:driver flang Flang issues not falling into any other category flang:fir-hlfir labels Dec 24, 2025
@llvmbot
Copy link
Member

llvmbot commented Dec 24, 2025

@llvm/pr-subscribers-flang-driver

Author: Slava Zakharin (vzakhari)

Changes

The new pass allows hoisting some fir.load operations early
in MLIR. For example, many descriptor load might be hoisted
out of the loops, though it does not make much difference
in performance, because LLVM is able to optimize such loads
(which are lowered as llvm.memcpy into temporary descriptors),
given that proper TBAA information is generated by Flang.

Further hoisting improvements are possible in [HL]FIR LICM,
e.g. getting proper mod-ref results for Fortran runtime calls
may allow hoisting loads from global variables, which LLVM
cannot do due to lack of alias information.

This patch also contains improvements for FIR mod-ref analysis:
We may recurse into HasRecursiveMemoryEffects operations and
use getModRef recursively to get more precise results for
regions with fir.call operations.

This patch also modifies AliasAnalysis to set the instantiation
point for cases where the tracked data is accessed through a load
from !fir.ref&lt;!fir.box&lt;&gt;&gt;: without this change the mod-ref
analysis was not able to recognize user pointer/allocatable variables.


Patch is 198.99 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/173438.diff

16 Files Affected:

  • (modified) flang/include/flang/Optimizer/Transforms/Passes.td (+11)
  • (modified) flang/lib/Optimizer/Analysis/AliasAnalysis.cpp (+34-24)
  • (modified) flang/lib/Optimizer/Passes/Pipelines.cpp (+4)
  • (modified) flang/lib/Optimizer/Transforms/CMakeLists.txt (+28-25)
  • (added) flang/lib/Optimizer/Transforms/LoopInvariantCodeMotion.cpp (+233)
  • (modified) flang/test/Analysis/AliasAnalysis/gen_mod_ref_test.py (+10)
  • (modified) flang/test/Analysis/AliasAnalysis/modref-call-dummies.f90 (+59-2)
  • (modified) flang/test/Analysis/AliasAnalysis/modref-call-globals.f90 (+22)
  • (added) flang/test/Analysis/AliasAnalysis/modref-call-recursive.fir (+62)
  • (modified) flang/test/Driver/bbc-mlir-pass-pipeline.f90 (+3)
  • (modified) flang/test/Driver/mlir-pass-pipeline.f90 (+3)
  • (modified) flang/test/Fir/basic-program.fir (+3)
  • (modified) flang/test/Lower/array-expression-assumed-size.f90 (+164-159)
  • (modified) flang/test/Lower/array-temp.f90 (+262-250)
  • (modified) flang/test/Lower/vector-subscript-io.f90 (+5-5)
  • (added) flang/test/Transforms/licm.fir (+1326)
diff --git a/flang/include/flang/Optimizer/Transforms/Passes.td b/flang/include/flang/Optimizer/Transforms/Passes.td
index f50202784e2dc..b6b78ec7b21b0 100644
--- a/flang/include/flang/Optimizer/Transforms/Passes.td
+++ b/flang/include/flang/Optimizer/Transforms/Passes.td
@@ -596,4 +596,15 @@ def MIFOpConversion : Pass<"mif-convert", "mlir::ModuleOp"> {
   let dependentDialects = ["fir::FIROpsDialect", "mlir::LLVM::LLVMDialect"];
 }
 
+def LoopInvariantCodeMotion : Pass<"flang-licm", "::mlir::func::FuncOp"> {
+  let summary = "Hoist invariants from loops";
+  let description = [{
+    Hoist invariants from loops. This is a FIR-specific version of loop
+    invariant code motion, which relies on FIR types, operations (such as
+    fir.declare) and interfaces such as FortranObjectViewOpInterface.
+    The pass only moves existing operations, so there are no dependent
+    dialects.
+  }];
+}
+
 #endif // FLANG_OPTIMIZER_TRANSFORMS_PASSES
diff --git a/flang/lib/Optimizer/Analysis/AliasAnalysis.cpp b/flang/lib/Optimizer/Analysis/AliasAnalysis.cpp
index d2656fa7b8ea8..0dc9dda3aa0f5 100644
--- a/flang/lib/Optimizer/Analysis/AliasAnalysis.cpp
+++ b/flang/lib/Optimizer/Analysis/AliasAnalysis.cpp
@@ -468,7 +468,8 @@ static ModRefResult getCallModRef(fir::CallOp call, mlir::Value var) {
   // TODO: limit to Fortran functions??
   // 1. Detect variables that can be accessed indirectly.
   fir::AliasAnalysis aliasAnalysis;
-  fir::AliasAnalysis::Source varSrc = aliasAnalysis.getSource(var);
+  fir::AliasAnalysis::Source varSrc =
+      aliasAnalysis.getSource(var, /*getLastInstantiationPoint=*/true);
   // If the variable is not a user variable, we cannot safely assume that
   // Fortran semantics apply (e.g., a bare alloca/allocmem result may very well
   // be placed in an allocatable/pointer descriptor and escape).
@@ -498,6 +499,7 @@ static ModRefResult getCallModRef(fir::CallOp call, mlir::Value var) {
   // At that stage, it has been ruled out that local (including the saved ones)
   // and dummy cannot be indirectly accessed in the call.
   if (varSrc.kind != fir::AliasAnalysis::SourceKind::Allocate &&
+      varSrc.kind != fir::AliasAnalysis::SourceKind::Argument &&
       !varSrc.isDummyArgument()) {
     if (varSrc.kind != fir::AliasAnalysis::SourceKind::Global ||
         !isSavedLocal(varSrc))
@@ -523,19 +525,36 @@ static ModRefResult getCallModRef(fir::CallOp call, mlir::Value var) {
 /// flow analysis to come 2) Allocate and Free effects are considered
 /// modifying
 ModRefResult AliasAnalysis::getModRef(Operation *op, Value location) {
-  MemoryEffectOpInterface interface = dyn_cast<MemoryEffectOpInterface>(op);
-  if (!interface) {
-    if (auto call = llvm::dyn_cast<fir::CallOp>(op))
-      return getCallModRef(call, location);
-    return ModRefResult::getModAndRef();
-  }
+  if (auto call = llvm::dyn_cast<fir::CallOp>(op))
+    return getCallModRef(call, location);
 
   // Build a ModRefResult by merging the behavior of the effects of this
   // operation.
+  ModRefResult result = ModRefResult::getNoModRef();
+  MemoryEffectOpInterface interface = dyn_cast<MemoryEffectOpInterface>(op);
+  if (op->hasTrait<mlir::OpTrait::HasRecursiveMemoryEffects>()) {
+    for (mlir::Region &region : op->getRegions()) {
+      result = result.merge(getModRef(region, location));
+      if (result.isModAndRef())
+        break;
+    }
+
+    // In MLIR, RecursiveMemoryEffects can be combined with
+    // MemoryEffectOpInterface to describe extra effects on top of the
+    // effects of the nested operations.  However, the presence of
+    // RecursiveMemoryEffects and the absence of MemoryEffectOpInterface
+    // implies the operation has no other memory effects than the one of its
+    // nested operations.
+    if (!interface)
+      return result;
+  }
+
+  if (!interface || result.isModAndRef())
+    return ModRefResult::getModAndRef();
+
   SmallVector<MemoryEffects::EffectInstance> effects;
   interface.getEffects(effects);
 
-  ModRefResult result = ModRefResult::getNoModRef();
   for (const MemoryEffects::EffectInstance &effect : effects) {
 
     // Check for an alias between the effect and our memory location.
@@ -563,22 +582,6 @@ ModRefResult AliasAnalysis::getModRef(mlir::Region &region,
                                       mlir::Value location) {
   ModRefResult result = ModRefResult::getNoModRef();
   for (mlir::Operation &op : region.getOps()) {
-    if (op.hasTrait<mlir::OpTrait::HasRecursiveMemoryEffects>()) {
-      for (mlir::Region &subRegion : op.getRegions()) {
-        result = result.merge(getModRef(subRegion, location));
-        // Fast return is already mod and ref.
-        if (result.isModAndRef())
-          return result;
-      }
-      // In MLIR, RecursiveMemoryEffects can be combined with
-      // MemoryEffectOpInterface to describe extra effects on top of the
-      // effects of the nested operations.  However, the presence of
-      // RecursiveMemoryEffects and the absence of MemoryEffectOpInterface
-      // implies the operation has no other memory effects than the one of its
-      // nested operations.
-      if (!mlir::isa<mlir::MemoryEffectOpInterface>(op))
-        continue;
-    }
     result = result.merge(getModRef(&op, location));
     if (result.isModAndRef())
       return result;
@@ -674,6 +677,13 @@ AliasAnalysis::Source AliasAnalysis::getSource(mlir::Value v,
             isCapturedInInternalProcedure |=
                 boxSrc.isCapturedInInternalProcedure;
 
+            if (getLastInstantiationPoint) {
+              if (!instantiationPoint)
+                instantiationPoint = boxSrc.origin.instantiationPoint;
+            } else {
+              instantiationPoint = boxSrc.origin.instantiationPoint;
+            }
+
             global = llvm::dyn_cast<mlir::SymbolRefAttr>(boxSrc.origin.u);
             if (global) {
               type = SourceKind::Global;
diff --git a/flang/lib/Optimizer/Passes/Pipelines.cpp b/flang/lib/Optimizer/Passes/Pipelines.cpp
index 103e736accca0..038ceb1d4bd75 100644
--- a/flang/lib/Optimizer/Passes/Pipelines.cpp
+++ b/flang/lib/Optimizer/Passes/Pipelines.cpp
@@ -206,6 +206,10 @@ void createDefaultFIROptimizerPassPipeline(mlir::PassManager &pm,
   pm.addPass(fir::createSimplifyRegionLite());
   pm.addPass(mlir::createCSEPass());
 
+  // Run LICM after CSE, which may reduce the number of operations to hoist.
+  if (pc.OptLevel.isOptimizingForSpeed())
+    pm.addPass(fir::createLoopInvariantCodeMotion());
+
   // Polymorphic types
   pm.addPass(fir::createPolymorphicOpConversion());
   pm.addPass(fir::createAssumedRankOpConversion());
diff --git a/flang/lib/Optimizer/Transforms/CMakeLists.txt b/flang/lib/Optimizer/Transforms/CMakeLists.txt
index 619f3adc67c85..fad6f34f478ba 100644
--- a/flang/lib/Optimizer/Transforms/CMakeLists.txt
+++ b/flang/lib/Optimizer/Transforms/CMakeLists.txt
@@ -1,43 +1,44 @@
 add_flang_library(FIRTransforms
   AbstractResult.cpp
   AddAliasTags.cpp
-  AffinePromotion.cpp
+  AddDebugInfo.cpp
   AffineDemotion.cpp
+  AffinePromotion.cpp
+  AlgebraicSimplification.cpp
   AnnotateConstant.cpp
+  ArrayValueCopy.cpp
   AssumedRankOpConversion.cpp
-  CharacterConversion.cpp
-  CompilerGeneratedNames.cpp
-  ConstantArgumentGlobalisation.cpp
-  ControlFlowConverter.cpp
   CUDA/CUFAllocationConversion.cpp
   CUFAddConstructor.cpp
+  CUFComputeSharedMemoryOffsetsAndSize.cpp
   CUFDeviceGlobal.cpp
-  CUFOpConversion.cpp
   CUFGPUToLLVMConversion.cpp
-  CUFComputeSharedMemoryOffsetsAndSize.cpp
-  ArrayValueCopy.cpp
+  CUFOpConversion.cpp
+  CharacterConversion.cpp
+  CompilerGeneratedNames.cpp
+  ConstantArgumentGlobalisation.cpp
+  ControlFlowConverter.cpp
+  ConvertComplexPow.cpp
+  DebugTypeGenerator.cpp
   ExternalNameConversion.cpp
   FIRToSCF.cpp
-  MemoryUtils.cpp
-  MemoryAllocation.cpp
-  StackArrays.cpp
+  FunctionAttr.cpp
+  GenRuntimeCallsForTest.cpp
+  LoopInvariantCodeMotion.cpp
+  LoopVersioning.cpp
+  MIFOpConversion.cpp
   MemRefDataFlowOpt.cpp
-  SimplifyRegionLite.cpp
-  AlgebraicSimplification.cpp
-  SimplifyIntrinsics.cpp
-  AddDebugInfo.cpp
+  MemoryAllocation.cpp
+  MemoryUtils.cpp
+  OptimizeArrayRepacking.cpp
   PolymorphicOpConversion.cpp
-  LoopVersioning.cpp
-  StackReclaim.cpp
-  VScaleAttr.cpp
-  FunctionAttr.cpp
-  DebugTypeGenerator.cpp
   SetRuntimeCallAttributes.cpp
-  GenRuntimeCallsForTest.cpp
   SimplifyFIROperations.cpp
-  OptimizeArrayRepacking.cpp
-  ConvertComplexPow.cpp
-  MIFOpConversion.cpp
+  SimplifyIntrinsics.cpp
+  SimplifyRegionLite.cpp
+  StackArrays.cpp
+  StackReclaim.cpp
+  VScaleAttr.cpp
 
   DEPENDS
   CUFAttrs
@@ -63,12 +64,14 @@ add_flang_library(FIRTransforms
 
   MLIR_LIBS
   MLIRAffineUtils
+  MLIRAnalysis
   MLIRFuncDialect
   MLIRGPUDialect
-  MLIRLLVMDialect
   MLIRLLVMCommonConversion
+  MLIRLLVMDialect
   MLIRMathTransforms
   MLIROpenACCDialect
   MLIROpenACCToLLVMIRTranslation
   MLIROpenMPDialect
+  MLIRTransformUtils
 )
diff --git a/flang/lib/Optimizer/Transforms/LoopInvariantCodeMotion.cpp b/flang/lib/Optimizer/Transforms/LoopInvariantCodeMotion.cpp
new file mode 100644
index 0000000000000..c033d5e278c8d
--- /dev/null
+++ b/flang/lib/Optimizer/Transforms/LoopInvariantCodeMotion.cpp
@@ -0,0 +1,233 @@
+//===- LoopInvariantCodeMotion.cpp ----------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+/// \file
+/// FIR-specific Loop Invariant Code Motion pass.
+/// The pass relies on FIR types and interfaces to prove the safety
+/// of hoisting invariant operations out of loop-like operations.
+/// It may be run on both HLFIR and FIR representations.
+//===----------------------------------------------------------------------===//
+
+#include "flang/Optimizer/Analysis/AliasAnalysis.h"
+#include "flang/Optimizer/Dialect/FortranVariableInterface.h"
+#include "flang/Optimizer/Transforms/Passes.h"
+#include "mlir/Pass/Pass.h"
+#include "mlir/Transforms/LoopInvariantCodeMotionUtils.h"
+#include "llvm/Support/DebugLog.h"
+
+namespace fir {
+#define GEN_PASS_DEF_LOOPINVARIANTCODEMOTION
+#include "flang/Optimizer/Transforms/Passes.h.inc"
+} // namespace fir
+
+#define DEBUG_TYPE "flang-licm"
+
+// Temporary engineering option for triaging LICM.
+static llvm::cl::opt<bool> disableFlangLICM(
+    "disable-flang-licm", llvm::cl::init(false), llvm::cl::Hidden,
+    llvm::cl::desc("Disable Flang's loop invariant code motion"));
+
+namespace {
+
+using namespace mlir;
+
+/// The pass tries to hoist loop invariant operations with only
+/// MemoryEffects::Read effects (MemoryEffects::Write support
+/// may be added later).
+/// The safety of hoisting is proven by:
+///   * Proving that the loop runs at least one iteration.
+///   * Proving that is is always safe to load from this location
+///     (see isSafeToHoistLoad() comments below).
+struct LoopInvariantCodeMotion
+    : fir::impl::LoopInvariantCodeMotionBase<LoopInvariantCodeMotion> {
+  void runOnOperation() override;
+};
+
+} // namespace
+
+/// 'location' is a memory reference used by a memory access.
+/// The type of 'location' defines the data type of the access
+/// (e.g. it is considered to be invalid to access 'i64'
+/// data using '!fir.ref<i32>`).
+/// For the given location, this function returns true iff
+/// the Fortran object being accessed is a scalar that
+/// may not be OPTIONAL.
+///
+/// Note that the '!fir.ref<!fir.box<>>' accesses are considered
+/// to be scalar, even if the underlying data is an array.
+///
+/// Note that an access of '!fir.ref<scalar>' may access
+/// an array object. For example:
+///   real :: x(:)
+///   do i=...
+///     = x(10)
+/// 'x(10)' accesses array 'x', and it may be unsafe to hoist
+/// it without proving that '10' is a valid index for the array.
+/// The fact that 'x' is not OPTIONAL does not allow hoisting
+/// on its own.
+static bool isNonOptionalScalar(Value location) {
+  while (true) {
+    LDBG() << "Checking location:\n" << location;
+    Type dataType = fir::unwrapRefType(location.getType());
+    if (!isa<fir::BaseBoxType>(location.getType()) &&
+        (!dataType ||
+         (!isa<fir::BaseBoxType>(dataType) && !fir::isa_trivial(dataType) &&
+          !fir::isa_derived(dataType)))) {
+      LDBG() << "Failure: data access is not scalar";
+      return false;
+    }
+    Operation *defOp = location.getDefiningOp();
+    if (!defOp) {
+      LDBG() << "Failure: no defining operation";
+      return false;
+    }
+    if (auto varIface = dyn_cast<fir::FortranVariableOpInterface>(defOp)) {
+      bool result = !varIface.isOptional();
+      if (result)
+        LDBG() << "Success: is non optional scalar";
+      else
+        LDBG() << "Failure: is not non optional scalar";
+      return result;
+    }
+    if (auto viewIface = dyn_cast<fir::FortranObjectViewOpInterface>(defOp)) {
+      location = viewIface.getViewSource(cast<OpResult>(location));
+    } else {
+      LDBG() << "Failure: unknown operation:\n" << *defOp;
+      return false;
+    }
+  }
+}
+
+/// Returns true iff it is safe to hoist the given load-like operation 'op',
+/// which access given memory 'locations', out of the operation 'loopLike'.
+/// The current safety conditions are:
+///   * The loop runs at least one iteration, OR
+///   * all the accessed locations are inside scalar non-OPTIONAL
+///     Fortran objects (Fortran descriptors are considered to be scalars).
+static bool isSafeToHoistLoad(Operation *op, ArrayRef<Value> locations,
+                              LoopLikeOpInterface loopLike,
+                              AliasAnalysis &aliasAnalysis) {
+  for (Value location : locations)
+    if (aliasAnalysis.getModRef(loopLike.getOperation(), location)
+            .isModAndRef()) {
+      LDBG() << "Failure: reads location:\n"
+             << location << "\nwhich is modified inside the loop";
+      return false;
+    }
+
+  // Check that it is safe to read from all the locations before the loop.
+  std::optional<llvm::APInt> tripCount = loopLike.getStaticTripCount();
+  if (tripCount && !tripCount->isZero()) {
+    // Loop executes at least one iteration, so it is safe to hoist.
+    LDBG() << "Success: loop has non-zero iterations";
+    return true;
+  }
+
+  // Check whether the access must always be valid.
+  return llvm::all_of(
+      locations, [&](Value location) { return isNonOptionalScalar(location); });
+  // TODO: consider hoisting under condition of the loop's trip count
+  // being non-zero.
+}
+
+/// Returns true iff the given 'op' is a load-like operation,
+/// and it can be hoisted out of 'loopLike' operation.
+static bool canHoistLoad(Operation *op, LoopLikeOpInterface loopLike,
+                         AliasAnalysis &aliasAnalysis) {
+  LDBG() << "Checking operation:\n" << *op;
+  if (auto effectInterface = dyn_cast<MemoryEffectOpInterface>(op)) {
+    SmallVector<MemoryEffects::EffectInstance> effects;
+    effectInterface.getEffects(effects);
+    if (effects.empty()) {
+      LDBG() << "Failure: not a load";
+      return false;
+    }
+    llvm::SetVector<Value> locations;
+    for (const MemoryEffects::EffectInstance &effect : effects) {
+      Value location = effect.getValue();
+      if (!isa<MemoryEffects::Read>(effect.getEffect())) {
+        LDBG() << "Failure: has unsupported effects";
+        return false;
+      } else if (!location) {
+        LDBG() << "Failure: reads from unknown location";
+        return false;
+      }
+      locations.insert(location);
+    }
+    return isSafeToHoistLoad(op, locations.getArrayRef(), loopLike,
+                             aliasAnalysis);
+  }
+  LDBG() << "Failure: has unknown effects";
+  return false;
+}
+
+void LoopInvariantCodeMotion::runOnOperation() {
+  if (disableFlangLICM) {
+    LDBG() << "Skipping [HL]FIR LoopInvariantCodeMotion()";
+    return;
+  }
+
+  LDBG() << "Enter [HL]FIR LoopInvariantCodeMotion()";
+
+  auto &aliasAnalysis = getAnalysis<AliasAnalysis>();
+  aliasAnalysis.addAnalysisImplementation(fir::AliasAnalysis{});
+
+  std::function<bool(Operation *, LoopLikeOpInterface loopLike)>
+      shouldMoveOutOfLoop = [&](Operation *op, LoopLikeOpInterface loopLike) {
+        if (isPure(op)) {
+          LDBG() << "Pure operation: " << *op;
+          return true;
+        }
+
+        // Handle RecursivelySpeculatable operations that have
+        // RecursiveMemoryEffects by checking if all their
+        // nested operations can be hoisted.
+        auto iface = dyn_cast<ConditionallySpeculatable>(op);
+        if (iface && iface.getSpeculatability() ==
+                         Speculation::RecursivelySpeculatable) {
+          if (op->hasTrait<OpTrait::HasRecursiveMemoryEffects>()) {
+            LDBG() << "Checking recursive operation:\n" << *op;
+            llvm::SmallVector<Operation *> nestedOps;
+            for (Region &region : op->getRegions())
+              for (Block &block : region)
+                for (Operation &nestedOp : block)
+                  nestedOps.push_back(&nestedOp);
+
+            bool result = llvm::all_of(nestedOps, [&](Operation *nestedOp) {
+              return shouldMoveOutOfLoop(nestedOp, loopLike);
+            });
+            LDBG() << "Recursive operation can" << (result ? "" : "not")
+                   << " be hoisted";
+
+            // If nested operations cannot be hoisted, there is nothing
+            // else to check. Also if the operation itself does not have
+            // any memory effects, we can return the result now.
+            // Otherwise, we have to check the operation itself below.
+            if (!result || !isa<MemoryEffectOpInterface>(op))
+              return result;
+          }
+        }
+        return canHoistLoad(op, loopLike, aliasAnalysis);
+      };
+
+  getOperation()->walk([&](LoopLikeOpInterface loopLike) {
+    moveLoopInvariantCode(
+        loopLike.getLoopRegions(),
+        /*isDefinedOutsideRegion=*/
+        [&](Value value, Region *) {
+          return loopLike.isDefinedOutsideOfLoop(value);
+        },
+        /*shouldMoveOutOfRegion=*/
+        [&](Operation *op, Region *) {
+          return shouldMoveOutOfLoop(op, loopLike);
+        },
+        /*moveOutOfRegion=*/
+        [&](Operation *op, Region *) { loopLike.moveOutOfLoop(op); });
+  });
+
+  LDBG() << "Exit [HL]FIR LoopInvariantCodeMotion()";
+}
diff --git a/flang/test/Analysis/AliasAnalysis/gen_mod_ref_test.py b/flang/test/Analysis/AliasAnalysis/gen_mod_ref_test.py
index ce7d9b1700bf7..953f1a2545781 100755
--- a/flang/test/Analysis/AliasAnalysis/gen_mod_ref_test.py
+++ b/flang/test/Analysis/AliasAnalysis/gen_mod_ref_test.py
@@ -7,11 +7,14 @@
  This will insert mod ref test hook:
    - to any fir.call to a function which name starts with "test_effect_"
    - to any hlfir.declare for variable which name starts with "test_var_"
+   - to any fir.box_addr - they are assigned box_addr_# hooks
 """
 
 import sys
 import re
 
+box_addr_counter=0
+
 for line in sys.stdin:
     line = re.sub(
         r"(fir.call @_\w*P)(test_effect_\w*)(\(.*) : ",
@@ -23,4 +26,11 @@
         r'\1\2", test.ptr ="\2"',
         line,
     )
+    line, count = re.subn(
+        r'(fir.box_addr.*) :',
+        rf'\1 {{test.ptr ="box_addr_{box_addr_counter}"}} :',
+        line,
+    )
+    if count > 0:
+        box_addr_counter += 1
     sys.stdout.write(line)
diff --git a/flang/test/Analysis/AliasAnalysis/modref-call-dummies.f90 b/flang/test/Analysis/AliasAnalysis/modref-call-dummies.f90
index a4c57cff70927..a81f266b87979 100644
--- a/flang/test/Analysis/AliasAnalysis/modref-call-dummies.f90
+++ b/flang/test/Analysis/AliasAnalysis/modref-call-dummies.f90
@@ -20,7 +20,7 @@ subroutine test_dummy(test_var_x)
   use somemod, only : may_capture
   implicit none
   real :: test_var_x
-  ! Capture is invalid after the call because test_var_xsaved does not have the
+  ! Capture is invalid after the call because test_var_x does not have the
   ! target attribute.
   call may_capture(test_var_x)
   call test_effect_external()
@@ -50,4 +50,61 @@ subroutine test_dummy_pointer(p)
   end associate
 end subroutine
 ! CHECK-LABEL: Testing : "_QPtest_dummy_pointer"
-! CHECK: test_effect_external -> test_var_p_target#0: ModRef
+! CHECK-DAG: test_effect_external ...
[truncated]

@llvmbot
Copy link
Member

llvmbot commented Dec 24, 2025

@llvm/pr-subscribers-flang-fir-hlfir

Author: Slava Zakharin (vzakhari)

Changes

The new pass allows hoisting some fir.load operations early
in MLIR. For example, many descriptor load might be hoisted
out of the loops, though it does not make much difference
in performance, because LLVM is able to optimize such loads
(which are lowered as llvm.memcpy into temporary descriptors),
given that proper TBAA information is generated by Flang.

Further hoisting improvements are possible in [HL]FIR LICM,
e.g. getting proper mod-ref results for Fortran runtime calls
may allow hoisting loads from global variables, which LLVM
cannot do due to lack of alias information.

This patch also contains improvements for FIR mod-ref analysis:
We may recurse into HasRecursiveMemoryEffects operations and
use getModRef recursively to get more precise results for
regions with fir.call operations.

This patch also modifies AliasAnalysis to set the instantiation
point for cases where the tracked data is accessed through a load
from !fir.ref&lt;!fir.box&lt;&gt;&gt;: without this change the mod-ref
analysis was not able to recognize user pointer/allocatable variables.


Patch is 198.99 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/173438.diff

16 Files Affected:

  • (modified) flang/include/flang/Optimizer/Transforms/Passes.td (+11)
  • (modified) flang/lib/Optimizer/Analysis/AliasAnalysis.cpp (+34-24)
  • (modified) flang/lib/Optimizer/Passes/Pipelines.cpp (+4)
  • (modified) flang/lib/Optimizer/Transforms/CMakeLists.txt (+28-25)
  • (added) flang/lib/Optimizer/Transforms/LoopInvariantCodeMotion.cpp (+233)
  • (modified) flang/test/Analysis/AliasAnalysis/gen_mod_ref_test.py (+10)
  • (modified) flang/test/Analysis/AliasAnalysis/modref-call-dummies.f90 (+59-2)
  • (modified) flang/test/Analysis/AliasAnalysis/modref-call-globals.f90 (+22)
  • (added) flang/test/Analysis/AliasAnalysis/modref-call-recursive.fir (+62)
  • (modified) flang/test/Driver/bbc-mlir-pass-pipeline.f90 (+3)
  • (modified) flang/test/Driver/mlir-pass-pipeline.f90 (+3)
  • (modified) flang/test/Fir/basic-program.fir (+3)
  • (modified) flang/test/Lower/array-expression-assumed-size.f90 (+164-159)
  • (modified) flang/test/Lower/array-temp.f90 (+262-250)
  • (modified) flang/test/Lower/vector-subscript-io.f90 (+5-5)
  • (added) flang/test/Transforms/licm.fir (+1326)
diff --git a/flang/include/flang/Optimizer/Transforms/Passes.td b/flang/include/flang/Optimizer/Transforms/Passes.td
index f50202784e2dc..b6b78ec7b21b0 100644
--- a/flang/include/flang/Optimizer/Transforms/Passes.td
+++ b/flang/include/flang/Optimizer/Transforms/Passes.td
@@ -596,4 +596,15 @@ def MIFOpConversion : Pass<"mif-convert", "mlir::ModuleOp"> {
   let dependentDialects = ["fir::FIROpsDialect", "mlir::LLVM::LLVMDialect"];
 }
 
+def LoopInvariantCodeMotion : Pass<"flang-licm", "::mlir::func::FuncOp"> {
+  let summary = "Hoist invariants from loops";
+  let description = [{
+    Hoist invariants from loops. This is a FIR-specific version of loop
+    invariant code motion, which relies on FIR types, operations (such as
+    fir.declare) and interfaces such as FortranObjectViewOpInterface.
+    The pass only moves existing operations, so there are no dependent
+    dialects.
+  }];
+}
+
 #endif // FLANG_OPTIMIZER_TRANSFORMS_PASSES
diff --git a/flang/lib/Optimizer/Analysis/AliasAnalysis.cpp b/flang/lib/Optimizer/Analysis/AliasAnalysis.cpp
index d2656fa7b8ea8..0dc9dda3aa0f5 100644
--- a/flang/lib/Optimizer/Analysis/AliasAnalysis.cpp
+++ b/flang/lib/Optimizer/Analysis/AliasAnalysis.cpp
@@ -468,7 +468,8 @@ static ModRefResult getCallModRef(fir::CallOp call, mlir::Value var) {
   // TODO: limit to Fortran functions??
   // 1. Detect variables that can be accessed indirectly.
   fir::AliasAnalysis aliasAnalysis;
-  fir::AliasAnalysis::Source varSrc = aliasAnalysis.getSource(var);
+  fir::AliasAnalysis::Source varSrc =
+      aliasAnalysis.getSource(var, /*getLastInstantiationPoint=*/true);
   // If the variable is not a user variable, we cannot safely assume that
   // Fortran semantics apply (e.g., a bare alloca/allocmem result may very well
   // be placed in an allocatable/pointer descriptor and escape).
@@ -498,6 +499,7 @@ static ModRefResult getCallModRef(fir::CallOp call, mlir::Value var) {
   // At that stage, it has been ruled out that local (including the saved ones)
   // and dummy cannot be indirectly accessed in the call.
   if (varSrc.kind != fir::AliasAnalysis::SourceKind::Allocate &&
+      varSrc.kind != fir::AliasAnalysis::SourceKind::Argument &&
       !varSrc.isDummyArgument()) {
     if (varSrc.kind != fir::AliasAnalysis::SourceKind::Global ||
         !isSavedLocal(varSrc))
@@ -523,19 +525,36 @@ static ModRefResult getCallModRef(fir::CallOp call, mlir::Value var) {
 /// flow analysis to come 2) Allocate and Free effects are considered
 /// modifying
 ModRefResult AliasAnalysis::getModRef(Operation *op, Value location) {
-  MemoryEffectOpInterface interface = dyn_cast<MemoryEffectOpInterface>(op);
-  if (!interface) {
-    if (auto call = llvm::dyn_cast<fir::CallOp>(op))
-      return getCallModRef(call, location);
-    return ModRefResult::getModAndRef();
-  }
+  if (auto call = llvm::dyn_cast<fir::CallOp>(op))
+    return getCallModRef(call, location);
 
   // Build a ModRefResult by merging the behavior of the effects of this
   // operation.
+  ModRefResult result = ModRefResult::getNoModRef();
+  MemoryEffectOpInterface interface = dyn_cast<MemoryEffectOpInterface>(op);
+  if (op->hasTrait<mlir::OpTrait::HasRecursiveMemoryEffects>()) {
+    for (mlir::Region &region : op->getRegions()) {
+      result = result.merge(getModRef(region, location));
+      if (result.isModAndRef())
+        break;
+    }
+
+    // In MLIR, RecursiveMemoryEffects can be combined with
+    // MemoryEffectOpInterface to describe extra effects on top of the
+    // effects of the nested operations.  However, the presence of
+    // RecursiveMemoryEffects and the absence of MemoryEffectOpInterface
+    // implies the operation has no other memory effects than the one of its
+    // nested operations.
+    if (!interface)
+      return result;
+  }
+
+  if (!interface || result.isModAndRef())
+    return ModRefResult::getModAndRef();
+
   SmallVector<MemoryEffects::EffectInstance> effects;
   interface.getEffects(effects);
 
-  ModRefResult result = ModRefResult::getNoModRef();
   for (const MemoryEffects::EffectInstance &effect : effects) {
 
     // Check for an alias between the effect and our memory location.
@@ -563,22 +582,6 @@ ModRefResult AliasAnalysis::getModRef(mlir::Region &region,
                                       mlir::Value location) {
   ModRefResult result = ModRefResult::getNoModRef();
   for (mlir::Operation &op : region.getOps()) {
-    if (op.hasTrait<mlir::OpTrait::HasRecursiveMemoryEffects>()) {
-      for (mlir::Region &subRegion : op.getRegions()) {
-        result = result.merge(getModRef(subRegion, location));
-        // Fast return is already mod and ref.
-        if (result.isModAndRef())
-          return result;
-      }
-      // In MLIR, RecursiveMemoryEffects can be combined with
-      // MemoryEffectOpInterface to describe extra effects on top of the
-      // effects of the nested operations.  However, the presence of
-      // RecursiveMemoryEffects and the absence of MemoryEffectOpInterface
-      // implies the operation has no other memory effects than the one of its
-      // nested operations.
-      if (!mlir::isa<mlir::MemoryEffectOpInterface>(op))
-        continue;
-    }
     result = result.merge(getModRef(&op, location));
     if (result.isModAndRef())
       return result;
@@ -674,6 +677,13 @@ AliasAnalysis::Source AliasAnalysis::getSource(mlir::Value v,
             isCapturedInInternalProcedure |=
                 boxSrc.isCapturedInInternalProcedure;
 
+            if (getLastInstantiationPoint) {
+              if (!instantiationPoint)
+                instantiationPoint = boxSrc.origin.instantiationPoint;
+            } else {
+              instantiationPoint = boxSrc.origin.instantiationPoint;
+            }
+
             global = llvm::dyn_cast<mlir::SymbolRefAttr>(boxSrc.origin.u);
             if (global) {
               type = SourceKind::Global;
diff --git a/flang/lib/Optimizer/Passes/Pipelines.cpp b/flang/lib/Optimizer/Passes/Pipelines.cpp
index 103e736accca0..038ceb1d4bd75 100644
--- a/flang/lib/Optimizer/Passes/Pipelines.cpp
+++ b/flang/lib/Optimizer/Passes/Pipelines.cpp
@@ -206,6 +206,10 @@ void createDefaultFIROptimizerPassPipeline(mlir::PassManager &pm,
   pm.addPass(fir::createSimplifyRegionLite());
   pm.addPass(mlir::createCSEPass());
 
+  // Run LICM after CSE, which may reduce the number of operations to hoist.
+  if (pc.OptLevel.isOptimizingForSpeed())
+    pm.addPass(fir::createLoopInvariantCodeMotion());
+
   // Polymorphic types
   pm.addPass(fir::createPolymorphicOpConversion());
   pm.addPass(fir::createAssumedRankOpConversion());
diff --git a/flang/lib/Optimizer/Transforms/CMakeLists.txt b/flang/lib/Optimizer/Transforms/CMakeLists.txt
index 619f3adc67c85..fad6f34f478ba 100644
--- a/flang/lib/Optimizer/Transforms/CMakeLists.txt
+++ b/flang/lib/Optimizer/Transforms/CMakeLists.txt
@@ -1,43 +1,44 @@
 add_flang_library(FIRTransforms
   AbstractResult.cpp
   AddAliasTags.cpp
-  AffinePromotion.cpp
+  AddDebugInfo.cpp
   AffineDemotion.cpp
+  AffinePromotion.cpp
+  AlgebraicSimplification.cpp
   AnnotateConstant.cpp
+  ArrayValueCopy.cpp
   AssumedRankOpConversion.cpp
-  CharacterConversion.cpp
-  CompilerGeneratedNames.cpp
-  ConstantArgumentGlobalisation.cpp
-  ControlFlowConverter.cpp
   CUDA/CUFAllocationConversion.cpp
   CUFAddConstructor.cpp
+  CUFComputeSharedMemoryOffsetsAndSize.cpp
   CUFDeviceGlobal.cpp
-  CUFOpConversion.cpp
   CUFGPUToLLVMConversion.cpp
-  CUFComputeSharedMemoryOffsetsAndSize.cpp
-  ArrayValueCopy.cpp
+  CUFOpConversion.cpp
+  CharacterConversion.cpp
+  CompilerGeneratedNames.cpp
+  ConstantArgumentGlobalisation.cpp
+  ControlFlowConverter.cpp
+  ConvertComplexPow.cpp
+  DebugTypeGenerator.cpp
   ExternalNameConversion.cpp
   FIRToSCF.cpp
-  MemoryUtils.cpp
-  MemoryAllocation.cpp
-  StackArrays.cpp
+  FunctionAttr.cpp
+  GenRuntimeCallsForTest.cpp
+  LoopInvariantCodeMotion.cpp
+  LoopVersioning.cpp
+  MIFOpConversion.cpp
   MemRefDataFlowOpt.cpp
-  SimplifyRegionLite.cpp
-  AlgebraicSimplification.cpp
-  SimplifyIntrinsics.cpp
-  AddDebugInfo.cpp
+  MemoryAllocation.cpp
+  MemoryUtils.cpp
+  OptimizeArrayRepacking.cpp
   PolymorphicOpConversion.cpp
-  LoopVersioning.cpp
-  StackReclaim.cpp
-  VScaleAttr.cpp
-  FunctionAttr.cpp
-  DebugTypeGenerator.cpp
   SetRuntimeCallAttributes.cpp
-  GenRuntimeCallsForTest.cpp
   SimplifyFIROperations.cpp
-  OptimizeArrayRepacking.cpp
-  ConvertComplexPow.cpp
-  MIFOpConversion.cpp
+  SimplifyIntrinsics.cpp
+  SimplifyRegionLite.cpp
+  StackArrays.cpp
+  StackReclaim.cpp
+  VScaleAttr.cpp
 
   DEPENDS
   CUFAttrs
@@ -63,12 +64,14 @@ add_flang_library(FIRTransforms
 
   MLIR_LIBS
   MLIRAffineUtils
+  MLIRAnalysis
   MLIRFuncDialect
   MLIRGPUDialect
-  MLIRLLVMDialect
   MLIRLLVMCommonConversion
+  MLIRLLVMDialect
   MLIRMathTransforms
   MLIROpenACCDialect
   MLIROpenACCToLLVMIRTranslation
   MLIROpenMPDialect
+  MLIRTransformUtils
 )
diff --git a/flang/lib/Optimizer/Transforms/LoopInvariantCodeMotion.cpp b/flang/lib/Optimizer/Transforms/LoopInvariantCodeMotion.cpp
new file mode 100644
index 0000000000000..c033d5e278c8d
--- /dev/null
+++ b/flang/lib/Optimizer/Transforms/LoopInvariantCodeMotion.cpp
@@ -0,0 +1,233 @@
+//===- LoopInvariantCodeMotion.cpp ----------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+/// \file
+/// FIR-specific Loop Invariant Code Motion pass.
+/// The pass relies on FIR types and interfaces to prove the safety
+/// of hoisting invariant operations out of loop-like operations.
+/// It may be run on both HLFIR and FIR representations.
+//===----------------------------------------------------------------------===//
+
+#include "flang/Optimizer/Analysis/AliasAnalysis.h"
+#include "flang/Optimizer/Dialect/FortranVariableInterface.h"
+#include "flang/Optimizer/Transforms/Passes.h"
+#include "mlir/Pass/Pass.h"
+#include "mlir/Transforms/LoopInvariantCodeMotionUtils.h"
+#include "llvm/Support/DebugLog.h"
+
+namespace fir {
+#define GEN_PASS_DEF_LOOPINVARIANTCODEMOTION
+#include "flang/Optimizer/Transforms/Passes.h.inc"
+} // namespace fir
+
+#define DEBUG_TYPE "flang-licm"
+
+// Temporary engineering option for triaging LICM.
+static llvm::cl::opt<bool> disableFlangLICM(
+    "disable-flang-licm", llvm::cl::init(false), llvm::cl::Hidden,
+    llvm::cl::desc("Disable Flang's loop invariant code motion"));
+
+namespace {
+
+using namespace mlir;
+
+/// The pass tries to hoist loop invariant operations with only
+/// MemoryEffects::Read effects (MemoryEffects::Write support
+/// may be added later).
+/// The safety of hoisting is proven by:
+///   * Proving that the loop runs at least one iteration.
+///   * Proving that is is always safe to load from this location
+///     (see isSafeToHoistLoad() comments below).
+struct LoopInvariantCodeMotion
+    : fir::impl::LoopInvariantCodeMotionBase<LoopInvariantCodeMotion> {
+  void runOnOperation() override;
+};
+
+} // namespace
+
+/// 'location' is a memory reference used by a memory access.
+/// The type of 'location' defines the data type of the access
+/// (e.g. it is considered to be invalid to access 'i64'
+/// data using '!fir.ref<i32>`).
+/// For the given location, this function returns true iff
+/// the Fortran object being accessed is a scalar that
+/// may not be OPTIONAL.
+///
+/// Note that the '!fir.ref<!fir.box<>>' accesses are considered
+/// to be scalar, even if the underlying data is an array.
+///
+/// Note that an access of '!fir.ref<scalar>' may access
+/// an array object. For example:
+///   real :: x(:)
+///   do i=...
+///     = x(10)
+/// 'x(10)' accesses array 'x', and it may be unsafe to hoist
+/// it without proving that '10' is a valid index for the array.
+/// The fact that 'x' is not OPTIONAL does not allow hoisting
+/// on its own.
+static bool isNonOptionalScalar(Value location) {
+  while (true) {
+    LDBG() << "Checking location:\n" << location;
+    Type dataType = fir::unwrapRefType(location.getType());
+    if (!isa<fir::BaseBoxType>(location.getType()) &&
+        (!dataType ||
+         (!isa<fir::BaseBoxType>(dataType) && !fir::isa_trivial(dataType) &&
+          !fir::isa_derived(dataType)))) {
+      LDBG() << "Failure: data access is not scalar";
+      return false;
+    }
+    Operation *defOp = location.getDefiningOp();
+    if (!defOp) {
+      LDBG() << "Failure: no defining operation";
+      return false;
+    }
+    if (auto varIface = dyn_cast<fir::FortranVariableOpInterface>(defOp)) {
+      bool result = !varIface.isOptional();
+      if (result)
+        LDBG() << "Success: is non optional scalar";
+      else
+        LDBG() << "Failure: is not non optional scalar";
+      return result;
+    }
+    if (auto viewIface = dyn_cast<fir::FortranObjectViewOpInterface>(defOp)) {
+      location = viewIface.getViewSource(cast<OpResult>(location));
+    } else {
+      LDBG() << "Failure: unknown operation:\n" << *defOp;
+      return false;
+    }
+  }
+}
+
+/// Returns true iff it is safe to hoist the given load-like operation 'op',
+/// which access given memory 'locations', out of the operation 'loopLike'.
+/// The current safety conditions are:
+///   * The loop runs at least one iteration, OR
+///   * all the accessed locations are inside scalar non-OPTIONAL
+///     Fortran objects (Fortran descriptors are considered to be scalars).
+static bool isSafeToHoistLoad(Operation *op, ArrayRef<Value> locations,
+                              LoopLikeOpInterface loopLike,
+                              AliasAnalysis &aliasAnalysis) {
+  for (Value location : locations)
+    if (aliasAnalysis.getModRef(loopLike.getOperation(), location)
+            .isModAndRef()) {
+      LDBG() << "Failure: reads location:\n"
+             << location << "\nwhich is modified inside the loop";
+      return false;
+    }
+
+  // Check that it is safe to read from all the locations before the loop.
+  std::optional<llvm::APInt> tripCount = loopLike.getStaticTripCount();
+  if (tripCount && !tripCount->isZero()) {
+    // Loop executes at least one iteration, so it is safe to hoist.
+    LDBG() << "Success: loop has non-zero iterations";
+    return true;
+  }
+
+  // Check whether the access must always be valid.
+  return llvm::all_of(
+      locations, [&](Value location) { return isNonOptionalScalar(location); });
+  // TODO: consider hoisting under condition of the loop's trip count
+  // being non-zero.
+}
+
+/// Returns true iff the given 'op' is a load-like operation,
+/// and it can be hoisted out of 'loopLike' operation.
+static bool canHoistLoad(Operation *op, LoopLikeOpInterface loopLike,
+                         AliasAnalysis &aliasAnalysis) {
+  LDBG() << "Checking operation:\n" << *op;
+  if (auto effectInterface = dyn_cast<MemoryEffectOpInterface>(op)) {
+    SmallVector<MemoryEffects::EffectInstance> effects;
+    effectInterface.getEffects(effects);
+    if (effects.empty()) {
+      LDBG() << "Failure: not a load";
+      return false;
+    }
+    llvm::SetVector<Value> locations;
+    for (const MemoryEffects::EffectInstance &effect : effects) {
+      Value location = effect.getValue();
+      if (!isa<MemoryEffects::Read>(effect.getEffect())) {
+        LDBG() << "Failure: has unsupported effects";
+        return false;
+      } else if (!location) {
+        LDBG() << "Failure: reads from unknown location";
+        return false;
+      }
+      locations.insert(location);
+    }
+    return isSafeToHoistLoad(op, locations.getArrayRef(), loopLike,
+                             aliasAnalysis);
+  }
+  LDBG() << "Failure: has unknown effects";
+  return false;
+}
+
+void LoopInvariantCodeMotion::runOnOperation() {
+  if (disableFlangLICM) {
+    LDBG() << "Skipping [HL]FIR LoopInvariantCodeMotion()";
+    return;
+  }
+
+  LDBG() << "Enter [HL]FIR LoopInvariantCodeMotion()";
+
+  auto &aliasAnalysis = getAnalysis<AliasAnalysis>();
+  aliasAnalysis.addAnalysisImplementation(fir::AliasAnalysis{});
+
+  std::function<bool(Operation *, LoopLikeOpInterface loopLike)>
+      shouldMoveOutOfLoop = [&](Operation *op, LoopLikeOpInterface loopLike) {
+        if (isPure(op)) {
+          LDBG() << "Pure operation: " << *op;
+          return true;
+        }
+
+        // Handle RecursivelySpeculatable operations that have
+        // RecursiveMemoryEffects by checking if all their
+        // nested operations can be hoisted.
+        auto iface = dyn_cast<ConditionallySpeculatable>(op);
+        if (iface && iface.getSpeculatability() ==
+                         Speculation::RecursivelySpeculatable) {
+          if (op->hasTrait<OpTrait::HasRecursiveMemoryEffects>()) {
+            LDBG() << "Checking recursive operation:\n" << *op;
+            llvm::SmallVector<Operation *> nestedOps;
+            for (Region &region : op->getRegions())
+              for (Block &block : region)
+                for (Operation &nestedOp : block)
+                  nestedOps.push_back(&nestedOp);
+
+            bool result = llvm::all_of(nestedOps, [&](Operation *nestedOp) {
+              return shouldMoveOutOfLoop(nestedOp, loopLike);
+            });
+            LDBG() << "Recursive operation can" << (result ? "" : "not")
+                   << " be hoisted";
+
+            // If nested operations cannot be hoisted, there is nothing
+            // else to check. Also if the operation itself does not have
+            // any memory effects, we can return the result now.
+            // Otherwise, we have to check the operation itself below.
+            if (!result || !isa<MemoryEffectOpInterface>(op))
+              return result;
+          }
+        }
+        return canHoistLoad(op, loopLike, aliasAnalysis);
+      };
+
+  getOperation()->walk([&](LoopLikeOpInterface loopLike) {
+    moveLoopInvariantCode(
+        loopLike.getLoopRegions(),
+        /*isDefinedOutsideRegion=*/
+        [&](Value value, Region *) {
+          return loopLike.isDefinedOutsideOfLoop(value);
+        },
+        /*shouldMoveOutOfRegion=*/
+        [&](Operation *op, Region *) {
+          return shouldMoveOutOfLoop(op, loopLike);
+        },
+        /*moveOutOfRegion=*/
+        [&](Operation *op, Region *) { loopLike.moveOutOfLoop(op); });
+  });
+
+  LDBG() << "Exit [HL]FIR LoopInvariantCodeMotion()";
+}
diff --git a/flang/test/Analysis/AliasAnalysis/gen_mod_ref_test.py b/flang/test/Analysis/AliasAnalysis/gen_mod_ref_test.py
index ce7d9b1700bf7..953f1a2545781 100755
--- a/flang/test/Analysis/AliasAnalysis/gen_mod_ref_test.py
+++ b/flang/test/Analysis/AliasAnalysis/gen_mod_ref_test.py
@@ -7,11 +7,14 @@
  This will insert mod ref test hook:
    - to any fir.call to a function which name starts with "test_effect_"
    - to any hlfir.declare for variable which name starts with "test_var_"
+   - to any fir.box_addr - they are assigned box_addr_# hooks
 """
 
 import sys
 import re
 
+box_addr_counter=0
+
 for line in sys.stdin:
     line = re.sub(
         r"(fir.call @_\w*P)(test_effect_\w*)(\(.*) : ",
@@ -23,4 +26,11 @@
         r'\1\2", test.ptr ="\2"',
         line,
     )
+    line, count = re.subn(
+        r'(fir.box_addr.*) :',
+        rf'\1 {{test.ptr ="box_addr_{box_addr_counter}"}} :',
+        line,
+    )
+    if count > 0:
+        box_addr_counter += 1
     sys.stdout.write(line)
diff --git a/flang/test/Analysis/AliasAnalysis/modref-call-dummies.f90 b/flang/test/Analysis/AliasAnalysis/modref-call-dummies.f90
index a4c57cff70927..a81f266b87979 100644
--- a/flang/test/Analysis/AliasAnalysis/modref-call-dummies.f90
+++ b/flang/test/Analysis/AliasAnalysis/modref-call-dummies.f90
@@ -20,7 +20,7 @@ subroutine test_dummy(test_var_x)
   use somemod, only : may_capture
   implicit none
   real :: test_var_x
-  ! Capture is invalid after the call because test_var_xsaved does not have the
+  ! Capture is invalid after the call because test_var_x does not have the
   ! target attribute.
   call may_capture(test_var_x)
   call test_effect_external()
@@ -50,4 +50,61 @@ subroutine test_dummy_pointer(p)
   end associate
 end subroutine
 ! CHECK-LABEL: Testing : "_QPtest_dummy_pointer"
-! CHECK: test_effect_external -> test_var_p_target#0: ModRef
+! CHECK-DAG: test_effect_external ...
[truncated]

@github-actions
Copy link

github-actions bot commented Dec 24, 2025

✅ With the latest revision this PR passed the Python code formatter.

vzakhari added a commit to vzakhari/llvm-project that referenced this pull request Dec 24, 2025
LICM (llvm#173438) may insert new operations at the beginning of
`fir.do_concurrent`'s block and they cannot be always hoisted
to the alloca-block of the parent operation. This patch
only moves `fir.alloca`s into the alloca-block, and moves
all other operations right before fir.do_concurrent.
@vzakhari
Copy link
Contributor Author

This should be merged after #173502 to avoid new test failures.

Copy link
Contributor

@tblah tblah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting stuff - again!

First pass looking at this, I will have a second look this week.

// An example of something currently problematic is the allocmem generated for
// ALLOCATE of allocatable target. It currently does not have the target
// attribute, which would lead this analysis to believe it cannot escape.
if (!varSrc.isFortranUserVariable() || !isCallToFortranUserProcedure(call))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean that the above TODOs:

  • Limit to fortran functions
  • Skip compiler generated temps

Are complete and can be removed?

Not fully related to your patch: I am trying to understand how ready getCallModRef is for use.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review, Tom!

I do not think this TODO is resolved, so the check for isFortranUserVariable should be necessary still.

I would like to look further into isCallToFortranUserProcedure check so that we can handle Fortran runtime calls better (e.g. based on the memroy access attributes that we are able to figure out in SetRuntimeCallAttributesPass).

I beleive there is a room for improvement in the mod-ref analysis, but it already works for some cases that Jean addressed when he initially added the implementation.


// Run LICM after CSE, which may reduce the number of operations to hoist.
if (pc.OptLevel.isOptimizingForSpeed())
pm.addPass(fir::createLoopInvariantCodeMotion());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this doesn't currently lead to performance improvements, do we need to enable it by default? I'm worried this might be a relatively expensive pass.

I'm curious what the motivation is here more than anything. If there is a plan and you haven't observed serious compile time regressions then I am happy to see this on by default.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still need to measure performance, but my expectation is that this patch won't change anything. You have a good point about the compilation time, and I will measure it.

There is one example related to private copies of descriptors in OpenACC, which LICM can help to address (though, I guess this is not the only way to solve it). In general, I would like to be able to optimize code at MLIR level, which otherwise would require more detailed alias-info representation in LLVM IR. In addition, such early MLIR optimizations may help with enabling more MLIR optimizations (like the ones related to Affine transformations).

I agree with you that it is probably too early to enable this by default, and I will move it under an option.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

I'm not really opposed to having this on by default so long as you don't find serious compilation time regressions (I agree this pass looks useful in the long term), but feel free to disable by default for now if you prefer.

vzakhari added a commit that referenced this pull request Dec 30, 2025
LICM (#173438) may insert new operations at the beginning of
`fir.do_concurrent`'s block and they cannot be always hoisted
to the alloca-block of the parent operation. This patch
only moves `fir.alloca`s into the alloca-block, and moves
all other operations right before fir.do_concurrent.
return canHoistLoad(op, loopLike, aliasAnalysis);
};

getOperation()->walk([&](LoopLikeOpInterface loopLike) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please could you check for mlir::omp::OutlinableOpenMPOpInterface and avoid LICM in that case. These are operations whose body will be outlined (and could be run in a different thread). I'm not sure that we will ever have a LoopLikeOp which is outlinable but it is better to be safe.

Similarly, please could you make sure not to outline operations out of loops and into something that is a mlir::omp::LoopWrapperInterface. The loop wrappers are assumed to contain only a nested loop wrapper or a loop, and a terminator. Any other operations would currently crash the compiler. For example in,

  omp.private {type = private} @_QFtestEi_private_i32 : i32
  func.func @_QPtest() {
    %0 = fir.dummy_scope : !fir.dscope
    %1 = fir.alloca i32 {bindc_name = "i", uniq_name = "_QFtestEi"}
    %2:2 = hlfir.declare %1 {uniq_name = "_QFtestEi"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
    %c1_i32 = arith.constant 1 : i32
    %c100_i32 = arith.constant 100 : i32
    %c1_i32_0 = arith.constant 1 : i32
    omp.wsloop private(@_QFtestEi_private_i32 %2#0 -> %arg0 : !fir.ref<i32>) {
      omp.loop_nest (%arg1) : i32 = (%c1_i32) to (%c100_i32) inclusive step (%c1_i32_0) {
        %3:2 = hlfir.declare %arg0 {uniq_name = "_QFtestEi"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
        hlfir.assign %arg1 to %3#0 : i32, !fir.ref<i32>
        fir.call @_QPdo_something(%3#0) fastmath<contract> : (!fir.ref<i32>) -> ()
        omp.yield
      }
    }
    return
  }
  func.func private @_QPdo_something(!fir.ref<i32>)

The omp.wsloop operation must contain only the omp.loop_nest and the (implicit) terminator. It would be valid to keep going and move to the first non-loop wrapper parent (beware nested loop wrappers). But again, make sure nothing is ever moved out of a outlinable operation.

/// Returns true iff it is safe to hoist the given load-like operation 'op',
/// which access given memory 'locations', out of the operation 'loopLike'.
/// The current safety conditions are:
/// * The loop runs at least one iteration, OR
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about something like

integer, allocatable :: a
integer, pointer :: p
! a is not allocated
! p is not associated
do i=1,2
  if (.false.)
    load_like(a)
    load_like(p)
  endif
enddo

return result;
}
}
return canHoistLoad(op, loopLike, aliasAnalysis);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be another PR, but looking at #62419, I think it could be useful to hoist operations with allocation effects too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

flang:driver flang:fir-hlfir flang Flang issues not falling into any other category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants