Skip to content

Conversation

vzakhari
Copy link
Contributor

This patch changes two things:

  1. We do not need to use the loop counter's last value
    for regular do-loops in Lowering.
  2. The loop counter's increment is implied by fir.do_loop
    operation, so there is no need to increment it explicitly.

The last point has been especially confusing to me, because it was
unclear why we have an explicit increment if it is implied.
It looks like CFGConversion somehow still makes the final code
correct, i.e. the counter is not incremented twice.
Anyway, the new lowering should look more concise.

This patch changes two things:
  1. We do not need to use the loop counter's last value
     for regular do-loops in Lowering.
  2. The loop counter's increment is implied by fir.do_loop
     operation, so there is no need to increment it explicitly.

The last point has been especially confusing to me, because it was
unclear why we have an explicit increment if it is implied.
It looks like CFGConversion somehow still makes the final code
correct, i.e. the counter is not incremented twice.
Anyway, the new lowering should look more concise.
@llvmbot llvmbot added flang Flang issues not falling into any other category flang:fir-hlfir flang:openmp openacc labels Sep 25, 2025
@llvmbot
Copy link
Member

llvmbot commented Sep 25, 2025

@llvm/pr-subscribers-flang-openmp
@llvm/pr-subscribers-flang-fir-hlfir

@llvm/pr-subscribers-openacc

Author: Slava Zakharin (vzakhari)

Changes

This patch changes two things:

  1. We do not need to use the loop counter's last value
    for regular do-loops in Lowering.
  2. The loop counter's increment is implied by fir.do_loop
    operation, so there is no need to increment it explicitly.

The last point has been especially confusing to me, because it was
unclear why we have an explicit increment if it is implied.
It looks like CFGConversion somehow still makes the final code
correct, i.e. the counter is not incremented twice.
Anyway, the new lowering should look more concise.


Patch is 49.30 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/160630.diff

19 Files Affected:

  • (modified) flang/lib/Lower/Bridge.cpp (+5-9)
  • (modified) flang/lib/Lower/IO.cpp (+3-3)
  • (modified) flang/test/Fir/convert-to-llvm-openmp-and-fir.fir (+6-9)
  • (modified) flang/test/Lower/OpenACC/acc-declare.f90 (+9-9)
  • (modified) flang/test/Lower/OpenMP/hlfir-seqloop-parallel.f90 (+8-8)
  • (modified) flang/test/Lower/OpenMP/parallel-private-clause-fixes.f90 (+4-5)
  • (modified) flang/test/Lower/OpenMP/sections-predetermined-private.f90 (+4-4)
  • (modified) flang/test/Lower/OpenMP/shared-loop.f90 (+12-12)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-allocatable-array-minmax.f90 (+3-4)
  • (modified) flang/test/Lower/OpenMP/wsloop-variable.f90 (+3-4)
  • (modified) flang/test/Lower/do_loop.f90 (+28-35)
  • (modified) flang/test/Lower/do_loop_unstructured.f90 (+1-2)
  • (modified) flang/test/Lower/infinite_loop.f90 (+4-5)
  • (modified) flang/test/Lower/io-implied-do-fixes.f90 (+2-4)
  • (modified) flang/test/Lower/loops.f90 (+2-2)
  • (modified) flang/test/Lower/loops2.f90 (+4-4)
  • (modified) flang/test/Lower/mixed_loops.f90 (+4-5)
  • (modified) flang/test/Lower/nsw.f90 (+1-1)
  • (modified) flang/test/Transforms/OpenMP/simd-only.mlir (+2-2)
diff --git a/flang/lib/Lower/Bridge.cpp b/flang/lib/Lower/Bridge.cpp
index 4a5b9885bb7c4..149e51b501a82 100644
--- a/flang/lib/Lower/Bridge.cpp
+++ b/flang/lib/Lower/Bridge.cpp
@@ -2544,7 +2544,7 @@ class FirConverter : public Fortran::lower::AbstractConverter {
         auto loopOp = fir::DoLoopOp::create(
             *builder, loc, lowerValue, upperValue, stepValue,
             /*unordered=*/false,
-            /*finalCountValue=*/true,
+            /*finalCountValue=*/false,
             builder->createConvert(loc, loopVarType, lowerValue));
         info.loopOp = loopOp;
         builder->setInsertionPointToStart(loopOp.getBody());
@@ -2696,22 +2696,18 @@ class FirConverter : public Fortran::lower::AbstractConverter {
         // Decrement tripVariable.
         auto doLoopOp = mlir::cast<fir::DoLoopOp>(info.loopOp);
         builder->setInsertionPointToEnd(doLoopOp.getBody());
-        llvm::SmallVector<mlir::Value, 2> results;
-        results.push_back(mlir::arith::AddIOp::create(
-            *builder, loc, doLoopOp.getInductionVar(), doLoopOp.getStep(),
-            iofAttr));
         // Step loopVariable to help optimizations such as vectorization.
         // Induction variable elimination will clean up as necessary.
         mlir::Value step = builder->createConvert(
             loc, info.getLoopVariableType(), doLoopOp.getStep());
         mlir::Value loopVar =
             fir::LoadOp::create(*builder, loc, info.loopVariable);
-        results.push_back(
-            mlir::arith::AddIOp::create(*builder, loc, loopVar, step, iofAttr));
-        fir::ResultOp::create(*builder, loc, results);
+        mlir::Value loopVarInc =
+            mlir::arith::AddIOp::create(*builder, loc, loopVar, step, iofAttr);
+        fir::ResultOp::create(*builder, loc, loopVarInc);
         builder->setInsertionPointAfter(doLoopOp);
         // The loop control variable may be used after the loop.
-        fir::StoreOp::create(*builder, loc, doLoopOp.getResult(1),
+        fir::StoreOp::create(*builder, loc, doLoopOp.getResult(0),
                              info.loopVariable);
         continue;
       }
diff --git a/flang/lib/Lower/IO.cpp b/flang/lib/Lower/IO.cpp
index 4ad2ac01334fa..98dc78f625b9e 100644
--- a/flang/lib/Lower/IO.cpp
+++ b/flang/lib/Lower/IO.cpp
@@ -977,9 +977,9 @@ static void genIoLoop(Fortran::lower::AbstractConverter &converter,
     fir::StoreOp::create(builder, loc, lcv, loopVar);
     genItemList(ioImpliedDo);
     builder.setInsertionPointToEnd(doLoopOp.getBody());
-    mlir::Value result = mlir::arith::AddIOp::create(
-        builder, loc, doLoopOp.getInductionVar(), doLoopOp.getStep(), iofAttr);
-    fir::ResultOp::create(builder, loc, result);
+    // fir.do_loop's induction variable's increment is implied,
+    // so we do not need to increment it explicitly.
+    fir::ResultOp::create(builder, loc, doLoopOp.getInductionVar());
     builder.setInsertionPointAfter(doLoopOp);
     // The loop control variable may be used after the loop.
     lcv = builder.createConvert(loc, fir::unwrapRefType(loopVar.getType()),
diff --git a/flang/test/Fir/convert-to-llvm-openmp-and-fir.fir b/flang/test/Fir/convert-to-llvm-openmp-and-fir.fir
index 24e5cad84b709..38d51110bbde3 100644
--- a/flang/test/Fir/convert-to-llvm-openmp-and-fir.fir
+++ b/flang/test/Fir/convert-to-llvm-openmp-and-fir.fir
@@ -349,7 +349,7 @@ func.func @_QPopenmp_target_data_region() {
     %3 = fir.convert %c1024_i32 : (i32) -> index
     %c1 = arith.constant 1 : index
     %4 = fir.convert %2 : (index) -> i32
-    %5:2 = fir.do_loop %arg0 = %2 to %3 step %c1 iter_args(%arg1 = %4) -> (index, i32) {
+    %5 = fir.do_loop %arg0 = %2 to %3 step %c1 iter_args(%arg1 = %4) -> (i32) {
       fir.store %arg1 to %1 : !fir.ref<i32>
       %6 = fir.load %1 : !fir.ref<i32>
       %7 = fir.load %1 : !fir.ref<i32>
@@ -358,13 +358,12 @@ func.func @_QPopenmp_target_data_region() {
       %9 = arith.subi %8, %c1_i64 : i64
       %10 = fir.coordinate_of %0, %9 : (!fir.ref<!fir.array<1024xi32>>, i64) -> !fir.ref<i32>
       fir.store %6 to %10 : !fir.ref<i32>
-      %11 = arith.addi %arg0, %c1 overflow<nsw> : index
       %12 = fir.convert %c1 : (index) -> i32
       %13 = fir.load %1 : !fir.ref<i32>
       %14 = arith.addi %13, %12 overflow<nsw> : i32
-      fir.result %11, %14 : index, i32
+      fir.result %14 : i32
     }
-    fir.store %5#1 to %1 : !fir.ref<i32>
+    fir.store %5 to %1 : !fir.ref<i32>
     omp.terminator
   }
   return
@@ -404,7 +403,6 @@ func.func @_QPopenmp_target_data_region() {
 // CHECK:             %[[VAL_21:.*]] = llvm.sub %[[VAL_19]], %[[VAL_20]]  : i64
 // CHECK:             %[[VAL_22:.*]] = llvm.getelementptr %[[VAL_1]][0, %[[VAL_21]]] : (!llvm.ptr, i64) -> !llvm.ptr
 // CHECK:             llvm.store %[[VAL_17]], %[[VAL_22]] : i32, !llvm.ptr
-// CHECK:             %[[VAL_23:.*]] = llvm.add %[[VAL_12]], %[[VAL_8]] overflow<nsw> : i64
 // CHECK:             %[[VAL_24:.*]] = llvm.trunc %[[VAL_8]] : i64 to i32
 // CHECK:             %[[VAL_25:.*]] = llvm.load %[[VAL_3]] : !llvm.ptr -> i32
 // CHECK:             %[[VAL_26:.*]] = llvm.add %[[VAL_25]], %[[VAL_24]] overflow<nsw> : i32
@@ -653,18 +651,17 @@ func.func @_QPsb() {
   omp.sections   {
     omp.section {
       %2 = fir.convert %c1 : (index) -> i32
-      %3:2 = fir.do_loop %arg0 = %c1 to %c10 step %c1 iter_args(%arg1 = %2) -> (index, i32) {
+      %3 = fir.do_loop %arg0 = %c1 to %c10 step %c1 iter_args(%arg1 = %2) -> (i32) {
         fir.store %arg1 to %0 : !fir.ref<i32>
         %4 = fir.load %1 : !fir.ref<i32>
         %5 = arith.addi %4, %c1_i32 : i32
         fir.store %5 to %1 : !fir.ref<i32>
-        %6 = arith.addi %arg0, %c1 : index
         %7 = fir.convert %c1 : (index) -> i32
         %8 = fir.load %0 : !fir.ref<i32>
         %9 = arith.addi %8, %7 : i32
-        fir.result %6, %9 : index, i32
+        fir.result %9 : i32
       }
-      fir.store %3#1 to %0 : !fir.ref<i32>
+      fir.store %3 to %0 : !fir.ref<i32>
       omp.terminator
     }
     omp.terminator
diff --git a/flang/test/Lower/OpenACC/acc-declare.f90 b/flang/test/Lower/OpenACC/acc-declare.f90
index edae0e6a4d37e..46c4365f23fd6 100644
--- a/flang/test/Lower/OpenACC/acc-declare.f90
+++ b/flang/test/Lower/OpenACC/acc-declare.f90
@@ -20,7 +20,7 @@ subroutine acc_declare_copy()
 ! CHECK: %[[DECL:.*]]:2 = hlfir.declare %[[ALLOCA]](%{{.*}}) {acc.declare = #acc.declare<dataClause =  acc_copy>, uniq_name = "_QMacc_declareFacc_declare_copyEa"} : (!fir.ref<!fir.array<100xi32>>, !fir.shape<1>) -> (!fir.ref<!fir.array<100xi32>>, !fir.ref<!fir.array<100xi32>>)
 ! CHECK: %[[COPYIN:.*]] = acc.copyin varPtr(%[[DECL]]#0 : !fir.ref<!fir.array<100xi32>>) -> !fir.ref<!fir.array<100xi32>> {dataClause = #acc<data_clause acc_copy>, name = "a"}
 ! CHECK: %[[TOKEN:.*]] = acc.declare_enter dataOperands(%[[COPYIN]] : !fir.ref<!fir.array<100xi32>>)
-! CHECK: %{{.*}}:2 = fir.do_loop %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} iter_args(%{{.*}} = %{{.*}}) -> (index, i32) {
+! CHECK: %{{.*}} = fir.do_loop %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} iter_args(%{{.*}} = %{{.*}}) -> (i32) {
 ! CHECK: }
 ! CHECK: acc.declare_exit token(%[[TOKEN]]) dataOperands(%[[COPYIN]] : !fir.ref<!fir.array<100xi32>>)
 ! CHECK: acc.copyout accPtr(%[[COPYIN]] : !fir.ref<!fir.array<100xi32>>) to varPtr(%[[DECL]]#0 : !fir.ref<!fir.array<100xi32>>) {dataClause = #acc<data_clause acc_copy>, name = "a"}
@@ -40,7 +40,7 @@ subroutine acc_declare_create()
 ! CHECK: %[[DECL:.*]]:2 = hlfir.declare %[[ALLOCA]](%{{.*}}) {acc.declare = #acc.declare<dataClause =  acc_create>, uniq_name = "_QMacc_declareFacc_declare_createEa"} : (!fir.ref<!fir.array<100xi32>>, !fir.shape<1>) -> (!fir.ref<!fir.array<100xi32>>, !fir.ref<!fir.array<100xi32>>)
 ! CHECK: %[[CREATE:.*]] = acc.create varPtr(%[[DECL]]#0 : !fir.ref<!fir.array<100xi32>>) -> !fir.ref<!fir.array<100xi32>> {name = "a"}
 ! CHECK: %[[TOKEN:.*]] = acc.declare_enter dataOperands(%[[CREATE]] : !fir.ref<!fir.array<100xi32>>)
-! CHECK: %{{.*}}:2 = fir.do_loop %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} iter_args(%{{.*}} = %{{.*}}) -> (index, i32) {
+! CHECK: %{{.*}} = fir.do_loop %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} iter_args(%{{.*}} = %{{.*}}) -> (i32) {
 ! CHECK: }
 ! CHECK: acc.declare_exit token(%[[TOKEN]]) dataOperands(%[[CREATE]] : !fir.ref<!fir.array<100xi32>>)
 ! CHECK: acc.delete accPtr(%[[CREATE]] : !fir.ref<!fir.array<100xi32>>) {dataClause = #acc<data_clause acc_create>, name = "a"}
@@ -60,7 +60,7 @@ subroutine acc_declare_present(a)
 ! CHECK: %[[DECL:.*]]:2 = hlfir.declare %[[ARG0]](%{{.*}}) dummy_scope %{{[0-9]+}} {acc.declare = #acc.declare<dataClause =  acc_present>, uniq_name = "_QMacc_declareFacc_declare_presentEa"} : (!fir.ref<!fir.array<100xi32>>, !fir.shape<1>, !fir.dscope) -> (!fir.ref<!fir.array<100xi32>>, !fir.ref<!fir.array<100xi32>>)
 ! CHECK: %[[PRESENT:.*]] = acc.present varPtr(%[[DECL]]#0 : !fir.ref<!fir.array<100xi32>>) -> !fir.ref<!fir.array<100xi32>> {name = "a"}
 ! CHECK: %[[TOKEN:.*]] = acc.declare_enter dataOperands(%[[PRESENT]] : !fir.ref<!fir.array<100xi32>>)
-! CHECK: %{{.*}}:2 = fir.do_loop %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} iter_args(%arg{{.*}} = %{{.*}}) -> (index, i32)
+! CHECK: %{{.*}} = fir.do_loop %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} iter_args(%arg{{.*}} = %{{.*}}) -> (i32)
 ! CHECK: acc.declare_exit token(%[[TOKEN]]) dataOperands(%[[PRESENT]] : !fir.ref<!fir.array<100xi32>>)
 ! CHECK: acc.delete accPtr(%[[PRESENT]] : !fir.ref<!fir.array<100xi32>>) {dataClause = #acc<data_clause acc_present>, name = "a"}
 
@@ -81,7 +81,7 @@ subroutine acc_declare_copyin()
 ! CHECK: %[[COPYIN_A:.*]] = acc.copyin varPtr(%[[ADECL]]#0 : !fir.ref<!fir.array<100xi32>>) -> !fir.ref<!fir.array<100xi32>> {name = "a"}
 ! CHECK: %[[COPYIN_B:.*]] = acc.copyin varPtr(%[[BDECL]]#0 : !fir.ref<!fir.array<10xi32>>) -> !fir.ref<!fir.array<10xi32>> {dataClause = #acc<data_clause acc_copyin_readonly>, name = "b"}
 ! CHECK: acc.declare_enter dataOperands(%[[COPYIN_A]], %[[COPYIN_B]] : !fir.ref<!fir.array<100xi32>>, !fir.ref<!fir.array<10xi32>>)
-! CHECK: %{{.*}}:2 = fir.do_loop %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} iter_args(%arg{{.*}} = %{{.*}}) -> (index, i32)
+! CHECK: %{{.*}} = fir.do_loop %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} iter_args(%arg{{.*}} = %{{.*}}) -> (i32)
 ! CHECK: acc.delete accPtr(%[[COPYIN_A]] : !fir.ref<!fir.array<100xi32>>) {dataClause = #acc<data_clause acc_copyin>, name = "a"}
 ! CHECK: acc.delete accPtr(%[[COPYIN_B]] : !fir.ref<!fir.array<10xi32>>) {dataClause = #acc<data_clause acc_copyin_readonly>, name = "b"}
 
@@ -99,7 +99,7 @@ subroutine acc_declare_copyout()
 ! CHECK: %[[ADECL:.*]]:2 = hlfir.declare %[[A]](%{{.*}}) {acc.declare = #acc.declare<dataClause =  acc_copyout>, uniq_name = "_QMacc_declareFacc_declare_copyoutEa"} : (!fir.ref<!fir.array<100xi32>>, !fir.shape<1>) -> (!fir.ref<!fir.array<100xi32>>, !fir.ref<!fir.array<100xi32>>)
 ! CHECK: %[[CREATE:.*]] = acc.create varPtr(%[[ADECL]]#0 : !fir.ref<!fir.array<100xi32>>) -> !fir.ref<!fir.array<100xi32>> {dataClause = #acc<data_clause acc_copyout>, name = "a"}
 ! CHECK: %[[TOKEN:.*]] = acc.declare_enter dataOperands(%[[CREATE]] : !fir.ref<!fir.array<100xi32>>)
-! CHECK: %{{.*}}:2 = fir.do_loop %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} iter_args(%arg{{.*}} = %{{.*}}) -> (index, i32)
+! CHECK: %{{.*}} = fir.do_loop %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} iter_args(%arg{{.*}} = %{{.*}}) -> (i32)
 ! CHECK: acc.declare_exit token(%[[TOKEN]]) dataOperands(%[[CREATE]] : !fir.ref<!fir.array<100xi32>>)
 ! CHECK: acc.copyout accPtr(%[[CREATE]] : !fir.ref<!fir.array<100xi32>>) to varPtr(%[[ADECL]]#0 : !fir.ref<!fir.array<100xi32>>) {name = "a"}
 ! CHECK: return
@@ -118,7 +118,7 @@ subroutine acc_declare_deviceptr(a)
 ! CHECK: %[[DECL:.*]]:2 = hlfir.declare %[[ARG0]](%{{.*}}) dummy_scope %{{[0-9]+}} {acc.declare = #acc.declare<dataClause =  acc_deviceptr>, uniq_name = "_QMacc_declareFacc_declare_deviceptrEa"} : (!fir.ref<!fir.array<100xi32>>, !fir.shape<1>, !fir.dscope) -> (!fir.ref<!fir.array<100xi32>>, !fir.ref<!fir.array<100xi32>>)
 ! CHECK: %[[DEVICEPTR:.*]] = acc.deviceptr varPtr(%[[DECL]]#0 : !fir.ref<!fir.array<100xi32>>) -> !fir.ref<!fir.array<100xi32>> {name = "a"}
 ! CHECK: acc.declare_enter dataOperands(%[[DEVICEPTR]] : !fir.ref<!fir.array<100xi32>>)
-! CHECK: %{{.*}}:2 = fir.do_loop %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} iter_args(%arg{{.*}} = %{{.*}}) -> (index, i32)
+! CHECK: %{{.*}} = fir.do_loop %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} iter_args(%arg{{.*}} = %{{.*}}) -> (i32)
 
   subroutine acc_declare_link(a)
     integer :: a(100), i
@@ -134,7 +134,7 @@ subroutine acc_declare_link(a)
 ! CHECK: %[[DECL:.*]]:2 = hlfir.declare %[[ARG0]](%{{.*}}) dummy_scope %{{[0-9]+}} {acc.declare = #acc.declare<dataClause =  acc_declare_link>, uniq_name = "_QMacc_declareFacc_declare_linkEa"} : (!fir.ref<!fir.array<100xi32>>, !fir.shape<1>, !fir.dscope) -> (!fir.ref<!fir.array<100xi32>>, !fir.ref<!fir.array<100xi32>>)
 ! CHECK: %[[LINK:.*]] = acc.declare_link varPtr(%[[DECL]]#0 : !fir.ref<!fir.array<100xi32>>) -> !fir.ref<!fir.array<100xi32>> {name = "a"}
 ! CHECK: acc.declare_enter dataOperands(%[[LINK]] : !fir.ref<!fir.array<100xi32>>)
-! CHECK: %{{.*}}:2 = fir.do_loop %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} iter_args(%arg{{.*}} = %{{.*}}) -> (index, i32)
+! CHECK: %{{.*}} = fir.do_loop %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} iter_args(%arg{{.*}} = %{{.*}}) -> (i32)
 
   subroutine acc_declare_device_resident(a)
     integer :: a(100), i
@@ -150,7 +150,7 @@ subroutine acc_declare_device_resident(a)
 ! CHECK: %[[DECL:.*]]:2 = hlfir.declare %[[ARG0]](%{{.*}}) dummy_scope %{{[0-9]+}} {acc.declare = #acc.declare<dataClause =  acc_declare_device_resident>, uniq_name = "_QMacc_declareFacc_declare_device_residentEa"} : (!fir.ref<!fir.array<100xi32>>, !fir.shape<1>, !fir.dscope) -> (!fir.ref<!fir.array<100xi32>>, !fir.ref<!fir.array<100xi32>>)
 ! CHECK: %[[DEVICERES:.*]] = acc.declare_device_resident varPtr(%[[DECL]]#0 : !fir.ref<!fir.array<100xi32>>) -> !fir.ref<!fir.array<100xi32>> {name = "a"}
 ! CHECK: %[[TOKEN:.*]] = acc.declare_enter dataOperands(%[[DEVICERES]] : !fir.ref<!fir.array<100xi32>>)
-! CHECK: %{{.*}}:2 = fir.do_loop %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} iter_args(%arg{{.*}} = %{{.*}}) -> (index, i32)
+! CHECK: %{{.*}} = fir.do_loop %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} iter_args(%arg{{.*}} = %{{.*}}) -> (i32)
 ! CHECK: acc.declare_exit token(%[[TOKEN]]) dataOperands(%[[DEVICERES]] : !fir.ref<!fir.array<100xi32>>)
 ! CHECK: acc.delete accPtr(%[[DEVICERES]] : !fir.ref<!fir.array<100xi32>>) {dataClause = #acc<data_clause acc_declare_device_resident>, name = "a"}
 
@@ -279,7 +279,7 @@ subroutine acc_declare_multiple_directive(a, b)
 ! CHECK: %[[COPYIN:.*]] = acc.copyin varPtr(%[[DECL_A]]#0 : !fir.ref<!fir.array<100xi32>>) -> !fir.ref<!fir.array<100xi32>> {dataClause = #acc<data_clause acc_copy>, name = "a"}
 ! CHECK: %[[CREATE:.*]] = acc.create varPtr(%[[DECL_B]]#0 : !fir.ref<!fir.array<100xi32>>) -> !fir.ref<!fir.array<100xi32>> {dataClause = #acc<data_clause acc_copyout>, name = "b"}
 ! CHECK: acc.declare_enter dataOperands(%[[COPYIN]], %[[CREATE]] : !fir.ref<!fir.array<100xi32>>, !fir.ref<!fir.array<100xi32>>)
-! CHECK: %{{.*}}:{{.*}} = fir.do_loop %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} iter_args(%{{.*}} = %{{.*}}) -> (index, i32) {
+! CHECK: %{{.*}} = fir.do_loop %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} iter_args(%{{.*}} = %{{.*}}) -> (i32) {
 
 
 ! CHECK: acc.copyout accPtr(%[[CREATE]] : !fir.ref<!fir.array<100xi32>>) to varPtr(%[[DECL_B]]#0 : !fir.ref<!fir.array<100xi32>>) {name = "b"}
diff --git a/flang/test/Lower/OpenMP/hlfir-seqloop-parallel.f90 b/flang/test/Lower/OpenMP/hlfir-seqloop-parallel.f90
index 0c0d877a17b00..642b11bcd6b75 100644
--- a/flang/test/Lower/OpenMP/hlfir-seqloop-parallel.f90
+++ b/flang/test/Lower/OpenMP/hlfir-seqloop-parallel.f90
@@ -22,10 +22,10 @@ subroutine sb1
 !CHECK:    %[[I_DECL:.*]]:2 = hlfir.declare %[[I_ADDR]] {uniq_name = "_QFsb1Ei"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
 !CHECK:    omp.parallel private({{.*}} %[[I_DECL]]#0 -> %[[I_PVT_ADDR:.*]] : {{.*}}) {
 !CHECK:      %[[I_PVT_DECL:.*]]:2 = hlfir.declare %[[I_PVT_ADDR]] {uniq_name = "_QFsb1Ei"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
-!CHECK:      %[[I_FINAL_VAL:.*]]:2 = fir.do_loop %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} iter_args(%[[I_VAL:.*]] = %{{.*}}) -> (index, i32) {
+!CHECK:      %[[I_FINAL_VAL:.*]] = fir.do_loop %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} iter_args(%[[I_VAL:.*]] = %{{.*}}) -> (i32) {
 !CHECK:        fir.store %[[I_VAL]] to %[[I_PVT_DECL]]#0 : !fir.ref<i32>
 !CHECK:      }
-!CHECK:      fir.store %[[I_FINAL_VAL]]#1 to %[[I_PVT_DECL]]#0 : !fir.ref<i32>
+!CHECK:      fir.store %[[I_FINAL_VAL]] to %[[I_PVT_DECL]]#0 : !fir.ref<i32>
 !CHECK:      omp.terminator
 !CHECK:    }
 !CHECK:    return
@@ -58,20 +58,20 @@ subroutine sb2
 
 !CHECK:      %[[I_PVT_DECL:.*]]:2 = hlfir.declare %[[I_PVT_ADDR]] {uniq_name = "_QFsb2Ei"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
 
-!CHECK:      %[[FINAL_J_VAL:.*]]:2 = fir.do_loop %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} iter_args(%[[J_VAL:.*]] = %{{.*}}) -> (index, i32) {
+!CHECK:      %[[FINAL_J_VAL:.*]] = fir.do_loop %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} iter_args(%[[J_VAL:.*]] = %{{.*}}) -> (i32) {
 !CHECK:        fir.store %[[J_VAL]] to %[[J_PVT_DECL]]#0 : !fir.ref<i32>
 !CHECK:        fir.if %{{.*}} {
-!CHECK:          %[[FINAL_I_VAL:.*]]:2 = fir.do_loop %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} iter_args(%[[I_VAL:.*]] = %{{.*}}) -> (index, i32) {
+!CHECK:          %[[FINAL_I_VAL:.*]] = fir.do_loop %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} iter_args(%[[I_VAL:.*]] = %{{.*}}) -> (i32) {
 !CHECK:            fir.store %[[I_VAL]] to %[[I_PVT_DECL]]#0 : !fir.ref<i32>
 !CHECK:          }
-!CHECK:          fir.store %[[FINAL_I_VAL]]#1 to %[[I_PVT_DECL]]#0 : !fir.ref<i32>
+!CHECK:          fir.store %[[FINAL_I_VAL]] to %[[I_PVT_DECL]]#0 : !fir.ref<i32>
 !CHECK:        }
-!CHECK:        %[[FINAL_I_VAL:.*]]:2 = fir.do_loop %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} iter_args(%[[I_VAL:.*]] = %{{.*}}) -> (index, i32) {
+!CHECK:        %[[FINAL_I_VAL:.*]] = fir.do_loop %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} iter_args(%[[I_VAL:.*]] = %{{.*}}) -> (i32) {
 !CHECK:          fir.store %[[I_VAL]] to %[[I_PVT_DECL]]#0 : !fir.ref<i32>
 !CHECK:        }
-!CHECK:        fir.store %[[FINAL_I_VAL]]#1 to %[[I_PVT_DECL]]#0 : !fir.ref<i32>
+!CHECK:        fir.store %[[FINAL_I_VAL]] to %[[I_PVT_DECL]]#0 : !fir.ref<i32>
 !CHECK:      }
-!CHECK:      fir.store %[[FINAL_J_VAL]]#1 to %[[J_PVT_DECL]]#0 : !fir.ref<i32>
+!CHECK:      fir.store %[[FINAL_J_VAL]] to %[[J_PVT_DECL]]#0 : !fir.ref<i32>
 !CHECK:      omp.terminator
 !CHECK:    }
 !CHECK:    return
diff --git a/flang/test/Lower/OpenMP/parallel-private-clause-fixes.f90 b/flang/test/Lower/OpenMP/parallel-private-clause-fixes.f90
index 4d1023925fd88..3bb40834afe4c 100644
--- a/flang/test/Lower/OpenMP/parallel-private-clause-fixes.f90
+++ b/flang/test/Lower/OpenMP/parallel-private-clause-fixes.f90
@@ -59,21 +59,20 @@
 ! CHECK:               %[[VAL_10:.*]] = fir.convert %[[VAL_9]] : (i32) -> index
 ! CHECK:               %[[VAL_11:.*]] = arith.constant 1 : index
 ! CHECK:               %[[LB:.*]] = fir.convert %[[VAL_8]] : (index) -> i32
-! CHECK:               %[[VAL_12:.*]]:2 = fir.do_loop %[[VAL_13:[^ ]*]] =
+! CHECK:               %[[VAL_12:.*]] = fir.do_loop %[[VAL_13:[^ ]*]] =
 ! CHECK-SAME:              %[[VAL_8]] to %[[VAL_10]] step %[[VAL_11]]
-! CHECK-SAME:              iter_args(%[[IV:.*]] = %[[LB]]) -> (index, i32) {
+! CHECK-SAME:              iter_args(%[[IV:.*]] = %[[LB]]) -> (i32) {
 ! CHECK:                 fir.store %[[IV]] to %[[PRIV_J_DECL]]#0 : !fir.ref<i32>
 ! CHECK:                 %[[LOAD:.*]] = fir.load %[[PRIV_I_DECL]]#0 : !fir.ref<i32>
 ! CHECK:                 %[[VAL_15:.*]] = fir.load %[[PRIV_J_DECL]]#0 : !fir.ref<i32>
 ! CHECK:                 %[[VAL_16:.*]] = arith.addi %[[LOAD]], %[[VAL_15]] : i32
 ! CHECK:                 hlfir.assign %[[VAL_16]] to %[[PRIV_X_DECL]]#0 : i32, !fir.ref<i32>
-! CHECK:                 %[[VAL_17:.*]] = arith.addi %[[VAL_13]], %[[VAL_11]] overflow<nsw> : index
 ! CHECK:                 %[[STEPCAST:.*]] = fir.convert %[[VAL_11]] : (index) -> i32
 ! CHECK:                 %[[IVLOAD:.*]] = fir.load %[[PRIV_J_DECL]]#0 : !fir.ref<i32>
 ! CHECK:                 %[[IVINC:.*]] = arith.addi %[[I...
[truncated]

Copy link
Contributor

@jeanPerier jeanPerier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

Copy link
Contributor

@tblah tblah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for taking the time to update all of those tests

@rscottmanley
Copy link
Contributor

rscottmanley commented Sep 25, 2025

Maybe I'm misreading, but looking at the changes to the test, it looks like we're still yielding the original type instead of the primary iv. Why not yield the primary iv and convert the result after the loop? It's still incrementing the user variable?

@jeanPerier
Copy link
Contributor

Maybe I'm misreading, but looking at the changes to the test, it looks like we're still yielding the original type instead of the primary iv. Why not yield the primary iv and convert the result after the loop? It's still incrementing the user variable?

Slava will know best. My understanding is that using the original variable type helps LLVM vectorizer: af7edf1

@vzakhari
Copy link
Contributor Author

Maybe I'm misreading, but looking at the changes to the test, it looks like we're still yielding the original type instead of the primary iv. Why not yield the primary iv and convert the result after the loop? It's still incrementing the user variable?

Slava will know best. My understanding is that using the original variable type helps LLVM vectorizer: af7edf1

Yes, incrementing the user IV in the user type was crucial for helping LLVM vectorizer. Jean, Scott and I had some discussions about this today.

It seems that exposing the index loop counter too early in the Lowering does not give us a lot of benefit, and the following representation might be better:

      %3 = fir.do_loop %arg0 = %lb_in_user_type to %ub_in_user_type step %step_in_user_type -> (i32) {
        fir.store %arg0 to %user_iv_loc : !fir.ref<i32>
        ...
        fir.result %arg0 : i32
      }
      fir.store %3 to %user_iv_loc : !fir.ref<i32>

The fir.do_loop to CFG conversion, then should introduce an index loop counter to make sure that the loops with the final increment overflowing the IV user-type work properly.

I plan to merge this PR as-is, and I will experiment with the above lowering scheme.

@vzakhari vzakhari merged commit d6e20c4 into llvm:main Sep 26, 2025
14 checks passed
mahesh-attarde pushed a commit to mahesh-attarde/llvm-project that referenced this pull request Oct 3, 2025
This patch changes two things:
  1. We do not need to use the loop counter's last value
     for regular do-loops in Lowering.
  2. The loop counter's increment is implied by fir.do_loop
     operation, so there is no need to increment it explicitly.

The last point has been especially confusing to me, because it was
unclear why we have an explicit increment if it is implied.
It looks like CFGConversion somehow still makes the final code
correct, i.e. the counter is not incremented twice.
Anyway, the new lowering should look more concise.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

flang:fir-hlfir flang:openmp flang Flang issues not falling into any other category openacc

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants