-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[flang] Clean-up for fir.do_loop generation in lowering. #160630
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This patch changes two things: 1. We do not need to use the loop counter's last value for regular do-loops in Lowering. 2. The loop counter's increment is implied by fir.do_loop operation, so there is no need to increment it explicitly. The last point has been especially confusing to me, because it was unclear why we have an explicit increment if it is implied. It looks like CFGConversion somehow still makes the final code correct, i.e. the counter is not incremented twice. Anyway, the new lowering should look more concise.
@llvm/pr-subscribers-flang-openmp @llvm/pr-subscribers-openacc Author: Slava Zakharin (vzakhari) ChangesThis patch changes two things:
The last point has been especially confusing to me, because it was Patch is 49.30 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/160630.diff 19 Files Affected:
diff --git a/flang/lib/Lower/Bridge.cpp b/flang/lib/Lower/Bridge.cpp
index 4a5b9885bb7c4..149e51b501a82 100644
--- a/flang/lib/Lower/Bridge.cpp
+++ b/flang/lib/Lower/Bridge.cpp
@@ -2544,7 +2544,7 @@ class FirConverter : public Fortran::lower::AbstractConverter {
auto loopOp = fir::DoLoopOp::create(
*builder, loc, lowerValue, upperValue, stepValue,
/*unordered=*/false,
- /*finalCountValue=*/true,
+ /*finalCountValue=*/false,
builder->createConvert(loc, loopVarType, lowerValue));
info.loopOp = loopOp;
builder->setInsertionPointToStart(loopOp.getBody());
@@ -2696,22 +2696,18 @@ class FirConverter : public Fortran::lower::AbstractConverter {
// Decrement tripVariable.
auto doLoopOp = mlir::cast<fir::DoLoopOp>(info.loopOp);
builder->setInsertionPointToEnd(doLoopOp.getBody());
- llvm::SmallVector<mlir::Value, 2> results;
- results.push_back(mlir::arith::AddIOp::create(
- *builder, loc, doLoopOp.getInductionVar(), doLoopOp.getStep(),
- iofAttr));
// Step loopVariable to help optimizations such as vectorization.
// Induction variable elimination will clean up as necessary.
mlir::Value step = builder->createConvert(
loc, info.getLoopVariableType(), doLoopOp.getStep());
mlir::Value loopVar =
fir::LoadOp::create(*builder, loc, info.loopVariable);
- results.push_back(
- mlir::arith::AddIOp::create(*builder, loc, loopVar, step, iofAttr));
- fir::ResultOp::create(*builder, loc, results);
+ mlir::Value loopVarInc =
+ mlir::arith::AddIOp::create(*builder, loc, loopVar, step, iofAttr);
+ fir::ResultOp::create(*builder, loc, loopVarInc);
builder->setInsertionPointAfter(doLoopOp);
// The loop control variable may be used after the loop.
- fir::StoreOp::create(*builder, loc, doLoopOp.getResult(1),
+ fir::StoreOp::create(*builder, loc, doLoopOp.getResult(0),
info.loopVariable);
continue;
}
diff --git a/flang/lib/Lower/IO.cpp b/flang/lib/Lower/IO.cpp
index 4ad2ac01334fa..98dc78f625b9e 100644
--- a/flang/lib/Lower/IO.cpp
+++ b/flang/lib/Lower/IO.cpp
@@ -977,9 +977,9 @@ static void genIoLoop(Fortran::lower::AbstractConverter &converter,
fir::StoreOp::create(builder, loc, lcv, loopVar);
genItemList(ioImpliedDo);
builder.setInsertionPointToEnd(doLoopOp.getBody());
- mlir::Value result = mlir::arith::AddIOp::create(
- builder, loc, doLoopOp.getInductionVar(), doLoopOp.getStep(), iofAttr);
- fir::ResultOp::create(builder, loc, result);
+ // fir.do_loop's induction variable's increment is implied,
+ // so we do not need to increment it explicitly.
+ fir::ResultOp::create(builder, loc, doLoopOp.getInductionVar());
builder.setInsertionPointAfter(doLoopOp);
// The loop control variable may be used after the loop.
lcv = builder.createConvert(loc, fir::unwrapRefType(loopVar.getType()),
diff --git a/flang/test/Fir/convert-to-llvm-openmp-and-fir.fir b/flang/test/Fir/convert-to-llvm-openmp-and-fir.fir
index 24e5cad84b709..38d51110bbde3 100644
--- a/flang/test/Fir/convert-to-llvm-openmp-and-fir.fir
+++ b/flang/test/Fir/convert-to-llvm-openmp-and-fir.fir
@@ -349,7 +349,7 @@ func.func @_QPopenmp_target_data_region() {
%3 = fir.convert %c1024_i32 : (i32) -> index
%c1 = arith.constant 1 : index
%4 = fir.convert %2 : (index) -> i32
- %5:2 = fir.do_loop %arg0 = %2 to %3 step %c1 iter_args(%arg1 = %4) -> (index, i32) {
+ %5 = fir.do_loop %arg0 = %2 to %3 step %c1 iter_args(%arg1 = %4) -> (i32) {
fir.store %arg1 to %1 : !fir.ref<i32>
%6 = fir.load %1 : !fir.ref<i32>
%7 = fir.load %1 : !fir.ref<i32>
@@ -358,13 +358,12 @@ func.func @_QPopenmp_target_data_region() {
%9 = arith.subi %8, %c1_i64 : i64
%10 = fir.coordinate_of %0, %9 : (!fir.ref<!fir.array<1024xi32>>, i64) -> !fir.ref<i32>
fir.store %6 to %10 : !fir.ref<i32>
- %11 = arith.addi %arg0, %c1 overflow<nsw> : index
%12 = fir.convert %c1 : (index) -> i32
%13 = fir.load %1 : !fir.ref<i32>
%14 = arith.addi %13, %12 overflow<nsw> : i32
- fir.result %11, %14 : index, i32
+ fir.result %14 : i32
}
- fir.store %5#1 to %1 : !fir.ref<i32>
+ fir.store %5 to %1 : !fir.ref<i32>
omp.terminator
}
return
@@ -404,7 +403,6 @@ func.func @_QPopenmp_target_data_region() {
// CHECK: %[[VAL_21:.*]] = llvm.sub %[[VAL_19]], %[[VAL_20]] : i64
// CHECK: %[[VAL_22:.*]] = llvm.getelementptr %[[VAL_1]][0, %[[VAL_21]]] : (!llvm.ptr, i64) -> !llvm.ptr
// CHECK: llvm.store %[[VAL_17]], %[[VAL_22]] : i32, !llvm.ptr
-// CHECK: %[[VAL_23:.*]] = llvm.add %[[VAL_12]], %[[VAL_8]] overflow<nsw> : i64
// CHECK: %[[VAL_24:.*]] = llvm.trunc %[[VAL_8]] : i64 to i32
// CHECK: %[[VAL_25:.*]] = llvm.load %[[VAL_3]] : !llvm.ptr -> i32
// CHECK: %[[VAL_26:.*]] = llvm.add %[[VAL_25]], %[[VAL_24]] overflow<nsw> : i32
@@ -653,18 +651,17 @@ func.func @_QPsb() {
omp.sections {
omp.section {
%2 = fir.convert %c1 : (index) -> i32
- %3:2 = fir.do_loop %arg0 = %c1 to %c10 step %c1 iter_args(%arg1 = %2) -> (index, i32) {
+ %3 = fir.do_loop %arg0 = %c1 to %c10 step %c1 iter_args(%arg1 = %2) -> (i32) {
fir.store %arg1 to %0 : !fir.ref<i32>
%4 = fir.load %1 : !fir.ref<i32>
%5 = arith.addi %4, %c1_i32 : i32
fir.store %5 to %1 : !fir.ref<i32>
- %6 = arith.addi %arg0, %c1 : index
%7 = fir.convert %c1 : (index) -> i32
%8 = fir.load %0 : !fir.ref<i32>
%9 = arith.addi %8, %7 : i32
- fir.result %6, %9 : index, i32
+ fir.result %9 : i32
}
- fir.store %3#1 to %0 : !fir.ref<i32>
+ fir.store %3 to %0 : !fir.ref<i32>
omp.terminator
}
omp.terminator
diff --git a/flang/test/Lower/OpenACC/acc-declare.f90 b/flang/test/Lower/OpenACC/acc-declare.f90
index edae0e6a4d37e..46c4365f23fd6 100644
--- a/flang/test/Lower/OpenACC/acc-declare.f90
+++ b/flang/test/Lower/OpenACC/acc-declare.f90
@@ -20,7 +20,7 @@ subroutine acc_declare_copy()
! CHECK: %[[DECL:.*]]:2 = hlfir.declare %[[ALLOCA]](%{{.*}}) {acc.declare = #acc.declare<dataClause = acc_copy>, uniq_name = "_QMacc_declareFacc_declare_copyEa"} : (!fir.ref<!fir.array<100xi32>>, !fir.shape<1>) -> (!fir.ref<!fir.array<100xi32>>, !fir.ref<!fir.array<100xi32>>)
! CHECK: %[[COPYIN:.*]] = acc.copyin varPtr(%[[DECL]]#0 : !fir.ref<!fir.array<100xi32>>) -> !fir.ref<!fir.array<100xi32>> {dataClause = #acc<data_clause acc_copy>, name = "a"}
! CHECK: %[[TOKEN:.*]] = acc.declare_enter dataOperands(%[[COPYIN]] : !fir.ref<!fir.array<100xi32>>)
-! CHECK: %{{.*}}:2 = fir.do_loop %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} iter_args(%{{.*}} = %{{.*}}) -> (index, i32) {
+! CHECK: %{{.*}} = fir.do_loop %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} iter_args(%{{.*}} = %{{.*}}) -> (i32) {
! CHECK: }
! CHECK: acc.declare_exit token(%[[TOKEN]]) dataOperands(%[[COPYIN]] : !fir.ref<!fir.array<100xi32>>)
! CHECK: acc.copyout accPtr(%[[COPYIN]] : !fir.ref<!fir.array<100xi32>>) to varPtr(%[[DECL]]#0 : !fir.ref<!fir.array<100xi32>>) {dataClause = #acc<data_clause acc_copy>, name = "a"}
@@ -40,7 +40,7 @@ subroutine acc_declare_create()
! CHECK: %[[DECL:.*]]:2 = hlfir.declare %[[ALLOCA]](%{{.*}}) {acc.declare = #acc.declare<dataClause = acc_create>, uniq_name = "_QMacc_declareFacc_declare_createEa"} : (!fir.ref<!fir.array<100xi32>>, !fir.shape<1>) -> (!fir.ref<!fir.array<100xi32>>, !fir.ref<!fir.array<100xi32>>)
! CHECK: %[[CREATE:.*]] = acc.create varPtr(%[[DECL]]#0 : !fir.ref<!fir.array<100xi32>>) -> !fir.ref<!fir.array<100xi32>> {name = "a"}
! CHECK: %[[TOKEN:.*]] = acc.declare_enter dataOperands(%[[CREATE]] : !fir.ref<!fir.array<100xi32>>)
-! CHECK: %{{.*}}:2 = fir.do_loop %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} iter_args(%{{.*}} = %{{.*}}) -> (index, i32) {
+! CHECK: %{{.*}} = fir.do_loop %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} iter_args(%{{.*}} = %{{.*}}) -> (i32) {
! CHECK: }
! CHECK: acc.declare_exit token(%[[TOKEN]]) dataOperands(%[[CREATE]] : !fir.ref<!fir.array<100xi32>>)
! CHECK: acc.delete accPtr(%[[CREATE]] : !fir.ref<!fir.array<100xi32>>) {dataClause = #acc<data_clause acc_create>, name = "a"}
@@ -60,7 +60,7 @@ subroutine acc_declare_present(a)
! CHECK: %[[DECL:.*]]:2 = hlfir.declare %[[ARG0]](%{{.*}}) dummy_scope %{{[0-9]+}} {acc.declare = #acc.declare<dataClause = acc_present>, uniq_name = "_QMacc_declareFacc_declare_presentEa"} : (!fir.ref<!fir.array<100xi32>>, !fir.shape<1>, !fir.dscope) -> (!fir.ref<!fir.array<100xi32>>, !fir.ref<!fir.array<100xi32>>)
! CHECK: %[[PRESENT:.*]] = acc.present varPtr(%[[DECL]]#0 : !fir.ref<!fir.array<100xi32>>) -> !fir.ref<!fir.array<100xi32>> {name = "a"}
! CHECK: %[[TOKEN:.*]] = acc.declare_enter dataOperands(%[[PRESENT]] : !fir.ref<!fir.array<100xi32>>)
-! CHECK: %{{.*}}:2 = fir.do_loop %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} iter_args(%arg{{.*}} = %{{.*}}) -> (index, i32)
+! CHECK: %{{.*}} = fir.do_loop %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} iter_args(%arg{{.*}} = %{{.*}}) -> (i32)
! CHECK: acc.declare_exit token(%[[TOKEN]]) dataOperands(%[[PRESENT]] : !fir.ref<!fir.array<100xi32>>)
! CHECK: acc.delete accPtr(%[[PRESENT]] : !fir.ref<!fir.array<100xi32>>) {dataClause = #acc<data_clause acc_present>, name = "a"}
@@ -81,7 +81,7 @@ subroutine acc_declare_copyin()
! CHECK: %[[COPYIN_A:.*]] = acc.copyin varPtr(%[[ADECL]]#0 : !fir.ref<!fir.array<100xi32>>) -> !fir.ref<!fir.array<100xi32>> {name = "a"}
! CHECK: %[[COPYIN_B:.*]] = acc.copyin varPtr(%[[BDECL]]#0 : !fir.ref<!fir.array<10xi32>>) -> !fir.ref<!fir.array<10xi32>> {dataClause = #acc<data_clause acc_copyin_readonly>, name = "b"}
! CHECK: acc.declare_enter dataOperands(%[[COPYIN_A]], %[[COPYIN_B]] : !fir.ref<!fir.array<100xi32>>, !fir.ref<!fir.array<10xi32>>)
-! CHECK: %{{.*}}:2 = fir.do_loop %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} iter_args(%arg{{.*}} = %{{.*}}) -> (index, i32)
+! CHECK: %{{.*}} = fir.do_loop %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} iter_args(%arg{{.*}} = %{{.*}}) -> (i32)
! CHECK: acc.delete accPtr(%[[COPYIN_A]] : !fir.ref<!fir.array<100xi32>>) {dataClause = #acc<data_clause acc_copyin>, name = "a"}
! CHECK: acc.delete accPtr(%[[COPYIN_B]] : !fir.ref<!fir.array<10xi32>>) {dataClause = #acc<data_clause acc_copyin_readonly>, name = "b"}
@@ -99,7 +99,7 @@ subroutine acc_declare_copyout()
! CHECK: %[[ADECL:.*]]:2 = hlfir.declare %[[A]](%{{.*}}) {acc.declare = #acc.declare<dataClause = acc_copyout>, uniq_name = "_QMacc_declareFacc_declare_copyoutEa"} : (!fir.ref<!fir.array<100xi32>>, !fir.shape<1>) -> (!fir.ref<!fir.array<100xi32>>, !fir.ref<!fir.array<100xi32>>)
! CHECK: %[[CREATE:.*]] = acc.create varPtr(%[[ADECL]]#0 : !fir.ref<!fir.array<100xi32>>) -> !fir.ref<!fir.array<100xi32>> {dataClause = #acc<data_clause acc_copyout>, name = "a"}
! CHECK: %[[TOKEN:.*]] = acc.declare_enter dataOperands(%[[CREATE]] : !fir.ref<!fir.array<100xi32>>)
-! CHECK: %{{.*}}:2 = fir.do_loop %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} iter_args(%arg{{.*}} = %{{.*}}) -> (index, i32)
+! CHECK: %{{.*}} = fir.do_loop %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} iter_args(%arg{{.*}} = %{{.*}}) -> (i32)
! CHECK: acc.declare_exit token(%[[TOKEN]]) dataOperands(%[[CREATE]] : !fir.ref<!fir.array<100xi32>>)
! CHECK: acc.copyout accPtr(%[[CREATE]] : !fir.ref<!fir.array<100xi32>>) to varPtr(%[[ADECL]]#0 : !fir.ref<!fir.array<100xi32>>) {name = "a"}
! CHECK: return
@@ -118,7 +118,7 @@ subroutine acc_declare_deviceptr(a)
! CHECK: %[[DECL:.*]]:2 = hlfir.declare %[[ARG0]](%{{.*}}) dummy_scope %{{[0-9]+}} {acc.declare = #acc.declare<dataClause = acc_deviceptr>, uniq_name = "_QMacc_declareFacc_declare_deviceptrEa"} : (!fir.ref<!fir.array<100xi32>>, !fir.shape<1>, !fir.dscope) -> (!fir.ref<!fir.array<100xi32>>, !fir.ref<!fir.array<100xi32>>)
! CHECK: %[[DEVICEPTR:.*]] = acc.deviceptr varPtr(%[[DECL]]#0 : !fir.ref<!fir.array<100xi32>>) -> !fir.ref<!fir.array<100xi32>> {name = "a"}
! CHECK: acc.declare_enter dataOperands(%[[DEVICEPTR]] : !fir.ref<!fir.array<100xi32>>)
-! CHECK: %{{.*}}:2 = fir.do_loop %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} iter_args(%arg{{.*}} = %{{.*}}) -> (index, i32)
+! CHECK: %{{.*}} = fir.do_loop %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} iter_args(%arg{{.*}} = %{{.*}}) -> (i32)
subroutine acc_declare_link(a)
integer :: a(100), i
@@ -134,7 +134,7 @@ subroutine acc_declare_link(a)
! CHECK: %[[DECL:.*]]:2 = hlfir.declare %[[ARG0]](%{{.*}}) dummy_scope %{{[0-9]+}} {acc.declare = #acc.declare<dataClause = acc_declare_link>, uniq_name = "_QMacc_declareFacc_declare_linkEa"} : (!fir.ref<!fir.array<100xi32>>, !fir.shape<1>, !fir.dscope) -> (!fir.ref<!fir.array<100xi32>>, !fir.ref<!fir.array<100xi32>>)
! CHECK: %[[LINK:.*]] = acc.declare_link varPtr(%[[DECL]]#0 : !fir.ref<!fir.array<100xi32>>) -> !fir.ref<!fir.array<100xi32>> {name = "a"}
! CHECK: acc.declare_enter dataOperands(%[[LINK]] : !fir.ref<!fir.array<100xi32>>)
-! CHECK: %{{.*}}:2 = fir.do_loop %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} iter_args(%arg{{.*}} = %{{.*}}) -> (index, i32)
+! CHECK: %{{.*}} = fir.do_loop %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} iter_args(%arg{{.*}} = %{{.*}}) -> (i32)
subroutine acc_declare_device_resident(a)
integer :: a(100), i
@@ -150,7 +150,7 @@ subroutine acc_declare_device_resident(a)
! CHECK: %[[DECL:.*]]:2 = hlfir.declare %[[ARG0]](%{{.*}}) dummy_scope %{{[0-9]+}} {acc.declare = #acc.declare<dataClause = acc_declare_device_resident>, uniq_name = "_QMacc_declareFacc_declare_device_residentEa"} : (!fir.ref<!fir.array<100xi32>>, !fir.shape<1>, !fir.dscope) -> (!fir.ref<!fir.array<100xi32>>, !fir.ref<!fir.array<100xi32>>)
! CHECK: %[[DEVICERES:.*]] = acc.declare_device_resident varPtr(%[[DECL]]#0 : !fir.ref<!fir.array<100xi32>>) -> !fir.ref<!fir.array<100xi32>> {name = "a"}
! CHECK: %[[TOKEN:.*]] = acc.declare_enter dataOperands(%[[DEVICERES]] : !fir.ref<!fir.array<100xi32>>)
-! CHECK: %{{.*}}:2 = fir.do_loop %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} iter_args(%arg{{.*}} = %{{.*}}) -> (index, i32)
+! CHECK: %{{.*}} = fir.do_loop %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} iter_args(%arg{{.*}} = %{{.*}}) -> (i32)
! CHECK: acc.declare_exit token(%[[TOKEN]]) dataOperands(%[[DEVICERES]] : !fir.ref<!fir.array<100xi32>>)
! CHECK: acc.delete accPtr(%[[DEVICERES]] : !fir.ref<!fir.array<100xi32>>) {dataClause = #acc<data_clause acc_declare_device_resident>, name = "a"}
@@ -279,7 +279,7 @@ subroutine acc_declare_multiple_directive(a, b)
! CHECK: %[[COPYIN:.*]] = acc.copyin varPtr(%[[DECL_A]]#0 : !fir.ref<!fir.array<100xi32>>) -> !fir.ref<!fir.array<100xi32>> {dataClause = #acc<data_clause acc_copy>, name = "a"}
! CHECK: %[[CREATE:.*]] = acc.create varPtr(%[[DECL_B]]#0 : !fir.ref<!fir.array<100xi32>>) -> !fir.ref<!fir.array<100xi32>> {dataClause = #acc<data_clause acc_copyout>, name = "b"}
! CHECK: acc.declare_enter dataOperands(%[[COPYIN]], %[[CREATE]] : !fir.ref<!fir.array<100xi32>>, !fir.ref<!fir.array<100xi32>>)
-! CHECK: %{{.*}}:{{.*}} = fir.do_loop %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} iter_args(%{{.*}} = %{{.*}}) -> (index, i32) {
+! CHECK: %{{.*}} = fir.do_loop %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} iter_args(%{{.*}} = %{{.*}}) -> (i32) {
! CHECK: acc.copyout accPtr(%[[CREATE]] : !fir.ref<!fir.array<100xi32>>) to varPtr(%[[DECL_B]]#0 : !fir.ref<!fir.array<100xi32>>) {name = "b"}
diff --git a/flang/test/Lower/OpenMP/hlfir-seqloop-parallel.f90 b/flang/test/Lower/OpenMP/hlfir-seqloop-parallel.f90
index 0c0d877a17b00..642b11bcd6b75 100644
--- a/flang/test/Lower/OpenMP/hlfir-seqloop-parallel.f90
+++ b/flang/test/Lower/OpenMP/hlfir-seqloop-parallel.f90
@@ -22,10 +22,10 @@ subroutine sb1
!CHECK: %[[I_DECL:.*]]:2 = hlfir.declare %[[I_ADDR]] {uniq_name = "_QFsb1Ei"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
!CHECK: omp.parallel private({{.*}} %[[I_DECL]]#0 -> %[[I_PVT_ADDR:.*]] : {{.*}}) {
!CHECK: %[[I_PVT_DECL:.*]]:2 = hlfir.declare %[[I_PVT_ADDR]] {uniq_name = "_QFsb1Ei"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
-!CHECK: %[[I_FINAL_VAL:.*]]:2 = fir.do_loop %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} iter_args(%[[I_VAL:.*]] = %{{.*}}) -> (index, i32) {
+!CHECK: %[[I_FINAL_VAL:.*]] = fir.do_loop %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} iter_args(%[[I_VAL:.*]] = %{{.*}}) -> (i32) {
!CHECK: fir.store %[[I_VAL]] to %[[I_PVT_DECL]]#0 : !fir.ref<i32>
!CHECK: }
-!CHECK: fir.store %[[I_FINAL_VAL]]#1 to %[[I_PVT_DECL]]#0 : !fir.ref<i32>
+!CHECK: fir.store %[[I_FINAL_VAL]] to %[[I_PVT_DECL]]#0 : !fir.ref<i32>
!CHECK: omp.terminator
!CHECK: }
!CHECK: return
@@ -58,20 +58,20 @@ subroutine sb2
!CHECK: %[[I_PVT_DECL:.*]]:2 = hlfir.declare %[[I_PVT_ADDR]] {uniq_name = "_QFsb2Ei"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
-!CHECK: %[[FINAL_J_VAL:.*]]:2 = fir.do_loop %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} iter_args(%[[J_VAL:.*]] = %{{.*}}) -> (index, i32) {
+!CHECK: %[[FINAL_J_VAL:.*]] = fir.do_loop %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} iter_args(%[[J_VAL:.*]] = %{{.*}}) -> (i32) {
!CHECK: fir.store %[[J_VAL]] to %[[J_PVT_DECL]]#0 : !fir.ref<i32>
!CHECK: fir.if %{{.*}} {
-!CHECK: %[[FINAL_I_VAL:.*]]:2 = fir.do_loop %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} iter_args(%[[I_VAL:.*]] = %{{.*}}) -> (index, i32) {
+!CHECK: %[[FINAL_I_VAL:.*]] = fir.do_loop %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} iter_args(%[[I_VAL:.*]] = %{{.*}}) -> (i32) {
!CHECK: fir.store %[[I_VAL]] to %[[I_PVT_DECL]]#0 : !fir.ref<i32>
!CHECK: }
-!CHECK: fir.store %[[FINAL_I_VAL]]#1 to %[[I_PVT_DECL]]#0 : !fir.ref<i32>
+!CHECK: fir.store %[[FINAL_I_VAL]] to %[[I_PVT_DECL]]#0 : !fir.ref<i32>
!CHECK: }
-!CHECK: %[[FINAL_I_VAL:.*]]:2 = fir.do_loop %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} iter_args(%[[I_VAL:.*]] = %{{.*}}) -> (index, i32) {
+!CHECK: %[[FINAL_I_VAL:.*]] = fir.do_loop %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} iter_args(%[[I_VAL:.*]] = %{{.*}}) -> (i32) {
!CHECK: fir.store %[[I_VAL]] to %[[I_PVT_DECL]]#0 : !fir.ref<i32>
!CHECK: }
-!CHECK: fir.store %[[FINAL_I_VAL]]#1 to %[[I_PVT_DECL]]#0 : !fir.ref<i32>
+!CHECK: fir.store %[[FINAL_I_VAL]] to %[[I_PVT_DECL]]#0 : !fir.ref<i32>
!CHECK: }
-!CHECK: fir.store %[[FINAL_J_VAL]]#1 to %[[J_PVT_DECL]]#0 : !fir.ref<i32>
+!CHECK: fir.store %[[FINAL_J_VAL]] to %[[J_PVT_DECL]]#0 : !fir.ref<i32>
!CHECK: omp.terminator
!CHECK: }
!CHECK: return
diff --git a/flang/test/Lower/OpenMP/parallel-private-clause-fixes.f90 b/flang/test/Lower/OpenMP/parallel-private-clause-fixes.f90
index 4d1023925fd88..3bb40834afe4c 100644
--- a/flang/test/Lower/OpenMP/parallel-private-clause-fixes.f90
+++ b/flang/test/Lower/OpenMP/parallel-private-clause-fixes.f90
@@ -59,21 +59,20 @@
! CHECK: %[[VAL_10:.*]] = fir.convert %[[VAL_9]] : (i32) -> index
! CHECK: %[[VAL_11:.*]] = arith.constant 1 : index
! CHECK: %[[LB:.*]] = fir.convert %[[VAL_8]] : (index) -> i32
-! CHECK: %[[VAL_12:.*]]:2 = fir.do_loop %[[VAL_13:[^ ]*]] =
+! CHECK: %[[VAL_12:.*]] = fir.do_loop %[[VAL_13:[^ ]*]] =
! CHECK-SAME: %[[VAL_8]] to %[[VAL_10]] step %[[VAL_11]]
-! CHECK-SAME: iter_args(%[[IV:.*]] = %[[LB]]) -> (index, i32) {
+! CHECK-SAME: iter_args(%[[IV:.*]] = %[[LB]]) -> (i32) {
! CHECK: fir.store %[[IV]] to %[[PRIV_J_DECL]]#0 : !fir.ref<i32>
! CHECK: %[[LOAD:.*]] = fir.load %[[PRIV_I_DECL]]#0 : !fir.ref<i32>
! CHECK: %[[VAL_15:.*]] = fir.load %[[PRIV_J_DECL]]#0 : !fir.ref<i32>
! CHECK: %[[VAL_16:.*]] = arith.addi %[[LOAD]], %[[VAL_15]] : i32
! CHECK: hlfir.assign %[[VAL_16]] to %[[PRIV_X_DECL]]#0 : i32, !fir.ref<i32>
-! CHECK: %[[VAL_17:.*]] = arith.addi %[[VAL_13]], %[[VAL_11]] overflow<nsw> : index
! CHECK: %[[STEPCAST:.*]] = fir.convert %[[VAL_11]] : (index) -> i32
! CHECK: %[[IVLOAD:.*]] = fir.load %[[PRIV_J_DECL]]#0 : !fir.ref<i32>
! CHECK: %[[IVINC:.*]] = arith.addi %[[I...
[truncated]
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for taking the time to update all of those tests
Maybe I'm misreading, but looking at the changes to the test, it looks like we're still yielding the original type instead of the primary iv. Why not yield the primary iv and convert the result after the loop? It's still incrementing the user variable? |
Slava will know best. My understanding is that using the original variable type helps LLVM vectorizer: af7edf1 |
Yes, incrementing the user IV in the user type was crucial for helping LLVM vectorizer. Jean, Scott and I had some discussions about this today. It seems that exposing the
The I plan to merge this PR as-is, and I will experiment with the above lowering scheme. |
This patch changes two things: 1. We do not need to use the loop counter's last value for regular do-loops in Lowering. 2. The loop counter's increment is implied by fir.do_loop operation, so there is no need to increment it explicitly. The last point has been especially confusing to me, because it was unclear why we have an explicit increment if it is implied. It looks like CFGConversion somehow still makes the final code correct, i.e. the counter is not incremented twice. Anyway, the new lowering should look more concise.
This patch changes two things:
for regular do-loops in Lowering.
operation, so there is no need to increment it explicitly.
The last point has been especially confusing to me, because it was
unclear why we have an explicit increment if it is implied.
It looks like CFGConversion somehow still makes the final code
correct, i.e. the counter is not incremented twice.
Anyway, the new lowering should look more concise.