[flang][CodeGen] add nsw to address calculations #74709
Conversation
nsw is a flag for LLVM arithmetic operations meaning "no signed wrap". If this keyword is present, the result of the operation is a poison value if overflow occurs. Adding this keyword permits LLVM to re-order integer arithmetic more aggressively. In https://discourse.llvm.org/t/rfc-changes-to-fircg-xarray-coor-codegen-to-allow-better-hoisting/75257/16 @vzakhari observed that adding nsw is useful to enable hoisting of address calculations after some loops (or is at least a step in that direction). Classic flang also adds nsw to address calculations.
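For readers unfamiliar with the flag, here is a hedged C++ analogy (not part of the patch): C++ signed arithmetic already carries the licence that nsw grants an LLVM add/sub/mul, namely that signed overflow never happens, which is exactly what enables reassociation and hoisting of address math.

```cpp
#include <cstdio>

// Illustrative sketch only, not the patch's code: signed overflow is
// undefined in C++, the same licence `nsw` grants in LLVM IR. Under that
// licence the optimizer may reassociate and hoist the (i - lb) * stride
// address arithmetic that XArrayCoorOpConversion emits.
long offsetOf(long i, long lb, long stride) {
  return (i - lb) * stride;  // the sub and mul this patch now tags with nsw
}

int main() {
  printf("%ld\n", offsetOf(5, 1, 8));  // (5 - 1) * 8 = 32
  return 0;
}
```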
@llvm/pr-subscribers-flang-codegen @llvm/pr-subscribers-flang-fir-hlfir Author: Tom Eccles (tblah) Changes
Classic flang also adds nsw to address calculations. Patch is 32.47 KiB, truncated to 20.00 KiB below; full version: https://github.com/llvm/llvm-project/pull/74709.diff. 6 Files Affected:
diff --git a/flang/lib/Optimizer/CodeGen/CodeGen.cpp b/flang/lib/Optimizer/CodeGen/CodeGen.cpp
index bf175c8ebadee..293208ce3b601 100644
--- a/flang/lib/Optimizer/CodeGen/CodeGen.cpp
+++ b/flang/lib/Optimizer/CodeGen/CodeGen.cpp
@@ -2387,6 +2387,9 @@ struct XArrayCoorOpConversion
const bool baseIsBoxed = coor.getMemref().getType().isa<fir::BaseBoxType>();
TypePair baseBoxTyPair =
baseIsBoxed ? getBoxTypePair(coor.getMemref().getType()) : TypePair{};
+ mlir::LLVM::IntegerOverflowFlagsAttr nsw =
+ mlir::LLVM::IntegerOverflowFlagsAttr::get(
+ rewriter.getContext(), mlir::LLVM::IntegerOverflowFlags::nsw);
// For each dimension of the array, generate the offset calculation.
for (unsigned i = 0; i < rank; ++i, ++indexOffset, ++shapeOffset,
@@ -2407,14 +2410,15 @@ struct XArrayCoorOpConversion
if (normalSlice)
step = integerCast(loc, rewriter, idxTy, operands[sliceOffset + 2]);
}
- auto idx = rewriter.create<mlir::LLVM::SubOp>(loc, idxTy, index, lb);
+ auto idx = rewriter.create<mlir::LLVM::SubOp>(loc, idxTy, index, lb, nsw);
mlir::Value diff =
- rewriter.create<mlir::LLVM::MulOp>(loc, idxTy, idx, step);
+ rewriter.create<mlir::LLVM::MulOp>(loc, idxTy, idx, step, nsw);
if (normalSlice) {
mlir::Value sliceLb =
integerCast(loc, rewriter, idxTy, operands[sliceOffset]);
- auto adj = rewriter.create<mlir::LLVM::SubOp>(loc, idxTy, sliceLb, lb);
- diff = rewriter.create<mlir::LLVM::AddOp>(loc, idxTy, diff, adj);
+ auto adj =
+ rewriter.create<mlir::LLVM::SubOp>(loc, idxTy, sliceLb, lb, nsw);
+ diff = rewriter.create<mlir::LLVM::AddOp>(loc, idxTy, diff, adj, nsw);
}
// Update the offset given the stride and the zero based index `diff`
// that was just computed.
@@ -2422,17 +2426,21 @@ struct XArrayCoorOpConversion
// Use stride in bytes from the descriptor.
mlir::Value stride =
getStrideFromBox(loc, baseBoxTyPair, operands[0], i, rewriter);
- auto sc = rewriter.create<mlir::LLVM::MulOp>(loc, idxTy, diff, stride);
- offset = rewriter.create<mlir::LLVM::AddOp>(loc, idxTy, sc, offset);
+ auto sc =
+ rewriter.create<mlir::LLVM::MulOp>(loc, idxTy, diff, stride, nsw);
+ offset =
+ rewriter.create<mlir::LLVM::AddOp>(loc, idxTy, sc, offset, nsw);
} else {
// Use stride computed at last iteration.
- auto sc = rewriter.create<mlir::LLVM::MulOp>(loc, idxTy, diff, prevExt);
- offset = rewriter.create<mlir::LLVM::AddOp>(loc, idxTy, sc, offset);
+ auto sc =
+ rewriter.create<mlir::LLVM::MulOp>(loc, idxTy, diff, prevExt, nsw);
+ offset =
+ rewriter.create<mlir::LLVM::AddOp>(loc, idxTy, sc, offset, nsw);
// Compute next stride assuming contiguity of the base array
// (in element number).
auto nextExt = integerCast(loc, rewriter, idxTy, operands[shapeOffset]);
- prevExt =
- rewriter.create<mlir::LLVM::MulOp>(loc, idxTy, prevExt, nextExt);
+ prevExt = rewriter.create<mlir::LLVM::MulOp>(loc, idxTy, prevExt,
+ nextExt, nsw);
}
}
@@ -2491,8 +2499,8 @@ struct XArrayCoorOpConversion
assert(coor.getLenParams().size() == 1);
auto length = integerCast(loc, rewriter, idxTy,
operands[coor.lenParamsOffset()]);
- offset =
- rewriter.create<mlir::LLVM::MulOp>(loc, idxTy, offset, length);
+ offset = rewriter.create<mlir::LLVM::MulOp>(loc, idxTy, offset,
+ length, nsw);
} else {
TODO(loc, "compute size of derived type with type parameters");
}
@@ -2665,6 +2673,9 @@ struct CoordinateOpConversion
auto cpnTy = fir::dyn_cast_ptrOrBoxEleTy(boxObjTy);
mlir::Type llvmPtrTy = ::getLlvmPtrType(coor.getContext());
mlir::Type byteTy = ::getI8Type(coor.getContext());
+ mlir::LLVM::IntegerOverflowFlagsAttr nsw =
+ mlir::LLVM::IntegerOverflowFlagsAttr::get(
+ rewriter.getContext(), mlir::LLVM::IntegerOverflowFlags::nsw);
for (unsigned i = 1, last = operands.size(); i < last; ++i) {
if (auto arrTy = cpnTy.dyn_cast<fir::SequenceType>()) {
@@ -2680,9 +2691,9 @@ struct CoordinateOpConversion
index < lastIndex; ++index) {
mlir::Value stride = getStrideFromBox(loc, boxTyPair, operands[0],
index - i, rewriter);
- auto sc = rewriter.create<mlir::LLVM::MulOp>(loc, idxTy,
- operands[index], stride);
- off = rewriter.create<mlir::LLVM::AddOp>(loc, idxTy, sc, off);
+ auto sc = rewriter.create<mlir::LLVM::MulOp>(
+ loc, idxTy, operands[index], stride, nsw);
+ off = rewriter.create<mlir::LLVM::AddOp>(loc, idxTy, sc, off, nsw);
}
resultAddr = rewriter.create<mlir::LLVM::GEPOp>(
loc, llvmPtrTy, byteTy, resultAddr,
diff --git a/flang/test/Fir/array-coor.fir b/flang/test/Fir/array-coor.fir
index 738acd7dd91fd..a765670d20b28 100644
--- a/flang/test/Fir/array-coor.fir
+++ b/flang/test/Fir/array-coor.fir
@@ -9,12 +9,12 @@ func.func @array_coor_box_value(%29 : !fir.box<!fir.array<2xf64>>,
}
// CHECK-LABEL: define double @array_coor_box_value
-// CHECK: %[[t3:.*]] = sub i64 %{{.*}}, 1
-// CHECK: %[[t4:.*]] = mul i64 %[[t3]], 1
+// CHECK: %[[t3:.*]] = sub nsw i64 %{{.*}}, 1
+// CHECK: %[[t4:.*]] = mul nsw i64 %[[t3]], 1
// CHECK: %[[t5:.*]] = getelementptr { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, ptr %{{.*}}, i32 0, i32 7, i32 0, i32 2
// CHECK: %[[t6:.*]] = load i64, ptr %[[t5]]
-// CHECK: %[[t7:.*]] = mul i64 %[[t4]], %[[t6]]
-// CHECK: %[[t8:.*]] = add i64 %[[t7]], 0
+// CHECK: %[[t7:.*]] = mul nsw i64 %[[t4]], %[[t6]]
+// CHECK: %[[t8:.*]] = add nsw i64 %[[t7]], 0
// CHECK: %[[t9:.*]] = getelementptr { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, ptr %{{.*}}, i32 0, i32 0
// CHECK: %[[t10:.*]] = load ptr, ptr %[[t9]]
// CHECK: %[[t11:.*]] = getelementptr i8, ptr %[[t10]], i64 %[[t8]]
@@ -36,8 +36,8 @@ func.func private @take_int(%arg0: !fir.ref<i32>) -> ()
// CHECK-SAME: ptr %[[VAL_0:.*]])
// CHECK: %[[VAL_1:.*]] = getelementptr { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]], ptr, [1 x i64] }, ptr %[[VAL_0]], i32 0, i32 7, i32 0, i32 2
// CHECK: %[[VAL_2:.*]] = load i64, ptr %[[VAL_1]]
-// CHECK: %[[VAL_3:.*]] = mul i64 1, %[[VAL_2]]
-// CHECK: %[[VAL_4:.*]] = add i64 %[[VAL_3]], 0
+// CHECK: %[[VAL_3:.*]] = mul nsw i64 1, %[[VAL_2]]
+// CHECK: %[[VAL_4:.*]] = add nsw i64 %[[VAL_3]], 0
// CHECK: %[[VAL_5:.*]] = getelementptr { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]], ptr, [1 x i64] }, ptr %[[VAL_0]], i32 0, i32 0
// CHECK: %[[VAL_6:.*]] = load ptr, ptr %[[VAL_5]]
// CHECK: %[[VAL_7:.*]] = getelementptr i8, ptr %[[VAL_6]], i64 %[[VAL_4]]
diff --git a/flang/test/Fir/arrexp.fir b/flang/test/Fir/arrexp.fir
index 87a2763608250..5d265a5e3a08d 100644
--- a/flang/test/Fir/arrexp.fir
+++ b/flang/test/Fir/arrexp.fir
@@ -114,8 +114,8 @@ func.func @f5(%arg0: !fir.box<!fir.array<?xf32>>, %arg1: !fir.box<!fir.array<?xf
%4 = fir.do_loop %arg3 = %c0 to %1 step %c1 iter_args(%arg4 = %2) -> (!fir.array<?xf32>) {
// CHECK: %[[B_STRIDE_GEP:.*]] = getelementptr {{.*}}, ptr %[[B]], i32 0, i32 7, i32 0, i32 2
// CHECK: %[[B_STRIDE:.*]] = load i64, ptr %[[B_STRIDE_GEP]]
- // CHECK: %[[B_DIM_OFFSET:.*]] = mul i64 %{{.*}}, %[[B_STRIDE]]
- // CHECK: %[[B_OFFSET:.*]] = add i64 %[[B_DIM_OFFSET]], 0
+ // CHECK: %[[B_DIM_OFFSET:.*]] = mul nsw i64 %{{.*}}, %[[B_STRIDE]]
+ // CHECK: %[[B_OFFSET:.*]] = add nsw i64 %[[B_DIM_OFFSET]], 0
// CHECK: %[[B_BASE_GEP:.*]] = getelementptr {{.*}}, ptr %{{.*}}, i32 0, i32 0
// CHECK: %[[B_BASE:.*]] = load ptr, ptr %[[B_BASE_GEP]]
// CHECK: %[[B_VOID_ADDR:.*]] = getelementptr i8, ptr %[[B_BASE]], i64 %[[B_OFFSET]]
@@ -172,7 +172,7 @@ func.func @f7(%arg0: !fir.ref<f32>, %arg1: !fir.box<!fir.array<?xf32>>) {
%0 = fir.shift %c4 : (index) -> !fir.shift<1>
// CHECK: %[[STRIDE_GEP:.*]] = getelementptr {{.*}}, ptr %[[Y]], i32 0, i32 7, i32 0, i32 2
// CHECK: %[[STRIDE:.*]] = load i64, ptr %[[STRIDE_GEP]]
- // CHECK: mul i64 96, %[[STRIDE]]
+ // CHECK: mul nsw i64 96, %[[STRIDE]]
%1 = fir.array_coor %arg1(%0) %c100 : (!fir.box<!fir.array<?xf32>>, !fir.shift<1>, index) -> !fir.ref<f32>
%2 = fir.load %1 : !fir.ref<f32>
fir.store %2 to %arg0 : !fir.ref<f32>
@@ -202,7 +202,7 @@ func.func @f8(%a : !fir.ref<!fir.array<2x2x!fir.type<t{i:i32}>>>, %i : i32) {
func.func @f9(%i: i32, %e : i64, %j: i64, %c: !fir.ref<!fir.array<?x?x!fir.char<1,?>>>) -> !fir.ref<!fir.char<1,?>> {
%s = fir.shape %e, %e : (i64, i64) -> !fir.shape<2>
// CHECK: %[[CAST:.*]] = sext i32 %[[I]] to i64
- // CHECK: %[[OFFSET:.*]] = mul i64 %{{.*}}, %[[CAST]]
+ // CHECK: %[[OFFSET:.*]] = mul nsw i64 %{{.*}}, %[[CAST]]
// CHECK: getelementptr i8, ptr %[[C]], i64 %[[OFFSET]]
%a = fir.array_coor %c(%s) %j, %j typeparams %i : (!fir.ref<!fir.array<?x?x!fir.char<1,?>>>, !fir.shape<2>, i64, i64, i32) -> !fir.ref<!fir.char<1,?>>
return %a : !fir.ref<!fir.char<1,?>>
diff --git a/flang/test/Fir/convert-to-llvm.fir b/flang/test/Fir/convert-to-llvm.fir
index 993058ebb0a4d..be82ffab7e33e 100644
--- a/flang/test/Fir/convert-to-llvm.fir
+++ b/flang/test/Fir/convert-to-llvm.fir
@@ -2027,10 +2027,10 @@ func.func @ext_array_coor0(%arg0: !fir.ref<!fir.array<?xi32>>) {
// CHECK: %[[C0:.*]] = llvm.mlir.constant(0 : i64) : i64
// CHECK: %[[C1:.*]] = llvm.mlir.constant(1 : i64) : i64
// CHECK: %[[C0_1:.*]] = llvm.mlir.constant(0 : i64) : i64
-// CHECK: %[[IDX:.*]] = llvm.sub %[[C0]], %[[C1]] : i64
-// CHECK: %[[DIFF0:.*]] = llvm.mul %[[IDX]], %[[C1]] : i64
-// CHECK: %[[SC:.*]] = llvm.mul %[[DIFF0]], %[[C1]] : i64
-// CHECK: %[[OFFSET:.*]] = llvm.add %[[SC]], %[[C0_1]] : i64
+// CHECK: %[[IDX:.*]] = llvm.sub %[[C0]], %[[C1]] overflow<nsw> : i64
+// CHECK: %[[DIFF0:.*]] = llvm.mul %[[IDX]], %[[C1]] overflow<nsw> : i64
+// CHECK: %[[SC:.*]] = llvm.mul %[[DIFF0]], %[[C1]] overflow<nsw> : i64
+// CHECK: %[[OFFSET:.*]] = llvm.add %[[SC]], %[[C0_1]] overflow<nsw> : i64
// CHECK: %{{.*}} = llvm.getelementptr %[[ARG0]][%[[OFFSET]]] : (!llvm.ptr, i64) -> !llvm.ptr, i32
// Conversion with shift and slice.
@@ -2046,12 +2046,12 @@ func.func @ext_array_coor1(%arg0: !fir.ref<!fir.array<?xi32>>) {
// CHECK: %[[C0:.*]] = llvm.mlir.constant(0 : i64) : i64
// CHECK: %[[C1:.*]] = llvm.mlir.constant(1 : i64) : i64
// CHECK: %[[C0_1:.*]] = llvm.mlir.constant(0 : i64) : i64
-// CHECK: %[[IDX:.*]] = llvm.sub %[[C0]], %[[C0]] : i64
-// CHECK: %[[DIFF0:.*]] = llvm.mul %[[IDX]], %[[C0]] : i64
-// CHECK: %[[ADJ:.*]] = llvm.sub %[[C0]], %[[C0]] : i64
-// CHECK: %[[DIFF1:.*]] = llvm.add %[[DIFF0]], %[[ADJ]] : i64
-// CHECK: %[[STRIDE:.*]] = llvm.mul %[[DIFF1]], %[[C1]] : i64
-// CHECK: %[[OFFSET:.*]] = llvm.add %[[STRIDE]], %[[C0_1]] : i64
+// CHECK: %[[IDX:.*]] = llvm.sub %[[C0]], %[[C0]] overflow<nsw> : i64
+// CHECK: %[[DIFF0:.*]] = llvm.mul %[[IDX]], %[[C0]] overflow<nsw> : i64
+// CHECK: %[[ADJ:.*]] = llvm.sub %[[C0]], %[[C0]] overflow<nsw> : i64
+// CHECK: %[[DIFF1:.*]] = llvm.add %[[DIFF0]], %[[ADJ]] overflow<nsw> : i64
+// CHECK: %[[STRIDE:.*]] = llvm.mul %[[DIFF1]], %[[C1]] overflow<nsw> : i64
+// CHECK: %[[OFFSET:.*]] = llvm.add %[[STRIDE]], %[[C0_1]] overflow<nsw> : i64
// CHECK: %{{.*}} = llvm.getelementptr %[[ARG0]][%[[OFFSET]]] : (!llvm.ptr, i64) -> !llvm.ptr, i32
// Conversion for a dynamic length char.
@@ -2067,10 +2067,10 @@ func.func @ext_array_coor2(%arg0: !fir.ref<!fir.array<?x!fir.char<1,?>>>) {
// CHECK: %[[C0:.*]] = llvm.mlir.constant(0 : i64) : i64
// CHECK: %[[C1:.*]] = llvm.mlir.constant(1 : i64) : i64
// CHECK: %[[C0_1:.*]] = llvm.mlir.constant(0 : i64) : i64
-// CHECK: %[[IDX:.*]] = llvm.sub %[[C0]], %[[C1]] : i64
-// CHECK: %[[DIFF0:.*]] = llvm.mul %[[IDX]], %[[C1]] : i64
-// CHECK: %[[SC:.*]] = llvm.mul %[[DIFF0]], %[[C1]] : i64
-// CHECK: %[[OFFSET:.*]] = llvm.add %[[SC]], %[[C0_1]] : i64
+// CHECK: %[[IDX:.*]] = llvm.sub %[[C0]], %[[C1]] overflow<nsw> : i64
+// CHECK: %[[DIFF0:.*]] = llvm.mul %[[IDX]], %[[C1]] overflow<nsw> : i64
+// CHECK: %[[SC:.*]] = llvm.mul %[[DIFF0]], %[[C1]] overflow<nsw> : i64
+// CHECK: %[[OFFSET:.*]] = llvm.add %[[SC]], %[[C0_1]] overflow<nsw> : i64
// CHECK: %{{.*}} = llvm.getelementptr %[[ARG0]][%[[OFFSET]]] : (!llvm.ptr, i64) -> !llvm.ptr, i8
// Conversion for a `fir.box`.
@@ -2086,12 +2086,12 @@ func.func @ext_array_coor3(%arg0: !fir.box<!fir.array<?xi32>>) {
// CHECK: %[[C0:.*]] = llvm.mlir.constant(0 : i64) : i64
// CHECK: %[[C1:.*]] = llvm.mlir.constant(1 : i64) : i64
// CHECK: %[[C0_1:.*]] = llvm.mlir.constant(0 : i64) : i64
-// CHECK: %[[IDX:.*]] = llvm.sub %[[C0]], %[[C1]] : i64
-// CHECK: %[[DIFF0:.*]] = llvm.mul %[[IDX]], %[[C1]] : i64
+// CHECK: %[[IDX:.*]] = llvm.sub %[[C0]], %[[C1]] overflow<nsw> : i64
+// CHECK: %[[DIFF0:.*]] = llvm.mul %[[IDX]], %[[C1]] overflow<nsw> : i64
// CHECK: %[[GEPSTRIDE:.*]] = llvm.getelementptr %[[ARG0]][0, 7, 0, 2] : (!llvm.ptr) -> !llvm.ptr, !llvm.struct<(ptr, i{{.*}}, i{{.*}}, i{{.*}}, i{{.*}}, i{{.*}}, i{{.*}}, array<1 x array<3 x i64>>)>
// CHECK: %[[LOADEDSTRIDE:.*]] = llvm.load %[[GEPSTRIDE]] : !llvm.ptr -> i64
-// CHECK: %[[SC:.*]] = llvm.mul %[[DIFF0]], %[[LOADEDSTRIDE]] : i64
-// CHECK: %[[OFFSET:.*]] = llvm.add %[[SC]], %[[C0_1]] : i64
+// CHECK: %[[SC:.*]] = llvm.mul %[[DIFF0]], %[[LOADEDSTRIDE]] overflow<nsw> : i64
+// CHECK: %[[OFFSET:.*]] = llvm.add %[[SC]], %[[C0_1]] overflow<nsw> : i64
// CHECK: %[[GEPADDR:.*]] = llvm.getelementptr %[[ARG0]][0, 0] : (!llvm.ptr) -> !llvm.ptr, !llvm.struct<(ptr, i{{.*}}, i{{.*}}, i{{.*}}, i{{.*}}, i{{.*}}, i{{.*}}, array<1 x array<3 x i64>>)>
// CHECK: %[[LOADEDADDR:.*]] = llvm.load %[[GEPADDR]] : !llvm.ptr -> !llvm.ptr
// CHECK: %[[GEPADDROFFSET:.*]] = llvm.getelementptr %[[LOADEDADDR]][%[[OFFSET]]] : (!llvm.ptr, i64) -> !llvm.ptr, i8
@@ -2115,12 +2115,12 @@ func.func @ext_array_coor4(%arg0: !fir.ref<!fir.array<100xi32>>) {
// CHECK: %[[C1:.*]] = llvm.mlir.constant(1 : i64) : i64
// CHECK: %[[C1_1:.*]] = llvm.mlir.constant(1 : i64) : i64
// CHECK: %[[C0_1:.*]] = llvm.mlir.constant(0 : i64) : i64
-// CHECK: %[[IDX:.*]] = llvm.sub %[[C1]], %[[C0]] : i64
-// CHECK: %[[DIFF0:.*]] = llvm.mul %[[IDX]], %[[C1]] : i64
-// CHECK: %[[ADJ:.*]] = llvm.sub %[[C10]], %[[C0]] : i64
-// CHECK: %[[DIFF1:.*]] = llvm.add %[[DIFF0]], %[[ADJ]] : i64
-// CHECK: %[[STRIDE:.*]] = llvm.mul %[[DIFF1]], %[[C1_1]] : i64
-// CHECK: %[[OFFSET:.*]] = llvm.add %[[STRIDE]], %[[C0_1]] : i64
+// CHECK: %[[IDX:.*]] = llvm.sub %[[C1]], %[[C0]] overflow<nsw> : i64
+// CHECK: %[[DIFF0:.*]] = llvm.mul %[[IDX]], %[[C1]] overflow<nsw> : i64
+// CHECK: %[[ADJ:.*]] = llvm.sub %[[C10]], %[[C0]] overflow<nsw> : i64
+// CHECK: %[[DIFF1:.*]] = llvm.add %[[DIFF0]], %[[ADJ]] overflow<nsw> : i64
+// CHECK: %[[STRIDE:.*]] = llvm.mul %[[DIFF1]], %[[C1_1]] overflow<nsw> : i64
+// CHECK: %[[OFFSET:.*]] = llvm.add %[[STRIDE]], %[[C0_1]] overflow<nsw> : i64
// CHECK: %{{.*}} = llvm.getelementptr %[[ARG0]][%[[OFFSET]]] : (!llvm.ptr, i64) -> !llvm.ptr, i32
// Conversion with index type shape and slice
@@ -2134,13 +2134,13 @@ func.func @ext_array_coor5(%arg0: !fir.ref<!fir.array<?xi32>>, %idx1 : index, %i
// CHECK-SAME: %[[VAL_0:.*]]: !llvm.ptr, %[[VAL_1:.*]]: i64, %[[VAL_2:.*]]: i64, %[[VAL_3:.*]]: i64, %[[VAL_4:.*]]: i64, %[[VAL_5:.*]]: i64) {
// CHECK: %[[VAL_6:.*]] = llvm.mlir.constant(1 : i64) : i64
// CHECK: %[[VAL_7:.*]] = llvm.mlir.constant(0 : i64) : i64
-// CHECK: %[[VAL_8:.*]] = llvm.sub %[[VAL_5]], %[[VAL_6]] : i64
-// CHECK: %[[VAL_9:.*]] = llvm.mul %[[VAL_8]], %[[VAL_4]] : i64
-// CHECK: %[[VAL_10:.*]] = llvm.sub %[[VAL_2]], %[[VAL_6]] : i64
-// CHECK: %[[VAL_11:.*]] = llvm.add %[[VAL_9]], %[[VAL_10]] : i64
-// CHECK: %[[VAL_12:.*]] = llvm.mul %[[VAL_11]], %[[VAL_6]] : i64
-// CHECK: %[[VAL_13:.*]] = llvm.add %[[VAL_12]], %[[VAL_7]] : i64
-// CHECK: %[[VAL_14:.*]] = llvm.mul %[[VAL_6]], %[[VAL_1]] : i64
+// CHECK: %[[VAL_8:.*]] = llvm.sub %[[VAL_5]], %[[VAL_6]] overflow<nsw> : i64
+// CHECK: %[[VAL_9:.*]] = llvm.mul %[[VAL_8]], %[[VAL_4]] overflow<nsw> : i64
+// CHECK: %[[VAL_10:.*]] = llvm.sub %[[VAL_2]], %[[VAL_6]] overflow<nsw> : i64
+// CHECK: %[[VAL_11:.*]] = llvm.add %[[VAL_9]], %[[VAL_10]] overflow<nsw> : i64
+// CHECK: %[[VAL_12:.*]] = llvm.mul %[[VAL_11]], %[[VAL_6]] overflow<nsw> : i64
+// CHECK: %[[VAL_13:.*]] = llvm.add %[[VAL_12]], %[[VAL_7]] overflow<nsw> : i64
+// CHECK: %[[VAL_14:.*]] = llvm.mul %[[VAL_6]], %[[VAL_1]] overflow<nsw> : i64
// CHECK: %[[VAL_16:.*]] = llvm.getelementptr %[[VAL_0]][%[[VAL_13]]] : (!llvm.ptr, i64) -> !llvm.ptr, i32
// CHECK: }
@@ -2155,27 +2155,27 @@ func.func @ext_array_coor6(%arg0: !fir.ref<!fir.array<?x?x?xi32>>, %idx1 : index
// CHECK-SAME: %[[VAL_0:.*]]: !llvm.ptr, %[[VAL_1:.*]]: i64, %[[VAL_2:.*]]: i64, %[[VAL_3:.*]]: i64, %[[VAL_4:.*]]: i64, %[[VAL_5:.*]]: i64) {
// CHECK: %[[VAL_6:.*]] = llvm.mlir.constant(1 : i64) : i64
// CHECK: %[[VAL_7:.*]] = llvm.mlir.constant(0 : i64) : i64
-// CHECK: %[[VAL_8:.*]] = llvm.sub %[[VAL_5]], %[[VAL_6]] : i64
-// CHECK: %[[VAL_9:.*]] = llvm.mul %[[VAL_8]], %[[VAL_4]] : i64
-// CHECK: %[[VAL_10:.*]] = llvm.sub %[[VAL_2]], %[[VAL_6]] : i64
-// CHECK: %[[VAL_11:.*]] = llvm.add %[[VAL_9]], %[[VAL_10]] : i64
-// CHECK: %[[VAL_12:.*]] = llvm.mul %[[VAL_11]], %[[VAL_6]] : i64
-// CHECK: %[[VAL_13:.*]] = llvm.add %[[VAL_12]], %[[VAL_7]] : i64
-// CHECK: %[[VAL_14:.*]] = llvm.mul %[[VAL_6]], %[[VAL_1]] : i64
-// CHECK: %[[VAL_15:.*]] = llvm.sub %[[VAL_5]], %[[VAL_6]] : i64
-// CHECK: %[[VAL_16:.*]] = llvm.mul %[[VAL_15]], %[[VAL_4]] : i64
-// CHECK: %[[VAL_17:.*]] = llvm.sub %[[VAL_2]], %[[VAL_6]] : i64
-// CHECK: %[[VAL_18:.*]] = llvm.add %[[VAL_16]], %[[VAL_17]] : i64
-// CHECK: %[[VAL_19:.*]] = llvm.mul %[[VAL_18]], %[[VAL_14]] : i64
-// CHECK: %[[VAL_20:.*]] = llvm.add %[[VAL_19]], %[[VAL_13]] : i64
-// CHECK: %[[VAL_21:.*]] = llvm.mul %[[VAL_14]], %[[VAL_1]] : i64
-// CHECK: %[[VAL_22:.*]] = llvm.sub %[[VAL_5]], %[[VAL_6]] : i64
-// CHECK: %[[VAL_23:.*]] = llvm.mul %[[VAL_22]], %[[VAL_4]] : i64
-// CHECK: %[[VAL_24:.*]] = llvm.sub %[[VAL_2]], %[[VAL_6]] : i64
-// CHECK: %[[VAL_25:.*]] = llvm.add %[[VAL_23]], %[[VAL_24]] : i64
-// CHECK: %[[VAL_26:.*]] = llvm.mul %[[VAL_25]], %[[VAL_21]] : i64
-// CHECK: %[[VAL_27:.*]] = llvm.add %[[VAL_26]], %[[VAL_20]] : i64
-// CHECK: %[[VAL_28:.*]] = llvm.mul %[[VAL_21]], %[[VAL_1]] : i64
+// CHECK: %[[VAL_8:.*]] = llvm.sub %[[VAL_5]], %[[VAL_6]] overflow<nsw> : i64
+// CHECK: %[[VAL_9:.*]] = llvm.mul %[[VAL_8]], %[[VAL_4]] overflow<nsw> : i64
+// CHECK: %[[VAL_10:.*]] = llvm.sub %[[VAL_2]], %[[VAL_6]] overflow<nsw> : i64
+// CHECK: %[[VAL_11:.*]] = llvm.add %[[VAL_9]], %[[VAL_10]] overflow<nsw> : i64
+// CHECK: %[[VAL_12:.*]] = llvm.mul %[[VAL_11]], %[[VAL_6]] overflow<nsw> : i64
+// CHECK: %[[VAL_13:.*]] = llvm.add %[[VAL_12]], %[[VAL_7]] overflow<nsw> : i64
+// CHECK: %[[VAL_14:.*]] = llvm.mul %[[VAL_6]], %[[VAL_1]] overflow<nsw> : i64
+// CHECK: %[[VAL_15:.*]] = llvm....
[truncated]
Please wait for @vzakhari's opinion, but since arithmetic overflow in address computation is not a feature in Fortran, this sounds reasonable to me.
Thank you for the changes, Tom. It looks good to me. I just want to clarify that this is not the place that I talked about in the RFC. I was talking about setting nsw on the increment of the do-variable inside the loop.
Not for this patch, but are there other places where we could add nsw?
Thanks for explaining and reviewing. I'll merge this, then make a new patch for the do-variable increment.
I don't have hardware to test this myself. Would anyone else be able to verify if it works?
@vzakhari Are you sure this is allowed? Do-variables could definitely overflow. Is integer overflow completely undefined in Fortran? I can't find any mention of it in the standard.
I think so. Adding @klausler to validate, but the gfortran documentation says this is forbidden (https://gcc.gnu.org/onlinedocs/gfortran/Behavior-on-integer-overflow.html). I think the rationale is that Fortran specifies the INTEGER type as a subset of the mathematical integers in 7.4.3.1, and it also says that, aside from the order imposed by parentheses, the processor is free to rewrite any expression into a mathematically equivalent one (10.1.5.2.4 point 2). So it would be valid for the compiler to fold an expression like ((N+1)-1) to N even when the intermediate addition would overflow. In fact, you can check that this is the case with gfortran on a funny example of this kind (gfortran likely rewrites the expression before evaluating it).
gfortran 13.2 outputs the folded result.
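The Fortran snippet referenced above did not survive extraction; as a hypothetical stand-in, the same fold is observable in C++, where signed overflow is likewise not defined behavior:

```cpp
#include <climits>
#include <cstdio>

// Hypothetical stand-in for the lost example: because signed overflow is
// undefined, the compiler may fold (n + 1) - 1 to n unconditionally, so the
// intermediate overflow at n == INT_MAX never materializes.
int fold(int n) { return (n + 1) - 1; }

int main() {
  printf("%d\n", fold(INT_MAX));  // prints 2147483647
  return 0;
}
```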
I don't know what "nsw" means, but will just add a reminder that Fortran DO loop iteration counts are calculated before the loop starts. Don't use the current value of the DO loop index variable for loop termination testing -- just increment it each time around. |
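A sketch of the lowering shape @klausler describes, with hypothetical names (the real lowering lives in flang's FIR pipeline): the iteration count is computed from the loop bounds before the first iteration, and the do-variable is only ever incremented.

```cpp
#include <algorithm>

// Hypothetical sketch of trip-count-based lowering: per F2018 11.1.7.4.1 the
// iteration count is MAX((m2 - m1 + m3) / m3, 0), computed once up front.
// The do-variable is incremented each time around but never compared, so its
// overflow cannot affect loop termination.
void lowerDoLoop(long m1, long m2, long m3, void (*body)(long)) {
  long trip = std::max((m2 - m1 + m3) / m3, 0L);  // before the loop starts
  long iv = m1;                                   // the do-variable
  for (long t = 0; t < trip; ++t) {
    body(iv);
    iv += m3;  // the increment this thread considers tagging with nsw
  }
}
```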
Thanks for the answer. Lowering is not using the do-loop index for termination testing. My example is just meant to show what kind of optimization LLVM can do if nsw is set on additions. LLVM example:
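The LLVM IR example did not survive extraction; as a hedged stand-in, a C++ loop where the same no-wrap assumption lets the compiler strength-reduce the address calculation:

```cpp
// Stand-in for the lost IR snippet: since signed overflow is undefined, the
// compiler may assume i + k never wraps, compute a + k once before the loop,
// and bump a single pointer instead of re-deriving a[i + k] from a
// possibly-wrapping index on every iteration.
void scaleShifted(long n, long k, double *a) {
  for (long i = 0; i < n; ++i)
    a[i + k] *= 2.0;  // address math hoistable under the no-wrap assumption
}
```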
Slava's perf issue comes from a DO loop index that is used in user code/addressing after being incremented. We currently do not add the nsw flag on the add that increments the do-loop variable inside the loop, so LLVM thinks it has to honor any do-loop overflow in uses of the do-loop variable after its increment. I think Tom's question is: are we supposed to honor do-loop overflow inside the body of the loop? And I do not think Fortran allows user code with integers to overflow, so this flag can be set.
What do you mean by "honor do loop overflow"? Integer overflow makes a program non-conformant. The language doesn't give any guarantees about the result. But existing code that depends on overflow & underflow canceling out is out there; for example, computing ((N+1)-1) and expecting N. |
Tom, I understand your concern. I think adding no-signed-wrap on the do-variable increment may be a problem, depending on how LLVM optimizes the loop IVs. For example:
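The do-loop example is missing from the extracted text; a C++ stand-in with the shape the follow-up comments imply (a loop whose final increment wraps):

```cpp
#include <climits>

// Hypothetical stand-in for the lost Fortran loop: the increment past
// INT_MAX after the last iteration is signed overflow, so under the no-wrap
// assumption the compiler may treat i <= INT_MAX as always true and emit an
// infinite loop, matching what the P.S. below reports for gfortran.
void runToHuge(void (*body)(int)) {
  for (int i = 1; i <= INT_MAX; ++i)
    body(i);
}
```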
There is a signed wrap on the final increment of the do-variable in this example, so I guess setting nsw on that increment may be a problem. We can try to add nsw where lowering can prove the increment cannot wrap.
P.S. btw, gfortran compiles the above code into an infinite loop.
…#91579) This patch adds the nsw flag to the increment of do-variables when a new option is enabled. NOTE 11.10 of the Fortran 2018 standard says they never overflow. See also the discussion in llvm#74709 and the following discourse post: https://discourse.llvm.org/t/rfc-add-nsw-flags-to-arithmetic-integer-operations-using-the-option-fno-wrapv/77584/5