
[flang][CodeGen] add nsw to address calculations #74709

Merged — 1 commit merged into llvm:main on Dec 8, 2023

Conversation

tblah (Contributor) commented Dec 7, 2023

nsw is a flag for LLVM arithmetic operations meaning "no signed wrap". If this keyword is present, the result of the operation is a poison value if overflow occurs. Adding this keyword permits LLVM to re-order integer arithmetic more aggressively.

In
https://discourse.llvm.org/t/rfc-changes-to-fircg-xarray-coor-codegen-to-allow-better-hoisting/75257/16 @vzakhari observed that adding nsw is useful to enable hoisting of address calculations after some loops (or is at least a step in that direction).

Classic flang also adds nsw to address calculations.
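As an added illustration (a minimal sketch, not part of the patch or the original description), here is what the extra freedom from nsw looks like in LLVM IR: with nsw the comparison below can be folded to true, while without it the possible wrap at the largest i64 value has to be honoured.

define i1 @cmp_nsw(i64 %n) {
  %inc = add nsw i64 %n, 1
  %cmp = icmp sgt i64 %inc, %n   ; can fold to true: signed overflow would be poison
  ret i1 %cmp
}

define i1 @cmp_wrap(i64 %n) {
  %inc = add i64 %n, 1
  %cmp = icmp sgt i64 %inc, %n   ; cannot fold to true: false when %n is INT64_MAX (the add wraps)
  ret i1 %cmp
}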

@llvmbot added the flang (Flang issues not falling into any other category), flang:fir-hlfir, and flang:codegen labels on Dec 7, 2023
llvmbot (Collaborator) commented Dec 7, 2023

@llvm/pr-subscribers-flang-codegen

@llvm/pr-subscribers-flang-fir-hlfir

Author: Tom Eccles (tblah)

Changes

Patch is 32.47 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/74709.diff

6 Files Affected:

  • (modified) flang/lib/Optimizer/CodeGen/CodeGen.cpp (+26-15)
  • (modified) flang/test/Fir/array-coor.fir (+6-6)
  • (modified) flang/test/Fir/arrexp.fir (+4-4)
  • (modified) flang/test/Fir/convert-to-llvm.fir (+69-69)
  • (modified) flang/test/Fir/coordinateof.fir (+4-4)
  • (modified) flang/test/Fir/tbaa.fir (+6-6)
diff --git a/flang/lib/Optimizer/CodeGen/CodeGen.cpp b/flang/lib/Optimizer/CodeGen/CodeGen.cpp
index bf175c8ebadee..293208ce3b601 100644
--- a/flang/lib/Optimizer/CodeGen/CodeGen.cpp
+++ b/flang/lib/Optimizer/CodeGen/CodeGen.cpp
@@ -2387,6 +2387,9 @@ struct XArrayCoorOpConversion
     const bool baseIsBoxed = coor.getMemref().getType().isa<fir::BaseBoxType>();
     TypePair baseBoxTyPair =
         baseIsBoxed ? getBoxTypePair(coor.getMemref().getType()) : TypePair{};
+    mlir::LLVM::IntegerOverflowFlagsAttr nsw =
+        mlir::LLVM::IntegerOverflowFlagsAttr::get(
+            rewriter.getContext(), mlir::LLVM::IntegerOverflowFlags::nsw);
 
     // For each dimension of the array, generate the offset calculation.
     for (unsigned i = 0; i < rank; ++i, ++indexOffset, ++shapeOffset,
@@ -2407,14 +2410,15 @@ struct XArrayCoorOpConversion
         if (normalSlice)
           step = integerCast(loc, rewriter, idxTy, operands[sliceOffset + 2]);
       }
-      auto idx = rewriter.create<mlir::LLVM::SubOp>(loc, idxTy, index, lb);
+      auto idx = rewriter.create<mlir::LLVM::SubOp>(loc, idxTy, index, lb, nsw);
       mlir::Value diff =
-          rewriter.create<mlir::LLVM::MulOp>(loc, idxTy, idx, step);
+          rewriter.create<mlir::LLVM::MulOp>(loc, idxTy, idx, step, nsw);
       if (normalSlice) {
         mlir::Value sliceLb =
             integerCast(loc, rewriter, idxTy, operands[sliceOffset]);
-        auto adj = rewriter.create<mlir::LLVM::SubOp>(loc, idxTy, sliceLb, lb);
-        diff = rewriter.create<mlir::LLVM::AddOp>(loc, idxTy, diff, adj);
+        auto adj =
+            rewriter.create<mlir::LLVM::SubOp>(loc, idxTy, sliceLb, lb, nsw);
+        diff = rewriter.create<mlir::LLVM::AddOp>(loc, idxTy, diff, adj, nsw);
       }
       // Update the offset given the stride and the zero based index `diff`
       // that was just computed.
@@ -2422,17 +2426,21 @@ struct XArrayCoorOpConversion
         // Use stride in bytes from the descriptor.
         mlir::Value stride =
             getStrideFromBox(loc, baseBoxTyPair, operands[0], i, rewriter);
-        auto sc = rewriter.create<mlir::LLVM::MulOp>(loc, idxTy, diff, stride);
-        offset = rewriter.create<mlir::LLVM::AddOp>(loc, idxTy, sc, offset);
+        auto sc =
+            rewriter.create<mlir::LLVM::MulOp>(loc, idxTy, diff, stride, nsw);
+        offset =
+            rewriter.create<mlir::LLVM::AddOp>(loc, idxTy, sc, offset, nsw);
       } else {
         // Use stride computed at last iteration.
-        auto sc = rewriter.create<mlir::LLVM::MulOp>(loc, idxTy, diff, prevExt);
-        offset = rewriter.create<mlir::LLVM::AddOp>(loc, idxTy, sc, offset);
+        auto sc =
+            rewriter.create<mlir::LLVM::MulOp>(loc, idxTy, diff, prevExt, nsw);
+        offset =
+            rewriter.create<mlir::LLVM::AddOp>(loc, idxTy, sc, offset, nsw);
         // Compute next stride assuming contiguity of the base array
         // (in element number).
         auto nextExt = integerCast(loc, rewriter, idxTy, operands[shapeOffset]);
-        prevExt =
-            rewriter.create<mlir::LLVM::MulOp>(loc, idxTy, prevExt, nextExt);
+        prevExt = rewriter.create<mlir::LLVM::MulOp>(loc, idxTy, prevExt,
+                                                     nextExt, nsw);
       }
     }
 
@@ -2491,8 +2499,8 @@ struct XArrayCoorOpConversion
           assert(coor.getLenParams().size() == 1);
           auto length = integerCast(loc, rewriter, idxTy,
                                     operands[coor.lenParamsOffset()]);
-          offset =
-              rewriter.create<mlir::LLVM::MulOp>(loc, idxTy, offset, length);
+          offset = rewriter.create<mlir::LLVM::MulOp>(loc, idxTy, offset,
+                                                      length, nsw);
         } else {
           TODO(loc, "compute size of derived type with type parameters");
         }
@@ -2665,6 +2673,9 @@ struct CoordinateOpConversion
     auto cpnTy = fir::dyn_cast_ptrOrBoxEleTy(boxObjTy);
     mlir::Type llvmPtrTy = ::getLlvmPtrType(coor.getContext());
     mlir::Type byteTy = ::getI8Type(coor.getContext());
+    mlir::LLVM::IntegerOverflowFlagsAttr nsw =
+        mlir::LLVM::IntegerOverflowFlagsAttr::get(
+            rewriter.getContext(), mlir::LLVM::IntegerOverflowFlags::nsw);
 
     for (unsigned i = 1, last = operands.size(); i < last; ++i) {
       if (auto arrTy = cpnTy.dyn_cast<fir::SequenceType>()) {
@@ -2680,9 +2691,9 @@ struct CoordinateOpConversion
              index < lastIndex; ++index) {
           mlir::Value stride = getStrideFromBox(loc, boxTyPair, operands[0],
                                                 index - i, rewriter);
-          auto sc = rewriter.create<mlir::LLVM::MulOp>(loc, idxTy,
-                                                       operands[index], stride);
-          off = rewriter.create<mlir::LLVM::AddOp>(loc, idxTy, sc, off);
+          auto sc = rewriter.create<mlir::LLVM::MulOp>(
+              loc, idxTy, operands[index], stride, nsw);
+          off = rewriter.create<mlir::LLVM::AddOp>(loc, idxTy, sc, off, nsw);
         }
         resultAddr = rewriter.create<mlir::LLVM::GEPOp>(
             loc, llvmPtrTy, byteTy, resultAddr,
diff --git a/flang/test/Fir/array-coor.fir b/flang/test/Fir/array-coor.fir
index 738acd7dd91fd..a765670d20b28 100644
--- a/flang/test/Fir/array-coor.fir
+++ b/flang/test/Fir/array-coor.fir
@@ -9,12 +9,12 @@ func.func @array_coor_box_value(%29 : !fir.box<!fir.array<2xf64>>,
 }
 
 // CHECK-LABEL: define double @array_coor_box_value
-// CHECK: %[[t3:.*]] = sub i64 %{{.*}}, 1
-// CHECK: %[[t4:.*]] = mul i64 %[[t3]], 1
+// CHECK: %[[t3:.*]] = sub nsw i64 %{{.*}}, 1
+// CHECK: %[[t4:.*]] = mul nsw i64 %[[t3]], 1
 // CHECK: %[[t5:.*]] = getelementptr { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, ptr %{{.*}}, i32 0, i32 7, i32 0, i32 2
 // CHECK: %[[t6:.*]] = load i64, ptr %[[t5]]
-// CHECK: %[[t7:.*]] = mul i64 %[[t4]], %[[t6]]
-// CHECK: %[[t8:.*]] = add i64 %[[t7]], 0
+// CHECK: %[[t7:.*]] = mul nsw i64 %[[t4]], %[[t6]]
+// CHECK: %[[t8:.*]] = add nsw i64 %[[t7]], 0
 // CHECK: %[[t9:.*]] = getelementptr { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, ptr %{{.*}}, i32 0, i32 0
 // CHECK: %[[t10:.*]] = load ptr, ptr %[[t9]]
 // CHECK: %[[t11:.*]] = getelementptr i8, ptr %[[t10]], i64 %[[t8]]
@@ -36,8 +36,8 @@ func.func private @take_int(%arg0: !fir.ref<i32>) -> ()
 // CHECK-SAME: ptr %[[VAL_0:.*]])
 // CHECK:   %[[VAL_1:.*]] = getelementptr { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]], ptr, [1 x i64] }, ptr %[[VAL_0]], i32 0, i32 7, i32 0, i32 2
 // CHECK:   %[[VAL_2:.*]] = load i64, ptr %[[VAL_1]]
-// CHECK:   %[[VAL_3:.*]] = mul i64 1, %[[VAL_2]]
-// CHECK:   %[[VAL_4:.*]] = add i64 %[[VAL_3]], 0
+// CHECK:   %[[VAL_3:.*]] = mul nsw i64 1, %[[VAL_2]]
+// CHECK:   %[[VAL_4:.*]] = add nsw i64 %[[VAL_3]], 0
 // CHECK:   %[[VAL_5:.*]] = getelementptr { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]], ptr, [1 x i64] }, ptr %[[VAL_0]], i32 0, i32 0
 // CHECK:   %[[VAL_6:.*]] = load ptr, ptr %[[VAL_5]]
 // CHECK:   %[[VAL_7:.*]] = getelementptr i8, ptr %[[VAL_6]], i64 %[[VAL_4]]
diff --git a/flang/test/Fir/arrexp.fir b/flang/test/Fir/arrexp.fir
index 87a2763608250..5d265a5e3a08d 100644
--- a/flang/test/Fir/arrexp.fir
+++ b/flang/test/Fir/arrexp.fir
@@ -114,8 +114,8 @@ func.func @f5(%arg0: !fir.box<!fir.array<?xf32>>, %arg1: !fir.box<!fir.array<?xf
   %4 = fir.do_loop %arg3 = %c0 to %1 step %c1 iter_args(%arg4 = %2) -> (!fir.array<?xf32>) {
     // CHECK: %[[B_STRIDE_GEP:.*]] = getelementptr {{.*}}, ptr %[[B]], i32 0, i32 7, i32 0, i32 2
     // CHECK: %[[B_STRIDE:.*]] = load i64, ptr %[[B_STRIDE_GEP]]
-    // CHECK: %[[B_DIM_OFFSET:.*]] = mul i64 %{{.*}}, %[[B_STRIDE]]
-    // CHECK: %[[B_OFFSET:.*]] =  add i64 %[[B_DIM_OFFSET]], 0
+    // CHECK: %[[B_DIM_OFFSET:.*]] = mul nsw i64 %{{.*}}, %[[B_STRIDE]]
+    // CHECK: %[[B_OFFSET:.*]] =  add nsw i64 %[[B_DIM_OFFSET]], 0
     // CHECK: %[[B_BASE_GEP:.*]] = getelementptr {{.*}}, ptr %{{.*}}, i32 0, i32 0
     // CHECK: %[[B_BASE:.*]] = load ptr, ptr %[[B_BASE_GEP]]
     // CHECK: %[[B_VOID_ADDR:.*]] = getelementptr i8, ptr %[[B_BASE]], i64 %[[B_OFFSET]]
@@ -172,7 +172,7 @@ func.func @f7(%arg0: !fir.ref<f32>, %arg1: !fir.box<!fir.array<?xf32>>) {
   %0 = fir.shift %c4 : (index) -> !fir.shift<1>
   // CHECK: %[[STRIDE_GEP:.*]] = getelementptr {{.*}}, ptr %[[Y]], i32 0, i32 7, i32 0, i32 2
   // CHECK: %[[STRIDE:.*]] = load i64, ptr %[[STRIDE_GEP]]
-  // CHECK: mul i64 96, %[[STRIDE]]
+  // CHECK: mul nsw i64 96, %[[STRIDE]]
   %1 = fir.array_coor %arg1(%0) %c100 : (!fir.box<!fir.array<?xf32>>, !fir.shift<1>, index) -> !fir.ref<f32>
   %2 = fir.load %1 : !fir.ref<f32>
   fir.store %2 to %arg0 : !fir.ref<f32>
@@ -202,7 +202,7 @@ func.func @f8(%a : !fir.ref<!fir.array<2x2x!fir.type<t{i:i32}>>>, %i : i32) {
 func.func @f9(%i: i32, %e : i64, %j: i64, %c: !fir.ref<!fir.array<?x?x!fir.char<1,?>>>) -> !fir.ref<!fir.char<1,?>> {
   %s = fir.shape %e, %e : (i64, i64) -> !fir.shape<2>
   // CHECK: %[[CAST:.*]] = sext i32 %[[I]] to i64
-  // CHECK: %[[OFFSET:.*]] = mul i64 %{{.*}}, %[[CAST]]
+  // CHECK: %[[OFFSET:.*]] = mul nsw i64 %{{.*}}, %[[CAST]]
   // CHECK: getelementptr i8, ptr %[[C]], i64 %[[OFFSET]]
   %a = fir.array_coor %c(%s) %j, %j typeparams %i : (!fir.ref<!fir.array<?x?x!fir.char<1,?>>>, !fir.shape<2>, i64, i64, i32) -> !fir.ref<!fir.char<1,?>>
   return %a :  !fir.ref<!fir.char<1,?>>
diff --git a/flang/test/Fir/convert-to-llvm.fir b/flang/test/Fir/convert-to-llvm.fir
index 993058ebb0a4d..be82ffab7e33e 100644
--- a/flang/test/Fir/convert-to-llvm.fir
+++ b/flang/test/Fir/convert-to-llvm.fir
@@ -2027,10 +2027,10 @@ func.func @ext_array_coor0(%arg0: !fir.ref<!fir.array<?xi32>>) {
 // CHECK:         %[[C0:.*]] = llvm.mlir.constant(0 : i64) : i64
 // CHECK:         %[[C1:.*]] = llvm.mlir.constant(1 : i64) : i64
 // CHECK:         %[[C0_1:.*]] = llvm.mlir.constant(0 : i64) : i64
-// CHECK:         %[[IDX:.*]] = llvm.sub %[[C0]], %[[C1]] : i64
-// CHECK:         %[[DIFF0:.*]] = llvm.mul %[[IDX]], %[[C1]] : i64
-// CHECK:         %[[SC:.*]] = llvm.mul %[[DIFF0]], %[[C1]]  : i64
-// CHECK:         %[[OFFSET:.*]] = llvm.add %[[SC]], %[[C0_1]]  : i64
+// CHECK:         %[[IDX:.*]] = llvm.sub %[[C0]], %[[C1]] overflow<nsw> : i64
+// CHECK:         %[[DIFF0:.*]] = llvm.mul %[[IDX]], %[[C1]] overflow<nsw> : i64
+// CHECK:         %[[SC:.*]] = llvm.mul %[[DIFF0]], %[[C1]]  overflow<nsw> : i64
+// CHECK:         %[[OFFSET:.*]] = llvm.add %[[SC]], %[[C0_1]]  overflow<nsw> : i64
 // CHECK:         %{{.*}} = llvm.getelementptr %[[ARG0]][%[[OFFSET]]] : (!llvm.ptr, i64) -> !llvm.ptr, i32
 
 // Conversion with shift and slice.
@@ -2046,12 +2046,12 @@ func.func @ext_array_coor1(%arg0: !fir.ref<!fir.array<?xi32>>) {
 // CHECK:         %[[C0:.*]] = llvm.mlir.constant(0 : i64) : i64
 // CHECK:         %[[C1:.*]] = llvm.mlir.constant(1 : i64) : i64
 // CHECK:         %[[C0_1:.*]] = llvm.mlir.constant(0 : i64) : i64
-// CHECK:         %[[IDX:.*]] = llvm.sub %[[C0]], %[[C0]] : i64
-// CHECK:         %[[DIFF0:.*]] = llvm.mul %[[IDX]], %[[C0]] : i64
-// CHECK:         %[[ADJ:.*]] = llvm.sub %[[C0]], %[[C0]]  : i64
-// CHECK:         %[[DIFF1:.*]] = llvm.add %[[DIFF0]], %[[ADJ]]  : i64
-// CHECK:         %[[STRIDE:.*]] = llvm.mul %[[DIFF1]], %[[C1]]  : i64
-// CHECK:         %[[OFFSET:.*]] = llvm.add %[[STRIDE]], %[[C0_1]]  : i64
+// CHECK:         %[[IDX:.*]] = llvm.sub %[[C0]], %[[C0]] overflow<nsw> : i64
+// CHECK:         %[[DIFF0:.*]] = llvm.mul %[[IDX]], %[[C0]] overflow<nsw> : i64
+// CHECK:         %[[ADJ:.*]] = llvm.sub %[[C0]], %[[C0]]  overflow<nsw> : i64
+// CHECK:         %[[DIFF1:.*]] = llvm.add %[[DIFF0]], %[[ADJ]] overflow<nsw> : i64
+// CHECK:         %[[STRIDE:.*]] = llvm.mul %[[DIFF1]], %[[C1]] overflow<nsw> : i64
+// CHECK:         %[[OFFSET:.*]] = llvm.add %[[STRIDE]], %[[C0_1]] overflow<nsw> : i64
 // CHECK:         %{{.*}} = llvm.getelementptr %[[ARG0]][%[[OFFSET]]] : (!llvm.ptr, i64) -> !llvm.ptr, i32
 
 // Conversion for a dynamic length char.
@@ -2067,10 +2067,10 @@ func.func @ext_array_coor2(%arg0: !fir.ref<!fir.array<?x!fir.char<1,?>>>) {
 // CHECK:         %[[C0:.*]] = llvm.mlir.constant(0 : i64) : i64
 // CHECK:         %[[C1:.*]] = llvm.mlir.constant(1 : i64) : i64
 // CHECK:         %[[C0_1:.*]] = llvm.mlir.constant(0 : i64) : i64
-// CHECK:         %[[IDX:.*]] = llvm.sub %[[C0]], %[[C1]] : i64
-// CHECK:         %[[DIFF0:.*]] = llvm.mul %[[IDX]], %[[C1]] : i64
-// CHECK:         %[[SC:.*]] = llvm.mul %[[DIFF0]], %[[C1]]  : i64
-// CHECK:         %[[OFFSET:.*]] = llvm.add %[[SC]], %[[C0_1]]  : i64
+// CHECK:         %[[IDX:.*]] = llvm.sub %[[C0]], %[[C1]] overflow<nsw> : i64
+// CHECK:         %[[DIFF0:.*]] = llvm.mul %[[IDX]], %[[C1]] overflow<nsw> : i64
+// CHECK:         %[[SC:.*]] = llvm.mul %[[DIFF0]], %[[C1]]  overflow<nsw> : i64
+// CHECK:         %[[OFFSET:.*]] = llvm.add %[[SC]], %[[C0_1]] overflow<nsw> : i64
 // CHECK:         %{{.*}} = llvm.getelementptr %[[ARG0]][%[[OFFSET]]] : (!llvm.ptr, i64) -> !llvm.ptr, i8
 
 // Conversion for a `fir.box`.
@@ -2086,12 +2086,12 @@ func.func @ext_array_coor3(%arg0: !fir.box<!fir.array<?xi32>>) {
 // CHECK:         %[[C0:.*]] = llvm.mlir.constant(0 : i64) : i64
 // CHECK:         %[[C1:.*]] = llvm.mlir.constant(1 : i64) : i64
 // CHECK:         %[[C0_1:.*]] = llvm.mlir.constant(0 : i64) : i64
-// CHECK:         %[[IDX:.*]] = llvm.sub %[[C0]], %[[C1]] : i64
-// CHECK:         %[[DIFF0:.*]] = llvm.mul %[[IDX]], %[[C1]] : i64
+// CHECK:         %[[IDX:.*]] = llvm.sub %[[C0]], %[[C1]] overflow<nsw> : i64
+// CHECK:         %[[DIFF0:.*]] = llvm.mul %[[IDX]], %[[C1]] overflow<nsw> : i64
 // CHECK:         %[[GEPSTRIDE:.*]] = llvm.getelementptr %[[ARG0]][0, 7, 0, 2] : (!llvm.ptr) -> !llvm.ptr, !llvm.struct<(ptr, i{{.*}}, i{{.*}}, i{{.*}}, i{{.*}}, i{{.*}}, i{{.*}}, array<1 x array<3 x i64>>)>
 // CHECK:         %[[LOADEDSTRIDE:.*]] = llvm.load %[[GEPSTRIDE]] : !llvm.ptr -> i64
-// CHECK:         %[[SC:.*]] = llvm.mul %[[DIFF0]], %[[LOADEDSTRIDE]]  : i64
-// CHECK:         %[[OFFSET:.*]] = llvm.add %[[SC]], %[[C0_1]]  : i64
+// CHECK:         %[[SC:.*]] = llvm.mul %[[DIFF0]], %[[LOADEDSTRIDE]] overflow<nsw> : i64
+// CHECK:         %[[OFFSET:.*]] = llvm.add %[[SC]], %[[C0_1]] overflow<nsw> : i64
 // CHECK:         %[[GEPADDR:.*]] = llvm.getelementptr %[[ARG0]][0, 0] : (!llvm.ptr) -> !llvm.ptr, !llvm.struct<(ptr, i{{.*}}, i{{.*}}, i{{.*}}, i{{.*}}, i{{.*}}, i{{.*}}, array<1 x array<3 x i64>>)>
 // CHECK:         %[[LOADEDADDR:.*]] = llvm.load %[[GEPADDR]] : !llvm.ptr -> !llvm.ptr
 // CHECK:         %[[GEPADDROFFSET:.*]] = llvm.getelementptr %[[LOADEDADDR]][%[[OFFSET]]] : (!llvm.ptr, i64) -> !llvm.ptr, i8
@@ -2115,12 +2115,12 @@ func.func @ext_array_coor4(%arg0: !fir.ref<!fir.array<100xi32>>) {
 // CHECK:         %[[C1:.*]] = llvm.mlir.constant(1 : i64) : i64
 // CHECK:         %[[C1_1:.*]] = llvm.mlir.constant(1 : i64) : i64
 // CHECK:         %[[C0_1:.*]] = llvm.mlir.constant(0 : i64) : i64
-// CHECK:         %[[IDX:.*]] = llvm.sub %[[C1]], %[[C0]]  : i64
-// CHECK:         %[[DIFF0:.*]] = llvm.mul %[[IDX]], %[[C1]]  : i64
-// CHECK:         %[[ADJ:.*]] = llvm.sub %[[C10]], %[[C0]]  : i64
-// CHECK:         %[[DIFF1:.*]] = llvm.add %[[DIFF0]], %[[ADJ]]  : i64
-// CHECK:         %[[STRIDE:.*]] = llvm.mul %[[DIFF1]], %[[C1_1]] : i64
-// CHECK:         %[[OFFSET:.*]] = llvm.add %[[STRIDE]], %[[C0_1]]  : i64
+// CHECK:         %[[IDX:.*]] = llvm.sub %[[C1]], %[[C0]] overflow<nsw> : i64
+// CHECK:         %[[DIFF0:.*]] = llvm.mul %[[IDX]], %[[C1]] overflow<nsw> : i64
+// CHECK:         %[[ADJ:.*]] = llvm.sub %[[C10]], %[[C0]] overflow<nsw> : i64
+// CHECK:         %[[DIFF1:.*]] = llvm.add %[[DIFF0]], %[[ADJ]] overflow<nsw> : i64
+// CHECK:         %[[STRIDE:.*]] = llvm.mul %[[DIFF1]], %[[C1_1]] overflow<nsw> : i64
+// CHECK:         %[[OFFSET:.*]] = llvm.add %[[STRIDE]], %[[C0_1]] overflow<nsw> : i64
 // CHECK:         %{{.*}} = llvm.getelementptr %[[ARG0]][%[[OFFSET]]] : (!llvm.ptr, i64) -> !llvm.ptr, i32
 
 // Conversion with index type shape and slice
@@ -2134,13 +2134,13 @@ func.func @ext_array_coor5(%arg0: !fir.ref<!fir.array<?xi32>>, %idx1 : index, %i
 // CHECK-SAME:        %[[VAL_0:.*]]: !llvm.ptr, %[[VAL_1:.*]]: i64, %[[VAL_2:.*]]: i64, %[[VAL_3:.*]]: i64, %[[VAL_4:.*]]: i64, %[[VAL_5:.*]]: i64) {
 // CHECK:           %[[VAL_6:.*]] = llvm.mlir.constant(1 : i64) : i64
 // CHECK:           %[[VAL_7:.*]] = llvm.mlir.constant(0 : i64) : i64
-// CHECK:           %[[VAL_8:.*]] = llvm.sub %[[VAL_5]], %[[VAL_6]]  : i64
-// CHECK:           %[[VAL_9:.*]] = llvm.mul %[[VAL_8]], %[[VAL_4]]  : i64
-// CHECK:           %[[VAL_10:.*]] = llvm.sub %[[VAL_2]], %[[VAL_6]]  : i64
-// CHECK:           %[[VAL_11:.*]] = llvm.add %[[VAL_9]], %[[VAL_10]]  : i64
-// CHECK:           %[[VAL_12:.*]] = llvm.mul %[[VAL_11]], %[[VAL_6]]  : i64
-// CHECK:           %[[VAL_13:.*]] = llvm.add %[[VAL_12]], %[[VAL_7]]  : i64
-// CHECK:           %[[VAL_14:.*]] = llvm.mul %[[VAL_6]], %[[VAL_1]]  : i64
+// CHECK:           %[[VAL_8:.*]] = llvm.sub %[[VAL_5]], %[[VAL_6]] overflow<nsw> : i64
+// CHECK:           %[[VAL_9:.*]] = llvm.mul %[[VAL_8]], %[[VAL_4]] overflow<nsw> : i64
+// CHECK:           %[[VAL_10:.*]] = llvm.sub %[[VAL_2]], %[[VAL_6]] overflow<nsw> : i64
+// CHECK:           %[[VAL_11:.*]] = llvm.add %[[VAL_9]], %[[VAL_10]] overflow<nsw> : i64
+// CHECK:           %[[VAL_12:.*]] = llvm.mul %[[VAL_11]], %[[VAL_6]] overflow<nsw> : i64
+// CHECK:           %[[VAL_13:.*]] = llvm.add %[[VAL_12]], %[[VAL_7]] overflow<nsw> : i64
+// CHECK:           %[[VAL_14:.*]] = llvm.mul %[[VAL_6]], %[[VAL_1]] overflow<nsw> : i64
 // CHECK:           %[[VAL_16:.*]] = llvm.getelementptr %[[VAL_0]][%[[VAL_13]]] : (!llvm.ptr, i64) -> !llvm.ptr, i32
 // CHECK:         }
 
@@ -2155,27 +2155,27 @@ func.func @ext_array_coor6(%arg0: !fir.ref<!fir.array<?x?x?xi32>>, %idx1 : index
 // CHECK-SAME:        %[[VAL_0:.*]]: !llvm.ptr, %[[VAL_1:.*]]: i64, %[[VAL_2:.*]]: i64, %[[VAL_3:.*]]: i64, %[[VAL_4:.*]]: i64, %[[VAL_5:.*]]: i64) {
 // CHECK:           %[[VAL_6:.*]] = llvm.mlir.constant(1 : i64) : i64
 // CHECK:           %[[VAL_7:.*]] = llvm.mlir.constant(0 : i64) : i64
-// CHECK:           %[[VAL_8:.*]] = llvm.sub %[[VAL_5]], %[[VAL_6]]  : i64
-// CHECK:           %[[VAL_9:.*]] = llvm.mul %[[VAL_8]], %[[VAL_4]]  : i64
-// CHECK:           %[[VAL_10:.*]] = llvm.sub %[[VAL_2]], %[[VAL_6]]  : i64
-// CHECK:           %[[VAL_11:.*]] = llvm.add %[[VAL_9]], %[[VAL_10]]  : i64
-// CHECK:           %[[VAL_12:.*]] = llvm.mul %[[VAL_11]], %[[VAL_6]]  : i64
-// CHECK:           %[[VAL_13:.*]] = llvm.add %[[VAL_12]], %[[VAL_7]]  : i64
-// CHECK:           %[[VAL_14:.*]] = llvm.mul %[[VAL_6]], %[[VAL_1]]  : i64
-// CHECK:           %[[VAL_15:.*]] = llvm.sub %[[VAL_5]], %[[VAL_6]]  : i64
-// CHECK:           %[[VAL_16:.*]] = llvm.mul %[[VAL_15]], %[[VAL_4]]  : i64
-// CHECK:           %[[VAL_17:.*]] = llvm.sub %[[VAL_2]], %[[VAL_6]]  : i64
-// CHECK:           %[[VAL_18:.*]] = llvm.add %[[VAL_16]], %[[VAL_17]]  : i64
-// CHECK:           %[[VAL_19:.*]] = llvm.mul %[[VAL_18]], %[[VAL_14]]  : i64
-// CHECK:           %[[VAL_20:.*]] = llvm.add %[[VAL_19]], %[[VAL_13]]  : i64
-// CHECK:           %[[VAL_21:.*]] = llvm.mul %[[VAL_14]], %[[VAL_1]]  : i64
-// CHECK:           %[[VAL_22:.*]] = llvm.sub %[[VAL_5]], %[[VAL_6]]  : i64
-// CHECK:           %[[VAL_23:.*]] = llvm.mul %[[VAL_22]], %[[VAL_4]]  : i64
-// CHECK:           %[[VAL_24:.*]] = llvm.sub %[[VAL_2]], %[[VAL_6]]  : i64
-// CHECK:           %[[VAL_25:.*]] = llvm.add %[[VAL_23]], %[[VAL_24]]  : i64
-// CHECK:           %[[VAL_26:.*]] = llvm.mul %[[VAL_25]], %[[VAL_21]]  : i64
-// CHECK:           %[[VAL_27:.*]] = llvm.add %[[VAL_26]], %[[VAL_20]]  : i64
-// CHECK:           %[[VAL_28:.*]] = llvm.mul %[[VAL_21]], %[[VAL_1]]  : i64
+// CHECK:           %[[VAL_8:.*]] = llvm.sub %[[VAL_5]], %[[VAL_6]] overflow<nsw> : i64
+// CHECK:           %[[VAL_9:.*]] = llvm.mul %[[VAL_8]], %[[VAL_4]] overflow<nsw> : i64
+// CHECK:           %[[VAL_10:.*]] = llvm.sub %[[VAL_2]], %[[VAL_6]] overflow<nsw> : i64
+// CHECK:           %[[VAL_11:.*]] = llvm.add %[[VAL_9]], %[[VAL_10]] overflow<nsw> : i64
+// CHECK:           %[[VAL_12:.*]] = llvm.mul %[[VAL_11]], %[[VAL_6]] overflow<nsw> : i64
+// CHECK:           %[[VAL_13:.*]] = llvm.add %[[VAL_12]], %[[VAL_7]] overflow<nsw> : i64
+// CHECK:           %[[VAL_14:.*]] = llvm.mul %[[VAL_6]], %[[VAL_1]] overflow<nsw> : i64
+// CHECK:           %[[VAL_15:.*]] = llvm....
[truncated]

jeanPerier (Contributor) left a comment


Please wait for @vzakhari's opinion, but since arithmetic overflow in address computation is not a feature in Fortran, this sounds reasonable to me.

vzakhari (Contributor) commented Dec 7, 2023

Thank you for the changes, Tom. It looks good to me. I just want to clarify that this is not the place that I talked about in the RFC. I was talking about setting nsw on the do-variable increment that is generated by the loop lowering. Adding nsw there is what enables the hoisting I mentioned in the RFC; in this case, the nsw flags on the address computations might not be that important, because LLVM was able to optimize without them.

kiranchandramohan (Contributor)

Not for this patch, but are there other places where we could add nsw for address or offset calculations, e.g. places that call or compute offsets, genBoxOffsetGep, etc.?

tblah (Contributor, Author) commented Dec 8, 2023

Thanks for explaining and reviewing. I'll merge this and then make a new patch for the do-variable increment.

@tblah tblah merged commit bdacd56 into llvm:main Dec 8, 2023
7 checks passed
tblah added a commit to tblah/llvm-project that referenced this pull request Dec 8, 2023
I don't have hardware to test this myself. Would anyone else be able to
verify if it works?
tblah (Contributor, Author) commented Dec 8, 2023

I was talking about setting nsw on the do-variable increment that is generated by the loop lowering.

@vzakhari Are you sure this is allowed? Do-variables could definitely overflow. Is integer overflow completely undefined in Fortran? I can't find any mention of it in the standard.

jeanPerier (Contributor) commented Dec 8, 2023

I was talking about setting nsw on the do-variable increment that is generated by the loop lowering.

@vzakhari Are you sure if this is allowed? do-variables could definitely overflow. Is integer overflow completely undefined in Fortran? I can't find any mention of it in the standard.

I think so. Adding @klausler to validate, but the gfortran documentation says this is forbidden (https://gcc.gnu.org/onlinedocs/gfortran/Behavior-on-integer-overflow.html).

I think the rationale is that Fortran specifies the INTEGER type as a subset of the mathematical integers in 7.4.3.1, and it also says that, aside from parentheses order, Fortran is free to rewrite any expression to a mathematical equivalent (10.1.5.2.4 point 2).

So it would be valid for the compiler to fold MAX_INT + 1 .le. 0 to .false. since this is mathematically correct. Hence a Fortran programmer should never rely on integer overflow behavior.

In fact, you can check that this is the case with gfortran on the following funny example (gfortran likely rewrites n + 1 .le. 0 to n .le. -1 because it is allowed to per 10.1.5.2.4 point 2):

 integer :: n = 2147483647
 call test(n, n+1)
end

subroutine test(n, n_plus_one)
 integer :: n, n_plus_one
 print *, n+1 .le. 0
 print *, n_plus_one .le. 0
end

gfortran 13.2 outputs F T.

klausler (Contributor) commented Dec 8, 2023

I don't know what "nsw" means, but will just add a reminder that Fortran DO loop iteration counts are calculated before the loop starts. Don't use the current value of the DO loop index variable for loop termination testing -- just increment it each time around.

jeanPerier (Contributor)

I don't know what "nsw" means, but will just add a reminder that Fortran DO loop iteration counts are calculated before the loop starts. Don't use the current value of the DO loop index variable for loop termination testing -- just increment it each time around.

Thanks for the answer; lowering is not using the do-loop index for termination testing. My example with n + 1 .le. 0 had nothing to do with the DO loop lowering and was misleading in this discussion.

It is just an example that shows what kind of optimization LLVM can do if nsw is set on additions. n + 1 .le. 0 can be rewritten to n .le. -1 by LLVM only if nsw is set on the add. nsw tells LLVM that overflow in the addition is illegal and LLVM does not need to generate code in which the overflow is honored: it can rewrite arithmetic operations to mathematically equivalent forms, not only to two's-complement-equivalent forms.

LLVM example:

define dso_local i1 @test_nsw(i32 noundef %0) {
  %2 = add nsw i32 %0, 10
  %3 = icmp sle i32 %2, 0
  ret i1 %3
}

define dso_local i1 @test_no_nsw(i32 noundef %0) {
  %2 = add i32 %0, 10
  %3 = icmp sle i32 %2, 0
  ret i1 %3
}

opt -O3 -S: the add can be optimized out with nsw but not without it:

define dso_local i1 @test_nsw(i32 noundef %0) local_unnamed_addr #0 {
  %2 = icmp slt i32 %0, -9
  ret i1 %2
}

define dso_local i1 @test_no_nsw(i32 noundef %0) local_unnamed_addr #0 {
  %2 = add i32 %0, 10
  %3 = icmp slt i32 %2, 1
  ret i1 %3
}

Slava's perf issue comes from a DO loop index that is used in user code/addressing after being incremented. We do not currently add the nsw flag on the add that increments the do-loop variable inside the loop, so LLVM thinks it has to honor any do-loop overflow in uses of the do-loop variable after its increment. I think Tom's question is: are we supposed to honor do-loop overflow inside the body of the loop? And I think Fortran gives user code no guarantee about integer overflow, so this flag can be set.
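To make that concrete, here is an added sketch (illustrative only, not Flang's actual lowering output) of the kind of loop being discussed: the do-variable increment carries no nsw, so LLVM must assume %iv.next may wrap when it reasons about the address computed from it, even though a separate counter controls termination.

define void @store_loop(ptr %base, i64 %trip) {
entry:
  br label %loop
loop:
  %count = phi i64 [ %trip, %entry ], [ %count.next, %loop ]
  %iv = phi i64 [ 1, %entry ], [ %iv.next, %loop ]
  %iv.next = add i64 %iv, 1                           ; the do-variable increment: no nsw today
  %addr = getelementptr i32, ptr %base, i64 %iv.next  ; address computed from the incremented do-variable
  store i32 1, ptr %addr
  %count.next = sub i64 %count, 1                     ; separate trip counter controls the loop
  %done = icmp eq i64 %count.next, 0
  br i1 %done, label %exit, label %loop
exit:
  ret void
}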

klausler (Contributor) commented Dec 8, 2023

What do you mean by "honor do loop overflow"?

Integer overflow makes a program non-conformant. The language doesn't give any guarantees about the result. But existing code that depends on overflow & underflow canceling out is out there; for example, computing ((N+1)-1) and expecting N.
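An added LLVM IR sketch of that cancellation pattern (not from the thread): without nsw, two's-complement wrap guarantees the round trip returns %n; with nsw, the intermediate value is poison when %n is INT64_MAX, so such code only keeps producing %n if the optimizer happens to fold the pair away (a legal refinement of poison, but no longer a guarantee).

define i64 @round_trip(i64 %n) {
  %t = add nsw i64 %n, 1   ; poison when %n == INT64_MAX
  %r = sub nsw i64 %t, 1   ; poison propagates; folding %r to %n remains legal
  ret i64 %r
}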

vzakhari (Contributor) commented Dec 8, 2023

Tom, I understand your concern. I think adding no-signed-wrap on the do-variable increment may be a problem depending on how LLVM would optimize the loop IVs.

For example:

program main
  real :: x(10)
  call test(x, -127_1, 127_1, 10_1)
contains
  subroutine test(x,b,e,n)
    integer(1) :: i, b, e, n
    real :: x(:)
    do i=b,e
       x(MOD(i,n)) = 1.0
    end do
    print *, i
  end subroutine test
end program main

There is signed wrap on the final increment of i. At the same time, Flang uses a separate loop iteration control of an extended type, so the loop control will not trigger the signed wrap and the loop execution will complete.
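An added sketch of that final increment (illustrative only, assuming the do-variable is lowered to i8): the last increment overflows, so tagging it nsw would make the resulting value poison even though the extended-type trip counter still terminates the loop.

define i8 @final_increment(i8 %i) {
  ; with %i == 127 (the last iteration in the example above) the add overflows:
  ; poison when tagged nsw, -128 (two's-complement wrap) without it
  %i.next = add nsw i8 %i, 1
  ret i8 %i.next                ; corresponds to the value of i printed after the loop
}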

I guess setting nsw on the i increment may lead LLVM to erroneously reason about the range of the iteration space. I am not sure whether such a deduction may lead to incorrect optimization of the compiler-generated loop iteration control, though.

We can try to add nsw under an option and investigate further. At least, there will be an option for conformant programs.

P.S. btw, gfortran compiles the above code into an infinite loop.

alanzhao1 pushed a commit to alanzhao1/llvm-project that referenced this pull request May 16, 2024
…#91579)

This patch adds the nsw flag to the increment of do-variables when a new option is enabled.
NOTE 11.10 in the Fortran 2018 standard says they never overflow.

See also the discussion in llvm#74709 and the following discourse post.
https://discourse.llvm.org/t/rfc-add-nsw-flags-to-arithmetic-integer-operations-using-the-option-fno-wrapv/77584/5