[Flang] TSVC s314: `fcmp` doesn't have fast-math flags #74263

yus3710-fj · 2023-12-04T00:21:33Z

Flang can't vectorize the loop in s314 of TSVC while Clang can vectorize the loop written in C.

! Fortran version
      do 1 nl = 1,ntimes
      x = a(1)
      do 10 i = 2,n
        if(a(i) .gt. x) x = a(i)
   10 continue
      call dummy(ld,n,a,b,c,d,e,aa,bb,cc,x)
   1  continue

// C version
for (int nl = 0; nl < ntimes; nl++) {
  x = a[0];
  for (int i = 1; i < n; i++) {
    if (a[i] > x) {
      x = a[i];
    }
  }
  dummy(a, b, c, d, e, aa, bb, cc, x);
}

$ flang-new -v -Ofast s314.f -S -Rpass=vector
flang-new version 18.0.0 (https://github.com/llvm/llvm-project.git 1c1227846425883a3d39ff56700660236a97152c)
Target: aarch64-unknown-linux-gnu
Thread model: posix
InstalledDir: /path/to/install/bin
Found candidate GCC installation: /path/to/lib/gcc/aarch64-unknown-linux-gnu/11.2.0
Selected GCC installation: /path/to/lib/gcc/aarch64-unknown-linux-gnu/11.2.0
Candidate multilib: .;@m64
Selected multilib: .;@m64
 "/path/to/install/bin/flang-new" -fc1 -triple aarch64-unknown-linux-gnu -S -fcolor-diagnostics -mrelocation-model pic -pic-level 2 -pic-is-pie -ffast-math -target-cpu generic -target-feature +neon -target-feature +v8a -fstack-arrays -fversion-loops-for-stride -Rpass=vector -O3 -o s314.s -x f95-cpp-input s314.f
$ clang -Ofast s314.c -Rpass=vector
/path/to/s314.c:17:3: remark: vectorized loop (vectorization width: 4, interleaved count: 2) [-Rpass=loop-vectorize]
   17 |                 for (int i = 0; i < LEN; i++) {
      |                 ^

fcmp should have fast-math flags to be recognized as max reduction but Flang doesn't support.

.lr.ph:                                           ; preds = %.lr.ph.preheader, %26
  %indvars.iv = phi i64 [ 2, %.lr.ph.preheader ], [ %indvars.iv.next, %26 ]
  %22 = phi float [ %18, %.lr.ph.preheader ], [ %27, %26 ]
  %gep = getelementptr float, ptr getelementptr ([1000 x float], ptr @_QMmodEa, i64 -1, i64 999), i64 %indvars.iv, !dbg !21
  %23 = load float, ptr %gep, align 4, !dbg !21, !tbaa !15
  %24 = fcmp ogt float %23, %22, !dbg !21
  br i1 %24, label %25, label %26, !dbg !21

25:                                               ; preds = %.lr.ph
  store float %23, ptr %12, align 4, !dbg !23, !tbaa !15
  br label %26, !dbg !21

26:                                               ; preds = %25, %.lr.ph
  %27 = phi float [ %23, %25 ], [ %22, %.lr.ph ]
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1, !dbg !24
  %exitcond.not = icmp eq i64 %indvars.iv, %21, !dbg !21
  br i1 %exitcond.not, label %._crit_edge, label %.lr.ph, !dbg !21

The text was updated successfully, but these errors were encountered:

llvmbot · 2023-12-04T00:21:49Z

@llvm/issue-subscribers-flang-ir

Author: Yusuke MINATO (yus3710-fj)

Flang can't vectorize the loop in `s314` of [TSVC](https://www.netlib.org/benchmark/vectors) while Clang can vectorize the loop written in C.

! Fortran version
      do 1 nl = 1,ntimes
      x = a(1)
      do 10 i = 2,n
        if(a(i) .gt. x) x = a(i)
   10 continue
      call dummy(ld,n,a,b,c,d,e,aa,bb,cc,x)
   1  continue

// C version
for (int nl = 0; nl &lt; ntimes; nl++) {
  x = a[0];
  for (int i = 1; i &lt; n; i++) {
    if (a[i] &gt; x) {
      x = a[i];
    }
  }
  dummy(a, b, c, d, e, aa, bb, cc, x);
}

$ flang-new -v -Ofast s314.f -S -Rpass=vector
flang-new version 18.0.0 (https://github.com/llvm/llvm-project.git 1c1227846425883a3d39ff56700660236a97152c)
Target: aarch64-unknown-linux-gnu
Thread model: posix
InstalledDir: /path/to/install/bin
Found candidate GCC installation: /path/to/lib/gcc/aarch64-unknown-linux-gnu/11.2.0
Selected GCC installation: /path/to/lib/gcc/aarch64-unknown-linux-gnu/11.2.0
Candidate multilib: .;@<!-- -->m64
Selected multilib: .;@<!-- -->m64
 "/path/to/install/bin/flang-new" -fc1 -triple aarch64-unknown-linux-gnu -S -fcolor-diagnostics -mrelocation-model pic -pic-level 2 -pic-is-pie -ffast-math -target-cpu generic -target-feature +neon -target-feature +v8a -fstack-arrays -fversion-loops-for-stride -Rpass=vector -O3 -o s314.s -x f95-cpp-input s314.f
$ clang -Ofast s314.c -Rpass=vector
/path/to/s314.c:17:3: remark: vectorized loop (vectorization width: 4, interleaved count: 2) [-Rpass=loop-vectorize]
   17 |                 for (int i = 0; i &lt; LEN; i++) {
      |                 ^

fcmp should have fast-math flags to be recognized as max reduction but Flang doesn't support.

.lr.ph:                                           ; preds = %.lr.ph.preheader, %26
  %indvars.iv = phi i64 [ 2, %.lr.ph.preheader ], [ %indvars.iv.next, %26 ]
  %22 = phi float [ %18, %.lr.ph.preheader ], [ %27, %26 ]
  %gep = getelementptr float, ptr getelementptr ([1000 x float], ptr @<!-- -->_QMmodEa, i64 -1, i64 999), i64 %indvars.iv, !dbg !21
  %23 = load float, ptr %gep, align 4, !dbg !21, !tbaa !15
  %24 = fcmp ogt float %23, %22, !dbg !21
  br i1 %24, label %25, label %26, !dbg !21

25:                                               ; preds = %.lr.ph
  store float %23, ptr %12, align 4, !dbg !23, !tbaa !15
  br label %26, !dbg !21

26:                                               ; preds = %25, %.lr.ph
  %27 = phi float [ %23, %25 ], [ %22, %.lr.ph ]
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1, !dbg !24
  %exitcond.not = icmp eq i64 %indvars.iv, %21, !dbg !21
  br i1 %exitcond.not, label %._crit_edge, label %.lr.ph, !dbg !21

tblah · 2023-12-04T09:49:40Z

Thank you for reporting this. I will take a look.

llvm.fcmp does support fast math attributes therefore so should arith.cmpf. The heavy churn in flang tests are because flang sets fastmath<contract> by default on all operations that support the fast math interface. Downstream users of MLIR should not be so effected. This was requested in llvm#74263

`llvm.fcmp` does support fast math attributes therefore so should `arith.cmpf`. The heavy churn in flang tests are because flang sets `fastmath<contract>` by default on all operations that support the fast math interface. Downstream users of MLIR should not be so effected. This was requested in #74263

tblah · 2023-12-06T10:20:42Z

@yus3710-fj I think this should be fixed now: #74315

Please could you confirm that this did fix your use case

yus3710-fj · 2023-12-07T02:48:32Z

Thank you for the fix.
That works well for my use case.

$ flang-new -Ofast s314.f -S -emit-llvm -v
flang-new version 18.0.0git (https://github.com/llvm/llvm-project.git b683709ea6eec7d0a388bd50c571774c9b9ffdb7)
Target: aarch64-unknown-linux-gnu
Thread model: posix
InstalledDir: /path/to/install/bin
 "/path/to/install/bin/flang-new" -fc1 -triple aarch64-unknown-linux-gnu -emit-llvm -fcolor-diagnostics -mrelocation-model pic -pic-level 2 -pic-is-pie -ffast-math -target-cpu generic -target-feature +neon -target-feature +v8a -fstack-arrays -fversion-loops-for-stride -mframe-pointer=non-leaf -O3 -o s314.ll -x f95-cpp-input s314.f

; s314.ll
.lr.ph:                                           ; preds = %.lr.ph.preheader, %26
  %indvars.iv = phi i64 [ 2, %.lr.ph.preheader ], [ %indvars.iv.next, %26 ]
  %22 = phi float [ %18, %.lr.ph.preheader ], [ %27, %26 ]
  %gep = getelementptr float, ptr getelementptr ([1000 x float], ptr @_QMmodEa, i64 -1, i64 999), i64 %indvars.iv, !dbg !24
  %23 = load float, ptr %gep, align 4, !dbg !24, !tbaa !26
  %24 = fcmp fast ogt float %23, %22, !dbg !24 ; fast-math flag is set now!
  br i1 %24, label %25, label %26, !dbg !24

25:                                               ; preds = %.lr.ph
  store float %23, ptr %12, align 4, !dbg !31, !tbaa !15
  br label %26, !dbg !24

26:                                               ; preds = %25, %.lr.ph
  %27 = phi float [ %23, %25 ], [ %22, %.lr.ph ]
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1, !dbg !32
  %exitcond.not = icmp eq i64 %indvars.iv, %21, !dbg !24
  br i1 %exitcond.not, label %._crit_edge, label %.lr.ph, !dbg !24

yus3710-fj added performance flang:ir labels Dec 4, 2023

yus3710-fj mentioned this issue Dec 4, 2023

[Flang] TSVC s314, s3111: not vectorized because the loops are not recognized as reduction loops #74264

Closed

tblah self-assigned this Dec 4, 2023

tblah mentioned this issue Dec 4, 2023

[mlir][flang] add fast math attribute to fcmp #74315

Merged

yus3710-fj closed this as completed Dec 7, 2023

yus3710-fj mentioned this issue Jan 24, 2024

[Flang] TSVC s314: needs fast-math flags in function attributes for vectorization #79257

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Flang] TSVC s314: `fcmp` doesn't have fast-math flags #74263

[Flang] TSVC s314: `fcmp` doesn't have fast-math flags #74263

yus3710-fj commented Dec 4, 2023

llvmbot commented Dec 4, 2023

tblah commented Dec 4, 2023

tblah commented Dec 6, 2023

yus3710-fj commented Dec 7, 2023

[Flang] TSVC s314: fcmp doesn't have fast-math flags #74263

[Flang] TSVC s314: fcmp doesn't have fast-math flags #74263

Comments

yus3710-fj commented Dec 4, 2023

llvmbot commented Dec 4, 2023

tblah commented Dec 4, 2023

tblah commented Dec 6, 2023

yus3710-fj commented Dec 7, 2023

[Flang] TSVC s314: `fcmp` doesn't have fast-math flags #74263

[Flang] TSVC s314: `fcmp` doesn't have fast-math flags #74263