Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Flang] TSVC s314: fcmp doesn't have fast-math flags #74263

Closed
yus3710-fj opened this issue Dec 4, 2023 · 4 comments
Closed

[Flang] TSVC s314: fcmp doesn't have fast-math flags #74263

yus3710-fj opened this issue Dec 4, 2023 · 4 comments

Comments

@yus3710-fj
Copy link
Contributor

Flang can't vectorize the loop in s314 of TSVC while Clang can vectorize the loop written in C.

! Fortran version
      do 1 nl = 1,ntimes
      x = a(1)
      do 10 i = 2,n
        if(a(i) .gt. x) x = a(i)
   10 continue
      call dummy(ld,n,a,b,c,d,e,aa,bb,cc,x)
   1  continue
// C version
for (int nl = 0; nl < ntimes; nl++) {
  x = a[0];
  for (int i = 1; i < n; i++) {
    if (a[i] > x) {
      x = a[i];
    }
  }
  dummy(a, b, c, d, e, aa, bb, cc, x);
}
$ flang-new -v -Ofast s314.f -S -Rpass=vector
flang-new version 18.0.0 (https://github.com/llvm/llvm-project.git 1c1227846425883a3d39ff56700660236a97152c)
Target: aarch64-unknown-linux-gnu
Thread model: posix
InstalledDir: /path/to/install/bin
Found candidate GCC installation: /path/to/lib/gcc/aarch64-unknown-linux-gnu/11.2.0
Selected GCC installation: /path/to/lib/gcc/aarch64-unknown-linux-gnu/11.2.0
Candidate multilib: .;@m64
Selected multilib: .;@m64
 "/path/to/install/bin/flang-new" -fc1 -triple aarch64-unknown-linux-gnu -S -fcolor-diagnostics -mrelocation-model pic -pic-level 2 -pic-is-pie -ffast-math -target-cpu generic -target-feature +neon -target-feature +v8a -fstack-arrays -fversion-loops-for-stride -Rpass=vector -O3 -o s314.s -x f95-cpp-input s314.f
$ clang -Ofast s314.c -Rpass=vector
/path/to/s314.c:17:3: remark: vectorized loop (vectorization width: 4, interleaved count: 2) [-Rpass=loop-vectorize]
   17 |                 for (int i = 0; i < LEN; i++) {
      |                 ^

fcmp should have fast-math flags to be recognized as max reduction but Flang doesn't support.

.lr.ph:                                           ; preds = %.lr.ph.preheader, %26
  %indvars.iv = phi i64 [ 2, %.lr.ph.preheader ], [ %indvars.iv.next, %26 ]
  %22 = phi float [ %18, %.lr.ph.preheader ], [ %27, %26 ]
  %gep = getelementptr float, ptr getelementptr ([1000 x float], ptr @_QMmodEa, i64 -1, i64 999), i64 %indvars.iv, !dbg !21
  %23 = load float, ptr %gep, align 4, !dbg !21, !tbaa !15
  %24 = fcmp ogt float %23, %22, !dbg !21
  br i1 %24, label %25, label %26, !dbg !21

25:                                               ; preds = %.lr.ph
  store float %23, ptr %12, align 4, !dbg !23, !tbaa !15
  br label %26, !dbg !21

26:                                               ; preds = %25, %.lr.ph
  %27 = phi float [ %23, %25 ], [ %22, %.lr.ph ]
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1, !dbg !24
  %exitcond.not = icmp eq i64 %indvars.iv, %21, !dbg !21
  br i1 %exitcond.not, label %._crit_edge, label %.lr.ph, !dbg !21
@llvmbot
Copy link
Collaborator

llvmbot commented Dec 4, 2023

@llvm/issue-subscribers-flang-ir

Author: Yusuke MINATO (yus3710-fj)

Flang can't vectorize the loop in `s314` of [TSVC](https://www.netlib.org/benchmark/vectors) while Clang can vectorize the loop written in C.
! Fortran version
      do 1 nl = 1,ntimes
      x = a(1)
      do 10 i = 2,n
        if(a(i) .gt. x) x = a(i)
   10 continue
      call dummy(ld,n,a,b,c,d,e,aa,bb,cc,x)
   1  continue
// C version
for (int nl = 0; nl &lt; ntimes; nl++) {
  x = a[0];
  for (int i = 1; i &lt; n; i++) {
    if (a[i] &gt; x) {
      x = a[i];
    }
  }
  dummy(a, b, c, d, e, aa, bb, cc, x);
}
$ flang-new -v -Ofast s314.f -S -Rpass=vector
flang-new version 18.0.0 (https://github.com/llvm/llvm-project.git 1c1227846425883a3d39ff56700660236a97152c)
Target: aarch64-unknown-linux-gnu
Thread model: posix
InstalledDir: /path/to/install/bin
Found candidate GCC installation: /path/to/lib/gcc/aarch64-unknown-linux-gnu/11.2.0
Selected GCC installation: /path/to/lib/gcc/aarch64-unknown-linux-gnu/11.2.0
Candidate multilib: .;@<!-- -->m64
Selected multilib: .;@<!-- -->m64
 "/path/to/install/bin/flang-new" -fc1 -triple aarch64-unknown-linux-gnu -S -fcolor-diagnostics -mrelocation-model pic -pic-level 2 -pic-is-pie -ffast-math -target-cpu generic -target-feature +neon -target-feature +v8a -fstack-arrays -fversion-loops-for-stride -Rpass=vector -O3 -o s314.s -x f95-cpp-input s314.f
$ clang -Ofast s314.c -Rpass=vector
/path/to/s314.c:17:3: remark: vectorized loop (vectorization width: 4, interleaved count: 2) [-Rpass=loop-vectorize]
   17 |                 for (int i = 0; i &lt; LEN; i++) {
      |                 ^

fcmp should have fast-math flags to be recognized as max reduction but Flang doesn't support.

.lr.ph:                                           ; preds = %.lr.ph.preheader, %26
  %indvars.iv = phi i64 [ 2, %.lr.ph.preheader ], [ %indvars.iv.next, %26 ]
  %22 = phi float [ %18, %.lr.ph.preheader ], [ %27, %26 ]
  %gep = getelementptr float, ptr getelementptr ([1000 x float], ptr @<!-- -->_QMmodEa, i64 -1, i64 999), i64 %indvars.iv, !dbg !21
  %23 = load float, ptr %gep, align 4, !dbg !21, !tbaa !15
  %24 = fcmp ogt float %23, %22, !dbg !21
  br i1 %24, label %25, label %26, !dbg !21

25:                                               ; preds = %.lr.ph
  store float %23, ptr %12, align 4, !dbg !23, !tbaa !15
  br label %26, !dbg !21

26:                                               ; preds = %25, %.lr.ph
  %27 = phi float [ %23, %25 ], [ %22, %.lr.ph ]
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1, !dbg !24
  %exitcond.not = icmp eq i64 %indvars.iv, %21, !dbg !21
  br i1 %exitcond.not, label %._crit_edge, label %.lr.ph, !dbg !21

@tblah
Copy link
Contributor

tblah commented Dec 4, 2023

Thank you for reporting this. I will take a look.

@tblah tblah self-assigned this Dec 4, 2023
tblah added a commit to tblah/llvm-project that referenced this issue Dec 4, 2023
llvm.fcmp does support fast math attributes therefore so should
arith.cmpf.

The heavy churn in flang tests are because flang sets fastmath<contract>
by default on all operations that support the fast math interface.
Downstream users of MLIR should not be so effected.

This was requested in llvm#74263
tblah added a commit that referenced this issue Dec 6, 2023
`llvm.fcmp` does support fast math attributes therefore so should
`arith.cmpf`.

The heavy churn in flang tests are because flang sets
`fastmath<contract>` by default on all operations that support the fast
math interface. Downstream users of MLIR should not be so effected.

This was requested in #74263
@tblah
Copy link
Contributor

tblah commented Dec 6, 2023

@yus3710-fj I think this should be fixed now: #74315

Please could you confirm that this did fix your use case

@yus3710-fj
Copy link
Contributor Author

Thank you for the fix.
That works well for my use case.

$ flang-new -Ofast s314.f -S -emit-llvm -v
flang-new version 18.0.0git (https://github.com/llvm/llvm-project.git b683709ea6eec7d0a388bd50c571774c9b9ffdb7)
Target: aarch64-unknown-linux-gnu
Thread model: posix
InstalledDir: /path/to/install/bin
 "/path/to/install/bin/flang-new" -fc1 -triple aarch64-unknown-linux-gnu -emit-llvm -fcolor-diagnostics -mrelocation-model pic -pic-level 2 -pic-is-pie -ffast-math -target-cpu generic -target-feature +neon -target-feature +v8a -fstack-arrays -fversion-loops-for-stride -mframe-pointer=non-leaf -O3 -o s314.ll -x f95-cpp-input s314.f
; s314.ll
.lr.ph:                                           ; preds = %.lr.ph.preheader, %26
  %indvars.iv = phi i64 [ 2, %.lr.ph.preheader ], [ %indvars.iv.next, %26 ]
  %22 = phi float [ %18, %.lr.ph.preheader ], [ %27, %26 ]
  %gep = getelementptr float, ptr getelementptr ([1000 x float], ptr @_QMmodEa, i64 -1, i64 999), i64 %indvars.iv, !dbg !24
  %23 = load float, ptr %gep, align 4, !dbg !24, !tbaa !26
  %24 = fcmp fast ogt float %23, %22, !dbg !24 ; fast-math flag is set now!
  br i1 %24, label %25, label %26, !dbg !24

25:                                               ; preds = %.lr.ph
  store float %23, ptr %12, align 4, !dbg !31, !tbaa !15
  br label %26, !dbg !24

26:                                               ; preds = %25, %.lr.ph
  %27 = phi float [ %23, %25 ], [ %22, %.lr.ph ]
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1, !dbg !32
  %exitcond.not = icmp eq i64 %indvars.iv, %21, !dbg !24
  br i1 %exitcond.not, label %._crit_edge, label %.lr.ph, !dbg !24

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants