-
Notifications
You must be signed in to change notification settings - Fork 10.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Flang] TSVC s314: fcmp
doesn't have fast-math flags
#74263
Labels
Comments
@llvm/issue-subscribers-flang-ir Author: Yusuke MINATO (yus3710-fj)
Flang can't vectorize the loop in `s314` of [TSVC](https://www.netlib.org/benchmark/vectors) while Clang can vectorize the loop written in C.
! Fortran version
do 1 nl = 1,ntimes
x = a(1)
do 10 i = 2,n
if(a(i) .gt. x) x = a(i)
10 continue
call dummy(ld,n,a,b,c,d,e,aa,bb,cc,x)
1 continue // C version
for (int nl = 0; nl < ntimes; nl++) {
x = a[0];
for (int i = 1; i < n; i++) {
if (a[i] > x) {
x = a[i];
}
}
dummy(a, b, c, d, e, aa, bb, cc, x);
} $ flang-new -v -Ofast s314.f -S -Rpass=vector
flang-new version 18.0.0 (https://github.com/llvm/llvm-project.git 1c1227846425883a3d39ff56700660236a97152c)
Target: aarch64-unknown-linux-gnu
Thread model: posix
InstalledDir: /path/to/install/bin
Found candidate GCC installation: /path/to/lib/gcc/aarch64-unknown-linux-gnu/11.2.0
Selected GCC installation: /path/to/lib/gcc/aarch64-unknown-linux-gnu/11.2.0
Candidate multilib: .;@<!-- -->m64
Selected multilib: .;@<!-- -->m64
"/path/to/install/bin/flang-new" -fc1 -triple aarch64-unknown-linux-gnu -S -fcolor-diagnostics -mrelocation-model pic -pic-level 2 -pic-is-pie -ffast-math -target-cpu generic -target-feature +neon -target-feature +v8a -fstack-arrays -fversion-loops-for-stride -Rpass=vector -O3 -o s314.s -x f95-cpp-input s314.f
$ clang -Ofast s314.c -Rpass=vector
/path/to/s314.c:17:3: remark: vectorized loop (vectorization width: 4, interleaved count: 2) [-Rpass=loop-vectorize]
17 | for (int i = 0; i < LEN; i++) {
| ^
|
Thank you for reporting this. I will take a look. |
tblah
added a commit
to tblah/llvm-project
that referenced
this issue
Dec 4, 2023
llvm.fcmp does support fast math attributes therefore so should arith.cmpf. The heavy churn in flang tests are because flang sets fastmath<contract> by default on all operations that support the fast math interface. Downstream users of MLIR should not be so effected. This was requested in llvm#74263
tblah
added a commit
that referenced
this issue
Dec 6, 2023
`llvm.fcmp` does support fast math attributes therefore so should `arith.cmpf`. The heavy churn in flang tests are because flang sets `fastmath<contract>` by default on all operations that support the fast math interface. Downstream users of MLIR should not be so effected. This was requested in #74263
@yus3710-fj I think this should be fixed now: #74315 Please could you confirm that this did fix your use case |
Thank you for the fix. $ flang-new -Ofast s314.f -S -emit-llvm -v
flang-new version 18.0.0git (https://github.com/llvm/llvm-project.git b683709ea6eec7d0a388bd50c571774c9b9ffdb7)
Target: aarch64-unknown-linux-gnu
Thread model: posix
InstalledDir: /path/to/install/bin
"/path/to/install/bin/flang-new" -fc1 -triple aarch64-unknown-linux-gnu -emit-llvm -fcolor-diagnostics -mrelocation-model pic -pic-level 2 -pic-is-pie -ffast-math -target-cpu generic -target-feature +neon -target-feature +v8a -fstack-arrays -fversion-loops-for-stride -mframe-pointer=non-leaf -O3 -o s314.ll -x f95-cpp-input s314.f ; s314.ll
.lr.ph: ; preds = %.lr.ph.preheader, %26
%indvars.iv = phi i64 [ 2, %.lr.ph.preheader ], [ %indvars.iv.next, %26 ]
%22 = phi float [ %18, %.lr.ph.preheader ], [ %27, %26 ]
%gep = getelementptr float, ptr getelementptr ([1000 x float], ptr @_QMmodEa, i64 -1, i64 999), i64 %indvars.iv, !dbg !24
%23 = load float, ptr %gep, align 4, !dbg !24, !tbaa !26
%24 = fcmp fast ogt float %23, %22, !dbg !24 ; fast-math flag is set now!
br i1 %24, label %25, label %26, !dbg !24
25: ; preds = %.lr.ph
store float %23, ptr %12, align 4, !dbg !31, !tbaa !15
br label %26, !dbg !24
26: ; preds = %25, %.lr.ph
%27 = phi float [ %23, %25 ], [ %22, %.lr.ph ]
%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1, !dbg !32
%exitcond.not = icmp eq i64 %indvars.iv, %21, !dbg !24
br i1 %exitcond.not, label %._crit_edge, label %.lr.ph, !dbg !24 |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Flang can't vectorize the loop in
s314
of TSVC while Clang can vectorize the loop written in C.fcmp
should have fast-math flags to be recognized as max reduction but Flang doesn't support.The text was updated successfully, but these errors were encountered: