[flang] Eliminate workaround for optimizing maxval Fortran intrinsic #65814

unterumarmung · 2023-09-08T21:44:47Z

Following #65213, the lowering of the `arith.maxf` operation switched from the `maxnum` LLVM intrinsic to `maximum`. This makes the statement in the comment deleted here obsolete and the associated workaround unnecessary. Consequently, this commit removes the workaround and updates the related test cases.
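
For context, the two intrinsics differ mainly in NaN and signed-zero handling. Per the LLVM Language Reference, `llvm.maxnum` returns the non-NaN operand when exactly one input is NaN (like libm's `fmax`), while `llvm.maximum` propagates NaN and treats -0.0 as less than +0.0. The C++ sketch below only illustrates that difference; `maxnum_like` and `maximum_like` are hypothetical names, not Flang or MLIR code.

```cpp
#include <cmath>
#include <limits>

// Illustrates maxnum-style NaN handling: if exactly one operand is NaN,
// the other operand is returned. std::fmax behaves this way.
inline double maxnum_like(double a, double b) {
  return std::fmax(a, b);
}

// Illustrates maximum-style handling: any NaN operand yields NaN,
// and +0.0 is preferred over -0.0 when the operands compare equal.
inline double maximum_like(double a, double b) {
  if (std::isnan(a) || std::isnan(b))
    return std::numeric_limits<double>::quiet_NaN();
  if (a == b)
    return std::signbit(a) ? b : a;  // picks +0.0 over -0.0
  return a > b ? a : b;
}
```

With these definitions, `maxnum_like(NaN, 1.0)` yields `1.0`, whereas `maximum_like(NaN, 1.0)` yields NaN. Under `-ffast-math` the compiler may assume NaNs do not occur, so the distinction matters mostly without fast-math flags.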

unterumarmung · 2023-09-08T21:47:43Z

@vzakhari It seems like you wrote this workaround. Could you please review the PR and tell me whether this change is OK?

unterumarmung · 2023-09-08T21:48:40Z

Also, if this PR is going to be merged, I would do it after #65800 has landed.

vzakhari · 2023-09-08T22:01:02Z

Thank you! Would you be able to check what happens with F128 maxval case right now? Does LLVM call fmaxl (with and w/o -ffast-math)?

unterumarmung · 2023-09-09T08:41:00Z

> Thank you! Would you be able to check what happens with F128 maxval case right now? Does LLVM call fmaxl (with and w/o -ffast-math)?

I'll try. Could you share a Fortran reproducer, please?

unterumarmung · 2023-09-09T09:43:15Z

So I used this program to reproduce:

program quad_precision_maxval
  use, intrinsic :: iso_fortran_env, only: real128

  real(real128) :: quad_array(5)
  quad_array = [1.0_16, 2.0_16, 3.0_16, 4.0_16, 5.0_16]

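  ! max_value is undeclared, so implicit typing makes it a default INTEGER;
  ! the quad-precision result is converted on assignment.
  max_value = maxval(quad_array)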
  print *, "Maximum value in the array:", max_value
end program quad_precision_maxval

No fast math:

$ ./build/bin/flang-new f.f90

$ ./a.out
Maximum value in the array: 5

$ objdump -d a.out | grep fmaxl

Works fine, and there are no mentions of fmaxl.

Fast math enabled:

$ ./build/bin/flang-new -ffast-math f.f90

$ ./a.out
Maximum value in the array: 5

$ objdump -d a.out | grep fmaxl

Works fine, and there are no mentions of fmaxl.

So, is this enough or have I missed something?

vzakhari · 2023-09-09T17:02:53Z

Yes, it might be a bit complex to properly check this.

Please try this:

subroutine test(x, t)
  real(16) :: x(100), t
  t = maxval(x)
end subroutine test

flang-new -O2 -ffast-math -c maxval.f90:

LLVM ERROR: Cannot select: t27: f128 = fmaximum nnan ninf nsz arcp contract afn reassoc t26, t25
  t26: f128,ch = load<(load (s128) from %ir.scevgep, !tbaa !3)> t0, t5, undef:i64
    t5: i64 = add t2, t4
      t2: i64,ch = CopyFromReg t0, Register:i64 %4
        t1: i64 = Register %4
      t4: i64,ch = CopyFromReg t0, Register:i64 %0
        t3: i64 = Register %0
    t9: i64 = undef
  t25: f128 = fmaximum nnan ninf nsz arcp contract afn reassoc t24, t21
    t24: f128,ch = load<(load (s128) from %ir.scevgep2, !tbaa !3)> t0, t23, undef:i64
      t23: i64 = add t5, Constant:i64<-16>
        t5: i64 = add t2, t4
          t2: i64,ch = CopyFromReg t0, Register:i64 %4
            t1: i64 = Register %4
          t4: i64,ch = CopyFromReg t0, Register:i64 %0
            t3: i64 = Register %0
        t22: i64 = Constant<-16>
      t9: i64 = undef
    t21: f128 = fmaximum nnan ninf nsz arcp contract afn reassoc t20, t17
      t20: f128,ch = load<(load (s128) from %ir.scevgep4, !tbaa !3)> t0, t19, undef:i64
        t19: i64 = add t5, Constant:i64<-32>
          t5: i64 = add t2, t4
            t2: i64,ch = CopyFromReg t0, Register:i64 %4
              t1: i64 = Register %4
            t4: i64,ch = CopyFromReg t0, Register:i64 %0
              t3: i64 = Register %0
          t18: i64 = Constant<-32>
        t9: i64 = undef
      t17: f128 = fmaximum nnan ninf nsz arcp contract afn reassoc t16, t13
        t16: f128,ch = load<(load (s128) from %ir.scevgep6, !tbaa !3)> t0, t15, undef:i64
          t15: i64 = add t5, Constant:i64<-48>
            t5: i64 = add t2, t4
              t2: i64,ch = CopyFromReg t0, Register:i64 %4
                t1: i64 = Register %4
              t4: i64,ch = CopyFromReg t0, Register:i64 %0
                t3: i64 = Register %0
            t14: i64 = Constant<-48>
          t9: i64 = undef
        t13: f128 = fmaximum nnan ninf nsz arcp contract afn reassoc t10, t12
          t10: f128,ch = load<(load (s128) from %ir.scevgep8, !tbaa !3)> t0, t7, undef:i64
            t7: i64 = add t5, Constant:i64<-64>
              t5: i64 = add t2, t4
                t2: i64,ch = CopyFromReg t0, Register:i64 %4
                  t1: i64 = Register %4
                t4: i64,ch = CopyFromReg t0, Register:i64 %0
                  t3: i64 = Register %0
              t6: i64 = Constant<-64>
            t9: i64 = undef
          t12: f128,ch = CopyFromReg t0, Register:f128 %1
            t11: f128 = Register %1
In function: test_

unterumarmung · 2023-09-09T20:23:50Z

Oh, so I forgot about the optimization flags in Flang.
Should we file an issue for the LLVM codegen and close this for now? I guess it should not fail with an ICE.

vzakhari · 2023-09-09T21:32:01Z

Yes, I am reluctant to merge this change if it is going to cause test failures. Filing an issue for LLVM would be great.

unterumarmung · 2023-09-10T10:33:35Z

@vzakhari please take a look at #65886

unterumarmung · 2023-09-12T22:54:27Z

@vzakhari
Maybe we should implement this change anyway, but keep the workaround for 128-bit numbers? I guess they are not as common as the other types. And I think it is better to lower to the LLVM intrinsics as much as possible?

vzakhari · 2023-09-12T22:57:58Z

I am not sure that mlir::arith::MaxFOp has a lot of benefits over the select here. Most code generators will recognize the max idiom and optimize it accordingly based on the fast math flags provided. Can you please share the use case that you care about?
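
To illustrate the "max idiom" mentioned above, a max reduction can be written as a compare followed by a select. This is a minimal C++ sketch that assumes a non-empty array; `max_reduce_select` is a hypothetical helper and is not the code Flang actually generates.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: max reduction expressed as compare + select,
// the pattern that back ends typically recognize as a max operation.
inline double max_reduce_select(const double *x, std::size_t n) {
  assert(n > 0 && "sketch assumes a non-empty array");
  double m = x[0];
  for (std::size_t i = 1; i < n; ++i)
    m = (x[i] > m) ? x[i] : m;  // compare, then select the larger value
  return m;
}
```

How a back end lowers this pattern (for example, to a hardware max instruction) depends on the target and on the fast-math flags in effect.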

unterumarmung · 2023-09-12T23:02:38Z

There is no actual use case. I just thought that narrowing the workaround down to the cases where it is actually needed would be better.

unterumarmung requested a review from a team as a code owner September 8, 2023 21:44

github-actions bot added the flang label (Flang issues not falling into any other category) Sep 8, 2023

unterumarmung requested a review from vzakhari September 8, 2023 21:45

unterumarmung mentioned this pull request Sep 10, 2023: Cannot select fmaximum for f128 #65886 (Open)

unterumarmung closed this Sep 10, 2023
