Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable no-math-errno and reciprocal optimizations. #4693

Merged
merged 2 commits into from
Feb 7, 2023

Conversation

nilsdeppe
Copy link
Member

@nilsdeppe nilsdeppe commented Feb 3, 2023

Proposed changes

  • no-math-errno allows the compiler to switch things like sqrt to vector intrinsics.
  • Enable reciprocal math. We generally don't care about the difference between a / b and a * one_over_b.

Upgrade instructions

Code review checklist

  • The code is documented and the documentation renders correctly. Run
    make doc to generate the documentation locally into BUILD_DIR/docs/html.
    Then open index.html.
  • The code follows the stylistic and code quality guidelines listed in the
    code review guide.
  • The PR lists upgrade instructions and is labeled bugfix or
    new feature if appropriate.

Further comments

This enables more auto-vectorization and makes it easier for the compiler to
optimize code.
Comment on lines 90 to 95
$<$<COMPILE_LANGUAGE:CXX>:FP_FAST_FMA>
$<$<COMPILE_LANGUAGE:C>:FP_FAST_FMA>
$<$<COMPILE_LANGUAGE:CXX>:FP_FAST_FMAF>
$<$<COMPILE_LANGUAGE:C>:FP_FAST_FMAF>
$<$<COMPILE_LANGUAGE:CXX>:FP_FAST_FMAL>
$<$<COMPILE_LANGUAGE:C>:FP_FAST_FMAL>)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these macros are output from the standard library reporting support for features, not input to the library requesting features. They should be set by -mfma (probably included in -march if appropriate). Testing with gcc on my machine doesn't show any effect from defining these, but only gives an fma instruction with -mfma (which also defines the macros).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my tests GCC defined them but clang unfortunately didn't, even if I have -O3 -mfma -march=native on a machine that supports fma. I found this rather surprising and annoying...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct that clang is missing the definitions, but defining them on the command line still doesn't enable fma instructions or anything.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see what you're saying now. I misinterpreted the cppreference wording as "You can define these to enable FMA" instead of "If these are defined, you are guaranteed an FMA". I'll drop the commit. Thanks!

@nilsdeppe
Copy link
Member Author

Okay, dropped the last commit, nothing else changed! Thanks for the review!

Copy link
Member

@wthrowe wthrowe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should have gone through the others in more detail when I first looked at this. It looks like another one might also be problematic.

INTERFACE_COMPILE_OPTIONS
$<$<COMPILE_LANGUAGE:C>:-ffp-contract=on>
$<$<COMPILE_LANGUAGE:CXX>:-ffp-contract=on>
$<$<COMPILE_LANGUAGE:Fortran>:-ffp-contract=on>)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked into this one in a bit more detail, and it looks like this option means different things in clang and gcc. For clang "on" is the default, and for GCC it looks like "on" actually turns off converting things to FMA instructions. So we should probably leave this one out.

GCC 11.3.1:

   -ffp-contract=style
       -ffp-contract=off disables floating-point expression contraction.
       -ffp-contract=fast enables floating-point expression contraction
       such as forming of fused multiply-add operations if the target has
       native support for them.  -ffp-contract=on enables floating-point
       expression contraction if allowed by the language standard.  This
       is currently not implemented and treated equal to
       -ffp-contract=off.

       The default is -ffp-contract=fast.

Clang (whatever version is documented on their website):

-ffp-contract=<arg>

Form fused FP ops (e.g. FMAs): fast (fuses across statements disregarding pragmas) | on (only fuses in the same statement unless dictated by pragmas) | off (never fuses) | fast-honor-pragmas (fuses across statements unless dictated by pragmas). Default is ‘fast’ for CUDA, ‘fast-honor-pragmas’ for HIP, and ‘on’ otherwise. must be ‘fast’, ‘on’, ‘off’ or ‘fast-honor-pragmas’.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sigh... If only things were more consistent. I agree, let's omit. If we need to change to fast-honor-pragmas for clang later we can!

@nilsdeppe
Copy link
Member Author

Okay, pushed an update again!

Copy link
Member

@wthrowe wthrowe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rest looks good. Got to love it when "on" is a synonym for "off".

@wthrowe wthrowe merged commit a1dd492 into sxs-collaboration:develop Feb 7, 2023
@nilsvu
Copy link
Member

nilsvu commented Feb 7, 2023

Could you update the PR description and title because I think you dropped some changes?

@nilsdeppe nilsdeppe changed the title Enable FMA and reciprocal optimizations. Enable no-math-errno and reciprocal optimizations. Feb 7, 2023
@nilsdeppe nilsdeppe deleted the faster_math_flags branch September 14, 2023 14:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants