Enable no-math-errno and reciprocal optimizations. #4693
This enables more auto-vectorization and makes it easier for the compiler to optimize code.
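As background on the `no-math-errno` part: by default, math functions such as `sqrt` may have to report domain errors through `errno`, and that side effect is one reason a compiler cannot always replace the call with a plain (vector) instruction. Below is a minimal C++ sketch of that behavior. The helper name `sqrt_sets_errno` is hypothetical, and whether `errno` is actually set depends on the C library and the compile flags in use.

```cpp
#include <cassert>
#include <cerrno>
#include <cmath>

// Hypothetical helper: returns true if std::sqrt reported a domain error
// through errno. The argument is expected to be negative.
bool sqrt_sets_errno(double x) {
    assert(x < 0.0);
    volatile double input = x;   // volatile keeps the call from being folded away
    errno = 0;                   // clear any stale error first
    volatile double result = std::sqrt(input);
    (void)result;                // the value (NaN) is not needed here
    return errno == EDOM;
}
```

When the code is built with `-fno-math-errno`, the compiler is allowed to skip the `errno` update, so this helper may return `false` even on a library that normally sets it.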
cmake/SetupCxxFlags.cmake
Outdated
$<$<COMPILE_LANGUAGE:CXX>:FP_FAST_FMA>
$<$<COMPILE_LANGUAGE:C>:FP_FAST_FMA>
$<$<COMPILE_LANGUAGE:CXX>:FP_FAST_FMAF>
$<$<COMPILE_LANGUAGE:C>:FP_FAST_FMAF>
$<$<COMPILE_LANGUAGE:CXX>:FP_FAST_FMAL>
$<$<COMPILE_LANGUAGE:C>:FP_FAST_FMAL>)
I think these macros are output from the standard library reporting support for features, not input to the library requesting features. They should be set by `-mfma` (probably included in `-march` if appropriate). Testing with gcc on my machine doesn't show any effect from defining these, but only gives an fma instruction with `-mfma` (which also defines the macros).
In my tests GCC defined them but clang unfortunately didn't, even with `-O3 -mfma -march=native` on a machine that supports fma. I found this rather surprising and annoying...
You are correct that clang is missing the definitions, but defining them on the command line still doesn't enable fma instructions or anything.
Ah, I see what you're saying now. I misinterpreted the cppreference wording as "You can define these to enable FMA" instead of "If these are defined, you are guaranteed an FMA". I'll drop the commit. Thanks!
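To make the "output, not input" reading concrete, here is a small C++ sketch that only tests whether the implementation defines `FP_FAST_FMA` and picks a code path accordingly. The names `has_fast_fma` and `dot` are hypothetical and not part of this PR; the sketch assumes nothing beyond `<cmath>`.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// True when the implementation advertises a fast double-precision fma.
// FP_FAST_FMA is defined (or not) by the implementation; defining it by
// hand on the command line does not change code generation.
constexpr bool has_fast_fma() {
#ifdef FP_FAST_FMA
    return true;
#else
    return false;
#endif
}

// Hypothetical dot product that uses std::fma only when it is advertised
// as fast, and falls back to a separate multiply and add otherwise.
double dot(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size());
    double acc = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        acc = has_fast_fma() ? std::fma(x[i], y[i], acc) : x[i] * y[i] + acc;
    }
    return acc;
}
```

The macro is read here, never written, which matches the conclusion of this thread: hardware FMA is enabled by target flags such as `-mfma`, and the macro merely reports it.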
ad0ca78 to 780cb26
Okay, dropped the last commit, nothing else changed! Thanks for the review!
I should have gone through the others in more detail when I first looked at this. It looks like another one might also be problematic.
cmake/SetupCxxFlags.cmake
Outdated
INTERFACE_COMPILE_OPTIONS
$<$<COMPILE_LANGUAGE:C>:-ffp-contract=on>
$<$<COMPILE_LANGUAGE:CXX>:-ffp-contract=on>
$<$<COMPILE_LANGUAGE:Fortran>:-ffp-contract=on>)
I looked into this one in a bit more detail, and it looks like this option means different things in clang and gcc. For clang "on" is the default, and for GCC it looks like "on" actually turns off converting things to FMA instructions. So we should probably leave this one out.
GCC 11.3.1:
-ffp-contract=style
-ffp-contract=off disables floating-point expression contraction. -ffp-contract=fast enables floating-point expression contraction such as forming of fused multiply-add operations if the target has native support for them. -ffp-contract=on enables floating-point expression contraction if allowed by the language standard. This is currently not implemented and treated equal to -ffp-contract=off. The default is -ffp-contract=fast.
Clang (whatever version is documented on their website):
-ffp-contract=<arg>
Form fused FP ops (e.g. FMAs): fast (fuses across statements disregarding pragmas) | on (only fuses in the same statement unless dictated by pragmas) | off (never fuses) | fast-honor-pragmas (fuses across statements unless dictated by pragmas). Default is ‘fast’ for CUDA, ‘fast-honor-pragmas’ for HIP, and ‘on’ otherwise. <arg> must be ‘fast’, ‘on’, ‘off’ or ‘fast-honor-pragmas’.
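For readers wondering why the contraction setting matters beyond speed: a fused multiply-add rounds once, while a separate multiply and add rounds twice, so contracting `a * b + c` can change the result in the last bits. The C++ sketch below uses an explicit `std::fma` to expose the rounding error that the unfused product discards. The function name `product_rounding_error` is hypothetical, and the result is only exact when no overflow or underflow occurs.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: returns the rounding error of the double product a*b,
// i.e. (exact a*b) minus (a*b rounded to double). The fused operation
// computes a*b - rounded with a single rounding, which recovers the error
// term that a separate multiply throws away.
double product_rounding_error(double a, double b) {
    assert(std::isfinite(a) && std::isfinite(b));
    const double rounded = a * b;      // one rounding happens here
    return std::fma(a, b, -rounded);   // exact product minus the rounded one
}
```

A nonzero return value is precisely the amount by which a contracted `a * b + c` can differ from the uncontracted form before the final rounding, which is why the choice between `off`, `on` and `fast` is observable in numerical results.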
Sigh... If only things were more consistent. I agree, let's omit. If we need to change to `fast-honor-pragmas` for clang later we can!
780cb26 to f293785
Okay, pushed an update again!
The rest looks good. Got to love it when "on" is a synonym for "off".
Could you update the PR description and title? I think you dropped some changes.
Proposed changes
`no-math-errno` allows the compiler to switch things like `sqrt` to vector intrinsics.
`a / b` and `a * one_over_b`.
Upgrade instructions
Code review checklist
`make doc` to generate the documentation locally into `BUILD_DIR/docs/html`. Then open `index.html`.
code review guide.
`bugfix` or `new feature` if appropriate.
Further comments
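To illustrate the `a / b` versus `a * one_over_b` rewrite mentioned in the proposed changes, here is a hedged C++ sketch of the two forms written out by hand. The function names are hypothetical. It is an assumption on my part that the PR relies on a flag of the `-freciprocal-math` kind to let the compiler make this substitution itself; the sketch only shows what the substitution looks like.

```cpp
#include <cassert>
#include <vector>

// Hypothetical: one division per element (what the source code says).
std::vector<double> divide_each(const std::vector<double>& xs, double b) {
    assert(b != 0.0);
    std::vector<double> out;
    out.reserve(xs.size());
    for (double x : xs) {
        out.push_back(x / b);
    }
    return out;
}

// Hypothetical: one division up front, then a cheaper multiply per element
// (what a reciprocal optimization is allowed to produce).
std::vector<double> scale_by_reciprocal(const std::vector<double>& xs, double b) {
    assert(b != 0.0);
    const double one_over_b = 1.0 / b;   // rounded once, reused for every element
    std::vector<double> out;
    out.reserve(xs.size());
    for (double x : xs) {
        out.push_back(x * one_over_b);
    }
    return out;
}
```

Because `one_over_b` is itself rounded, the two functions can disagree in the last bit for some inputs, which is why compilers only perform this rewrite when explicitly permitted.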