Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed clang -Ofast issue. #9409

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

SpaceSleuth
Copy link

@SpaceSleuth SpaceSleuth commented Jul 28, 2024

Fixed issue clang compiler flag -Ofast [Deprecating -Ofast Issue #9407]

CMakeLists.txt Outdated
@@ -346,7 +346,7 @@ endif()
if(WIN32 OR ARM OR PPC64LE OR PPC64 OR PPC)
set(OPT_FLAGS_RELEASE "-O2")
else()
set(OPT_FLAGS_RELEASE "-Ofast")
set(OPT_FLAGS_RELEASE "-O3")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang recommends to replace -Ofast with -O3 -ffast-math:

Users are advised to switch to -O3 -ffast-math if the use of non-standard math behavior is intended, and -O3 otherwise.

I don't think this PR should change the current behavior which enables fast math.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang recommends to replace -Ofast with -O3 -ffast-math:

Users are advised to switch to -O3 -ffast-math if the use of non-standard math behavior is intended, and -O3 otherwise.

I don't think this PR should change the current behavior which enables fast math.

I see, I noticed that there is a conflicting opinion between SChernykh and 0xFFFC0000.

As of now, I have removed the lines as suggested by 0xFFFC0000. Should I add the flag -ffast-math instead?

@0xFFFC0000
Copy link
Collaborator

0xFFFC0000 commented Jul 28, 2024

Imho, if we remove the following lines we are at better shape:

if(WIN32 OR ARM OR PPC64LE OR PPC64 OR PPC)
  set(OPT_FLAGS_RELEASE "-O2")
else()
  set(OPT_FLAGS_RELEASE "-O3")
endif()

CMake will add -O2 flag to release builds and -g to Debug builds by default.

Regarding -O3 vs -O2, I prefer -O2 since there are some unnecessary optimization enabled in -O3.

@SpaceSleuth
Copy link
Author

SpaceSleuth commented Jul 30, 2024

Imho, if we remove the following lines we are at better shape:

if(WIN32 OR ARM OR PPC64LE OR PPC64 OR PPC)
  set(OPT_FLAGS_RELEASE "-O2")
else()
  set(OPT_FLAGS_RELEASE "-O3")
endif()

CMake will add -O2 flag to release builds and -g to Debug builds by default.

Regarding -O3 vs -O2, I prefer -O2 since there are some unnecessary optimization enabled in -O3.

Hi, removed thes lines. I noticed that there is a conflicting opinion between SChernykh and 0xFFFC0000.

As of now, I have removed the lines as suggested by 0xFFFC0000. Should I add the flag -ffast-math instead?

@SChernykh
Copy link
Contributor

My comment was only regarding the "equivalence" of -Ofast and -O3 -ffast-math

@0xFFFC0000 why don't you like -O3? Do we have performance tests showing that -O3 doesn't speed up wallet or node sync, for example? I'd rather keep this PR restricted to a purely technical thing of replacing a deprecated command line option with its equivalent. Testing different optimization levels and changing them deserves more research and a different PR.

@0xFFFC0000
Copy link
Collaborator

0xFFFC0000 commented Jul 30, 2024

@SChernykh Usually -O3 does not provide meaningful performance difference.

To clarify, I am more hesitant about —fast-math than -O3. —fast-math enables -funsafe-math-optimizations which might have some performance improvements. But it is equivalent of silently shooting yourself in the foot.

About -O3, this is quite old [1], but I think the updated numbers would be roughly same too. Picture from figure 4.

IMG_7365

If I remember correctly most open source projects use -O2 by default. And this is Gentoo warning about compiling with -O3 [2]:

Using -O3 may cause some packages to break either during the compilation or misbehave at runtime, although -O3 retains standard conformance, hence any breakage is either undefined behaviour in the application, or (very rarely) a compiler bug.

In some cases -O3 might introduce slowdowns too [3], but this is not generally the case.

Overall, let me emphasize it again, I can live with -O3, but I am extremely hesitant about —ffast-math.

  1. Hoste, Kenneth, and Lieven Eeckhout. "Cole: compiler optimization level exploration." Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization. 2008.
  2. https://wiki.gentoo.org/wiki/GCC_optimization
  3. https://www.phoronix.com/review/clang-gcc-opts/2

@SChernykh
Copy link
Contributor

Using -O3 may cause some packages to break either during the compilation or misbehave at runtime, although -O3 retains standard conformance, hence any breakage is either undefined behaviour in the application, or (very rarely) a compiler bug.

It's a slippery slope to abandon -O3 just because there might be an undefined behavior somewhere. UB is a bug and it should be fixed. It's better to run UBSAN as a part of unit tests.

@SChernykh
Copy link
Contributor

SChernykh commented Jul 30, 2024

As for -ffast-math, I'm not aware of any critical code paths using floats to begin with, they use integers (128-bit where needed) to always guarantee the same result. I took a quick glance over the code, and floats are mostly used for display output and in logs.

Edit: and in wallet2's gamma picker. But even there, the difference in rounding caused by -ffast-math will be negligible

@iamamyth
Copy link

Testing different optimization levels and changing them deserves more research and a different PR.

I agree, especially because users tend to be very sensitive to hash rates. Imagine, for example, that the graph above (labeled "Picture from figure 4") proves representative, and optimized builds boost daemon hash rates (or reduce overall daemon CPU usage) by 1%: Considering a single release build has a lot of downloads, and has a shelf life of 1-2 years, that 1% savings amounts to a huge win, despite the fact that the compilation takes a lot longer than -O2.

@0xFFFC0000
Copy link
Collaborator

I am repeating myself, but IMHO—ffast-math is unsafe.

@SChernykh
Copy link
Contributor

SChernykh commented Jul 30, 2024

--ffast-math is unsafe.

This is too broad statement and it lacks details. It's not unsafe per se, it just changes how floating point is implemented in a number of ways. The real difference between "normal" and "fast" math will be visible in a really edge cases: https://stackoverflow.com/questions/7420665/what-does-gccs-ffast-math-actually-do

Consider also that all Monero code has been running with it enabled this whole time.

TLDR: in 99.9% of cases, fast math will just result in different rounding of intermediate operations. As long as computation doesn't involve subtracting small values from huge values where relative rounding errors can be big, it's perfectly safe and will just give a slightly different result (a couple last bits in 64-bit double value will be different).

@0xFFFC0000
Copy link
Collaborator

@SChernykh
Copy link
Contributor

First, I don't consider it a bug-bug, rather it's a different behavior under ffast-math. This was already mentioned in the link I posted https://stackoverflow.com/questions/7420665/what-does-gccs-ffast-math-actually-do

When -ffast-math is used while linking, GCC will link with CRT startup code that sets FPU flags differently. For example on x86, it sets the SSE mxcsr FTZ and DAZ control bits, to flush subnormals to 0 instead of doing gradual underflow (which takes a microcode assist on many CPUs.) (FTZ = Flush To Zero for subnormal results, DAZ = Denormals Are Zero for subnormal inputs to instructions including compares.)

FTZ and DAZ flags are totally fine for 99.9% of calculations, and I don't know of any place in Monero code where denormals are used. So my post still stands.

@0xFFFC0000
Copy link
Collaborator

Or another example:

https://twitter.com/kwalfridsson/status/1450556903994675205

There are just too many moving parts with -ffast-math.

Edit: and in wallet2's gamma picker. But even there, the difference in rounding caused by -ffast-math will be negligible

We have random test failure there which I suspect caused by -ffast-math.

Having -ffast-math enabled by default is unsafe imho. It is like walking on landmine field and saying look there is no problem. But problem will happen once you walked few steps.

@SChernykh
Copy link
Contributor

I would say using floating point at all is walking on landmine. It's fine in non-critical parts of the code, and critical code must use integers (and unsigned ones at that) - that includes the wallet2's gamma picker. Fast math or no fast math doesn't matter if it's just used for display and logs, and compiler bugs can happen anywhere, not just with fast math.

I would rather increase test coverage and run tests also with UBSAN to feel safer, than simply disabling fast math and be like "yeah it'll be fine".

@0xFFFC0000
Copy link
Collaborator

Another one:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93806

to be fair. -ffast-math might have some nice optimization [1] too, but enabling it by default seems extremely risky. Even though we don’t rely on floating point much, but still there are some valid concerns.

  1. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91645

@0xFFFC0000
Copy link
Collaborator

My proposal is even if we are going to use -ffast-math lets have a CMake flag to enable or disable it. Something like -DFFAST_MATH_DISABLED=OFF or -DFFAST_MATH_ENABLED=ON.

@0xFFFC0000
Copy link
Collaborator

To be fair to -O3, yesterday I saw this article [1] and I have to mention that canonical officially is considering using -O3 to optimize its ubuntu packages:

  1. https://discourse.ubuntu.com/t/exploring-o3-optimization-for-ubuntu/46892

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants