Fixed clang -Ofast issue. #9409

SpaceSleuth · 2024-07-28T02:03:41Z

Fixes the use of the deprecated clang compiler flag -Ofast (see issue #9407, "Deprecating -Ofast").

SChernykh · 2024-07-28T17:51:37Z

CMakeLists.txt

@@ -346,7 +346,7 @@ endif()
 if(WIN32 OR ARM OR PPC64LE OR PPC64 OR PPC)
  set(OPT_FLAGS_RELEASE "-O2")
 else()
-  set(OPT_FLAGS_RELEASE "-Ofast")
+  set(OPT_FLAGS_RELEASE "-O3")


clang recommends replacing -Ofast with -O3 -ffast-math:

Users are advised to switch to -O3 -ffast-math if the use of non-standard math behavior is intended, and -O3 otherwise.

I don't think this PR should change the current behavior which enables fast math.
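For illustration, the behavior-preserving replacement could look roughly like the following in the hunk under review. This is a sketch, not the committed change: it assumes the platform split stays as it is and that OPT_FLAGS_RELEASE is consumed later in CMakeLists.txt exactly as before. On GCC, -Ofast also turns on a few extra options beyond -O3 -ffast-math, so the match is only approximate there.

```cmake
# Sketch only: keep the existing platform split and keep fast math enabled.
if(WIN32 OR ARM OR PPC64LE OR PPC64 OR PPC)
  set(OPT_FLAGS_RELEASE "-O2")
else()
  # Spell out what -Ofast stood for instead of using the deprecated flag.
  set(OPT_FLAGS_RELEASE "-O3 -ffast-math")
endif()
```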

clang recommends replacing -Ofast with -O3 -ffast-math:

Users are advised to switch to -O3 -ffast-math if the use of non-standard math behavior is intended, and -O3 otherwise.

I don't think this PR should change the current behavior which enables fast math.

I see. I noticed that SChernykh and 0xFFFC0000 have conflicting opinions.

As of now, I have removed the lines as suggested by 0xFFFC0000. Should I add the flag -ffast-math instead?

0xFFFC0000 · 2024-07-28T19:01:30Z

IMHO, we would be in better shape if we removed the following lines:

if(WIN32 OR ARM OR PPC64LE OR PPC64 OR PPC)
  set(OPT_FLAGS_RELEASE "-O2")
else()
  set(OPT_FLAGS_RELEASE "-O3")
endif()

By default, CMake will add the -O2 flag to Release builds and -g to Debug builds.

Regarding -O3 vs. -O2, I prefer -O2, since -O3 enables some unnecessary optimizations.
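Which flags CMake supplies on its own depends on the compiler and the build type, so it is worth checking rather than assuming. With GCC or Clang, the stock values are typically -O3 -DNDEBUG for Release and -O2 -g -DNDEBUG for RelWithDebInfo. A minimal, standalone CMakeLists.txt that prints them might look like this; the project name is made up for the example.

```cmake
# Standalone sketch: print the per-configuration flags CMake initializes
# for the detected C++ compiler. "show_default_flags" is a hypothetical name.
cmake_minimum_required(VERSION 3.10)
project(show_default_flags LANGUAGES CXX)

foreach(cfg DEBUG RELEASE RELWITHDEBINFO MINSIZEREL)
  # Nested variable reference expands e.g. CMAKE_CXX_FLAGS_RELEASE.
  message(STATUS "CMAKE_CXX_FLAGS_${cfg} = ${CMAKE_CXX_FLAGS_${cfg}}")
endforeach()
```

Running `cmake -S . -B build` in a directory containing only this file prints one line per configuration.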

SpaceSleuth · 2024-07-30T06:35:59Z

IMHO, we would be in better shape if we removed the following lines:
if(WIN32 OR ARM OR PPC64LE OR PPC64 OR PPC)
  set(OPT_FLAGS_RELEASE "-O2")
else()
  set(OPT_FLAGS_RELEASE "-O3")
endif()
By default, CMake will add the -O2 flag to Release builds and -g to Debug builds.

Regarding -O3 vs. -O2, I prefer -O2, since -O3 enables some unnecessary optimizations.

Hi, I removed these lines. I noticed that SChernykh and 0xFFFC0000 have conflicting opinions.

As of now, I have removed the lines as suggested by 0xFFFC0000. Should I add the flag -ffast-math instead?

SChernykh · 2024-07-30T07:33:52Z

My comment was only about the "equivalence" of -Ofast and -O3 -ffast-math.

@0xFFFC0000 why don't you like -O3? Do we have performance tests showing that -O3 doesn't speed up wallet or node sync, for example? I'd rather keep this PR restricted to the purely technical change of replacing a deprecated command-line option with its equivalent. Testing different optimization levels and changing them deserves more research and a different PR.
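If someone does pick up that follow-up research, one low-friction way to compare levels is to make the optimization level a cache variable, so the same tree can be configured twice and benchmarked. The sketch below is an assumption about how that could be wired in; RELEASE_OPT_LEVEL is a hypothetical name and is not part of the existing build.

```cmake
# Hypothetical cache variable for benchmarking; not part of the current build.
set(RELEASE_OPT_LEVEL "-O3" CACHE STRING
    "Optimization level used for release builds")
# Offer the two levels under discussion in cmake-gui / ccmake.
set_property(CACHE RELEASE_OPT_LEVEL PROPERTY STRINGS "-O2" "-O3")

# Any other release flags would be appended here unchanged, so that only
# the optimization level differs between the builds being compared.
set(OPT_FLAGS_RELEASE "${RELEASE_OPT_LEVEL}")
```

Configuring one build directory with `-DRELEASE_OPT_LEVEL=-O2` and another with `-DRELEASE_OPT_LEVEL=-O3` would then give two binaries to time against each other.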

0xFFFC0000 · 2024-07-30T08:56:57Z

@SChernykh Usually -O3 does not provide a meaningful performance difference.

To clarify, I am more hesitant about -ffast-math than about -O3. -ffast-math enables -funsafe-math-optimizations, which might bring some performance improvements, but it is the equivalent of silently shooting yourself in the foot.

About -O3: this study is quite old [1], but I think updated numbers would be roughly the same. Picture from Figure 4.

If I remember correctly, most open source projects use -O2 by default. And this is Gentoo's warning about compiling with -O3 [2]:

Using -O3 may cause some packages to break either during the compilation or misbehave at runtime, although -O3 retains standard conformance, hence any breakage is either undefined behaviour in the application, or (very rarely) a compiler bug.

In some cases -O3 might introduce slowdowns too [3], but this is not generally the case.

Overall, let me emphasize it again: I can live with -O3, but I am extremely hesitant about -ffast-math.

[1] Hoste, Kenneth, and Lieven Eeckhout. "Cole: compiler optimization level exploration." Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization. 2008.
[2] https://wiki.gentoo.org/wiki/GCC_optimization
[3] https://www.phoronix.com/review/clang-gcc-opts/2

SChernykh · 2024-07-30T09:01:54Z

Using -O3 may cause some packages to break either during the compilation or misbehave at runtime, although -O3 retains standard conformance, hence any breakage is either undefined behaviour in the application, or (very rarely) a compiler bug.

It's a slippery slope to abandon -O3 just because there might be undefined behavior somewhere. UB is a bug and it should be fixed. It's better to run UBSAN as part of the unit tests.
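As a rough sketch of what running UBSAN over the tests could involve on the build side, an opt-in switch like the one below would instrument everything compiled after it. The option name SANITIZE_UNDEFINED is hypothetical, and add_link_options needs CMake 3.13 or newer; the project may already have its own sanitizer plumbing that would be the better place for this.

```cmake
# Hypothetical opt-in switch; the option name is an assumption.
option(SANITIZE_UNDEFINED "Build with UndefinedBehaviorSanitizer" OFF)

if(SANITIZE_UNDEFINED AND CMAKE_CXX_COMPILER_ID MATCHES "GNU|Clang")
  # Instrument all targets defined after this point and keep frame
  # pointers so that reports carry usable stack traces.
  add_compile_options(-fsanitize=undefined -fno-omit-frame-pointer)
  # The sanitizer runtime must also be requested at link time.
  add_link_options(-fsanitize=undefined)
endif()
```

With such a build, running the test suite with `UBSAN_OPTIONS=print_stacktrace=1:halt_on_error=1` in the environment makes any detected undefined behavior fail the run instead of only printing a warning.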

SChernykh · 2024-07-30T09:06:21Z

As for -ffast-math, I'm not aware of any critical code paths using floats to begin with; they use integers (128-bit where needed) to always guarantee the same result. I took a quick glance over the code, and floats are mostly used for display output and in logs.

Edit: and in wallet2's gamma picker. But even there, the difference in rounding caused by -ffast-math will be negligible.

iamamyth · 2024-07-30T21:37:45Z

Testing different optimization levels and changing them deserves more research and a different PR.

I agree, especially because users tend to be very sensitive to hash rates. Imagine, for example, that the graph above (labeled "Picture from figure 4") proves representative, and optimized builds boost daemon hash rates (or reduce overall daemon CPU usage) by 1%. Considering that a single release build has a lot of downloads and a shelf life of 1-2 years, that 1% saving amounts to a huge win, even though compilation takes a lot longer than with -O2.

0xFFFC0000 · 2024-07-30T21:44:55Z

I am repeating myself, but IMHO -ffast-math is unsafe.

SChernykh · 2024-07-30T22:27:40Z

-ffast-math is unsafe.

That statement is too broad and lacks detail. It's not unsafe per se; it just changes how floating point behaves in a number of ways. The real difference between "normal" and "fast" math is visible only in real edge cases: https://stackoverflow.com/questions/7420665/what-does-gccs-ffast-math-actually-do

Consider also that all Monero code has been running with it enabled this whole time.

TL;DR: in 99.9% of cases, fast math will just result in different rounding of intermediate operations. As long as the computation doesn't involve subtracting small values from huge values, where relative rounding errors can be big, it's perfectly safe and will just give a slightly different result (the last couple of bits of a 64-bit double value will differ).

0xFFFC0000 · 2024-07-31T06:31:55Z

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522

SChernykh · 2024-07-31T06:48:47Z

First, I don't consider it a real bug; rather, it's different behavior under -ffast-math. This was already mentioned in the link I posted: https://stackoverflow.com/questions/7420665/what-does-gccs-ffast-math-actually-do

When -ffast-math is used while linking, GCC will link with CRT startup code that sets FPU flags differently. For example on x86, it sets the SSE mxcsr FTZ and DAZ control bits, to flush subnormals to 0 instead of doing gradual underflow (which takes a microcode assist on many CPUs.) (FTZ = Flush To Zero for subnormal results, DAZ = Denormals Are Zero for subnormal inputs to instructions including compares.)

FTZ and DAZ flags are totally fine for 99.9% of calculations, and I don't know of any place in Monero code where denormals are used. So my post still stands.

0xFFFC0000 · 2024-07-31T06:53:47Z

Or another example:

https://twitter.com/kwalfridsson/status/1450556903994675205

There are just too many moving parts with -ffast-math.

Edit: and in wallet2's gamma picker. But even there, the difference in rounding caused by -ffast-math will be negligible.

We have a random test failure there, which I suspect is caused by -ffast-math.

Having -ffast-math enabled by default is unsafe, IMHO. It is like walking through a minefield and saying "look, there is no problem." But the problem will appear once you have walked a few steps.

SChernykh · 2024-07-31T07:01:42Z

I would say using floating point at all is walking through a minefield. It's fine in non-critical parts of the code, and critical code must use integers (and unsigned ones at that), and that includes wallet2's gamma picker. Fast math or no fast math doesn't matter if it's just used for display and logs, and compiler bugs can happen anywhere, not just with fast math.

I would rather increase test coverage and also run the tests with UBSAN to feel safer than simply disable fast math and say "yeah, it'll be fine".

0xFFFC0000 · 2024-07-31T07:12:53Z

Another one:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93806

To be fair, -ffast-math might enable some nice optimizations [1] too, but enabling it by default seems extremely risky. Even though we don't rely on floating point much, there are still some valid concerns.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91645

0xFFFC0000 · 2024-07-31T07:17:31Z

My proposal: even if we are going to use -ffast-math, let's have a CMake flag to enable or disable it. Something like -DFFAST_MATH_DISABLED=OFF or -DFFAST_MATH_ENABLED=ON.
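A minimal sketch of such a switch, using the FFAST_MATH_ENABLED name floated above, might look like this. Defaulting it to ON is an assumption made here so that existing builds keep their current fast-math behavior; the surrounding platform split is copied from the hunk under review.

```cmake
# Sketch of the proposed switch; the default value is an assumption.
option(FFAST_MATH_ENABLED "Append -ffast-math to release optimization flags" ON)

if(WIN32 OR ARM OR PPC64LE OR PPC64 OR PPC)
  set(OPT_FLAGS_RELEASE "-O2")
else()
  set(OPT_FLAGS_RELEASE "-O3")
  if(FFAST_MATH_ENABLED)
    # Restores the fast-math half of what -Ofast used to provide.
    set(OPT_FLAGS_RELEASE "${OPT_FLAGS_RELEASE} -ffast-math")
  endif()
endif()
```

Anyone who wants strict floating-point behavior could then configure with `cmake -DFFAST_MATH_ENABLED=OFF ..` without touching CMakeLists.txt.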

0xFFFC0000 · 2024-08-03T17:49:18Z

To be fair to -O3, yesterday I saw this article [1], and I have to mention that Canonical is officially considering using -O3 to optimize its Ubuntu packages:

[1] https://discourse.ubuntu.com/t/exploring-o3-optimization-for-ubuntu/46892

Fixed clang -Ofast issue.

9e14e52

SpaceSleuth mentioned this pull request Jul 28, 2024

Deprecating -Ofast #9407

Open

SChernykh reviewed Jul 28, 2024

selsta added the build system label Jul 28, 2024

SpaceSleuth added 2 commits July 28, 2024 19:47

-O3 -ffast-math instead of -Ofast.

2e1e903

Removed explicit -O2 and -O3 build flags.

c2d9d58
