Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Certain tests fail on gcc 12.0.0 pre-release #439

Open
musicinmybrain opened this issue Jan 14, 2022 · 16 comments
Open

Certain tests fail on gcc 12.0.0 pre-release #439

musicinmybrain opened this issue Jan 14, 2022 · 16 comments

Comments

@musicinmybrain
Copy link
Contributor

musicinmybrain commented Jan 14, 2022

Fedora Linux just introduced a pre-release build of gcc 12.0.0 to the development version of Fedora (Rawhide). (Fedora 36 will be released with GCC 12 in mid-2022.) This has resulted in a few new test failures in the sleef package on several architectures.

The release notes for GCC 12 are here. It’s not obvious whether these new failures are a Sleef bug or a GCC bug. Disabling LTO makes no difference.

armv7hl

OK (no failures)

i686

OK (no failures)

x86_64

The following tests FAILED:
	 28 - iutpurecfma_scalar (Failed)
	 29 - iutypurecfma_scalar (Failed)

aarch64

The following tests FAILED:
	 14 - iutpurec_scalar (Failed)
	 15 - iutypurec_scalar (Failed)
	 17 - iutpurecfma_scalar (Failed)
	 18 - iutypurecfma_scalar (Failed)

ppc64le

The following tests FAILED:
	  8 - iutpurec_scalar (Failed)
	  9 - iutypurec_scalar (Failed)
	 11 - iutpurecfma_scalar (Failed)
	 12 - iutypurecfma_scalar (Failed)

s390x

The following tests FAILED:
	  3 - iutyzvector2 (Failed)
	  6 - iutyzvector2nofma (Failed)
	  8 - iutpurec_scalar (Failed)
	  9 - iutypurec_scalar (Failed)
	 11 - iutpurecfma_scalar (Failed)
	 12 - iutypurecfma_scalar (Failed)
@musicinmybrain
Copy link
Contributor Author

Updated with ppc64le results.

freebsd-git pushed a commit to freebsd/freebsd-ports that referenced this issue Apr 12, 2023
This updates sleef to latest upstream commit
85440a5e87dae36ca1b891de14bc83b441ae7c43 which fixes a bunch of things
including AVX2 detection. DFT library is now disabled by default because
it has issues indicated by issue reports in upstream repo. There are
also some issues with unit tests and/or library itself which may
results in bugs however unit tests works fine on 13.2-RELEASE amd64
at least. Import patch from Debian to fix build of unit tests

References:
shibatch/sleef#439
https://salsa.debian.org/science-team/sleef/-/blob/master/debian/patches/disable-duplicate-mpfr-funcs.patch

PR:		266784
Reviewed by:	jmd (maintainer)
@musicinmybrain
Copy link
Contributor Author

I re-tested this with Sleef 3.5.1 and GCC 14.0.1. I see some regressions in Sleef 3.6 (bug report to follow), and I wanted to make sure I still had a correct baseline to compare against. Note that I am not building the DFT library due to #214, and I am not building the quad-precision library since it is still marked experimental in 3.5.1. (If possible, I plan to enable it for 3.6.)

Fedora no longer supports armv7hl, and I’ve elected to drop i686 support from the sleef package under https://fedoraproject.org/wiki/Changes/EncourageI686LeafRemoval, so this is just tested on the four 64-bit primary architectures: x86_64, aarch64, ppc64le, and s390x. I am still building with LTO; disabling LTO didn’t help when I made the original report, so I didn’t try it again.

All of the test failures on each architecture are still exactly the same as in the original report.

I don’t really know how to debug these, but please let me know if there’s anything I can do to help.

@blapie
Copy link
Collaborator

blapie commented Mar 5, 2024

Again thanks for the detailed report! AFAICT your baseline sounds sensible, these bugs have not been fixed yet unfortunately. I'm going to look into these failures right now.

I don’t really know how to debug these, but please let me know if there’s anything I can do to help.

Thanks, I'll let you know if I need help reproducing, but IIRC these were easy to reproduce.

Fedora no longer supports armv7hl, and I’ve elected to drop i686 support from the sleef package under https://fedoraproject.org/wiki/Changes/EncourageI686LeafRemoval,

Thanks for sharing. What about i386? This is currently missing in our CI, but wondered how important that is.

@musicinmybrain
Copy link
Contributor Author

Thanks for sharing. What about i386? This is currently missing in our CI, but wondered how important that is.

We don’t build for true i386. For historical reasons, we sometimes use i386/i686 interchangeably in referring to our 32-bit x86 architecture, which is really i686. As mentioned in https://fedoraproject.org/wiki/Changes/EncourageI686LeafRemoval, there is no longer a 32-bit Fedora kernel or 32-bit release image, and the i686 packages are only used for multilib support on x86_64. (So, to be even more pedantic, they must be running on 64-bit capable hardware, and can assume SSE2 support.)

Anyway, i686 is our only 32-bit primary architecture, and it’s multilib-only as described above. The primary architectures that you can actually install on are x86_64, aarch64, pp64le, and s390x. It seems likely that riscv64 will be added at some point, mostly waiting on the availability of appropriate server-class hardware, but that’s not here yet, and I don’t have access to RISC-V hardware.

@blapie
Copy link
Collaborator

blapie commented Mar 6, 2024

So the issue with the iut(y)purec(fma)_scalar tests seem to be related to the -fno-trapping-math option.

The issue appears at -O1 and with gcc12 -fno-trapping-math, and can be reproduced with the following C-code (appearing in first branch of tanf_purec(fma)_scalar).

int q = (int)rintf(d);
float u = (float)q;

Expected result with gcc12 -O1 or gcc11 -O1 -fno-trapping-math or simply at -O0. Codegen on AArch64

 400684:      1e274000       frintx s0, s0
 400688:      5ea1b800       fcvtzs s0, s0
 40068c:      5e21d800       scvtf  s0, s0

Unexpected result with gcc12 -O1 -fno-trapping-math. The conversion have been optimised out and the sign of the output may change if x=-0.0f.

400684:      1e274000       frintx s0, s0

If input is -0.0f, then output is 0.0f in first case, -0.0f in second.
Does this sound like a compiler bug or rather does it look like the most recent compiler is doing the right thing?
I'm leaning towards compiler bug since the result is different, but I could be missing a point.

It is unclear to me why this option was introduced, removing the option fixes all failures with gcc12 unfortunately it might have an impact on performance. Would it affect vector code though?

@musicinmybrain
Copy link
Contributor Author

It is unclear to me why this option was introduced, removing the option fixes all failures with gcc12 unfortunately it might have an impact on performance. Would it affect vector code though?

It was introduced in 33ffaff via #169. Unfortunately, the rationale was never documented. Normally -fno-trapping-math and -fno-math-errno are purely performance optimizations that sacrifice strict IEEE 754 and/or C conformance in exchange for a sometimes significant performance benefit. At least -fno-trapping-math should affect SIMD codegen – usually for the better, since 99.9% of the time nobody is trying to use floating point exceptions and trap handlers.

I’m leaning toward the idea that this is a compiler bug, at least assuming C guarantees (float) 0 is 0.0 and not -0.0, because it seems to violate the “as-if rule” for optimization (given we did not also pass -fno-signed-zeros).

@musicinmybrain
Copy link
Contributor Author

@blapie
Copy link
Collaborator

blapie commented Mar 7, 2024

It was introduced in 33ffaff via #169. Unfortunately, the rationale was never documented.

The comment seems to indicate that the aim was to prevent compilers from emitting calls to libm, e.g. sqrt. But only the flag -fno-math-errno is necessary in this case, see https://godbolt.org/z/54zs59hTr .

At least -fno-trapping-math should affect SIMD codegen

You mean NOT affect, right?

I’m leaning toward the idea that this is a compiler bug, at least assuming C guarantees (float) 0 is 0.0 and not -0.0, because it seems to violate the “as-if rule” for optimization (given we did not also pass -fno-signed-zeros).

After discussing with gcc people we have reached a similar conclusion, it looks like a violation of the default -fsigned-zeros.

So we have 2 options here, we could wait for the bug to be fixed (might take some time but safer) or simply remove the option (and deal with minor consequences on scalar performance).

@blapie
Copy link
Collaborator

blapie commented Mar 15, 2024

Hi! Good news, this was filed as a definite gcc bug and has been taken care of by @joeramsay and people in our GNU team.
It is now merged in gcc HEAD and will be backported soon.
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=7dd3b2b09cbeb6712ec680a0445cb0ad41070423

These signed-zero cases often get forgotten, for obvious reasons.

Will close this and #483 once fix is backported and tests pass. Unless there are some leftover tests failing.

@musicinmybrain
Copy link
Contributor Author

Thank you! I will keep an eye on this. The GCC package in Fedora tracks upstream pretty closely, so I should be able to test the fix quickly once it’s released. If I have time, I might build my own GCC package with that commit backported just to validate the fix.

@musicinmybrain
Copy link
Contributor Author

Ok, it took more than a day to build the patched GCC in COPR, but it looks like—once it’s released and packaged—I can expect that the GCC change will indeed fix all of the test failures reported in this issue.

Next I’ll double-check Sleef 3.6 and see if all of the test failures from #518 are still present with the patched GCC.

@blapie
Copy link
Collaborator

blapie commented Mar 18, 2024

Hi! Thanks, we appreciate the effort. Let me know how it goes, if new failures are still present in 3.6 we will investigate each of them to better characterise the failures and open new issues.

@musicinmybrain
Copy link
Contributor Author

musicinmybrain commented Mar 18, 2024

Using -ftrapping-math does seem to be a usable workaround in 3.5.1 (not sure about 3.6 yet), so I’ll add it to the CXXFLAGS in Fedora and build bugfix updates. I’ll drop -ftrapping-math again once the GCC fix is released.

@blapie
Copy link
Collaborator

blapie commented Mar 18, 2024

Indeed it is a workaround, but we were worried about the implications on performance (although scalar performance should not matter).
We might drop the -fno-trapping-math in the future as we believe it was introduced for the wrong reasons (only -fno-math-errno was necessary to make sure no call to libm was generate for fsqrt).

@musicinmybrain
Copy link
Contributor Author

Indeed it is a workaround, but we were worried about the implications on performance (although scalar performance should not matter).

For now, I’d rather favor strict correctness over maximum scalar performance. In any case, once the GCC fix is released, I’ll drop -ftrapping-math and go back to respecting whatever Sleef chooses upstream.

@musicinmybrain
Copy link
Contributor Author

Plot twist: either -ftrapping-math wasn’t a usable workaround after all, or perhaps it wasn’t correctly placed to override -fno-trapping-math. (I didn’t go back and double-check.) Instead, the test builds with -ftrapping-math worked because the maintainer of the gcc package in Fedora backported the fix yesterday, and I didn’t notice. That was fast!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants