[3.3/x86_64] 8 tests failed out of 31 #214

Open
cdluminate opened this issue Aug 17, 2018 · 29 comments

@cdluminate

http://debomatic-amd64.debian.net/distribution#experimental/sleef/3.3-1/buildlog

/usr/bin/ctest --force-new-ctest-process -j8
Test project /<<PKGBUILDDIR>>/obj-x86_64-linux-gnu
      Start  1: gnuabi_compatibility_SSE2
      Start  2: gnuabi_compatibility_AVX
      Start  3: gnuabi_compatibility_AVX2
      Start  4: gnuabi_compatibility_AVX512F
      Start  5: gnuabi_compatibility_AVX512F_masked
      Start  6: naivetestdp_1
      Start  7: naivetestdp_2
      Start  8: naivetestdp_3
 1/31 Test  #1: gnuabi_compatibility_SSE2 .............   Passed    0.01 sec
      Start  9: naivetestdp_4
 2/31 Test  #2: gnuabi_compatibility_AVX ..............   Passed    0.02 sec
      Start 10: naivetestdp_5
 3/31 Test  #3: gnuabi_compatibility_AVX2 .............   Passed    0.02 sec
      Start 11: naivetestdp_10
 4/31 Test  #4: gnuabi_compatibility_AVX512F ..........   Passed    0.03 sec
      Start 12: naivetestsp_1
 5/31 Test  #5: gnuabi_compatibility_AVX512F_masked ...   Passed    0.05 sec
      Start 13: naivetestsp_2
 6/31 Test  #6: naivetestdp_1 .........................   Passed    0.05 sec
      Start 14: naivetestsp_3
 7/31 Test  #7: naivetestdp_2 .........................   Passed    0.05 sec
      Start 15: naivetestsp_4
 8/31 Test #12: naivetestsp_1 .........................   Passed    0.02 sec
      Start 16: naivetestsp_5
 9/31 Test  #8: naivetestdp_3 .........................   Passed    0.09 sec
      Start 17: naivetestsp_10
10/31 Test #13: naivetestsp_2 .........................   Passed    0.09 sec
      Start 18: roundtriptest1ddp_12
11/31 Test  #9: naivetestdp_4 .........................   Passed    0.15 sec
      Start 19: roundtriptest1ddp_16
12/31 Test #10: naivetestdp_5 .........................   Passed    0.17 sec
      Start 20: roundtriptest1dsp_12
13/31 Test #15: naivetestsp_4 .........................   Passed    0.17 sec
      Start 21: roundtriptest1dsp_16
14/31 Test #14: naivetestsp_3 .........................   Passed    0.18 sec
      Start 22: roundtriptest2ddp_2_2
15/31 Test #16: naivetestsp_5 .........................   Passed    0.23 sec
      Start 23: roundtriptest2ddp_4_4
16/31 Test #22: roundtriptest2ddp_2_2 .................   Passed    0.81 sec
      Start 24: roundtriptest2ddp_8_8
17/31 Test #17: naivetestsp_10 ........................   Passed    1.02 sec
      Start 25: roundtriptest2ddp_10_10
18/31 Test #11: naivetestdp_10 ........................   Passed    1.23 sec
      Start 26: roundtriptest2ddp_5_15
19/31 Test #25: roundtriptest2ddp_10_10 ...............***Failed    0.73 sec
Path(random) :1(ST) 4(ST) 2(ST) 2(ST) 1(ST) 
ISA : AVX2 256 bit double
transpose NoMT(measured): 63591
transpose   MT(measured): 173729
Path(random) :2(ST) 4(ST) 4(ST) 
ISA : AVX2 256 bit double
transpose NoMT(loaded): 63591
transpose   MT(loaded): 173729
complex : NG (0.855868)

      Start 27: roundtriptest2dsp_2_2
20/31 Test #26: roundtriptest2ddp_5_15 ................***Failed    1.08 sec
Path(random) :1(ST) 4(ST) 2(ST) 2(ST) 3(ST) 1(ST) 1(ST) 1(ST) 
ISA : AVX2 256 bit double
Path(random) :1(ST) 2(ST) 2(ST) 
ISA : AVX2 256 bit double
transpose NoMT(measured): 76946
transpose   MT(measured): 174351
Path(random) :3(ST) 1(ST) 3(ST) 3(ST) 1(ST) 2(ST) 2(ST) 
ISA : AVX2 256 bit double
Path(random) :4(ST) 1(ST) 
ISA : AVX2 256 bit double
transpose NoMT(loaded): 76946
transpose   MT(loaded): 174351
complex : NG (0.861336)

      Start 28: roundtriptest2dsp_4_4
21/31 Test #20: roundtriptest1dsp_12 ..................   Passed    2.41 sec
      Start 29: roundtriptest2dsp_8_8
22/31 Test #18: roundtriptest1ddp_12 ..................   Passed    2.50 sec
      Start 30: roundtriptest2dsp_10_10
23/31 Test #27: roundtriptest2dsp_2_2 .................   Passed    0.89 sec
      Start 31: roundtriptest2dsp_5_15
24/31 Test #24: roundtriptest2ddp_8_8 .................***Failed    2.07 sec
Path(random) :3(ST) 2(ST) 3(ST) 
ISA : AVX2 256 bit double
transpose NoMT(measured): 22646
transpose   MT(measured): 1810050
Path(random) :1(ST) 2(ST) 3(ST) 2(ST) 
ISA : AVX2 256 bit double
transpose NoMT(loaded): 22646
transpose   MT(loaded): 1810050
complex : NG (0.85523)

25/31 Test #30: roundtriptest2dsp_10_10 ...............***Failed    0.54 sec
Path(random) :3(ST) 2(ST) 2(ST) 3(ST) 
ISA : AVX2 256 bit float
transpose NoMT(measured): 36704
transpose   MT(measured): 166311
Path(random) :2(ST) 2(ST) 3(ST) 3(ST) 
ISA : AVX2 256 bit float
transpose NoMT(loaded): 36704
transpose   MT(loaded): 166311
complex : NG (1.72439)

26/31 Test #31: roundtriptest2dsp_5_15 ................***Failed    0.61 sec
Path(random) :3(ST) 2(ST) 2(ST) 4(ST) 4(ST) 
ISA : AVX2 256 bit float
Path(random) :2(ST) 3(ST) 
ISA : AVX2 256 bit float
transpose NoMT(measured): 34700
transpose   MT(measured): 119216
Path(random) :3(ST) 4(ST) 2(ST) 3(ST) 3(ST) 
ISA : AVX2 256 bit float
Path(random) :2(ST) 3(ST) 
ISA : AVX2 256 bit float
transpose NoMT(loaded): 34700
transpose   MT(loaded): 119216
complex : NG (1.72819)

27/31 Test #29: roundtriptest2dsp_8_8 .................***Failed    2.05 sec
Path(random) :4(ST) 4(ST) 
ISA : AVX2 256 bit float
transpose NoMT(measured): 13515
transpose   MT(measured): 1884567
Path(random) :3(ST) 3(ST) 2(ST) 
ISA : AVX2 256 bit float
transpose NoMT(loaded): 13515
transpose   MT(loaded): 1884567
complex : NG (1.71725)

28/31 Test #21: roundtriptest1dsp_16 ..................   Passed    4.78 sec
29/31 Test #19: roundtriptest1ddp_16 ..................   Passed    4.99 sec
30/31 Test #28: roundtriptest2dsp_4_4 .................***Failed   15.35 sec
Path(random) :2(ST) 2(ST) 
ISA : AVX2 256 bit float
transpose NoMT(measured): 5788
transpose   MT(measured): 15256963
Path(random) :2(ST) 2(ST) 
ISA : AVX2 256 bit float
transpose NoMT(loaded): 5788
transpose   MT(loaded): 15256963
complex : NG (0.958348)

31/31 Test #23: roundtriptest2ddp_4_4 .................***Failed   17.41 sec
Path(random) :2(ST) 2(ST) 
ISA : AVX2 256 bit double
transpose NoMT(measured): 15802
transpose   MT(measured): 17304167
Path(random) :2(ST) 2(ST) 
ISA : AVX2 256 bit double
transpose NoMT(loaded): 15802
transpose   MT(loaded): 17304167
complex : NG (0.869041)


74% tests passed, 8 tests failed out of 31

Total Test time (real) =  17.71 sec

The following tests FAILED:
	 23 - roundtriptest2ddp_4_4 (Failed)
	 24 - roundtriptest2ddp_8_8 (Failed)
	 25 - roundtriptest2ddp_10_10 (Failed)
	 26 - roundtriptest2ddp_5_15 (Failed)
	 28 - roundtriptest2dsp_4_4 (Failed)
	 29 - roundtriptest2dsp_8_8 (Failed)
	 30 - roundtriptest2dsp_10_10 (Failed)
	 31 - roundtriptest2dsp_5_15 (Failed)
Errors while running CTest
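
A small helper can pull the failing test names out of a log like the one above. This is a minimal sketch, not part of SLEEF or CTest: `failed_tests` is a hypothetical name, and the pattern assumes the console line layout shown in this log.

```shell
# Hypothetical helper: print the names of failed tests found in ctest
# console output read from standard input.
# Assumes result lines of the form
#   "19/31 Test #25: roundtriptest2ddp_10_10 ......***Failed    0.73 sec"
failed_tests() {
    sed -n -E 's|^ *[0-9]+/[0-9]+ Test +#[0-9]+: ([^ ]+) \.+\*\*\*Failed.*|\1|p'
}
```

Usage, assuming the console output was saved to a hypothetical file `ctest.log`: `failed_tests < ctest.log`. Running `ctest --rerun-failed --output-on-failure` in the build directory should then repeat only the tests that failed in the previous run.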

build flags

dh_auto_configure -- \
	-DCMAKE_BUILD_TYPE=RelWithDebInfo \
	-DSLEEF_TEST_ALL_IUT=ON
	cd obj-x86_64-linux-gnu && cmake -DCMAKE_INSTALL_PREFIX=/usr -DCMAKE_VERBOSE_MAKEFILE=ON -DCMAKE_BUILD_TYPE=None -DCMAKE_INSTALL_SYSCONFDIR=/etc -DCMAKE_INSTALL_LOCALSTATEDIR=/var -DCMAKE_EXPORT_NO_PACKAGE_REGISTRY=ON -DCMAKE_FIND_PACKAGE_NO_PACKAGE_REGISTRY=ON -DCMAKE_INSTALL_RUNSTATEDIR=/run "-GUnix Makefiles" -DCMAKE_BUILD_TYPE=RelWithDebInfo -DSLEEF_TEST_ALL_IUT=ON ..

machine configuration
oneapi-src/oneDNN#208 (comment)

@shibatch
Owner

Hello @cdluminate,

I greatly appreciate your work on Debian packaging. Thank you for your report.

I haven't seen an error like this for some time.
It is hard to debug since I don't have access to that computer.
How about disabling the DFT library for now? It is not used by any project yet.
If that is acceptable, please specify -DBUILD_DFT=FALSE as a CMake option.

@cdluminate
Author

@shibatch Thanks for the hint. Nothing breaks if I disable libsleefdft.

http://debomatic-amd64.debian.net/distribution#experimental/sleef/3.3-1/buildlog

I checked pytorch's code and the keyword SleefDFT does not appear in it, so I think it's fine if we disable it.

@cdluminate
Author

I don't have shell access to that machine either.

@btashton

@shibatch I get the same results on my laptop. If there is something I can do to help debug this, let me know.

@shibatch
Owner

@btashton Thank you!
Can I see the full build log?

@btashton

@shibatch here is the build log including running the tests:

https://gist.github.com/btashton/1f4ccfd27244100560d4ec010f5201a9

@shibatch
Owner

Could you also try compiling and testing with clang?

@btashton

They all pass with clang.
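
For anyone who wants to repeat the gcc/clang comparison, the following is a rough sketch rather than the procedure actually used here. `build_and_test_with` is a hypothetical helper name; it relies only on the fact that CMake picks up the `CC` environment variable when a build directory is configured for the first time.

```shell
# Hypothetical helper: build and test SLEEF with a given C compiler,
# using a separate build directory per compiler.
# $1: compiler command (for example gcc or clang)
# $2: SLEEF source directory (default: current directory)
build_and_test_with() {
    cc=$1
    src=$(cd "${2:-.}" && pwd) || return 1   # absolute path to the sources
    dir="build-$cc"
    mkdir -p "$dir" || return 1
    (
        cd "$dir" &&
        # CC is only honoured on the first configure of a fresh directory.
        CC="$cc" cmake "$src" &&
        cmake --build . &&
        ctest --output-on-failure
    )
}
```

Usage from the top of a SLEEF checkout: `build_and_test_with gcc` followed by `build_and_test_with clang`, then compare which tests fail in each run.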

@shibatch
Owner

@btashton Please also let me know your system configuration: OS version, CPU model, etc.

@btashton

Fedora 28, the details are listed here:

Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              4
On-line CPU(s) list: 0-3
Thread(s) per core:  2
Core(s) per socket:  2
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               61
Model name:          Intel(R) Core(TM) i5-5200U CPU @ 2.20GHz
Stepping:            4
CPU MHz:             1095.807
CPU max MHz:         2700.0000
CPU min MHz:         500.0000
BogoMIPS:            4390.15
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            3072K
NUMA node0 CPU(s):   0-3
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap intel_pt xsaveopt dtherm ida arat pln pts flush_l1d
[bashton@localhost build]$ uname -a
Linux localhost.localdomain 4.17.14-202.fc28.x86_64 #1 SMP Wed Aug 15 12:29:25 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

@shibatch
Owner

Do you have other versions of gcc installed on your computer?
If so, please try testing with those versions.

@shibatch
Owner

And please try testing once again after executing the following command.

export OMP_WAIT_POLICY=passive
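
To limit the re-run to the tests that fail in this issue, the variable can also be set for a single `ctest` invocation. This is only a sketch: `run_2d_roundtrip` is a hypothetical helper name, and the `-R` pattern assumes the `roundtriptest2d...` test names shown in the log above.

```shell
# Hypothetical helper: run only the 2D round-trip DFT tests with a given
# OpenMP wait policy. Call it from the CMake build directory.
# $1: wait policy, "passive" (default) or "active"
run_2d_roundtrip() {
    policy=${1:-passive}
    # The assignment prefix sets the variable for this one ctest run only.
    OMP_WAIT_POLICY="$policy" ctest -R roundtriptest2d --output-on-failure
}
```

Running `run_2d_roundtrip passive` and then `run_2d_roundtrip active` gives a quick comparison of whether the wait policy changes the outcome.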

@btashton

Unfortunately, the only version of gcc I can easily install right now is 3.4, which the distro ships as a compat package, and this library will not build with it because of some gcc flags. I also tried setting OMP_WAIT_POLICY, and it did not seem to have any effect.

@shibatch
Owner

Okay, I have been suspecting libgomp since the beginning.
Only the 2D DFT is failing, and the difference between the 1D DFT and the 2D DFT is pretty simple, though the compiler generates complex code.

It is possible to check the failing part with gdb, but I think that will not provide very useful information.

@shibatch
Owner

It is still difficult to tell whether something is wrong with libgomp or with gcc itself.

@cdluminate
Author

This is interesting. I'll compare clang and gcc results too.

@shibatch
Owner

@cdluminate Was it gcc-8 that was used to build the failing tests on the server?
I cannot see the log anymore.

@btashton

@shibatch I set up a script to run against the official GCC Docker images for 4.9, 5.5, 6.4, 7.3, and 8.2, and I could not reproduce this failure on the same hardware. Any other thoughts?

@cdluminate
Author

@shibatch It should be gcc-8. Debian unstable has been shipping gcc-8 as the default compiler for some time.

@shibatch
Owner

shibatch commented Sep 18, 2018

@btashton No, I have no idea at all.
In my CI environment, it is tested with gcc-4, gcc-7, gcc-8 in addition to clang, icc and MSVC.
It seems that the problem only occurs with x86 and gcc-8.

Even with gcc-8, the problem does not always occur.
I don't have any problem with that combination.

@chriselrod

chriselrod commented Sep 30, 2018

I can also reproduce.

$ make test
Running tests...
Test project /home/chriselrod/Documents/libraries/sleef/build
      Start  1: gnuabi_compatibility_SSE2
 1/31 Test  #1: gnuabi_compatibility_SSE2 .............   Passed    0.00 sec
      Start  2: gnuabi_compatibility_AVX
 2/31 Test  #2: gnuabi_compatibility_AVX ..............   Passed    0.00 sec
      Start  3: gnuabi_compatibility_AVX2
 3/31 Test  #3: gnuabi_compatibility_AVX2 .............   Passed    0.00 sec
      Start  4: gnuabi_compatibility_AVX512F
 4/31 Test  #4: gnuabi_compatibility_AVX512F ..........   Passed    0.00 sec
      Start  5: gnuabi_compatibility_AVX512F_masked
 5/31 Test  #5: gnuabi_compatibility_AVX512F_masked ...   Passed    0.00 sec
      Start  6: naivetestdp_1
 6/31 Test  #6: naivetestdp_1 .........................   Passed    0.00 sec
      Start  7: naivetestdp_2
 7/31 Test  #7: naivetestdp_2 .........................   Passed    0.01 sec
      Start  8: naivetestdp_3
 8/31 Test  #8: naivetestdp_3 .........................   Passed    0.01 sec
      Start  9: naivetestdp_4
 9/31 Test  #9: naivetestdp_4 .........................   Passed    0.00 sec
      Start 10: naivetestdp_5
10/31 Test #10: naivetestdp_5 .........................   Passed    0.01 sec
      Start 11: naivetestdp_10
11/31 Test #11: naivetestdp_10 ........................   Passed    0.28 sec
      Start 12: naivetestsp_1
12/31 Test #12: naivetestsp_1 .........................   Passed    0.00 sec
      Start 13: naivetestsp_2
13/31 Test #13: naivetestsp_2 .........................   Passed    0.01 sec
      Start 14: naivetestsp_3
14/31 Test #14: naivetestsp_3 .........................   Passed    0.00 sec
      Start 15: naivetestsp_4
15/31 Test #15: naivetestsp_4 .........................   Passed    0.01 sec
      Start 16: naivetestsp_5
16/31 Test #16: naivetestsp_5 .........................   Passed    0.00 sec
      Start 17: naivetestsp_10
17/31 Test #17: naivetestsp_10 ........................   Passed    0.27 sec
      Start 18: roundtriptest1ddp_12
18/31 Test #18: roundtriptest1ddp_12 ..................   Passed    0.12 sec
      Start 19: roundtriptest1ddp_16
19/31 Test #19: roundtriptest1ddp_16 ..................   Passed    1.35 sec
      Start 20: roundtriptest1dsp_12
20/31 Test #20: roundtriptest1dsp_12 ..................   Passed    0.10 sec
      Start 21: roundtriptest1dsp_16
21/31 Test #21: roundtriptest1dsp_16 ..................   Passed    1.12 sec
      Start 22: roundtriptest2ddp_2_2
22/31 Test #22: roundtriptest2ddp_2_2 .................   Passed    0.04 sec
      Start 23: roundtriptest2ddp_4_4
23/31 Test #23: roundtriptest2ddp_4_4 .................***Failed    0.13 sec
      Start 24: roundtriptest2ddp_8_8
24/31 Test #24: roundtriptest2ddp_8_8 .................***Failed    0.03 sec
      Start 25: roundtriptest2ddp_10_10
25/31 Test #25: roundtriptest2ddp_10_10 ...............***Failed    0.10 sec
      Start 26: roundtriptest2ddp_5_15
26/31 Test #26: roundtriptest2ddp_5_15 ................***Failed    0.17 sec
      Start 27: roundtriptest2dsp_2_2
27/31 Test #27: roundtriptest2dsp_2_2 .................   Passed    0.05 sec
      Start 28: roundtriptest2dsp_4_4
28/31 Test #28: roundtriptest2dsp_4_4 .................***Failed    0.13 sec
      Start 29: roundtriptest2dsp_8_8
29/31 Test #29: roundtriptest2dsp_8_8 .................***Failed    0.02 sec
      Start 30: roundtriptest2dsp_10_10
30/31 Test #30: roundtriptest2dsp_10_10 ...............***Failed    0.08 sec
      Start 31: roundtriptest2dsp_5_15
31/31 Test #31: roundtriptest2dsp_5_15 ................***Failed    0.13 sec

74% tests passed, 8 tests failed out of 31

Total Test time (real) =   4.21 sec

The following tests FAILED:
	 23 - roundtriptest2ddp_4_4 (Failed)
	 24 - roundtriptest2ddp_8_8 (Failed)
	 25 - roundtriptest2ddp_10_10 (Failed)
	 26 - roundtriptest2ddp_5_15 (Failed)
	 28 - roundtriptest2dsp_4_4 (Failed)
	 29 - roundtriptest2dsp_8_8 (Failed)
	 30 - roundtriptest2dsp_10_10 (Failed)
	 31 - roundtriptest2dsp_5_15 (Failed)
Errors while running CTest
make: *** [Makefile:95: test] Error 8

I am on Fedora 29beta, with gcc 8.2.1, glibc 2.28, and an x86 processor (with avx-512).
cmake call:

cmake -DCMAKE_BUILD_TYPE=Release -DBUILT_DFT=FALSE ..

When I tried:

cmake -DCMAKE_BUILD_TYPE=Debug -DBUILT_DFT=FALSE ..

make fails with:

/usr/bin/ld: ../../lib/libsleefgnuabi.so.3.3: undefined reference to `Sleef_x86CpuID'
collect2: error: ld returned 1 exit status
make[2]: *** [src/libm-tester/CMakeFiles/gnuabi_compatibility_AVX.dir/build.make:86: bin/gnuabi_compatibility_AVX] Error 1
make[1]: *** [CMakeFiles/Makefile2:3018: src/libm-tester/CMakeFiles/gnuabi_compatibility_AVX.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
/usr/bin/ld: ../../lib/libsleefgnuabi.so.3.3: undefined reference to `Sleef_x86CpuID'
collect2: error: ld returned 1 exit status
/usr/bin/ld: ../../lib/libsleefgnuabi.so.3.3: undefined reference to `Sleef_x86CpuID'
collect2: error: ld returned 1 exit status
/usr/bin/ld: ../../lib/libsleefgnuabi.so.3.3: undefined reference to `Sleef_x86CpuID'
collect2: error: ld returned 1 exit status
make[2]: *** [src/libm-tester/CMakeFiles/gnuabi_compatibility_AVX2.dir/build.make:86: bin/gnuabi_compatibility_AVX2] Error 1
make[2]: *** [src/libm-tester/CMakeFiles/gnuabi_compatibility_SSE2.dir/build.make:86: bin/gnuabi_compatibility_SSE2] Error 1
make[1]: *** [CMakeFiles/Makefile2:3245: src/libm-tester/CMakeFiles/gnuabi_compatibility_AVX2.dir/all] Error 2
make[1]: *** [CMakeFiles/Makefile2:2374: src/libm-tester/CMakeFiles/gnuabi_compatibility_SSE2.dir/all] Error 2
make[2]: *** [src/libm-tester/CMakeFiles/gnuabi_compatibility_AVX512F.dir/build.make:86: bin/gnuabi_compatibility_AVX512F] Error 1
make[1]: *** [CMakeFiles/Makefile2:2449: src/libm-tester/CMakeFiles/gnuabi_compatibility_AVX512F.dir/all] Error 2
/usr/bin/ld: ../../lib/libsleefgnuabi.so.3.3: undefined reference to `Sleef_x86CpuID'
collect2: error: ld returned 1 exit status
make[2]: *** [src/libm-tester/CMakeFiles/gnuabi_compatibility_AVX512F_masked.dir/build.make:86: bin/gnuabi_compatibility_AVX512F_masked] Error 1
make[1]: *** [CMakeFiles/Makefile2:2337: src/libm-tester/CMakeFiles/gnuabi_compatibility_AVX512F_masked.dir/all] Error 2
[ 64%] Built target sleefsse2
[ 64%] Built target sleefavx512fnofma
make: *** [Makefile:141: all] Error 2

EDIT: Things would've gone better had I been able to spell "build" correctly the first time. =P
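
A misspelled `-D` option is not an error for CMake: the value is stored in the cache under the misspelled name, and at most a warning about manually-specified variables not being used is printed. One way to see which options actually took effect is to look them up in `CMakeCache.txt`. The sketch below assumes the option is spelled `BUILD_DFT` in this SLEEF version, as the EDIT above suggests; `cache_value` is a hypothetical helper name.

```shell
# Hypothetical helper: print the value of a variable stored in a CMake cache.
# $1: variable name (for example BUILD_DFT)
# $2: path to the cache file (default: CMakeCache.txt in the current directory)
cache_value() {
    var=$1
    cache=${2:-CMakeCache.txt}
    # Cache entries have the form NAME:TYPE=VALUE.
    sed -n -E "s/^${var}:[A-Z]+=(.*)/\1/p" "$cache"
}
```

In a build directory configured with the misspelled flag, `cache_value BUILD_DFT` would typically still print the default, while `cache_value BUILT_DFT` prints the value that was passed on the command line.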

@shibatch
Owner

shibatch commented Oct 1, 2018

@chriselrod Thank you for your report. It is a known problem that the build fails when you specify -DCMAKE_BUILD_TYPE=Debug while building the GNUABI libs. Please try something like

cmake -DCMAKE_BUILD_TYPE=Debug -DBUILD_DFT=TRUE -DBUILD_GNUABI_LIBS=FALSE ..

@shibatch
Owner

shibatch commented Oct 1, 2018

So, it seems to have something to do with gcc-8, and it is likely to reproduce on Fedora.
I will install Fedora on my computer and test it.

@musicinmybrain
Contributor

I can reproduce something similar—perhaps the same problem?—if I build with the CFLAGS typically used for RPM packaging on Fedora 32. You can check these with rpm -E '%set_build_flags':

  CFLAGS="${CFLAGS:--O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection}" ; export CFLAGS ; 
  CXXFLAGS="${CXXFLAGS:--O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection}" ; export CXXFLAGS ; 
  FFLAGS="${FFLAGS:--O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -I/usr/lib64/gfortran/modules}" ; export FFLAGS ; 
  FCFLAGS="${FCFLAGS:--O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -I/usr/lib64/gfortran/modules}" ; export FCFLAGS ; 
  LDFLAGS="${LDFLAGS:--Wl,-z,relro -Wl,--as-needed  -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld}" ; export LDFLAGS ; 
  LT_SYS_LIBRARY_PATH="${LT_SYS_LIBRARY_PATH:-/usr/lib64:}" ; export LT_SYS_LIBRARY_PATH

I am invoking cmake the way the %cmake, %cmake_build, and %ctest RPM macros would, except that I am requesting the Ninja backend since the documentation says that is needed for a parallel build:

# first set CFLAGS, LDFLAGS, etc. as above, then:
  /usr/bin/cmake \
         \
         \
        -DCMAKE_C_FLAGS_RELEASE:STRING="-DNDEBUG" \
        -DCMAKE_CXX_FLAGS_RELEASE:STRING="-DNDEBUG" \
        -DCMAKE_Fortran_FLAGS_RELEASE:STRING="-DNDEBUG" \
        -DCMAKE_VERBOSE_MAKEFILE:BOOL=ON \
        -DCMAKE_INSTALL_PREFIX:PATH=/usr \
        -DINCLUDE_INSTALL_DIR:PATH=/usr/include \
        -DLIB_INSTALL_DIR:PATH=/usr/lib64 \
        -DSYSCONF_INSTALL_DIR:PATH=/etc \
        -DSHARE_INSTALL_PREFIX:PATH=/usr/share \
        -DLIB_SUFFIX=64 \
        -DBUILD_SHARED_LIBS:BOOL=ON \
        -GNinja \
        ../sleef

  /usr/bin/cmake --build "." -j4 --verbose

  /usr/bin/ctest --output-on-failure --force-new-ctest-process -j4 --verbose

Here is what I see:

The following tests FAILED:
	 55 - fftwtest2ddp_4_4 (Failed)
	 56 - fftwtest2ddp_8_8 (Failed)
	 57 - fftwtest2ddp_10_10 (Failed)
	 58 - fftwtest2ddp_5_15 (Failed)
	 60 - fftwtest2dsp_4_4 (Failed)
	 61 - fftwtest2dsp_8_8 (Failed)
	 62 - fftwtest2dsp_10_10 (Failed)
	 63 - fftwtest2dsp_5_15 (Failed)

Changing --O2 to --O1 or --Og in CFLAGS causes the tests to pass. Strangely, so does changing it to --O3. The -- instead of - is a bit of macro fussiness, and the actual compiler flag is -O2, -O1, etc.
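
The optimization-level comparison could be scripted roughly as follows. This is a sketch under assumptions, not the exact procedure used here: `try_opt_levels` is a hypothetical helper name, each level gets a fresh build directory because `CFLAGS` is only read on the first configure, and `-DCMAKE_C_FLAGS_RELEASE=-DNDEBUG` mirrors the cmake invocation above so that the release flags do not add their own `-O` level.

```shell
# Hypothetical helper: configure, build and run the 2D DFT tests once per
# optimization level, each in its own build directory under the current one.
# $1: SLEEF source directory (default: ../sleef, as in the commands above)
try_opt_levels() {
    src=$(cd "${1:-../sleef}" && pwd) || return 1
    for opt in -O1 -O2 -O3 -Og; do
        dir="build$opt"            # build-O1, build-O2, ...
        mkdir -p "$dir" || return 1
        if (
            cd "$dir" &&
            CFLAGS="$opt" cmake -GNinja -DBUILD_SHARED_LIBS=ON \
                -DCMAKE_C_FLAGS_RELEASE=-DNDEBUG "$src" &&
            cmake --build . &&
            ctest -R 2d --output-on-failure
        ) >"$dir.log" 2>&1; then
            printf '%s: pass\n' "$opt"
        else
            printf '%s: FAIL (see %s.log)\n' "$opt" "$dir"
        fi
    done
}
```

The `-R 2d` pattern is meant to select both the `roundtriptest2d...` and `fftwtest2d...` test names that appear in this issue.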

The machine I am testing on is an ancient x86_64 box that only supports SSE2. Fedora 32 currently has cmake 3.17.4, gcc-10.2.1, and I used sleef cc4b021.

I don’t know if this sheds any light on anything or not. Particularly, I do not know if @cdluminate was explicitly setting build flags in this manner or not. I am happy to run any tests on other versions of Fedora, CentOS, etc. I can also try it with a VM on a machine that supports AVX2 if it matters.

@musicinmybrain
Contributor

It turns out the details about the flags added for RPM packaging on Fedora are not relevant. The following fails in the same way on Fedora 32, in an empty build directory and with no special environment variables set:

/usr/bin/cmake -DBUILD_SHARED_LIBS:BOOL=ON -GNinja ../sleef &&
    /usr/bin/cmake --build "." -j4 --verbose &&
    /usr/bin/ctest --output-on-failure --force-new-ctest-process -j4 --verbose

@shibatch
Owner

How about just turning off DFT?
No one is using DFT, so that should be okay.

@chriselrod

Out of curiosity, have you benchmarked against fftw?

@shibatch
Owner

Yes. See the benchmark.
In some cases, it’s better than FFTW.
It still has problems in the planner, though.
I have to manually specify the plan to maximize the performance.

@musicinmybrain
Contributor

I just tried this again with sleef 3.6, using the latest patched GCC 14.0.1 in Fedora Rawhide. It looks like the DFT tests are passing now (on x86_64, aarch64, ppc64le, and s390x).
