[3.3/x86_64] 8 tests failed out of 31 #214
Comments
Hello @cdluminate, I greatly appreciate your work on Debian packaging. Thank you for your report. I haven't seen an error like this for some time. |
@shibatch Thanks for the hint. Nothing breaks if I disable libsleefdft: http://debomatic-amd64.debian.net/distribution#experimental/sleef/3.3-1/buildlog I checked PyTorch's code and there is no keyword |
I don't have shell access to that machine either. |
@shibatch I get the same results on my laptop. If there is something I can do to help debug this, let me know. |
@btashton Thank you! |
@shibatch here is the build log including running the tests: https://gist.github.com/btashton/1f4ccfd27244100560d4ec010f5201a9 |
Could you also try compiling and testing with clang? |
They all pass with clang. |
@btashton Please also let me know the system configuration: OS version, CPU model, etc. |
Fedora 28, the details are listed here:
|
Do you have other versions of gcc installed on your computer? |
And please try testing once again after executing the following command.
|
Unfortunately, the only other version of gcc that is easy for me to install right now is 3.4, since the distro includes it as a compat package, and this library will not build with it because of some gcc flags. I did try setting OMP_WAIT_POLICY and it did not seem to have any effect. |
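For anyone repeating this check, here is a minimal sketch that reruns the test suite under both standard `OMP_WAIT_POLICY` values. The exact command requested above is not reproduced here; the `BUILD_DIR` path, the `DRY_RUN` switch, and the `run_with_policy` helper are hypothetical names for illustration only.

```shell
#!/bin/sh
# Sketch: rerun the tests once per OMP_WAIT_POLICY value.
# BUILD_DIR is a hypothetical path to an already-built tree.
# DRY_RUN=1 (the default here) only prints what would be run.
BUILD_DIR="${BUILD_DIR:-build}"
DRY_RUN="${DRY_RUN:-1}"

run_with_policy() {
  policy="$1"
  if [ "$DRY_RUN" = "1" ]; then
    echo "OMP_WAIT_POLICY=$policy ctest --output-on-failure"
  else
    # The variable is set only for this one ctest invocation.
    (cd "$BUILD_DIR" && OMP_WAIT_POLICY="$policy" ctest --output-on-failure)
  fi
}

# "active" and "passive" are the two values defined by the OpenMP specification.
for policy in active passive; do
  run_with_policy "$policy"
done
```

Run it with `DRY_RUN=0` from the directory that contains the build tree to execute the tests for real.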
Okay, I have suspected libgomp from the beginning. It is possible to check the failing part with gdb, but I think that will not provide very useful information. |
It is still difficult to determine whether the problem lies in libgomp or in gcc itself. |
This is interesting. I'll compare clang and gcc results too. |
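A gcc/clang comparison like this can be scripted so both builds use identical steps. The following is only a sketch under assumptions: the source directory name `sleef`, the `build-<compiler>` directory names, the `DRY_RUN` switch, and the `build_and_test` helper are all hypothetical, and `cmake -S ... -B ...` needs CMake 3.13 or newer.

```shell
#!/bin/sh
# Sketch: configure, build, and test the same sources with gcc and clang.
# SRC_DIR and the build directory names are hypothetical.
# DRY_RUN=1 (the default here) only prints the commands.
SRC_DIR="${SRC_DIR:-sleef}"
DRY_RUN="${DRY_RUN:-1}"

build_and_test() {
  cc="$1"
  build_dir="build-$cc"
  if [ "$DRY_RUN" = "1" ]; then
    echo "CC=$cc cmake -S $SRC_DIR -B $build_dir"
    echo "cmake --build $build_dir"
    echo "(cd $build_dir && ctest --output-on-failure)"
  else
    # CMake reads CC when it first configures a fresh build directory.
    CC="$cc" cmake -S "$SRC_DIR" -B "$build_dir" &&
      cmake --build "$build_dir" &&
      (cd "$build_dir" && ctest --output-on-failure)
  fi
}

for cc in gcc clang; do
  build_and_test "$cc"
done
```

Keeping one build directory per compiler avoids a stale CMake cache silently reusing the first compiler.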
@cdluminate Is it gcc-8 that was used to build the failing tests at the server? |
@shibatch I set up a script to run against the official GCC Docker images for 4.9, 5.5, 6.4, 7.3, and 8.2, and I could not reproduce this failure on the same hardware. Any other thoughts? |
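The script itself was not posted, so the following is only a sketch of how such a run might look, not the one used above. It assumes the official `gcc` images on Docker Hub with the tags listed, a hypothetical `SRC_DIR` checkout, and that cmake is available inside the image (the stock images may need a derived image or an extra install step for that).

```shell
#!/bin/sh
# Sketch: build and test inside several official "gcc" Docker image tags.
# SRC_DIR is a hypothetical checkout path; cmake is ASSUMED to be present
# in the image, which may require a derived image in practice.
# DRY_RUN=1 (the default here) only prints the docker commands.
SRC_DIR="${SRC_DIR:-$PWD/sleef}"
DRY_RUN="${DRY_RUN:-1}"
GCC_TAGS="4.9 5.5 6.4 7.3 8.2"

# Steps run inside each container: out-of-source build, then the tests.
INNER='mkdir -p /build && cd /build && cmake /src && make -j4 && ctest --output-on-failure'

test_with_gcc() {
  tag="$1"
  if [ "$DRY_RUN" = "1" ]; then
    echo "docker run --rm -v $SRC_DIR:/src:ro gcc:$tag sh -c '$INNER'"
  else
    # Sources are mounted read-only; the build happens in /build.
    docker run --rm -v "$SRC_DIR:/src:ro" "gcc:$tag" sh -c "$INNER"
  fi
}

for tag in $GCC_TAGS; do
  test_with_gcc "$tag"
done
```

Note that containers share the host kernel and CPU but bring their own libgomp and glibc, so a pass here does not rule out a problem in the host distribution's toolchain.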
@shibatch It should be gcc-8. Debian unstable has been shipping gcc-8 as the default compiler for some time. |
@btashton No, I have no idea at all. And gcc-8 does not always trigger the problem. |
I can also reproduce.
I am on Fedora 29beta, with gcc 8.2.1, glibc 2.28, and an x86 processor (with avx-512).
When I tried:
make fails with:
EDIT: Things would've gone better had I been able to spell "build" correctly the first time. =P |
@chriselrod Thank you for your report. It is a known problem that the build fails when you specify
|
So, it seems to have something to do with gcc-8, and it is likely to reproduce on Fedora. |
I can reproduce something similar (perhaps the same problem?) if I build with the following flags:
CFLAGS="${CFLAGS:--O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection}" ; export CFLAGS ;
CXXFLAGS="${CXXFLAGS:--O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection}" ; export CXXFLAGS ;
FFLAGS="${FFLAGS:--O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -I/usr/lib64/gfortran/modules}" ; export FFLAGS ;
FCFLAGS="${FCFLAGS:--O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -I/usr/lib64/gfortran/modules}" ; export FCFLAGS ;
LDFLAGS="${LDFLAGS:--Wl,-z,relro -Wl,--as-needed -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld}" ; export LDFLAGS ;
LT_SYS_LIBRARY_PATH="${LT_SYS_LIBRARY_PATH:-/usr/lib64:}" ; export LT_SYS_LIBRARY_PATH
I am invoking cmake the way the
# first set CFLAGS, LDFLAGS, etc. as above, then:
/usr/bin/cmake \
\
\
-DCMAKE_C_FLAGS_RELEASE:STRING="-DNDEBUG" \
-DCMAKE_CXX_FLAGS_RELEASE:STRING="-DNDEBUG" \
-DCMAKE_Fortran_FLAGS_RELEASE:STRING="-DNDEBUG" \
-DCMAKE_VERBOSE_MAKEFILE:BOOL=ON \
-DCMAKE_INSTALL_PREFIX:PATH=/usr \
-DINCLUDE_INSTALL_DIR:PATH=/usr/include \
-DLIB_INSTALL_DIR:PATH=/usr/lib64 \
-DSYSCONF_INSTALL_DIR:PATH=/etc \
-DSHARE_INSTALL_PREFIX:PATH=/usr/share \
-DLIB_SUFFIX=64 \
-DBUILD_SHARED_LIBS:BOOL=ON \
-GNinja \
../sleef
/usr/bin/cmake --build "." -j4 --verbose
/usr/bin/ctest --output-on-failure --force-new-ctest-process -j4 --verbose
Here is what I see:
Changing
The machine I am testing on is an ancient x86_64 box that only supports SSE2. Fedora 32 currently has cmake 3.17.4, gcc-10.2.1, and I used sleef cc4b021. I don’t know if this sheds any light on anything or not. Particularly, I do not know if @cdluminate was explicitly setting build flags in this manner or not. I am happy to run any tests on other versions of Fedora, CentOS, etc. I can also try it with a VM on a machine that supports AVX2 if it matters. |
It turns out the details about the flags added for RPM packaging on Fedora are not relevant. The following fails in the same way on Fedora 32, in an empty build directory and with no special environment variables set:
/usr/bin/cmake -DBUILD_SHARED_LIBS:BOOL=ON -GNinja ../sleef &&
/usr/bin/cmake --build "." -j4 --verbose &&
/usr/bin/ctest --output-on-failure --force-new-ctest-process -j4 --verbose |
How about just turning off DFT? |
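As a sketch of what turning off DFT could look like at configure time: the option name below is an assumption (`BUILD_DFT` in the 3.x series; later releases appear to prefix it as `SLEEF_BUILD_DFT`), so check the `CMakeLists.txt` of the version you are building. `SRC_DIR`, `DFT_OPTION`, `DRY_RUN`, and the helper name are hypothetical.

```shell
#!/bin/sh
# Sketch: configure without libsleefdft.
# DFT_OPTION is an ASSUMED option name; verify it against your SLEEF version.
# SRC_DIR is a hypothetical path to the source tree.
# DRY_RUN=1 (the default here) only prints the command.
SRC_DIR="${SRC_DIR:-../sleef}"
DFT_OPTION="${DFT_OPTION:-BUILD_DFT}"
DRY_RUN="${DRY_RUN:-1}"

configure_without_dft() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "cmake -D$DFT_OPTION=OFF $SRC_DIR"
  else
    # Run from an empty build directory.
    cmake "-D$DFT_OPTION=OFF" "$SRC_DIR"
  fi
}

configure_without_dft
```

With DFT disabled, the DFT tests are not built, so this works around the failures rather than explaining them.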
Out of curiosity, have you benchmarked against fftw? |
Yes. See the benchmark. |
I just tried this again with sleef 3.6, using the latest patched GCC 14.0.1 in Fedora Rawhide. It looks like the DFT tests are passing now (on |
oneapi-src/oneDNN#208 (comment)