[ppc64le] Many tests are failing on linux IBM power9 machines #1222
The test suite shows many errors. The float128 datatype is known to be unsupported on PPC, which explains some of the errors (but not all).
|
The https://github.com/lattera/glibc/blob/master/sysdeps/powerpc/powerpc64/power7/memcpy.S |
Interesting information ... Nevertheless, I doubt glibc is broken, as
most of our tests exhibit excellent results, especially when it comes
to bandwidth and copying data.
I noticed that the HDF5 library provided by Anaconda does not exhibit
this bug; moreover, they are applying a couple of patches specific to
ppc64le (mainly related to 128-bit floats, which are known to be
unsupported on this platform).
https://github.com/AnacondaRecipes/hdf5-feedstock/tree/master/recipe
I will try to apply the patches and check whether things improve.
Thanks for your feedback.
|
Also, this looks similar to #1164, though I haven't looked through all the failures to see if they're the same. |
This would explain why it works with former versions of h5py, and also why hdf5 is still limited to 1.10.2 on Anaconda.
|
I confirm there is no seg-fault when running the test suite on h5py 2.8 (2 tests are failing). There is a regression in 2.9 and in the master branch.
|
A PR that fixes this would be very welcome. I don't have the architecture needed to test this. |
I did not fully understand what the cause of the problem is ...
moreover there are various patches (from Anaconda) which I believe are
interfering.
Do you have an idea of how to fix this? We currently have a test
computer loaned by IBM for debugging this kind of issue.
Cheers,
Jerome
|
It seems like HDF5 isn't able to create a real file. I am not sure why. You can try issuing a bug report to HDF5 and seeing if they know what is going on here. |
Thanks for the advice.
|
Hi, the description of long double on ppc64le: a 106-bit mantissa but only an 11-bit exponent... |
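To illustrate the point above: ppc64le's long double is an IBM "double-double", i.e. a pair of float64 values. The sketch below is my own toy model (not h5py or HDF5 code) showing why the effective mantissa is ~106 bits while the exponent range stays at float64's 11 bits:

```python
import math

# Sketch (my own model): an IBM double-double is an unevaluated pair of
# float64 values (hi, lo) whose mathematical value is hi + lo.
hi, lo = 1.0, 2.0 ** -100

# A single float64 rounds the small part away (only 52 mantissa bits)...
assert hi + lo == hi

# ...but storing the pair keeps it exactly, which is where the ~106-bit
# effective mantissa comes from.
assert lo != 0.0

# The exponent field is still float64's 11 bits, so the range does not
# grow: double-double overflows at the same magnitude as plain float64.
assert math.isinf(1e308 * 10.0)
```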
Ahh, very interesting! Can you please put in a PR that fixes this for you? |
Removing the "specific case" for ppc64le apparently addresses the problem.
|
It looks like the ppc64le correction was originally introduced in #842 by @mraspaud to fix some (unspecified in the PR) type conversion to longdouble. Then for 2.9 it was moved in #1114 and used in more places. Ideally it would be good if we could understand what went wrong. Was using it in more places (#1114) inappropriate? Or is the special case needed on some ppc64le machines and not others? |
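To see what such a special case has to get right, here is a small probe (a sketch, nothing h5py-specific) of what `np.longdouble` means on the current platform:

```python
import numpy as np

# Sketch: inspect what np.longdouble actually is on this platform -- the
# property any ppc64le special case has to get right.
fi = np.finfo(np.longdouble)
print(fi.nmant, fi.maxexp, fi.minexp, np.longdouble)
# Typical results: 63 mantissa bits on x86-64 (80-bit extended),
# 52 on Windows (plain double), 112 on aarch64 Linux (IEEE binary128),
# ~105 on ppc64le (IBM double-double).
```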
I tried and my code still seg-faults ... apparently it is enough to make the tests execute (not all of them pass, but none seg-fault). |
Tell me if I can help with something. Unfortunately I don't have access to this platform anymore. |
The HDF Group got access to Power9 last week and we will be looking into the problems in the HDF5 library exposed by regression tests and h5py. Unfortunately, it will take time to sort out the issues in both. We will keep you informed on our findings. One thing to try is to disable compiler optimization when building HDF5 and see if it helps with the datatype conversion. Elena |
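Elena's suggestion of building without optimization could be tried along these lines (a sketch; version number, paths, and flags are examples to adapt, and `HDF5_DIR` is h5py's documented build-time variable):

```shell
# Sketch: rebuild HDF5 with compiler optimization disabled.
tar xf hdf5-1.10.5.tar.gz
cd hdf5-1.10.5
CFLAGS="-O0 -g" ./configure --prefix="$HOME/hdf5-noopt"
make -j4 && make install
# Rebuild h5py from source against this unoptimized HDF5:
HDF5_DIR="$HOME/hdf5-noopt" pip install --no-binary=h5py h5py
```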
Thanks all. @mraspaud - any chance you can find/write a code sample that would have failed before your fix? Then other people could see if proposed fixes for this problem reintroduce the bug you were fixing. |
Looking at it, I see you re-enabled a test that was skipped on ppc64le: |
Yes, exactly, the |
Thanks Elena,
For now I am basing all my tests on 1.10.5 built with the default
options, but I may indeed test without those optimizations.
|
Disabling the float128 datatype on ppc64le allows h5py to work properly. This is clearly a workaround, but at least the library no longer crashes when reading float32 or float64, which are much more widely used than float128. |
I'm a bit mystified why disabling float128 would make a difference to other data types. But if it does, then that sounds like a pragmatic interim fix - as you say, float32 and float64 are more common. |
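For reference, the failing read path from the original report can be sketched as below. The shape is reduced from the reported (4, 16384, 1024) to keep the example light, and the file name is arbitrary; on an affected ppc64le build, this kind of read segfaulted in `memcpy`.

```python
import os
import tempfile

import h5py
import numpy as np

# Sketch: write and read back a contiguous (uncompressed, unchunked)
# float64 dataset, as in the original report but with a small shape.
path = os.path.join(tempfile.mkdtemp(), "repro.h5")
data = np.arange(4 * 16 * 8, dtype=np.float64).reshape(4, 16, 8)

with h5py.File(path, "w") as f:
    f.create_dataset("data", data=data)  # contiguous layout by default

with h5py.File(path, "r") as f:
    out = f["data"][:]  # this read crashed on affected ppc64le builds

assert out.dtype == np.float64
assert np.array_equal(out, data)
```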
I was like you initially, as the datatypes I am interested in are float32 and float64. This is with float128 activated (some of the prints are from h5py, I added them):

```
(10, 16, -14, <class 'numpy.float16'>, 'ppc64le')
(23, 128, -126, <class 'numpy.float32'>, 'ppc64le')
(52, 1024, -1022, <class 'numpy.float64'>, 'ppc64le')
(105, 1024, -1022, <class 'numpy.float128'>, 'ppc64le')
Before read
(10, 16, -14, <class 'numpy.float16'>, 'ppc64le')
(23, 128, -126, <class 'numpy.float32'>, 'ppc64le')
(52, 1024, -1022, <class 'numpy.float64'>, 'ppc64le')
reading dataset (4, 16384, 1024) float64
```
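Those diagnostic tuples above appear to follow the layout (nmant, maxexp, minexp, dtype, machine), which can be reproduced with `np.finfo` (a sketch; the field order is my assumption):

```python
import platform

import numpy as np

# Sketch reproducing the diagnostic prints above; assumed tuple layout is
# (nmant, maxexp, minexp, dtype, machine).
for dt in (np.float16, np.float32, np.float64):
    fi = np.finfo(dt)
    print((fi.nmant, fi.maxexp, fi.minexp, dt, platform.machine()))
```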
|
Actually, only the |
Can you make a PR from the patch so we can see what the change is? |
Hi,
I am facing many seg-faults when reading HDF5 files with h5py. Here is the outcome of my investigations, based on a freshly built hdf5 (1.10.5). Note that all tests passed on the HDF5 side.
While trying to read a float64 dataset of shape (4, 16384, 1024), uncompressed and unchunked, I got this exception (from gdb):

```
Thread 1 "python" received signal SIGSEGV, Segmentation fault.
__memcpy_power7 () at ../sysdeps/powerpc/powerpc64/power7/memcpy.S:392
392 ../sysdeps/powerpc/powerpc64/power7/memcpy.S: No such file or directory.
```

Note that power7 is a big-endian architecture, while power9 is little-endian, so it is no surprise this ends badly.
The full backtrace is: