[ppc64le] test suite failing on power9 #1244

Closed
kif opened this issue Jun 19, 2019 · 10 comments · Fixed by #1319

Comments

@kif
Contributor

kif commented Jun 19, 2019

As suggested by @takluyver, here are the results of the test suite running on an IBM Power9 computer running Ubuntu 18.04. Python is 3.6; all modules were pip-installed in a venv (and recompiled, as no binary wheels exist yet for this platform), and HDF5 1.10.5 was compiled from source.

Most failing tests are related in one way or another to "float128", which has just been deactivated in #1243. They should probably be marked as "xfailed".
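One possible way to mark such tests as expected failures on a given platform, sketched with the standard-library unittest module. The helper name `xfail_on` and the tuple `LONG_DOUBLE_XFAIL_MACHINES` are hypothetical and are not part of h5py; the test suite may well use a different mechanism.

```python
import platform
import unittest

# Hypothetical list of machines where long double round-tripping is known to fail.
LONG_DOUBLE_XFAIL_MACHINES = ("ppc64le",)


def xfail_on(machines, machine=None):
    """Return a decorator that marks a test as an expected failure on `machines`.

    `machine` defaults to platform.machine(); it is a parameter only so the
    behaviour can be exercised on any host.
    """
    machine = platform.machine() if machine is None else machine
    if machine in machines:
        return unittest.expectedFailure
    # On other machines, leave the test untouched.
    return lambda test: test
```

A test would then be decorated with `@xfail_on(LONG_DOUBLE_XFAIL_MACHINES)`, so it still runs everywhere but only counts as an expected failure on the listed machines.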

(venv-kieffer_system) test@power9:~/HDF5/h5py/build/lib.linux-ppc64le-3.6$ python
Python 3.6.7 (default, Oct 22 2018, 11:32:17) 
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import h5py
>>> h5py.run_tests()
=================================================================================== test session starts ===================================================================================
platform linux -- Python 3.6.7, pytest-4.4.2, py-1.8.0, pluggy-0.11.0
rootdir: /home/test/HDF5/h5py, inifile: pytest.ini
collected 537 items                                                                                                                                                                       

h5py/tests/hl/test_attribute_create.py ...                                                                                                                                          [  0%]
h5py/tests/hl/test_completions.py ..                                                                                                                                                [  0%]
h5py/tests/hl/test_dataset_getitem.py .................................................x....................................                                                        [ 16%]
h5py/tests/hl/test_dataset_swmr.py ssss..........                                                                                                                                   [ 19%]
h5py/tests/hl/test_datatype.py .........F.....ssss                                                                                                                                  [ 23%]
h5py/tests/hl/test_deprecation.py .                                                                                                                                                 [ 23%]
h5py/tests/hl/test_dims_dimensionproxy.py .                                                                                                                                         [ 23%]
h5py/tests/hl/test_file.py .................                                                                                                                                        [ 26%]
h5py/tests/hl/test_filters.py .s.                                                                                                                                                   [ 27%]
h5py/tests/hl/test_threads.py ..                                                                                                                                                    [ 27%]
h5py/tests/hl/test_vds/test_highlevel_vds.py ......                                                                                                                                 [ 28%]
h5py/tests/hl/test_vds/test_lowlevel_vds.py ...                                                                                                                                     [ 29%]
h5py/tests/hl/test_vds/test_virtual_source.py ................                                                                                                                      [ 32%]
h5py/tests/old/test_attrs.py ...............s                                                                                                                                       [ 35%]
h5py/tests/old/test_attrs_data.py ......F.............                                                                                                                              [ 38%]
h5py/tests/old/test_base.py .....                                                                                                                                                   [ 39%]
h5py/tests/old/test_dataset.py ......F..x...............................s.................................x..........................                                               [ 58%]
h5py/tests/old/test_datatype.py ..                                                                                                                                                  [ 59%]
h5py/tests/old/test_dimension_scales.py ....s................                                                                                                                       [ 63%]
h5py/tests/old/test_file.py .............ss...ssss....................s.........                                                                                                    [ 72%]
h5py/tests/old/test_file_image.py ..                                                                                                                                                [ 73%]
h5py/tests/old/test_group.py .................................ssssss...................................................                                                             [ 89%]
h5py/tests/old/test_h5.py .....                                                                                                                                                     [ 90%]
h5py/tests/old/test_h5d_direct_chunk_write.py .                                                                                                                                     [ 91%]
h5py/tests/old/test_h5f.py .....                                                                                                                                                    [ 91%]
h5py/tests/old/test_h5p.py ........                                                                                                                                                 [ 93%]
h5py/tests/old/test_h5t.py ....                                                                                                                                                     [ 94%]
h5py/tests/old/test_objects.py ...                                                                                                                                                  [ 94%]
h5py/tests/old/test_selections.py ....                                                                                                                                              [ 95%]
h5py/tests/old/test_slicing.py ........................                                                                                                                             [100%]

======================================================================================== FAILURES =========================================================================================
__________________________________________________________________________ TestOffsets.test_float_round_tripping __________________________________________________________________________

self = <h5py.tests.hl.test_datatype.TestOffsets testMethod=test_float_round_tripping>

    @ut.skipIf(
        platform.machine() in x86_32_BIT_SYSTEMS,
        'Test fails on i386, need to sort out long double FIX THIS')
    def test_float_round_tripping(self):
        dtypes = set(f for f in np.typeDict.values()
                     if (np.issubdtype(f, np.floating) or
                         np.issubdtype(f, np.complexfloating))
                     )
    
        dtype_dset_map = {str(j): d
                          for j, d in enumerate(dtypes)}
    
        fname = self.mktemp()
    
        with h5py.File(fname, 'w') as f:
            for n, d in dtype_dset_map.items():
                data = np.arange(10,
                                 dtype=d)
    
                f.create_dataset(n, data=data)
    
        with h5py.File(fname, 'r') as f:
            for n, d in dtype_dset_map.items():
                ldata = f[n][:]
>               self.assertEqual(ldata.dtype, d)
E               AssertionError: dtype('<f8') != <class 'numpy.float128'>

h5py/tests/hl/test_datatype.py:307: AssertionError
__________________________________________________________________________________ TestTypes.test_float ___________________________________________________________________________________

self = <h5py.tests.old.test_attrs_data.TestTypes testMethod=test_float>

    def test_float(self):
        """ Storage of floating point types """
        dtypes = tuple(np.dtype(x) for x in ('<f4','>f4','<f8','>f8'))
    
        for dt in dtypes:
            data = np.ndarray((1,), dtype=dt)
            data[...] = 42.3
            self.f.attrs['x'] = data
            out = self.f.attrs['x']
>           self.assertEqual(out.dtype, dt)
E           AssertionError: dtype('>f16') != dtype('>f8')

h5py/tests/old/test_attrs_data.py:112: AssertionError
____________________________________________________________________________ TestCreateShape.test_long_double _____________________________________________________________________________

self = <h5py.tests.old.test_dataset.TestCreateShape testMethod=test_long_double>

    def test_long_double(self):
        """ Confirm that the default dtype is float """
        dset = self.f.create_dataset('foo', (63,), dtype=np.longdouble)
>       self.assertEqual(dset.dtype, np.longdouble)
E       AssertionError: dtype('<f8') != <class 'numpy.float128'>

h5py/tests/old/test_dataset.py:94: AssertionError
==================================================================================== warnings summary =====================================================================================
build/lib.linux-ppc64le-3.6/h5py/tests/hl/test_dataset_getitem.py::Test1DZeroFloat::test_mask
  /home/test/HDF5/h5py/build/lib.linux-ppc64le-3.6/h5py/_hl/dataset.py:534: DeprecationWarning: The truth value of an empty array is ambiguous. Returning False, but in future this will result in an error. Use `array.size > 0` to check that an array is not empty.
    if args == (Ellipsis,) or args == tuple():

-- Docs: https://docs.pytest.org/en/latest/warnings.html
========================================================= 3 failed, 506 passed, 25 skipped, 3 xfailed, 1 warnings in 3.97 seconds =========================================================
1
@takluyver
Member

Thanks. That's a bit confusing. It seems that creating a dataset with '>f8' uses a 128-bit float (second failure), but creating with np.longdouble uses a 64-bit float (3rd and probably 1st).

@kif
Contributor Author

kif commented Jun 19, 2019

I noticed the same... Some tests have an easy workaround and I am preparing a PR, but the float64 attribute that ends up as a long double hides something deeper.

@kif
Contributor Author

kif commented Jun 20, 2019

Hi Thomas,

I confirm that setting an attribute with an array of type "big-endian float64" triggers a bug.

import numpy, h5py
a=numpy.array([43], dtype=">f8")                                                                                                                          

In [5]: a                                                                                                                                                                            
Out[5]: array([43.])

In [6]: a.dtype                                                                                                                                                                      
Out[6]: dtype('>f8')

In [7]: f=h5py.File("test.h5", "a")                                                                                                                                                  

In [8]: f.attrs["test>f8"] = a[0]                                                                                                                                                    

In [9]: f.attrs["test>f8"]                                                                                                                                                           
Out[9]: 43.0

In [10]: f.attrs["test>f8"].dtype                                                                                                                                                    
Out[10]: dtype('float64')

In [11]: f.attrs["test>f8_1"]=a                                                                                                                                                      

In [12]: f.attrs["test>f8_1"]                                                                                                                                                        
Out[12]: array([-1.10439183e+232], dtype=float128)

I can provide a PR for the other 2 failing tests, but this one deserves further investigation.
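For reference, a big-endian float64 ('>f8') differs from the little-endian one only in byte order, so a correct round trip should preserve both the 8-byte width and the value. A minimal standard-library sketch (this is not h5py code) of how the 43.0 used above is laid out in each order:

```python
import struct

value = 43.0

# '>d' is a big-endian IEEE 754 binary64, the same layout as numpy's '>f8'.
big = struct.pack(">d", value)
# '<d' is the little-endian layout, numpy's '<f8'.
little = struct.pack("<d", value)

print(big.hex())     # 4045800000000000
print(little.hex())  # 0000000000804540

# Both encodings are 8 bytes wide and are byte-reversed copies of each other.
assert len(big) == len(little) == 8
assert big == little[::-1]

# Decoding with the matching byte order recovers the value exactly.
roundtrip = struct.unpack(">d", big)[0]
print(roundtrip)     # 43.0
```

Anything that comes back 16 bytes wide, as in Out[12], has therefore changed type somewhere along the way rather than just byte order.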

@aragilar
Member

One thing I did notice is that the values for ppc64le are different from those for ppc64:

h5py/h5py/h5t.pyx

Lines 308 to 317 in 369d9cf

    if ftype_ == np.longdouble and MACHINE == 'ppc64':
        # values reported by hdf5
        nmant = 116
        maxexp = 1024
        minexp = -1022
    elif ftype_ == np.longdouble and MACHINE == 'ppc64le':
        # values reported by hdf5
        nmant = 52
        maxexp = 1024
        minexp = -1022

I'd expect the mantissa size to be the same. Could that be the issue?

@kif What format is np.float128 and np.longdouble (e.g. double-double, ieee quad)?

@kif
Contributor Author

kif commented Jun 20, 2019

Hi James,
I suspect that the bug lies in the difference between IBM Power8 and Power9: both are ppc64le, but they implement different versions of the OpenPOWER ISA (2 and 3 respectively), and the C long double differs. I have asked IBM for confirmation and hope to get some guidance on the issue. Together with Arm64, the Power9 is the first processor to effectively implement IEEE 754 128-bit floats in hardware.

Unfortunately, numpy sees the numpy.float128 as numpy.longdouble (in conformance with numpy's doc) which is:

In [1]: import numpy                                                                                                                                                                 

In [2]: numpy.finfo("float128")                                                                                                                                                      
Out[2]: finfo(resolution=1e-31, min=-1.79769313486231580793728971405301e+308, max=1.79769313486231580793728971405301e+308, dtype=float128)

In [3]: numpy.finfo("float128").nmant                                                                                                                                                
Out[3]: 105

In [4]: numpy.finfo("float128").nexp                                                                                                                                                 
Out[4]: 11
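To relate these numbers to the question about the format: a mantissa of 105 bits with an 11-bit exponent is what one would expect for the IBM double-double layout rather than IEEE quad, which has a 15-bit exponent. A rough lookup, assuming the (nmant, nexp) pairs that numpy.finfo usually reports for the common long double layouts; the function name `classify_long_double` is made up for this illustration.

```python
def classify_long_double(nmant, nexp):
    """Guess the long double layout from numpy.finfo's nmant and nexp fields."""
    known = {
        (52, 11): "IEEE binary64 (long double is the same as double)",
        (63, 15): "x87 80-bit extended precision",
        (105, 11): "IBM double-double (pair of binary64 values)",
        (112, 15): "IEEE binary128 (quad precision)",
    }
    # Anything else is reported as unknown rather than guessed at.
    return known.get((nmant, nexp), "unrecognised format")


# Values taken from the finfo output in this comment.
print(classify_long_double(105, 11))
```

With NumPy at hand, the inputs would come from `info = numpy.finfo(numpy.longdouble)`, passing `info.nmant` and `info.nexp`.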

@terjeros

Tried to import 2.10.0 in Fedora rawhide, got

=================================== FAILURES ===================================
_____________________________ TestTypes.test_float _____________________________
self = <h5py.tests.test_attrs_data.TestTypes testMethod=test_float>
    def test_float(self):
        """ Storage of floating point types """
        dtypes = tuple(np.dtype(x) for x in ('<f4', '>f4', '>f8', '<f8'))
    
        for dt in dtypes:
            data = np.ndarray((1,), dtype=dt)
            data[...] = 42.3
            self.f.attrs['x'] = data
            out = self.f.attrs['x']
            # TODO: Clean up after issue addressed !
            print("dtype: ", out.dtype, dt)
            print("value: ", out, data)
>           self.assertEqual(out.dtype, dt)
E           AssertionError: dtype('>f16') != dtype('>f8')
h5py/tests/test_attrs_data.py:119: AssertionError
----------------------------- Captured stdout call -----------------------------
dtype:  float32 float32
value:  [42.3] [42.3]
dtype:  >f4 >f4
value:  [42.3] [42.3]
dtype:  >f16 >f8
value:  [0.] [42.3]

> 
=================================== FAILURES ===================================
____________________ TestOffsets.test_float_round_tripping _____________________
self = <h5py.tests.test_dtype.TestOffsets testMethod=test_float_round_tripping>
    def test_float_round_tripping(self):
        dtypes = set(f for f in np.typeDict.values()
                     if (np.issubdtype(f, np.floating) or
                         np.issubdtype(f, np.complexfloating)))
    
        if platform.machine() in UNSUPPORTED_LONG_DOUBLE:
>           dtype_dset_map = {str(j): d
                              for j, d in enumerate(dtypes)
                              if d not in (np.float128, np.complex256)}
h5py/tests/test_dtype.py:296: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
.0 = <enumerate object at 0xec364608>
    dtype_dset_map = {str(j): d
                      for j, d in enumerate(dtypes)
>                     if d not in (np.float128, np.complex256)}
E   AttributeError: module 'numpy' has no attribute 'float128'
h5py/tests/test_dtype.py:298: AttributeError

@tacaswell
Member

On ppc64le we are upcasting 64 bit big endian floats to 128bit big endian floats and then making the value 0?

It looks like we are seeing similar failures on conda-forge trying to build the 2.9 packages on ppc64le

https://travis-ci.org/conda-forge/h5py-feedstock/builds/577406044?utm_source=github_status&utm_medium=notification

https://travis-ci.org/conda-forge/h5py-feedstock/jobs/577406047

@tacaswell
Member

We can reproduce the ppc64le failures on travis! https://travis-ci.org/h5py/h5py/jobs/584366489#L11992-L12013

It fails a little bit differently:

_____________________________ TestTypes.test_float _____________________________

self = <h5py.tests.test_attrs_data.TestTypes testMethod=test_float>

    def test_float(self):
        """ Storage of floating point types """
        dtypes = tuple(np.dtype(x) for x in ('<f4', '>f4', '>f8', '<f8'))
    
        for dt in dtypes:
            data = np.ndarray((1,), dtype=dt)
            data[...] = 42.3
            self.f.attrs['x'] = data
            out = self.f.attrs['x']
            # TODO: Clean up after issue addressed !
            print("dtype: ", out.dtype, dt)
            print("value: ", out, data)
>           self.assertEqual(out.dtype, dt)
E           AssertionError: dtype('>f16') != dtype('>f8')

py37-test-deps/lib/python3.7/site-packages/h5py/tests/test_attrs_data.py:119: AssertionError
----------------------------- Captured stdout call -----------------------------
dtype:  float32 float32
value:  [42.3] [42.3]
dtype:  >f4 >f4
value:  [42.3] [42.3]
dtype:  >f16 >f8
value:  [0.000000 4] [42.3]

@tacaswell
Member

I am pretty sure the issue is in

h5py/h5py/h5t.pyx

Lines 303 to 322 in fd0753a

cdef (int, int, int) _correct_float_info(ftype_, finfo):
    nmant = finfo.nmant
    maxexp = finfo.maxexp
    minexp = finfo.minexp
    # workaround for numpy's buggy finfo on float128 on ppc64 archs
    if ftype_ == np.longdouble and MACHINE == 'ppc64':
        # values reported by hdf5
        nmant = 116
        maxexp = 1024
        minexp = -1022
    elif ftype_ == np.longdouble and MACHINE == 'ppc64le':
        # values reported by hdf5
        nmant = 52
        maxexp = 1024
        minexp = -1022
    elif nmant == 63 and finfo.nexp == 15:
        # This is an 80-bit float, correct mantissa size
        nmant += 1
    return nmant, maxexp, minexp

h5py/h5py/h5t.pyx

Lines 1369 to 1397 in fd0753a

def _get_float_dtype_to_hdf5():
    float_le = {}
    float_be = {}
    h5_be_list = [IEEE_F16BE, IEEE_F32BE, IEEE_F64BE, IEEE_F128BE,
                  LDOUBLE_BE]
    h5_le_list = [IEEE_F16LE, IEEE_F32LE, IEEE_F64LE, IEEE_F128LE]
    if MACHINE != 'ppc64le':
        h5_le_list.append(LDOUBLE_LE)
    for ftype_, finfo, size in _available_ftypes:
        nmant, maxexp, minexp = _correct_float_info(ftype_, finfo)
        for h5type in h5_be_list:
            spos, epos, esize, mpos, msize = h5type.get_fields()
            ebias = h5type.get_ebias()
            if (finfo.iexp == esize and nmant == msize and
                    (maxexp - 1) == ebias
                ):
                float_be[ftype_] = h5type
        for h5type in h5_le_list:
            spos, epos, esize, mpos, msize = h5type.get_fields()
            ebias = h5type.get_ebias()
            if (finfo.iexp == esize and nmant == msize and
                    (maxexp - 1) == ebias
                ):
                float_le[ftype_] = h5type
    if ORDER_NATIVE == H5T_ORDER_LE:
        float_nt = dict(float_le)
    else:
        float_nt = dict(float_be)
    return float_le, float_be, float_nt
or

h5py/h5py/h5t.pyx

Lines 1050 to 1069 in fd0753a

cdef object py_dtype(self):
    # Translation function for floating-point types
    order = _order_map[self.get_order()]    # string with '<' or '>'
    s_offset, e_offset, e_size, m_offset, m_size = self.get_fields()
    e_bias = self.get_ebias()
    # Handle non-standard exponent and mantissa sizes.
    for ftype_, finfo, size in _available_ftypes:
        nmant, maxexp, minexp = _correct_float_info(ftype_, finfo)
        if (size >= self.get_size() and m_size <= nmant and
            (2**e_size - e_bias - 1) <= maxexp and (1 - e_bias) >= minexp):
            new_dtype = np.dtype(ftype_).newbyteorder(order)
            break
    else:
        raise ValueError('Insufficient precision in available types to ' +
                         'represent ' + str(self.get_fields()))
    return new_dtype

We now have CI via travis to test this, but if someone has a ppc64le machine they can debug on it would likely go much faster.
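To make the read side easier to reason about without a ppc64le machine, here is a pure-Python sketch of the selection rule used in py_dtype above. The `FloatInfo` tuples stand in for `_available_ftypes`; their ordering (smallest type first) and the long double entry, which uses the corrected ppc64le values with an assumed 16-byte size, are assumptions made for this illustration.

```python
from collections import namedtuple

# Stand-in for one entry of _available_ftypes plus its corrected finfo values.
FloatInfo = namedtuple("FloatInfo", "name size nmant maxexp minexp")

AVAILABLE = [
    FloatInfo("float16", 2, 10, 16, -14),
    FloatInfo("float32", 4, 23, 128, -126),
    FloatInfo("float64", 8, 52, 1024, -1022),
    # Assumption: ppc64le long double after _correct_float_info, 16 bytes wide.
    FloatInfo("longdouble", 16, 52, 1024, -1022),
]


def pick_dtype(h5_size, e_size, m_size, e_bias):
    """Return the first available type that is wide and precise enough."""
    for info in AVAILABLE:
        if (info.size >= h5_size and m_size <= info.nmant and
                (2 ** e_size - e_bias - 1) <= info.maxexp and
                (1 - e_bias) >= info.minexp):
            return info.name
    raise ValueError("Insufficient precision in available types")


# A genuine 8-byte IEEE double maps to float64 ...
print(pick_dtype(8, 11, 52, 1023))   # float64
# ... but the same fields stored in a 16-byte type can only map to longdouble.
print(pick_dtype(16, 11, 52, 1023))  # longdouble
```

Under these assumptions the reader returns '>f16' whenever the stored type is 16 bytes wide, which would be consistent with the file already holding the wrong type before it is read.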

@tacaswell tacaswell added this to To do in h5py code camp Sep 18, 2019
@tacaswell tacaswell moved this from To do to In progress in h5py code camp Sep 19, 2019
@kif
Contributor Author

kif commented Sep 19, 2019

I just transferred the buggy file created this way and found that the error is more likely in the writing than in the reading.
The faulty attribute cannot be read on amd64 because numpy.float128 is missing there. On arm64, which supports float128, the value is read as ">f16" with a value of zero.

kif pushed a commit to kif/h5py that referenced this issue Sep 19, 2019
Issue: ">f8" was written as ">f16" on ppc64le
Actually the dict doing the mapping between numpy and hdf5 was wrong.
Stop iterating when first matching is found, avoiding "f16" to replace "f8"
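
The effect described in this commit message can be illustrated with a small stand-alone sketch. The `H5Float` tuples are hypothetical stand-ins for the HDF5 type objects, and the long double entry assumes that on ppc64le HDF5 reports float64-like fields (nmant = 52, maxexp = 1024) for a 16-byte type, as the "values reported by hdf5" comment quoted earlier suggests. It is not the actual patch.

```python
from collections import namedtuple

# Hypothetical stand-in for an HDF5 float type: only the fields the loop compares.
H5Float = namedtuple("H5Float", "name size esize msize ebias")

IEEE_F32BE = H5Float("IEEE_F32BE", 4, 8, 23, 127)
IEEE_F64BE = H5Float("IEEE_F64BE", 8, 11, 52, 1023)
# Assumption: long double on ppc64le looks like float64 apart from its size.
LDOUBLE_BE = H5Float("LDOUBLE_BE", 16, 11, 52, 1023)

CANDIDATES = [IEEE_F32BE, IEEE_F64BE, LDOUBLE_BE]


def _matches(iexp, nmant, maxexp, h5type):
    """Same comparison as in _get_float_dtype_to_hdf5."""
    return (iexp == h5type.esize and nmant == h5type.msize and
            (maxexp - 1) == h5type.ebias)


def last_match(iexp, nmant, maxexp):
    """Old behaviour: keep looping, so a later match overwrites an earlier one."""
    chosen = None
    for h5type in CANDIDATES:
        if _matches(iexp, nmant, maxexp, h5type):
            chosen = h5type
    return chosen


def first_match(iexp, nmant, maxexp):
    """Behaviour described in the commit: stop at the first match."""
    for h5type in CANDIDATES:
        if _matches(iexp, nmant, maxexp, h5type):
            return h5type
    return None


# numpy.finfo(numpy.float64) gives iexp = 11, nmant = 52, maxexp = 1024.
print(last_match(11, 52, 1024).name)   # LDOUBLE_BE
print(first_match(11, 52, 1024).name)  # IEEE_F64BE
```

This would also explain why only the big-endian case was affected: LDOUBLE_LE is left out of the little-endian list on ppc64le, so nothing could overwrite the IEEE_F64LE match there.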
@tacaswell tacaswell moved this from In progress to Done in h5py code camp Sep 19, 2019