
Error using 'loadmat' with h5py 3.0 #102

Closed
Blubbaa opened this issue Nov 5, 2020 · 7 comments

@Blubbaa

Blubbaa commented Nov 5, 2020

I have recently upgraded to h5py 3.0.0, as I need some of its new features. As #101 also pointed out, hdf5storage is currently broken with h5py 3.0.0. However, using the master branch (v2.0) does not fix it for me. I am adding an example here, since I frequently load v7.3 .mat files from Matlab.

The following code raises a ValueError, which is actually hidden if you supply a list of variable_names. After some debugging and reading the h5py 3.0 changelog, I still don't fully understand what is going wrong. It seems to fail while reading an attribute named 'MATLAB_fields' from the file.

Example

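import h5py
import hdf5storage

print("h5py version: ", h5py.__version__)
print("hdf5storage version: ", hdf5storage.__version__)

file_name = r'test_file.mat'
# variable_names=None loads every variable in the file; this is the call that fails
file_dict = hdf5storage.loadmat(file_name, appendmat=False, variable_names=None)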

Output

h5py version:  3.0.0
hdf5storage version:  0.2
---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

<ipython-input-15-4a4156bc29cd> in <module>
      7 
      8 file_name = r'./data/test_file.mat'
----> 9 file_dict = hdf5storage.loadmat(file_name, appendmat=False, variable_names=None)
     10 
     11 

c:\users\jonas\documents\phd\venv\lib\site-packages\hdf5storage\__init__.py in loadmat(file_name, mdict, appendmat, variable_names, marshaller_collection, **keywords)
   2557         with File(filename, writable=False, options=options) as f:
   2558             if variable_names is None:
-> 2559                 data = {pathesc.unescape_path(k): v for k, v in f.items()}
   2560             else:
   2561                 data = dict()

c:\users\jonas\documents\phd\venv\lib\site-packages\hdf5storage\__init__.py in <dictcomp>(.0)
   2557         with File(filename, writable=False, options=options) as f:
   2558             if variable_names is None:
-> 2559                 data = {pathesc.unescape_path(k): v for k, v in f.items()}
   2560             else:
   2561                 data = dict()

C:\Program Files\Python38\lib\_collections_abc.py in __iter__(self)
    742     def __iter__(self):
    743         for key in self._mapping:
--> 744             yield (key, self._mapping[key])
    745 
    746 ItemsView.register(dict_items)

c:\users\jonas\documents\phd\venv\lib\site-packages\hdf5storage\__init__.py in __getitem__(self, path)

-> 2053         return self.reads((path, ))[0]
   2054 
   2055     def __setitem__(self, path, data):

c:\users\jonas\documents\phd\venv\lib\site-packages\hdf5storage\__init__.py in reads(self, paths)
   1919                         + groupname + '.')
   1920                 # Hand off everything to the low level reader.
-> 1921                 datas.append(utilities.read_data(self._file,
   1922                                                  self._file[groupname],
   1923                                                  targetname,

c:\users\jonas\documents\phd\venv\lib\site-packages\hdf5storage\utilities.py in read_data(f, grp, name, options, dsetgrp)
    210     # Get all attributes with values.
    211     defaultfactory = type(None)
--> 212     attributes = collections.defaultdict(defaultfactory,
    213                                          dsetgrp.attrs.items())
    214 

c:\users\jonas\documents\phd\venv\lib\site-packages\h5py\_hl\base.py in __iter__(self)
    430         with phil:
    431             for key in self._mapping:
--> 432                 yield (key, self._mapping.get(key))
    433 
    434 

C:\Program Files\Python38\lib\_collections_abc.py in get(self, key, default)
    658         'D.get(k[,d]) -> D[k] if k in D, else d.  d defaults to None.'
    659         try:
--> 660             return self[key]
    661         except KeyError:
    662             return default

h5py\_objects.pyx in h5py._objects.with_phil.wrapper()

h5py\_objects.pyx in h5py._objects.with_phil.wrapper()

c:\users\jonas\documents\phd\venv\lib\site-packages\h5py\_hl\attrs.py in __getitem__(self, name)
     75 
     76         arr = numpy.ndarray(shape, dtype=dtype, order='C')
---> 77         attr.read(arr, mtype=htype)
     78 
     79         string_info = h5t.check_string_dtype(dtype)

h5py\_objects.pyx in h5py._objects.with_phil.wrapper()

h5py\_objects.pyx in h5py._objects.with_phil.wrapper()

h5py\h5a.pyx in h5py.h5a.AttrID.read()

h5py\_proxy.pyx in h5py._proxy.attr_rw()

h5py\_conv.pyx in h5py._conv.vlen2ndarray()

h5py\_conv.pyx in h5py._conv.conv_vlen2ndarray()

ValueError: data type must provide an itemsize
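
One way to see exactly which attribute triggers this, outside of hdf5storage, is to walk the file with h5py and try to read every attribute, recording the ones that raise. A minimal sketch, assuming only that h5py is installed; find_unreadable_attrs is a hypothetical helper, not part of hdf5storage or h5py:

```python
import h5py


def find_unreadable_attrs(path):
    """Return (object path, attribute name, error) for every attribute
    that h5py fails to read in the HDF5 file at `path`."""
    failures = []

    def check(obj_name, obj):
        # attrs.keys() only lists names; attrs[key] actually reads the value,
        # which is the step that raised ValueError in the traceback above.
        for key in obj.attrs.keys():
            try:
                obj.attrs[key]
            except Exception as exc:  # e.g. ValueError on h5py 3.0/3.1
                failures.append((obj_name, key, f"{type(exc).__name__}: {exc}"))
        # Returning None tells visititems() to keep walking.

    with h5py.File(path, "r") as f:
        check("/", f)  # visititems() does not visit the root group itself
        f.visititems(check)
    return failures
```

Usage: `for obj, attr, err in find_unreadable_attrs('test_file.mat'): print(obj, attr, err)`. If the diagnosis in this thread is right, the entries reported on h5py 3.0/3.1 should be attributes named 'MATLAB_fields'.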

@kb-

kb- commented Jan 15, 2021

savemat is also broken with h5py 3.x. The following code stopped working:
hdf5storage.savemat(file, data, format='7.3', oned_as='row', store_python_metadata=True, matlab_compatible=True)

Reverting h5py to 2.10.0 makes it work again, with the following warning:

\lib\site-packages\hdf5storage\__init__.py:1234: H5pyDeprecationWarning: The default file mode will change to 'r' (read-only) in h5py 3.0. To suppress this warning, pass the mode you need to h5py.File(), or set the global default h5.get_config().default_file_mode, or set the environment variable H5PY_DEFAULT_READONLY=1. Available modes are: 'r', 'r+', 'w', 'w-'/'x', 'a'. See the docs for details.
f = h5py.File(filename)
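
The warning shows hdf5storage calling h5py.File(filename) without a mode. h5py 3.0 changed that default to 'r' (read-only), so a write through such a call is no longer allowed; this thread does not confirm that this is what breaks savemat, though. For code that opens HDF5 files with h5py directly, the warning's own advice is to pass the mode explicitly. A minimal sketch; write_array and read_array are hypothetical helpers:

```python
import h5py
import numpy as np


def write_array(filename, name, data):
    # mode='a': read/write if the file exists, create it otherwise.
    # Passing it explicitly behaves the same on h5py 2.10 and 3.x.
    with h5py.File(filename, mode="a") as f:
        if name in f:
            del f[name]  # replace any existing dataset of the same name
        f.create_dataset(name, data=np.asarray(data))


def read_array(filename, name):
    # mode='r': read-only; never creates or modifies the file.
    with h5py.File(filename, mode="r") as f:
        return f[name][()]
```

This does not change hdf5storage's own internal call; it only shows the pattern the warning recommends.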

@frejanordsiek
Owner

frejanordsiek commented Feb 17, 2021

Sorry I have taken so long to get around to this.

The problem appears to be either a backwards-incompatible change in h5py or a bug. Specifically, it comes up when reading the 'MATLAB_fields' Attribute, which has a rather unusual type: a variable-length sequence of 1-character ASCII strings. Such an Attribute can be written, but it can no longer be read in any way, except probably through h5py's low-level API, which is no longer documented.

The bug shows up if one does the following to make an Attribute with the same type:

>>> import numpy, h5py
>>> dt = h5py.vlen_dtype(numpy.dtype('S1'))
>>> a = numpy.empty((1, ), dtype=dt)
>>> a[0] = numpy.array([b'a', b'b'], dtype='S1')
>>> f = h5py.File('data.h5', mode='a')
>>> f.attrs.create('test', a)
>>> f.attrs['test']

The output of h5dump data.h5 is:

HDF5 "data.h5" {
GROUP "/" {
   ATTRIBUTE "test" {
      DATATYPE  H5T_VLEN { H5T_STRING {
         STRSIZE 1;
         STRPAD H5T_STR_NULLPAD;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }}
      DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
      DATA {
      (0): ("a", "b")
      }
   }
}
}
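
To check whether an existing file (for example a v7.3 .mat file) contains an Attribute of this type, its datatype can be inspected through h5py's low-level AttrID object without reading the data, since the traceback above fails in the read step, after the dtype has already been determined. A minimal sketch; describe_attr_type is a hypothetical helper, and it only reports the type. It is not a workaround for reading the values:

```python
import h5py


def describe_attr_type(obj, name):
    """Report an attribute's stored type without reading its values."""
    attr_id = obj.attrs.get_id(name)  # low-level h5py.h5a.AttrID
    is_vlen = attr_id.get_type().get_class() == h5py.h5t.VLEN
    return {
        "is_vlen": is_vlen,
        "shape": attr_id.shape,
        # Element dtype of a variable-length type (e.g. S1), else None.
        "base_dtype": h5py.check_vlen_dtype(attr_id.dtype),
    }
```

For the 'test' attribute created above, this should report a variable-length type with shape (1,) and a 1-byte string element dtype.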

I am going to bring this up with the h5py developers and see what can be done about it, including whether there is a good workaround using the low-level API (the rawer libhdf5 bindings).

@sethrj

sethrj commented Feb 18, 2021

See h5py/h5py#1817

@frejanordsiek
Copy link
Owner

Workarounds added in commit 3008efs for the main branch and commit a63128b for the 0.1.x branch. The package should now work for h5py 3.0 and 3.1. I will be uploading version 0.1.16 to PyPI shortly.

@frejanordsiek
Copy link
Owner

Fixed for 32-bit little-endian systems in commit 9f021ee for the 0.1.x branch and commit c8a306e for the main branch. I still don't know whether it works on big-endian systems.
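
Since whether the fix applies depends on pointer width and byte order, it may help to include both when reporting results. A small standard-library sketch; platform_summary is a hypothetical helper:

```python
import platform
import struct
import sys


def platform_summary():
    """Pointer width, byte order and machine type of this interpreter."""
    return {
        "bits": struct.calcsize("P") * 8,  # size of a C pointer, in bits
        "byteorder": sys.byteorder,        # 'little' or 'big'
        "machine": platform.machine(),     # e.g. 'x86_64', 'AMD64', 'i686'
    }
```

A "byteorder" of "big" is the case described above as still untested.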

@frejanordsiek
Copy link
Owner

The commits fixing the issue on 32-bit systems had a bug. More recent commits fix that.

@frejanordsiek
Copy link
Owner

Just released version 0.1.17 on PyPI which includes the fix.
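
To check whether an installed 0.1.x release is at least the one announced here, its version string can be compared against 0.1.17 (0.1.16 carried the initial h5py 3.0/3.1 workaround; 0.1.17 also includes the 32-bit fix). A sketch using only the standard library (importlib.metadata needs Python 3.8+); the helper names are hypothetical, and the check returns None outside the 0.1.x series, because the main-branch fixes are identified in this thread by commit, not by release number:

```python
import re
from importlib.metadata import PackageNotFoundError, version

# First 0.1.x release announced in this thread with all of the fixes.
FIXED_IN_01X = (0, 1, 17)


def parse_release(v):
    """'0.1.17.post1' -> (0, 1, 17); keeps the leading numeric components."""
    m = re.match(r"\d+(?:\.\d+)*", v)
    return tuple(int(p) for p in m.group(0).split(".")) if m else ()


def has_thread_fixes(v):
    """True/False for 0.1.x versions; None when this thread can't tell."""
    rel = parse_release(v)
    if rel[:2] != (0, 1):
        return None  # e.g. 0.2 from the main branch
    return rel >= FIXED_IN_01X


def installed_status():
    """Apply has_thread_fixes() to the installed hdf5storage, if any."""
    try:
        return has_thread_fixes(version("hdf5storage"))
    except PackageNotFoundError:
        return None
```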
