Problem reading Matlab 4.5 file with scipy.io.loadmat #2452

Closed
msoos opened this issue May 6, 2013 · 10 comments

@msoos

commented May 6, 2013

When reading in a Matlab 4.5 file, I get the error:

In [1]: import scipy

In [2]: import scipy.io

In [3]: scipy.__version__
Out[3]: '0.12.0'

In [4]: a = scipy.io.loadmat('/home/soos/tmp/test.mat')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/home/soos/tmp/<ipython-input-4-214cad8644c0> in <module>()
----> 1 a = scipy.io.loadmat('/home/soos/tmp/test.mat')

/usr/local/lib/python2.7/dist-packages/scipy/io/matlab/mio.pyc in loadmat(file_name, mdict, appendmat, **kwargs)
    174     variable_names = kwargs.pop('variable_names', None)
    175     MR = mat_reader_factory(file_name, appendmat, **kwargs)
--> 176     matfile_dict = MR.get_variables(variable_names)
    177     if mdict is not None:
    178         mdict.update(matfile_dict)

/usr/local/lib/python2.7/dist-packages/scipy/io/matlab/mio4.pyc in get_variables(self, variable_names)
    390                 self.mat_stream.seek(next_position)
    391                 continue
--> 392             mdict[name] = self.read_var_array(hdr)
    393             self.mat_stream.seek(next_position)
    394             if variable_names:

/usr/local/lib/python2.7/dist-packages/scipy/io/matlab/mio4.pyc in read_var_array(self, header, process)
    367            `process`.
    368         '''
--> 369         return self._matrix_reader.array_from_header(header, process)
    370 
    371     def get_variables(self, variable_names=None):

/usr/local/lib/python2.7/dist-packages/scipy/io/matlab/mio4.pyc in array_from_header(self, hdr, process)
    135         mclass = hdr.mclass
    136         if mclass == mxFULL_CLASS:
--> 137             arr = self.read_full_array(hdr)
    138         elif mclass == mxCHAR_CLASS:
    139             arr = self.read_char_array(hdr)

/usr/local/lib/python2.7/dist-packages/scipy/io/matlab/mio4.pyc in read_full_array(self, hdr)
    199             res_j = self.read_sub_array(hdr, copy=False)
    200             return res + (res_j * 1j)
--> 201         return self.read_sub_array(hdr)
    202 
    203     def read_char_array(self, hdr):

/usr/local/lib/python2.7/dist-packages/scipy/io/matlab/mio4.pyc in read_sub_array(self, hdr, copy)
    174                          dtype=dt,
    175                          buffer=self.mat_stream.read(int(num_bytes)),
--> 176                          order='F')
    177         if copy:
    178             arr = arr.copy()

TypeError: buffer is too small for requested array

You can access the offending file (gzipped) here:

http://msoos.org/largefiles/test.mat.gz
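For readers who do not have the file, the final TypeError in the traceback can be reproduced in isolation. The sketch below assumes the call at the bottom of the traceback is a numpy.ndarray constructor that wraps the bytes read from the stream (the traceback only shows its dtype, buffer and order arguments); the helper name array_from_bytes is made up for illustration.

```python
import numpy as np


def array_from_bytes(raw, shape, dtype="<f8"):
    """Wrap raw bytes as a Fortran-ordered array without copying.

    Hypothetical helper: mirrors the dtype/buffer/order arguments visible
    in the traceback, assuming they belong to an np.ndarray(...) call.
    """
    return np.ndarray(shape=shape, dtype=np.dtype(dtype), buffer=raw, order="F")


full = bytes(8 * 4)   # enough bytes for 4 doubles
short = bytes(8 * 3)  # one double short of what the shape requires

ok = array_from_bytes(full, (4, 1))

try:
    array_from_bytes(short, (4, 1))
    error_message = None
except TypeError as exc:
    # NumPy refuses to build an array larger than the buffer it is given.
    error_message = str(exc)

print(ok.shape, error_message)
```

In other words, the header promised more elements than the stream could deliver, which is consistent with a truncated or malformed variable in the file.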

@rgommers

Member

commented May 6, 2013

Can you give us the actual Matlab version? 4.5 doesn't exist according to https://en.wikipedia.org/wiki/MATLAB, and if it did it would be from 1995.

@msoos

Author

commented May 6, 2013

Oops, sorry. The file was not actually produced by Matlab, but by PicoScope, a program that has a Matlab output mode: http://www.picotech.com/picoscope-oscilloscope-software.html. I was just guessing the Matlab version, I am sorry. But Matlab can read it, so it's supposed to be OK...

@rgommers

Member

commented May 6, 2013

Hmm, it's likely that it doesn't produce identical output to Matlab itself. If you load it in Matlab, then save it to a new .mat file, does it work?

@rgommers

Member

commented May 6, 2013

@matthew-brett any idea how sensitive io.matlab would be to small differences?

@matthew-brett

Contributor

commented May 6, 2013

Sensitive I guess :)

I would like sio.loadmat to load any file that Matlab can load, so I'll try to get this one to work. I'm on a sprint at the moment, but can get to it by the weekend.

@msoos

Author

commented May 6, 2013

Thanks for the quick responses! Importing into matlab and exporting again produces this file:

http://msoos.org/largefiles/test2.mat.gz

which can be read by scipy without any issues and contains the correct data as far as I can see. It would be awesome if you could fix the problem in scipy so it could import the original file, too.

@matthew-brett

Contributor

commented May 6, 2013

We'll give it a try ...

@matthew-brett

Contributor

commented May 13, 2013

I had a look. It turns out that your file has a matrix 'B' for which there is no data, or not enough data. Matlab appears to silently ignore it; Octave does this:

octave:1> load test.mat
error: load: reading matrix data for `B'
error: load: trouble reading binary file `test.mat'
octave:1> whos
Variables in the current scope:
   Attr Name            Size                     Bytes  Class
   ==== ====            ====                     =====  ===== 
        A         1785976x1                   14287808  double
        Length          1x1                          8  double
        Tinterval       1x1                          8  double
        Tstart          1x1                          8  double
        ans             1x26                        26  char

This is essentially what my version of Matlab does:

>> load test.mat
Error using load
Can't read file /home/mb312/tmp/test.mat.
 
>> whos
  Name                 Size               Bytes  Class     Attributes
  A              1785976x1             14287808  double              
  Length               1x1                    8  double              
  Tinterval            1x1                    8  double              
  Tstart               1x1                    8  double              
  ans                  1x11                  22  char

Scipy sees the variable names, including 'B', which appears to be last in the table:

In [10]: import scipy.io as sio
In [11]: sio.whosmat('test.mat')
Out[11]: 
[('A', (1785976, 1), 'double'),
 ('Tstart', (1, 1), 'double'),
 ('Tinterval', (1, 1), 'double'),
 ('Length', (1, 1), 'double'),
 ('B', (1785976, 1), 'double')]
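The diagnosis above (a header for 'B' that declares more data than the file contains) can be checked with the standard library alone. The following is a minimal sketch, not scipy's reader: it assumes the documented MAT4 layout of five 32-bit integers (type, mrows, ncols, imagf, namlen) followed by the null-terminated name and the data, and it only handles little-endian, non-sparse variables. The function names and the synthetic two-variable file are made up for illustration.

```python
import io
import struct

# Bytes per element for the "P" digit of the MAT4 type field
# (0=double, 1=single, 2=int32, 3=int16, 4=uint16, 5=uint8).
ELEMENT_SIZES = {0: 8, 1: 4, 2: 4, 3: 2, 4: 2, 5: 1}


def write_mat4_double(stream, name, values):
    """Append one little-endian, full, real double column vector (test data only)."""
    encoded = name.encode("ascii") + b"\x00"
    stream.write(struct.pack("<5i", 0, len(values), 1, 0, len(encoded)))
    stream.write(encoded)
    stream.write(struct.pack("<%dd" % len(values), *values))


def find_truncated(stream):
    """Yield (name, declared_bytes, available_bytes) for each variable header."""
    size = stream.seek(0, io.SEEK_END)
    stream.seek(0)
    while True:
        header = stream.read(20)
        if len(header) < 20:
            break  # end of file (or a cut-off header)
        mopt, mrows, ncols, imagf, namlen = struct.unpack("<5i", header)
        name = stream.read(namlen).rstrip(b"\x00").decode("ascii")
        p_digit = (mopt // 10) % 10
        declared = mrows * ncols * ELEMENT_SIZES[p_digit] * (2 if imagf else 1)
        start = stream.tell()
        available = max(0, min(declared, size - start))
        yield name, declared, available
        stream.seek(start + declared)  # jump to where the next header should be


# Synthetic file: two variables, with the last 16 bytes of 'B' cut off.
buf = io.BytesIO()
write_mat4_double(buf, "A", [1.0, 2.0, 3.0])
write_mat4_double(buf, "B", [4.0, 5.0, 6.0])
truncated = io.BytesIO(buf.getvalue()[:-16])

report = list(find_truncated(truncated))
for name, declared, available in report:
    status = "ok" if declared == available else "TRUNCATED"
    print(name, declared, available, status)
```

A variable whose available byte count is smaller than its declared count is the one a strict reader will fail on.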

The question is whether to return the correctly read variables and raise a warning, rather than raise an error in the reader. I really don't want to let unfixably malformed files pass without an error, because we may still be reading some files incorrectly, and we (and our users) need to know if so. So I would prefer not to change the behavior here, and suggest using:

In [12]: infos = sio.whosmat('test.mat')
In [13]: names, sizes, types = zip(*infos)
In [14]: names
Out[14]: ('A', 'Tstart', 'Tinterval', 'Length', 'B')
In [16]: res = sio.loadmat('test.mat', variable_names=list(names[:-1]))

Would that work for you?
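For anyone who wants to try this workaround without the original file, here is a sketch on a synthetic file. It assumes, as in this issue, that the damaged variable is the last one listed by whosmat; the helper name load_all_but_last, the temporary path, and the variable contents are made up for illustration. The exception raised by a plain loadmat call has differed between scipy versions (TypeError in the 0.12.0 traceback above), so the sketch catches it broadly.

```python
import os
import tempfile

import numpy as np
import scipy.io as sio


def load_all_but_last(path):
    """Load every variable that whosmat lists except the last one.

    Hypothetical helper: only appropriate when the last variable is the
    one known to be truncated.
    """
    names = [name for name, _shape, _type in sio.whosmat(path)]
    return sio.loadmat(path, variable_names=names[:-1])


# Build a synthetic MAT4 file whose last variable is cut short.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "truncated.mat")
sio.savemat(path, {"A": np.arange(5.0), "B": np.arange(100.0)}, format="4")
with open(path, "r+b") as fh:
    # 'B' holds 100 doubles (800 bytes); drop the final 400 bytes.
    fh.truncate(os.path.getsize(path) - 400)

# A plain load is expected to fail on the truncated variable.
try:
    sio.loadmat(path)
    full_load_error = None
except Exception as exc:  # exception type depends on the scipy version
    full_load_error = type(exc).__name__

res = load_all_but_last(path)
print(full_load_error, sorted(k for k in res if not k.startswith("__")))
```

Slicing off the last name is deliberately crude. If the damaged variable could be anywhere in the file, passing an explicit list of the names you need is the safer choice.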

@msoos

Author

commented May 14, 2013

Perfect! And sorry for the late answer. I will send this link to the PicoScope people so they can fix their own bug. Thank you very much for this.

A small note: maybe the error could be more verbose, possibly including the example code you posted above. After all, Matlab silently ignores the problem, so maybe scipy could tell the user how to skip the bad variable, too. At least this bug report and your time won't go to waste then :)

@matthew-brett

Contributor

commented May 18, 2013

Thanks for the feedback - I've raised a more helpful error in this pull request:

#2499

jnothman pushed a commit to jnothman/scipy that referenced this issue Jan 22, 2014

ENH: More helpful error reading bad mat4 file
Prompted by discussion here:

scipy#2452

Tell user what matrix they are reading and how they might get to useful
data nevertheless.