New regression test for reading compressed matlab streams. (Trac #918) #1445

Closed
scipy-gitbot opened this Issue Apr 25, 2013 · 3 comments

Projects

None yet

1 participant

@scipy-gitbot

Original ticket http://projects.scipy.org/scipy/ticket/918 on 2009-04-11 by @thouis, assigned to unknown.

The test case, "test_skip_variable" in the included version of test_mio.py, demonstrates a bug in the 0.7.0 version of mio.mat_reader_factory that becomes apparent when you use the factory's get_variables method to skip over a first large, compressed variable to read a variable that appears after that variable.

This bug is no longer present in the current development tree. This patch adds a regression test. Lee Kamentsky discovered the bug, wrote the test case and had developed a patch relative to 0.7.0 (no longer relevant, but available if needed).

Below is the message from Lee describing the bug.


The test case, "test_skip_variable" in the included version of test_mio.py, demonstrates a bug in the mio.mat_reader_factory that becomes apparent when you use the factory's get_variables method to skip over a first large, compressed variable to read a variable that appears after that variable. The current code base misplaces the file object read location within the compressed variable's data rather than at its end, causing an exception when the code attempts to read the second variable.

I've got a distilled test case, scipy/io/matlab/tests/data/test_skip_variable.mat (20K), which is just barely large enough to trigger the bug (happens for compressed object > 16384).

The code:

273 def matrix_getter_factory(self):
274 __ Returns reader for next matrix at top level __
275 tag = self.read_dtype(self.dtypes['tag_full'])
276 mdtype = tag['mdtype'].item()
277 byte_count = tag['byte_count'].item()
278 if mdtype == miCOMPRESSED:
279 getter = Mat5ZArrayReader(self, byte_count).matrix_getter_factory()
280 elif not mdtype == miMATRIX:
281 raise TypeError, \
282 'Expecting miMATRIX type here, got %d' % mdtype
283 else:
284 getter = self.current_getter(byte_count)
285 next_pos = self.mat_stream.tell() + byte_count
286 getter.next_position = next_pos
287 return getter

For a compressed array, you enter this loop twice. The first time through, it reads the compression wrapper and, at line 279, it calls matrix_getter_factory again. This continues through line 284 to create the actual matrix reader and at line 286, it sets next_pos to the correct, uncompressed size. The code returns to line 279 and, in the old code, falls through to lines 286 (which was unindented in the old version) and mistakenly sets the getter next_position to the compressed size.

getter.next_position is used in miobase to in MatMatrixGetter.to_next() to advance past the object. There were no test cases of this function and the code would appear to work on small compressed objects of compressed size < 16384 because the entire object would be read and buffered. The test case included fails because it requires a second read to advance the stream.

@scipy-gitbot

@thouis wrote on 2009-04-11

Oops. Lost some formatting in the original post. The relevant bit:

The code:

273    def matrix_getter_factory(self):
274        __ Returns reader for next matrix at top level __
275        tag = self.read_dtype(self.dtypes['tag_full'])
276        mdtype = tag['mdtype'].item()
277        byte_count = tag['byte_count'].item()
278        if mdtype == miCOMPRESSED:
279            getter = Mat5ZArrayReader(self, byte_count).matrix_getter_factory()
280        elif not mdtype == miMATRIX:
281            raise TypeError, \
282                  'Expecting miMATRIX type here, got %d' %  mdtype
283        else:
284            getter = self.current_getter(byte_count)
285            next_pos = self.mat_stream.tell() + byte_count
286            getter.next_position = next_pos
287        return getter

For a compressed array, you enter this loop twice. The first time through, it reads the compression wrapper and, at line 279, it calls matrix_getter_factory again. This continues through line 284 to create the actual matrix reader and at line 286, it sets next_pos to the correct, uncompressed size. The code returns to line 279 and, in the old code, falls through to lines 286 (which was unindented in the old version) and mistakenly sets the getter next_position to the compressed size.

getter.next_position is used in miobase to in MatMatrixGetter.to_next() to advance past the object. There were no test cases of this function and the code would appear to work on small compressed objects of compressed size < 16384 because the entire object would be read and buffered. The test case included fails because it requires a second read to advance the stream.

@scipy-gitbot

@stefanv wrote on 2009-04-12

Applied in 50359f6.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment