Detect over-long compressed sequences in .mat files #2812

Merged
merged 3 commits into from Sep 19, 2013

Projects

None yet

4 participants

@batterseapower
Contributor

We recently had a problem on our internal network where .mat files were served to users with corrupt content. Despite these files being bogus, they were silently accepted by Matlab and other .mat reading libraries.

Even though the .mat format does not contain any integrity-checking features per-se, it turns out that it is possible to perform substantial integrity checks on the contents by verifying the trailing ZLib checksums that get written as part of any miCOMPRESSED element. I've already raised a ticket with the Mathworks asking them to actually check these checksums in their internal .mat file loader -- this pull request tests and implements the same functionality for Scipy's .mat reader.

Scipy's reader would already validate the checksum correctly for many forms of corruption, but there was one missing check -- when the corruption caused the uncompressed stream to be larger than the length of the uncompressed stream that we are expecting. In this case Scipy's reader would consume only a prefix of the uncompressed data and so zlib would never get to see the trailing checksum on the compressed data stream.

This pull request verifies that that the uncompressed stream is entirely consumed by reading the compressed element, preventing this kind of corruption from going undetected. Furthermore, I have added tests for two kinds of corrupted file (one where the Zlib checksum is incorrect, and one where the uncompressed stream is overlong).

@argriffing
Contributor

Just a comment, it seems likely that systems that scan for malicious files like these zip-bombs will complain about scipy if these files are distributed for testing.

@matthew-brett

Add docstring for new parameter?

@matthew-brett

False easier to read than 0

@matthew-brett

process = False easier to read.

@matthew-brett

break up string on several lines to fit 79 char limit in PEP 8? http://www.python.org/dev/peps/pep-0008/#maximum-line-length

@matthew-brett

Test this separately as well?

@matthew-brett
with open(...) as fp:

instead of try finally.

@matthew-brett

Thanks - looks good. I hope it doesn't break anyone's code though. How about an option to disable this check?

@batterseapower
Contributor

Thanks for the feedback. I've incorporated your comments into the latest version of the code, most notably adding a new option to turn off this check.

@pv
Member
pv commented Sep 18, 2013

+1 this looks good to me

@pv pv merged commit 4938da3 into scipy:master Sep 19, 2013

1 check passed

default The Travis CI build passed
Details
@pv
Member
pv commented Sep 19, 2013

Thanks, merged!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment