We recently had a problem on our internal network where .mat files were served to users with corrupt content. Despite these files being bogus, they were silently accepted by Matlab and other .mat reading libraries.
Even though the .mat format does not contain any integrity-checking features per-se, it turns out that it is possible to perform substantial integrity checks on the contents by verifying the trailing ZLib checksums that get written as part of any miCOMPRESSED element. I've already raised a ticket with the Mathworks asking them to actually check these checksums in their internal .mat file loader -- this pull request tests and implements the same functionality for Scipy's .mat reader.
Scipy's reader would already validate the checksum correctly for many forms of corruption, but there was one missing check -- when the corruption caused the uncompressed stream to be larger than the length of the uncompressed stream that we are expecting. In this case Scipy's reader would consume only a prefix of the uncompressed data and so zlib would never get to see the trailing checksum on the compressed data stream.
This pull request verifies that that the uncompressed stream is entirely consumed by reading the compressed element, preventing this kind of corruption from going undetected. Furthermore, I have added tests for two kinds of corrupted file (one where the Zlib checksum is incorrect, and one where the uncompressed stream is overlong).
Detect over-long compressed sequences, test mat file data integrity c…
Just a comment, it seems likely that systems that scan for malicious files like these zip-bombs will complain about scipy if these files are distributed for testing.
Add docstring for new parameter?
False easier to read than 0
process = False easier to read.
break up string on several lines to fit 79 char limit in PEP 8? http://www.python.org/dev/peps/pep-0008/#maximum-line-length
Test this separately as well?
with open(...) as fp:
instead of try finally.
Thanks - looks good. I hope it doesn't break anyone's code though. How about an option to disable this check?
Update compressed data check in response to code review
Test verify_compressed_data_integrity option
Thanks for the feedback. I've incorporated your comments into the latest version of the code, most notably adding a new option to turn off this check.
+1 this looks good to me