This repository has been archived by the owner on Jul 22, 2018. It is now read-only.
Make rawspeed much more resilient to crashing #141
Merged
Conversation
Rawspeed works by reading the whole file into memory and then indexing into it. Together with using file inputs as offsets, this can lead to out-of-bounds reads in memory. Implement a better internal API to make these checks both easier to write and harder to skip. This covers three main cases:

1) Whenever we access data directly from the file, we used to do something like:

    uchar8 *data = mFile->getData(offset);

Replace that with:

    uchar8 *data = mFile->getData(offset, len);

and then do proper bounds checking to make sure we can safely read len bytes from the returned pointer.

2) When we want to read a byte stream we would often do something like:

    ByteStream in(mFile->getData(off), mFile->getSize()-off);

which basically means we allow reading from the offset all the way to the end of the file. Make that simpler by just doing:

    ByteStream in(mFile, off);

and have that do all the needed bounds checking.

3) When we want to read a byte stream up to a certain number of bytes, we would do something like:

    ByteStream in(mFile->getData(off), len);

which may be unsafe if len > mFile->getSize()-off. Instead do:

    ByteStream in(mFile, off, len);

and that will make sure the bounds checking is done properly.
FileMap includes by default 16 bytes of extra space so that things like BitPumpMSB can safely read a bit beyond the size of the file. Take that into account properly when doing bounds checking. This fixes some broken D800 files that had the image data go all the way to the end of the file.
TiffEntry parsing can lead to out-of-bounds access. Guard against that with getData() checks instead of CHECKSIZE, and bail out of processing IFDs or entries when the checks fail, but don't bail out of the whole file because of that. Raw formats being what they are, they are still parseable even if some entries are broken.
CIFF has a similar method for reading where the number of entries is fetched from the file. Do the same checking of offsets and handling of errors that was already done for TIFF.
Don't force there to be an extra byte at the end of a TiffEntry/CiffEntry for it to be valid. As far as I can tell this didn't break any files, but it seems wrong anyway, so go ahead and fix it.
Integer overflow is tricky. Make sure offset+size never overflows by doing the calculation in 64 bits.
When offset == size we're exactly 1 byte past the end of the file, so reject that location.
When a TIFF entry is empty (count == 0), instead of doing a NULL dereference on calls like getInt(), just return some empty data.
If a TIFF entry had an empty string we'd try to allocate a zero-sized array and then write \0 to the position before that array, resulting in SIGSEGV. Instead just return the empty string.
A TIFF entry's item count is a 32-bit integer, but the corresponding byte size can overflow 32 bits if the count is very large and the element size is 4 or 8 bytes. Reject the TIFF entry if the byte size exceeds UINT32_MAX, as we don't handle files larger than that anyway.
A common pattern in the code was isValid(offset+count) which is actually wrong as offset+count is the first byte after the region we want. Create a new API that's isValid(offset,count) that does the right thing by default and convert all the users to it.
It's easy enough to generate an infinite looping TIFF by having a SubIFD that points towards itself (or other more complex indirections). Avoid crashing on this by setting a maximum TIFF depth and aborting once reached.
Just as TIFF allows for infinite-looping SubIFDs, so does CIFF (the CRW base format). Avoid those loops the same way we already did for TIFF.
At the root level, TIFF has the notion of the next IFD, and if that's made into a loop we get a chain of heap-allocating calls, eventually running out of memory. Fix that by limiting the max number of SubIFDs the root one can have to 100, which should be more than plenty for any sane file.
This isn't actually used anywhere so just remove it.
Now that the API itself does enough bounds checking we don't need to have these macros around anymore.
The TIFF metadata block may not have been found, so check for that before trying to dereference the pointer to look for make and model.
When doing bad-pixel processing it makes no sense to keep going when a lot of pixels are bad. Bail out after 500. This avoids a pathological case where all pixels are marked as bad and raw processing becomes O(n^2) or worse.
A malformed DNG file can cause a huge memory allocation and memset by claiming to have a huge CFA array. Bail out earlier as the decode is never going to work anyway.
klauspost added a commit that referenced this pull request on Mar 19, 2016: Make rawspeed much more resilient to crashing
Thanks a bunch!
:) there should be quite a few more on the way, afl-fuzz is really good at finding issues.
As discussed in #137, I've made the getData() API safer. I've also been running rawspeed under afl-fuzz and fixing quite a few crashing bugs. I've run regression testing against the rawsamples.ch sample set and everything is still working as it should.