This repository has been archived by the owner on Jul 22, 2018. It is now read-only.

Make rawspeed much more resilient to crashing #141

Merged
merged 23 commits into klauspost:develop
Mar 19, 2016

Conversation

pedrocr
Contributor

@pedrocr pedrocr commented Feb 21, 2016

As discussed in #137, I've made the getData() API safer. I've also been running rawspeed under afl-fuzz and fixing quite a few crashing bugs. I've run regression testing against the rawsamples.ch sample set and everything is still working as it should.

pedrocr and others added 23 commits February 21, 2016 15:32

Rawspeed works by reading the whole file into memory and then indexing into it. This, together with using values from the file as offsets, can lead to out-of-bounds reads. Implement a better internal API to make these checks both easier to write and harder to skip. This covers three main cases:

1) Whenever we are accessing data directly from the file we
   used to do something like:

uchar8 *data = mFile->getData(offset);

Replace that with:

uchar8 *data = mFile->getData(offset, len);

and then do proper bounds checking to make sure we can safely
read len bytes from the returned pointer.

2) When we want to read a byte stream we will often do something
   like:

ByteStream in(mFile->getData(off), mFile->getSize()-off);

which basically means we allow reading from the offset all the
way to the end of the file. Make that simpler by just doing:

ByteStream in(mFile, off);

and have that do all the needed bounds checking.

3) When we want to read a byte stream up to a certain number of
   bytes we will do something like:

ByteStream in(mFile->getData(off), len);

which may be unsafe if len > mFile->getSize()-off. Instead do:

ByteStream in(mFile, off, len);

and that will make sure the bounds checking is done properly.

FileMap includes by default 16 bytes of extra space so that things like BitPumpMSB can safely read a bit beyond the size of the file. Take that into account properly when doing bounds checking. Fixes some broken D800 files that had the image data go all the way to the end of the file.
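
Roughly, the check inside getData() can then look something like this (a minimal sketch; mData, mSize, ThrowIOE and the FILEMAP_MARGIN name for the 16 extra bytes are assumptions here, not necessarily the exact rawspeed names):

// Sketch only: bounds-checked accessor for the in-memory file buffer
uchar8* FileMap::getData(uint32 offset, uint32 count) {
  // Do the arithmetic in 64 bits so offset+count can never wrap around
  uint64 end = (uint64)offset + count;
  // offset==mSize is already 1 byte past the end, and reads may only
  // extend past the nominal size by the extra padding bytes
  if (count == 0 || offset >= mSize || end > (uint64)mSize + FILEMAP_MARGIN)
    ThrowIOE("FileMap: attempting to read out of bounds");
  return &mData[offset];
}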

TiffEntry parsing can lead to out-of-bounds access. Guard against that with getData() checks instead of CHECKSIZE, and bail out of processing an IFD or Entry when a check fails, but don't bail out of the whole file because of it. Raw formats being what they are, files are often still parseable even if some entries are broken.
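
In the IFD parser that roughly translates to a per-entry try/catch (a sketch; the mEntry map, numEntries and entryOffset locals, and the exception name are assumptions, while the 12-byte stride follows the TIFF entry layout):

for (uint32 i = 0; i < numEntries; i++) {
  try {
    // Each IFD entry is 12 bytes: tag, type, count, value/offset
    TiffEntry *t = new TiffEntry(mFile, entryOffset + 12 * i);
    mEntry[t->tag] = t;
  } catch (TiffParserException &e) {
    // A single broken entry shouldn't make the whole file undecodable;
    // drop just this entry and keep parsing the rest of the IFD
  }
}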

CIFF has a similar method for reading where the number of entries is fetched from the file. Do the same checking of offsets and handling of errors that was already done for TIFF.

Don't force there to be an extra byte at the end of a TiffEntry/CiffEntry for it to be valid. As far as I can tell this didn't break any files, but it seems wrong anyway, so go ahead and fix it.

Integer overflow is tricky. Make sure offset+size never overflows by doing the calculation in 64 bits. When offset==size we're exactly 1 byte past the end of the file, so reject that location.

When a Tiff entry is empty (count == 0), instead of doing a NULL dereference on things like getInt(), just return some empty data.

If a TIFF entry had an empty string we'd try to allocate a zero-sized array and then write a \0 to the position before that array, resulting in SIGSEGV. Instead just return the empty string.

The TIFF entry item count is a 32-bit integer, but the corresponding byte size can overflow those 32 bits if the count is very large and the per-item size is 4 or 8. Reject the TIFF entry if the byte size exceeds UINT32_MAX, as we don't handle files larger than that anyway.
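
Sketched out, the rejection can be as simple as doing the multiplication in 64 bits first (datasizes[] stands in for whatever per-type element size table the parser already has, and ThrowTPE for its error path):

// count is the 32-bit item count from the entry; with 4- or 8-byte
// items the total byte size can exceed what fits in 32 bits
uint64 bytesize = (uint64)count * datasizes[type];
if (bytesize > UINT32_MAX)
  ThrowTPE("TIFF entry claims more than 4GB of data, rejecting it");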

A common pattern in the code was isValid(offset+count), which is actually wrong, as offset+count is the first byte after the region we want. Create a new isValid(offset, count) API that does the right thing by default and convert all the users to it.
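
A minimal sketch of the new check, assuming the same mSize and FILEMAP_MARGIN names as in the earlier sketch:

bool FileMap::isValid(uint32 offset, uint32 count) {
  // Valid means the whole region [offset, offset+count) is readable,
  // not just the byte at offset+count; compute the end in 64 bits so
  // the addition can't overflow
  uint64 end = (uint64)offset + count;
  return offset < mSize && end <= (uint64)mSize + FILEMAP_MARGIN;
}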

It's easy enough to generate an infinitely looping TIFF by having a SubIFD that points back at itself (or through other more complex indirections). Avoid crashing on this by setting a maximum TIFF depth and aborting once it is reached.
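
Threading a depth counter through the recursive IFD constructor is enough to break such loops (illustrative sketch; the limit value and the subIFDOffset local are assumptions, not the exact ones from the patch):

TiffIFD::TiffIFD(FileMap* f, uint32 offset, uint32 depth) {
  // A SubIFD pointing back at one of its ancestors would otherwise
  // recurse forever, so refuse to nest beyond a small fixed depth
  if (depth > 10)
    ThrowTPE("TIFF: sub-IFD chain is too deep, likely a loop");

  // ...normal parsing of the entries at this offset goes here...

  // Every sub-IFD is parsed one level deeper than its parent, so a
  // loop in the SubIFD offsets eventually trips the check above
  mSubIFD.push_back(new TiffIFD(f, subIFDOffset, depth + 1));
}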

Just as TIFF allows for infinitely looping SubIFDs, so does CIFF (the CRW base format). Avoid those loops the same way we already did for TIFF.

At the root level TIFF has the notion of the next IFD, and if that chain is made into a loop we end up in a cycle of heap-allocating calls, eventually running out of memory. Fix that by limiting the maximum number of SubIFDs the root one can have to 100, which should be more than plenty for any sane file.

This isn't actually used anywhere so just remove it.

Now that the API itself does enough bounds checking we don't need to have these macros around anymore.

The TIFF metadata block may not have been found so check for that before trying to dereference the pointer to look for make and model.

When doing bad pixel processing it makes no sense to keep on going when a lot of pixels are bad. Bail out after 500. This avoids a pathological case where all pixels are marked as bad and so we turn raw processing into O(n^2) or worse.

A malformed DNG file can cause a huge memory allocation and memset by claiming to have a huge CFA array. Bail out earlier as the decode is never going to work anyway.

klauspost added a commit that referenced this pull request Mar 19, 2016
Make rawspeed much more resilient to crashing
@klauspost klauspost merged commit f3cdd8e into klauspost:develop Mar 19, 2016
@klauspost
Owner

Thanks a bunch!

@pedrocr
Contributor Author

pedrocr commented Mar 19, 2016

:) there should be quite a few more on the way, afl-fuzz is really good at finding issues.
