FDK bit buffer causes access violations #61

Closed
enzo1982 opened this Issue Mar 31, 2017 · 8 comments

@enzo1982
Contributor

enzo1982 commented Mar 31, 2017

The FDK bit buffer implementation is broken in that it regularly reads past the provided buffer - seemingly as a performance optimization. This causes access violation errors if the bytes directly following the buffer belong to a protected page.

You can see this in action under Valgrind, which reports lots of invalid read errors when decoding an AAC stream with libfdk: valgrind.txt

On Windows I get an access violation about once every 4000 files when converting AAC. Here's a dump I got using Dr. Mingw: fdk-aac-dump.txt

The code that causes the bug is basically:

unsigned char	*buffer	    = NIL;
unsigned int	 bufferSize = 0;

/* MP4ReadSample allocates the buffer to exactly the sample size. */
MP4ReadSample(mp4File, mp4Track, 1, (uint8_t **) &buffer, (uint32_t *) &bufferSize, NIL, NIL, NIL, NIL);

unsigned int	 bytesValid = bufferSize;
short		 outputBuffer[16384];

/* The decoder can read up to 4 bytes past the end of buffer here. */
aacDecoder_Fill(handle, &buffer, &bufferSize, &bytesValid);
aacDecoder_DecodeFrame(handle, outputBuffer, 16384, 0);

MP4Free(buffer);

I'm working around this bug in my software by copying the bytes returned from MP4ReadSample to a new buffer that is 4 bytes larger than the original one, thereby making sure FDK_get will not access protected pages:

unsigned char	*buffer	    = NIL;
unsigned int	 bufferSize = 0;

MP4ReadSample(mp4File, mp4Track, 1, (uint8_t **) &buffer, (uint32_t *) &bufferSize, NIL, NIL, NIL, NIL);

unsigned int	 bytesValid = bufferSize;
unsigned char	*inputBuffer  = new unsigned char [bufferSize + 4];
short		 outputBuffer[16384];

memcpy(inputBuffer, buffer, bufferSize);
memset(inputBuffer + bufferSize, 0, 4);	/* zero the padding so any overread hits defined bytes */

aacDecoder_Fill(handle, &inputBuffer, &bufferSize, &bytesValid);
aacDecoder_DecodeFrame(handle, outputBuffer, 16384, 0);

delete [] inputBuffer;

MP4Free(buffer);

It would be great if this could be fixed in libfdk!

That does not seem to be easy, however. The bit buffer would need to know the actual buffer size, but transportDec_FillData always initializes it with 65536 instead of passing bufferSize, because "the FDK bit buffer implementation needs a number 2^x", which seems quite obscure to me.

@mstorsjo
Owner

mstorsjo commented Mar 31, 2017

Thanks for reporting this issue!

The comment about the bit buffer implementation needing a power of two number of bytes sounds very strange to me as well. On a very brief look through libFDK/src/FDK_bitbuffer.cpp, it seems like the bitstream buffer is used as a circular buffer, and for that reason, a power-of-two buffer size is needed. For the cases when data is read straight from the input buffer allocated by the caller, this obviously is an issue though.
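
To illustrate the circular-buffer point: with a power-of-two size, the wrap-around can be done with a bitwise AND instead of a modulo, which is exactly what the byteMask in FDK_get does. A minimal standalone sketch of the indexing scheme (illustrative only, not the actual FDK code):

#include <assert.h>

/* With bufSize == 2^x, "offset % bufSize" reduces to
 * "offset & (bufSize - 1)" - a single AND, no division. */
unsigned char read_circular(const unsigned char *buf,
                            unsigned int bufSize,   /* must be a power of two */
                            unsigned int offset)
{
    unsigned int mask = bufSize - 1;    /* e.g. 65536 -> 0xFFFF */
    assert((bufSize & mask) == 0);      /* holds only for powers of two */
    return buf[offset & mask];
}

The flip side is what this issue is about: the mask is derived from the assumed 65536-byte size, so when the decoder reads directly from a smaller caller-provided buffer, offsets past the end of the real allocation are dereferenced long before the mask wraps.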

Allocating a slightly larger buffer for the input might seem odd at first, but in practice, this would not be the only library with such a restriction. For example, libavcodec also requires input buffers to have extra zero padding at the end - see e.g. https://git.libav.org/?p=libav.git;a=blob;f=libavcodec/avcodec.h;h=1c58fe2d689c1c3daca248208c1e2154737e7568;hb=10f4511f14a4e830c0ed471df4cd1cc2a18a481a#l3734.
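
In the same spirit, a caller-side helper for libfdk could look like the sketch below. The 8-byte padding amount is my own choice (comfortably more than the 4 bytes observed here), not a documented libfdk constant:

#include <stdlib.h>
#include <string.h>

#define INPUT_PADDING 8  /* assumption: >= the 4-byte overread seen here */

/* Hypothetical helper: copy packet data into a fresh allocation with a
 * zeroed tail so a small decoder overread stays inside the allocation. */
unsigned char *alloc_padded(const unsigned char *data, size_t size)
{
    unsigned char *buf = (unsigned char *) malloc(size + INPUT_PADDING);
    if (buf == NULL)
        return NULL;
    memcpy(buf, data, size);
    memset(buf + size, 0, INPUT_PADDING);  /* keep the padding defined */
    return buf;
}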

Ideally, the library could of course be modified to not overread, but if that doesn't turn out to be straightforward, this issue could just be documented instead.

@enzo1982
Contributor

enzo1982 commented Apr 1, 2017

Thank you for your quick response! After looking a little deeper into it, I would opt for documenting this behaviour instead of a real fix.

At first I thought replacing

  UINT tx = (hBitBuf->Buffer [ byteOffset    & byteMask] << 24) |
            (hBitBuf->Buffer [(byteOffset+1) & byteMask] << 16) |
            (hBitBuf->Buffer [(byteOffset+2) & byteMask] <<  8) |
             hBitBuf->Buffer [(byteOffset+3) & byteMask];

  if (bitOffset)
  {
    tx <<= bitOffset;
    tx |= hBitBuf->Buffer [(byteOffset+4) & byteMask] >> (8-bitOffset);
  }

with

  UINT tx =                                hBitBuf->Buffer [ byteOffset    & byteMask] << 24 << bitOffset;

  if (numberOfBits + bitOffset >  8) tx |= hBitBuf->Buffer [(byteOffset+1) & byteMask] << 16 << bitOffset;
  if (numberOfBits + bitOffset > 16) tx |= hBitBuf->Buffer [(byteOffset+2) & byteMask] <<  8 << bitOffset;
  if (numberOfBits + bitOffset > 24) tx |= hBitBuf->Buffer [(byteOffset+3) & byteMask]       << bitOffset;
  if (numberOfBits + bitOffset > 32) tx |= hBitBuf->Buffer [(byteOffset+4) & byteMask] >> (8 - bitOffset);

in FDK_get would suffice, but FDK_bitstream.h does its own caching on top of FDK_BitBuffer for things like synchronizing and aligning to byte boundaries, and thus still reads beyond the buffer.

So, fixing this would probably be a major undertaking and just documenting it properly might be the better way to go.
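
For illustration, the caching layer behaves roughly like the following simplified model (not the actual FDK code): the reader keeps up to 32 prefetched bits, and as soon as a request cannot be served from the cache it fetches a whole new 32-bit word, no matter how few valid bits actually remain.

/* Simplified model of a cached bitstream reader (illustrative only).
 * Even if only e.g. 3 valid bits remain in the underlying buffer, the
 * refill fetches a full 32-bit word, touching up to 4 extra bytes.
 * n must be in the range 1..32. */
typedef struct {
    unsigned int cacheWord;   /* up to 32 prefetched bits          */
    unsigned int bitsInCache; /* how many of them are still unread */
} CachedReader;

unsigned int read_bits(CachedReader *r, unsigned int n,
                       unsigned int (*get32)(void *ctx), void *ctx)
{
    unsigned int mask = (n >= 32) ? 0xFFFFFFFFu : (1u << n) - 1u;

    if (n > r->bitsInCache) {
        unsigned int missing = n - r->bitsInCache;
        /* avoid shifting a 32-bit word by 32 when the cache is empty */
        unsigned int bits = r->bitsInCache ? (r->cacheWord << missing) : 0;
        r->cacheWord = get32(ctx);        /* always a full 32-bit fetch */
        r->bitsInCache = 32 - missing;
        return (bits | (r->cacheWord >> r->bitsInCache)) & mask;
    }
    r->bitsInCache -= n;
    return (r->cacheWord >> r->bitsInCache) & mask;
}

So even a bounds-correct FDK_get would still be asked for bytes past the end of the valid data whenever the last cache refill straddles it.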

@mstorsjo
Owner

mstorsjo commented Apr 3, 2017

I guess the main question, which I can try to look into using valgrind at some point, is whether it is enough to add 4 bytes of zero-initialized padding, or whether it would still overread more than that (if the internal APIs don't actually take the size parameter into account). That is, if you take a real AAC packet, truncate it and copy it to a new allocation of exactly that size plus 4 zero bytes, would it still overread past those extra padding bytes, or would that still work, failing decoding cleanly?
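
One way to answer that empirically would be a small harness along these lines, run under valgrind. This is only a sketch: it assumes an ADTS stream, elides all error handling, and packet/packetSize stand in for one real encoded frame:

#include <stdlib.h>
#include <string.h>
#include <fdk-aac/aacdecoder_lib.h>

/* Sketch: feed truncated packets, each copied into an allocation of
 * exactly the truncated size plus 4 zero bytes, and let valgrind flag
 * any read past that padding. */
void test_truncated(const UCHAR *packet, UINT packetSize)
{
    HANDLE_AACDECODER dec = aacDecoder_Open(TT_MP4_ADTS, 1);
    INT_PCM pcm[8 * 2048];
    UINT cut;

    for (cut = packetSize - 1; cut > 0; cut--) {
        UINT size = cut, valid = cut;
        UCHAR *buf = (UCHAR *) calloc(cut + 4, 1);  /* exact size + 4 zero bytes */

        memcpy(buf, packet, cut);
        aacDecoder_Fill(dec, &buf, &size, &valid);
        aacDecoder_DecodeFrame(dec, pcm, sizeof(pcm) / sizeof(pcm[0]), 0);
        free(buf);
    }
    aacDecoder_Close(dec);
}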

@mstorsjo
Owner

mstorsjo commented Apr 8, 2017

FWIW, I tested decoding with truncated packets, and in that case, having allocated the input buffer 4 bytes larger than the truncated data is not enough - it can still overread. So just allocating a larger buffer doesn't seem to be enough to be safe when the input data is broken somehow.

Not sure how large an undertaking it is to fix it properly though...

@enzo1982
Contributor

enzo1982 commented Apr 9, 2017

The issue with truncated packets is easy to fix. Just add a guard to FDK_get (and FDK_get32) in FDK_bitbuffer.cpp:

if (numberOfBits > hBitBuf->ValidBits + 31) return 0;

and change the subtraction of hBitBuf->ValidBits so it doesn't wrap around (alternatively, make ValidBits signed):

hBitBuf->ValidBits -= fMin(numberOfBits, hBitBuf->ValidBits);

and in FDK_get32:

if (hBitBuf->ValidBits == 0) return 0;

and:

hBitBuf->ValidBits -= fMin(32, hBitBuf->ValidBits);

The +31 in the first line is needed because of the caching in FDK_bitstream.h. This fixes the issues with truncated packets, but not the original 4-byte overread problem.
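
For completeness, the unsigned wrap-around that the fMin clamp prevents is easy to demonstrate standalone (fmin_u stands in for FDK's fMin):

#include <stdio.h>

static unsigned int fmin_u(unsigned int a, unsigned int b)
{
    return a < b ? a : b;
}

int main(void)
{
    unsigned int validBits    = 3;   /* only 3 bits left in the buffer */
    unsigned int numberOfBits = 16;  /* but 16 bits are requested      */

    /* Plain subtraction wraps to a huge value, defeating later checks;
     * the clamped version bottoms out at 0. */
    printf("plain:   %u\n", validBits - numberOfBits);                    /* 4294967283 */
    printf("clamped: %u\n", validBits - fmin_u(numberOfBits, validBits)); /* 0 */
    return 0;
}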

@mstorsjo
Owner

mstorsjo commented Apr 9, 2017

Oh, thanks for looking into it! Would you care to make a PR out of those changes?

@enzo1982
Contributor

enzo1982 commented Apr 9, 2017

Just found an even better solution that will fix the 4-byte overread too. It involves my code from comment 3 plus a check for ValidBits in FDKreadBits.

I'll need a few days to get this ready and will create a PR then.

@mstorsjo
Owner

mstorsjo commented Jun 12, 2017

This should be fixed now.
