New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non-compliant decoding according to section 4? #502

Closed
dsnet opened this Issue Feb 6, 2017 · 11 comments

Comments

Projects
None yet
3 participants
@dsnet
Member

dsnet commented Feb 6, 2017

I was running fuzz testing on my Brotli implementation and noticed that the following input differs when decompressed by my version compared to the canonical one here.

The input causing issues (in hex): e2e184886a563080e0391603e430f9

When decompressed by the google/brotli C implementation:
worldwhitemediafirstirstirstirstirstirstirstirstirstirstirst...
When decompressed by my implementation:
worldorldorldorldorldorldorldorldorldorldorldorldorldorldorl...

According to section 4 of the RFC:

When a distance symbol 0 appears, the distance it represents (i.e.,
the last distance in the sequence of distances) is not pushed to the
ring buffer of last distances

However, this seems contrary to what C decoder currently does:

if (s->distance_code >= 0) {
	--s->dist_rb_idx;
	s->distance_code = s->dist_rb[s->dist_rb_idx & 3];
	goto postReadDistance;  /* We already have the implicit distance */
}

The logic in the C version seems contrary to the stated prose in the RFC, which makes no mention that an implicit zero reverse rotates the distance ring-buffer.

@eustas

This comment has been minimized.

Contributor

eustas commented Feb 6, 2017

This piece of code is very tricky; it allows continuation after break in decoding.
We use special value (-1) to specify that distance is not decoded yet.
If distance is decoded, we retake it from distance ring-buffer.
Please, take a look where s->dist_rb_idx is incremented and decremented. For distance == 0 it has one extra decrement, so after processing the command it remains unchanged.

@dsnet

This comment has been minimized.

Member

dsnet commented Feb 6, 2017

I believe there is in a mismatch in the increments and decrements in the case where the implicit zero command is repeatedly used and the static dictionary case is hit. It seems that it will short-circuit the increment logic (which only occurs in the dynamic dictionary case), causing the handling of implicit zeros to accidentally decrement one too much each time.

I compiled with debugging and logging enabled and ran bro -d. The relevant snippet is:

...
[ProcessCommandsInternal] pos = 0 insert = 0 copy = 5
IMPLICIT_ZERO
[ProcessCommandsInternal] pos = 0 distance = 4
[ProcessCommandsInternal] dictionary word: [world]
[ProcessCommandsInternal] s->meta_block_remaining_len = 9995
[ProcessCommandsInternal] pos = 5 insert = 0 copy = 5
IMPLICIT_ZERO
[ProcessCommandsInternal] pos = 5 distance = 11
[ProcessCommandsInternal] dictionary word: [white]
[ProcessCommandsInternal] s->meta_block_remaining_len = 9990
[ProcessCommandsInternal] pos = 10 insert = 0 copy = 5
IMPLICIT_ZERO
[ProcessCommandsInternal] pos = 10 distance = 15
[ProcessCommandsInternal] dictionary word: [media]
[ProcessCommandsInternal] s->meta_block_remaining_len = 9985
[ProcessCommandsInternal] pos = 15 insert = 0 copy = 5
IMPLICIT_ZERO
[ProcessCommandsInternal] pos = 15 distance = 16
[ProcessCommandsInternal] dictionary word: [first]
[ProcessCommandsInternal] s->meta_block_remaining_len = 9980
[ProcessCommandsInternal] pos = 20 insert = 0 copy = 5
IMPLICIT_ZERO
[ProcessCommandsInternal] pos = 20 distance = 4
UPDATE_RING_BUFFER
[ProcessCommandsInternal] pos = 25 insert = 0 copy = 5
IMPLICIT_ZERO
[ProcessCommandsInternal] pos = 25 distance = 4
UPDATE_RING_BUFFER
[ProcessCommandsInternal] pos = 30 insert = 0 copy = 5
IMPLICIT_ZERO
...

The IMPLICIT_ZERO message was a printf added after L1706 of decode.c. In the log printout, you see that the distance is sweeping through the sequences [4, 11, 15, 16, 4, 4, 4, 4, 4, ...]. Notice it how it goes in reverse order through the initial distances in the ring buffer (i.e., 4, 11, 15, 16) even though only the zero code is used? After exhausting the ring buffer, it no longer performs static dictionary lookups, and starts doing dynamic dictionary lookups, causing it to properly stay at 4 for the rest of the time.

@eustas

This comment has been minimized.

Contributor

eustas commented Feb 7, 2017

Oh! I see. Thanks for the report. Going to fix that ASAP.

eustas added a commit that referenced this issue Feb 7, 2017

Fix #502 decoder bug
Decoder may have produced invalid output if:
 * at offset 0..3 dictionary word with index 3..0 for some length N is used
   and distance is encoded with direct distance code 0, and
 * at least one of next 4 commands use value from distance ringbuffer

@eustas eustas closed this in 0749d9c Feb 7, 2017

@eustas

This comment has been minimized.

Contributor

eustas commented Feb 7, 2017

Fixed. Again, thank you! This bug is a new "survival champion" - it was introduced about 1.5 years ago =O

@ralfjunker

This comment has been minimized.

ralfjunker commented Feb 7, 2017

Testing 0749d9c does not give me the same decompressed output as @dsnet proposes in his initial report of this issue.

Decompressing hex e2e184886a563080e0391603e430f9 results in

worldsmallmallmallmallmallmallmallmallmallmallmallmallma... (10000 bytes in total)

instead of the expected

worldorldorldorldorldorldorldorldorldorldorldorldorldorldor... (no total given).

@dsnet's "wrong" output before 0749d9c matches mine, so I am inclined to believe that his fixed output is also correct, in which case the fix is wrong.

Could @dsnet or @eustas please shed some light? Thanks!

PS: Will there be a test case?

@dsnet

This comment has been minimized.

Member

dsnet commented Feb 7, 2017

@ralfjunker, what implementation is yours?

Mine repeats "orld" until the 10000 byte limit is hit.

@dsnet

This comment has been minimized.

Member

dsnet commented Feb 7, 2017

My interpretation of what should happen (assuming all implementations agree about huffman decoding of commands):

There is a repeated sequence of commands of the form (length:5, dist: 0).

According to section 4, dist:0 should result in the last distance being used, which defaults to 4 (without updating the distance ring buffer; not that it matters).

Thus, this is really a repeated sequence of (length:5, dist:4) commands.

For the first command, there is no output, so it falls under the static dictionary case according to:

if distance is less than the max allowed distance plus one
	move backwards distance bytes in the uncompressed data,
	and copy CLEN bytes from this position to
	the uncompressed stream
else
	look up the static dictionary word, transform the word as
	directed, and copy the result to the uncompressed stream

This copies "world" to the output.

For all subsequent commands, there is sufficient data in the output such that a distance of 4 is always satisfied by the dynamic dictionary. Thus, copying "orldo" the first time, "rldor" the second time, "ldorl" the third time, etc.

@ralfjunker

This comment has been minimized.

ralfjunker commented Feb 8, 2017

I am using the C implementation from this repository, in particular this check-in: 0749d9c.

It repeats mall until it reaches 10000 bytes.

Did you test @eustas' fix to this C implementation?

@eustas

This comment has been minimized.

Contributor

eustas commented Feb 8, 2017

For testing the fix we have internally added 3 synthetic tests:

TEST(SynthTest, IntactDistanceRingBuffer0) {
  uint8_t compressed[] = {
    0x1b, 0x0a, 0x00, 0x00, 0x00, 0x00, 0x80, 0xe3, 0xb4, 0x0d, 0x00, 0x00,
    0x07, 0x5b, 0x26, 0x31, 0x40, 0x02, 0x00, 0xe0, 0x4e, 0x1b, 0xa1, 0x80,
    0x20, 0x00
  };
  Verify(
      R"(
      main_header
      metablock_header_easy: 11, 1
      command_inscopy_easy: 0, 7 // "himself" from dictionary
      bits: "000000" // distance = 4 from RB; RB remains intact
      command_inscopy_easy: 0, 4 // copy "self"
      bits: "000000" // distance = 4 from RB; RB remains intact
      )",
      compressed, sizeof(compressed),
      BROTLI_DECODER_SUCCESS,
      "himselfself");
}

TEST(SynthTest, IntactDistanceRingBuffer1) {
  uint8_t compressed[] = {
    0x1b, 0x09, 0x00, 0x00, 0x00, 0x00, 0x80, 0xe3, 0xb4, 0x0d, 0x00, 0x00,
    0x07, 0x5b, 0x26, 0x31, 0x40, 0x02, 0x00, 0xe0, 0x4e, 0x1b, 0x21, 0xa0,
    0x20, 0x00
  };
  Verify(
      R"(
      main_header
      metablock_header_easy: 10, 1
      command_inscopy_easy: 0, 6 // "scroll" from dictionary
      bits: "100000" // distance = 11 from RB; RB remains intact
      command_inscopy_easy: 0, 4 // copy "roll"
      bits: "000000" // distance = 4 from RB; RB remains intact
      )",
      compressed, sizeof(compressed),
      BROTLI_DECODER_SUCCESS,
      "scrollroll");
}

TEST(SynthTest, IntactDistanceRingBuffer2) {
  uint8_t compressed[] = {
    0x1b, 0x0f, 0x00, 0x00, 0x00, 0x00, 0x80, 0xe3, 0xb4, 0x0d, 0x00, 0x00,
    0x07, 0x5b, 0x26, 0x31, 0x40, 0x02, 0x00, 0xe0, 0x4e, 0x1b, 0x41, 0x80,
    0x20, 0x50, 0x10, 0x24, 0x08, 0x06
  };
  Verify(
      R"(
      main_header
      metablock_header_easy: 16, 1
      command_inscopy_easy: 0, 4 // "left" from dictionary (index = 3 = 4 - 1)
      bits: "000000" // distance = 4 from RB; RB remains intact
      command_inscopy_easy: 0, 4 // "data" from dictionary (index = 6 = 11 - 5)
      bits: "100000" // distance = 11 from RB; RB remains intact
      command_inscopy_easy: 0, 4 // "data" from dictionary (index = 6 = 15 - 9)
      bits: "010000" // distance = 15 from RB; RB remains intact
      command_inscopy_easy: 0, 4 // "left" from dictionary (index = 3 = 16 - 13)
      bits: "110000" // distance = 16 from RB; RB remains intact
      )",
      compressed, sizeof(compressed),
      BROTLI_DECODER_SUCCESS,
      "leftdatadataleft");
}

The first parameter of Verify is DSL code snippet used to generate compressed.
The last parameter is an expected output string.

@eustas eustas reopened this Feb 8, 2017

@eustas

This comment has been minimized.

Contributor

eustas commented Feb 8, 2017

Ooops, missed the case when distance code is baked into command.

@eustas

This comment has been minimized.

Contributor

eustas commented Feb 8, 2017

I hope, now it is finally fixed with #506 =)

@eustas eustas closed this Feb 8, 2017

danielrh added a commit to dropbox/rust-brotli-decompressor that referenced this issue Nov 4, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment