
Check for overflow of next_code when adding burst_size #30

Merged

Conversation

@shutton shutton (Contributor) commented Apr 19, 2022

This addresses a problem in the GIF parser detected during fuzzing of a crate that utilizes image-rs. The issue can be reproduced using this short test program and the attached file (too large to include in the source -- unzip first).

const BAD_GIF: &[u8] = include_bytes!(concat!(
    env!("CARGO_MANIFEST_DIR"),
    "/5220731288420352.gif"
));

fn main() {
    if let Err(e) = image::load_from_memory(BAD_GIF) {
        eprintln!("failed to load: {e}");
    }
}

5220731288420352.gif.zip

After the fix, loading the malformed image produces a usable Err result:

Format error decoding Gif: unexpected EOF
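The shape of such an overflow guard can be sketched with `checked_add`; the `advance` function below is a hypothetical, standalone simplification (the names `next_code` and `burst_size` mirror the fields discussed in this PR, but this is not the actual weezl code):

```rust
// Hypothetical sketch of guarding a 16-bit code counter against overflow.
// `next_code` and `burst_size` mirror the field names discussed in this PR,
// but this is a standalone illustration, not the decoder's real internals.
fn advance(next_code: u16, burst_size: u16) -> Result<u16, &'static str> {
    // checked_add returns None if the sum would wrap past u16::MAX,
    // letting the caller report a decode error instead of wrapping
    // silently (release) or panicking (debug).
    next_code
        .checked_add(burst_size)
        .ok_or("corrupt stream: next_code overflowed")
}

fn main() {
    assert_eq!(advance(18, 1), Ok(19));
    assert!(advance(u16::MAX, 1).is_err());
}
```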

Review comment on the `else` branch marked `// next_code overflowed`:

Member:

Hm. That's an odd bug. This overflow would imply there's a code to be assigned larger than max_code; in particular, we will assign one later on. Since codes are at most 12 bits, not 16, reaching the overflow should be impossible without incorrectly assumed codes before that point.

@shutton shutton (Contributor, Author) Apr 19, 2022
The issue I was seeing was that this code was being called in a loop, and next_code just eventually stepped its way past 64K (it started at 18 using the provided repro).

@shutton shutton (Contributor, Author) Apr 19, 2022

[/.../weezl/src/decode.rs:765] &self.next_code = 18
[/.../weezl/src/decode.rs:765] &burst_size = 0
...
[/.../weezl/src/decode.rs:765] &self.next_code = 65535
[/.../weezl/src/decode.rs:765] &burst_size = 0
[/.../weezl/src/decode.rs:765] &self.next_code = 65535
[/.../weezl/src/decode.rs:765] &burst_size = 1
[/.../weezl/src/decode.rs:765] &self.next_code = 65535
[/.../weezl/src/decode.rs:765] &burst_size = 0

Member:
In the non-optimized code path the increment itself is gated behind !self.table.is_full() (https://github.com/image-rs/lzw/blob/d55c610f1bff4ef0ed2f72af2c823b4a244d51a3/src/decode.rs#L893-L904), i.e. self.inner.len() < MAX_ENTRIES where MAX_ENTRIES = 1 << 12.

I'm decently sure that if we add a debug assertion to Table::derive, which asserts this invariant, then we'll hit a violation much sooner.
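The gating described above can be sketched as follows; the `Table` type and the `step` helper here are simplified stand-ins, with only `MAX_ENTRIES = 1 << 12` taken from the linked source:

```rust
// Simplified sketch of the sequential path's gate: a new table entry and the
// next_code bump only happen while the table has room, so next_code can
// never pass MAX_ENTRIES on this path. Types are illustrative, not weezl's.
const MAX_ENTRIES: usize = 1 << 12;

struct Table {
    inner: Vec<u16>,
}

impl Table {
    fn is_full(&self) -> bool {
        self.inner.len() >= MAX_ENTRIES
    }
}

fn step(table: &mut Table, next_code: &mut u16) {
    if !table.is_full() {
        // Stand-in for derive(): record an entry, then advance the counter.
        table.inner.push(*next_code);
        *next_code += 1;
    }
}

fn main() {
    let mut table = Table { inner: Vec::new() };
    let mut next_code: u16 = 0;
    // Even with far more iterations than table slots, next_code saturates
    // at MAX_ENTRIES instead of overflowing.
    for _ in 0..100_000 {
        step(&mut table, &mut next_code);
    }
    assert_eq!(next_code as usize, MAX_ENTRIES);
}
```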

@HeroicKatora (Member):
I've thrown together a patch to put in debug_asserts for the actual invariants that the code is working under:

Patch file
From ac575ce26fb081883092536e0fcbf00c2af59cc2 Mon Sep 17 00:00:00 2001
From: Andreas Molzer <andreas.molzer@gmx.de>
Date: Tue, 19 Apr 2022 21:29:13 +0200
Subject: [PATCH] Add debug assertions on internal invariants

---
 src/decode.rs | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/src/decode.rs b/src/decode.rs
index 283e31f..49f3bfd 100644
--- a/src/decode.rs
+++ b/src/decode.rs
@@ -711,7 +711,7 @@ impl<C: CodeBuffer> Stateful for DecodeState<C> {
             Some(tup) => {
                 status = Ok(LzwStatus::Ok);
                 code_link = Some(tup)
-            },
+            }
         };
 
         // Track an empty `burst` (see below) means we made no progress.
@@ -827,6 +827,7 @@ impl<C: CodeBuffer> Stateful for DecodeState<C> {
                 // the case of requiring an allocation (which can't occur in practice).
                 let new_link = self.table.derive(&link, cha, code);
                 self.next_code += 1;
+                debug_assert!(self.next_code as usize <= MAX_ENTRIES);
                 code = burst;
                 link = new_link;
             }
@@ -918,6 +919,8 @@ impl<C: CodeBuffer> Stateful for DecodeState<C> {
                 }
 
                 self.next_code += 1;
+                debug_assert!(self.next_code as usize <= MAX_ENTRIES);
+
                 new_link = link;
             } else {
                 // It's actually quite likely that the next code will be a reset but just in case.
@@ -1203,6 +1206,13 @@ impl Table {
     }
 
     fn derive(&mut self, from: &Link, byte: u8, prev: Code) -> Link {
+        debug_assert!(
+            self.inner.len() < MAX_ENTRIES,
+            "Invalid code would be created {:?} {} {:?}",
+            from.prev,
+            byte,
+            prev
+        );
         let link = from.derive(byte, prev);
         let depth = self.depths[usize::from(prev)] + 1;
         self.inner.push(link.clone());
-- 
2.35.1

The trace of running decoding with those suggests that the comparison itself relies on an incorrect assumption. Since it uses ==, it relies on self.next_code <= self.code_buffer.max_code(), but that doesn't hold. Once we reach 12 bits, the code buffer does not get any larger and max_code() remains at 4095. At the same time, next_code will advance to 4096 (and never beyond, in the sequential code path), a code that will never be created and thus works correctly with the rest of the logic.

But when that is exactly the moment we enter a burst, as is the case with the provided file, the burst advances next_code beyond that point without noticing that the maximum code has been reached. An easy fix would be to adjust the condition:

if potential_code >= self.code_buffer.max_code() - Code::from(self.is_tiff) {

I'll measure whether that leads to too much of a performance loss from executing less of the simple code reconstruction.
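The off-by-one described above can be demonstrated in isolation. MAX_CODE = 4095 follows the 12-bit limit from the thread, and the two predicates are simplified stand-ins for the real condition (the TIFF adjustment is omitted):

```rust
// Why `==` misses the limit during a burst while `>=` does not.
// At 12 bits the code buffer stops growing and max_code() stays at 4095;
// MAX_CODE here is a constant stand-in for that call.
const MAX_CODE: u16 = 4095;

// The old condition: fires only when the potential code lands exactly
// on the maximum.
fn limit_reached_eq(potential_code: u16) -> bool {
    potential_code == MAX_CODE
}

// The fixed condition: also fires once a burst has jumped past it.
fn limit_reached_ge(potential_code: u16) -> bool {
    potential_code >= MAX_CODE
}

fn main() {
    // Sequential path: next_code approaches the limit one step at a time,
    // so the equality check fires in time.
    assert!(limit_reached_eq(4095));
    // Burst path: next_code can step over 4095 in a single addition, after
    // which the equality check never fires again...
    assert!(!limit_reached_eq(4096));
    // ...but the >= check still detects that the maximum was reached.
    assert!(limit_reached_ge(4096));
}
```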

@HeroicKatora (Member) left a review comment:

Deliberation: this fixes the immediate issue. All codes that are incorrectly created cannot be, and are not, actually used during decoding. While it's not optimal that this waste is created during decoding, it's also not critically buggy.

So, I'll merge the fix and treat the underlying cause as a separate issue (opening another issue for it).
