Skip to content
This repository has been archived by the owner on Mar 3, 2023. It is now read-only.

Panics discovered by fuzzer #8

Open
Shnatsel opened this issue May 26, 2022 · 28 comments
Open

Panics discovered by fuzzer #8

Shnatsel opened this issue May 26, 2022 · 28 comments

Comments

@Shnatsel
Copy link
Contributor

I have decoded the AFL-generated set of exotic JPEGs when fuzzing libjpeg-turbo, as well as the files generated by AFL when targeting the jpeg-decoder crate. Decoding these files causes multiple panics in zune-jpeg.

The set of files triggering the panics is attached: jpeg_fuzzing_seeds.tar.gz

Aside of being interesting test cases, these files can also be used to kickstart fuzzing - simply put them into fuzz/corpus/decode_buffer and run the fuzzer as usual. This should provide much better coverage than starting from scratch.

@etemesi254
Copy link
Owner

Just an update.

Most panics have been fixed in version 0.1.3, but I would like to keep this issue open for future fuzz errors/panics

@Shnatsel
Copy link
Contributor Author

Shnatsel commented Jun 11, 2022

I'm seeing panics on the fuzzer still, for example panicked at 'attempt to subtract with overflow', zune-jpeg/src/headers.rs:138:17

That may not show up in release mode, but fuzzing deliberately enables debug assertions, which includes integer overflow/underflow checks.

Here's the file that triggers the issue: crash-497343daf2b55cc4eeb1c6d66eec23d6508b2914.jpeg

@etemesi254
Copy link
Owner

Yeah that cropped up due to a bug in that fixed a previous issue.

It's fixed now (master branch) and fuzzing should now take a while before it crashes.

Thanks for the heads up.

@Shnatsel
Copy link
Contributor Author

Here's a new one: thread '<unnamed>' panicked at 'called Option::unwrap()on aNone value', /home/shnatsel/Code/zune-jpeg/src/mcu.rs:326:48

The file to trigger it: yet-another-panic.jpeg

@etemesi254
Copy link
Owner

Could you be kind enough to share the corpus you are testing with?

It seems I do not have that with me.

Also fix coming in a few

@Shnatsel
Copy link
Contributor Author

The corpus is just the jpeg_fuzzing_seeds.tar.gz I shared in the first message of the thread, placed in the fuzz/corpus/decode_buffer directory so that the fuzzer can use it as a starting point.

That's how fuzzers were originally meant to be used, in fact! The notion of synthesizing JPEGs out of thin air, without giving the fuzzer a broad sample of JPEGs with various modes and encodings, is more of a happy accident and is not how the tool was originally meant to be used.

@etemesi254
Copy link
Owner

Sorry my bad, I thought I was already fuzzing on this corpus.

I'll ping this issue once all test cases work, for a new release.

@Shnatsel
Copy link
Contributor Author

By the way, you can run the fuzzer in parallel with the -j flag - this is more relevant now that the fuzzer only uses a single thread.

I use cargo +nightly fuzz run -j4 decode_buffer, maybe prepended with nice to give it low CPU priority and have it run in the background.

@etemesi254
Copy link
Owner

etemesi254 commented Jun 14, 2022

Is there a way we can handle errors without using Strings?

Specifically I was benchmarking this piece of code

pub enum Errors
{
    Err1(&'static str),
    Err2(String),
}

// Type your code here, or load an example.
pub fn err1() -> Result<(), Errors>
{
    Err(Errors::Err2(String::from("Life32")))
}

// Type your code here, or load an example.
pub fn err2() -> Result<(), Errors>
{
    Err(Errors::Err1("Life"))
}

that arose with me trying to remove some unsafe code, causing a bad perf regression and I wondered what was the cost of the string.

Criterion reports 10x difference

errors/err String       time:   [10.381 ns 10.536 ns 10.744 ns]     (300 M)                          
errors/err str          time:   [1.7569 ns 1.7573 ns 1.7581 ns]    (2.8 B)

And this probably slows us a teensy bit down in areas we don't see.

The assembly is https://godbolt.org/z/shcP1zaG3

@Shnatsel
Copy link
Contributor Author

You can handle errors without using the String type and the associated allocation, but you will not be able to include arbitrary text determined at runtime. You'll only be able to use hardcoded strings.

10 nanoseconds sounds about right for the cost of an allocation, but it is a tiny amount. Unless you're returning a String in a hot loop, this will never matter in practice. I recall eliminating thousands of allocations per second in jpeg-decoder and seeing near-zero performance difference, so I would not worry about that.

I can help profile before and after the change if you need help evaluating it.

@etemesi254
Copy link
Owner

etemesi254 commented Jun 14, 2022

The unsafe code i was trying to remove was

zune-jpeg/src/mcu.rs

Lines 248 to 259 in 050a232

let dc_table = unsafe {
self.dc_huffman_tables
.get_unchecked(component.dc_huff_table)
.as_ref()
.unwrap_or_else(|| std::hint::unreachable_unchecked())
};
let ac_table = unsafe {
self.ac_huffman_tables
.get_unchecked(component.ac_huff_table)
.as_ref()
.unwrap_or_else(|| std::hint::unreachable_unchecked())
};

to be https://gist.github.com/etemesi254/e2cc65afbdf49b9b402506a9c9ff6802#file-mcu-rs-L248-L281

It's the only unsafe code left that isn't platform specific intrinsics which can be disabled by turning off the x86 feature, so that means we can do a cfg[not(feature="x86)]+ #[forbid(unsafe_code)] but its a 3-7% regression on the benchmark.

The invariant unsafe code had to uphold were checked earlier, (check_tables() method), so it was kind of like manual loop hoisting hence why it was like that

@Shnatsel
Copy link
Contributor Author

Was it 3-7% regression in end-to-end decoding performance, or should I run some specific benchmark to reproduce it?

I can think of a few things to try - outlining the error reporting into functions annotated with #[cold], for example.

But before I do, I'd like to understand how frequently this code is executed. 3-7% is low enough that it could be just spurious due to changes in the code layout, and not any genuine change to the performance of the code. Also, do all of these lookups really have to happen on every iteration of the loop, or can we perhaps cache them somehow?

@etemesi254
Copy link
Owner

etemesi254 commented Jun 15, 2022

How many checks per image

A typical 1920 * 1080 image

First we multiply the image by its dimensions = 2073600

Then we multiply the dimensions by the number of components , a typical image has 3(Y, Cb,Cr that get converted to R,G,B) = 6220800

JPEG divides an image into blocks of 8 by 8 pixels, called MCUs(Minimum Coded Units), the total dimension of an MCU is 64, so we can divide the pixels by 64= 97200

And for every MCU we do two lookups (DC and AC), so we again multiply by 2 = 194,400

Why it must happen on every iteration of the loop

for y in 0..mcu_height{
      for x in 0..mcu_width{
          for c in 0..number_components{
              take table for component
          }
      }
}

The order of the loop cannot be changed as the spec mandates it to be like that

Why I'm sure it caused the regression

Changing this

 self
     .dc_huffman_tables
      .get(component.dc_huff_table)
      .ok_or_else(|| {
          DecodeErrors::HuffmanDecode(format!(
              "No Huffman DC table for component {:?} ",
              component.component_id
          ))
      })?
      .as_ref()
      .ok_or_else(|| {
          DecodeErrors::HuffmanDecode(format!(
              "No DC table for component {:?}",
              component.component_id
          ))
      })?;

To this

self.dc_huffman_tables[component.dc_huff_table & 3]
      .as_ref()
      .ok_or_else(|| {
          DecodeErrors::HuffmanDecode(format!(
              "No DC table for component {:?}",
              component.component_id
          ))
      })?;

Caused 3% perf jump ( the & 3 eliminates bounds checking as it can be verified by the compiler)

@etemesi254
Copy link
Owner

Update, went ahead and did it 7496ae8 and 732a61b.

Now the only unsafe is in sse/avx2 routines for post processing,

@Shnatsel
Copy link
Contributor Author

I've run the fuzzer for 5 minutes with 4 threads on v0.1.4 and found a panic:

thread '<unnamed>' panicked at 'Slice to small cannot write', /home/shnatsel/Code/zune-jpeg/src/color_convert/avx.rs:88:10

The file triggering this is attached:
crash-0abdc9186d3f455eebddb86dcc318238d20dba73

@etemesi254
Copy link
Owner

Acknowledged

@etemesi254
Copy link
Owner

I've run the fuzzer for 5 minutes with 4 threads on v0.1.4 and found a panic:

thread '<unnamed>' panicked at 'Slice to small cannot write', /home/shnatsel/Code/zune-jpeg/src/color_convert/avx.rs:88:10

The file triggering this is attached: crash-0abdc9186d3f455eebddb86dcc318238d20dba73

This is fixed(some new commits landed) but it presents an interesting case of the spec vs the reference implementation.

So the image above isn't valid according to the jpeg spec, because of the following phrase in the spec,

F.2.2.5: NextBit Procedure which says

The only valid marker which may occur within the Huffman coded data is the RSTm marker.

This image (and a lot of fuzz generated images) have this property and the solution should be to report to the user that you have a corrupt image, but libjpeg-turbo tries to parse them as headers, for some reason I really don't understand.

So to ensure compatibility, I decided to do as they do.

@Shnatsel
Copy link
Contributor Author

Shnatsel commented Jul 2, 2022

The following file causes a panic superficially similar to #11:

thread '<unnamed>' panicked at 'called `Option::unwrap()` on a `None` value', zune-jpeg/src/mcu_prog.rs:277:90

crash-faff0f2581d3b225f3552f416b546bf7bcbdbbe8

Tested on commit 1f92bac

@etemesi254
Copy link
Owner

I believe that is related to #11

@Shnatsel Shnatsel changed the title Panics on decoding files generated by fuzzers for other decoders Panics discovered by fuzzer Jul 31, 2022
@Shnatsel
Copy link
Contributor Author

Shnatsel commented Jul 31, 2022

The input in the previous message by me still causes a panic:

thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', /home/shnatsel/Code/zune-jpeg/src/mcu_prog.rs:305:90

@etemesi254
Copy link
Owner

Do other decoders complain about this?

@Shnatsel
Copy link
Contributor Author

Shnatsel commented Jul 31, 2022

djpeg does complain:

Corrupt JPEG data: 1 extraneous bytes before marker 0xdb
Unsupported marker type 0x18

jpeg-decoder also doesn't decode it, but it returns an error instead of panicking:

Format error decoding Jpeg: invalid JPEG format: quantization table contains element with a zero value

@Shnatsel
Copy link
Contributor Author

And another one causing panic at a slightly different place in mcu_prog.rs: crash-b96a4b3c805261b156103da43bdab1a4eb7ed527

thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', /home/shnatsel/Code/zune-jpeg/src/mcu_prog.rs:299:26

This one is decoded by djpeg, although being fuzzer-generated it is obviously corrupt.

etemesi254 added a commit that referenced this issue Aug 7, 2022
etemesi254 added a commit that referenced this issue Aug 7, 2022
@etemesi254
Copy link
Owner

And another one causing panic at a slightly different place in mcu_prog.rs: crash-b96a4b3c805261b156103da43bdab1a4eb7ed527

thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', /home/shnatsel/Code/zune-jpeg/src/mcu_prog.rs:299:26

This one is decoded by djpeg, although being fuzzer-generated it is obviously corrupt.

Fixed in latest version.

Also pushed to crates.io (v0.1.5)

@Shnatsel
Copy link
Contributor Author

Shnatsel commented Aug 7, 2022

Here's another panic: ya-zune-jpeg-mcu-panic.jpg

thread '<unnamed>' panicked at 'called `Option::unwrap()` on a `None` value', /home/shnatsel/Code/zune-jpeg/src/mcu_prog.rs:304:48

@Shnatsel
Copy link
Contributor Author

Shnatsel commented Aug 7, 2022

And here's my current corpus, so you could run the fuzzer yourself: jpeg_fuzzing_seeds_2.tar.gz

Place the contents in zune-jpeg/fuzz/corpus/decode_buffer/ folder and run cargo +nightly fuzz run decode_buffer

etemesi254 added a commit that referenced this issue Aug 7, 2022
@etemesi254
Copy link
Owner

Here's another panic: ya-zune-jpeg-mcu-panic.jpg

thread '<unnamed>' panicked at 'called `Option::unwrap()` on a `None` value', /home/shnatsel/Code/zune-jpeg/src/mcu_prog.rs:304:48

Fixed

@etemesi254
Copy link
Owner

And here's my current corpus, so you could run the fuzzer yourself: jpeg_fuzzing_seeds_2.tar.gz

Place the contents in zune-jpeg/fuzz/corpus/decode_buffer/ folder and run cargo +nightly fuzz run decode_buffer

Currently can't fuzz as I'm debugging ffmpeg, those two aren't compatible, so fixing errors arising from this may take some time.

etemesi254 added a commit that referenced this issue Aug 29, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants