This repository has been archived by the owner on Mar 3, 2023. It is now read-only.

Panic on some real-world images at mcu.rs:354:48 #11

Closed
Shnatsel opened this issue Jul 2, 2022 · 18 comments

Comments

@Shnatsel
Contributor

Shnatsel commented Jul 2, 2022

The attached image triggers the following panic in zune-jpeg:

thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', zune-jpeg/src/mcu.rs:354:48

It happens on 88 images out of ~5500 images I have tested. It seems to be the only panic to occur on this dataset.

1130214.jpg

The code to reproduce it is the same as in #10

Tested on commit 1f92bac

@Shnatsel
Contributor Author

Shnatsel commented Jul 2, 2022

Thanks for the quick fix!

@etemesi254
Copy link
Owner

Welcome.

Also, while you're at it, there are some other quality issues I have seen crop up, like
#12

While running tests, could you read the files, write them back out, and open some of them to check for visual defects?

Currently I'm using a function like this to spot defects:

use std::fs::OpenOptions;
use std::io::Write;

fn write(in_file: &str, out_file: &str) {
    // Catch panics so one bad file doesn't abort the whole run.
    std::panic::catch_unwind(|| {
        let mut d = Decoder::new();
        d.set_num_threads(1).unwrap();
        d.set_output_colorspace(ColorSpace::RGBX);

        // Decode with zune-jpeg...
        let pixels = d.decode_file(in_file).unwrap();

        // ...and re-encode with mozjpeg (libjpeg-turbo) for visual inspection.
        let mut comp = mozjpeg::Compress::new(mozjpeg::ColorSpace::JCS_EXT_RGBX);
        comp.set_size(d.width() as usize, d.height() as usize);
        comp.set_mem_dest();
        comp.start_compress();

        assert!(comp.write_scanlines(&pixels));
        comp.finish_compress();

        let jpeg_bytes = comp.data_to_vec().unwrap();
        let mut v = OpenOptions::new()
            .write(true)
            .create(true)
            .open(out_file)
            .unwrap();
        v.write_all(&jpeg_bytes).unwrap();
    })
    .unwrap();
}

which uses the mozjpeg crate, backed by libjpeg-turbo, to write the files back out.

Since you have a larger corpus, you might be able to identify such defects more quickly than I can.

But if that's not possible, it's fine; I still appreciate the bug reports.

@Shnatsel
Contributor Author

Shnatsel commented Jul 2, 2022

Ah, looking for decoding differences is a great idea! It's something I wanted to dabble in with jpeg-decoder as well, but never got around to it.

I think the simplest way to do that would be decoding the file with zune-jpeg and writing it back to disk in some lossless format, like PNG or BMP, and then running imagemagick difference analysis on the result compared to the input jpeg. Imagemagick would decode the original with libjpeg-turbo internally. I've used a similar setup to test resvg once and it worked well.

If you provide me with a snippet that decodes the input JPEG file and writes it to lightly compressed PNG or to BMP, I'll handle the rest.

@Shnatsel
Contributor Author

Shnatsel commented Jul 2, 2022

Also I don't think this is actually fixed, I am still seeing a lot of panics, just at mcu.rs:356:48 this time. Here's a larger sample of images triggering this so you can test it on more than one example: unwrap-panics.tar.gz

@etemesi254
Owner

One thing to note is that libjpeg-turbo needs to be bit-identical to libjpeg, so whatever optimizations it does must uphold that.

JPEG was probably not designed with multithreading in mind; even though I have it working here, it's mainly magic and sacrifices.

Specifically:

  1. The horizontal-vertical (HV) upsampler in this library and the one in libjpeg-turbo will never produce the same data, and HV-sampled images are the kind most commonly found in the wild.
    This is because libjpeg-turbo uses the whole downsampled image to do up-sampling, while I use image chunks.

  2. The color converter is different, with a rounding bias of +1/-1 relative to libjpeg-turbo.

This puts the output within +2/-2 of libjpeg-turbo's, with no way of reducing that without the library becoming single-threaded.
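To illustrate the second point, here is a hedged sketch (not zune-jpeg's or libjpeg-turbo's actual code) of how two fixed-point YCbCr-to-RGB conversions that differ only in their rounding step stay within one value of each other per channel; the function names and the `FIX_1_402` constant are illustrative:

```rust
// Two hypothetical fixed-point computations of the red channel from Y/Cr
// that differ only in rounding bias, sketching why two correct decoders
// can disagree by a small constant without either being "wrong".
const FIX_1_402: i32 = 91881; // 1.402 * 2^16, the usual libjpeg-style scaling

fn red_round_nearest(y: i32, cr: i32) -> i32 {
    // round-to-nearest: add half the divisor before the shift
    (y + ((FIX_1_402 * (cr - 128) + 32768) >> 16)).clamp(0, 255)
}

fn red_truncate(y: i32, cr: i32) -> i32 {
    // truncating shift: can land one below round-to-nearest
    (y + ((FIX_1_402 * (cr - 128)) >> 16)).clamp(0, 255)
}

fn main() {
    let mut max_diff = 0;
    for y in 0..256 {
        for cr in 0..256 {
            let d = (red_round_nearest(y, cr) - red_truncate(y, cr)).abs();
            max_diff = max_diff.max(d);
        }
    }
    // The per-channel bias stays within 1; stacked with a different
    // upsampler it can reach the +2/-2 range described above.
    println!("max difference: {max_diff}");
}
```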

@etemesi254 etemesi254 reopened this Jul 2, 2022
@Shnatsel
Contributor Author

Shnatsel commented Jul 2, 2022

Indeed, I am not looking for them to match up perfectly. This is never going to happen with lossy encoding formats anyway.

The way I've done this with resvg was by calculating the similarity score for all the images, then sorting them by similarity, and looking at the most diverging images. That worked great - actual error cases had very low similarity scores compared to the rest, it was easy to tell when to stop looking from the similarity scores alone.
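The scoring step can be sketched in a few lines of plain Rust; this is a minimal illustration of the idea (RMSE over raw pixel buffers, then sort by divergence), with made-up buffer contents and file names rather than anything from either library's API:

```rust
// Sketch of the similarity-ranking idea: compute RMSE between two decoded
// RGB buffers and sort files by score so the most-diverging images surface
// first. The buffers and file names here are illustrative.
fn rmse(a: &[u8], b: &[u8]) -> f64 {
    assert_eq!(a.len(), b.len());
    let sum_sq: f64 = a
        .iter()
        .zip(b)
        .map(|(&x, &y)| {
            let d = x as f64 - y as f64;
            d * d
        })
        .sum();
    (sum_sq / a.len() as f64).sqrt()
}

fn main() {
    // Two hypothetical decodes of the same file that differ slightly.
    let ours = [100u8, 120, 140, 160];
    let reference = [101u8, 119, 140, 162];

    let mut scored = vec![("a.jpg", rmse(&ours, &reference)), ("b.jpg", 0.0)];
    // Most-diverging first; actual error cases stand far above the pack.
    scored.sort_by(|x, y| y.1.partial_cmp(&x.1).unwrap());
    println!("{scored:?}");
}
```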

@etemesi254
Owner

etemesi254 commented Jul 2, 2022

Also I don't think this is actually fixed, I am still seeing a lot of panics, just at mcu.rs:356:48 this time. Here's a larger sample of images triggering this so you can test it on more than one example: unwrap-panics.tar.gz

Looking into it.

Seems to be an issue with images with odd-numbered dimensions (these usually require padding bytes).
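For context, the padding requirement comes from JPEG storing whole MCUs: a decoder must round each dimension up to the MCU size and crop back afterwards. A minimal sketch of that rounding (the function name is illustrative, not zune-jpeg's API):

```rust
// JPEG stores whole MCUs, so a decoder rounds each dimension up to the
// MCU size (8, or up to 16/32 with chroma subsampling) and then crops
// back down to the declared image size.
fn padded_dimension(dim: usize, mcu: usize) -> usize {
    // round up to the next multiple of `mcu`
    (dim + mcu - 1) / mcu * mcu
}

fn main() {
    // e.g. a 331x333 image with 2x2 subsampling decodes in 16-pixel MCUs
    println!("{}", padded_dimension(331, 16)); // -> 336
    println!("{}", padded_dimension(333, 16)); // -> 336
    println!("{}", padded_dimension(64, 8)); // already aligned -> 64
}
```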

@Shnatsel
Contributor Author

Shnatsel commented Jul 3, 2022

I have rigged up a comparison of the image crate (backed by jpeg-decoder) against imagemagick using the following code. First, I have a converter from JPEG to PNG using image:

use std::error::Error;

fn main() -> Result<(), Box<dyn Error>> {
    use image::io::Reader as ImageReader;
    let input = std::env::args().nth(1).unwrap();
    let output = std::env::args().nth(2).unwrap();
    let img = ImageReader::open(input)?.decode()?;
    img.save(output)?;
    Ok(())
}

I invoke it from a Linux shell script to convert from whatever the input format is to PNG (BMP would be faster but would lose transparency) and then run the imagemagick compare command against the original:

#!/bin/sh

set -e

input="$1"
output="$(mktemp --tmpdir result_XXXXXXXXXXXXX.png)"
trap 'rm -f "$output"' EXIT

target/release/image-convert "$input" "$output" || echo "Failed to decode $input" 1>&2
similarity=$(compare -quiet -metric RMSE "$input[0]" "$output" /dev/null 2>&1) || true
echo "$similarity $input"

Then run it in parallel to speed things up and capture the output:

fd '\.jpe?g' | nice ionice parallel ./image-compare.sh > ~/similarities.txt 2>~/errors.txt

A simple sort -n will then show the most diverging images.

I have only run it on a subset of my corpus so far, but it seems to be holding up surprisingly well. I have tested it for decoding errors quite extensively before, but not for incorrect decoding, and I'm happy to see that it's not happening.

@etemesi254
Owner

etemesi254 commented Jul 3, 2022 via email

@etemesi254
Owner

etemesi254 commented Jul 3, 2022 via email

@etemesi254
Owner

etemesi254 commented Jul 30, 2022

An update.

1. We multi-thread image decoding, meaning we internally chunk the image (primitively).

The issue is that the chunking can sometimes be wrong, especially when the image is small and has dimensions not divisible by 8 (or 32, depending on the sampling factors).

The solution I currently have for this is:

  1. Just allocate a bigger buffer.

The issue with this is that it becomes an overhead for everyone else, because allocating more means we have a larger memory footprint, which bothers everyone (and me especially).

So I'm still trying to think of a good solution to this.
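A minimal sketch of the "just allocate bigger" option: size the output buffer for a whole number of MCU rows and expose only `width * height` pixels to the caller. The function name and parameters are illustrative, not zune-jpeg's actual API:

```rust
// Size the output buffer for a whole number of MCU rows so the last,
// partially-filled row of MCUs has somewhere to land, then expose only
// width * height pixels to the caller.
fn output_buffer_len(width: usize, height: usize, channels: usize, mcu_height: usize) -> usize {
    // round the height up to the next multiple of the MCU height
    let padded_height = (height + mcu_height - 1) / mcu_height * mcu_height;
    width * padded_height * channels
}

fn main() {
    // 1920x1080 RGBX output with 16-pixel MCU rows: 1080 rounds up to 1088,
    // so the over-allocation is 8 rows, i.e. 8 * 1920 * 4 = 61440 bytes,
    // under 1% of the ~8.3 MB image.
    let exact = 1920 * 1080 * 4;
    let padded = output_buffer_len(1920, 1080, 4, 16);
    println!("overhead: {} bytes", padded - exact); // 61440
}
```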

@Shnatsel
Contributor Author

How big would the memory overhead be in practice? Over-allocating by up to an extra 32 pixels in each dimension doesn't sound too bad.

@etemesi254
Owner

It depends.

We want to over-allocate a whole row with a height of 8-32 pixels and a width equal to the image's width.

@Shnatsel
Contributor Author

So in bytes that would be, at worst, 32 pixels * 32 bits per pixel * image width. Let's say the image is small, just 64x64 pixels, and we over-allocate it by 50%. That's ~16kB for the base image and an extra ~8kB we've over-allocated.

I don't think anyone's going to notice an extra 8kB of memory usage, and that's at resolutions where the over-allocation overhead is very large - 50%. For larger images it's going to be more like 5% - and even that occurs only on a handful of files!

Besides, if that extra memory is never written to, then the OS will not even provision the memory pages for it. So I'm really not convinced that this kind of overhead is a problem at all.

@etemesi254
Owner

I'm not sure if that's also the defined behavior on Windows / macOS.

Let's hope it is.

@Shnatsel
Contributor Author

This panic can still happen even after the fix, here's an image that triggers it: waterfront.jpg

Tested on commit e34e5bd (latest as of this writing)

etemesi254 added a commit that referenced this issue Jul 31, 2022
@etemesi254
Owner

etemesi254 commented Oct 11, 2022 via email

@Shnatsel
Contributor Author

As of commit cff242b, none of the files in unwrap-panics.tar.gz cause a panic. Neither does the waterfront.jpg image attached later.
