Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash during multiple concurrent/parallel decoding #708

Closed
novomesk opened this issue Oct 8, 2021 · 9 comments · Fixed by #775
Closed

Crash during multiple concurrent/parallel decoding #708

novomesk opened this issue Oct 8, 2021 · 9 comments · Fixed by #775
Assignees
Labels
bug Something isn't working decoder Related to the libjxl decoder

Comments

@novomesk
Copy link
Contributor

novomesk commented Oct 8, 2021

Hello,

This crash occurs when a Qt application decode more JXL files at the same time via my qt-jpegxl-image-plugin

During the crash I see following message:

/var/tmp/portage/media-libs/libjxl-9999/work/libjxl-9999/lib/jxl/image_ops.h:48: JXL_DASSERT: rect_from.IsInside(from)

#0  jxl::Abort () at /var/tmp/portage/media-libs/libjxl-9999/work/libjxl-9999/lib/jxl/base/status.cc:42
#1  0x00007fffe872f964 in jxl::CopyImageTo<float> () at /var/tmp/portage/media-libs/libjxl-9999/work/libjxl-9999/lib/jxl/image_ops.h:48
#2  0x00007fffe8844075 in LoadBorders () at /var/tmp/portage/media-libs/libjxl-9999/work/libjxl-9999/lib/jxl/dec_cache.cc:110
#3  jxl::PassesDecoderState::FinalizeGroup () at /var/tmp/portage/media-libs/libjxl-9999/work/libjxl-9999/lib/jxl/dec_cache.cc:156
#4  0x00007fffe8863925 in jxl::N_AVX2::DecodeGroupImpl () at /var/tmp/portage/media-libs/libjxl-9999/work/libjxl-9999/lib/jxl/dec_group.cc:441
#5  0x00007fffe886458f in jxl::DecodeGroup () at /var/tmp/portage/media-libs/libjxl-9999/work/libjxl-9999/lib/jxl/dec_group.cc:754
#6  0x00007fffe8848f9d in jxl::FrameDecoder::ProcessACGroup () at /var/tmp/portage/media-libs/libjxl-9999/work/libjxl-9999/lib/jxl/dec_frame.cc:579
#7  0x00007fffe88490d5 in operator() () at /var/tmp/portage/media-libs/libjxl-9999/work/libjxl-9999/lib/jxl/dec_frame.cc:746
#8  CallDataFunc () at /var/tmp/portage/media-libs/libjxl-9999/work/libjxl-9999/lib/jxl/base/data_parallel.h:88
#9  0x00007fffe8a011af in jpegxl::ThreadParallelRunner::RunRange () at /var/tmp/portage/media-libs/libjxl-9999/work/libjxl-9999/lib/threads/thread_parallel_runner_internal.cc:139
#10 0x00007fffe8a012ab in jpegxl::ThreadParallelRunner::ThreadFunc () at /var/tmp/portage/media-libs/libjxl-9999/work/libjxl-9999/lib/threads/thread_parallel_runner_internal.cc:169
#11 0x00007ffff72b7e30 in std::execute_native_thread_routine (__p=0x7fffe0005390) at /var/tmp/portage/sys-devel/gcc-10.3.0-r2/work/gcc-10.3.0/libstdc++-v3/src/c++11/thread.cc:80
#12 0x00007ffff7f80e3e in start_thread () from /lib64/libpthread.so.0
#13 0x00007ffff6fd32cf in clone () from /lib64/libc.so.6

Here is a simple console application I am able to reproduce crash easily:
sources.zip

How to compile and run:

qmake test_crash.pro
make

The application decodes bucuresti2.jxl file in two threads (0 - main thread, 1 - worker thread). Each thread have different instance of the plug-in and each plug-in create own ParallelRunner.
It may not crash during first iteration, but sooner or later it crashes. The output may run like this:

./test_crash
Iteration: 0
[0]     10916x9985      count:1
[1]     10916x9985      count:1
Iteration: 1
[0]     10916x9985      count:1
[1]     10916x9985      count:1
Iteration: 2
[1]     10916x9985      count:1
/var/tmp/portage/media-libs/libjxl-9999/work/libjxl-9999/lib/jxl/image_ops.h:48: JXL_DASSERT: rect_from.IsInside(from)
Neprípustná inštrukcia

Sometime it crashes immediately:

Iteration: 0
/var/tmp/portage/media-libs/libjxl-9999/work/libjxl-9999/lib/jxl/image_ops.h:48: JXL_DASSERT: rect_from.IsInside(from)
/var/tmp/portage/media-libs/libjxl-9999/work/libjxl-9999/lib/jxl/image_ops.h:48: JXL_DASSERT: rect_from.IsInside(from)
Neprípustná inštrukcia

When just one thread with plug-in is running at a time, there is no crash.

@novomesk
Copy link
Contributor Author

novomesk commented Oct 8, 2021

If I put following call for JXL_DEC_FULL_IMAGE event into critical section, I can avoid crash:

status = JxlDecoderProcessInput(m_decoder);

@novomesk
Copy link
Contributor Author

novomesk commented Oct 9, 2021

I am able to reproduce the crash also via gdk-pixbuf loader which use ResizeableParallelRunner.

It is enough to call following procedure in concurrently in multiple threads few times:

#include <gdk-pixbuf/gdk-pixbuf.h>

bool test_plugin_gdk(int id)
{
    GdkPixbuf *pixbuf = gdk_pixbuf_new_from_file("bucuresti2.jxl", NULL);
    if (!pixbuf) {
        printf("Error reading picture via GDK plugin\n");
        return false;
    }
    int x = gdk_pixbuf_get_width(pixbuf);
    int y = gdk_pixbuf_get_height(pixbuf);
    printf("GDK [%d]\t%dx%d\n", id, x, y);
    gdk_pixbuf_unref(pixbuf);
    return true;
}

@saschanaz
Copy link

saschanaz commented Oct 11, 2021

djxl bucuresti2.jxl also fails, btw.

@jonsneyers jonsneyers added bug Something isn't working decoder Related to the libjxl decoder labels Oct 13, 2021
@deymo deymo self-assigned this Oct 13, 2021
@deymo
Copy link
Contributor

deymo commented Oct 18, 2021

@novomesk what's the source image for that .jxl file? Does it have a vertical green rectangle on the top right corner of the image by any chance? I'm not sure if that rectangle was on the source image or that's also a (maybe different) bug.

I wasn't able to reproduce it directly with djxl but it has to do with the order in which the groups are processed so the more threads/CPUs you have the more likely it is. I was able to reliably reproduce it with an random-order single threaded parallel runner.

@novomesk
Copy link
Contributor Author

deymo added a commit to deymo/libjxl that referenced this issue Oct 26, 2021
When processing groups out of order with an image where the last group
size  is smaller than the needed border it was possible to attempt to
load the right or bottom border from the next group when already
processing the rightmost or bottom group respectively. This situation
was causing an out-of-bounds copy on saved Image3F buffer in release
mode (or hitting a JXL_DASSERT in debug mode).

The order in which the groups are processed depends on many factors,
including the order in which the threads are scheduled when using
multiple threads, and potentially the order of the groups in the file
(not checked).

Added a test to force the out-of-order situation in a simulated parallel
runner that forces a random order on the tasks. The new test triggers
the assert in debug mode, which is now fixed.

Fixes libjxl#708.
deymo added a commit to deymo/libjxl that referenced this issue Oct 26, 2021
When processing groups out of order with an image where the last group
size  is smaller than the needed border it was possible to attempt to
load the right or bottom border from the next group when already
processing the rightmost or bottom group respectively. This situation
was causing an out-of-bounds copy on saved Image3F buffer in release
mode (or hitting a JXL_DASSERT in debug mode).

The order in which the groups are processed depends on many factors,
including the order in which the threads are scheduled when using
multiple threads, and potentially the order of the groups in the file
(not checked).

Added a test to force the out-of-order situation in a simulated parallel
runner that forces a random order on the tasks. The new test triggers
the assert in debug mode, which is now fixed.

Fixes libjxl#708.
deymo added a commit to deymo/libjxl that referenced this issue Oct 26, 2021
When processing groups out of order with an image where the last group
size  is smaller than the needed border it was possible to attempt to
load the right or bottom border from the next group when already
processing the rightmost or bottom group respectively. This situation
was causing an out-of-bounds copy on saved Image3F buffer in release
mode (or hitting a JXL_DASSERT in debug mode).

The order in which the groups are processed depends on many factors,
including the order in which the threads are scheduled when using
multiple threads, and potentially the order of the groups in the file
(not checked).

Added a test to force the out-of-order situation in a simulated parallel
runner that forces a random order on the tasks. The new test triggers
the assert in debug mode, which is now fixed.

Fixes libjxl#708.
deymo added a commit to deymo/libjxl that referenced this issue Oct 26, 2021
When processing groups out of order with an image where the last group
size  is smaller than the needed border it was possible to attempt to
load the right or bottom border from the next group when already
processing the rightmost or bottom group respectively. This situation
was causing an out-of-bounds copy on saved Image3F buffer in release
mode (or hitting a JXL_DASSERT in debug mode).

The order in which the groups are processed depends on many factors,
including the order in which the threads are scheduled when using
multiple threads, and potentially the order of the groups in the file
(not checked).

Added a test to force the out-of-order situation in a simulated parallel
runner that forces a random order on the tasks. The new test triggers
the assert in debug mode, which is now fixed.

Fixes libjxl#708.
deymo added a commit that referenced this issue Oct 26, 2021
When processing groups out of order with an image where the last group
size  is smaller than the needed border it was possible to attempt to
load the right or bottom border from the next group when already
processing the rightmost or bottom group respectively. This situation
was causing an out-of-bounds copy on saved Image3F buffer in release
mode (or hitting a JXL_DASSERT in debug mode).

The order in which the groups are processed depends on many factors,
including the order in which the threads are scheduled when using
multiple threads, and potentially the order of the groups in the file
(not checked).

Added a test to force the out-of-order situation in a simulated parallel
runner that forces a random order on the tasks. The new test triggers
the assert in debug mode, which is now fixed.

Fixes #708.
deymo added a commit to deymo/libjxl that referenced this issue Oct 27, 2021
When processing groups out of order with an image where the last group
size  is smaller than the needed border it was possible to attempt to
load the right or bottom border from the next group when already
processing the rightmost or bottom group respectively. This situation
was causing an out-of-bounds copy on saved Image3F buffer in release
mode (or hitting a JXL_DASSERT in debug mode).

The order in which the groups are processed depends on many factors,
including the order in which the threads are scheduled when using
multiple threads, and potentially the order of the groups in the file
(not checked).

Added a test to force the out-of-order situation in a simulated parallel
runner that forces a random order on the tasks. The new test triggers
the assert in debug mode, which is now fixed.

Fixes libjxl#708.
(cherry picked from commit e649705)
(cherry picked from PR libjxl#775)
@deymo
Copy link
Contributor

deymo commented Oct 27, 2021

Thanks @novomesk for the link. The green rectangle is in the source image too, so that's good.

@deymo
Copy link
Contributor

deymo commented Oct 29, 2021

Note: this bug got assigned CVE-2021-22564

@deymo
Copy link
Contributor

deymo commented Nov 1, 2021

@novomesk Please let me know if you would like to be credited in the CVE description and how (name, company affiliation, etc).

@novomesk
Copy link
Contributor Author

novomesk commented Nov 1, 2021

Thanks, no need to give credit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working decoder Related to the libjxl decoder
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants