Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bogus Huffman table definition: jpeg_make_d_derived_tbl too strict #586

Closed
malaterre opened this issue Mar 25, 2022 · 10 comments
Closed

Bogus Huffman table definition: jpeg_make_d_derived_tbl too strict #586

malaterre opened this issue Mar 25, 2022 · 10 comments

Comments

@malaterre
Copy link

malaterre commented Mar 25, 2022

Have you searched the existing issues (both open and closed) in the libjpeg-turbo issue tracker to ensure that this bug report is not a duplicate?

yes

Does this bug report describe one of the two known and unsolvable issues with the JPEG format?

no

Clear and concise description of the bug:

jpeg_make_d_derived_tbl is too strict and check all codes within the Huffman table, even if not used by the decoder.

Steps to reproduce the bug (using only libjpeg-turbo):

Using the attached JPEG file (WITH_12BIT:BOOL=ON), here is what I see:

% ./djpeg -outfile libjpeg-turbo.pgm sample_12bits_huffman_17.jpg
Bogus Huffman table definition

Image(s) needed in order to reproduce the bug (if applicable):

sample_12bits_huffman_17

Expected behavior:

% ./djpeg -outfile libjpeg-turbo.pgm sample_12bits_huffman_17.jpg && echo "success"
success

Observed behavior:

% ./djpeg -outfile libjpeg-turbo.pgm sample_12bits_huffman_17.jpg
Bogus Huffman table definition

Platform(s) (compiler version, operating system version, CPU) on which the bug was observed:

Debian/bullseye amd64.

libjpeg-turbo release(s), commit(s), or branch(es) in which the bug was observed (always test the tip of the main branch or the latest stable pre-release to verify that the bug hasn't already been fixed):

% ./djpeg -version
libjpeg-turbo version 2.1.4 (build 20211129)

If the bug is a regression, the specific commit that introduced the regression (use git bisect to determine this):

Seems like commit is: 5ead57a

Additional information:

Using the following local patch, I can properly decode the image:

 % git diff
diff --git a/jdhuff.c b/jdhuff.c
index 679d2216..037fd184 100644
--- a/jdhuff.c
+++ b/jdhuff.c
@@ -252,7 +252,7 @@ jpeg_make_d_derived_tbl(j_decompress_ptr cinfo, boolean isDC, int tblno,
   if (isDC) {
     for (i = 0; i < numsymbols; i++) {
       int sym = htbl->huffval[i];
-      if (sym < 0 || sym > 15)
+      if (sym < 0 || sym > 17)
         ERREXIT(cinfo, JERR_BAD_HUFF_TABLE);
     }
   }

Using the above patch, I get a result compatible with another open-source implementation of ITU 81:

% jpeg sample_12bits_huffman_17.jpg jpeg.pgm
jpeg Copyright (C) 2012-2018 Thomas Richter, University of Stuttgart
and Accusoft

For license conditions, see README.license for details.


0 bytes memory not yet released.

1108286 bytes maximal required.

249 allocations performed.
@malaterre
Copy link
Author

For background information, here is the original thread that started it:

@dcommander
Copy link
Member

Yes, I have been discussing this with @thorfdbg in e-mail. This was my response:

The Huffman decoder tends to be the first line of defense against malformed JPEG images, some of which may trigger exploitable security issues farther down in the decoding pipeline. Thus, browsers and other security-conscious applications are very sensitive to whether and how libjpeg-turbo fails to decode such images. Historically, about 9 out of 10 reported security bugs in libjpeg-turbo have been related to the Huffman decoder, so I am hesitant to change it without a good reason.

The code in question was written by Tom Lane in 1998, and I have no insight into why he decided to be strict about the DC symbol range. (The change log simply says, "Huffman tables are checked for validity much more carefully than before.") The fact that his code comments say "to ensure safe decoding" makes me very cautious about changing or removing the code. At minimum, I would need to fully understand why that code was added in the first place. For example, it may be that the strictness is necessary in order to guard against buffer overruns due to the specific implementation. I assume it would be possible to craft an image that actually does use the unsupported symbols, so I need to understand how the Huffman decoder, as it is currently implemented, would respond to that. Any assistance you can provide is appreciated. After a cursory read of the code, I can see several places at which an out-of-range symbol might cause a problem.

Several things I need to understand before proceeding:

  • How are such images (images with Huffman tables containing symbols >= 16 but that do not actually use those symbols) generated? How likely would it be for a user to encounter such an image in the wild?
  • If the image actually did use a Huffman symbol >= 16, would it bork libjpeg-turbo? (Let's just say that I have my suspicions.)
  • Is there an officially specified range limit for these symbols, or did you just increase the range limit arbitrarily to 17 to make one specific image work? In other words, if I adopted your patch, what would prevent someone from crafting a JPEG image with a Huffman table containing a symbol value of 18, which would cause the same error?

The answers to these questions will determine whether it makes sense to modify libjpeg-turbo (for instance, to change the error to a warning) or to simply add a code comment indicating that libjpeg-turbo is stricter than the spec requires and that this is because of limitations in the libjpeg-turbo code. The latter is my strong preference at the moment.

@thorfdbg
Copy link

thorfdbg commented Mar 26, 2022 via email

@dcommander
Copy link
Member

That would require a check in the code, plus proper error handling.

Right, but doing that in a performant manner may prove problematic. The Huffman decoder has a fast path that avoids branches (if() statements) as much as possible.

Again, I need to fully understand the ramifications of the proposed changes from a practical, as opposed to theoretical, perspective before I can consider them. libjpeg-turbo has an indirect user base of billions of people, yet it only has funding for a bit more than a day of full-time labor per month. Thus, the more leg work you and others can do to answer the questions I listed above, the more likely I will be to consider a modification. Otherwise, the default course of action will be to add a code comment and move on.

@malaterre
Copy link
Author

Just for later reference. Here is what was done in one libjpeg-fork (DCMTK copy):

  if (isDC) {
    for (i = 0; i < numsymbols; i++) {
      int sym = htbl->huffval[i];
      if (sym < 0 || sym > 16)
#ifdef DCMTK_ENABLE_STRICT_HUFFMAN_TABLE_CHECK
    ERREXIT(cinfo, JERR_BAD_HUFF_TABLE);
#else
    TRACEMS1(cinfo, 1, JTRC_UNOPT_HUFF_TABLE, sym);
#endif
    }
  }

ref:

I cannot comment on the security aspect raised above. But my understanding is that this piece of code has been there since day one, and will really only impact a limited set of people (JPEG without optimized huffman table). I also believe this piece of code may be revisited when implementing #402 (if ever). 2cts

@dcommander
Copy link
Member

The likelihood that lossless JPEG will ever be implemented in libjpeg-turbo is small. It would require a larger amount of funded development than any that has ever been secured for this project. My own research indicates that lossless JPEG is a mixed bag. It can produce better compression ratios than webp and PNG for some types of images but not for others, and the performance is not exceptional.

The only relevant question at the moment is: how should libjpeg-turbo handle JPEG images with unused DC symbols >= 16 in the Huffman tables. The answer to that depends on the answers to the three questions I posed above. Until/unless I get answers to those three questions, there will be no movement on this issue.

@malaterre
Copy link
Author

@dcommander Forget my comment about lossless. This was simply a reference to the original issue where:

% jpeg -c -p lena.ppm lena.jpg

does produce a large huffman table with unused symbols, which eventually exercise the code section in jpeg_make_d_derived_tbl.

@dcommander
Copy link
Member

@malaterre Fair enough. That image would not successfully decode with libjpeg-turbo anyhow.

@dcommander
Copy link
Member

@malaterre Actually, an image created with the command line above does not exercise the code section in jpeg_make_d_derived_tbl(). It fails earlier than that with Unsupported JPEG process: SOF type 0xc3.

So let me be more clear: I need an example of an image created with a known JPEG encoder (as opposed to an artificially-crafted image) that would otherwise decompress normally with libjpeg-turbo were it not for the strict checking of the DC Huffman table.

@dcommander
Copy link
Member

I believe that we have given this issue adequate time to breathe. My opinion at the moment is that libjpeg had a good reason for disallowing DC symbols > 15. Until/unless someone can provide the information I requested above, there is insufficient basis for me to change that opinion. I will reopen this issue if further information emerges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants