-
-
Notifications
You must be signed in to change notification settings - Fork 991
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bogus Huffman table definition: jpeg_make_d_derived_tbl too strict #586
Comments
For background information, here is the original thread that started it: |
Yes, I have been discussing this with @thorfdbg in e-mail. This was my response: The Huffman decoder tends to be the first line of defense against malformed JPEG images, some of which may trigger exploitable security issues farther down in the decoding pipeline. Thus, browsers and other security-conscious applications are very sensitive to whether and how libjpeg-turbo fails to decode such images. Historically, about 9 out of 10 reported security bugs in libjpeg-turbo have been related to the Huffman decoder, so I am hesitant to change it without a good reason. The code in question was written by Tom Lane in 1998, and I have no insight into why he decided to be strict about the DC symbol range. (The change log simply says, "Huffman tables are checked for validity much more carefully than before.") The fact that his code comments say "to ensure safe decoding" makes me very cautious about changing or removing the code. At minimum, I would need to fully understand why that code was added in the first place. For example, it may be that the strictness is necessary in order to guard against buffer overruns due to the specific implementation. I assume it would be possible to craft an image that actually does use the unsupported symbols, so I need to understand how the Huffman decoder, as it is currently implemented, would respond to that. Any assistance you can provide is appreciated. After a cursory read of the code, I can see several places at which an out-of-range symbol might cause a problem. Several things I need to understand before proceeding:
The answers to these questions will determine whether it makes sense to modify libjpeg-turbo (for instance, to change the error to a warning) or to simply add a code comment indicating that libjpeg-turbo is stricter than the spec requires and that this is because of limitations in the libjpeg-turbo code. The latter is my strong preference at the moment. |
Darrell, others,
Several things I need to understand before proceeding:
* How are such images (images with Huffman tables containing symbols
>= 16 but that do not actually use those symbols) generated? How
likely would it be for a user to encounter such an image in the wild?
One can only guess. Those symbols are not needed for correct decoding of
progressive or sequential codestreams, but they may be needed for other
processes. Potentially, a particular application uses a common
(process-agnostic) default table and applies is irrespectively of the
particular coding process used in the end.
All I can say is that there is no normative requirement that only the
minimum number of symbols are populated, or only the first 16 symbols
are. It's an implementation decision. The ISO implementation uses
default tables depending on the process. If there are, for some image or
some configuration, tables generated that populate more than the first
16 entries, please let me know because I would not know how this has
happened (code for that in coding/huffmantemplate.cpp, lines 142 and
following, val_dc_luminance/chrominance lists the symbols).
* If the image actually did use a Huffman symbol >= 16, would it bork
libjpeg-turbo? (Let's just say that I have my suspicions.)
That would require a check in the code, plus proper error handling. Of
course, I'm not in the depth of your (or Tom's) code for that matter. In
the ISO code, this check is done in coding/huffmantemplate.cpp, line
853. However, this code is much younger, and of course, there may be
errors, despite my attempt to avoid them. It hasn't undergone as
careful (and years long) testing as libjpeg-turbo.
* Is there an officially specified range limit for these symbols, or
did you just increase the range limit arbitrarily to 17 to make one
specific image work? In other words, if I adopted your patch, what
would prevent someone from crafting a JPEG image with a Huffman
table containing a symbol value of 18, which would cause the same error?
The official limit is that the table can carry 256 symbols. That would
of course not prevent any source from populating a symbol #18 or a
symbol #255 - this does not establish an error by itself. It would be an
error as soon as such a symbol is received during the actual decoding
process. Again, I can only speak about the ISO implementation where this
particular check is made in codestream/sequentialscan.cpp, line 682.
(This is both for the sequential process as well as the initial scan of
a progressive scan).
Rather than the actual symbol, it would be worth a check how
libjpeg-turbo handles the case of an incoming bit sequence that does not
establish a valid Huffman code, a check that needs to be done anyhow. If
it handles this case gracefully (aborting etc) then one could simply
invalidate the affected symbols after reading the table. It would then
only trigger a check if a particular codestream attempts to decode the
corresponding bit sequences.
Let me just stress that the purposes of the two code bases is rather
different. Darrell's code is a high-speed, production quality
implementation suitable as a black-box for software that requires access
to JPEG encoded images. The ISO code is a reference to the standard and
thus demonstrates how the standard should be read, it aids as a
demonstration of the standard, and not so much as an actual
implementation of it one should use in other software. As such, it
checks for normative requirements of the codestreams, but it may be not
as robust as a year-long well-proven implementation such as libjpeg-turbo.
Greetings,
Thomas
|
Right, but doing that in a performant manner may prove problematic. The Huffman decoder has a fast path that avoids branches ( Again, I need to fully understand the ramifications of the proposed changes from a practical, as opposed to theoretical, perspective before I can consider them. libjpeg-turbo has an indirect user base of billions of people, yet it only has funding for a bit more than a day of full-time labor per month. Thus, the more leg work you and others can do to answer the questions I listed above, the more likely I will be to consider a modification. Otherwise, the default course of action will be to add a code comment and move on. |
Just for later reference. Here is what was done in one libjpeg-fork (DCMTK copy):
ref: I cannot comment on the security aspect raised above. But my understanding is that this piece of code has been there since day one, and will really only impact a limited set of people (JPEG without optimized huffman table). I also believe this piece of code may be revisited when implementing #402 (if ever). 2cts |
The likelihood that lossless JPEG will ever be implemented in libjpeg-turbo is small. It would require a larger amount of funded development than any that has ever been secured for this project. My own research indicates that lossless JPEG is a mixed bag. It can produce better compression ratios than webp and PNG for some types of images but not for others, and the performance is not exceptional. The only relevant question at the moment is: how should libjpeg-turbo handle JPEG images with unused DC symbols >= 16 in the Huffman tables. The answer to that depends on the answers to the three questions I posed above. Until/unless I get answers to those three questions, there will be no movement on this issue. |
@dcommander Forget my comment about lossless. This was simply a reference to the original issue where:
does produce a large huffman table with unused symbols, which eventually exercise the code section in |
@malaterre Fair enough. That image would not successfully decode with libjpeg-turbo anyhow. |
@malaterre Actually, an image created with the command line above does not exercise the code section in So let me be more clear: I need an example of an image created with a known JPEG encoder (as opposed to an artificially-crafted image) that would otherwise decompress normally with libjpeg-turbo were it not for the strict checking of the DC Huffman table. |
I believe that we have given this issue adequate time to breathe. My opinion at the moment is that libjpeg had a good reason for disallowing DC symbols > 15. Until/unless someone can provide the information I requested above, there is insufficient basis for me to change that opinion. I will reopen this issue if further information emerges. |
Have you searched the existing issues (both open and closed) in the libjpeg-turbo issue tracker to ensure that this bug report is not a duplicate?
yes
Does this bug report describe one of the two known and unsolvable issues with the JPEG format?
no
Clear and concise description of the bug:
jpeg_make_d_derived_tbl
is too strict and check all codes within the Huffman table, even if not used by the decoder.Steps to reproduce the bug (using only libjpeg-turbo):
Using the attached JPEG file (WITH_12BIT:BOOL=ON), here is what I see:
Image(s) needed in order to reproduce the bug (if applicable):
Expected behavior:
Observed behavior:
Platform(s) (compiler version, operating system version, CPU) on which the bug was observed:
Debian/bullseye amd64.
libjpeg-turbo release(s), commit(s), or branch(es) in which the bug was observed (always test the tip of the main branch or the latest stable pre-release to verify that the bug hasn't already been fixed):
If the bug is a regression, the specific commit that introduced the regression (use
git bisect
to determine this):Seems like commit is: 5ead57a
Additional information:
Using the following local patch, I can properly decode the image:
Using the above patch, I get a result compatible with another open-source implementation of ITU 81:
The text was updated successfully, but these errors were encountered: