Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible undefined behavior in jcdctmgr.c's quantize() function #13

Closed
DimitryAndric opened this issue Aug 29, 2015 · 3 comments
Closed

Comments

@DimitryAndric
Copy link

I'm a FreeBSD developer working on maintaining our copy of clang and llvm in the FreeBSD base system. While doing a test build run of all ports, the following libjpeg-turbo test failure was reported to me:

http://package22.nyi.freebsd.org/data/headamd64PR201377-default/2015-08-20_16h45m13s/logs/errors/libjpeg-turbo-1.4.0.log

This failure occurs on FreeBSD/amd64, not on FreeBSD/i386:

./cjpeg -sample 2x2 -quality 100 -dct fast -prog -outfile testout_420_q100_ifast_prog.jpg ./testimages/testorig.ppm
md5/md5cmp 990cbe0329c882420a2094da7e5adade testout_420_q100_ifast_prog.jpg
testout_420_q100_ifast_prog.jpg: FAILED.  Checksum is debb59d009a8010d7d925bf60ffb7e49

Searching the internet for similar test failures, I found this particular ticket, from February 2015:

http://sourceforge.net/p/libjpeg-turbo/bugs/85/

It might seem partially related, since it only occurs when compiling with gcc 4.9.1, and specifying -march=haswell. However, the original reporter also mentions it occurred on a 32-bit system.

In any case, after some analysis (using the Undefined Behavior Sanitizer, among others) it turns out that the quantize() function in jcdctmgr.c uses undefined behavior in some cases.

In particular, it sometimes tries to right-shift a 32-bit unsigned integer by 65551 bits, which is obviously illegal. For example, from a test run with UBSan enabled:

./cjpeg -sample 2x2 -quality 100 -dct fast -prog -outfile testout_420_q100_ifast_prog.jpg ./testimages/testorig.ppm
jcdctmgr.c:415:15: runtime error: shift exponent 65551 is too large for 32-bit type 'UDCTELEM2' (aka 'unsigned int')
jcdctmgr.c:410:15: runtime error: shift exponent 65551 is too large for 32-bit type 'UDCTELEM2' (aka 'unsigned int')

Initially the problem is triggered by the assignment on line 405, because the particular 'divisors' element chosen is -1 (!) at that point:

389 METHODDEF(void)
390 quantize (JCOEFPTR coef_block, DCTELEM * divisors, DCTELEM * workspace)
391 {
...
398   UDCTELEM recip, corr, shift;
...
401   for (i = 0; i < DCTSIZE2; i++) {
...
405     shift = divisors[i + DCTSIZE2 * 3];
...
410       product >>= shift + sizeof(DCTELEM)*8;
...
415       product >>= shift + sizeof(DCTELEM)*8;

Since shift is an unsigned short (in case of this build), it gets assigned 65535. Then, on lines 410 and 415, this causes the right shift of product by 65535 + 2 * 8 = 65551 bits.

Of course, the next question is: how does one of the shift elements of divisors[] become -1? Then we end up in compute_reciprocal(), which does not seem to handle a divisor argument of 1 very well:

171 LOCAL(int)
172 compute_reciprocal (UINT16 divisor, DCTELEM * dtbl)
173 {
174   UDCTELEM2 fq, fr;
175   UDCTELEM c;
176   int b, r;
177
178   b = flss(divisor) - 1;
... if 'divisor' is 1, 'b' becomes 0 ...
179   r  = sizeof(DCTELEM) * 8 + b;
... r becomes 16 ...
180
181   fq = ((UDCTELEM2)1 << r) / divisor;
182   fr = ((UDCTELEM2)1 << r) % divisor;
183
184   c = divisor / 2; /* for rounding */
185
186   if (fr == 0) { /* divisor is power of two */
187     /* fq will be one bit too large to fit in DCTELEM, so adjust */
188     fq >>= 1;
189     r--;
... since 'fr' will be 0, here 'r' becomes 15 ...
190   } else if (fr <= (divisor / 2U)) { /* fractional part is < 0.5 */
191     c++;
192   } else { /* fractional part is > 0.5 */
193     fq++;
194   }
195
196   dtbl[DCTSIZE2 * 0] = (DCTELEM) fq;      /* reciprocal */
197   dtbl[DCTSIZE2 * 1] = (DCTELEM) c;       /* correction + roundfactor */
198   dtbl[DCTSIZE2 * 2] = (DCTELEM) (1 << (sizeof(DCTELEM)*8*2 - r));  /* scale */
199   dtbl[DCTSIZE2 * 3] = (DCTELEM) r - sizeof(DCTELEM)*8; /* shift */
... and finally, this 'dtbl' element becomes -1.

The call to compute_reciprocal() with a divisor argument of 1 is done from start_pass_dctmgr(), in the JDCT_IFAST case:

217 METHODDEF(void)
218 start_pass_fdctmgr (j_compress_ptr cinfo)
219 {
...
259 #ifdef DCT_IFAST_SUPPORTED
260     case JDCT_IFAST:
261       {
...
287         dtbl = fdct->divisors[qtblno];
288         for (i = 0; i < DCTSIZE2; i++) {
289 #if BITS_IN_JSAMPLE == 8
290           if(!compute_reciprocal(
291             DESCALE(MULTIPLY16V16((INT32) qtbl->quantval[i],
292                                   (INT32) aanscales[i]),
293                     CONST_BITS-3), &dtbl[i])

At this point, qtbl->quantval[i] is 1, which appears to be legal, and aanscales[i] is 2446, resulting in a divisor of 1.

So it looks to me like a divisor of 1 is legal, but how should this be handled properly in compute_reciprocal()? The resulting "real" reciprocal should be 1, obviously, but in terms of the function's variables, they should likely be:

b = 0
r = (word size)    [16, in our case]
f = 2^(word size)  [65536, in our case]

Unfortunately, with DCTELEM being a 16-bit short, the resulting f value does not fit! So how to solve this conundrum? I am unsure at this point, and any advice is welcome. :-)

@DimitryAndric
Copy link
Author

I have proposed a tentative fix in pull request #14.

@dcommander
Copy link
Member

This is partly related to an old issue (from 4 years ago.) Basically, it was discovered that, when using the IFAST forward DCT and quality >= 98, a similar overflow could occur because the x86 SIMD routines are using 16-bit reciprocal values. The solution to that was to detect the overflow in compute_reciprocal and, if it occurs, drop back to using the non-SIMD quantization routine. This slows performance considerably, which is why there's a draconian warning in README-turbo.txt that says basically "don't use IFAST with quality >= 98." Even when it's working properly, there are other degenerate behaviors with the IFAST algorithm (including the C version) with quality >= 98. The compression ratio is as much as 15% worse for those quality levels than when using ISLOW.

As you discovered, compute_reciprocal() isn't producing correct values when divisor==1, which is apparently a rare enough situation that it hasn't caused major problems so far. 4cfa3f4 should fix this issue-- it makes compute_reciprocal() generate special values for the divisor==1 case, and those values make the C quantization function behave like an identity function.

@DimitryAndric
Copy link
Author

Thanks for the quick fix. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants