Subpar decoding for transcoded jpegs #1470
Perhaps it would also be useful to have the original jpg for comparison, since on a superficial evaluation, I don't see the degradation you are talking about, even when doing a flickering comparison at 400%.
I don't know, but I don't need to zoom in nor flicker test to notice the jxl decoding is blurrier. I'm not sure where it fails, but since it's more noticeable on reds it might be related to the YUV to RGB conversion, more precisely the chroma upscaling kernel. Here's the image; I think imgur doesn't preprocess them. (no, only strips metadata)
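For context, here is a minimal numpy sketch of the two stages under suspicion: the full-range BT.601 YCbCr-to-RGB conversion used by JFIF, kept in float, and the simplest form of 4:2:0 chroma upsampling. The function names are my own illustration, not code from any of the libraries discussed.

```python
import numpy as np

def ycbcr_to_rgb(y, cb, cr):
    """Full-range BT.601 conversion as specified by JFIF, kept in float
    until the final clamp; quantizing to uint8 only at the end avoids
    the extra rounding that an 8-bit intermediate buffer introduces."""
    r = y + 1.402 * (cr - 128.0)
    g = y - 0.344136 * (cb - 128.0) - 0.714136 * (cr - 128.0)
    b = y + 1.772 * (cb - 128.0)
    return np.clip(np.stack([r, g, b], axis=-1), 0.0, 255.0)

def upsample_chroma_nearest(c):
    """Simplest 4:2:0 chroma upsampling: replicate each sample 2x2.
    Smoother kernels blur chroma edges differently, which shows up
    most on saturated reds."""
    return np.repeat(np.repeat(c, 2, axis=0), 2, axis=1)
```

Decoders agree on the matrix above but can differ in the upsampling kernel, which is one plausible source of a "blurrier reds" impression.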
Maybe there's another issue with rendering the pngs somehow, for example a viewer doing some upsampling with smoothing in between?
Does it still happen if you pass
Thanks @TheDecryptor. EDIT: Some more notes.
Here's the full log.
I can see a small difference between those two decoded images when I magnify enough, but I have no reason to assume that the libjxl decode is 'worse' than the libjpeg-turbo/mozjpeg decode. There's a difference, no doubt caused by differences in the IDCT implementation, the YCbCr to RGB implementation, and possibly small effects caused by Chroma From Luma.

As far as I understand, libjxl does things more accurately than libjpeg-turbo/mozjpeg, e.g. we have YCbCr data in float32 and convert it to RGB in float32 before quantizing it to uint8, while libjpeg-turbo/mozjpeg afaiu decodes to 8-bit YCbCr first and then converts that to 8-bit RGB. So unless proven otherwise, I think if any jpeg decoder is 'subpar', it's more likely to be libjpeg-turbo/mozjpeg than libjxl.
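To make the precision argument concrete, here is a small experiment (my own illustration, not libjxl code) comparing a conversion that stays in float until the final uint8 quantization against one that rounds YCbCr to 8 bits first, as a libjpeg-style pipeline effectively does:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.uniform(0.0, 255.0, 10_000)   # "exact" float luma values
cr = rng.uniform(0.0, 255.0, 10_000)  # "exact" float Cr values

def red_float_pipeline(y, cr):
    # Keep everything in float; quantize to uint8 only at the very end.
    return np.clip(np.round(y + 1.402 * (cr - 128.0)), 0, 255)

def red_8bit_pipeline(y, cr):
    # Round YCbCr to 8-bit first, then convert, as 8-bit decoders do.
    return np.clip(np.round(np.round(y) + 1.402 * (np.round(cr) - 128.0)), 0, 255)

diff = np.abs(red_float_pipeline(y, cr) - red_8bit_pipeline(y, cr))
print(diff.max())  # the two pipelines disagree by a code value or two
```

The divergence is tiny per pixel, which is why it is consistent with "a small difference when I magnify enough" but not with obvious blur.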
Just to be sure, which of the two images is the jxl?
The blurrier one
This seems to be a larger difference than expected. Can you share the original jpeg for this example? I want to try to reproduce this.
Imgur, but any half-toned image will do. This call: Maybe the internal resampling also involves some YCbCr<->RGB conversions, I don't know.
That does not do lossless jpeg recompression at all. That decodes the JPEG to pixels, downscales the image 2x, then encodes the result losslessly. Obviously downscaling an image 2x is a very lossy operation, which will of course introduce blur.
Please let us know if there is an example where the differences are as striking as in the second pair of images that you posted, but where the
I was just replying to @jonsneyers quote. EDIT: No time today to do the stdev test, but I could check, and the issue only happens on the transcode code path; no sign of blurring on direct encode to jxl.
So, let's compare different jpeg decoders with the example image you provided.
Then I compare the decompressed images along different metrics:
Images for that comparison are here: From this and visual inspection of the results, I conclude that the differences between jxl and mozjpeg (default) are not greater than expected, and not greater than, let's say, between mozjpeg (default) and mozjpeg (

When you write "I never observed such a decoding degradation", which ways of decoding jpeg did you observe other than mozjpeg? Also I think it would be helpful to point to more examples of degradation to better understand this issue.
Thanks for the tests. Sorry, I didn't have much time to run them myself, but I will arrange something in the next days, primarily only on the V plane which encodes red-green opposing values, as that's where the problem is most visible. Checking the matrices, well, there you go: a difference 20 to 30 times higher compared to the other decoders (except 'fast'). You don't want an absolute difference greater than 0.0005, as that starts to become "easily" noticeable. To test with other images I recommend those where saturated red is prevalent and there are fine details in the reds, although the problem is chroma related overall.
The question still is which one is closer to the actual jpeg spec, i.e. using mathematically exact, infinite precision arithmetic. As far as I understand, existing jpeg decoders like libjpeg-turbo do the iDCT with either floats or ints, but then they produce uint8 YCbCr, which then gets upsampled if needed and then converted to uint8 RGB. So in terms of the precision of arithmetic, I think we can assume that libjxl is more precise than the rest. The only potential source of inaccuracy could be the use of chroma-from-luma in the encoder, which might have subtle effects on Cb and Cr: not enough to change the quantization bucket (otherwise it wouldn't be reversible), but potentially enough to change the Cb and Cr values themselves quite a bit when the source JPEG is low enough quality. Note that JPEG conformance even allows causing an off-by-one in the quantization bucket, so this wouldn't make it non-conforming, but it could create noticeable differences in the decoded result. This is where the
More precise than mozjpeg float? I mean, when I said subpar JPEG decoding I meant exactly that, leaving model conversion out of the question. This leaves Photoshop and mozjpeg out of the test batch. I can decode with ffmpeg in avisynth via FFMS2 directly to YCbCr; as you noted that's uint8, but checking the above tables, the difference between 8-bit and float is not big.
Mozjpeg/libjpeg-turbo float is doing the iDCT arithmetic with floats, but it's still converting int16 DCT coeffs to uint8 YCbCr values. I don't think any part of libjpeg-turbo uses float buffers; that code dates back to the time when most CPUs didn't even do fp arithmetic yet. When doing lossless jpeg -> jxl -> jpeg, things stay as DCT coefficients, so there certainly is no conversion to RGB; there isn't even an iDCT/DCT. But here we're looking at jpeg -> png versus jpeg -> jxl -> png, where some of the difference is caused in the YCbCr to RGB stage, where libjpeg starts from quantized (8-bit) YCbCr and jxl starts from float YCbCr.
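As an illustration of where precision enters the iDCT (a numpy sketch, not code from either library): with an orthonormal DCT matrix the float round trip is exact, so any error must come from rounding, whether of the coefficients or of the output samples.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II matrix, so the inverse is the transpose."""
    j = np.arange(n)
    m = np.cos(np.pi * (2 * j[None, :] + 1) * j[:, None] / (2 * n))
    m[0] *= np.sqrt(1.0 / n)
    m[1:] *= np.sqrt(2.0 / n)
    return m

D = dct_matrix()
block = np.arange(64, dtype=float).reshape(8, 8)  # an arbitrary 8x8 "pixel" block

coeffs = D @ block @ D.T   # forward 2D DCT
back = D.T @ coeffs @ D    # inverse 2D DCT in float: numerically exact
assert np.allclose(back, block)

# Rounding the coefficients to integers (as quantization does) perturbs
# the reconstruction; an 8-bit sample buffer would add further rounding.
back_q = D.T @ np.round(coeffs) @ D
print(np.abs(back_q - block).max())
```

This is only about arithmetic precision, not about which decoder is "right"; it shows why a float pipeline can only reduce, never add, rounding error relative to the stored coefficients.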
Actually no, my comparisons were jpeg vs jpeg -> jxl -> jpeg. I decode directly to YCbCr, so the degradation is visible even without conversion to RGB. EDIT: Here's the comparison with a diff image of the V plane (no RGB conversion involved), scaled 43 times to saturate 8-bit, since many values are off by 6 (I have seen higher though). For clarity, here is my workflow:
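The amplified diff image described above can be reproduced with a few lines of numpy (my own sketch; the gain of 43 is the value mentioned, chosen so that a difference of 6 saturates 8 bits):

```python
import numpy as np

def amplified_diff(a, b, gain=43):
    """Per-pixel absolute difference of two 8-bit planes, multiplied by
    `gain` so that small errors (here, |diff| >= 6) clip to white."""
    d = np.abs(a.astype(np.int16) - b.astype(np.int16)) * gain
    return np.clip(d, 0, 255).astype(np.uint8)
```

Feeding it the two V planes gives a visibility map of exactly where the chroma values diverge, without any RGB conversion in the loop.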
AviSynth+ script:
If you do --strip, then it will drop the jpeg bitstream reconstruction, so djxl will decode to rgb pixels and then do a reencode as q95 jpeg.
Can confirm. Then I will need to run my tests with RGB conversion in mind (output to png), because --strip saves some size by removing the jbrd box. And I also see a softer look when plainly decoding with jxl-winthumb. EDIT:
Still quite different even when doing the conversion to RGB in float. And the blurring is there in JXL, so the problem might be in the YCbCr -> RGB conversion, or simply in the JXL iDCT.
Could you provide the resulting images, so we can compare them? At least the avisynth one; the jxl we can easily generate from the original jpg, of course. Did you use the
The jbrd box allows reconstructing the original JPEG file, so you can use any JPEG decoder you like. If the file size is a real concern, the original JPEGs can be optimized to bare JFIF before transcoding to JXL; then the jbrd box will be pretty small. The size of the jbrd box is proportional to the size of the original file, so optimizing JPEGs before transcoding, to make them as small as possible, actually makes the resulting JXL smaller too, although it affects only the jbrd data and not the actual JXL bitstream.
If I understand correctly, in #1470 (comment) you use jxl to generate a png from a jpg (going through jxl) and then compare it with another image. Could you provide the pixels of the other image also as png? I don't understand what those two jpgs that you linked are.
Hi @mo271, you asked me for the avisynth images and that's what I uploaded; to be more precise, the output of this:
That is FFImageSource jpeg decoding (mjpeg?) then converting to RGB in floating point as JXL does (to show that this doesn't make such a difference). I don't know what kind of dithering JXL does, so I uploaded both options, or if you tell me I can upload a matching dither, but I think they are not deterministic(?), I don't know. The problem now is that for comparison there's no framework where I can do both the iDCT and RGB conversion in float, as mozjpeg only does that for the iDCT (and no lossless YUV output is possible) and AviSynth only does that for the RGB conversion. I can try to mimic this in Python, although it's not a language I feel too comfortable with.
I'm trying to understand this issue, and I'm not yet convinced that jxl doing everything in float gives worse or even significantly worse results than other ways of converting from jpeg. The blurriness that you mention is not the only artifact to be worried about. We are comparing the jpeg decoding of jxl, triggered by
with jpeg decoding of other decoders. So far you provided one image (the original jpeg) and one other codec (mozjpeg). As pngs to do the comparisons, you provided two cropped pngs where slight differences are visible, but where the butteraugli metric is small enough that we don't worry about any unexpected degradation.

> butteraugli_main tF7XRRL.png rhq5Z6i.png
0.7073692083
3-norm: 0.232715
> butteraugli_main rhq5Z6i.png tF7XRRL.png
0.8325679302
3-norm: 0.269613

In order to compare this with other ways of decoding the jpeg, it would be useful to have the png output of other codecs. (That is why I asked for pngs in #1470 (comment))
As nobody can actually tell how decoded JPEG images should look exactly, wouldn't it be better to compare decoded JPEG images with the original uncompressed images?
I'm not implying that the issue is in the float pipeline; actually I think it's in the iDCT (hopefully at decoding) of transcoded jpegs, bit depth regardless. I could see the same blurriness with other images, but this one so far gave me the worst case scenario, probably due to the presence of a saturated red area. If I could spot it easily (not sure if it helps having a trained eye), anyone else can do it too. A high butteraugli metric might be fine for lossy, but lossless...(?). Also, how are you running butteraugli, all three channels averaged? That is going to even things out; localize it to the offending channel and the metric should go higher, probably much worse than it should. In any case, butteraugli is only tested for differences between quality 90 and 95; something like 96 (bad in lossless terms) would pass as legit in butteraugli scores. Another aspect is that metrics are usually very bad at factoring in blurriness; I don't know how many of them do, but it's the reason B-SSIM was created. I will run a test later today with butteraugli and provide you with the other pngs (unfortunately with RGB conversion baked in; that's why I didn't see a point in sharing them at first). @jendalinda, yes, I actually had that in mind but forgot to test. Will check as well.
So I did a little test using this command line:
So this is producing a few jpegs from a png original, and then computing peak absolute error (PAE, lower is better), mean absolute error (MAE, lower is better) and PSNR (higher is better) between the original and the jpeg as decoded by imagemagick (i.e. libjpeg-turbo) and as decoded by libjxl after recompressing the jpeg losslessly. This is what I got for some random example image:
Here is what I got for another random test image:
For that same test image, this is what I get when adding
So basically what this shows is that at least in terms of PSNR and MAE, jxl decoding of recompressed jpegs is better than "standard" decoding in the sense that it is closer to the original image. The PAE is sometimes better, sometimes worse; not sure if that means much; at least the numbers are in the same ballpark. MAE and PSNR are consistently somewhat better for jxl decoding of a recompressed jpeg, confirming what was said earlier: libjxl is the more accurate decoder here, and if you see a difference, it's actually libjpeg-turbo producing a "subpar" result, not libjxl.
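The three metrics used in the test above are straightforward to compute; here is a minimal numpy version (the function name is mine, not from any tool mentioned in the thread):

```python
import numpy as np

def error_metrics(ref, test, peak=255.0):
    """Peak absolute error, mean absolute error, and PSNR between two images."""
    err = np.abs(ref.astype(np.float64) - test.astype(np.float64))
    pae = err.max()
    mae = err.mean()
    mse = np.mean(err ** 2)
    psnr = float('inf') if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
    return pae, mae, psnr
```

Comparing each decode against the original (pre-JPEG) image, as done above, sidesteps the question of which JPEG decoder is the reference.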
I don't think butteraugli favors blur; it's probably just that doing a small blur like that basically does a small amount of deblocking, i.e. undoes a bit of jpeg artifacting. Anyway, for testing the accuracy of DCT implementations, I don't think butteraugli is a good metric; it's better to just look at peak error, mean error, PSNR etc. This is a purely numerical matter, not a perceptual one, so it makes more sense to use purely numerical metrics, not perceptual ones.
You cannot use either MAE or PSNR, as blurring introduces error spread, kind of like denoising (just tested with MAE). If anything, we should try a sharpness metric; I don't know of any robust one, I should research that. I know there were some algorithms using the Laplacian to detect blurring, but we are talking about low decimals here. EDIT: Reading a bit around, variance seems to be a good metric for blurriness. I checked it out and it seems to be so. I added a twist by computing the variance in log space so small diffs are emphasized, and got a lower score (blurrier) for JXL in this image (0.003476), vs 0.003619 (original) and 0.003571 (jpeg).
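The log-space variance idea can be sketched like this (my reimplementation of the metric described, not the exact script used); blurring an image lowers the score:

```python
import numpy as np

def log_variance_sharpness(plane, eps=1.0):
    """Variance of the plane in log space; the log compression emphasizes
    small differences, and lower values indicate a blurrier image.
    (eps avoids log(0); its value here is illustrative.)"""
    return np.log(plane.astype(np.float64) + eps).var()

# A blurred copy of a noisy plane should score lower than the original:
rng = np.random.default_rng(1)
img = rng.uniform(16, 240, (64, 64))
blurred = (img + np.roll(img, 1, 0) + np.roll(img, -1, 0)
               + np.roll(img, 1, 1) + np.roll(img, -1, 1)) / 5.0
print(log_variance_sharpness(img) > log_variance_sharpness(blurred))  # True
```

Note that a metric like this rewards any high-frequency content, including blocking artifacts, which is the caveat raised in the reply below.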
Ok, but doesn't a less precise DCT also have the effect of introducing more (unwanted) edges, hence improving the score on that metric? I know PSNR/MAE are quite tolerant w.r.t. blurring, but they still shouldn't give a better score to an image that is more blurred than to an image that is less blurred compared to the original.
In the images that you attached, I don't think there is any unexpected issue with problematic jpeg decoding. While there might be slight differences, they can be explained by the fact that libjxl is using floats throughout and libjpeg-turbo/mozjpeg are doing less precise computation. If you prefer the (in your eyes) less blurry (more blocky) decoding of mozjpeg, then perhaps instead of going
you could do
I'm unsure what action we are supposed to take on this issue.
I don't know, maybe; I just tested by adding a deblocker and checking metrics. I added two different ones, a deblocking PP (the Deblock_QED func in AVS) and a straight HQ iDCT decoder, JPEGSource (closed source unfortunately). In both cases sharpness was still higher than with the JXL decoding. I improved the code to compute over luma instead of the red channel (practically the same), and added a gaussian type of blur instead of a mean average. There are more sharpness metrics listed here; I didn't test them.
JPEGSource,Deblock_QED(quant1=30, quant2=40), |a.sharpness - b.sharpness| diff map
We showed before that mozjpeg float decoding was still a big difference against JXL, and that RGB float conversion wasn't a meaningful change either. My suspicion is the iDCT coefficients. Maybe I can give them a try and see if the problem lies there. Do you know what file I should be looking at?
Reading the JXL paper, there are many steps where a low-pass is performed. I don't know if any of them is involved in lossless jpeg transcodes though. Color correlation
DC handling
LF predictions
Loop filters
What I do know is that JPEG XL does things differently at the decoding stage, as depicted by the slider from @jonsneyers. As a reference I have been using very good coefficients in mozjpeg that achieve a better quality/size ratio than any of the internal presets:
@veluca93 pointed out that the quant bias that we have (see here: Line 59 in a67a570)
leads to some additional numerical inaccuracies when decompressing losslessly recompressed jpgs to pixels. I gave that a try and hardcoded the values in that struct to be

Do you think we should either somehow turn the quant bias off in the jpeg-decoding case, or introduce another flag to control this, @jonsneyers and @veluca93?
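To illustrate what a decode-side quantization bias does (a hypothetical sketch; the bias value and function names are illustrative, not libjxl's actual constants): nonzero coefficients are pulled slightly toward zero before scaling, on the assumption that true coefficient values cluster toward the inner edge of their quantization bucket.

```python
import numpy as np

def dequant_plain(q, step):
    """Midpoint reconstruction: the center of each quantization bucket."""
    return q * step

def dequant_biased(q, step, bias=0.1):
    """Bias nonzero coefficients toward zero before scaling.
    (bias=0.1 is illustrative only, not libjxl's signaled value.)"""
    return np.where(q == 0, 0.0, (q - np.sign(q) * bias) * step)

q = np.array([-2, -1, 0, 1, 3])
print(dequant_plain(q, 4.0))   # [-8. -4.  0.  4. 12.]
print(dequant_biased(q, 4.0))  # pulled toward zero by bias*step = 0.4
```

Whether the biased reconstruction reads as "blurrier" than midpoint reconstruction, while being statistically closer to the pre-quantization values, is essentially the question debated in this thread.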
Where does the quant bias get used in the case of jpeg recompression? I thought this was an encode thing (which wouldn't apply to jpeg recompression, since there the coeffs are quantized already), but this looks like it is a decode thing? Is this used for AC or for DC or both? Is it signaled? If it causes the decoded image to be closer to the original image (as my little test with mae/psnr suggested), I don't think we should disable it by default. Getting pixels that are closer to the original seems to me to be more valuable than getting pixels that are closer to the decode result of libjpeg-turbo. Could you do a test like the one I did (#1470 (comment)) to see how it affects the pae/mae/psnr w.r.t. the original image?
It is signaled. AC only, and actually decode only :) |
Thanks for the tests @mo271. Maybe the bell-shaped distribution (looks like a T-distribution, but I didn't check) causes the noted smoothness. I collected a bunch of DCT quant coefs and the one posted gave me the best results in mozjpeg (I had to disable trellis as it was also causing blur). It follows an equation by Jukka Ripatti. @jonsneyers, I don't think those metrics are suited for sharpness evaluation, which is the main case here. We can run log variance as a sharpness metric, or the other ones linked (#1470 (comment))
I also noticed the same issue with djxl decoding of jpeg recompressed images. The problem occurs when the image is not decoded back to jpg.
When decoded to pixels, there's slight blur. This is typical for lossy jxl encoding with default settings. When decoded back to jpg instead of png, it's 100% lossless and the hash matches the original jpg.
For the one year anniversary of the issue I gave the last dev build a whirl, and the output in pixels of the losslessly transcoded jpg is still as blurry as it was a year ago; actually a bit blurrier, since now I cannot use

This shocks me, as the solution was given back then. So I wonder if there's a problem implementing the fix, or at least a user-defined quant table, as I believe this is a showstopper for mass adoption of the format. I had hopes this was something other than JPEG 2000 or JPEG XR. I improved the sharpness function a bit to be more granular with little details and give more weight to texture and not to edges. These are the results.
Describe the bug
Being aware that jpeg decoding is not standardized, I have never observed decoding degradation like what cjxl produced.
For comparison I decoded the jpeg with mozjpeg library and compared with the output of "To Reproduce" calls.
For mozjpeg I used the call:
djpeg -bmp -dct float "input.jpg" >"input.jpg.bmp"
To Reproduce
Lossless transcode with jpeg input:
cjxl "input.jpg" "input.jpg.jxl"
djxl "input.jpg.jxl" "input.jpg.jxl.png"
Expected behavior
I expect an exact, or at least very close, rendition of a reference jpeg decoding library's output, not a degradation.
Screenshots
dec_moz
dec_jxl
Environment