
Subpar decoding for transcoded jpegs #1470

Open
Dogway opened this issue Jun 2, 2022 · 46 comments
Labels: unrelated to 1.0 (things that need not be done before the 1.0 version milestone)

Comments

Dogway commented Jun 2, 2022

Describe the bug
Although I'm aware that JPEG decoding is not fully standardized, I have never observed decoding degradation like what cjxl produces.
For comparison, I decoded the jpeg with the mozjpeg library and compared it with the output of the "To Reproduce" calls below.
For mozjpeg I used the call:
djpeg -bmp -dct float "input.jpg" >"input.jpg.bmp"

To Reproduce
Lossless transcode with jpeg input:
cjxl "input.jpg" "input.jpg.jxl"
djxl "input.jpg.jxl" "input.jpg.jxl.png"

Expected behavior
I expect an exact, or at least very close, match to the output of a reference JPEG decoding library, not a degradation.

Screenshots
dec_moz
dec_jxl

Environment

  • OS: Win7 SP1 x64
  • CPU type: i7-4790K
  • djxl version string: JPEG XL decoder v0.7.0 ab775e2 [AVX2,SSE4,SSSE3,Scalar]
  • cjxl version string: JPEG XL encoder v0.7.0 b432598 [AVX2,SSE4,SSSE3,Scalar]
mo271 (Member) commented Jun 2, 2022

Let me upload the images you linked to this issue, so they don't get lost.
dec_moz:
dec_moz

dec_jxl:
dec_jxl

mo271 (Member) commented Jun 2, 2022

Perhaps it would also be useful to have the original jpg for comparison, since on a superficial evaluation I don't see the degradation you are talking about, even when doing a flickering comparison at 400%.

mo271 added the "unrelated to 1.0" label (things that need not be done before the 1.0 version milestone) Jun 2, 2022
Dogway (Author) commented Jun 2, 2022

I don't know, but I don't need to zoom in or flicker-test to notice that the jxl decoding is blurrier.

I'm not sure where it fails but since it's more noticeable on reds it might be related to the YUV to RGB conversion, more precisely the chroma upscaling kernel.

Here's the image; I think Imgur doesn't preprocess them (no, it only strips metadata).

mo271 (Member) commented Jun 2, 2022

Maybe there is another issue with rendering the pngs somehow, for example a viewer doing some upsampling with smoothing in between?
Do you also find the differences when looking at those images in a browser?
Perhaps a screenshot could help us understand what degradation you are talking about.
Thanks!

TheDecryptor commented:

Does it still happen if you pass --jpeg_transcode_disable_cfl to cjxl when transcoding? If I decompose the images to LAB in GIMP I can see barely any changes in the L channel, but some larger differences in the A/B channels that could explain it.

Dogway (Author) commented Jun 2, 2022

Thanks @TheDecryptor
That helped quite a bit. What does it stand for? I searched and found "Contextual Feature Loss", is that it?
Still, there's some subtle blur remaining though.

EDIT: Some more notes.
I observed the image is full chroma so no chroma scaling issues.
I passed it through JPEGSnoop and found a few interesting things that might be related.

Photoshop Save As Quality        = 8 
Photoshop Save Format            = "Optimized"
Photoshop Save Progressive Scans = "3 Scans"

Scan Decode Mode: No IDCT (DC only)
  NOTE: Low-resolution DC component shown. Can decode full-res with [Options->Scan Segment->Full IDCT]

Here the full log.

jonsneyers (Member) commented:

I can see a small difference between those two decoded images when I magnify enough, but I have no reason to assume that the libjxl decode is 'worse' than the libjpeg-turbo/mozjpeg decode. There's a difference, no doubt caused by differences in the IDCT implementation, the YCbCr to RGB implementation, and possibly small effects caused by Chroma From Luma (this is where --jpeg_transcode_disable_cfl can indeed help).

As far as I understand, libjxl does things more accurately than libjpeg-turbo/mozjpeg, e.g. we have YCbCr data in float32 and convert it to RGB in float32 before quantizing it to uint8, while libjpeg-turbo/mozjpeg afaiu decodes to 8-bit YCbCr first and then converts that to 8-bit RGB.
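The effect of quantization order described here can be sketched in a few lines of numpy. The JFIF full-range conversion constants are standard; the sample values below are made up to sit near a rounding boundary, so this is an illustration of the principle rather than either library's actual code:

```python
import numpy as np

def ycbcr_to_rgb(y, cb, cr):
    # JFIF full-range YCbCr -> RGB, float in, float out
    r = y + 1.402 * (cr - 128.0)
    g = y - 0.344136 * (cb - 128.0) - 0.714136 * (cr - 128.0)
    b = y + 1.772 * (cb - 128.0)
    return np.clip(np.array([r, g, b]), 0.0, 255.0)

# a float YCbCr sample as a decoder might hold it right after the iDCT
y, cb, cr = 100.0, 128.0, 128.4

# float pipeline: convert first, quantize once at the very end
rgb_float = np.round(ycbcr_to_rgb(y, cb, cr))

# uint8 pipeline: quantize YCbCr first, then convert
rgb_uint8 = np.round(ycbcr_to_rgb(round(y), round(cb), round(cr)))

print(rgb_float[0], rgb_uint8[0])  # 101.0 100.0: one code value apart already
```

Rounding the chroma sample to uint8 before the conversion moves the reconstructed red by a full code value in this example; the float pipeline only quantizes once, at the end.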

So unless proven otherwise, I think if any jpeg decoder is 'subpar', it's more likely to be libjpeg-turbo/mozjpeg than libjxl.

Dogway (Author) commented Jun 2, 2022

Yes, dithering can cause slight blurring, which might explain the remaining slight blur. But the main offender was fixed by disabling Chroma From Luma; maybe it should be disabled automatically for 4:4:4 inputs?

So unless proven otherwise, I think if any jpeg decoder is 'subpar', it's more likely to be libjpeg-turbo/mozjpeg than libjxl.

It probably isn't related to this issue because of the full chroma, but I observed that mozjpeg resamples chroma with bilinear (a triangular kernel) while cjxl does some wizardry that turns it into a blurry, blocky mess. I wouldn't make such assumptions blindly.
(no zoom, 100% crop)
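For reference, the bilinear (triangular kernel) chroma upsampling mentioned here can be sketched in 1-D. This is my own re-implementation of the libjpeg-style "fancy upsampling" 3:1 weighting with edge samples replicated, not code from either library:

```python
import numpy as np

def triangle_upsample_1d(c):
    # 2x upsampling of one chroma row: each output sample is a 3:1 weighted
    # mix of the nearest and second-nearest input samples (edges replicated)
    c = np.asarray(c, dtype=np.float64)
    left = np.roll(c, 1); left[0] = c[0]
    right = np.roll(c, -1); right[-1] = c[-1]
    out = np.empty(2 * c.size)
    out[0::2] = (3 * c + left) / 4
    out[1::2] = (3 * c + right) / 4
    return out

print(triangle_upsample_1d([0, 4]))  # [0. 1. 3. 4.]
```

Applying the same filter vertically gives the full 2-D triangle upsampling for 4:2:0 chroma.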

mo271 (Member) commented Jun 2, 2022

Just to be sure, which of the two images is the jxl?

Dogway (Author) commented Jun 2, 2022

Just to be sure, which of the two images is the jxl?

The blurrier one

jonsneyers (Member) commented:

This seems to be a larger difference than expected. Can you share the original jpeg for this example? I want to try to reproduce this.

Dogway (Author) commented Jun 2, 2022

Imgur, but any half-toned image will do. This call:
cjxl "SF.jpg" "SF.jxl" -d 0 -m --resampling=2

Maybe the internal resampling is also involved in some YCbCr<->RGB conversions internally, don't know.

jonsneyers (Member) commented:

That does not do lossless jpeg recompression at all. That decodes the JPEG to pixels, downscales the image 2x, then encodes the result losslessly. Obviously downscaling an image 2x is a very lossy operation, which will of course introduce blur.

mo271 (Member) commented Jun 2, 2022

Please let us know if there is an example where the differences are as striking as in the second pair of images that you posted, but where the --resampling=2 isn't used.
The first pair of images you posted (where I assume --resampling=2 was not used) seems to be well within the expected range of deviation between different jpg decoders.

Dogway (Author) commented Jun 2, 2022

I was just replying to @jonsneyers quote.
Tomorrow I will upload a deviation test between different decoders to see how much it differs (with default cfl enabled)

EDIT: No time today to do the stdev test, but I did check, and the issue only happens on the transcode code path; there is no sign of blurring on a direct encode to jxl.
And I'm reuploading the original images as the b channel of Lab, for the few of you with sight issues.

mo271 (Member) commented Jun 7, 2022

So, let's compare different jpeg decoders with the example image you provided.

  • jxl: (cjxl losslessly recompressed, then djxl),
  • mjpeg: using ffmpeg
  • mozjpeg/libjpegturbo with three different settings, namely
    • -dct int (the default)
    • -dct fast (same as -fast?) and
    • -dct float
  • rust's jpeg decoder, https://docs.rs/jpeg-decoder/latest/jpeg_decoder/

Then I compare the decompressed images along different metrics:

  • MSE (as a sanity check)
  • MAE
  • PAE
  • butteraugli
    Here butteraugli is not symmetric, which is why I print the whole matrix of comparisons, even though for the other metrics half of the matrix would be enough to communicate the differences for all pairs.
MSE jxl mjpeg fast float int rust
jxl 0 (0) 1.64258 (2.50641e-05) 2.79644 (4.2671e-05) 1.64121 (2.50433e-05) 1.63914 (2.50116e-05) 1.64237 (2.5061e-05)
mjpeg 1.64258 (2.50641e-05) 0 (0) 1.92949 (2.94421e-05) 0.0443743 (6.77109e-07) 0.0698835 (1.06635e-06) 0.0835574 (1.275e-06)
fast 2.79644 (4.2671e-05) 1.92949 (2.94421e-05) 0 (0) 1.98651 (3.03123e-05) 2.00253 (3.05567e-05) 2.03194 (3.10054e-05)
float 1.64121 (2.50433e-05) 0.0443743 (6.77109e-07) 1.98651 (3.03123e-05) 0 (0) 0.0467264 (7.12999e-07) 0.0508508 (7.75934e-07)
int 1.63914 (2.50116e-05) 0.0698835 (1.06635e-06) 2.00253 (3.05567e-05) 0.0467264 (7.12999e-07) 0 (0) 0.0543799 (8.29784e-07)
rust 1.64237 (2.5061e-05) 0.0835574 (1.275e-06) 2.03194 (3.10054e-05) 0.0508508 (7.75934e-07) 0.0543799 (8.29784e-07) 0 (0)
PAE jxl mjpeg fast float int rust
jxl 0 (0) 6425 (0.0980392) 6425 (0.0980392) 6425 (0.0980392) 6425 (0.0980392) 6425 (0.0980392)
mjpeg 6425 (0.0980392) 0 (0) 1542 (0.0235294) 771 (0.0117647) 771 (0.0117647) 771 (0.0117647)
fast 6425 (0.0980392) 1542 (0.0235294) 0 (0) 1542 (0.0235294) 1542 (0.0235294) 1542 (0.0235294)
float 6425 (0.0980392) 771 (0.0117647) 1542 (0.0235294) 0 (0) 771 (0.0117647) 771 (0.0117647)
int 6425 (0.0980392) 771 (0.0117647) 1542 (0.0235294) 771 (0.0117647) 0 (0) 771 (0.0117647)
rust 6425 (0.0980392) 771 (0.0117647) 1542 (0.0235294) 771 (0.0117647) 771 (0.0117647) 0 (0)
MAE jxl mjpeg fast float int rust
jxl 0 (0) 208.392 (0.00317986) 310.788 (0.00474231) 208.236 (0.00317748) 208.028 (0.0031743) 208.407 (0.00318009)
mjpeg 208.392 (0.00317986) 0 (0) 259.174 (0.00395475) 8.37492 (0.000127793) 13.0649 (0.000199357) 15.6423 (0.000238686)
fast 310.788 (0.00474231) 259.174 (0.00395475) 0 (0) 264.576 (0.00403718) 265.991 (0.00405876) 268.705 (0.00410018)
float 208.236 (0.00317748) 8.37492 (0.000127793) 264.576 (0.00403718) 0 (0) 9.01121 (0.000137502) 9.9194 (0.00015136)
int 208.028 (0.0031743) 13.0649 (0.000199357) 265.991 (0.00405876) 9.01121 (0.000137502) 0 (0) 11.3722 (0.000173528)
rust 208.407 (0.00318009) 15.6423 (0.000238686) 268.705 (0.00410018) 9.9194 (0.00015136) 11.3722 (0.000173528) 0 (0)
butteraugli jxl mjpeg fast float int rust
jxl 0.0000000000 0.9411652088 1.1420263052 0.9466884732 0.9448851943 0.9460596442
mjpeg 1.0180765390 0.0000000000 1.1411626339 0.7881607413 0.8761280775 0.8702933788
fast 1.1666915417 1.1633328199 0.0000000000 1.1629202366 1.1629323959 1.1628748178
float 1.0231082439 0.8004256487 1.1426250935 0.0000000000 0.6474697590 0.6527773142
int 1.0192036629 0.9545516968 1.1423904896 0.6336971521 0.0000000000 0.3134871423
rust 1.0201907158 0.9508787394 1.1427843571 0.6364465952 0.3598134220 0.0000000000

Images for that comparison are here:
https://gist.github.com/mo271/f47f811dbff2f34dedd602f1f330b1f5

From this and visual inspection of the results, I conclude that the differences between jxl and mozjpeg (default) are not greater than expected, and not greater than, say, between mozjpeg (default) and mozjpeg (-fast). Of course, those are the results for this particular image; comparing on a representative corpus could give more insight into how the different ways of decoding jpeg behave.

When you write "I never observed such a decoding degradation", which ways of decoding jpeg did you observe other than mozjpeg?

Also, I think it would be helpful to point to more examples of degradation, to better understand this issue.

Dogway (Author) commented Jun 7, 2022

Thanks for the tests. Sorry, I didn't have much time to run them myself, but I will arrange something in the next few days, primarily on the V plane, which encodes the red-green opponent values, as that's where the problem is most visible.

Checking the matrices: well, there you go, a difference 20 to 30 times higher than between the other decoders (except 'fast'). You don't want an absolute difference greater than 0.0005, as that starts to become "easily" noticeable.
I can add into the mix other decoders like Photoshop (old and recent versions) and different avisynth jpeg loader flavors, as those also decoded fine from my observations.

JXL dec_metrics

To test with other images I recommend those where saturated red is prevalent and there are fine details in the reds, although the problem is chroma related overall.

jonsneyers (Member) commented:

The question still is which one is closer to the actual jpeg spec, i.e. to using mathematically exact, infinite-precision arithmetic.

As far as I understand, the existing jpeg decoders like libjpeg-turbo do iDCT with either floats or ints, but then they produce uint8 YCbCr, which then gets upsampled if needed and then converted to uint8 RGB.
By contrast, current libjxl does the whole decode pipeline with floats, and only converts to uint8 at the very end if uint8 is what is desired.

So in terms of the precision of arithmetic, I think we can assume that libjxl is more precise than the rest.
Note that even just converting 8-bit RGB to 8-bit YCbCr and back causes at least off-by-two errors in R and B (something libjxl avoids by never materializing 8-bit YCbCr buffers but keeping things in float).
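That round-trip loss is easy to probe with a small numpy sweep. This sketch uses JFIF-style full-range constants with rounded inverse factors and a coarse sample grid; the exact maximum error depends on the constants and clipping used, so treat the number it finds as illustrative:

```python
import numpy as np

# Probe the RGB -> uint8 YCbCr -> RGB round trip on a coarse grid of RGB
# values (JFIF-style full-range constants; inverse factors rounded).
v = np.arange(0, 256, 5, dtype=np.float64)
r, g, b = np.meshgrid(v, v, v, indexing='ij')

y  = 0.299 * r + 0.587 * g + 0.114 * b
cb = 128.0 + 0.564 * (b - y)
cr = 128.0 + 0.713 * (r - y)
y, cb, cr = (np.clip(np.round(c), 0, 255) for c in (y, cb, cr))  # 8-bit YCbCr

r2 = np.clip(np.round(y + 1.403 * (cr - 128.0)), 0, 255)
g2 = np.clip(np.round(y - 0.344 * (cb - 128.0) - 0.714 * (cr - 128.0)), 0, 255)
b2 = np.clip(np.round(y + 1.773 * (cb - 128.0)), 0, 255)

max_err = max(np.abs(r - r2).max(), np.abs(g - g2).max(), np.abs(b - b2).max())
print(max_err >= 1)  # True: the 8-bit round trip is not exact
```

Even on this coarse grid, some RGB triples do not survive the 8-bit YCbCr intermediate unchanged, which is the lossiness being described above.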

The only potential source of inaccuracy could be the use of chroma-from-luma in the encoder, which might have subtle effects on Cb and Cr — not enough to change the quantization bucket (otherwise it wouldn't be reversible) but potentially enough to change the Cb and Cr values themselves quite a bit when the source JPEG is low enough quality. Note that JPEG conformance even allows causing an off-by-one in the quantization bucket, so this wouldn't make it non-conforming, but it could create noticeable differences in the decoded result. This is where the --jpeg_transcode_disable_cfl flag is useful.

Dogway (Author) commented Jun 7, 2022

So in terms of the precision of arithmetic, I think we can assume that libjxl is more precise than the rest.

More precise than mozjpeg float? I mean, when I said subpar JPEG decoding I meant exactly that, leaving color model conversion out of the question.
Correct me if I'm wrong, but for transcoding JPEG -> JXL -> JPEG there's no conversion to or from RGB?
If you decoded the source JPEG and the JXL-decoded JPEG both directly to YCbCr, they should give you the same output; that's how I performed the comparisons (then using the same YUV->RGB converter for both to output png). I can't tell about @mo271's tests, as I don't think mozjpeg can output YUV in any form.

This leaves Photoshop and mozjpeg out of the test batch. I can decode with ffmpeg in avisynth via FFMS2 directly to YCbCr; as you noted, that's uint8, but checking the above tables, the difference is not big between 8-bit and float.

jonsneyers (Member) commented:

Mozjpeg/libjpeg-turbo float is doing arithmetic during the iDCT with floats, but it's still converting int16 DCT coefficients to uint8 YCbCr values. I don't think any part of libjpeg-turbo uses float buffers; that code dates back to a time when most CPUs didn't even do floating-point arithmetic yet.

When doing lossless jpeg -> jxl -> jpeg, things stay as DCT coefficients, so there certainly is no conversion to RGB, there even is no iDCT/DCT. But here we're looking at jpeg -> png versus jpeg -> jxl -> png, where some of the difference is caused in the YCbCr to RGB stage, where libjpg starts from quantized (8-bit) YCbCr and jxl starts from float YCbCr.

Dogway (Author) commented Jun 7, 2022

Actually no, my comparisons were jpeg vs jpeg -> jxl -> jpeg. I decoded directly to YCbCr, so the degradation is visible even without conversion to RGB.
In any case, hold on, I will do a second check just in case.

EDIT: Here's the comparison with a diff image of the V plane (no RGB conversion involved), scaled 43 times to saturate 8-bit, since many values are off by 6 (I have seen higher, though).
Imgur

For clarity here is my workflow:

cjxl "input.jpg" "input.jpg.jxl" --strip --num_threads=4 --jpeg_transcode_disable_cfl
djxl "input.jpg.jxl" "input.jpg.jxl.jpg"

AviSynth+ script:

a=FFImageSource("input.jpg")
b=FFImageSource("input.jpg.jxl.jpg")
a=a.ExtractV()
b=b.ExtractV()
ex_makediff(a,b, metric="MAE", aug=43, dif=true, show=2)
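For readers without AviSynth, the amplified difference map can be approximated in a few lines of numpy. This is a hypothetical helper mirroring the gain-43, grey-centred diff in the script above, not the ex_makediff implementation:

```python
import numpy as np

def diff_map(a, b, gain=43):
    # amplified difference of two uint8 planes, centred on mid-grey 128;
    # saturates to 0/255 once |a - b| * gain exceeds 127
    d = a.astype(np.int32) - b.astype(np.int32)
    return np.clip(128 + gain * d, 0, 255).astype(np.uint8)

a = np.array([[100, 100]], dtype=np.uint8)
b = np.array([[100, 106]], dtype=np.uint8)   # one sample off by 6
print(diff_map(a, b))  # [[128   0]]
```

A value off by 6 already saturates at this gain, which is what makes the errors visible at a glance in the linked diff image.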

jonsneyers (Member) commented:

If you use --strip, it will drop the jpeg bitstream reconstruction data, so djxl will decode to rgb pixels and then re-encode as a q95 jpeg.
Without the --strip, the process would be file-exact reversible. With the --strip, you're introducing an extra lossy transcode...

Dogway (Author) commented Jun 7, 2022

Can confirm. Then I will need to run my tests with the RGB conversion in mind (output to png), because --strip saves some size by removing the jbrd box. And I also see a softer look when plain-decoding with jxl-winthumb.

EDIT:

cjxl "input.jpg" "input.jpg.jxl" --num_threads=4 --jpeg_transcode_disable_cfl
djxl "input.jpg.jxl" "input.jpg.jxl.png"
a=FFImageSource("input.jpg")
a.ConvertBits(32)
z_ConvertFormat(pixel_type="RGBPS",colorspace_op ="170m:2020:709:f=>rgb:2020:709:f")
# or
# fmtc_matrix(mats="601", col_fam="RGB", fulls=true, fulld=true)
a=last

b=ImageSource("input.jpg.jxl.png",start=0,end=0).ConvertToPlanarRGB()

a=a.                moncurve_f(2.222,0.09929682681).RGB_to_XYZ("170M").XYZ_to_Lab()
b=b.Convertbits(32).moncurve_f(2.222,0.09929682681).RGB_to_XYZ("170M").XYZ_to_Lab()
a=a.ExtractG()
b=b.ExtractG()

ex_makediff(a, b, metric="MAE", aug=43, dif=true, show=2)
Convertbits(8,dither=-1)

Still quite different even when doing the conversion to RGB in float. And the blurring is there in JXL, so the problem might be in the YCbCr -> RGB conversion, or simply in the JXL iDCT.
Imgur

mo271 (Member) commented Jun 8, 2022

Could you provide the resulting images, so we can compare them? At least the avisynth one; the jxl we can easily generate from the original jpg, of course.

Did you use the --strip option in the initial post in this issue?

Dogway (Author) commented Jun 8, 2022

No, I didn't use --strip in the OP, but the issue is that JXL uses a different iDCT codepath when reconstructing jpeg, so that can't be used as a reference. The problem seems to lie in the iDCT when decoding (not reconstructing) transcoded jpegs, rather than in color model conversion, I think.
Rounded
Dithered

jendalinda commented:

The jbrd box allows reconstructing the original JPEG file, so you can use any JPEG decoder you like. If file size is a real concern, the original JPEGs can be optimized to bare JFIF before transcoding to JXL; then the jbrd box will be pretty small. The size of the jbrd box is proportional to the size of the original file, so optimizing JPEGs before transcoding, to make them as small as possible, actually makes the resulting JXL smaller too, although it affects only the jbrd data and not the actual JXL bitstream.

mo271 (Member) commented Jun 8, 2022

No, I didn't use --strip in OP, but the issue is JXL uses a different iDCT codepath when reconstructing jpeg so it can't be used as reference. The problem seems to lie when iDCT (not reconstruction) transcoded jpegs, rather than color model conversion I think. Rounded Dithered

If I understand correctly, in #1470 (comment) you use jxl to generate a png from a jpg (going through jxl) and then compare it with another image. Could you provide the pixels of the other image also as a png?

I don't understand what those two jpgs that you linked are.

Dogway (Author) commented Jun 10, 2022

I don't understand what those two jpgs that you linked are.

Hi @mo271, you asked me for the avisynth images and that's what I uploaded; to be more precise, the output of this:

FFImageSource("input.jpg") # original image
ConvertBits(32)
fmtc_matrix(mats="601", col_fam="RGB", fulls=true, fulld=true)
Convertbits(8,dither=-1) # or dither=1 for floyd-steinberg

That is FFImageSource jpeg decoding (mjpeg?) followed by conversion to RGB in floating point, as JXL does (to show that this doesn't make much of a difference). I don't know what kind of dithering JXL does, so I uploaded both options; if you tell me, I can upload a matching dither, but I think it is not deterministic(?), I don't know.

The problem now is that, for the comparison, there's no framework where I can do both the iDCT and the RGB conversion in float: mozjpeg only does that for the iDCT (and no lossless YUV output is possible), and AviSynth only does that for the RGB conversion.
In any case, it doesn't take much to realize the problem is not in either of these (iDCT in int/float or RGB in int/float). The JXL-to-JPEG code path is reconstruction, not decoding; the issue shows up only when actually decoding TRANSCODED jpegs.

I can try to mimic this in Python although it's not a language I feel too comfortable with.

Dogway changed the title from "Subpar jpeg decoding" to "Subpar decoding for transcoded jpegs" Jun 10, 2022
mo271 (Member) commented Jun 15, 2022

I'm trying to understand this issue, and I'm not yet convinced that jxl doing everything in float gives worse, or even significantly worse, results than other ways of converting from jpeg. The blurriness that you mention is not the only artifact to be worried about.

We are comparing the jpeg decoding of jxl, triggered by

cjxl "input.jpg" "input.jpg.jxl"
djxl "input.jpg.jxl" "input.jpg.jxl.png"

with the jpeg decoding of other decoders. So far you provided one image (the original jpeg) and one other codec (mozjpeg). As pngs for the comparisons, you provided two cropped pngs where slight differences are visible, but where the butteraugli metric is small enough that we don't worry about any unexpected degradation.

> butteraugli_main tF7XRRL.png rhq5Z6i.png
0.7073692083
3-norm: 0.232715
> butteraugli_main rhq5Z6i.png tF7XRRL.png
0.8325679302
3-norm: 0.269613

In order to compare this with other ways of decoding the jpeg, it would be useful to have the png output of other codecs. (That is why I asked for pngs in #1470 (comment).)
Ideally we would like to compare against some decoder that does everything in float or arbitrary-precision arithmetic.

jendalinda commented:

As nobody can actually tell how decoded JPEG images should look exactly, wouldn't it be better to compare the decoded JPEG images with the original uncompressed images?

Dogway (Author) commented Jun 15, 2022

I'm not implying that the issue is in the float pipeline; actually I think it's in the iDCT (hopefully at decoding) of transcoded jpegs, regardless of bit depth.

I could see the same blurriness with other images, but this one so far gave me the worst-case scenario, probably due to the presence of a saturated red area. If I could spot it easily (not sure if having a trained eye helps), anyone else can do it too. A high butteraugli metric might be fine for lossy, but for lossless...(?)

Also, how are you running butteraugli, all three channels averaged? That is going to even things out; localize it to the offending channel and the metric should go higher, probably much worse than it should be.

In any case, butteraugli is only tested for differences in the quality 90 to 95 range; something like quality 96 (bad in lossless terms) would pass as legit in butteraugli scores. Another aspect is that metrics are usually very bad at factoring in blurriness; I don't know how many of them do, but that's the reason B-SSIM was created. I will run a test later today with butteraugli and provide you with the other pngs (unfortunately with the RGB conversion baked in, which is why I didn't see a point in sharing them at first).

@jendalinda, yes I actually had that in mind but forgot to test. Will check as well.

jonsneyers (Member) commented:

So I did a little test using this command line:

f=[INPUT FILENAME]
for i in 70 80 90 95; do
  echo q$i:
  convert $f -quality $i $f.q$i.jpg
  cjxl --quiet $f.q$i.jpg $f.q$i.jpg.jxl
  djxl $f.q$i.jpg.jxl $f.q$i.jpg.jxl.png --bits_per_sample 16 2>/dev/null
  for m in PAE MAE PSNR; do
    echo -n "libjpeg-turbo: $m "; compare -metric $m $f $f.q$i.jpg null:; echo
    echo -n "jxl: $m "; compare -metric $m $f $f.q$i.jpg.jxl.png null:; echo
  done
done

So this is producing a few jpegs from a png original, and then computing peak absolute error (PAE, lower is better), mean absolute error (MAE, lower is better) and PSNR (higher is better) between the original and the jpeg as decoded by imagemagick (i.e. libjpeg-turbo) and as decoded by libjxl after recompressing the jpeg losslessly.
You can try this test starting from any input image and with other quality settings if you want.
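The three ImageMagick metrics are simple to compute directly if you want to cross-check the numbers. This is a hypothetical numpy helper for 8-bit images, not ImageMagick's implementation:

```python
import numpy as np

def pae_mae_psnr(a, b):
    # a, b: uint8 arrays of the same shape
    d = np.abs(a.astype(np.float64) - b.astype(np.float64))
    pae = d.max()                      # peak absolute error (lower is better)
    mae = d.mean()                     # mean absolute error (lower is better)
    mse = (d ** 2).mean()
    psnr = float('inf') if mse == 0 else 10.0 * np.log10(255.0 ** 2 / mse)
    return pae, mae, psnr

a = np.zeros((2, 2), dtype=np.uint8)
b = np.array([[0, 0], [0, 255]], dtype=np.uint8)
print(pae_mae_psnr(a, b))  # (255.0, 63.75, ~6.02)
```

PAE catches a single worst-case pixel, MAE the average deviation, and PSNR the mean squared error on a log scale, which is why the three can rank decoders differently.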

This is what I got for some random example image:

q70:
libjpeg-turbo: PAE 21331 (0.32549)
jxl: PAE 22102 (0.337255)
libjpeg-turbo: MAE 947.761 (0.0144619)
jxl: MAE 924.809 (0.0141117)
libjpeg-turbo: PSNR 33.6959
jxl: PSNR 33.9103
q80:
libjpeg-turbo: PAE 12336 (0.188235)
jxl: PAE 12932 (0.19733)
libjpeg-turbo: MAE 824.314 (0.0125782)
jxl: MAE 803.689 (0.0122635)
libjpeg-turbo: PSNR 35.0689
jxl: PSNR 35.2994
q90:
libjpeg-turbo: PAE 8224 (0.12549)
jxl: PAE 9858 (0.150423)
libjpeg-turbo: MAE 602.758 (0.00919749)
jxl: MAE 586.17 (0.00894438)
libjpeg-turbo: PSNR 38.162
jxl: PSNR 38.436
q95:
libjpeg-turbo: PAE 5911 (0.0901961)
jxl: PAE 6269 (0.0956588)
libjpeg-turbo: MAE 454.982 (0.00694257)
jxl: MAE 441.292 (0.00673369)
libjpeg-turbo: PSNR 40.7147
jxl: PSNR 41.0553

Here is what I got for another random test image:

q70:
libjpeg-turbo: PAE 20046 (0.305882)
jxl: PAE 19559 (0.298451)
libjpeg-turbo: MAE 1006.07 (0.0153517)
jxl: MAE 987.158 (0.0150631)
libjpeg-turbo: PSNR 32.9006
jxl: PSNR 33.06
q80:
libjpeg-turbo: PAE 17733 (0.270588)
jxl: PAE 18204 (0.277775)
libjpeg-turbo: MAE 886.765 (0.0135312)
jxl: MAE 868.436 (0.0132515)
libjpeg-turbo: PSNR 34.1043
jxl: PSNR 34.2774
q90:
libjpeg-turbo: PAE 10023 (0.152941)
jxl: PAE 14324 (0.21857)
libjpeg-turbo: MAE 635.731 (0.00970063)
jxl: MAE 614.988 (0.00938412)
libjpeg-turbo: PSNR 37.4551
jxl: PSNR 37.7492
q95:
libjpeg-turbo: PAE 6425 (0.0980392)
jxl: PAE 7089 (0.108171)
libjpeg-turbo: MAE 473.937 (0.00723182)
jxl: MAE 462.292 (0.00705412)
libjpeg-turbo: PSNR 40.1479
jxl: PSNR 40.4159

For that same test image, this is what I get when adding --jpeg_transcode_disable_cfl to cjxl :

q70:
libjpeg-turbo: PAE 20046 (0.305882)
jxl: PAE 19559 (0.298451)
libjpeg-turbo: MAE 1006.07 (0.0153517)
jxl: MAE 987.158 (0.0150631)
libjpeg-turbo: PSNR 32.9006
jxl: PSNR 33.06
q80:
libjpeg-turbo: PAE 17733 (0.270588)
jxl: PAE 18204 (0.277775)
libjpeg-turbo: MAE 886.765 (0.0135312)
jxl: MAE 868.436 (0.0132515)
libjpeg-turbo: PSNR 34.1043
jxl: PSNR 34.2774
q90:
libjpeg-turbo: PAE 10023 (0.152941)
jxl: PAE 10127 (0.154528)
libjpeg-turbo: MAE 635.731 (0.00970063)
jxl: MAE 615.809 (0.00939665)
libjpeg-turbo: PSNR 37.4551
jxl: PSNR 37.7718
q95:
libjpeg-turbo: PAE 6425 (0.0980392)
jxl: PAE 5752 (0.0877699)
libjpeg-turbo: MAE 473.937 (0.00723182)
jxl: MAE 456.713 (0.006969)
libjpeg-turbo: PSNR 40.1479
jxl: PSNR 40.5829

So basically what this shows is that at least in terms of PSNR and MAE, jxl decoding of recompressed jpegs is better than "standard" decoding in the sense that it is closer to the original image. The PAE is sometimes better, sometimes worse; not sure if that means much; at least the numbers are in the same ballpark.

MAE and PSNR are consistently somewhat better for jxl decoding of a recompressed jpeg, confirming what was said earlier: libjxl is the more accurate decoder here, and if you see a difference, it's actually libjpeg-turbo producing a "subpar" result, not libjxl.

Dogway (Author) commented Jun 15, 2022

More tests here. Pyron.png is the ground-truth image; I converted it to jpeg using libjpeg-turbo, since mozjpeg was a bit blurry:

o=ImageSource("Pyron.png",start=0,end=0).ConverttoPlanarRGB().ConvertBits(32,fulls=true)
a=ImageSource("Pyronq95turbo.jpg.bmp",start=0,end=0).ConverttoPlanarRGB().ConvertBits(32,fulls=true)
b=ImageSource("Pyronq95turbo.jpg.jxl.png",start=0,end=0).ConverttoPlanarRGB().ConvertBits(32,fulls=true)

bl=o.ex_blur(0.3)
Butteraugli(o,bl) # score: 0.8894

Butteraugli(o,a)  # score: 1.524369

Butteraugli(o,b)  # score: 1.452442

bl=a.ex_blur(0.25)
Butteraugli(o,bl) # score: 1.465598

Which means that the butteraugli metric is favoring blurry targets. Will run more tests.
Original, JPEG, JXL

jonsneyers (Member) commented:

I don't think butteraugli favors blur; it's probably just that doing a small blur like that basically does a small amount of deblocking, i.e. undoes a bit of jpeg artifacting.

Anyway, for testing the accuracy of DCT implementations, I don't think butteraugli is a good metric, it's better to just look at peak error, mean error, PSNR etc. This is a purely numerical matter, not a perceptual one, so it makes more sense to use purely numerical metrics, not perceptual ones.

Dogway (Author) commented Jun 15, 2022

You cannot use either MAE or PSNR, as blurring introduces error spread, kind of like denoising (I just tested with MAE). If anything, we should try a sharpness metric; I don't know of a robust one, so I should research that. I know there were some Laplacian-based algorithms to detect blurring, but we are talking about small differences here.
Maybe using log versions of the metrics can help to emphasize small diffs.

EDIT: Reading a bit around, it seems variance is a good metric for blurriness, and checking it out that seems to be so. I added a twist by computing the variance in log space so small diffs are emphasized, and got a lower score (blurrier) for JXL in this image: 0.003476, vs 0.003619 (original) and 0.003571 (jpeg).

o=o.extractR()
a=a.extractR()
b=b.extractR()

o=o
mx = ex_luts(o, mode="max", pixels=ex_shape(1,mode="square",center=true))
mn = ex_luts(o, mode="min", pixels=ex_shape(1,mode="square",center=true))
av = ex_luts(o, mode="avg", pixels=ex_shape(1,mode="square",center=true))

#~ ex_lutxyz(av,mn,mx,"y x - dup * z x - dup * + 0.5 *") # Usual variance
ex_lutxyz(av,mn,mx,"y 1 + log x 1 + log X@ - dup * z 1 + log X - dup * + 0.5 *") # variance in log space

Log Variance x10 augmented:
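The log-space variance idea translates directly to numpy. The following is my own re-implementation of the idea (3x3 local max/min/mean, squared deviations in log space), not the AviSynth code, checked on a synthetic image and a blurred copy of it:

```python
import numpy as np

def log_variance_sharpness(img):
    # mean local "variance" over 3x3 windows, computed in log space so that
    # small differences are emphasised; a higher score means a sharper image
    h, w = img.shape
    p = np.pad(img, 1, mode='edge')
    stack = np.stack([p[dy:dy + h, dx:dx + w]
                      for dy in range(3) for dx in range(3)])
    mx = np.log1p(stack.max(axis=0))
    mn = np.log1p(stack.min(axis=0))
    av = np.log1p(stack.mean(axis=0))
    return float((0.5 * ((av - mn) ** 2 + (mx - av) ** 2)).mean())

rng = np.random.default_rng(0)
sharp = rng.random((64, 64))                        # high-detail test image
blur3 = lambda x, ax: sum(np.roll(x, s, axis=ax) for s in (-1, 0, 1)) / 3
blurred = blur3(blur3(sharp, 0), 1)                 # 3x3 box blur of the same

print(log_variance_sharpness(sharp) > log_variance_sharpness(blurred))  # True
```

The blurred copy scores lower, matching the behaviour reported for the decoded images above.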

jonsneyers (Member) commented:

Ok, but doesn't a less precise DCT also have the effect of introducing more (unwanted) edges, hence improving the score on that metric?

I know PSNR/MAE are quite tolerant w.r.t. blurring, but they still shouldn't give a better score to an image that is more blurred than to an image that is less blurred compared to the original.

mo271 (Member) commented Jun 16, 2022

In the images that you attached, I don't think there is any unexpected issue with problematic jpeg decoding. While there might be slight differences, they can be explained by the fact that libjxl is using floats throughout while libjpeg-turbo/mozjpeg are doing less precise computation.
Artificially doing less precise calculations in order to be more similar to existing jpeg decoding implementations is not really an option, especially in light of the fact that jpeg decoding is in some sense a side effect of the fact that we are able to both losslessly re-compress existing jpgs and also decode them to other formats (like png).

If you prefer the (in your eyes) less blurry (more blocky) decoding of mozjpeg, then perhaps instead of going

  • jpg --(losslessly recompress with cjxl)--> jxl --(decode with djxl)--> png

you could do

  • jpg --(losslessly recompress with cjxl)--> jxl --(reconstruct losslessly with djxl)--> jpg --(decode with djpeg)--> png

I'm unsure what action we are supposed to take on this issue.

Dogway (Author) commented Jun 16, 2022

Ok, but doesn't a less precise DCT also have the effect of introducing more (unwanted) edges, hence improving the score on that metric?

I don't know, maybe; I just tested by adding a deblocker and checking the metrics. I added two different ones: a deblocking post-process (the Deblock_QED function in AVS) and a straight HQ iDCT decoder, JPEGSource (closed source, unfortunately). In both cases sharpness was still higher than with the JXL decoding. I improved the code to compute over luma instead of the red channel (practically the same), and added a gaussian-type blur instead of a mean average.

There are more sharpness metrics listed here; I didn't test them.

o=o.DotClip([0.298903,0.586620,0.114477])
a=a.DotClip([0.298903,0.586620,0.114477])
b=b.DotClip([0.298903,0.586620,0.114477])
sh=ex_shape(1,mode="square",center=true)

mx = ex_luts(b, mode="max", pixels=sh)
mn = ex_luts(b, mode="min", pixels=sh)
av = b.ex_blur(0.5,mode="gaussian")
bv=ex_lutxyz(av,mn,mx,"y 1 + log x 1 + log X@ - dup * z 1 + log X - dup * + 0.5 *")

mx = ex_luts(a, mode="max", pixels=sh)
mn = ex_luts(a, mode="min", pixels=sh)
av = a.ex_blur(0.5,mode="gaussian")
av=ex_lutxyz(av,mn,mx,"y 1 + log x 1 + log X@ - dup * z 1 + log X - dup * + 0.5 *")

# ex_makediff(bv,av, metric="MAE", aug=50, dif=true, show=3) # Sharpness Diff x50 augmented

ScriptClip(av, function [] () {
    avg=AverageLuma()
    subtitle("Log Variance: "+string(avg))
} )

JPEGSource,Deblock_QED(quant1=30, quant2=40), |a.sharpness - b.sharpness| diff map

@Dogway
Copy link
Author

Dogway commented Jun 16, 2022

In the images that you attached, I don't think there is any unexpected issue with problematic jpeg decoding. While there might be slight differences, they can be explained by the fact that libjxl is using floats throughout and libjpeg-turbo/mozjpeg are doing less precise computation.

We showed earlier that mozjpeg's float decoding still differed considerably from the JXL output, and that float RGB conversion wasn't a meaningful change either. My suspicion was the iDCT coefficients. Maybe I can give them a try and see if the problem lies there. Do you know what file I should be looking at?
By the way, I just tested direct encoding (-q 95) to JXL for this image, and the decoded result also showed a lower sharpness score. This wasn't apparent on my first image example; this case here is extreme (stars and high saturation).

@Dogway
Copy link
Author

Dogway commented Jun 20, 2022

Reading the JXL paper, there are many steps where a low-pass is performed. I don't know whether any of them are involved in lossless jpeg transcodes, though.

Color correlation

In particular, the default correlation factor is 0 for the X channel and 0.935669 for the B channel: this allows
to transmit luma information using the Y channel only.
However, these factors are not necessarily optimal on areas of the image with a strong chroma component
(such as, for example, fully red regions). Therefore, JPEG XL allows altering the correlation factors locally, for
tiles of 64 × 64 pixels, with a dedicated control field.
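In other words, with the default factors the decoder reconstructs the chroma channels roughly like this (a toy sketch with made-up function names, using the factors quoted above and ignoring the per-tile overrides):

```python
def apply_default_cfl(x_res, y, b_res, kx=0.0, kb=0.935669):
    """Reconstruct the X and B channels from their residuals plus a
    multiple of Y. kx/kb are the default correlation factors; the real
    decoder may override them per 64x64 tile."""
    x = x_res + kx * y
    b = b_res + kb * y
    return x, b
```

With kx = 0 the X residual passes through unchanged, while B gets most of its energy from Y, which is how luma-only transmission of near-gray content works.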

DC handling

To further reduce banding in areas with steeper gradients, JPEG XL applies a selective smoothing algorithm
to the DC image, that is only allowed to move values inside their quantization boundaries. If the smoothed value
would be outside of the quantization boundaries, it is discarded, and the original is used.
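A toy sketch of that clamped-smoothing idea (illustrative only; this is not libjxl's actual smoothing kernel, and the half-step bound is my simplification of "inside their quantization boundaries"):

```python
import numpy as np

def clamped_dc_smooth(dc, quant_step):
    """Box-smooth the DC image, but keep a smoothed value only if it
    stays within half a quantization step of the decoded value;
    otherwise fall back to the original value."""
    p = np.pad(dc, 1, mode="edge")
    h, w = dc.shape
    # 3x3 box average of the DC image
    smooth = sum(p[dy:dy + h, dx:dx + w]
                 for dy in range(3) for dx in range(3)) / 9.0
    in_bounds = np.abs(smooth - dc) <= quant_step / 2.0
    return np.where(in_bounds, smooth, dc)
```

The point of the clamp is that smoothing can never move a DC value outside the set of values that could have produced the same quantized coefficient, so it reduces banding without losing fidelity to the bitstream.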

LF predictions

After applying a smoothing algorithm similar to the one used on the DC image, the upsampled DC is
converted back to DCT coefficients; the low-frequency values that are produced this way are then added to the
encoded low-frequency coefficients.

Loop filters

Despite the significant improvements that JPEG XL delivers to reduce artefacts, block boundaries and ringing
can still be noticed, especially at somewhat lower qualities. Ringing artefacts are spurious signals near sharp
edges, caused by quantizing or truncating high-frequency components (see Figure 7). To mitigate their impact,
JPEG XL employs two different loop filters that are applied to the image after the decompression process.
The first loop filter is a smoothing convolution. As smoothing the image inherently introduces a sharpness
loss, this effect is compensated by the encoder by means of a sharpening filter that is applied before the DCT
step. The overall effect of this procedure is that the visual impact of block boundaries gets reduced, while still
preserving sharp details present in the original image.
The second loop filter is intended to reduce ringing, while still preserving texture that is transmitted in the
image. To achieve this effect, it applies an adaptive smoothing algorithm related to Non-Local Means

What I do know is that JPEG XL does things differently at the decoding stage, as depicted by the slider from @jonsneyers.
As I said in my post above, unless we know what kind of dequantization coefficients it uses, we can't get much further.

As a reference, I have been using very good coefficients in mozjpeg that achieve a better quality/size ratio than any of the internal presets:

# with -q 82 this achieves a better quality/size ratio than plain -q 95
6   6   7   9  11  14  14  14
6   6   7   9  12  14  14  14
7   7   8  10  12  14  15  15
9   9  10  12  14  16  17  17
11  12  12  14  17  19  20  20
14  14  14  16  19  21  22  22
14  14  15  17  20  22  23  23
14  14  15  17  20  22  23  23

@mo271
Copy link
Member

mo271 commented Jun 20, 2022

@veluca93 pointed out that the quant bias that we have (see here):

static constexpr float kDefaultQuantBias[4] = {

leads to some additional numerical inaccuracies when decompressing losslessly recompressed jpgs to pixels.

I gave that a try and hardcoded the values in that struct to {1.0f, 1.0f, 1.0f, 0.f}; together with the flag --jpeg_transcode_disable_cfl, numerical tests similar to the ones above are already much closer to what mozjpeg provides.

Do you think we should either somehow turn the quant bias off in the jpeg-decoding case, or introduce another flag to control this, @jonsneyers and @veluca93?
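For context, the bias is applied per quantized AC coefficient before multiplying by the quantization step, roughly like this (a sketch from my reading of libjxl's AdjustQuantBias; the exact constants are the ones in kDefaultQuantBias, and setting the biases to (1, 1, 1, 0) reproduces plain midpoint dequantization):

```python
# Default bias values: per-channel biases for |q| == 1 plus a shared
# numerator used for larger coefficients (assumed from libjxl sources).
K_DEFAULT_QUANT_BIAS = (1.0 - 0.05465007330715401,
                        1.0 - 0.07005449891748593,
                        1.0 - 0.049935103337343655,
                        0.145)

def adjust_quant_bias(q, channel, biases=K_DEFAULT_QUANT_BIAS):
    """Bias an integer quantized AC coefficient q toward zero, returning
    the value that is then multiplied by the quantization step."""
    if q == 0:
        return 0.0
    if q == 1:
        return biases[channel]
    if q == -1:
        return -biases[channel]
    return q - biases[3] / q
```

The effect is to pull reconstructed coefficients slightly toward the expected value within each quantization bin rather than the bin midpoint, which is why the pixel output drifts from what midpoint-dequantizing decoders like libjpeg-turbo/mozjpeg produce.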

@jonsneyers
Copy link
Member

Where does the quant bias get used in the case of jpeg recompression? I thought this was an encode thing (which wouldn't apply to jpeg recompression since there the coeffs are quantized already), but this looks like it is a decode thing? Is this used for AC or for DC or both? Is it signaled?

If it causes the decoded image to be closer to the original image (as my little test with mae/psnr suggested), I don't think we should disable it by default. Getting pixels that are closer to the original seems to me to be more valuable than getting pixels that are closer to the decode result of libjpeg-turbo. Could you do a test like the one I did (#1470 (comment)) to see how it affects the pae/mae/psnr w.r.t. the original image?
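For reference, the metrics meant here can be computed like this (a minimal NumPy sketch; assumes the two decodes have been loaded as same-shaped arrays of 8-bit samples):

```python
import numpy as np

def pae(a, b):
    """Peak absolute error between two images."""
    return float(np.max(np.abs(a.astype(np.float64) - b.astype(np.float64))))

def mae(a, b):
    """Mean absolute error between two images."""
    return float(np.mean(np.abs(a.astype(np.float64) - b.astype(np.float64))))

def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(peak * peak / mse))
```

Running these against the original (pre-jpeg) image for both the djxl and the mozjpeg decode would settle which pixels are actually closer to the source.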

@veluca93
Copy link
Member

Where does the quant bias get used in the case of jpeg recompression? I thought this was an encode thing (which wouldn't apply to jpeg recompression since there the coeffs are quantized already), but this looks like it is a decode thing? Is this used for AC or for DC or both? Is it signaled?

It is signaled. AC only, and actually decode only :)

@Dogway
Copy link
Author

Dogway commented Jun 20, 2022

Thanks for the tests @mo271; maybe the bell-shaped distribution (looks like a T-distribution, but I didn't check) causes the noted smoothness.

I collected a bunch of DCT quant coefficients, and the one posted gave me the best results on mozjpeg (I had to disable trellis, as it was also causing blur). It follows an equation by Jukka Ripatti.
I don't know if it can also be applied to AC after a source-quality estimation. At worst, an option would come in handy for testing purposes in the meantime.

@jonsneyers, I don't think those metrics are suited for sharpness evaluation, which is the main case here. We could run log variance as a sharpness metric, or the other ones linked (#1470 (comment)).

@ValiumBear
Copy link

I also noticed the same issue with djxl decoding of jpeg-recompressed images. The problem occurs when the image is not decoded back to jpg:

cjxl input.jpg output.jxl
djxl output.jxl output.png

When decoded this way, there's a slight blur, typical of lossy jxl encoding with default settings.
It seems like djxl is not using the jpeg reconstruction data when decoding losslessly to a non-jpg format.
This same behavior is also present in image viewers like IrfanView and ImageGlass.

When decoded back to jpg instead of png, it's 100% lossless and the hash matches the original jpg.

@Dogway
Copy link
Author

Dogway commented Jun 20, 2023

For the one-year anniversary of the issue I gave the latest dev build a whirl, and the pixel output of the losslessly transcoded jpg is still as blurry as it was a year ago; actually a bit blurrier, since now I cannot use --jpeg_transcode_disable_cfl.

This surprises me, as a solution was identified back then.

So I wonder if there's a problem in implementing the fix, or at least a user-defined quant table, as I believe this is a showstopper for mass adoption of the format. I had hoped this would fare better than JPEG 2000 or JPEG XR.

I improved the sharpness function a bit to be more granular with small details and to give more weight to texture rather than edges. These are the results.

JXL Sharpness Map, JPEG Turbo Sharpness Map
