YCCK JPEG from a PDF with wrong colors #878

Schmidor · 2023-12-14T15:33:48Z

Describe the bug
The attached JPEG has wrong colors when rendered with TwelveMonkeys. The color space should be YCCK.

The image is extracted from a PDF in https://issues.apache.org/jira/browse/PDFBOX-5488
Note that there might be another underlying issue in PDFBox, when read.readRaster(...) is used and the color space is read from the metadata (https://issues.apache.org/jira/browse/PDFBOX-5738), which causes problems with the workaround from PDFBOX-5488. We might need some hints there.

Version information

The version of the TwelveMonkeys ImageIO library in use.
3.10.1
The exact output of java --version (or java -version for older Java releases).
openjdk 17.0.9 2023-10-17 LTS
OpenJDK Runtime Environment Zulu17.46+19-CA (build 17.0.9+8-LTS)
OpenJDK 64-Bit Server VM Zulu17.46+19-CA (build 17.0.9+8-LTS, mixed mode, sharing)

Sample file(s)

Screenshots

haraldk · 2023-12-15T13:50:54Z

Curious...

Thanks for the sample and report! I'll investigate why we get different colors than libJPEG...

haraldk · 2023-12-15T14:19:13Z

Okay... This sample file has an APP14/Adobe marker specifying YCCK, however the "Adobe" string is followed by a 0x01 "Start of Heading" character instead of the normal NULL-termination 0x00... So we don't really see it as an Adobe marker but as an "Adobe\0x01" custom marker that we ignore...

We could probably change the identifier parsing to stop at the first ASCII control character (< 0x20) instead of only at NULL. But I'm not sure the sample file is strictly correct...
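To illustrate the failure mode described above, here is a minimal sketch. It is not the actual TwelveMonkeys parsing code; the class name, the `readNullTerminated` helper, and the payload bytes are assumptions made up for illustration. It reads an identifier up to the first 0x00 byte, the way a NULL-terminated parser would, and shows that the identifier no longer equals "Adobe" when a 0x01 byte follows the five letters.

```java
import java.nio.charset.StandardCharsets;

public class IdentifierSketch {
    // Hypothetical helper: returns the bytes before the first 0x00 as ASCII text.
    static String readNullTerminated(byte[] payload) {
        int end = 0;
        while (end < payload.length && payload[end] != 0) {
            end++;
        }
        return new String(payload, 0, end, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Assumed payload: "Adobe", then 0x01 0x00, then the remaining marker bytes.
        byte[] payload = {'A', 'd', 'o', 'b', 'e', 0x01, 0x00, 0, 0, 0, 0, 2};
        String id = readNullTerminated(payload);

        System.out.println(id.equals("Adobe")); // false: the 0x01 became part of the identifier
        System.out.println(id.length());        // 6
    }
}
```

A parser that only recognizes the exact string "Adobe" would therefore treat this segment as an unknown application marker.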

haraldk · 2023-12-15T15:59:53Z

...or maybe I'm just wrong in assuming that every JPEG marker identifier string should be 0-terminated...

"Supporting the DCT Filters in PostScript Level 2", in section 18, "Adobe Application-Specific JPEG Marker", actually specifies only the characters 'A', 'd', 'o', 'b' and 'e' followed by a two-byte version... While I have always thought the version was only a single byte... 😮

If this is indeed correct, then I guess I need to rewrite a lot of the JPEG marker parsing. 😛

haraldk · 2023-12-16T16:41:06Z

Hmm... Thinking about it some more, I think (the authors of) the encoder that wrote the file actually just misread the spec. My guess is that they tried to write an APP14/Adobe marker with version 1.00, but encoded it as 0x0100 (256) instead of the correct 0x0064 (100)... I think the most recent version is either 1.01 or 1.02; there really isn't any 2.56 version around...

Anyway, I should still handle this, as the spec says two-byte version, and not 0-termination...
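As a rough sketch of what handling this could look like (not the code from the actual fix; the class, field, and method names here are hypothetical), the segment can be parsed with a fixed five-byte "Adobe" identifier followed by a two-byte big-endian version, two two-byte flag fields, and a one-byte transform, where transform 2 means YCCK. The payload is assumed to start right after the segment length field.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class AdobeApp14Sketch {
    // Hypothetical value holder for the fields of an APP14/Adobe segment.
    static final class AdobeSegment {
        final int version, flags0, flags1, transform;

        AdobeSegment(int version, int flags0, int flags1, int transform) {
            this.version = version;
            this.flags0 = flags0;
            this.flags1 = flags1;
            this.transform = transform;
        }
    }

    // Returns null if the payload is too short or does not start with "Adobe".
    static AdobeSegment parse(byte[] payload) {
        if (payload.length < 12) {
            return null;
        }
        String id = new String(payload, 0, 5, StandardCharsets.US_ASCII);
        if (!id.equals("Adobe")) {
            return null;
        }
        // ByteBuffer is big-endian by default, matching JPEG byte order.
        ByteBuffer buf = ByteBuffer.wrap(payload, 5, 7);
        int version = buf.getShort() & 0xFFFF;
        int flags0 = buf.getShort() & 0xFFFF;
        int flags1 = buf.getShort() & 0xFFFF;
        int transform = buf.get() & 0xFF;
        return new AdobeSegment(version, flags0, flags1, transform);
    }

    public static void main(String[] args) {
        // Assumed payload resembling the sample: version bytes 0x01 0x00, transform 2 (YCCK).
        byte[] payload = {'A', 'd', 'o', 'b', 'e', 0x01, 0x00, 0, 0, 0, 0, 2};
        AdobeSegment segment = parse(payload);
        System.out.println(segment.version);   // 256, i.e. 0x0100 rather than 100 (0x0064)
        System.out.println(segment.transform); // 2
    }
}
```

Because the identifier has a fixed length, the value of the version bytes no longer affects whether the segment is recognized as an Adobe marker.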

Schmidor · 2023-12-18T12:11:33Z

Thank you. I think that solves our problem with the PDF.
There are many "interesting" variations in PDF writers 😄

Schmidor added the Reported bug label Dec 14, 2023

haraldk added the Confirmed bug label Dec 16, 2023

haraldk added a commit that referenced this issue Dec 16, 2023

#878: Now detects APP14/Adobe markers with full 2 byte version

b91d02a

Schmidor closed this as completed Dec 18, 2023
