YCCK JPEG from a PDF with wrong colors #878

Closed
Schmidor opened this issue Dec 14, 2023 · 5 comments

@Schmidor
Contributor

Describe the bug
The attached JPEG has the wrong colors when rendered with TwelveMonkeys. The color space should be YCCK.

The image is extracted from a PDF in https://issues.apache.org/jira/browse/PDFBOX-5488
Note that there might be another underlying issue in PDFBox, when read.readRaster(...) is used and the color space is read from the metadata (https://issues.apache.org/jira/browse/PDFBOX-5738), which causes problems with the workaround from PDFBOX-5488. We might need some hints there.

Version information

  1. The version of the TwelveMonkeys ImageIO library in use.
    3.10.1

  2. The exact output of java --version (or java -version for older Java releases).
    openjdk 17.0.9 2023-10-17 LTS
    OpenJDK Runtime Environment Zulu17.46+19-CA (build 17.0.9+8-LTS)
    OpenJDK 64-Bit Server VM Zulu17.46+19-CA (build 17.0.9+8-LTS, mixed mode, sharing)

Sample file(s)
gre_research_validiity_data_page1

Screenshots
image

@haraldk
Owner

haraldk commented Dec 15, 2023

Curious...

Thanks for the sample and report! I'll investigate why we get different colors than libJPEG...

@haraldk
Owner

haraldk commented Dec 15, 2023

Okay... This sample file has an APP14/Adobe marker specifying YCCK. However, the "Adobe" string is followed by a 0x01 "Start of Heading" (SOH) character instead of the normal NULL termination (0x00)... So we don't really see it as an Adobe marker, but as an "Adobe\0x01" custom marker that we ignore...

We could probably change the identifier parsing to stop at the first ASCII control character (< 0x20) instead of only at NULL. But I'm not sure the sample file is strictly correct...
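A minimal sketch of that idea, for illustration only: the class and method names are hypothetical and are not TwelveMonkeys code. It assumes `payload` is the APPn segment data after the two-byte length field, and it treats every byte below 0x20 as a terminator; whether the space character (0x20) should also end the identifier is a separate design choice.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of TwelveMonkeys.
class MarkerIdentifier {
    // Returns the leading identifier of an APPn payload, ending at the
    // first byte below 0x20 (NULL or any other ASCII control character).
    static String readIdentifier(byte[] payload) {
        int end = 0;
        while (end < payload.length && (payload[end] & 0xFF) >= 0x20) {
            end++;
        }
        return new String(payload, 0, end, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Conventional NULL-terminated identifier.
        byte[] jfif = {'J', 'F', 'I', 'F', 0x00, 0x01, 0x02};
        // Identifier followed by 0x01, as described for the sample file.
        byte[] adobe = {'A', 'd', 'o', 'b', 'e', 0x01, 0x00};

        System.out.println(readIdentifier(jfif));  // prints "JFIF"
        System.out.println(readIdentifier(adobe)); // prints "Adobe"
    }
}
```

With NULL-only termination, the second payload would yield an identifier that includes the 0x01 byte, which is why the marker was not recognized as "Adobe".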

@haraldk
Owner

haraldk commented Dec 15, 2023

...or maybe I'm just wrong in assuming that every JPEG marker identifier string should be 0-terminated...

Adobe's "Supporting the DCT Filters in PostScript Level 2", in section 18 ("Adobe Application-Specific JPEG Marker"), actually specifies only the characters 'A', 'd', 'o', 'b' and 'e' followed by a two-byte version... While I have always thought the version was only a single byte... 😮

If this is indeed correct, then I guess I need to rewrite a lot of the JPEG marker parsing. 😛

@haraldk
Owner

haraldk commented Dec 16, 2023

Hmm... Thinking about it some more, I think (the authors of) the encoder that wrote the file actually just misread the spec. My guess is that they tried to write an APP14/Adobe marker with version 1.00, but encoded it as 0x0100 (256) instead of the correct 0x0064 (100)... I think the most recent version is either 1.01 or 1.02; there really isn't any 2.56 version around...

Anyway, I should still handle this, as the spec specifies a two-byte version, not 0-termination...
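As a rough sketch of what fixed-width parsing could look like (hypothetical names, not the actual TwelveMonkeys implementation): the five bytes "Adobe" are matched exactly, followed by a two-byte big-endian version, two two-byte flag fields and a one-byte color transform, where 2 means YCCK. The payload is assumed to be the segment data after the two-byte length field, and the flag values in the example are made up.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical holder for the APP14/Adobe fields, not a TwelveMonkeys class.
final class AdobeApp14 {
    final int version;
    final int flags0;
    final int flags1;
    final int transform; // 0 = unknown (RGB/CMYK), 1 = YCbCr, 2 = YCCK

    private AdobeApp14(int version, int flags0, int flags1, int transform) {
        this.version = version;
        this.flags0 = flags0;
        this.flags1 = flags1;
        this.transform = transform;
    }

    // Returns null if the payload is not an Adobe segment.
    static AdobeApp14 parse(byte[] payload) {
        if (payload.length < 12) {
            return null; // 5 (identifier) + 2 + 2 + 2 + 1 bytes required
        }
        String id = new String(payload, 0, 5, StandardCharsets.US_ASCII);
        if (!"Adobe".equals(id)) {
            return null;
        }
        // ByteBuffer is big-endian by default, matching JPEG byte order.
        ByteBuffer buf = ByteBuffer.wrap(payload, 5, 7);
        int version = buf.getShort() & 0xFFFF;
        int flags0 = buf.getShort() & 0xFFFF;
        int flags1 = buf.getShort() & 0xFFFF;
        int transform = buf.get() & 0xFF;
        return new AdobeApp14(version, flags0, flags1, transform);
    }

    public static void main(String[] args) {
        // Made-up payload mimicking the report: version bytes 0x01 0x00, transform 2.
        byte[] payload = {'A', 'd', 'o', 'b', 'e', 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x02};
        AdobeApp14 adobe = parse(payload);
        System.out.println("version=" + adobe.version + " transform=" + adobe.transform);
        // prints "version=256 transform=2"
    }
}
```

Because no terminator is expected after the identifier, the unusual version value 0x0100 (256) is simply read as a number and the YCCK transform is still picked up.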

@Schmidor
Contributor Author

Schmidor commented Dec 18, 2023

Thank you. I think that solves our problem with the PDF.
There are many "interesting" variations in PDF writers 😄
