Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add cmyk support #756

Merged
merged 18 commits into from
Apr 26, 2023
Merged

Add cmyk support #756

merged 18 commits into from
Apr 26, 2023

Conversation

devdev29
Copy link

@devdev29 devdev29 commented Apr 7, 2023

Add CMYK support to image conversion in pdf

tries to fix #711

I've just added a basic condition in the image parser file to check if the mode is specified to be CMYK and to convert the image to the appropriate format if that is the case. This is my first time working with CMYK so I'd appreciate it if the reviewer could tell me whether I need to treat the Key channel in CMYK like the Alpha channel in RGBA.

Checklist:

  • The GitHub pipeline is OK (green),
    meaning that both pylint (static code analyzer) and black (code formatter) are happy with the changes of this PR.

  • A unit test is covering the code added / modified by this PR

  • This PR is ready to be merged, this should probably be a draft pr

  • In case of a new feature, docstrings have been added, with also some documentation in the docs/ folder

  • A mention of the change is present in CHANGELOG.md

By submitting this pull request, I confirm that my contribution is made under the terms of the GNU LGPL 3.0 license.

@devdev29 devdev29 requested a review from Lucas-C as a code owner April 7, 2023 08:05
@codecov
Copy link

codecov bot commented Apr 7, 2023

Codecov Report

Patch coverage: 16.66% and project coverage change: -0.17 ⚠️

Comparison is base (c55f1e3) 93.25% compared to head (a38a25c) 93.08%.

❗ Current head a38a25c differs from pull request most recent head 7049d98. Consider uploading reports for the commit 7049d98 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #756      +/-   ##
==========================================
- Coverage   93.25%   93.08%   -0.17%     
==========================================
  Files          27       27              
  Lines        7312     7293      -19     
  Branches     1322     1324       +2     
==========================================
- Hits         6819     6789      -30     
- Misses        308      315       +7     
- Partials      185      189       +4     
Impacted Files Coverage Δ
fpdf/image_parsing.py 81.97% <16.66%> (-1.80%) ⬇️

... and 5 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@Lucas-C
Copy link
Member

Lucas-C commented Apr 7, 2023

Hi @devdev29 and welcome! 😊

This seems promising!

On fpdf2 we always add some unit tests when introducing new features:
https://pyfpdf.github.io/fpdf2/Development.html#testing

I think we currently do not have any CMYK image among fpdf2 reference files,
but you could use the 2nd image from this SO answer: https://stackoverflow.com/a/59854479/636849
You could add it to the test/image/image_types/ folder and create a unit test function in test/image/image_types/test_insert_images.py.

This is my first time working with CMYK so I'd appreciate it if the reviewer could tell me whether I need to treat the Key channel in CMYK like the Alpha channel in RGBA.

I have to admit that I don't know much neither about CMYK 😅
I have just learned there is no such thig as PNG images with CMYK,
and that TIFF images with CMYK are very rare / mostly unsupported.

Regarding your code and your question, the way you handle the 4 channels of CMYK seems fine to me.

@devdev29
Copy link
Author

devdev29 commented Apr 7, 2023

Ohk then, I was going to write a test for the addition but I just wanted to confirm whether I was approaching the issue correctly, I'll be sure to add the test by today evening, also about that CMYK image file I just converted an existing jpeg to cmyk using pillow for a test script so I'll add that to the image_types folder I'll just use the SO image instead, it seems more appropriate 😅

@devdev29
Copy link
Author

devdev29 commented Apr 8, 2023

So I wrote the test for inserting the image but the test seems to be generating a different hash for every new pdf file I try even though the each file is the exact same, another issue I've run into is that only TIFF files in the cmyk format work jpegs produce weird effects when inserted into the document. I'll try to fix the jpeg issue asap but I don't know why the test is failing.

@Lucas-C
Copy link
Member

Lucas-C commented Apr 10, 2023

So I wrote the test for inserting the image but the test seems to be generating a different hash for every new pdf file I try even though the each file is the exact same

That is strange...
Could you share the code that you are using?

another issue I've run into is that only TIFF files in the cmyk format work jpegs produce weird effects when inserted into the document. I'll try to fix the jpeg issue asap but I don't know why the test is failing

Alright.
If you provide some code I'll try to reproduce the problems you have faced myself, and see if I can investigate!

@gmischler
Copy link
Collaborator

This is my first time working with CMYK so I'd appreciate it if the reviewer could tell me whether I need to treat the Key channel in CMYK like the Alpha channel in RGBA.

fpdf2 doesn't currently support pixel transparency, so the A(lpha) channel gets dropped (as eg. in converting from RGBA to RGB).
In contrast to that, the "K" channel in a CYMK is just the color value for the black ink. There is no reason to treat it differently from the others, and you should not need to convert those files in any way.

@devdev29
Copy link
Author

devdev29 commented Apr 13, 2023

So I wrote the test for inserting the image but the test seems to be generating a different hash for every new pdf file I try even though the each file is the exact same

That is strange... Could you share the code that you are using?

Sure here's the code - https://pastebin.com/cu0STbcq

edit - P.S. - sorry for the late reply got a bit busy with school 😅

@gmischler
Copy link
Collaborator

fpdf2 doesn't currently support pixel transparency

Actually fpdf2 DOES support pixel transparency, I was confused about the context of some related code in _to_data().

But still, your code addition in the same place:

    if img.mode == "CMYK":
        img = img.convert("CMYK")

is a null-operation that wastes time for doing nothing.

@devdev29
Copy link
Author

fpdf2 doesn't currently support pixel transparency

Actually fpdf2 DOES support pixel transparency, I was confused about the context of some related code in _to_data().

But still, your code addition in the same place:

    if img.mode == "CMYK":
        img = img.convert("CMYK")

is a null-operation that wastes time for doing nothing.

Yea I realised that a few days after the commit and removed the code, but thanks anyway for pointing it out :), I've also fixed the jpg issue, turns out the condition check in the image parsing file which extracts the info for images was setting the channel and other info for cmyk images the same as rgb, which seemed to be running it through the wrong decode array in output.py. The test still fails though :( I'll try and see where I'm going wrong.

@Lucas-C
Copy link
Member

Lucas-C commented Apr 13, 2023

Sure here's the code - pastebin.com/cu0STbcq

edit - P.S. - sorry for the late reply got a bit busy with school 😅

Thank you, and no worries 😊

I'd be happy to have a look, but could you please add your reference insert_images_insert_tiff_cmyk.tiff image file to the PR?

@Lucas-C
Copy link
Member

Lucas-C commented Apr 14, 2023

I have tried to reproduce your issue using the following code:

#!/usr/bin/env python3
from datetime import datetime, timezone
from fpdf import FPDF

pdf = FPDF()
pdf.creation_date = datetime(1969, 12, 31, 19, 00, 00).replace(tzinfo=timezone.utc)
pdf.add_page()
pdf.image("flower-cmyk.jpg")  # from https://stackoverflow.com/a/59854479/636849
pdf.add_page()
pdf.image("test/image/image_types/test.tiff", w=pdf.epw)
pdf.output("issue_756.pdf")

And I get a consistant sha256 hash:

$ ./issue_756.py && sha256sum issue_756.pdf
57d5d2a1c83431155469a11a338cc231d06b4154a6c81c5f2a4a144079fada3a  issue_756.pdf

Note: I have latest Pillow==9.5.0 package installed.

So I guess your problem may be with the insert_images_insert_tiff_cmyk.tiff image that you used @devdev29?

The good news is that the CMYK JPEG image seems to be correctly embedded in the PDF!
At least the PDF viewers that I use display it ^^

@devdev29
Copy link
Author

devdev29 commented Apr 14, 2023

So I guess your problem may be with the insert_images_insert_tiff_cmyk.tiff image that you used @devdev29?

Yep either that or there's some issue with my pillow installation.

The good news is that the CMYK JPEG image seems to be correctly embedded in the PDF! At least the PDF viewers that I use display it ^^

That's great! It's working on my pdf viewers as well. I'll just add the test to the PR and then it can be merged! 🎉

@Lucas-C
Copy link
Member

Lucas-C commented Apr 24, 2023

That's great! It's working on my pdf viewers as well. I'll just add the test to the PR and then it can be merged! 🎉

Hi @devdev29!

How is it going regarding this PR? 😊

@devdev29
Copy link
Author

Hi @devdev29!

How is it going regarding this PR? 😊

Hi Lucas, sorry for abandoning the PR so suddenly, I'm still facing some issues with CMYK TIFFs, will definitely fix them and make a final commit by tomorrow. Sorry for taking so long on such a simple PR.

@Lucas-C
Copy link
Member

Lucas-C commented Apr 24, 2023

Hi Lucas, sorry for abandoning the PR so suddenly, I'm still facing some issues with CMYK TIFFs, will definitely fix them and make a final commit by tomorrow. Sorry for taking so long on such a simple PR.

No worries!

I'm just happy if you plan to finish this valuable work 😊

@devdev29
Copy link
Author

Heya Lucas, I've figured out the problem with the TIFFs and JPGs, apparently the dct decode stream for CMYK JPEGs is inverted and the decode array needs to be inverted from the default for CMYK i.e. [0 1 0 1 0 1 0 1] to [1 0 1 0 1 0 1 0] but needs to remain at the default value for TIFFs so I created a flag that check if the image is a CMYK JPEG and inverts the decode array in the output.py file accordingly. Now both CMYK TIFFs and JPEGs can be inserted into pdfs using fpdf!

Here's the answer that made me realise what was going on with the different image formats, hope you find it helpful -
https://graphicdesign.stackexchange.com/questions/12894/cmyk-jpegs-extracted-from-pdf-appear-inverted

@Lucas-C
Copy link
Member

Lucas-C commented Apr 26, 2023

Well done @devdev29!
Congratulations

The GitHub Actions CI pipeline currently fails,
but it's just due to some minor issue with the new reference PDF files.

That should be easily resolved by running the new unit tests once while passing generate=True to the call to assert_pdf_equal()

If you have time to fix, go for it.
Otherwise I can merge this PR and perform the fix afterwards

@devdev29
Copy link
Author

All done and dusted.
Thanks @Lucas-C for being so patient with me, this was a great experience, look forward to contributing more to fpdf. Cheers🎉

@Lucas-C
Copy link
Member

Lucas-C commented Apr 26, 2023

Merged!
Thank you for your contribution @devdev29 👍

@Lucas-C Lucas-C merged commit e79eeff into py-pdf:master Apr 26, 2023
9 of 10 checks passed
@Lucas-C
Copy link
Member

Lucas-C commented Apr 26, 2023

@allcontributors please add @devdev29 for code

@allcontributors
Copy link

@Lucas-C

I've put up a pull request to add @devdev29! 🎉

@devdev29 devdev29 deleted the add_cmyk_support branch April 27, 2023 02:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support DeviceCMYK images ColorSpace
3 participants