-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Mostly broken watermarking behaviour #2112
Comments
Doing some further testing, it seems like this is (mostly) unrelated to the actual PDF file and can be reproduced in even more cases. Running Ghostscript on the corresponding file beforehand seems to reliably fix this behavior and allows displaying/printing with the watermark being visible, but imposes some unnecessary overhead and rewrites the complete PDF file (maybe changing the PDF quality itself as well). |
I further investigated this issue and came to the conclusion that watermarking in PyPDF is somehow completely broken. Why? Because looking at the result of the official test, the generated PDF file shows the same behavior: Lines 1519 to 1535 in f16f434
I could reproduce this on multiple Python 3.9 installations as well as on a completely different setup with a Python 3.10.12 installation. In my opinion, this behavior cannot be tested with a simple file size check, but needs proper visual testing as far as it is possible. For the current problem, just running Ghostscript (which is what ImageMagick uses) on the PDF file will make it look valid again, but throw away the watermark. Additionally, there are lots of warnings emitted which seem to be related to the issue:
For now, doing an image comparison based upon ImageMagick/Ghostscript should at least have discovered the current issue. |
This apparently has been fixed by #2086 without being aware of that, but a corresponding integration test should still be added. |
Do you want to attempt that? |
You mentioned something about an existing private test, so I assumed that you would have a look at this.If desired, I could submit a PR tomorrow. My current (rough) testing code: pytest.mark.enable_socket()
def test_watermark_rendering(tmp_path):
url = "https://github.com/py-pdf/pypdf/files/11985889/bg.pdf"
name = "bgwatermark.pdf"
watermark = PdfReader(BytesIO(get_data_from_url(url, name=name))).pages[0]
url = "https://github.com/py-pdf/pypdf/files/11985888/source.pdf"
name = "srcwatermark.pdf"
page = PdfReader(BytesIO(get_data_from_url(url, name=name))).pages[0]
writer = PdfWriter()
page.merge_page(watermark, over=False)
writer.add_page(page)
target_png_path = tmp_path / "target.png"
url = "https://github.com/stefan6419846/pypdf/assets/96178532/2646f89f-f7a9-494d-8bdf-5331c58ed668"
name = "dstwatermark.png"
target_png_path.write_bytes(get_data_from_url(url, name=name))
pdf_path = tmp_path / "out.pdf"
png_path = tmp_path / "out.png"
writer.write(pdf_path)
ghostscript_binary = shutil.which("gs")
assert ghostscript_binary is not None
# https://github.com/PyCQA/bandit/issues/333
subprocess.run([ghostscript_binary, "-sDEVICE=pngalpha", "-o", png_path, pdf_path]) # noqa: S603
assert png_path.is_file()
image_similarity(png_path, target_png_path) As long as Ghostscript versions stay consistent, there should be no real issue (I generated the file with version 9.52, CI uses 9.50, but larger version differences might lead to rendering differences). |
@stefan6419846 |
This is what my testing code is basically doing, although being in my fork for the time of testing to not spam the issues here. |
Reference file for testing purposes: https://github.com/py-pdf/pypdf/assets/96178532/d5c72d0e-7047-4504-bbf6-bc591c80d7c0 |
This adds a test for the correct rendering of watermarked files. Closes #2112
I am trying to add a watermark to a PDF file based on the official docs, which seems to create issues with different PDF viewers.
By default, the watermark is upside down and shifted somehow. Introducing
page.transfer_rotation_to_content()
(which is currently hidden away in the API docs of the PageObject) will fix this issue.Nevertheless, it seems like there is some additional stuff going wrong as well, as only the Firefox browser (and plain pdf.js, the PDF viewer implementation used by Firefox) are able to view the watermark, while during my tests, Chromium, Evince, Edge, MuPDF and Adobe Acrobat only show a blank page (or the non-watermarked version if the input file would not have had a blank page).
Environment
Which environment were you using when you encountered the problem?
$ python -m platform Linux-5.14.21-150400.24.81-default-x86_64-with-glibc2.31 $ python -c "import pypdf;print(pypdf.__version__)" 3.15.2
Code + PDF
This is a minimal, complete example that shows the issue:
The plain test files are: watermark.pdf out1.pdf marked.pdf
Traceback
There is no traceback.
The text was updated successfully, but these errors were encountered: