Page merge: 'NullObject' object has no attribute 'get_data' #2157

Closed
stefan6419846 opened this issue Sep 6, 2023 · 3 comments · Fixed by #2161 or #2524

Comments

@stefan6419846
Collaborator

Applying the fix from #2150 to the existing code and then merging a watermark into a PDF on which remove_text() and remove_images() have been called raises an error.

Environment

Which environment were you using when you encountered the problem?

$ python -m platform
Linux-5.14.21-150400.24.81-default-x86_64-with-glibc2.31

$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==3.15.5, crypt_provider=('pycryptodome', '3.18.0'), PIL=10.0.0

Code + PDF

This is a minimal, complete example that shows the issue:

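from pypdf import PdfReader, PdfWriter

# First page of the watermark PDF
watermark = PdfReader("watermark.pdf").pages[0]

# Clone the cleaned PDF (remove_text() and remove_images() already applied)
pdf_file = PdfWriter(clone_from="file.pdf")
for page in pdf_file.pages:
    # Raises AttributeError on pypdf 3.15.5: the page's /Contents resolves to a NullObject
    page.merge_page(watermark, over=True)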

The watermark is https://github.com/py-pdf/pypdf/files/12428857/watermark.pdf; the cleaned file is abc.pdf.

Traceback

This is the complete traceback I see:

Traceback (most recent call last):
  File "/home/stefan/temp/pdf/run1.py", line 7, in <module>
    page.merge_page(watermark, over=True)
  File "/home/stefan/temp/venv/lib/python3.9/site-packages/pypdf/_page.py", line 1044, in merge_page
    self._merge_page(page2, over=over, expand=expand)
  File "/home/stefan/temp/venv/lib/python3.9/site-packages/pypdf/_page.py", line 1124, in _merge_page
    original_content = self.get_contents()
  File "/home/stefan/temp/venv/lib/python3.9/site-packages/pypdf/_page.py", line 955, in get_contents
    return ContentStream(self[PG.CONTENTS].get_object(), pdf)
  File "/home/stefan/temp/venv/lib/python3.9/site-packages/pypdf/generic/_data_structures.py", line 1021, in __init__
    stream_data = stream.get_data()
AttributeError: 'NullObject' object has no attribute 'get_data'
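The last frames show the mechanism: get_contents() resolves /Contents, receives a NullObject, and the ContentStream constructor immediately calls get_data() on it. The sketch below models that with simplified, hypothetical stand-in classes (not pypdf's real NullObject, StreamObject or ContentStream) and contrasts the failing call with a guarded variant that treats a null /Contents like a missing one and returns None.

```python
from typing import Dict, Optional


class NullObject:
    """Hypothetical stand-in for a PDF null object; it has no get_data()."""


class StreamObject:
    """Hypothetical stand-in for a content stream holding raw bytes."""

    def __init__(self, data: bytes) -> None:
        self._data = data

    def get_data(self) -> bytes:
        return self._data


class ContentStream:
    """Simplified wrapper that, like the frame in the traceback, reads the data eagerly."""

    def __init__(self, stream) -> None:
        self.data = stream.get_data()


def get_contents_unguarded(page: Dict[str, object]) -> Optional[ContentStream]:
    # Mirrors the failing logic: any present /Contents is wrapped unconditionally.
    if "/Contents" in page:
        return ContentStream(page["/Contents"])
    return None


def get_contents_guarded(page: Dict[str, object]) -> Optional[ContentStream]:
    # A missing or null /Contents both mean "no content stream": return None.
    obj = page.get("/Contents")
    if obj is None or isinstance(obj, NullObject):
        return None
    return ContentStream(obj)


cleaned_page = {"/Contents": NullObject()}
try:
    get_contents_unguarded(cleaned_page)
except AttributeError as exc:
    print(exc)  # 'NullObject' object has no attribute 'get_data'
print(get_contents_guarded(cleaned_page))  # None
```

Returning None keeps the "no content" case distinct from an empty stream, so callers decide explicitly what to do with it.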
@pubpub-zz
Collaborator

@stefan6419846
A PR is available if you want to try it.

@stefan6419846
Collaborator Author

I can confirm that the PR seems to avoid this issue and generates a correctly watermarked PDF file.

@MartinThoma
Member

I've reopened this because I want to improve the test test_get_contents_from_nullobject: adding a merge_page call to it shows immediately why returning None is desirable at this point.
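To illustrate the point, here is a hypothetical, simplified model of combining two pages' content streams (not pypdf's actual merge_page implementation). The existing content is passed as None when the page has no usable /Contents; the merge then skips it and still emits the watermark instead of failing, which matches the "correctly watermarked" result reported above.

```python
from typing import List, Optional


def merge_contents(original: Optional[bytes], watermark: bytes, over: bool = True) -> bytes:
    """Hypothetical model of merging a watermark into a page's content.

    `original` is None when the page has no usable /Contents (for example a
    null one); in that case only the watermark content is emitted.
    """
    parts: List[bytes] = []
    if original is not None:
        # Wrap the existing content in q/Q so its graphics state cannot leak.
        parts.append(b"q\n" + original + b"\nQ")
    if over:
        parts.append(watermark)  # drawn after, i.e. on top of, the original
    else:
        parts.insert(0, watermark)  # drawn first, i.e. underneath
    return b"\n".join(parts)


print(merge_contents(None, b"BT (DRAFT) Tj ET"))  # b'BT (DRAFT) Tj ET'
print(merge_contents(b"0 0 m", b"W"))  # b'q\n0 0 m\nQ\nW'
```

A test built on this idea asserts that merging into a page with null content yields exactly the watermark, which is only possible if the content lookup reports "nothing" (None) rather than raising.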
