Page merge: 'NullObject' object has no attribute 'get_data' #2157

stefan6419846 · 2023-09-06T07:02:50Z

Applying the fix from #2150 to the existing code and using a PDF on which remove_text() and remove_images() have been called raises an error.

Environment

Which environment were you using when you encountered the problem?

$ python -m platform
Linux-5.14.21-150400.24.81-default-x86_64-with-glibc2.31

$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==3.15.5, crypt_provider=('pycryptodome', '3.18.0'), PIL=10.0.0

Code + PDF

This is a minimal, complete example that shows the issue:

from pypdf import PdfReader, PdfWriter

watermark = PdfReader("watermark.pdf").pages[0]

pdf_file = PdfWriter(clone_from="file.pdf")
for page in pdf_file.pages:
    page.merge_page(watermark, over=True)

The watermark is https://github.com/py-pdf/pypdf/files/12428857/watermark.pdf, and the cleaned file is abc.pdf.

Traceback

This is the complete traceback I see:

Traceback (most recent call last):
  File "/home/stefan/temp/pdf/run1.py", line 7, in <module>
    page.merge_page(watermark, over=True)
  File "/home/stefan/temp/venv/lib/python3.9/site-packages/pypdf/_page.py", line 1044, in merge_page
    self._merge_page(page2, over=over, expand=expand)
  File "/home/stefan/temp/venv/lib/python3.9/site-packages/pypdf/_page.py", line 1124, in _merge_page
    original_content = self.get_contents()
  File "/home/stefan/temp/venv/lib/python3.9/site-packages/pypdf/_page.py", line 955, in get_contents
    return ContentStream(self[PG.CONTENTS].get_object(), pdf)
  File "/home/stefan/temp/venv/lib/python3.9/site-packages/pypdf/generic/_data_structures.py", line 1021, in __init__
    stream_data = stream.get_data()
AttributeError: 'NullObject' object has no attribute 'get_data'
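The traceback shows the call chain: get_contents() passes `self[PG.CONTENTS].get_object()` straight to `ContentStream`, whose `__init__` calls `stream.get_data()`, and a null `/Contents` entry has no such method. The sketch below models that chain with small stand-in classes (`NullObject`, `StreamObject`, `ContentStream` and both `get_contents_*` functions here are hypothetical simplified stubs, not pypdf's real classes) and contrasts it with a guard that returns `None`, which is the behaviour #2161 later adopted for get_contents().

```python
# Hypothetical stand-ins that mimic the failing call chain; NOT pypdf's code.


class NullObject:
    """Models the PDF null object: it carries no stream data."""

    def get_object(self):
        return self


class StreamObject:
    """Models a content stream that actually holds bytes."""

    def __init__(self, data: bytes) -> None:
        self._data = data

    def get_object(self):
        return self

    def get_data(self) -> bytes:
        return self._data


class ContentStream:
    def __init__(self, stream) -> None:
        # Like ContentStream.__init__ in the traceback, this assumes a stream.
        self.data = stream.get_data()


def get_contents_unguarded(page: dict):
    """Mirrors the 3.15.5 behaviour: a null /Contents is not checked."""
    if "/Contents" in page:
        return ContentStream(page["/Contents"].get_object())
    return None


def get_contents_guarded(page: dict):
    """Mirrors the fixed behaviour: a null /Contents yields None."""
    if "/Contents" not in page:
        return None
    obj = page["/Contents"].get_object()
    if isinstance(obj, NullObject):
        return None
    return ContentStream(obj)


cleaned_page = {"/Contents": NullObject()}
try:
    get_contents_unguarded(cleaned_page)
except AttributeError as exc:
    print("unguarded:", exc)
print("guarded:", get_contents_guarded(cleaned_page))
```

The unguarded version fails with the same message as the traceback, while the guarded one lets the caller decide what an absent content stream means.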



pubpub-zz · 2023-09-06T19:01:16Z

@stefan6419846
A PR is available if you want to try it.

stefan6419846 · 2023-09-07T07:42:17Z

I can confirm that the PR does seem to avoid this issue and generates a correctly watermarked PDF file.


MartinThoma · 2023-09-10T08:00:00Z

I've reopened this issue because I want to improve the test test_get_contents_from_nullobject: calling merge_page shows immediately why returning None is desirable here.
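To illustrate why a `None` return helps merge_page, here is a simplified, hypothetical model in which content streams are plain bytes and `merge_contents` is an illustrative helper, not part of pypdf. It assumes the merge step simply skips the page's own stream when get_contents() returns `None` and emits only the watermark; when a stream is present, it wraps the original in `q`/`Q` so the page's graphics state cannot leak into the watermark.

```python
# Hypothetical model of merging a watermark onto a page whose original
# contents may be missing (None). NOT pypdf's merge_page implementation.
from typing import List, Optional


def merge_contents(original: Optional[bytes], watermark: bytes, over: bool = True) -> bytes:
    """Combine the page's own content stream (if any) with the watermark's."""
    parts: List[bytes] = []
    if original is not None:
        # Isolate the original drawing state so it cannot affect the watermark.
        parts.append(b"q\n" + original + b"\nQ")
    if over:
        parts.append(watermark)  # drawn last, i.e. on top of the page
    else:
        parts.insert(0, watermark)  # drawn first, i.e. underneath the page
    return b"\n".join(parts)


# A cleaned page (no contents) just receives the watermark.
print(merge_contents(None, b"watermark ops"))
# A normal page keeps its own operators, isolated, followed by the watermark.
print(merge_contents(b"page ops", b"watermark ops"))
```

With `over=True`, as in the reproduction script, the watermark ends up on top; a missing original stream is treated as "nothing to merge" instead of an error.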


pubpub-zz added a commit to pubpub-zz/pypdf that referenced this issue Sep 6, 2023

BUG: getcontents() shall return None if contents is NullObject

c84777b

closes py-pdf#2157

pubpub-zz mentioned this issue Sep 6, 2023

BUG: getcontents() shall return None if contents is NullObject #2161

Merged

MartinThoma closed this as completed in #2161 Sep 10, 2023

MartinThoma pushed a commit that referenced this issue Sep 10, 2023

BUG: getcontents() shall return None if contents is NullObject (#2161)

bf62f17

Fixes #2157

MartinThoma reopened this Sep 10, 2023

stefan6419846 added a commit to stefan6419846/pypdf that referenced this issue Mar 16, 2024

TST: Improve test_get_contents_from_nullobject to show real use-case

cac7343

Fixes py-pdf#2157 by addressing py-pdf#2157 (comment)

stefan6419846 mentioned this issue Mar 16, 2024

TST: Improve test_get_contents_from_nullobject to show real use-case #2524

Merged

pubpub-zz closed this as completed in #2524 Mar 16, 2024

pubpub-zz pushed a commit that referenced this issue Mar 16, 2024

TST: Improve test_get_contents_from_nullobject to show real use-case (#2524)

8ef399a

Fixes #2157 by addressing #2157 (comment)
