AttributeError: 'PdfFileWriter' object has no attribute 'stream' #670

Closed
rtibbles opened this issue Apr 7, 2022 · 10 comments · Fixed by #787
Labels
Has MCVE: A minimal, complete and verifiable example helps a lot to debug / understand feature requests
is-bug: From a user's perspective, this is a bug - a violation of the expected behavior with a compliant PDF

Comments

@rtibbles

rtibbles commented Apr 7, 2022

I am very happy to see that PyPDF2 is being maintained and released again. However, the latest release appears to contain a regression. We had an unpinned dependency on PyPDF2 in our library, and when 1.27.0 was released, we started seeing this error:

../../.virtualenvs/ricecooker_clean/lib/python3.9/site-packages/PyPDF2/pdf.py:482: in write
    self._sweepIndirectReferences(externalReferenceMap, self._root)
../../.virtualenvs/ricecooker_clean/lib/python3.9/site-packages/PyPDF2/pdf.py:572: in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
../../.virtualenvs/ricecooker_clean/lib/python3.9/site-packages/PyPDF2/pdf.py:548: in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
../../.virtualenvs/ricecooker_clean/lib/python3.9/site-packages/PyPDF2/pdf.py:572: in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
../../.virtualenvs/ricecooker_clean/lib/python3.9/site-packages/PyPDF2/pdf.py:548: in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
../../.virtualenvs/ricecooker_clean/lib/python3.9/site-packages/PyPDF2/pdf.py:557: in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, data[i])
../../.virtualenvs/ricecooker_clean/lib/python3.9/site-packages/PyPDF2/pdf.py:572: in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
../../.virtualenvs/ricecooker_clean/lib/python3.9/site-packages/PyPDF2/pdf.py:548: in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
../../.virtualenvs/ricecooker_clean/lib/python3.9/site-packages/PyPDF2/pdf.py:589: in _sweepIndirectReferences
    newobj = self._sweepIndirectReferences(externMap, newobj)
../../.virtualenvs/ricecooker_clean/lib/python3.9/site-packages/PyPDF2/pdf.py:548: in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
../../.virtualenvs/ricecooker_clean/lib/python3.9/site-packages/PyPDF2/pdf.py:548: in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
../../.virtualenvs/ricecooker_clean/lib/python3.9/site-packages/PyPDF2/pdf.py:589: in _sweepIndirectReferences
    newobj = self._sweepIndirectReferences(externMap, newobj)
../../.virtualenvs/ricecooker_clean/lib/python3.9/site-packages/PyPDF2/pdf.py:548: in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <PyPDF2.pdf.PdfFileWriter object at 0x7f3f9fe45460>
externMap = {<ricecooker.utils.pdf.CustomPDFReader object at 0x7f3f9fec98b0>: {0: {116: IndirectObject(7, 0), 118: IndirectObject(6, 0), 119: IndirectObject(3, 0), 120: IndirectObject(5, 0)}}}
data = IndirectObject(8, 0)

    def _sweepIndirectReferences(self, externMap, data):
        debug = False
        if debug: print((data, "TYPE", data.__class__.__name__))
        if isinstance(data, DictionaryObject):
            for key, value in list(data.items()):
                origvalue = value
                value = self._sweepIndirectReferences(externMap, value)
                if isinstance(value, StreamObject):
                    # a dictionary value is a stream.  streams must be indirect
                    # objects, so we need to change this value.
                    value = self._addObject(value)
                data[key] = value
            return data
        elif isinstance(data, ArrayObject):
            for i in range(len(data)):
                value = self._sweepIndirectReferences(externMap, data[i])
                if isinstance(value, StreamObject):
                    # an array value is a stream.  streams must be indirect
                    # objects, so we need to change this value
                    value = self._addObject(value)
                data[i] = value
            return data
        elif isinstance(data, IndirectObject):
            # internal indirect references are fine
            if data.pdf == self:
                if data.idnum in self.stack:
                    return data
                else:
                    self.stack.append(data.idnum)
                    realdata = self.getObject(data)
                    self._sweepIndirectReferences(externMap, realdata)
                    return data
            else:
>               if data.pdf.stream.closed:
E               AttributeError: 'PdfFileWriter' object has no attribute 'stream'

../../.virtualenvs/ricecooker_clean/lib/python3.9/site-packages/PyPDF2/pdf.py:575: AttributeError

The error seems to be entirely internal to PyPDF2, so it might be causing issues for others as well.

@MartinThoma
Member

Damn. Do you have a PDF and code to share that reproduce this issue?

@MartinThoma MartinThoma added the is-bug From a users perspective, this is a bug - a violation of the expected behavior with a compliant PDF label Apr 8, 2022
@Daniel-Steinberger

This issue is also discussed here: https://stackoverflow.com/questions/40168027/pypdf2-pdffilewriter-has-no-attribute-stream
I'll try to create a simplified example to reproduce this next week.

@MartinThoma MartinThoma changed the title Regression in latest release AttributeError: 'PdfFileWriter' object has no attribute 'stream' Apr 11, 2022
@Daniel-Steinberger

pypdf2_files.zip

  1. % python3 -m venv venv
  2. % source ./venv/bin/activate
  3. [venv] % pip install pypdf2==1.26.0
  4. [venv] % python test.py

1.26.0

  1. [venv] % pip install pypdf2==1.27.3
  2. [venv] % python test.py

1.27.3
Traceback (most recent call last):
  File "test.py", line 39, in <module>
    main()
  File "test.py", line 35, in main
    split(pdf_data, split_info)
  File "test.py", line 22, in split
    pdf_writer.write(tmp_io)
  File "/home/dst/Desktop/pypdf2/venv/lib/python3.8/site-packages/PyPDF2/pdf.py", line 480, in write
    self._sweepIndirectReferences(externalReferenceMap, self._root)
  File "/home/dst/Desktop/pypdf2/venv/lib/python3.8/site-packages/PyPDF2/pdf.py", line 572, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "/home/dst/Desktop/pypdf2/venv/lib/python3.8/site-packages/PyPDF2/pdf.py", line 548, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/home/dst/Desktop/pypdf2/venv/lib/python3.8/site-packages/PyPDF2/pdf.py", line 572, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "/home/dst/Desktop/pypdf2/venv/lib/python3.8/site-packages/PyPDF2/pdf.py", line 548, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/home/dst/Desktop/pypdf2/venv/lib/python3.8/site-packages/PyPDF2/pdf.py", line 557, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, data[i])
  File "/home/dst/Desktop/pypdf2/venv/lib/python3.8/site-packages/PyPDF2/pdf.py", line 572, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "/home/dst/Desktop/pypdf2/venv/lib/python3.8/site-packages/PyPDF2/pdf.py", line 548, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/home/dst/Desktop/pypdf2/venv/lib/python3.8/site-packages/PyPDF2/pdf.py", line 589, in _sweepIndirectReferences
    newobj = self._sweepIndirectReferences(externMap, newobj)
  File "/home/dst/Desktop/pypdf2/venv/lib/python3.8/site-packages/PyPDF2/pdf.py", line 548, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/home/dst/Desktop/pypdf2/venv/lib/python3.8/site-packages/PyPDF2/pdf.py", line 575, in _sweepIndirectReferences
    if data.pdf.stream.closed:
AttributeError: 'PdfFileWriter' object has no attribute 'stream'

@MartinThoma MartinThoma added the Has MCVE A minimal, complete and verifiable example helps a lot to debug / understand feature requests label Apr 20, 2022
@MartinThoma
Member

This is a simplified version:

#!/usr/bin/env python3

from PyPDF2 import PdfFileReader, PdfFileWriter, __version__ as pypdf_version


print(pypdf_version)

reader = PdfFileReader("crazyones.pdf", strict=False, overwriteWarnings=False)
for _ in range(2):
    pdf_writer = PdfFileWriter()
    page = reader.getPage(0)
    print(page)
    pdf_writer.addPage(page)
    with open("foo.pdf", "wb") as f_pdf:
        pdf_writer.write(f_pdf)

The example reproduces the error with the crazyones.pdf file in Resources:

python foo.py
1.27.2
{'/Resources': IndirectObject(8, 0), '/Type': '/Page', '/Parent': IndirectObject(10, 0), '/Contents': [IndirectObject(7, 0)], '/MediaBox': [0, 0, 612, 792]}
{'/Resources': IndirectObject(5, 0), '/Type': '/Page', '/Parent': IndirectObject(1, 0), '/Contents': [IndirectObject(20, 0)], '/MediaBox': [0, 0, 612, 792]}
Traceback (most recent call last):
  File "foo.py", line 15, in <module>
    pdf_writer.write(f_pdf)
  File "/home/moose/.pyenv/versions/3.6.15/lib/python3.6/site-packages/PyPDF2/pdf.py", line 480, in write
    self._sweepIndirectReferences(externalReferenceMap, self._root)
  File "/home/moose/.pyenv/versions/3.6.15/lib/python3.6/site-packages/PyPDF2/pdf.py", line 572, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "/home/moose/.pyenv/versions/3.6.15/lib/python3.6/site-packages/PyPDF2/pdf.py", line 548, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/home/moose/.pyenv/versions/3.6.15/lib/python3.6/site-packages/PyPDF2/pdf.py", line 572, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "/home/moose/.pyenv/versions/3.6.15/lib/python3.6/site-packages/PyPDF2/pdf.py", line 548, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/home/moose/.pyenv/versions/3.6.15/lib/python3.6/site-packages/PyPDF2/pdf.py", line 557, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, data[i])
  File "/home/moose/.pyenv/versions/3.6.15/lib/python3.6/site-packages/PyPDF2/pdf.py", line 572, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "/home/moose/.pyenv/versions/3.6.15/lib/python3.6/site-packages/PyPDF2/pdf.py", line 548, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/home/moose/.pyenv/versions/3.6.15/lib/python3.6/site-packages/PyPDF2/pdf.py", line 575, in _sweepIndirectReferences
    if data.pdf.stream.closed:
AttributeError: 'PdfFileWriter' object has no attribute 'stream'

It's interesting that reading the same page twice gives different results: the second printout shows different IndirectObject numbers than the first.
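One plausible reading of the two printouts, consistent with the `data[key] = value` assignments in the `_sweepIndirectReferences` listing above: writing a page rewrites the reader's cached page dictionary in place, so on the second pass the page already points at the first writer, which has no `stream`. The sketch below only models that ownership mix-up. `Ref`, `Reader`, `Writer`, `get_page`, and `sweep` are hypothetical stand-ins, not PyPDF2 classes or methods.

```python
# Stand-in classes (assumptions for illustration, NOT PyPDF2's real API).
class Ref:
    """A reference that remembers which object owns it."""

    def __init__(self, owner, idnum):
        self.owner = owner
        self.idnum = idnum


class Reader:
    """Has a `stream` attribute, like a reader backed by a file."""

    def __init__(self):
        self.stream = object()
        # The page dictionary is cached and shared by every get_page() call.
        self._page = {"/Resources": Ref(self, 8)}

    def get_page(self):
        return self._page


class Writer:
    """Has no `stream` attribute, like the writer in the traceback."""

    def __init__(self):
        self._next_id = 1

    def sweep(self, page):
        for key, ref in list(page.items()):
            if ref.owner is self:
                continue  # internal references are fine
            # Mirrors `data.pdf.stream.closed`: assumes the owner is a reader.
            _ = ref.owner.stream
            # Rewrite the shared dictionary in place to point at this writer.
            page[key] = Ref(self, self._next_id)
            self._next_id += 1


reader = Reader()
first, second = Writer(), Writer()

first.sweep(reader.get_page())  # the shared page now references `first`
try:
    second.sweep(reader.get_page())
    error = None
except AttributeError as exc:
    error = str(exc)

print(error)
```

In this model the second `sweep` fails because the foreign owner it finds is the first writer rather than the reader, which matches the `'PdfFileWriter' object has no attribute 'stream'` message in the traceback.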

@MartinThoma
Member

MartinThoma commented Apr 20, 2022

It works with PyPDF2==1.26.0, but breaks with PyPDF2==1.27.0. More specifically:

@MartinThoma
Member

ce5f7ec - mixed tabs / spaces

@MartinThoma
Member

b030b7f - worked

@MartinThoma
Member

Here is what broke it: b030b7f...26e5077

@MartinThoma
Member

Removing this from PdfFileWriter._sweepIndirectReferences would remove the exception you're encountering:

if data.pdf.stream.closed:
    raise ValueError("I/O operation on closed file: {}".format(data.pdf.stream.name))
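Instead of deleting the check outright, it could be guarded so that it only runs when the owning object actually has a stream. The following is a minimal sketch of that idea and is not necessarily the change that was merged in #787. `check_source_open`, `StubReader`, and `StubWriter` are hypothetical names introduced here for illustration.

```python
import io


def check_source_open(source):
    """Raise ValueError if `source` has a stream and that stream is closed.

    Objects without a `stream` attribute (for example a writer) are skipped.
    """
    stream = getattr(source, "stream", None)
    if stream is not None and getattr(stream, "closed", False):
        # In-memory streams such as io.BytesIO have no `name` attribute.
        name = getattr(stream, "name", "<unnamed stream>")
        raise ValueError("I/O operation on closed file: {}".format(name))


class StubReader:
    """Hypothetical stand-in for an object that reads from a stream."""

    def __init__(self, stream):
        self.stream = stream


class StubWriter:
    """Hypothetical stand-in for an object without a `stream` attribute."""


open_stream = io.BytesIO(b"%PDF-1.4")
closed_stream = io.BytesIO(b"%PDF-1.4")
closed_stream.close()

check_source_open(StubWriter())             # no stream: check is skipped
check_source_open(StubReader(open_stream))  # open stream: passes

try:
    check_source_open(StubReader(closed_stream))
    closed_error = None
except ValueError as exc:
    closed_error = str(exc)
```

The `getattr` fallback for `name` is a separate precaution: the original line reads `data.pdf.stream.name` directly, which would itself raise `AttributeError` for a closed in-memory stream.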

@rtibbles
Author

Apologies for not seeing any of the follow-up here. I'm very glad to hear that this is fixed!
