Stream has ended unexpectedly #454

Closed
simonsteiner1984 opened this issue Sep 13, 2018 · 5 comments · Fixed by #1223
Labels
Has MCVE: A minimal, complete and verifiable example helps a lot to debug / understand feature requests
is-bug: From a user's perspective, this is a bug - a violation of the expected behavior with a compliant PDF

Comments

@simonsteiner1984
simonsteiner1984 commented Sep 13, 2018

out.pdf

"PyPDF2/generic.py", line 334, in readStringFromStream
    raise PdfStreamError("Stream has ended unexpectedly")
PyPDF2.utils.PdfStreamError: Stream has ended unexpectedly

with this code:

import StringIO
from PyPDF2 import PdfFileWriter, PdfFileReader
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import A4

input_pdf = '../out.pdf'

ifh = file(input_pdf, 'rb')
inv_input = PdfFileReader(ifh, strict=False)

packet = StringIO.StringIO()

can = canvas.Canvas(packet, pagesize=A4)
can.drawString(10, 10, 'ADDED INFO')
can.save()

packet.seek(0)
added_info = PdfFileReader(packet)

output = PdfFileWriter()

page_count = 1
for x in range(page_count):
    page = inv_input.getPage(x)
    page.mergePage(added_info.getPage(0))
    output.addPage(page)

file_stream = file('../pypdf.pdf', 'wb')
output.write(file_stream)
file_stream.close()
@vdmitriyev

I experience the same error for some of my PDF files:

  File "/.venv/lib/python3.6/site-packages/PyPDF2/utils.py", line 134, in readUntilRegex
    raise PdfStreamError("Stream has ended unexpectedly")
PyPDF2.utils.PdfStreamError: Stream has ended unexpectedly

MartinThoma added the is-bug and Has MCVE labels on Apr 7, 2022
@MartinThoma
Member

I can confirm this for PyPDF2==1.27.7 with the above example:

Traceback (most recent call last):
  File "/home/moose/foo.py", line 22, in <module>
    page.mergePage(added_info.getPage(0))
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/pdf.py", line 2423, in mergePage
    self._mergePage(page2)
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/pdf.py", line 2460, in _mergePage
    newContentArray.append(PageObject._pushPopGS(
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/pdf.py", line 2384, in _pushPopGS
    stream = ContentStream(contents, pdf)
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/pdf.py", line 2871, in __init__
    self.__parseContentStream(stream)
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/pdf.py", line 2903, in __parseContentStream
    operands.append(readObject(stream, None))
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/generic.py", line 84, in readObject
    return ArrayObject.readFromStream(stream, pdf)
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/generic.py", line 181, in readFromStream
    arr.append(readObject(stream, pdf))
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/generic.py", line 90, in readObject
    return readStringFromStream(stream)
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/generic.py", line 349, in readStringFromStream
    raise PdfStreamError(STREAM_TRUNCATED_PREMATURELY)
PyPDF2.errors.PdfStreamError: Stream has ended unexpectedly

@MartinThoma
Member

MartinThoma commented Apr 19, 2022

Simplifying it a bit:

from PyPDF2 import PdfReader, PdfWriter


def prepare_stream():
    from io import BytesIO

    from reportlab.lib.pagesizes import A4
    from reportlab.pdfgen import canvas

    packet = BytesIO()
    can = canvas.Canvas(packet, pagesize=A4)
    can.drawString(10, 10, "ADDED INFO")
    can.save()
    packet.seek(0)
    return packet


reader = PdfReader("out.pdf", strict=False)
writer = PdfWriter()

page = reader.pages[0]

stream = prepare_stream()
added_info = PdfReader(stream)
to_merge = added_info.pages[0]
page.merge_page(to_merge)
writer.add_page(page)

writer.write("pypdf.pdf")

@MartinThoma
Member

I guess the issue is that PyPDF2 tries to read the stream twice: once for getPage and then again for mergePage. If that is the cause, then it is a bug in PyPDF2.
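
To make that hypothesis concrete, here is a minimal sketch of what reading a file-like object twice without rewinding looks like in plain Python. It uses only `io.BytesIO` and made-up content bytes. Whether anything like this happens inside PyPDF2 is not established by this thread, so treat it as an illustration of the guess, not a diagnosis.

```python
from io import BytesIO

# Made-up bytes standing in for a page content stream (illustration only).
buf = BytesIO(b"BT (ADDED INFO) Tj ET")

first = buf.read()   # consumes everything and leaves the cursor at the end
second = buf.read()  # nothing left to read, so this is empty

buf.seek(0)          # rewind before reading again
third = buf.read()   # full content is available again
```

A parser that receives the second, empty read would run out of data immediately, which is the kind of condition that ends in a "Stream has ended unexpectedly" style error.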

@MartinThoma
Member

I can confirm it for PyPDF2==2.4.2:

Traceback (most recent call last):
  File "foo.py", line 26, in <module>
    page.merge_page(to_merge)
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_page.py", line 449, in merge_page
    self._merge_page(page2, expand=expand)
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_page.py", line 514, in _merge_page
    PageObject._push_pop_gs(original_content, self.pdf)
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_page.py", line 384, in _push_pop_gs
    stream = ContentStream(contents, pdf)
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/generic.py", line 1177, in __init__
    self.__parse_content_stream(stream_bytes)
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/generic.py", line 1207, in __parse_content_stream
    operands.append(read_object(stream, None, self.forced_encoding))
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/generic.py", line 1314, in read_object
    return ArrayObject.read_from_stream(stream, pdf, forced_encoding)
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/generic.py", line 217, in read_from_stream
    arr.append(read_object(stream, pdf, forced_encoding))
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/generic.py", line 1318, in read_object
    return read_string_from_stream(stream, forced_encoding)
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/generic.py", line 434, in read_string_from_stream
    raise PdfStreamError(STREAM_TRUNCATED_PREMATURELY)
PyPDF2.errors.PdfStreamError: Stream has ended unexpectedly

pubpub-zz added a commit to pubpub-zz/pypdf that referenced this issue on Aug 11, 2022:
    Fixes py-pdf#454
    Observed in case of \0 - \9 in streams

MartinThoma pushed a commit that referenced this issue on Aug 11, 2022:
    Observed in case of \0 - \9 in streams
    Closes #454
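
The commit message points at `\0 - \9` in streams, and the tracebacks above end in the string reader. One plausible reading is that backslash-digit escapes inside PDF literal strings were involved. The sketch below is not the code from the fix and is not PyPDF2's implementation. It is a small, self-contained reader for a PDF literal string, following the PDF specification's rules that a backslash may be followed by one to three octal digits and that a backslash before an unrecognised character is dropped. The function name `read_literal_string` and the use of `ValueError` are assumptions made for this example.

```python
from io import BytesIO

# Single-character escapes defined for PDF literal strings.
ESCAPES = {
    b"n": b"\n", b"r": b"\r", b"t": b"\t", b"b": b"\b", b"f": b"\f",
    b"(": b"(", b")": b")", b"\\": b"\\",
}
OCTAL_DIGITS = b"01234567"


def read_literal_string(stream):
    """Read a PDF literal string; `stream` is positioned just after '('.

    Hypothetical helper for illustration, not part of PyPDF2.
    """
    out = bytearray()
    depth = 1  # balanced, unescaped parentheses are allowed inside the string
    while True:
        tok = stream.read(1)
        if not tok:
            raise ValueError("Stream has ended unexpectedly")
        if tok == b"(":
            depth += 1
        elif tok == b")":
            depth -= 1
            if depth == 0:
                break
        elif tok == b"\\":
            tok = stream.read(1)
            if not tok:
                raise ValueError("Stream has ended unexpectedly")
            if tok in ESCAPES:
                tok = ESCAPES[tok]
            elif tok in OCTAL_DIGITS:
                digits = tok
                for _ in range(2):  # at most three octal digits in total
                    nxt = stream.read(1)
                    if nxt and nxt in OCTAL_DIGITS:
                        digits += nxt
                    else:
                        if nxt:
                            # Not part of the escape: put it back for the main loop.
                            stream.seek(-1, 1)
                        break
                tok = bytes([int(digits.decode("ascii"), 8) & 0xFF])
            elif tok in b"\r\n":
                # Backslash followed by an end-of-line is a line continuation.
                if tok == b"\r":
                    nxt = stream.read(1)
                    if nxt and nxt != b"\n":
                        stream.seek(-1, 1)
                tok = b""
            # Any other character (for example 8 or 9): the backslash is
            # dropped and the character itself is kept.
        out += tok
    return bytes(out)


plain = read_literal_string(BytesIO(b"ADDED INFO)"))
octal = read_literal_string(BytesIO(b"a\\101b)"))
short = read_literal_string(BytesIO(b"\\0)"))
nine = read_literal_string(BytesIO(b"\\9)"))
```

The detail that matters here is the step back with `stream.seek(-1, 1)` when a short octal escape such as `\0` is followed by a non-digit. Without it, the closing parenthesis can be swallowed as part of the escape, and the reader then runs to the end of the stream looking for a terminator that it has already consumed.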