PyPDF2.errors.PdfReadError: EOF marker not found #134

Closed
alisufian opened this issue Sep 1, 2014 · 4 comments
Labels
Has MCVE: A minimal, complete and verifiable example helps a lot to debug / understand feature requests
is-robustness-issue: From a user's perspective, this is about robustness
PdfReader: The PdfReader component is affected

Comments


alisufian commented Sep 1, 2014

The following script originally hung, but with PyPDF2==2.4.2 it raises PdfReadError: EOF marker not found.

MCVE: PDF + Code

This file is 298MB with 21 pages.

from PyPDF2 import PdfReader

reader = PdfReader("01-006 2009-04-30 FRP NON CONFIDENTIAL PAP FILING.PDF")

Traceback

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_reader.py", line 267, in __init__
    self.read(stream)
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_reader.py", line 1218, in read
    raise PdfReadError("EOF marker not found")
PyPDF2.errors.PdfReadError: EOF marker not found
alisufian changed the title from "getNumPages() hangs on this file." to "PdfFileReader hangs on this file." on Sep 3, 2014
MartinThoma added the PdfReader and is-bug (from a user's perspective, this is a bug: a violation of the expected behavior with a compliant PDF) labels on Apr 8, 2022
MartinThoma added the Has MCVE label on Apr 18, 2022
MartinThoma (Member) commented Apr 19, 2022

PyPDF2==1.27.7 gives:

Traceback (most recent call last):
  File "/home/moose/foo.py", line 3, in <module>
    reader = PdfFileReader("01-006 2009-04-30 FRP NON CONFIDENTIAL PAP FILING.PDF")
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/pdf.py", line 1208, in __init__
    self.read(stream)
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/pdf.py", line 1828, in read
    line = self.readNextEndLine(stream, last1K)
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/pdf.py", line 2078, in readNextEndLine
    raise PdfReadError("Could not read malformed PDF file")
PyPDF2.errors.PdfReadError: Could not read malformed PDF file

The same error occurs with strict=False.

This is the part of the code that raises it:

            # Prevent infinite loops in malformed PDFs
            if stream.tell() == 0 or stream.tell() == limit_offset:
                raise PdfReadError("Could not read malformed PDF file")
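
To illustrate what this guard protects against, here is a minimal sketch of a backward line reader with the same kind of check. It is not PyPDF2's actual readNextEndLine; the names read_previous_line and MalformedPdfError are hypothetical stand-ins, and the only thing taken from the snippet above is the idea of raising once the read position reaches offset 0 or limit_offset.

```python
import io


class MalformedPdfError(Exception):
    """Hypothetical stand-in for PyPDF2.errors.PdfReadError."""


def read_previous_line(stream, limit_offset=0):
    """Read the line that ends before the current position, scanning backwards.

    Trailing end-of-line bytes are skipped first. The stream is left
    positioned just after the end-of-line byte that precedes the line.
    """
    line = b""
    while True:
        pos = stream.tell()
        # Guard: without this check, a reader that keeps seeking backwards
        # on a file with no usable line breaks could loop at offset 0 forever.
        if pos == 0 or pos == limit_offset:
            raise MalformedPdfError("Could not read malformed PDF file")
        stream.seek(pos - 1)
        byte = stream.read(1)
        stream.seek(pos - 1)  # step back over the byte we just read
        if byte in (b"\r", b"\n"):
            if line:
                stream.seek(pos)  # stop just after this end-of-line byte
                return line
            continue  # still skipping trailing end-of-line bytes
        line = byte + line


# Usage: walk backwards from the end of a tiny, made-up PDF tail.
stream = io.BytesIO(b"%PDF-1.4\nstartxref\n123\n%%EOF\n")
stream.seek(0, io.SEEK_END)
print(read_previous_line(stream))  # b'%%EOF'
print(read_previous_line(stream))  # b'123'
```

As in the snippet above, this sketch also raises when a line happens to start at offset 0, which is acceptable here because a valid PDF trailer never sits at the very start of the file.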

MartinThoma added the is-robustness-issue label and removed the is-bug label on Apr 19, 2022
MartinThoma changed the title from "PdfFileReader hangs on this file." to "PyPDF2.errors.PdfReadError: EOF marker not found" on Jul 9, 2022
pubpub-zz added a commit to pubpub-zz/pypdf that referenced this issue Jul 24, 2022
a) cmap : strip lines when processing cmap from fonts
b) look for %EOF up to beginning of file
pubpub-zz (Collaborator) commented Jul 24, 2022

The PDF is very odd: it contains about 100 MB of null characters.
After removing the threshold that limits how far back the reader looks for the end marker, the file can be opened. Some lines in the cmap also need to be stripped.
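
For readers who want to check whether one of their own files has the same problem, the following is a rough diagnostic sketch using only the standard library. It is not the change made in PyPDF2; find_last_eof_marker and its chunk_size parameter are hypothetical names. It scans backwards from the end of the file, all the way to the beginning if necessary, and reports where the last %%EOF marker sits.

```python
import io

EOF_MARKER = b"%%EOF"


def find_last_eof_marker(stream, chunk_size=1024 * 1024):
    """Return the byte offset of the last %%EOF marker, or -1 if there is none.

    The file is scanned backwards in chunks so that a large file with a
    long run of padding at the end does not have to be loaded at once.
    """
    stream.seek(0, io.SEEK_END)
    end = stream.tell()
    overlap = len(EOF_MARKER) - 1
    while end > 0:
        start = max(0, end - chunk_size)
        stream.seek(start)
        # Read a few bytes past `end` so a marker that straddles two
        # chunks is still found.
        chunk = stream.read(end - start + overlap)
        index = chunk.rfind(EOF_MARKER)
        if index != -1:
            return start + index
        end = start
    return -1


# Usage with made-up data: a marker followed by 5000 null bytes of padding.
data = b"%PDF-1.4\nstartxref\n9\n%%EOF\n" + b"\x00" * 5000
offset = find_last_eof_marker(io.BytesIO(data), chunk_size=512)
padding = len(data) - (offset + len(EOF_MARKER))
print(f"%%EOF at offset {offset}, {padding} bytes after the marker")
```

If the reported padding is far larger than about 1 KB, a reader that only inspects the tail of the file would not see the marker, which matches the behavior described above.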

MartinThoma pushed a commit that referenced this issue Jul 25, 2022
See #134

a) cmap : strip lines when processing cmap from fonts
b) look for %EOF up to beginning of file
MartinThoma added a commit that referenced this issue Jul 25, 2022
Bug Fixes (BUG):
-  u_hash in AlgV4.compute_key (#1170)

Robustness (ROB):
-  Fix loading of file from #134 (#1167)
-  Cope with empty DecodeParams (#1165)

Documentation (DOC):
-  Typo in warning message (#1166)

Maintenance (MAINT):
-  Package updates; solve mypy strict remarks (#1163)

Testing (TST):
-  Add test from #325 (#1169)

Full Changelog: 2.8.0...2.8.1
pubpub-zz (Collaborator) commented

@MartinThoma,
The fix for this issue has been released, hasn't it?

MartinThoma (Member) commented

Thank you for the reminder :-)

3 participants