PyPDF2.utils.PdfReadError: EOF marker not found #480

Closed
umapathireddy opened this issue Jan 17, 2019 · 3 comments
Labels
is-bug From a users perspective, this is a bug - a violation of the expected behavior with a compliant PDF is-robustness-issue From a users perspective, this is about robustness

Comments

@umapathireddy

  • python /jenkinsdata/apihub/quality_scan/apihub_apicontent/workspace/Fortify/Fortify.py **** 11166
    Report Generation Successful
    Report Downloaded for Project Id: 11166 & Project Name:
    Report Download Auth Code and Report Id: {'mat': 'YzBkNTU1M2EtOGFmNy00NTU5LThiYTAtNzlmMDdkNThiODRj', 'id': 22620}
    Report Download Successful
    Traceback (most recent call last):
    File "/jenkinsdata/apihub/quality_scan/apihub_apicontent/workspace/Fortify/Fortify.py", line 114, in <module>
    readReport()
    File "/jenkinsdata/apihub/quality_scan/apihub_apicontent/workspace/Fortify/Fortify.py", line 101, in readReport
    pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
    File "/var/jenkins_home/.local/lib/python2.7/site-packages/PyPDF2/pdf.py", line 1084, in __init__
    self.read(stream)
    File "/var/jenkins_home/.local/lib/python2.7/site-packages/PyPDF2/pdf.py", line 1696, in read
    raise utils.PdfReadError("EOF marker not found")
    PyPDF2.utils.PdfReadError: EOF marker not found
@reportgunner

reportgunner commented Oct 8, 2019

I use PyPDF2 every week to merge a few thousand PDFs, and I run into this problem a lot. It's not because I forgot to open the file in binary mode, and the PDF files are not corrupted: when I open them with various PDF viewers, they work just fine.

After some tinkering today I found a way to troubleshoot this issue file by file (not all files that raise this exception fail for the same reason).

EOF_MARKER = b'%%EOF'
file_name = 'test_EOF_file.pdf'

with open(file_name, 'rb') as f:
    contents = f.read()

# check if EOF is somewhere else in the file
if EOF_MARKER in contents:
    # we can remove the early %%EOF and put it at the end of the file
    contents = contents.replace(EOF_MARKER, b'')
    contents = contents + EOF_MARKER
else:
    # Some files really don't have an EOF marker
    # In this case it helped to manually review the end of the file
    print(contents[-8:]) # see last characters at the end of the file
    # printed b'\n%%EO%E'
    contents = contents[:-6] + EOF_MARKER

with open(file_name.replace('.pdf', '') + '_fixed.pdf', 'wb') as f:
    f.write(contents)

This way I was able to "fix" all of the PDFs I tried today (5 files), and they were successfully read by PdfFileReader without throwing the exception.

I'll try some more tomorrow and post updates if I learn something new.

@markdoliner

This may be a duplicate of #177

I've seen this happen with a PDF that had more than 1024 extra bytes (comments or null bytes or some such) after the last %%EOF. My solution was to find the last %%EOF in the file and truncate everything after it (and if there is no %%EOF at all then append one).

@reportgunner I'm not super familiar with the PDF file format, but it may not be safe to remove or move %%EOF from other places in the file. I think that string may be used multiple times within the file to indicate the end of some sort of "block," not just the end of the entire file.
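The "truncate after the last %%EOF" approach described above might look roughly like the sketch below. The function names `truncate_after_last_eof` and `repair_file` are hypothetical, chosen for illustration; this is not the code the commenter used. It leaves earlier %%EOF markers untouched, so any intermediate markers in the file stay where they are.

```python
from pathlib import Path

EOF_MARKER = b"%%EOF"


def truncate_after_last_eof(data: bytes) -> bytes:
    """Cut everything after the final %%EOF; append a marker if none exists.

    Hypothetical helper for illustration. Earlier %%EOF markers are left
    in place, since only trailing junk after the last one is removed.
    """
    idx = data.rfind(EOF_MARKER)
    if idx == -1:
        # No marker anywhere: append one on its own line. This alone may
        # not make a badly damaged file readable.
        return data + b"\n" + EOF_MARKER + b"\n"
    # Keep everything up to and including the last marker.
    return data[: idx + len(EOF_MARKER)] + b"\n"


def repair_file(src: str, dst: str) -> None:
    """Write a repaired copy of src to dst without modifying src."""
    Path(dst).write_bytes(truncate_after_last_eof(Path(src).read_bytes()))
```

Writing to a separate output file, for example `repair_file('test_EOF_file.pdf', 'test_EOF_file_fixed.pdf')`, keeps the original available in case the repaired copy still fails to parse.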

@MartinThoma MartinThoma added the is-bug From a users perspective, this is a bug - a violation of the expected behavior with a compliant PDF label Apr 9, 2022
@MartinThoma
Member

#442 - let's close this issue and keep track of it in the other one

@MartinThoma MartinThoma added the is-robustness-issue From a users perspective, this is about robustness label Apr 11, 2022
MartinThoma pushed a commit that referenced this issue Apr 21, 2022
Try to find “%%EOF” in last 1Mb of file.

This fixes the issue with reading Selenium-generated PDF files.

Closes #177
Closes #442
Closes #480
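As a rough illustration of the idea in the commit message (not the actual PyPDF2 change), the check below looks for %%EOF within the last 1 MiB of a binary stream. The function name `has_eof_in_tail` and the `tail_size` parameter are assumptions made for this sketch. It can help tell whether a file that fails to load has a marker buried under trailing bytes or has no marker near the end at all.

```python
import io


def has_eof_in_tail(stream, tail_size=1024 * 1024):
    """Return True if b'%%EOF' occurs in the last tail_size bytes.

    Hypothetical diagnostic helper; stream must be a seekable binary
    stream, e.g. a file opened with mode 'rb' or an io.BytesIO.
    """
    stream.seek(0, io.SEEK_END)
    size = stream.tell()
    # Never seek before the start of the stream for small files.
    stream.seek(max(0, size - tail_size))
    return b"%%EOF" in stream.read()
```

With a small `tail_size` such as 1024, a file that carries a couple of kilobytes of padding after its marker would report False, while the default window would still find it.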
VictorCarlquist pushed a commit to VictorCarlquist/PyPDF2 that referenced this issue Apr 29, 2022
Try to find “%%EOF” in last 1Mb of file.

This fixes the issue with reading Selenium-generated PDF files.

Closes py-pdf#177
Closes py-pdf#442
Closes py-pdf#480