PyPDF2.utils.PdfReadError: EOF marker not found #480
I use PyPDF2 every week to merge a few thousand PDFs, and I run into this problem a lot. It's not because I forgot to open the file in binary mode, and the PDF files are not corrupted: various PDF viewers open them just fine. After some tinkering today I found a way to troubleshoot the issue file by file (not all files that raise this exception are broken in the same way).

    EOF_MARKER = b'%%EOF'
    file_name = 'test_EOF_file.pdf'

    with open(file_name, 'rb') as f:
        contents = f.read()

    # check if EOF is somewhere else in the file
    if EOF_MARKER in contents:
        # we can remove the early %%EOF and put it at the end of the file
        contents = contents.replace(EOF_MARKER, b'')
        contents = contents + EOF_MARKER
    else:
        # Some files really don't have an EOF marker
        # In this case it helped to manually review the end of the file
        print(contents[-8:])  # see last characters at the end of the file
        # printed b'\n%%EO%E'
        # replace the 6 garbled trailing bytes with a proper marker
        contents = contents[:-6] + EOF_MARKER

    with open(file_name.replace('.pdf', '') + '_fixed.pdf', 'wb') as f:
        f.write(contents)

This way I was able to "fix" all of the PDFs I tried today (5 files), and they were successfully read afterwards. I'll try some more tomorrow and post updates if I learn something new.
This may be a duplicate of #177.

I've seen this happen with a PDF that had more than 1024 extra bytes (comments, null bytes, or some such) after the last %%EOF. My solution was to find the last %%EOF in the file and truncate everything after it (and, if there is no %%EOF at all, to append one).

@reportgunner I'm not super familiar with the PDF file format, but it may not be safe to remove or move
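
The truncate-after-the-last-marker approach described in this comment could look roughly like the sketch below. It is only an illustration, not code from PyPDF2: the function name `truncate_after_last_eof` is made up here, and appending a marker to a file that has none does not by itself repair a missing or damaged cross-reference section.

```python
EOF_MARKER = b"%%EOF"


def truncate_after_last_eof(data: bytes) -> bytes:
    """Cut everything after the last %%EOF; append a marker if none exists.

    Hypothetical helper for illustration only.
    """
    pos = data.rfind(EOF_MARKER)
    if pos == -1:
        # No marker anywhere in the file: append one on its own line.
        return data + b"\n" + EOF_MARKER + b"\n"
    # Keep the bytes up to and including the last marker, drop the rest.
    return data[: pos + len(EOF_MARKER)] + b"\n"
```

Unlike removing every `%%EOF` and re-adding one at the end, this leaves earlier markers in place, which matters for PDFs that were saved incrementally and legitimately contain more than one.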
#442 - let's close this issue and keep track of it in the other one.
Try to find “%%EOF” in the last 1 MB of the file.

This fixes the issue with reading Selenium-generated PDF files.

Closes py-pdf#177
Closes py-pdf#442
Closes py-pdf#480
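
As a rough sketch of the idea in this commit message (searching a larger tail of the file instead of only its last few bytes), something like the following could be used. It is not the actual patch: `find_eof_in_tail` and `TAIL_SIZE` are hypothetical names, and the 1 MB window is taken from the commit message.

```python
import io
import os

EOF_MARKER = b"%%EOF"
TAIL_SIZE = 1024 * 1024  # 1 MB window, as named in the commit message


def find_eof_in_tail(stream, tail_size=TAIL_SIZE):
    """Return the absolute offset of the last %%EOF found in the final
    tail_size bytes of a seekable binary stream, or -1 if there is none.

    Hypothetical helper for illustration only.
    """
    stream.seek(0, os.SEEK_END)
    file_size = stream.tell()
    start = max(0, file_size - tail_size)
    stream.seek(start)
    tail = stream.read(file_size - start)
    pos = tail.rfind(EOF_MARKER)
    return -1 if pos == -1 else start + pos


# Usage: a marker followed by 2000 null bytes of trailing junk.
demo = io.BytesIO(b"%PDF-1.4\n%%EOF\n" + b"\x00" * 2000)
offset = find_eof_in_tail(demo)
```

Reading only the tail keeps memory use bounded for large files while still tolerating a generous amount of trailing junk.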
Report Generation Successful
Report Downloaded for Project Id: 11166 & Project Name:
Report Download Auth Code and Report Id: {'mat': 'YzBkNTU1M2EtOGFmNy00NTU5LThiYTAtNzlmMDdkNThiODRj', 'id': 22620}
Report Download Successful
Traceback (most recent call last):
  File "/jenkinsdata/apihub/quality_scan/apihub_apicontent/workspace/Fortify/Fortify.py", line 114, in <module>
    readReport()
  File "/jenkinsdata/apihub/quality_scan/apihub_apicontent/workspace/Fortify/Fortify.py", line 101, in readReport
    pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
  File "/var/jenkins_home/.local/lib/python2.7/site-packages/PyPDF2/pdf.py", line 1084, in __init__
    self.read(stream)
  File "/var/jenkins_home/.local/lib/python2.7/site-packages/PyPDF2/pdf.py", line 1696, in read
    raise utils.PdfReadError("EOF marker not found")
PyPDF2.utils.PdfReadError: EOF marker not found