PyPDF2.utils.PdfReadError: EOF marker not found #480

umapathireddy · 2019-01-17T06:06:29Z

python /jenkinsdata/apihub/quality_scan/apihub_apicontent/workspace/Fortify/Fortify.py **** 11166
Report Generation Successful
Report Downloaded for Project Id: 11166 & Project Name:
Report Download Auth Code and Report Id: {'mat': 'YzBkNTU1M2EtOGFmNy00NTU5LThiYTAtNzlmMDdkNThiODRj', 'id': 22620}
Report Download Successful
Traceback (most recent call last):
  File "/jenkinsdata/apihub/quality_scan/apihub_apicontent/workspace/Fortify/Fortify.py", line 114, in <module>
    readReport()
  File "/jenkinsdata/apihub/quality_scan/apihub_apicontent/workspace/Fortify/Fortify.py", line 101, in readReport
    pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
  File "/var/jenkins_home/.local/lib/python2.7/site-packages/PyPDF2/pdf.py", line 1084, in __init__
    self.read(stream)
  File "/var/jenkins_home/.local/lib/python2.7/site-packages/PyPDF2/pdf.py", line 1696, in read
    raise utils.PdfReadError("EOF marker not found")
PyPDF2.utils.PdfReadError: EOF marker not found

reportgunner · 2019-10-08T15:58:53Z

I'm using PyPDF2 every week to merge a few thousand PDFs, and I run into this problem a lot. It's not because I forgot to open the file in binary mode, and the PDF files are not corrupted: when I open them in various PDF viewers, they work just fine.

After some tinkering today I found a way to troubleshoot this issue file by file (not all files that raise this exception have the same problem).

EOF_MARKER = b'%%EOF'
file_name = 'test_EOF_file.pdf'

with open(file_name, 'rb') as f:
    contents = f.read()

# check if EOF is somewhere else in the file
if EOF_MARKER in contents:
    # we can remove the early %%EOF and put it at the end of the file
    contents = contents.replace(EOF_MARKER, b'')
    contents = contents + EOF_MARKER
else:
    # Some files really don't have an EOF marker
    # In this case it helped to manually review the end of the file
    print(contents[-8:]) # see last characters at the end of the file
    # printed b'\n%%EO%E'
    contents = contents[:-6] + EOF_MARKER

with open(file_name.replace('.pdf', '') + '_fixed.pdf', 'wb') as f:
    f.write(contents)

This way I was able to "fix" all of the PDFs I tried today (5 files), and they were successfully read by PdfFileReader without throwing the exception.

I'll try some more tomorrow and post updates if I learn something new.

markdoliner · 2020-04-03T18:02:34Z

This may be a duplicate of #177

I've seen this happen with a PDF that had more than 1024 extra bytes (comments or null bytes or some such) after the last %%EOF. My solution was to find the last %%EOF in the file and truncate everything after it (and if there is no %%EOF at all then append one).
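The truncation approach described above could look roughly like the sketch below. This is an illustration only, not the code that was actually used: the function names `truncate_after_last_eof` and `repair_file` are made up for this example, and the choice to end the output with a single newline after the marker is an assumption. Earlier `%%EOF` markers are left where they are, which matters because incrementally updated PDFs can contain more than one.

```python
EOF_MARKER = b"%%EOF"


def truncate_after_last_eof(data: bytes) -> bytes:
    """Cut the data off right after the last %%EOF marker.

    If there is no marker at all, append one instead.
    (Hypothetical helper, not part of PyPDF2.)
    """
    last = data.rfind(EOF_MARKER)
    if last == -1:
        # No marker anywhere in the file: append one on its own line.
        return data + b"\n" + EOF_MARKER + b"\n"
    # Keep everything up to and including the last marker, drop the rest.
    return data[: last + len(EOF_MARKER)] + b"\n"


def repair_file(src_path: str, dst_path: str) -> None:
    """Write a repaired copy of src_path to dst_path (hypothetical helper)."""
    with open(src_path, "rb") as f:
        data = f.read()
    with open(dst_path, "wb") as f:
        f.write(truncate_after_last_eof(data))
```

Writing to a separate output path keeps the original file untouched, so the result can be checked in a PDF viewer before anything is overwritten.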

@reportgunner I'm not super familiar with the PDF file format, but it may not be safe to remove or move %%EOF from other places in the file. I think that string may be used multiple times within the file to indicate the end of some sort of "block," not just the end of the entire file.

MartinThoma · 2022-04-11T05:39:01Z

#442 - let's close this issue and keep track of it in the other one

Try to find “%%EOF” in the last 1 MB of the file. This fixes the issue with reading Selenium-generated PDF files. Closes #177 Closes #442 Closes #480
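To see whether a given file falls into this category, one can check how far from the end of the file the marker sits. The sketch below is a diagnostic only and is not PyPDF2's implementation: `eof_marker_in_tail` is a hypothetical helper, and the 1024-byte window in the usage lines is taken from the "more than 1024 extra bytes" observation earlier in this thread.

```python
import io
import os

EOF_MARKER = b"%%EOF"


def eof_marker_in_tail(stream, window=1024 * 1024):
    """Return True if %%EOF occurs in the last `window` bytes of a binary stream.

    (Hypothetical helper for diagnosis; the default window is 1 MB.)
    """
    stream.seek(0, os.SEEK_END)
    size = stream.tell()
    # Jump to the start of the tail window, or to the start of the file
    # if the file is smaller than the window.
    stream.seek(max(0, size - window))
    return EOF_MARKER in stream.read()


# Usage: a fake PDF whose marker is followed by 2000 null bytes.
padded = io.BytesIO(b"%PDF-1.4\n...\n%%EOF\n" + b"\x00" * 2000)
found_1k = eof_marker_in_tail(padded, window=1024)  # marker lies outside the last 1024 bytes
found_1m = eof_marker_in_tail(padded)               # marker lies inside the last 1 MB
```

For a file on disk, pass a handle opened with `open(path, "rb")`.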


markdoliner mentioned this issue Apr 3, 2020

Make selenium-generated PDF readable #321

Merged

MartinThoma added the "is-bug" label (From a user's perspective, this is a bug: a violation of the expected behavior with a compliant PDF) on Apr 9, 2022

MartinThoma closed this as completed Apr 11, 2022

MartinThoma added the "is-robustness-issue" label (From a user's perspective, this is about robustness) on Apr 11, 2022

MartinThoma pushed a commit that referenced this issue Apr 21, 2022

BUG: Use 1MB as offset for readNextEndLine (#321)

db1e458


alejmedinajr mentioned this issue Mar 10, 2024

Machine Learning Model alejmedinajr/AI-Detector#49

Closed
