PdfReadError: EOF marker not found error when opening pdf files generated from selenium snapshot #177

lovesh · 2015-02-08T14:09:21Z

I am using Selenium and Ghost to capture screenshots as PDFs. The code for saving a screenshot is:

driver.get('http://localhost/report/10?page=1')
driver.save_screenshot('page1.pdf')

I can open these files in a PDF viewer (I am using Okular) and they look fine. But when I try to open them using this code:

from PyPDF2 import PdfFileReader
input1 = PdfFileReader(open("page1.pdf", "rb"))

It raises PdfReadError: EOF marker not found. I am trying to open these files with PdfFileReader because I need to merge several PDFs into one, and for that I need to open them. I found GitHub issue #34, which says this was resolved, but I still face the issue. My PyPDF2 version is 1.24.
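Before digging into the reader, it may be worth confirming that page1.pdf really is a PDF. As far as I know, Selenium's Python save_screenshot() writes PNG data regardless of the file extension (its documentation describes the output as a PNG image), and a multi-format viewer such as Okular might display a PNG named .pdf without complaint. The sketch below checks the file's magic bytes; sniff_file_type is a hypothetical helper, and it only inspects the first few bytes, not the rest of the file.

```python
# Magic numbers found at the very start of each format.
PDF_MAGIC = b"%PDF-"
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


def sniff_file_type(path: str) -> str:
    """Return 'pdf', 'png' or 'unknown' based on the first bytes of `path`.

    Hypothetical diagnostic helper: it looks only at the header, so a
    'pdf' result does not mean the file is otherwise well formed.
    """
    with open(path, "rb") as fh:
        head = fh.read(len(PNG_MAGIC))
    if head.startswith(PDF_MAGIC):
        return "pdf"
    if head.startswith(PNG_MAGIC):
        return "png"
    return "unknown"
```

If sniff_file_type("page1.pdf") reports png, the EOF error is expected, and the screenshot would need to be produced as a real PDF by some other means.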

abixadamj · 2015-07-31T13:17:41Z

I want to add that if I read a PDF with PdfFileReader, write it with PdfFileWriter, and then read it with PdfFileReader once again, I get:

input_file = PdfFileReader(open("/tmp/zakodowany.pdf", "rb"))

PdfReadError                              Traceback (most recent call last)
/home/adasiek/<ipython-input-12-cb4f4869d7a1> in <module>()
----> 1 input_file = PdfFileReader(open("/tmp/zakodowany.pdf", "rb"))

/usr/local/lib/python2.7/dist-packages/PyPDF2/pdf.pyc in __init__(self, stream, strict, warndest, overwriteWarnings)
   1063             stream = BytesIO(b_(fileobj.read()))
   1064             fileobj.close()
-> 1065         self.read(stream)
   1066         self.stream = stream
   1067 

/usr/local/lib/python2.7/dist-packages/PyPDF2/pdf.pyc in read(self, stream)
   1665         while line[:5] != b_("%%EOF"):
   1666             if stream.tell() < last1K:
-> 1667                 raise utils.PdfReadError("EOF marker not found")
   1668             line = self.readNextEndLine(stream)
   1669             if debug: print("  line:",line)

PdfReadError: EOF marker not found
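The traceback shows the check that fails: the reader walks backward from the end of the stream one line at a time, looking for a line that starts with %%EOF, and gives up once it has moved further back than last1K. Judging by the name, that is the last 1 KB of the file, so a file with more than about 1 KB of trailing data after %%EOF triggers the error. The pure-Python sketch below mimics that loop over an in-memory byte string; it is a simplified model, not PyPDF2's actual code, and find_eof_line and its window parameter are hypothetical names.

```python
def find_eof_line(data: bytes, window: int = 1024) -> int:
    """Return the offset of the last line that starts with b"%%EOF".

    Lines are scanned from the end of `data` towards the start. The scan
    gives up (ValueError) after checking a line that starts more than
    `window` bytes before the end without finding the marker.
    """
    limit = len(data) - window
    end = len(data)
    while end > 0:
        # The current line starts just after the previous \n or \r.
        prev_lf = data.rfind(b"\n", 0, end - 1)
        prev_cr = data.rfind(b"\r", 0, end - 1)
        start = max(prev_lf, prev_cr) + 1
        if data[start:end].startswith(b"%%EOF"):
            return start
        if start < limit:
            break
        end = start
    raise ValueError("EOF marker not found")


if __name__ == "__main__":
    tail = b"trailer\n<< /Size 1 >>\nstartxref\n0\n%%EOF\n"
    padded = tail + bytes(20 * 1024)  # 20 KiB of zero bytes after %%EOF
    print(find_eof_line(tail))  # 34
    print(find_eof_line(padded, window=1024 * 1024))  # 34
    try:
        find_eof_line(padded)  # default 1 KB window is too small
    except ValueError as exc:
        print(exc)  # EOF marker not found
```

The padded case shows why widening the search window is enough to make such files readable: the marker is still there, just further from the end than the reader looks.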

rafaelcanovas · 2016-01-31T21:32:55Z

Hi there,

Was this issue ever resolved?

I'm facing the exact same problem right now.

Thank you :)

vivekpd15 · 2017-01-20T10:27:54Z

+1

fractos · 2017-04-04T09:33:36Z

I just hit this problem too. Could this please be fixed in the pip-installable release soon?

beruic · 2018-02-14T14:27:25Z

Again, this issue would be REALLY nice to have fixed in a pip release.

I'm getting a bunch of auto-generated PDFs from customers where the %%EOF is not within the last 1 KB, so the fix in PR #321 should be applied.
It is not elegant, but neither is the current code, and it would get us moving again.
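Until a release includes that fix, one possible workaround is to cut off whatever follows the last %%EOF before handing the data to PyPDF2. This is a minimal sketch under stated assumptions, not the code from PR #321: trim_after_eof and open_trimmed are hypothetical names, and the 1 MiB search window simply follows the "last 1Mb" idea discussed in this thread. Because offsets in a PDF's cross-reference table are measured from the start of the file, dropping the tail should not invalidate them.

```python
from io import BytesIO

ONE_MIB = 1024 * 1024


def trim_after_eof(data: bytes, window: int = ONE_MIB) -> bytes:
    """Drop everything after the last %%EOF found in the final `window` bytes.

    Raises ValueError if no marker occurs in that window.
    """
    start = max(len(data) - window, 0)
    pos = data.rfind(b"%%EOF", start)
    if pos == -1:
        raise ValueError(f"no %%EOF marker in the last {window} bytes")
    # Keep the marker itself plus one newline so the file ends cleanly.
    return data[: pos + len(b"%%EOF")] + b"\n"


def open_trimmed(path: str, window: int = ONE_MIB) -> BytesIO:
    """Read `path`, trim trailing data after %%EOF, return an in-memory stream."""
    with open(path, "rb") as fh:
        return BytesIO(trim_after_eof(fh.read(), window))
```

Since PdfFileReader accepts a binary file-like object (as in the original report), usage would be PdfFileReader(open_trimmed("page1.pdf")). One caveat: if the trailing junk itself happens to contain the bytes %%EOF, this cuts at the wrong place.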

kut · 2018-12-05T19:18:56Z

Same issue here; it would be fixed by PR #321.

myleshk · 2019-07-23T07:20:52Z

Same issue, please fix.

joseprieto · 2020-02-20T17:58:20Z

I face the same issue! Has anyone found a way to solve it?

myleshk · 2020-02-26T06:11:33Z

I just use pikepdf to preprocess the PDF file.

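from os import path, rename

from pikepdf import Pdf


def fix_file(filename, input_base_dir):
    """Rewrite a PDF with pikepdf so that PyPDF2 can read it.

    The original file is kept next to it as <name>.pdf.old.
    """
    file_basename = filename[:-4]  # strip the ".pdf" extension
    original_input_file_path = path.join(input_base_dir, filename)
    tmp_output_file_path = path.join(
        input_base_dir, file_basename + ".pdf.tmp"
    )
    final_input_file_path = path.join(
        input_base_dir, file_basename + ".pdf.old"
    )

    # Copy every page into a fresh document; saving writes a new file that
    # ends with a normal %%EOF trailer. Save before the source is closed,
    # because the copied pages still reference it.
    with Pdf.open(original_input_file_path) as pdf:
        new_pdf = Pdf.new()
        for page_obj in pdf.pages:
            new_pdf.pages.append(page_obj)
        new_pdf.save(tmp_output_file_path)

    # The source is closed now, so renaming it does not touch an open file.
    rename(original_input_file_path, final_input_file_path)
    rename(tmp_output_file_path, original_input_file_path)
    print(f"Fixed {filename}")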

guillaume-uH57J9 · 2020-12-04T20:15:03Z

Hi,
Same issue here, with a PDF where %%EOF is not within the last 1 KB; there are actually about 9.5 KB of data after %%EOF.
The document is an invoice provided by a 3rd party, according to metadata it was generated by "dompdf 0.8.6 + CPDF".

It looks like a fix was submitted and the PR has been pending for years. Is this project no longer maintained?

MartinThoma · 2022-04-07T15:39:15Z

Can somebody share a PDF that has this issue?

guillaume-uH57J9 · 2022-04-14T19:16:20Z

Certainly, @MartinThoma, you can download it from here:
https://drop.infini.fr/r/_wVhZCtDBy#npqf9V4POgFy1bzo1FGs3zDdXF/c0IZW9Fti7R0jvEo=

GitHub throws the error "Something went really wrong, and we can't process that file." when processing the file, so I had to upload it somewhere else.

Here's how I created the file, in case you want to recreate it locally:

Type "Hello world" in a text editor
Print to PDF
Append some bytes to the end of the file using the command dd if=/dev/zero bs=1024 count=20 >> helloworld.pdf

Obviously this will produce a dumb PDF, but it should be sufficient to reproduce the error.
I've encountered PDFs in the wild (invoices, etc.) that trigger the same error, but I will not share those because they contain personal information.
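The dd step above can also be done from Python, which may be convenient in a regression test. A minimal sketch: pad_pdf is a hypothetical helper that copies the input (for example helloworld.pdf from the steps above) and appends 20 * 1024 zero bytes, matching bs=1024 count=20. Unlike the >> redirect, it writes to a separate file so the original stays untouched.

```python
import shutil


def pad_pdf(src: str, dst: str, kib: int = 20) -> None:
    """Copy `src` to `dst`, then append `kib` KiB of zero bytes to `dst`.

    Mirrors `dd if=/dev/zero bs=1024 count=20 >> helloworld.pdf` for the
    default kib=20, but leaves `src` unchanged.
    """
    shutil.copyfile(src, dst)
    with open(dst, "ab") as fh:  # append mode, like the >> redirect
        fh.write(bytes(1024 * kib))
```

The padded copy places %%EOF about 20 KB before the end of the file, which is well outside a 1 KB search window.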

guillaume-uH57J9 · 2022-04-22T20:34:28Z

Thanks for the merge!

Try to find “%%EOF” in last 1Mb of file. This fixes the issue with reading Selenium-generated PDF files. Closes py-pdf#177 Closes py-pdf#442 Closes py-pdf#480

akolpakov added a commit to akolpakov/PyPDF2 that referenced this issue Feb 6, 2017

Fix py-pdf#177

9c36f77

Try to find “%%EOF” in last 1Mb of file.

akolpakov added a commit to akolpakov/PyPDF2 that referenced this issue Feb 6, 2017

Merge pull request #1 from akolpakov/issue_177

4247ba1

Fix py-pdf#177

reginafcompton mentioned this issue Mar 10, 2017

PDF consolidation Metro-Records/la-metro-councilmatic#109

Closed

vstoykov added a commit to IndustriaTech/PyPDF2 that referenced this issue Jul 21, 2017

Merge remote-tracking branch 'akolpakov/issue_177'

e55f261

* akolpakov/issue_177: Fix py-pdf#177 Try to find “%%EOF” in last 1Mb of file.

O2Graphics mentioned this issue Dec 9, 2019

Fix page count on some PDF files, and fix a Python 3 incompatibility mayan-edms/Mayan-EDMS#8

Closed

markdoliner-doma mentioned this issue Apr 3, 2020

PyPDF2.utils.PdfReadError: EOF marker not found #442

Closed

This was referenced Apr 3, 2020

PyPDF2.utils.PdfReadError: EOF marker not found #480

Closed

Make selenium-generated PDF readable #321

Merged

MartinThoma added the is-bug and PdfReader labels Apr 7, 2022


MartinThoma added the Has MCVE label Apr 16, 2022

MartinThoma closed this as completed in db1e458 Apr 21, 2022

MartinThoma added a commit that referenced this issue Apr 21, 2022

TST: Regression test for #177

ba7b461

MartinThoma added a commit that referenced this issue Apr 21, 2022

TST: Regression test for #177 (#790)

80c59c9

VictorCarlquist pushed a commit to VictorCarlquist/PyPDF2 that referenced this issue Apr 29, 2022

TST: Regression test for py-pdf#177 (py-pdf#790)

2213b07
