Rarely crash on some PDF #18

ghaddarAbs · 2018-12-28T13:37:56Z

Hi,

Great library. I just want to report some rare crashes (>20/700k PDFs), not a big deal. I don't know whether it's a bug or whether exceptions are expected for extremely damaged files.

    pdf = pikepdf.open(input_file)
  File "C:\Users.......\lib\site-packages\pikepdf\__init__.py", line 41, in open
    return Pdf.open(*args, **kwargs)
pikepdf._qpdf.PdfError: C:/.......\my_pdf.pdf: unable to find trailer dictionary while recovering damaged file

If it helps, I can send the PDFs by email. We are talking about corrupted PDFs that were generated before 2000 :)


jbarlow83 · 2018-12-29T06:19:00Z

The error indicates that:

- the file was damaged
- libqpdf tried to recover the damaged file, but gave up after exhausting its recovery tools

In my experience this usually happens when a file is truncated. Sometimes you can do manual forensic recovery and extract some content, but it all depends on how the original was structured.

You should get an exception and that's expected behavior. If you got a crash, meaning the Python interpreter aborted with a segfault or some other error, I'd like to look at the files.
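
As a rough illustration of the truncation case described above, the sketch below checks whether a file still has the `%%EOF` marker that normally closes a PDF. The helper name `looks_truncated` and the 1024-byte tail window are assumptions made for this example, not part of pikepdf or libqpdf. It is only a heuristic: a file that passes can still be damaged in other ways.

```python
import os


def looks_truncated(path, tail_size=1024):
    """Heuristic: return True if no %%EOF marker is found near the end of the file.

    tail_size is an assumed window; a PDF normally ends with a trailer,
    a startxref entry and a final %%EOF line.
    """
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        # Read only the last tail_size bytes (or the whole file if it is smaller).
        f.seek(max(0, size - tail_size))
        tail = f.read()
    return b"%%EOF" not in tail
```

A file flagged this way is a likely candidate for the "unable to find trailer dictionary" error, since the trailer sits at the end of the file.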

ghaddarAbs · 2018-12-29T12:15:20Z

Yes, this is why I closed the issue: the documents were extremely damaged. In the end I used a try/except block to skip those documents.

The attachments contain 4 samples.
documents.zip
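
A minimal sketch of that skip-on-error approach for a large batch, assuming `pikepdf.PdfError` is the exception to catch (it is the one shown in the traceback above). The helper name `partition_pdfs` and its `opener` / `error_types` parameters are hypothetical, added only so the loop can be tried with a stub instead of real files.

```python
def partition_pdfs(paths, opener=None, error_types=None):
    """Split paths into (readable, failed), skipping files that cannot be opened.

    By default this uses pikepdf.open and catches pikepdf.PdfError.
    opener and error_types are hypothetical hooks for substituting a stub.
    """
    if opener is None:
        import pikepdf  # imported lazily so a stub opener works without pikepdf

        opener = pikepdf.open
        if error_types is None:
            error_types = (pikepdf.PdfError,)
    elif error_types is None:
        error_types = (Exception,)

    readable, failed = [], []
    for path in paths:
        try:
            # Opening is enough to trigger libqpdf's parsing and recovery attempt.
            with opener(path):
                pass
        except error_types as exc:
            # Keep the message so damaged files can be reviewed later.
            failed.append((path, str(exc)))
        else:
            readable.append(path)
    return readable, failed
```

Calling `partition_pdfs(list_of_paths)` with the defaults returns the files that opened cleanly plus a list of `(path, message)` pairs for the ones that raised, which makes it easy to log the few damaged files without stopping the whole run.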

jbarlow83 · 2018-12-31T23:24:34Z

All 4 of these files appear to be truncated. At a glance the first few pages of text/images might be recoverable from the first two, but that's definitely in the realm of forensic data recovery, not what we're trying to do here.

Thanks for your submission.

ghaddarAbs closed this as completed Dec 28, 2018

ghaddarAbs reopened this Dec 29, 2018

jbarlow83 closed this as completed Dec 31, 2018
