Rarely crash on some PDF #18
Comments
The error indicates that:
In my experience this usually happens when a file is truncated. Sometimes you can do manual forensic recovery and extract some content, but it all depends on how the original was structured. You should get an exception, and that's expected behavior. If you got a crash, meaning the Python interpreter aborted with a segfault or some other error, I'd like to look at the files.
Yeah, this is why I closed the issue: the documents were extremely damaged. However, I used a try/except to skip those docs. The attachments contain 4 samples.
All 4 of these files appear to be truncated. At a glance the first few pages of text/images might be recoverable from the first two, but that's definitely in the realm of forensic data recovery, not what we're trying to do here. Thanks for your submission.
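For anyone batch-processing a large archive, the skip-on-exception approach mentioned above could look roughly like the sketch below. The helper names `split_readable` and `split_readable_pdfs` are made up for illustration and are not part of pikepdf; the only pikepdf names used are `pikepdf.open` and `pikepdf.PdfError`, the exception shown in the traceback. The core loop takes the opener and the exception type as parameters, so it can be tried out without any damaged files at hand.

```python
def split_readable(paths, opener, error_type):
    """Try to open every path and return (readable, skipped).

    opener:     callable that returns a context manager for a path
    error_type: exception class raised for damaged input
    Both parameters are illustrative; see split_readable_pdfs below.
    """
    readable, skipped = [], []
    for path in paths:
        try:
            # Opening is enough to trigger the structural checks;
            # the context manager closes the file again right away.
            with opener(path):
                readable.append(path)
        except error_type as exc:
            # Expected for truncated or badly damaged files:
            # record the reason and move on to the next one.
            skipped.append((path, str(exc)))
    return readable, skipped


def split_readable_pdfs(paths):
    """Hypothetical convenience wrapper that plugs in pikepdf."""
    import pikepdf  # imported lazily so the helper above has no hard dependency

    return split_readable(paths, pikepdf.open, pikepdf.PdfError)
```

Only the damaged-file exception is caught here, so an unrelated failure (a missing file, for example) would still surface instead of being silently skipped. A real interpreter crash such as a segfault cannot be caught this way at all, which is why that case is worth reporting separately.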
Hi,
Great library. I just want to report some rare crashes (>20 out of 700k PDFs), not a big deal. I don't know if it's a bug or if exceptions can occur in extremely damaged cases.
    pdf = pikepdf.open(input_file)
  File "C:\Users.......\lib\site-packages\pikepdf\__init__.py", line 41, in open
    return Pdf.open(*args, **kwargs)
pikepdf._qpdf.PdfError: C:/.......\my_pdf.pdf: unable to find trailer dictionary while recovering damaged file
If it helps, I can send the PDFs by email. We are talking about corrupted PDFs that were generated before 2000 :)