This repository has been archived by the owner on Apr 15, 2024. It is now read-only.
Currently, if the xref table in the PDF is corrupted (wrong offsets), then dumppdf.py fails to extract files.
For example, when the following command runs:
dumppdf.py -E /tmp/ sample.pdf
We get:
Traceback (most recent call last):
  File "/usr/local/bin/dumppdf.py", line 268, in <module>
    if __name__ == '__main__': sys.exit(main(sys.argv))
  File "/usr/local/bin/dumppdf.py", line 265, in main
    dumpall=dumpall, codec=codec, extractdir=extractdir)
  File "/usr/local/bin/dumppdf.py", line 191, in extractembedded
    obj = doc.getobj(objid)
  File "/usr/local/lib/python2.7/dist-packages/pdfminer/pdfdocument.py", line 681, in getobj
    stream = stream_value(self.getobj(strmid))
  File "/usr/local/lib/python2.7/dist-packages/pdfminer/pdfdocument.py", line 694, in getobj
    raise PDFObjectNotFound(objid)
pdfminer.pdftypes.PDFObjectNotFound: 1
Possible solution:
When computing an object offset (in PDFXref.load), you can check whether the offset given by the xref table makes sense, for example by checking that the data at that offset starts with "<objid> 0 obj". If it does not, you can search the whole file for that pattern and use the offset where it is found instead.
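A minimal sketch of that idea, assuming the whole PDF is available in memory as bytes. This is not pdfminer code: the helper names (offset_is_valid, find_object_offset, resolve_offset) are hypothetical, and picking the last occurrence of the pattern is an assumption based on incremental updates appending newer object versions at the end of the file. A real fix would also have to cope with the pattern appearing inside stream data.

```python
import re


def _obj_header(objid, genno=0):
    # Matches "<objid> <genno> obj"; the lookbehind keeps "11 0 obj"
    # from matching when we are looking for object 1.
    return re.compile(rb"(?<![0-9])%d\s+%d\s+obj\b" % (objid, genno))


def offset_is_valid(data, objid, offset, genno=0):
    """Return True if data[offset:] starts with the expected object header."""
    if offset < 0 or offset >= len(data):
        return False
    # Reject offsets that land in the middle of a longer number.
    if data[offset - 1:offset].isdigit():
        return False
    return _obj_header(objid, genno).match(data, offset) is not None


def find_object_offset(data, objid, genno=0):
    """Scan the whole file for the object header; return its offset or None."""
    matches = list(_obj_header(objid, genno).finditer(data))
    # Assumption: the last definition in the file is the current one.
    return matches[-1].start() if matches else None


def resolve_offset(data, objid, xref_offset, genno=0):
    """Trust the xref offset if it checks out, otherwise fall back to a scan."""
    if offset_is_valid(data, objid, xref_offset, genno):
        return xref_offset
    return find_object_offset(data, objid, genno)


if __name__ == "__main__":
    sample = b"%PDF-1.4\n1 0 obj\n<< >>\nendobj\n11 0 obj\n<< >>\nendobj\n"
    # The xref claims object 1 lives at offset 3, which is wrong.
    print(resolve_offset(sample, 1, 3))
```

The scan is linear in the file size for each broken entry, so it only makes sense as a fallback when the xref offset fails the check.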
Maybe it's possible to fix the corrupt PDF using a different tool beforehand. I'm asking if it makes sense to start dealing with all possible edge cases or just assume valid PDF files.