index out of bounds in pypdf._text_extraction.handle_tj #2320
Comments
Without the PDF we cannot analyse anything. If you agree, please email it privately to @MartinThoma (info@martin-thoma.de).
Is there a way to cut specific pages from a PDF? I tried:
But the error was not present in the resulting PDF.
try:
That worked!
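For readers who want to cut specific pages out of a PDF, here is a minimal sketch. It is not the snippet originally posted in this thread. The helper names `copy_pages` and `extract_pages` and the file paths are hypothetical; only `PdfReader`, `PdfWriter`, `reader.pages`, `writer.add_page` and `writer.write` are pypdf calls.

```python
from typing import Iterable


def copy_pages(reader, writer, page_indices: Iterable[int]) -> None:
    """Copy the selected 0-based pages from reader to writer, in the given order.

    Hypothetical helper: it only needs `reader.pages[i]` and `writer.add_page(page)`.
    """
    for index in page_indices:
        writer.add_page(reader.pages[index])


def extract_pages(src_path: str, dst_path: str, page_indices: Iterable[int]) -> None:
    """Write the selected pages of src_path into a new PDF at dst_path."""
    # Imported here so copy_pages can be tried out without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(src_path)
    writer = PdfWriter()
    copy_pages(reader, writer, page_indices)
    with open(dst_path, "wb") as handle:
        writer.write(handle)
```

For example, `extract_pages("input.pdf", "page_5_only.pdf", [4])` would keep only the fifth page (both paths are placeholders). Note that a page rewritten this way may no longer trigger the original error, as reported earlier in this thread.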
I sent a redacted PDF page to the above email instead of attaching it here, out of an abundance of caution.
Link to code position: pypdf/pypdf/_text_extraction/__init__.py, line 220 in 38795f5
Have you considered submitting a corresponding PR for this (the offending line was already part of your initial traceback)? I cannot debug this without a PDF file, but it looks like we could return early here when the operands list is empty.
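The early-return idea can be illustrated with a simplified sketch. It is not pypdf's actual `handle_tj`: the function name `append_tj_text`, its reduced signature, and the latin-1 decoding are assumptions made only to show the guard against an empty operands list.

```python
from typing import List, Union


def append_tj_text(text: str, operands: List[Union[str, bytes]]) -> str:
    """Hypothetical stand-in for the operand handling of a Tj handler."""
    if not operands:
        # Guard: with no operands, operands[0] would raise IndexError,
        # so return the accumulated text unchanged.
        return text

    first = operands[0]
    if isinstance(first, bytes):
        # Assumption for this sketch only: a real extractor would map the
        # bytes through the font's character map instead of latin-1.
        first = first.decode("latin-1")
    return text + first
```

With this guard, `append_tj_text("abc", [])` returns the text unchanged instead of raising, while non-empty operands are appended as before.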
Hi Stefan. I've sent a PDF to info@martin-thoma.de as requested, and yes, this should be an easy fix. I will try to create a PR if I find a moment.
PR: rgwood-rely:rgwood/2320_fix_index_out_of_bounds_in_handle_tj
When decoding a PDF, on the second line of:
len(operands) == 0, so it raises an exception.
It should be changed to:
Environment
Which environment were you using when you encountered the problem?
Code + PDF
This is a minimal, complete example that shows the issue:
# sorry; PDF is confidential
Share here the PDF file(s) that cause the issue. The smaller they are, the
better. Let us know if we may add them to our tests!
Traceback
This is the complete traceback I see: