local variable 'cm' referenced before assignment #2702
Comments
Please provide the code and input file. |
@thelazydogsback please update the issue with the code and input file; otherwise we will have to close the issue as "can't reproduce". |
@thelazydogsback |
I am closing this dead issue. |
I encountered the same issue! Here is the code and the PDF file.
for idx, page in enumerate(PdfReader(pdf_path).pages):
    page_content = ""
    text = page.extract_text()  # UnboundLocalError: local variable 'cm' referenced before assignment
Here is some information I can provide from running the code: |
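While the root cause is being investigated, a loop like the one above can be made tolerant of a single failing page. The following is only a minimal sketch, not a fix for the underlying bug: the helper name `extract_text_per_page` is hypothetical, and it assumes only that each page object exposes an `extract_text()` method, as pypdf page objects do.

```python
from typing import Iterable, List, Optional, Tuple


def extract_text_per_page(
    pages: Iterable,
) -> List[Tuple[int, Optional[str], Optional[str]]]:
    """Return (page_index, text, error) for every page.

    Hypothetical helper: text is None when extraction raised, and error
    then holds the exception type and message so the page can be reported.
    """
    results = []
    for idx, page in enumerate(pages):
        try:
            results.append((idx, page.extract_text(), None))
        except Exception as exc:  # e.g. the UnboundLocalError reported in this issue
            results.append((idx, None, f"{type(exc).__name__}: {exc}"))
    return results


# Possible usage with pypdf (assumes pypdf is installed and pdf_path is defined):
#     from pypdf import PdfReader
#     for idx, text, error in extract_text_per_page(PdfReader(pdf_path).pages):
#         print(idx, error if error else len(text))
```

This keeps the pages that extract cleanly and records which page indices fail, which may also help narrow down a reproducible case when the file itself cannot be shared.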
@bazinga014 |
@pubpub-zz I just uploaded the file in the issue |
@bazinga014 |
@pubpub-zz Yes, I can open it normally with Chrome's built-in PDF parser, but there are errors when opening it with Acrobat Reader. So how should I handle this situation? How can I parse it with pypdf? |
Apparently, opening with pypdf is already possible; there is just some issue with the text extraction, if this is about the undefined
in _cmap.py
and confirm the output is ok |
@pubpub-zz OK! I will try it! thanks a lot! |
Thanks @bazinga014 for following up -- I was unable to provide the file because it is a customer file that I was unable to share. |
@thelazydogsback can you also try the patch? |
@pubpub-zz I also had the same issue (with a customer's private file) and the patch seems to help! |
Trying to extract text from page.
Tested in Win11 & Linux container.
pypdf==4.2.0, crypt_provider=('cryptography', '42.0.5'), PIL=none
Traceback