UnicodeEncodeError when piping/Tee-Object on Windows #951

Open
jamesdeluk opened this issue Mar 20, 2024 · 0 comments
Running the script normally seems to work, printing out the full file.

However, if I redirect the output to a file or pipe it through Tee-Object:

python .\pdf2txt.py file.pdf > file.txt

or

python .\pdf2txt.py file.pdf | Tee-Object file.txt

I get the following error (in both Command Prompt and PowerShell):

Traceback (most recent call last):
  File "C:\Users\user\Downloads\pdfminer-env\Scripts\pdf2txt.py", line 317, in <module>
    sys.exit(main())
             ^^^^^^
  File "C:\Users\user\Downloads\pdfminer-env\Scripts\pdf2txt.py", line 311, in main
    outfp = extract_text(**vars(parsed_args))
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\user\Downloads\pdfminer-env\Scripts\pdf2txt.py", line 62, in extract_text
    pdfminer.high_level.extract_text_to_fp(fp, **locals())
  File "C:\Users\user\Downloads\pdfminer-env\Lib\site-packages\pdfminer\high_level.py", line 132, in extract_text_to_fp
    interpreter.process_page(page)
  File "C:\Users\user\Downloads\pdfminer-env\Lib\site-packages\pdfminer\pdfinterp.py", line 998, in process_page
    self.device.end_page(page)
  File "C:\Users\user\Downloads\pdfminer-env\Lib\site-packages\pdfminer\converter.py", line 81, in end_page
    self.receive_layout(self.cur_item)
  File "C:\Users\user\Downloads\pdfminer-env\Lib\site-packages\pdfminer\converter.py", line 352, in receive_layout
    render(ltpage)
  File "C:\Users\user\Downloads\pdfminer-env\Lib\site-packages\pdfminer\converter.py", line 341, in render
    render(child)
  File "C:\Users\user\Downloads\pdfminer-env\Lib\site-packages\pdfminer\converter.py", line 341, in render
    render(child)
  File "C:\Users\user\Downloads\pdfminer-env\Lib\site-packages\pdfminer\converter.py", line 341, in render
    render(child)
  File "C:\Users\user\Downloads\pdfminer-env\Lib\site-packages\pdfminer\converter.py", line 343, in render
    self.write_text(item.get_text())
  File "C:\Users\user\Downloads\pdfminer-env\Lib\site-packages\pdfminer\converter.py", line 335, in write_text
    cast(TextIO, self.outfp).write(text)
  File "C:\Program Files\Python311\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode character '\x83' in position 0: character maps to <undefined>
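
The last two frames of the traceback point at the output stream rather than the PDF parsing: the text is written to a stream that encodes with cp1252, and cp1252 has no byte for U+0083. When stdout is redirected or piped on Windows, Python generally falls back to the locale code page for it, which would explain why the console run works and the redirected one does not. The sketch below reproduces that encoding step in isolation under this assumption. The helper `encode_via_stream` is hypothetical and is not part of pdfminer.

```python
import io


def encode_via_stream(text: str, encoding: str, errors: str = "strict") -> bytes:
    """Write text through a text stream and return the bytes it produced.

    Hypothetical helper: the TextIOWrapper stands in for a redirected
    sys.stdout that uses the given encoding.
    """
    buffer = io.BytesIO()
    stream = io.TextIOWrapper(buffer, encoding=encoding, errors=errors)
    stream.write(text)
    stream.flush()
    return buffer.getvalue()


# The character reported in the traceback.
problem_text = "\x83"

# cp1252 with strict error handling fails, as in the report.
try:
    encode_via_stream(problem_text, "cp1252")
except UnicodeEncodeError as exc:
    print("cp1252 failed:", exc)

# UTF-8 can represent every code point, so the same write succeeds.
print("utf-8 bytes:", encode_via_stream(problem_text, "utf-8"))

# Keeping cp1252 but replacing unencodable characters is lossy but does not raise.
print("cp1252 replace:", encode_via_stream(problem_text, "cp1252", errors="replace"))
```

If this is indeed the cause, possible workarounds to try (untested against this report) are setting the environment variable `PYTHONIOENCODING=utf-8` before running the script, or calling `sys.stdout.reconfigure(encoding="utf-8")` early in the script on Python 3.7 or later.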