Some non-ascii input echos wrong characters or fails to decode #202

Closed
Dobatymo opened this issue Apr 9, 2021 · 14 comments · Fixed by #203

Comments

Dobatymo commented Apr 9, 2021

Hi!

Running the editor example https://github.com/jquast/blessed/blob/master/bin/editor.py on Windows 10 x64 with blessed==1.18.0 and Python 3.7, 3.8, or 3.9 yields:

Traceback (most recent call last):
  File "<snip>\blessed\bin\editor.py", line 269, in <module>
    main()
  File "<snip>\blessed\bin\editor.py", line 228, in main
    inp = term.inkey()
  File "<snip>\Python39\lib\site-packages\blessed\terminal.py", line 1386, in inkey
    ucs += self.getch()
  File "<snip>\Python39\lib\site-packages\blessed\win_terminal.py", line 39, in getch
    return super(Terminal, self).getch()
  File "<snip>\Python39\lib\site-packages\blessed\terminal.py", line 1169, in getch
    return self._keyboard_decoder.decode(byte, final=False)
  File "<snip>\Python39\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 0: character maps to <undefined>

when I try to input the German umlaut ü (with the US-International keyboard layout you can type "+u). Running chcp outputs "Active code page: 437", whereas the traceback shows cp1252.
Typing ß (Alt+S) gives me an á instead. So something is messing up the encodings here.

Entering either character at the cmd prompt or in the interactive Python interpreter is not a problem, and both display fine.
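Both symptoms are consistent with bytes produced under code page 437 being decoded as cp1252. The following is a minimal sketch of that mismatch using only Python's built-in codecs; `decode_as` is a hypothetical helper written for this illustration and is not part of blessed.

```python
def decode_as(raw, encoding):
    """Decode raw bytes, returning the error text instead of raising."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as exc:
        return "error: {}".format(exc)


# Bytes a cp437 console would deliver for the two keys.
raw_u_umlaut = "ü".encode("cp437")  # b'\x81'
raw_eszett = "ß".encode("cp437")    # b'\xe1'

for raw in (raw_u_umlaut, raw_eszett):
    # ascii() keeps the output printable regardless of the console encoding.
    print(ascii(raw),
          "cp437:", ascii(decode_as(raw, "cp437")),
          "cp1252:", ascii(decode_as(raw, "cp1252")))
```

Byte 0x81 has no mapping in Python's cp1252 table, which matches the traceback, and byte 0xe1 maps to á in cp1252, which matches the wrong character seen for ß.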

Dobatymo changed the title from "Non-ascii input fails with UnicodeDecodeError" to "Some non-ascii input echos wrong characters or fails to decode" on Apr 9, 2021
Collaborator

avylove commented Apr 11, 2021

I was able to reproduce your issue, but I don't get an exception for "+u; I get š. If I override the logic so that the encoding is cp437, it behaves as expected.

The problem is that Python, as far as I can tell, does not report code page 437, even when that's what chcp reports. This is how we currently detect the encoding. In your setup, this would set it to 'cp1252'.

self._encoding = locale.getpreferredencoding() or 'UTF-8'

Another approach you might see is to look at sys.stdin.encoding, which should return 'utf-8'.

So I think the way to make this work is to add a method to Jinxed that calls the GetConsoleCP() C function, and then have Blessed call that. I'll make the changes to Jinxed, make a new release, and then create a PR for Blessed.
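For readers who want to see roughly what asking the console for its input code page looks like, here is a hedged sketch using ctypes. It is not the Jinxed implementation; `console_input_encoding` is a hypothetical name, and the fallback to `locale.getpreferredencoding()` on other platforms is an assumption made so the snippet runs anywhere.

```python
import codecs
import locale
import sys


def console_input_encoding():
    """Return a codec name for the console's input code page.

    On Windows, ask the console via the Win32 GetConsoleCP() call.
    Elsewhere, or if that fails, fall back to the locale's preference.
    """
    if sys.platform == "win32":
        import ctypes

        # GetConsoleCP() returns 0 on failure (for example, no console attached).
        code_page = ctypes.windll.kernel32.GetConsoleCP()
        if code_page:
            name = "cp{}".format(code_page)
            try:
                codecs.lookup(name)  # make sure Python knows this code page
                return name
            except LookupError:
                pass
    return locale.getpreferredencoding() or "UTF-8"


print(console_input_encoding())
```

On the setup described in this issue, the Windows branch would be expected to yield cp437, since that is what chcp reports.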

Collaborator

avylove commented Apr 11, 2021

@Dobatymo, try PR #203 and see if it fixes your issue. You'll need to have Jinxed >= 1.1.0 installed.

@Dobatymo
Author

Looks like it's working 👍

Author

Dobatymo commented Apr 13, 2021

By the way, it seems it's not necessary to use the WinAPI; pure Python should be enough.

>>> import locale, os, sys
>>> locale.getpreferredencoding()
'cp1252'
>>> sys.getdefaultencoding()
'utf-8'
>>> sys.stdin.encoding
'utf-8'
>>> os.device_encoding(0)
'cp437'

os.device_encoding(0) seems to be doing the trick.
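A minimal sketch of how this could be wrapped with a fallback, since `os.device_encoding()` returns None when the descriptor is not connected to a terminal. The function name `input_encoding` and the fallback order are assumptions for illustration, not what blessed does.

```python
import locale
import os


def input_encoding(fd=0):
    """Prefer the terminal's own encoding for fd, else the locale's."""
    try:
        device = os.device_encoding(fd)  # None if fd is not a terminal
    except OSError:
        device = None
    return device or locale.getpreferredencoding(False) or "UTF-8"


print(input_encoding())
```

When stdin is redirected from a file or a pipe, the fallback is used, so the result can differ from what an interactive console session reports.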

Z1ni commented Sep 4, 2021

I'm still having this issue even when using blessed 1.18.1.
My system outputs the exact same encodings as in the above comment.

Here are some wrong characters:

| Expected | Returned |
| --- | --- |
| ä | |
| ö | |
| å | |
| Ä | Ž |
| Ö | |
| Å | Throws UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 0: character maps to <undefined> |

Some other keys also produce wrong characters.
My keyboard layout is Finnish: https://kbdlayout.info/KBDFI/. Everything works fine on Linux (WSL, etc.).

Collaborator

avylove commented Sep 5, 2021

Sorry to hear that.
What version of Jinxed is installed?
On your Terminal instance, what are the values of errors and _encoding?

Z1ni commented Sep 5, 2021

Jinxed version is 1.1.0, _encoding is cp1252 and errors is ['parameters: kind=None, stream=None, force_styling=False'] after every key press listed above.

Z1ni commented Sep 5, 2021

As per the comments in this issue, it seems that cp437 (from os.device_encoding(0)) should be used: when I tried an os.read from stdin, the data decoded correctly with the decoding charset set to cp437.
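The traceback at the top of this issue shows input being decoded one byte at a time with an incremental decoder. The sketch below imitates that with the standard codecs module for the Finnish characters reported here. `decode_keystrokes` is a hypothetical helper, and `errors="replace"` is an assumption chosen so that the comparison does not raise.

```python
import codecs


def decode_keystrokes(raw, encoding):
    """Decode raw input one byte at a time, as a keyboard loop would."""
    decoder = codecs.getincrementaldecoder(encoding)(errors="replace")
    return "".join(decoder.decode(bytes([b]), final=False) for b in raw)


# Bytes a cp437 console would deliver for these keys.
raw = "äöåÄÖÅ".encode("cp437")

right = decode_keystrokes(raw, "cp437")   # the original characters
wrong = decode_keystrokes(raw, "cp1252")  # mojibake, U+FFFD where undefined

# ascii() keeps the output printable regardless of the console encoding.
print(ascii(right))
print(ascii(wrong))
```

With cp1252 the byte for Ä comes back as Ž, and the byte for Å (0x8f) has no mapping, so it becomes the replacement character here instead of raising the UnicodeDecodeError reported above.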

Collaborator

avylove commented Sep 6, 2021

What does jinxed.get_console_input_encoding() return?

Z1ni commented Sep 6, 2021

jinxed.win32.get_console_input_encoding() returns cp437.

Z1ni commented Sep 6, 2021

After checking out the sources, it seems that the blessed 1.18.1 available through pip doesn't have the newest terminal.py.
Compare terminal.py in https://files.pythonhosted.org/packages/84/83/3a1fe424ebbf1709e1dc282805332f9f367c8e19c3fb9da42ce695390423/blessed-1.18.1.tar.gz with the one in the GitHub release tarball. The IS_WINDOWS change isn't in the pip release.

Edit: The GitHub release tarball installed with pip fixes the problems with the previously broken characters.

Collaborator

avylove commented Sep 7, 2021

Oh! Good catch!

Collaborator

avylove commented Sep 21, 2021

1.19.0 was just pushed. This should resolve your issue.

Z1ni commented Sep 21, 2021

Works fine now 👍

3 participants