A few more Unicode issues with Windows #3

Closed
Zireael-N opened this issue Nov 28, 2017 · 19 comments

Comments

@Zireael-N

  1. These two symbols - https://github.com/layday/instawow/blob/master/instawow/cli.py#L20 - cause a UnicodeEncodeError exception.

Appending .encode('utf8').decode(sys.stdout.encoding) to them seems to fix the issue, but it looks rather hackish to me, although I am not really fluent in Python. Using just one of those two methods on its own did not work.

  2. If there's a .toc file with non-ASCII symbols, instawow list preexisting throws a UnicodeDecodeError exception:
$ instawow list preexisting
Traceback (most recent call last):
  File "C:\Program Files\Python36\Scripts\instawow-script.py", line 11, in <module>
    load_entry_point('instawow==0.8.4', 'console_scripts', 'instawow')()
  File "c:\program files\python36\lib\site-packages\click\core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "c:\program files\python36\lib\site-packages\click\core.py", line 697, in main
    rv = self.invoke(ctx)
  File "c:\program files\python36\lib\site-packages\click\core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "c:\program files\python36\lib\site-packages\click\core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "c:\program files\python36\lib\site-packages\click\core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "c:\program files\python36\lib\site-packages\click\core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "c:\program files\python36\lib\site-packages\click\decorators.py", line 27, in new_func
    return f(get_current_context().obj, *args, **kwargs)
  File "c:\program files\python36\lib\site-packages\instawow\cli.py", line 259, in preexisting
    folders = {(n, TocReader(t)) for n, t in folders if t.exists()}
  File "c:\program files\python36\lib\site-packages\instawow\cli.py", line 259, in <setcomp>
    folders = {(n, TocReader(t)) for n, t in folders if t.exists()}
  File "c:\program files\python36\lib\site-packages\instawow\utils.py", line 48, in __init__
    for e in toc_file_path.read_text().splitlines()
  File "c:\program files\python36\lib\pathlib.py", line 1175, in read_text
    return f.read()
  File "c:\program files\python36\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 162: character maps to <undefined>

This is the file that causes the mentioned exception:

## Interface: 70100
## Title: GridManaBars
## Notes: Adds manabars to the sides of Grid frames.
## Notes-ruRU: Добовляет полоски маны в фрейм Gridа.
## Notes-zhTW: [Grid] 在 Grid 框架旁邊顯示法力條。
## Notes-zhCN: [Grid] 在 Grid 框体内设置法力条。
## Author: Adelea@EU-Bronzebeard, Julith
## Version: 1.11
## Grid Author: Pastamancer & Maia
## X-Website: http://wowace.com/wiki/Grid
## Dependencies: Grid
## X-Curse-Packaged-Version: v1.1b
## X-Curse-Project-Name: GridManaBars
## X-Curse-Project-ID: grid-mana-bars
## X-Curse-Repository-ID: wow/grid-mana-bars/mainline

GMBLocale-enUS.lua
GMBLocale-ruRU.lua
GMBLocale-deDE.lua
GMBLocale-koKR.lua
GMBLocale-zhCN.lua
GMBLocale-zhTW.lua
GridManaBar.lua
GridStatusMana.lua
@layday
Owner

layday commented Nov 28, 2017

When you say 1. works after re-decoding, does it still display as a tick/cross?

@Zireael-N
Author

It does in Cygwin's terminal. In cmd.exe it doesn't, most likely because of the font. cmd.exe also doesn't support the escape codes that change colours.

@layday
Owner

layday commented Nov 28, 2017

Do you have this issue both in Cygwin and the Command Prompt?

@Zireael-N
Author

Just checked: cmd.exe behaves the same way with and without .encode('utf8').decode(sys.stdout.encoding). Cygwin's terminal chokes without it.

Non-ASCII symbols in .toc-files cause an exception in both of them.

@layday
Owner

layday commented Nov 28, 2017

What is the value of sys.stdout.encoding in each? python3 -c "import sys; print(sys.stdout.encoding)"

@Zireael-N
Author

Cygwin:

$ python3.6 -c "import sys; print(sys.stdout.encoding)"
UTF-8

cmd.exe:

python3 -c "import sys; print(sys.stdout.encoding)"
utf-8

@layday
Owner

layday commented Nov 28, 2017

Um, but then click.style('✓', fg='green') is equal to click.style('✓', fg='green').encode('utf8').decode(sys.stdout.encoding), so how can it be making a difference?

@Zireael-N
Author

For some reason, when I launch the instawow binary, sys.stdout.encoding is cp1252.

@layday
Owner

layday commented Nov 28, 2017

If that were the case, the tick should've been mangled into âœ“, but you say it remains intact (in Cygwin).
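
One possible explanation for the apparent contradiction, offered as a sketch rather than a confirmed diagnosis: if sys.stdout.encoding is cp1252 while the terminal itself interprets incoming bytes as UTF-8, the re-decoded string is mojibake in memory but encodes back to the tick's original UTF-8 bytes on output. The snippet uses only string operations, and the cp1252 value is an assumption taken from the report above.

```python
tick = "\u2713"  # the tick character

# With stdout in cp1252, encoding the tick directly is impossible:
try:
    tick.encode("cp1252")
    direct_ok = True
except UnicodeEncodeError:
    direct_ok = False

# The workaround, with sys.stdout.encoding assumed to be cp1252:
workaround = tick.encode("utf8").decode("cp1252")

# Bytes a cp1252 stdout would emit for the workaround string:
emitted = workaround.encode("cp1252")

# With a UTF-8 stdout the same round trip changes nothing:
unchanged = tick.encode("utf8").decode("utf-8") == tick
```

Under these assumptions, emitted holds the same bytes as the tick's UTF-8 encoding, so a terminal that decodes UTF-8 would still draw a tick.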

@Zireael-N
Author

[Screenshots: encoding, working output, an exception]

@layday
Owner

layday commented Nov 28, 2017

That is... confusing. I'm also wondering what might've transpired since #1 when we (thought we) fixed this 🤷‍♂️

@Zireael-N
Author

Back then, I had to run pip install locally, using the commands you provided in readme.md (venv), because pip refused to accept git+https://github.com/layday/instawow@print-test as an argument. I still have those files, so I just checked: that binary reports UTF-8 as its sys.stdout.encoding. It puzzles me that these two binaries assume different encodings, since I used pip via Cygwin's terminal for both of them.

@layday
Owner

layday commented Nov 29, 2017

I've released a new version with what should be a fix for the second issue described above (incorrect encoding when opening TOC files).
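
For readers following along, the traceback above can be reproduced in miniature. This is only a sketch and not necessarily the change that shipped in instawow: read_toc_lines is a hypothetical helper, and it assumes the TOC file is stored as UTF-8. Path.read_text() without an encoding argument falls back to the locale's preferred encoding, which was cp1252 in the report.

```python
import tempfile
from pathlib import Path


def read_toc_lines(toc_file_path, encoding="utf-8"):
    """Hypothetical helper: read a .toc file with an explicit encoding
    instead of relying on the platform default."""
    # errors="replace" keeps one malformed byte from aborting the whole read
    text = toc_file_path.read_text(encoding=encoding, errors="replace")
    return text.splitlines()


with tempfile.TemporaryDirectory() as tmp:
    toc = Path(tmp) / "Example.toc"
    # Capital Cyrillic "А" is D0 90 in UTF-8, and 0x90 is unassigned in cp1252
    toc.write_bytes("## Title: Example\n## Notes-ruRU: \u0410\n".encode("utf-8"))

    try:
        toc.read_text(encoding="cp1252")  # stands in for the Windows default
        failed = False
    except UnicodeDecodeError:
        failed = True

    lines = read_toc_lines(toc)
```

Passing the encoding explicitly makes the result independent of the machine's locale settings.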

@Zireael-N
Author

Just tested it, the second issue no longer occurs.

@layday
Owner

layday commented Nov 30, 2017

So, having secured access to a Windows machine, I can report that sys.stdout.encoding in Cygwin is cp1252 for me, whichever way I run Python. I'm not sure why Python insists on using CP-1252 when the LANG env var is *.UTF-8. The easiest and most controversial workaround is to set PYTHONIOENCODING to UTF8.

instawow did not crash in the Command Prompt.
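
The effect of the environment variable can be checked without touching the current shell. A minimal sketch, assuming nothing beyond the standard library: child_stdout_encoding is a hypothetical helper that starts a fresh interpreter with extra environment variables and reports what that interpreter sees.

```python
import codecs
import os
import subprocess
import sys


def child_stdout_encoding(extra_env):
    """Start a fresh interpreter and return its sys.stdout.encoding."""
    env = dict(os.environ, **extra_env)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode("ascii").strip()


forced = child_stdout_encoding({"PYTHONIOENCODING": "utf-8"})
# codecs.lookup normalises spellings such as "UTF8" and "utf-8"
forced_is_utf8 = codecs.lookup(forced).name == "utf-8"
```

The variable has to be set before the interpreter starts, which is why the sketch launches a child process rather than changing os.environ in place.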

@Zireael-N
Author

PYTHONIOENCODING works for me. Is there a way to force sys.stdout.encoding to be UTF-8 from within the program if it's launched in Cygwin? platform.system() returns different values for Cygwin's terminal and cmd.exe: CYGWIN_NT-10.0 and Windows.
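
One way this could be done from inside a program, as a sketch under stated assumptions rather than anything instawow does: utf8_text_stream and force_utf8_stdout are hypothetical helpers. sys.stdout.reconfigure exists only on Python 3.7 and later, so the Python 3.6 setup in this thread would take the re-wrapping branch.

```python
import io
import sys


def utf8_text_stream(binary_stream):
    """Wrap a binary stream so that text written to it is always UTF-8."""
    return io.TextIOWrapper(binary_stream, encoding="utf-8", line_buffering=True)


def force_utf8_stdout():
    """Hypothetical helper: switch sys.stdout to UTF-8 at program start."""
    if hasattr(sys.stdout, "reconfigure"):
        # Available on Python 3.7 and later
        sys.stdout.reconfigure(encoding="utf-8")
    else:
        # Older interpreters: re-wrap the underlying byte stream
        sys.stdout = utf8_text_stream(sys.stdout.buffer)


# Demonstration against an in-memory buffer instead of the real stdout
buffer = io.BytesIO()
stream = utf8_text_stream(buffer)
stream.write("\u2713")
stream.flush()
written = buffer.getvalue()
```

This only changes the bytes Python emits. Whether the tick is drawn correctly still depends on the terminal decoding UTF-8, so a caller might gate force_utf8_stdout() on the platform.system() value mentioned above.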

@layday
Owner

layday commented Nov 30, 2017

Did you install Python from Cygwin or using the installer from python.org? I've asked on #cygwin and they were unable to reproduce this issue using Cygwin's version of Python and suggested that the official version of Python might be doing 'something wack like looking at your Windows locale settings to determine the ioencoding'.

@Zireael-N
Author

Yeah, that was it.

I had both native Python and the one from Cygwin installed, and both were in PATH (Cygwin inherits the OS's PATH but prepends /usr/local/bin:/usr/bin: to it). Cygwin's version did not have pip installed, so when I called pip install instawow, the native version's pip ran and installed instawow into the native version's Scripts directory, which is also in PATH. That instawow executable then launched the native version of Python.
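
A mix-up like this can be spotted by asking the running interpreter where it lives and which executables PATH resolves to. A minimal diagnostic sketch using only the standard library; describe_environment is a hypothetical helper, and the looked-up names are taken from this thread.

```python
import shutil
import sys


def describe_environment():
    """Report which interpreter is running and what PATH resolves to."""
    return {
        "interpreter": sys.executable,
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
        "pip_on_path": shutil.which("pip"),
        "instawow_on_path": shutil.which("instawow"),
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

If the interpreter path and the pip path point into different installations, packages are being installed for a Python other than the one that runs them.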

Sorry for wasting your time.

@layday
Owner

layday commented Nov 30, 2017

No worries, happy we've been able to resolve this.

@toastmonger toastmonger mentioned this issue Jun 7, 2019