LS encoding trouble under win7 x64 and qtconsole #2970

Open
pakal opened this Issue Feb 22, 2013 · 10 comments

@pakal

Unicode works fine in the standard IPython console, and in qtconsole when using Python strings. However, when issuing an "ls", the qtconsole seems to read a latin1 string returned by the OS as if it were UTF-8. See the screenshot below.

Note that several issues have been opened on encoding subjects, but none of them seems to cover this specific case.

ipython_bug

@takluyver
IPython member

Ech, Windows encoding bugs again.

What's the value of IPython.utils.encoding.DEFAULT_ENCODING in each environment?

@pakal

In qtconsole, IPython.utils.encoding.DEFAULT_ENCODING is cp1252 (broken display); in cmd.exe that value is cp850 (displays OK).

Note that doing "a = !ls" works in both cases: the filenames are returned as a list of bytes objects, representing the filenames in UTF-8.

@takluyver
IPython member

OK, so it's decoding with cp1252 when it should be using cp850:

In [2]: "é".encode('cp850').decode('cp1252')
Out[2]: '‚'

That means that your terminal encoding, sys.stdin.encoding, must be cp850, but locale.getpreferredencoding() is cp1252. I can't see any good way to deal with that, because the Qt console processes on Windows don't have a stdin to query.
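The mismatch described above can be reproduced on any platform. The following is a minimal sketch, assuming the bytes were produced in cp850 and decoded as cp1252; the helper names `misdecode` and `report_encodings` are hypothetical and not part of IPython:

```python
import locale
import sys


def misdecode(text, actual="cp850", assumed="cp1252"):
    """Encode text as the producer would, then decode it with the assumed codec."""
    return text.encode(actual).decode(assumed, errors="replace")


def report_encodings():
    """Collect the two encodings being compared in this thread."""
    return {
        # May be None when the process has no real stdin (e.g. a GUI kernel).
        "stdin": getattr(sys.stdin, "encoding", None),
        "preferred": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    print(report_encodings())
    print(repr(misdecode("\u00e9")))                   # wrong code page
    print(repr(misdecode("\u00e9", assumed="cp850")))  # matching code page
```

Running `report_encodings()` in both cmd.exe and the Qt console should show whether the two values disagree on a given machine.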

I'm not sure why it's using cp850 to start with, though - I thought that went out with DOS, and everyone was on Windows codepages now.

@pakal

Windows consoles (cmd.exe and the like) certainly have built-in issues with encodings: you can't even specify the encoding in the properties, and switching to Unicode seems to require digging into the registry. I'm currently looking at possible better replacements, like ConEmu or the PowerShell console.

Here is a related issue:
http://stackoverflow.com/questions/9226516/python-windows-console-and-encodings-cp-850-vs-cp1252

If I understand correctly, the output of commands like "ls" should be decoded with the encoding found e.g. in sys.stdin.encoding (cp850 here), while command line arguments should still be decoded according to getpreferredencoding (cp1252).

Which would be best: trying to get things right by hacking into Windows codepages, or finding a bullet-proof solution with a specific console system, towards which we should direct Windows users willing to use IPython?

PS: I've just noticed that the "unix tools" I use on Windows, like the ls command, might have their own encoding problems between cp850 and cp1252: some tests failed with "ls" but worked with the Windows equivalent, DIR.
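The rule proposed above could be sketched as follows. This is only an illustration: the function names are hypothetical, and treating the OEM code page (queried through the Win32 `GetOEMCP` call) as the encoding of console program output is an assumption that may not hold for every tool:

```python
import locale
import sys


def console_output_encoding():
    """Best guess at the code page a console program writes its output in."""
    if sys.platform == "win32":
        import ctypes
        # Assumption: child console programs emit text in the OEM code page,
        # e.g. cp850 on many Western European installations.
        return "cp%d" % ctypes.windll.kernel32.GetOEMCP()
    # Elsewhere, fall back to the locale's preferred encoding.
    return locale.getpreferredencoding(False)


def decode_command_output(raw, encoding=None):
    """Decode bytes captured from a child process without raising on bad bytes."""
    return raw.decode(encoding or console_output_encoding(), errors="replace")
```

Command line arguments would keep using `locale.getpreferredencoding()`, so only the decoding of captured output changes.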

@takluyver
IPython member

At times like these, Sage's approach looks very tempting - point Windows users to a virtual machine image that runs Linux. ;-)

@stonebig

Hi,

Windows is not bad; it's just that the standard options must keep backward compatibility with the pre-Linux period.

** current problem (view from DOS command) **
CMD + Enter... to be on the dos command line
dir>toto.txt

notepad toto.txt

==> shows it fails exactly the same way (under any french windows)

cmd /U/C dir>toto3.txt
notepad toto3.txt

==> shows it works : no more ill-interpreted character
==> so changing "ls" behaviour from "dir" to "CMD /U/C dir" may solve the problem.

@stonebig

maybe a "lsu" command, which would do "CMD /U/C dir" instead of "dir" , could be a safe solution.
no risk of breaking something.

Side-remark

  • if I start the qt console inside my "spyder" (of winpython3.3.2), the "ls" command displays the proper characters.
  • if I do the same in a stand-alone qt console, the "ls" command displays them badly. This seems independent of Python 2 vs 3 and of IPython 0.13 vs 1.0.

@stonebig

  • not a very good workaround:
    in IPython/core/alias.py, change default_aliases = [('ls', 'dir /on'), to default_aliases = [('ls', 'cmd /U/C dir /on'),

then type
ls a*
==> gives the result in the image below (OK for French characters, not for Chinese)

qt_console_ls_unicode

@stonebig

Current workaround to get the equivalent of an "ls" in Unicode on Windows:

%sx cmd /U/C dir /o/n a* > temp_result_in_utf16le.txt
with open("temp_result_in_utf16le.txt", "r", encoding='utf-16LE') as myfile:
    print(myfile.read())

qt_console_ls_unicode

@takluyver
IPython member

Hmm, that's interesting. Maybe we could integrate cmd /u/c into our process handling, so we can use UTF-16 to decode it rather than system code pages. @jstenar , thoughts?
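A rough sketch of that idea, not IPython's actual process handling: run a cmd.exe built-in through `cmd /U /C` and decode the captured bytes as UTF-16LE. The function names are hypothetical, the absence of a byte order mark is assumed, and `/U` only affects cmd.exe internal commands such as `dir`, not external programs:

```python
import subprocess
import sys


def decode_cmd_unicode(raw):
    """Decode bytes produced by `cmd /U` (assumed UTF-16LE without a BOM)."""
    return raw.decode("utf-16-le", errors="replace")


def run_builtin_unicode(command):
    """Run a cmd.exe built-in such as 'dir /on' and return its output as text."""
    if sys.platform != "win32":
        raise OSError("cmd.exe is only available on Windows")
    # Passed as one string so cmd.exe parses the command line itself.
    raw = subprocess.check_output("cmd /U /C " + command)
    return decode_cmd_unicode(raw)
```

On Windows, `print(run_builtin_unicode("dir /on"))` would be the pipe-based counterpart of the temporary-file workaround above.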
