Unicode works fine in the standard IPython console, and in the qtconsole when using Python strings; however, when issuing an "ls", the qtconsole seems to read the latin1 string returned by the OS as if it were UTF-8. Cf. screenshot below.
Note that several issues have already been opened about encoding, but it seems none of them specifically covers this case.
Ech, Windows encoding bugs again.
What's the value of IPython.utils.encoding.DEFAULT_ENCODING in each environment?
In the qtconsole, IPython.utils.encoding.DEFAULT_ENCODING is cp1252 (broken display); in cmd.exe that value is cp850 (displays OK).
Note that doing "a = !ls" works in both cases -> the filenames are returned as a list of bytes objects, representing the filenames in UTF-8.
OK, so it's decoding with cp1252 when it should be using cp850:
In : "é".encode('cp850').decode('cp1252')
That means that your terminal encoding, sys.stdin.encoding, must be cp850, but locale.getpreferredencoding() is cp1252. I can't see any good way to deal with that, because the Qt console processes on Windows don't have a stdin to query.
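The mojibake above can be reproduced on any platform, since Python ships both codecs; a minimal demonstration of the round trip:

```python
# Reproduce the qtconsole mojibake: bytes produced under cp850
# (the Windows console OEM code page) decoded as cp1252
# (the "ANSI" code page that locale.getpreferredencoding() reports).
raw = "é".encode("cp850")       # b'\x82' under cp850
garbled = raw.decode("cp1252")  # 0x82 maps to U+201A in cp1252
print(garbled)                  # '‚' instead of 'é'

# Decoding with the right code page recovers the original character:
print(raw.decode("cp850"))      # 'é'
```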
I'm not sure why it's using cp850 to start with, though - I thought that went out with DOS, and everyone was on Windows codepages now.
Windows consoles (cmd.exe and the like) certainly have built-in encoding issues: you can't even specify the encoding in the properties dialog, and switching to Unicode seems to require digging into the registry. I'm currently looking at possible better replacements, like ConEmu or the PowerShell console.
Here is a related issue:
If I understand correctly, the output of commands like "ls" should be decoded with the encoding found e.g. in sys.stdin.encoding (cp850 here), while command-line arguments should still be decoded according to getpreferredencoding() (cp1252).
What would be best: trying to get things right by hacking into Windows code pages, or finding a bullet-proof solution with a specific console system, towards which we should direct Windows users willing to use IPython?
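If that distinction is right, the decoding logic could be sketched roughly as follows (this is not IPython's actual code; the helper names and fallbacks are assumptions for illustration):

```python
import locale
import sys

def console_output_encoding():
    """Encoding for bytes coming back from a shelled-out command.

    On Windows this would be the console (OEM) code page, e.g. cp850,
    which sys.stdin.encoding reports when a real console is attached.
    The qtconsole subprocess has no console stdin, hence the fallback.
    """
    enc = getattr(sys.stdin, "encoding", None)
    return enc or locale.getpreferredencoding()

def argument_encoding():
    """Encoding for command-line arguments, e.g. cp1252 on Western Windows."""
    return locale.getpreferredencoding()

def decode_subprocess_output(raw: bytes) -> str:
    # Use replacement characters rather than crashing on undecodable bytes.
    return raw.decode(console_output_encoding(), errors="replace")
```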
PS: I've just noticed that the "unix tools" I use on Windows, like the ls command (Windows' equivalent, DIR, works fine), might have their own encoding problems between cp850 and cp1252 - some tests failed with "ls" but worked with "DIR".
At times like these, Sage's approach looks very tempting - point Windows users to a virtual machine image that runs Linux. ;-)
Windows is not bad; it's just that the standard options must keep backward compatibility with the pre-Linux era.
** current problem (view from DOS command) **
CMD + Enter... to get onto the DOS command line
==> shows it fails in exactly the same way (on any French Windows)
cmd /U/C dir>toto3.txt
==> shows it works: no more misinterpreted characters
==> so changing the "ls" behaviour from "dir" to "CMD /U/C dir" may solve the problem.
maybe a "lsu" command, which would do "CMD /U/C dir" instead of "dir" , could be a safe solution.
no risk of breaking something.
==> gives result in image below (ok for french characters, not for chineese)
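A hypothetical "lsu" helper along those lines might look like this (a sketch only: `lsu` is an invented name, `cmd /U /C dir` is Windows-only, and only the UTF-16-LE decoding step is portable):

```python
import subprocess

def decode_cmd_unicode_output(raw: bytes) -> list[str]:
    """Decode bytes written by `cmd /U`, which appear to be
    UTF-16-LE without a BOM when redirected, with CRLF line endings."""
    return raw.decode("utf-16-le").splitlines()

def lsu(pattern: str = "*") -> list[str]:
    """Hypothetical 'lsu': run DIR through `cmd /U /C` so filenames
    come back as UTF-16 instead of the OEM code page (Windows only)."""
    raw = subprocess.check_output(["cmd", "/U", "/C", "dir", "/b", pattern])
    return decode_cmd_unicode_output(raw)
```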
Current workaround to get the equivalent of an "ls" in Unicode on Windows:
%sx cmd /U /C dir /o/n a* > temp_result_in_utf16le.txt
with open("temp_result_in_utf16le.txt", "r", encoding='utf-16LE') as myfile:
    print(myfile.read())  # or process the listing as needed
Hmm, that's interesting. Maybe we could integrate cmd /U /C into our process handling, so we can decode its output as UTF-16 rather than via system code pages. @jstenar, thoughts?