After further inspection, the results for stdin encoding are
sys.stdin.encoding => None (Spyder console and also IPython qtconsole and two-process console)
sys.stdin.encoding => 'UTF-8' (Python console)
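For anyone who wants to repeat this check in their own console, here is a minimal sketch. The helper name describe_stream_encodings is made up for illustration, it is not part of Spyder or IPython, and the values it reports depend entirely on how the console set up its streams.

```python
import sys


def describe_stream_encodings():
    """Return the encoding reported by each standard stream (None if unset)."""
    return {
        # A stream can be missing or lack an encoding attribute in GUI consoles,
        # so fall back to None instead of raising.
        name: getattr(getattr(sys, name, None), "encoding", None)
        for name in ("stdin", "stdout", "stderr")
    }


if __name__ == "__main__":
    for name, encoding in describe_stream_encodings().items():
        print("sys.%s.encoding => %r" % (name, encoding))
```

Running it in each console side by side gives the same kind of listing as above.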
According to Martijn Pieters, in a comment on that post addressed to me: "you'll have to study up on how to open sys.stdin for your console with the same encoding used as the GUI input source".
One possible solution would be to set the PYTHONIOENCODING env var to "UTF-8" in our sitecustomize. But I'm not really sure because that changes sys.stdin.encoding for qtconsole (although we can set it just for our consoles).
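A rough sketch of what the PYTHONIOENCODING idea looks like in practice, written for Python 3. The variable is read when the interpreter starts, so this sets it in the environment of a child interpreter rather than in the running one. The function child_stdin_encoding is a hypothetical helper for this experiment only, not how Spyder launches its consoles.

```python
import codecs
import os
import subprocess
import sys


def child_stdin_encoding(io_encoding=None):
    """Start a child interpreter with piped stdin and return its sys.stdin.encoding."""
    env = dict(os.environ)
    # Start from a clean slate so only our explicit choice applies.
    env.pop("PYTHONIOENCODING", None)
    if io_encoding is not None:
        env["PYTHONIOENCODING"] = io_encoding
    completed = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdin.encoding)"],
        input=b"",               # stdin is a pipe, as it would be under a GUI console
        stdout=subprocess.PIPE,
        env=env,
        check=True,
    )
    return completed.stdout.decode("ascii").strip()


if __name__ == "__main__":
    print("default:", child_stdin_encoding())
    forced = child_stdin_encoding("UTF-8")
    # Compare through codecs.lookup so 'UTF-8' and 'utf-8' count as the same codec.
    print("forced :", forced, codecs.lookup(forced).name == "utf-8")
```

Because the variable only applies to the processes that receive it, setting it this way for our own consoles would not have to touch qtconsole.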
Thomas, could you give me a hand with this one? I know you're much more knowledgeable than me about this kind of thing. It seems like a simple fix but I'd like to get it right :-) Thanks!
I think the relevant thing is not stdin, but what you're passing to compile() or exec(). Passing either bytes or unicode will work, but if you pass bytes, you can have surprising behaviour in unicode literals, and if you pass unicode, you can have strange behaviour in bytes literals:
# Examples from plain Python in a terminal
>>> exec(b"print(len(u'tiþ'))")
4
>>> exec(u"print(len(b'tiþ'))")  # b'þ' is not even valid syntax on Python 3
4  # This works correctly on my system, but may do surprising things on Windows
IIRC, passing code as bytes treats unicode literals as if they were decoded from the bytes with the latin1 codec. That's what you're seeing here: your string is encoded as UTF-8 and then decoded as latin1 by the interpreter.
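The encode-as-UTF-8, decode-as-latin1 round trip can be reproduced directly, without exec. This is only an illustration of the mismatch (runnable on Python 3); misdecode_as_latin1 is a name invented for the example.

```python
def misdecode_as_latin1(text):
    """Encode text as UTF-8, then decode the resulting bytes with the wrong codec."""
    return text.encode("utf-8").decode("latin-1")


original = u"tiþ"
garbled = misdecode_as_latin1(original)

# 'þ' becomes two bytes in UTF-8, and latin1 maps every byte to one character,
# so the three-character string comes back as four characters.
print(len(original), len(garbled))  # prints: 3 4
```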
On the other hand, passing code as unicode treats bytes literals as if they're encoded in UTF-8. This can cause surprises on Windows if you do it with code from the Windows cmd console, but for a GUI application that's passing you unicode, it's almost certainly the right thing to do. In addition, non-ASCII characters in bytes literals are illegal in Python 3 (you have to use \xd9-style escapes), which supports this approach.
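Here is a small Python 3 sketch of the unicode route: the source is handed to compile() as text, and a non-ASCII bytes literal is rejected at compile time. The helpers run_text_source and compiles, and the "<console>" filename, are placeholders made up for the example, not anything from Spyder or IPython.

```python
import contextlib
import io


def run_text_source(source):
    """Compile and exec source given as text; return whatever it printed."""
    captured = io.StringIO()
    with contextlib.redirect_stdout(captured):
        exec(compile(source, "<console>", "exec"), {})
    return captured.getvalue()


def compiles(source):
    """Return True if the text compiles, False on a SyntaxError."""
    try:
        compile(source, "<console>", "exec")
    except SyntaxError:
        return False
    return True


print(run_text_source(u"print(len(u'tiþ'))"))  # prints: 3
print(compiles(u"b'ti\\xfe'"))                 # escaped byte: True
print(compiles(u"b'tiþ'"))                     # raw non-ASCII in a bytes literal: False
```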
I hope that makes sense. I had to work this out for IPython, because our unicode handling was broken in this way for a very long time, so I'm happy to explain more if it's not clear.
Thanks a lot for the thorough explanation Thomas! I decided to leave everything (inputs and outputs) in unicode because this is a GUI app, as you said, so we can afford that luxury :-)
I'll create a pull request soon so you can take a quick look at it and/or test it.
From ccordoba12 on 2014-10-11T11:51:28Z
This comes from this SO question: http://stackoverflow.com/questions/26312400/what-exactly-does-spyder-do-to-unicode-strings
If I understand things correctly, the problem is that the console is not accepting text (through stdin) with the right encoding, which is why we produce this result (when writing these lines in the console)
whereas in a terminal Python interpreter people get
Original issue: http://code.google.com/p/spyderlib/issues/detail?id=2004