Open sys.stdin with the right encoding in the console for Python 2 #2004

Closed
spyder-bot opened this issue Feb 17, 2015 · 3 comments

Comments

@spyder-bot
Collaborator

From ccordoba12 on 2014-10-11T11:51:28Z

This comes from this Stack Overflow question: http://stackoverflow.com/questions/26312400/what-exactly-does-spyder-do-to-unicode-strings. If I understand things correctly, the problem is that the console is not accepting text (through stdin) with the right encoding, which is why we produce this result when these lines are typed in the console:

len('tiθ')
4
len(u'tiθ')
4

whereas in a terminal Python interpreter people get

len('tiθ')
4
len(u'tiθ')
3

After further inspection, the results for the stdin encoding are

sys.stdin.encoding => None (Spyder console and also IPython qtconsole and two-process console)
sys.stdin.encoding => 'UTF-8' (Python console)

According to Martijn Pieters, in a comment on that post directed to me: "you'll have to study up on how to open sys.stdin for your console with the same encoding used as the GUI input source".

One possible solution would be to set the PYTHONIOENCODING env var to "UTF-8" in our sitecustomize. But I'm not really sure because that changes sys.stdin.encoding for qtconsole (although we can set it just for our consoles).
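To see what the PYTHONIOENCODING idea does in isolation, here is a minimal sketch that starts a child interpreter with a piped stdin and asks it for `sys.stdin.encoding`. The helper name `child_stdin_encoding` is made up for illustration, and the script assumes it is run on Python 3.7 or later; it is not how Spyder or its sitecustomize launches consoles.

```python
import os
import subprocess
import sys


def child_stdin_encoding(io_encoding=None):
    """Report sys.stdin.encoding as seen by a child interpreter.

    Hypothetical helper: the child gets a piped stdin, roughly like a
    console that is fed by a GUI instead of a terminal.
    """
    env = dict(os.environ)
    # Start from a clean slate so the parent's setting does not leak in.
    env.pop("PYTHONIOENCODING", None)
    if io_encoding is not None:
        env["PYTHONIOENCODING"] = io_encoding
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdin.encoding)"],
        input="",
        capture_output=True,
        text=True,
        env=env,
        check=True,
    )
    return result.stdout.strip()


# With the variable set, the child should report a UTF-8 stdin
# (the exact spelling, 'UTF-8' or 'utf-8', depends on the Python version).
print(child_stdin_encoding("UTF-8"))
```

Because the variable is set only in the child's environment, this also shows how the setting could be limited to our own consoles rather than applied globally.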

Thomas, could you give me a hand with this one? I know you're much more knowledgeable than I am about this kind of thing. It seems like a simple fix, but I'd like to get it right :-) Thanks!

Original issue: http://code.google.com/p/spyderlib/issues/detail?id=2004

@spyder-bot
Collaborator Author

From tak...@gmail.com on 2014-10-11T19:04:27Z

I think the relevant thing is not stdin, but what you're passing to compile() or exec(). Passing either bytes or unicode will work, but if you pass bytes, you can have surprising behaviour in unicode literals, and if you pass unicode, you can have strange behaviour in bytes literals:

Examples from plain Python in a terminal

exec(b"print(len(u'tiþ'))")
4
exec(u"print(len(b'tiþ'))") # b'þ' is not even valid syntax on Python 3
4 # This works correctly on my system, but may do surprising things on Windows

IIRC, passing code as bytes treats unicode literals as if they were decoded from the bytes with the latin1 codec. That's what you're seeing here: your string is encoded as UTF-8 and then decoded as latin1 by the interpreter.
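The round trip described above can be reproduced by hand, without exec(). This is only a sketch of the effect, not of what the interpreter does internally; the variable names are illustrative.

```python
# -*- coding: utf-8 -*-
text = u"ti\u03b8"                        # the three characters t, i, theta
utf8_bytes = text.encode("utf-8")         # theta becomes two bytes: 0xCE 0xB8
misdecoded = utf8_bytes.decode("latin-1") # each byte becomes one character

print(len(text))        # 3
print(len(utf8_bytes))  # 4
print(len(misdecoded))  # 4
```

The length of 4 reported for the unicode literal in the Spyder console matches the mis-decoded string, where the two UTF-8 bytes of theta are counted as two separate latin1 characters.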

On the other hand, passing code as unicode treats bytes literals as if they're encoded in UTF-8. This can cause surprises on Windows if you do it with code from the Windows cmd console, but for a GUI application that's passing you unicode, it's almost certainly the right thing to do. In addition, non-ascii characters in bytes literals are illegal in Python 3 (you have to use the \xd9 style escapes), which supports this approach.
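The Python 3 side of this argument can be checked with a short sketch. It assumes Python 3, where every source string passed to compile() is already unicode; `try_eval` is a hypothetical helper written for this illustration.

```python
def try_eval(source):
    """Evaluate a unicode source string; return None if it does not compile."""
    try:
        code = compile(source, "<console>", "eval")
    except SyntaxError:
        return None
    return eval(code)


# A text literal containing theta counts characters.
print(try_eval("len('ti\u03b8')"))       # 3
# A non-ASCII character inside a bytes literal is rejected on Python 3.
print(try_eval("len(b'ti\u03b8')"))      # None
# The escaped form is accepted and counts bytes.
print(try_eval(r"len(b'ti\xce\xb8')"))   # 4
```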

I hope that makes sense. I had to work this out for IPython, because our unicode handling was broken in this way for a very long time, so I'm happy to explain more if it's not clear.

@spyder-bot
Collaborator Author

From ccordoba12 on 2014-10-12T17:35:53Z

Thanks a lot for the thorough explanation, Thomas! I decided to leave everything (inputs and outputs) in unicode because this is a GUI app, as you said, so we can afford that luxury :-)

I'll create a pull request soon so you can take a quick look at it and/or test it.

@spyder-bot
Collaborator Author

From ccordoba12 on 2014-10-26T10:04:57Z

This issue was closed by revision fad578ccb1ae .

Status: Fixed
