Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add encoding attribute to OutStream class. #2500

Merged
merged 1 commit into from Oct 23, 2012

Conversation

bfroehle
Copy link
Contributor

This means that sys.stdout.encoding returns something useful instead of raising an AttributeError.

Closes #2220.

@takluyver
Copy link
Member

I wonder if explicitly setting UTF-8 makes more sense than using DEFAULT_ENCODING? As far as I'm aware, the Qt console and the notebook both work in UTF-8 even in Windows.

@bfroehle
Copy link
Contributor Author

Yeah, I have no idea. In write we do use encoding.DEFAULT_ENCODING.

But this is a completely backwards usage in some sense, as in a regular file object, .encoding is used by .write and print to control how unicode text is output.

Since we never encode the text (and instead just keep it with type unicode), I'm not sure encoding is even well-defined.

@ghost
Copy link

ghost commented Oct 22, 2012

My use-case is defining a repr() for objects so that unicode
code points render properly when I work interactively in ipython.

In py2.x repr() must return a str() not unicode().
if a repr returns a unicode obj, repr(obj) tries to encode it using "ascii" resulting in a
UnicodeEncodeError. so repr returning a correctly-encoded string is the only choice.

With the current implementation of write, the encoding to use is exactly
that encoding which write uses to decode strings back to unicode, before
pushing data over the socket.

I think encoding.DEFAULT_ENCODING is the right value to use here,
at least with write as it is.

@takluyver
Copy link
Member

Hmm, but I suspect write() should be decoding using UTF-8:

  1. This means code can output any characters, instead of being restricted to the subset in the current code page.
  2. If the user types print "þø→ŧ", I think decoding as UTF-8 is necessary for the output to match the input.

@ghost
Copy link

ghost commented Oct 22, 2012

True, although if a character is not supported, then it won't display correctly
no matter where the replacement occurs.

@takluyver
Copy link
Member

But the Qt console and the notebook don't have to rely on Windows' character set - both should be able to display arbitrary unicode characters.

@bfroehle
Copy link
Contributor Author

For the following reasons, I suggest that we close this issue and mark #2220 as invalid.

  1. It's not obvious what the appropriate value is.
  2. The value is never used.
  3. It's not a required attribute. "The attribute is read-only and may not be present on all file-like objects. It may also be None, in which case the file uses the system default encoding for converting Unicode strings."

@bfroehle
Copy link
Contributor Author

On the other hand io.TextIOBase.encoding should always exist in Python 3. In that case, we just want to choose something so that stdout.write(unicode_string) and write(unicode_string.encode(stdout.encoding)) are equivalent.

@ghost
Copy link

ghost commented Oct 22, 2012

You just wouldn't want a situtation where sys.stdout.encoding (specifying what write expects) is different
from what the display is capable of, because that would mislead clients about capabilities.

if that's not a possibility, so much the better.

@ghost
Copy link

ghost commented Oct 22, 2012

that would be a problem, since sys.getdefaultencoding() returns ascii,
encoding.py actually tries locale.getpreferedencoding() first.

@ghost
Copy link

ghost commented Oct 22, 2012

'utf-8' is good, I just can't the bit of code that explicitly states that qt and nb always use utf-8 and nothing but.

@takluyver
Copy link
Member

I suggest: @bfroehle , can you change this, and write() to UTF-8. Then we'll want a Windows user to test that it does the right thing with various bits of unicode. @y-p , are you on Windows? Or @jstenar is our unicode-on-Windows expert.

@ghost
Copy link

ghost commented Oct 22, 2012

I have access to a windows machine, would gladly test whatever you'd like.

@takluyver
Copy link
Member

It's rather confusing: when you encode something to utf-8 to return from repr(), OutStream will decode it to unicode to pack it into JSON (all strings in JSON are unicode). In serialising the JSON, it is then encoded once again to UTF-8. The frontend unserialises, again getting unicode strings, which it passes to the display mechanism (Qt or the browser), which should be unicode capable.

@ghost
Copy link

ghost commented Oct 22, 2012

what's confusing about that? :)

@takluyver
Copy link
Member

Perhaps 'convoluted' would be a better word ;-)

This means that `sys.stdout.encoding` returns something useful instead
of raising an AttributeError. 'UTF-8' was chosen because our version of
write actually wants a unicode string (i.e., not `bytes`) and we want
something universal which an encode the entire range of unicode characters.
@bfroehle
Copy link
Contributor Author

@takluyver & @y-p I think this is what you both were suggesting... Let's give this a thorough test in Windows now.

@ghost
Copy link

ghost commented Oct 22, 2012

Will do.

@ghost
Copy link

ghost commented Oct 22, 2012

Test suite passes on 2.7 on win7 (no errors beyond those of current git master)

and this works for me, using qtconsole

import sys 
class A():
    def __repr__(self):
        return u"שלום".encode(sys.stdout.encoding)
A()

שלום

as a sidenote, settting up for tests was pretty painful. no CI server?

@ghost
Copy link

ghost commented Oct 22, 2012

and of course:

sys.stdout.encoding == 'UTF-8'

@minrk
Copy link
Member

minrk commented Oct 22, 2012

as a sidenote, settting up for tests was pretty painful. no CI server?

As usual, Windows is dramatically more painful than other platforms, principally because there are no IPython devs who use it. That said, I have found that easy_install nose was the only step I have ever had to take to be ready to test on my Windows VMs.

And we do have two CI environments running. Travis automatically runs all PRs (as seen in the 'Good to merge' notes on this and every PR), and we also have ShiningPanda, which recently started Windows tests (again, much more painful to do CI on Windows than other platforms).

@bfroehle
Copy link
Contributor Author

Before we merge this, we should note that this pull request brings us almost back to before 19d5c41 ("Handle unicode properly in IPython.zmq.iostream") (pull request #534, issue #529). @y-p those linked issues should give you a bunch of additional test cases to try out.

@ghost
Copy link

ghost commented Oct 22, 2012

@minrk, python, then setuptools (from web), then pip (that's what I use on linux),then nose, then setup.py develop, then all the deps by hand, then back to easy_install because some deps where egg-only, and then more binaries from
the web because the machine doesn't have a compiler installed. I obviously went off the well-traveled path somewhere.

@ghost
Copy link

ghost commented Oct 22, 2012

replicating the tests in #529:

In [7]: print u"åäö".encode('utf-8')
åäö

In [8]: print u"åäö"
åäö

In [5]: "åäö"
Out[5]: '\xc3\xa5\xc3\xa4\xc3\xb6'

for completeness

In [2]: u"åäö".encode("utf-8").decode("cp850")
Out[2]: u'\u251c\xd1\u251c\xf1\u251c\xc2'

In [3]: print u"åäö".encode("utf-8").decode("cp850")
├ÑñÂ

In [4]: print u"åäö".encode('cp850')
���

In [6]: print u"åäö".encode('cp1252')
���

where � is the unicode replacement symbol, which windows mangles for some reason, but
renders as the usual blotchy-question-mark in qtconsole.

@takluyver
Copy link
Member

@y-p : Could you also test in the qtconsole print "שלוםåäö" (without the u prefix)? I know that doesn't behave correctly in the terminal, but I think with this test it should work in the Qt console and the notebook.

Our Windows CI runs are currently failing, unfortunately. If you've got spare time, could you test #2478, which should address the test failure.

@ghost
Copy link

ghost commented Oct 23, 2012

In notebook.

In [16]: print "שלוםåäö"
שלוםåäö

In qtconsole, the display is right aligned because it begins with RTL script,
but that's not new to this patch. other then that it prints fine.

@takluyver
Copy link
Member

OK, then I think this is all OK. @bfroehle , anything else you want to check? If not, let's land this.

@bfroehle
Copy link
Contributor Author

Okay, hope it doesn't break anything! :) Merging now.

bfroehle added a commit that referenced this pull request Oct 23, 2012
Add `encoding` attribute to `OutStream` class.
@bfroehle bfroehle merged commit 2e1736c into ipython:master Oct 23, 2012
minrk added a commit that referenced this pull request Mar 5, 2013
This means that `sys.stdout.encoding` returns something useful instead of raising an AttributeError.

Closes #2220.
takluyver pushed a commit to takluyver/pandas that referenced this pull request Nov 19, 2013
when redirected to a file stdout is None, in ipython zmq frontends
stdin is None, and stdout (only recently in 0.14) is utf-8.

ipython/ipython#2500
mattvonrocketstein pushed a commit to mattvonrocketstein/ipython that referenced this pull request Nov 3, 2014
Add `encoding` attribute to `OutStream` class.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

No sys.stdout.encoding in kernel based IPython
3 participants