Add `encoding` attribute to `OutStream` class. #2500

bfroehle · 2012-10-17T22:11:10Z

This means that sys.stdout.encoding returns something useful instead of raising an AttributeError.

Closes #2220.

takluyver · 2012-10-17T22:20:54Z

I wonder if explicitly setting UTF-8 makes more sense than using DEFAULT_ENCODING? As far as I'm aware, the Qt console and the notebook both work in UTF-8 even in Windows.

bfroehle · 2012-10-18T00:13:44Z

Yeah, I have no idea. In write we do use encoding.DEFAULT_ENCODING.

But this is a completely backwards usage in some sense, as in a regular file object, .encoding is used by .write and print to control how unicode text is output.

Since we never encode the text (and instead just keep it with type unicode), I'm not sure encoding is even well-defined.

ghost · 2012-10-22T15:19:53Z

My use-case is defining a repr() for objects so that unicode
code points render properly when I work interactively in ipython.

In py2.x repr() must return a str() not unicode().
if a repr returns a unicode obj, repr(obj) tries to encode it using "ascii" resulting in a
UnicodeEncodeError. so repr returning a correctly-encoded string is the only choice.

With the current implementation of write, the encoding to use is exactly
that encoding which write uses to decode strings back to unicode, before
pushing data over the socket.

I think encoding.DEFAULT_ENCODING is the right value to use here,
at least with write as it is.

takluyver · 2012-10-22T16:02:41Z

Hmm, but I suspect write() should be decoding using UTF-8:

This means code can output any characters, instead of being restricted to the subset in the current code page.
If the user types print "þø→ŧ", I think decoding as UTF-8 is necessary for the output to match the input.

ghost · 2012-10-22T16:13:32Z

True, although if a character is not supported, then it won't display correctly
no matter where the replacement occurs.

takluyver · 2012-10-22T16:19:02Z

But the Qt console and the notebook don't have to rely on Windows' character set - both should be able to display arbitrary unicode characters.

bfroehle · 2012-10-22T16:32:42Z

For the following reasons, I suggest that we close this issue and mark #2220 as invalid.

It's not obvious what the appropriate value is.
The value is never used.
It's not a required attribute. "The attribute is read-only and may not be present on all file-like objects. It may also be None, in which case the file uses the system default encoding for converting Unicode strings."

bfroehle · 2012-10-22T16:35:29Z

On the other hand io.TextIOBase.encoding should always exist in Python 3. In that case, we just want to choose something so that stdout.write(unicode_string) and write(unicode_string.encode(stdout.encoding)) are equivalent.

ghost · 2012-10-22T16:35:50Z

You just wouldn't want a situtation where sys.stdout.encoding (specifying what write expects) is different
from what the display is capable of, because that would mislead clients about capabilities.

if that's not a possibility, so much the better.

ghost · 2012-10-22T16:36:11Z

that would be a problem, since sys.getdefaultencoding() returns ascii,
encoding.py actually tries locale.getpreferedencoding() first.

ghost · 2012-10-22T16:40:21Z

'utf-8' is good, I just can't the bit of code that explicitly states that qt and nb always use utf-8 and nothing but.

takluyver · 2012-10-22T16:41:36Z

I suggest: @bfroehle , can you change this, and write() to UTF-8. Then we'll want a Windows user to test that it does the right thing with various bits of unicode. @y-p , are you on Windows? Or @jstenar is our unicode-on-Windows expert.

ghost · 2012-10-22T16:42:15Z

I have access to a windows machine, would gladly test whatever you'd like.

takluyver · 2012-10-22T16:45:24Z

It's rather confusing: when you encode something to utf-8 to return from repr(), OutStream will decode it to unicode to pack it into JSON (all strings in JSON are unicode). In serialising the JSON, it is then encoded once again to UTF-8. The frontend unserialises, again getting unicode strings, which it passes to the display mechanism (Qt or the browser), which should be unicode capable.

ghost · 2012-10-22T16:46:17Z

what's confusing about that? :)

takluyver · 2012-10-22T16:47:52Z

Perhaps 'convoluted' would be a better word ;-)

This means that `sys.stdout.encoding` returns something useful instead of raising an AttributeError. 'UTF-8' was chosen because our version of write actually wants a unicode string (i.e., not `bytes`) and we want something universal which an encode the entire range of unicode characters.

bfroehle · 2012-10-22T17:17:55Z

@takluyver & @y-p I think this is what you both were suggesting... Let's give this a thorough test in Windows now.

ghost · 2012-10-22T17:24:08Z

Will do.

ghost · 2012-10-22T19:54:28Z

Test suite passes on 2.7 on win7 (no errors beyond those of current git master)

and this works for me, using qtconsole

import sys 
class A():
    def __repr__(self):
        return u"שלום".encode(sys.stdout.encoding)
A()

שלום

as a sidenote, settting up for tests was pretty painful. no CI server?

ghost · 2012-10-22T19:58:23Z

and of course:

sys.stdout.encoding == 'UTF-8'

minrk · 2012-10-22T20:13:31Z

as a sidenote, settting up for tests was pretty painful. no CI server?

As usual, Windows is dramatically more painful than other platforms, principally because there are no IPython devs who use it. That said, I have found that easy_install nose was the only step I have ever had to take to be ready to test on my Windows VMs.

And we do have two CI environments running. Travis automatically runs all PRs (as seen in the 'Good to merge' notes on this and every PR), and we also have ShiningPanda, which recently started Windows tests (again, much more painful to do CI on Windows than other platforms).

bfroehle · 2012-10-22T20:22:45Z

Before we merge this, we should note that this pull request brings us almost back to before 19d5c41 ("Handle unicode properly in IPython.zmq.iostream") (pull request #534, issue #529). @y-p those linked issues should give you a bunch of additional test cases to try out.

ghost · 2012-10-22T20:34:21Z

@minrk, python, then setuptools (from web), then pip (that's what I use on linux),then nose, then setup.py develop, then all the deps by hand, then back to easy_install because some deps where egg-only, and then more binaries from
the web because the machine doesn't have a compiler installed. I obviously went off the well-traveled path somewhere.

ghost · 2012-10-22T20:45:18Z

replicating the tests in #529:

In [7]: print u"åäö".encode('utf-8')
åäö

In [8]: print u"åäö"
åäö

In [5]: "åäö"
Out[5]: '\xc3\xa5\xc3\xa4\xc3\xb6'

for completeness

In [2]: u"åäö".encode("utf-8").decode("cp850")
Out[2]: u'\u251c\xd1\u251c\xf1\u251c\xc2'

In [3]: print u"åäö".encode("utf-8").decode("cp850")
├Ñ├ñ├Â

In [4]: print u"åäö".encode('cp850')
���

In [6]: print u"åäö".encode('cp1252')
���

where � is the unicode replacement symbol, which windows mangles for some reason, but
renders as the usual blotchy-question-mark in qtconsole.

takluyver · 2012-10-22T23:20:42Z

@y-p : Could you also test in the qtconsole print "שלוםåäö" (without the u prefix)? I know that doesn't behave correctly in the terminal, but I think with this test it should work in the Qt console and the notebook.

Our Windows CI runs are currently failing, unfortunately. If you've got spare time, could you test #2478, which should address the test failure.

ghost · 2012-10-23T05:36:31Z

In notebook.

In [16]: print "שלוםåäö"
שלוםåäö

In qtconsole, the display is right aligned because it begins with RTL script,
but that's not new to this patch. other then that it prints fine.

takluyver · 2012-10-23T08:29:41Z

OK, then I think this is all OK. @bfroehle , anything else you want to check? If not, let's land this.

bfroehle · 2012-10-23T16:31:32Z

Okay, hope it doesn't break anything! :) Merging now.

Add `encoding` attribute to `OutStream` class.

This means that `sys.stdout.encoding` returns something useful instead of raising an AttributeError. Closes #2220.

when redirected to a file stdout is None, in ipython zmq frontends stdin is None, and stdout (only recently in 0.14) is utf-8. ipython/ipython#2500

Add `encoding` attribute to `OutStream` class.

bfroehle added a commit that referenced this pull request Oct 23, 2012

Merge pull request #2500 from bfroehle/outstream_encoding

2e1736c

Add `encoding` attribute to `OutStream` class.

bfroehle merged commit 2e1736c into ipython:master Oct 23, 2012

bfroehle mentioned this pull request Oct 23, 2012

No sys.stdout.encoding in kernel based IPython #2220

Closed

ghost mentioned this pull request Nov 3, 2012

PR: adding a core.config module to hold package-wide configurables pandas-dev/pandas#2097

Merged

minrk added a commit that referenced this pull request Mar 5, 2013

Backport PR #2500: Add encoding attribute to OutStream class.

f635516

This means that `sys.stdout.encoding` returns something useful instead of raising an AttributeError. Closes #2220.

minrk mentioned this pull request Mar 27, 2013

qtconsole breaks doctest.testmod() in Python 3.3 #3091

Closed

jrioux mentioned this pull request Apr 16, 2013

Wrong sys.stdout.encoding in ipython kernel-based console. #3187

Closed

mattvonrocketstein pushed a commit to mattvonrocketstein/ipython that referenced this pull request Nov 3, 2014

Merge pull request ipython#2500 from bfroehle/outstream_encoding

717d7ae

Add `encoding` attribute to `OutStream` class.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add `encoding` attribute to `OutStream` class. #2500

Add `encoding` attribute to `OutStream` class. #2500

bfroehle commented Oct 17, 2012

takluyver commented Oct 17, 2012

bfroehle commented Oct 18, 2012

ghost commented Oct 22, 2012

takluyver commented Oct 22, 2012

ghost commented Oct 22, 2012

takluyver commented Oct 22, 2012

bfroehle commented Oct 22, 2012

bfroehle commented Oct 22, 2012

ghost commented Oct 22, 2012

ghost commented Oct 22, 2012

ghost commented Oct 22, 2012

takluyver commented Oct 22, 2012

ghost commented Oct 22, 2012

takluyver commented Oct 22, 2012

ghost commented Oct 22, 2012

takluyver commented Oct 22, 2012

bfroehle commented Oct 22, 2012

ghost commented Oct 22, 2012

ghost commented Oct 22, 2012

ghost commented Oct 22, 2012

minrk commented Oct 22, 2012

bfroehle commented Oct 22, 2012

ghost commented Oct 22, 2012

ghost commented Oct 22, 2012

takluyver commented Oct 22, 2012

ghost commented Oct 23, 2012

takluyver commented Oct 23, 2012

bfroehle commented Oct 23, 2012

Add encoding attribute to OutStream class. #2500

Add encoding attribute to OutStream class. #2500

Conversation

bfroehle commented Oct 17, 2012

takluyver commented Oct 17, 2012

bfroehle commented Oct 18, 2012

ghost commented Oct 22, 2012

takluyver commented Oct 22, 2012

ghost commented Oct 22, 2012

takluyver commented Oct 22, 2012

bfroehle commented Oct 22, 2012

bfroehle commented Oct 22, 2012

ghost commented Oct 22, 2012

ghost commented Oct 22, 2012

ghost commented Oct 22, 2012

takluyver commented Oct 22, 2012

ghost commented Oct 22, 2012

takluyver commented Oct 22, 2012

ghost commented Oct 22, 2012

takluyver commented Oct 22, 2012

bfroehle commented Oct 22, 2012

ghost commented Oct 22, 2012

ghost commented Oct 22, 2012

ghost commented Oct 22, 2012

minrk commented Oct 22, 2012

bfroehle commented Oct 22, 2012

ghost commented Oct 22, 2012

ghost commented Oct 22, 2012

takluyver commented Oct 22, 2012

ghost commented Oct 23, 2012

takluyver commented Oct 23, 2012

bfroehle commented Oct 23, 2012

Add `encoding` attribute to `OutStream` class. #2500

Add `encoding` attribute to `OutStream` class. #2500