

#769 (reopened) #770

merged 3 commits into from

3 participants

IPython member

reopens #769 with further fixes

Ensures all replies from ipkernel are clean for json (not just oinfo), and guesses stdin.encoding before using sys.getdefaultencoding in json_clean.

IPython member

@takluyver, does this seem more reasonable for the json_clean?

json_clean now performs the same encoding check used elsewhere (which still fails to find an answer in plenty of situations), and the kernel now cleans more than just oinfo messages.
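To make the idea of "cleaning a reply for JSON" concrete, here is a minimal sketch written for Python 3. It is not IPython's json_clean; the name json_clean_sketch, the encoding parameter, and the sample reply are assumptions made for illustration. It decodes bytes with errors replaced, turns containers into lists, and falls back to repr() for anything else.

```python
import json


def json_clean_sketch(obj, encoding="utf-8"):
    """Return a copy of obj that json.dumps can serialize (illustrative only)."""
    if isinstance(obj, bytes):
        # Undecodable bytes become U+FFFD instead of raising.
        return obj.decode(encoding, "replace")
    if isinstance(obj, (str, int, float, bool, type(None))):
        return obj
    if isinstance(obj, dict):
        # JSON object keys must be strings.
        return {
            str(json_clean_sketch(k, encoding)): json_clean_sketch(v, encoding)
            for k, v in obj.items()
        }
    if isinstance(obj, (list, tuple, set, frozenset)):
        return [json_clean_sketch(item, encoding) for item in obj]
    # Last resort for arbitrary objects.
    return repr(obj)


# Hypothetical reply content containing raw bytes and a tuple.
reply = {"status": "ok", "matches": (b"caf\xc3\xa9", "abc"), "count": 2}
cleaned = json_clean_sketch(reply)
encoded = json.dumps(cleaned)
```

Cleaning every reply just before it is sent, rather than only one message type, means a stray bytes value anywhere in the content cannot break serialization.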

IPython member

@takluyver perhaps we should have an IPython.zmq.defaultencoding that starts out as just sys.stdin.encoding or sys.getdefaultencoding(), but people can change its value. Since the kernel can be started without there ever being a terminal associated with it (e.g. as a GUI script), it makes sense for there to be somewhere to store an encoding that should be used to interpret bytes. I think there just isn't a reliable way for us to always get the right answer, and when that's the case, it makes sense to let advanced users make the choice.
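The suggestion above could look roughly like the following Python 3 sketch. Nothing here exists in IPython: the _settings dict and the two function names are hypothetical, and the guess is deliberately limited to the two sources named in the comment.

```python
import codecs
import sys

# Hypothetical module-level storage for a user-chosen encoding.
_settings = {"encoding": None}


def set_default_encoding(name):
    """Let an advanced user pin the encoding used to interpret bytes."""
    codecs.lookup(name)  # raises LookupError for an unknown encoding name
    _settings["encoding"] = name


def get_default_encoding():
    """Return the user's choice if set, else a guess from the environment."""
    if _settings["encoding"]:
        return _settings["encoding"]
    # sys.stdin can be None when no terminal is attached (e.g. a GUI script).
    stdin_encoding = getattr(sys.stdin, "encoding", None)
    return stdin_encoding or sys.getdefaultencoding()


set_default_encoding("latin-1")
decoded = b"\xe9".decode(get_default_encoding())
```

Validating the name at assignment time keeps a typo from surfacing later as a decode error far from where the setting was made.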

IPython member

It seems like locale.getpreferredencoding() is a less conservative choice for fallback than sys.getdefaultencoding(). Perhaps we should use that.
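A short Python 3 illustration of why the conservative fallback matters. The sample bytes are made up for the example. In the Python 2 of this thread, sys.getdefaultencoding() is usually ASCII, so non-ASCII bytes are lost when decoded with it; in Python 3 it is UTF-8, so the ASCII case is spelled out explicitly here.

```python
import locale
import sys

# UTF-8 bytes such as a kernel might receive from a subprocess.
raw = "caf\u00e9".encode("utf-8")

# Decoding with an ASCII default and errors='replace' loses the accented
# character: each non-ASCII byte becomes U+FFFD.
conservative = raw.decode("ascii", "replace")

# The locale-based guess; False means "do not call setlocale()".
preferred = locale.getpreferredencoding(False)

print(repr(conservative))
print(sys.getdefaultencoding(), preferred)
```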

IPython member

updated with IPython.utils.text.getdefaultencoding(), which also fixes the issue described in #775 (at least on my OSX machine).

IPython member

Confirming that the issue described in #775 is fixed by this PR.

IPython member

Confirmed as closing #768

minrk added some commits
@minrk minrk ensure replies from ipkernel are clean for JSON 5a3b97e
@minrk minrk add text.getdefaultencoding() for central default encoding guess
This is a central location for the many places we call sys.stdin.encoding or sys.getdefaultencoding(), which
now adds locale.getpreferredencoding(False) after stdin.encoding,
which should be a better guess when stdin.encoding is None.
@minrk minrk json_clean zmqshell replies
closes gh-535
IPython member

Just a note, Min: this one doesn't seem to solve the weird errors on %debug in the console I was mentioning today. To reproduce those:

  1. start a notebook, type anything in a cell that would cause an exception (but from code executed in the cell, not by using %run).

  2. open a qt console to the notebook's kernel

  3. type %debug in the console

At 3, I see little triangle junk characters in the traceback printout. If the traceback was generated from %run, there's no problem.

IPython member

@fperez, too bad it's not this. Can you get info on the characters that are being printed? I should note that I can't actually reproduce what you describe by following your instructions (with a 1/0 error).

IPython member

Weird, here's a screenshot of what I get:

I've tried it with a few different fonts in the Qt widget and I get the same thing, so it doesn't seem to be font-related... Any ideas? I can hop on irc if you want...

IPython member


I got to my Ubuntu machine today, and I can reproduce what you describe without ever invoking a notebook. Just a single qtconsole, raise an error and invoke %debug, and I see the weird triangles. It happens with every invocation of %debug in the qtconsole, no need for multiple clients. And it's reproducible all the way back to the 0.11 release, so it has nothing to do with recent unicode fixes like I thought.

This is using PyQt4 (Ubuntu 10.04 LTS, qt4/pyqt4 from apt: PyQt4 4.7.2, Qt 4.6.2).

As we discussed on IRC, this only affects PyQt, and not PySide, and the reason it appears new is that your PySide is 1.0.0, and PR #725 made the minimum PySide version 1.0.3, effectively switching your default from PySide to PyQt.

IPython member

Yup, you're right, I see it too. At least it's good to know it's an old problem we simply hadn't noticed and not something we broke recently... I'll open an issue and ping Evan about it. Thanks for the extra info!

IPython member

I added debug statements to the frontend, and I can see the difference between the normal traceback sent and the one that's drawn wrong: null characters. Each triangle corresponds to a null (\x00) char that is somehow added among the color codes, only in the source line, and only from ipdb.

Raising the error with 222/0, the line in the traceback is:
\x1b[0;32m----> 1\x1b[0;31m \x1b[0;36m222\x1b[0m\x1b[0;34m/\x1b[0m\x1b[0;36m0\x1b[0m\x1b[0;34m\x1b[0m\x1b[0m

Whereas the same line, colored by ipdb is:
\x1b[0;32m -1 \x1b[0;31m\x1b[0;36m2\x1b[0m\x1b[0;31m\x00\x1b[0m\x1b[0;36m2\x1b[0m\x1b[0;31m\x00\x1b[0m\x1b[0;36m2\x1b[0m\x1b[0;31m\x00\x1b[0m\x1b[0;34m/\x1b[0m\x1b[0;31m\x00\x1b[0m\x1b[0;36m0\x1b[0m\x1b[0;31m\x00\x1b[0m\x1b[0;34m\x1b[0m\x1b[0m

It would appear that PySide just ignores the \x00 null chars, whereas PyQt draws them as triangles. You can see this by simply doing print '\x00'. You will see a triangle with pyqt, and nothing in every other context I can find.

So this really seems like a bug in ipdb, just one that doesn't actually matter anywhere but in a pyqt-console.
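The kind of check described above can be sketched in Python 3 as follows. The regular expression and the helper name visible_text are assumptions for illustration, and the two sample strings are shortened excerpts of the traceback lines quoted above. Stripping the color codes makes the stray null characters easy to spot.

```python
import re

# Matches ANSI color sequences such as "\x1b[0;36m" and "\x1b[0m".
ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")


def visible_text(colored):
    """Remove color codes, leaving only what a widget would try to draw."""
    return ANSI_SGR.sub("", colored)


good = "\x1b[0;36m222\x1b[0m\x1b[0;34m/\x1b[0m\x1b[0;36m0\x1b[0m"
bad = (
    "\x1b[0;36m2\x1b[0m\x1b[0;31m\x00\x1b[0m"
    "\x1b[0;34m/\x1b[0m\x1b[0;31m\x00\x1b[0m"
    "\x1b[0;36m0\x1b[0m\x1b[0;31m\x00\x1b[0m"
)

null_count = visible_text(bad).count("\x00")
repaired = visible_text(bad).replace("\x00", "")
```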

IPython member

I should clarify: this is not a bug in ipdb, but rather a bug in PyColorize, which doesn't like unicode input. See the output of:

In [20]: from IPython.utils import PyColorize
    ...: p = PyColorize.Parser()

In [21]: p.format('5', 'str')
Out[21]: '\x1b[0;36m5\x1b[0m\x1b[0;34m\x1b[0m\x1b[0m\n'

In [22]: p.format(u'5', 'str')
Out[22]: '\x1b[0;36m5\x1b[0m\x1b[0;31m\x00\x1b[0m\x1b[0;34m\x1b[0m\x1b[0m\n'
IPython member

Nailed, thanks! Now at least we know where the problem is coming from...

I'll have a go at it now.

IPython member

Ah, easy fix: use StringIO instead of cStringIO. StringIO is unicode aware, but cStringIO is not. Thomas discovered this, and has fixed it in some other parts of the code, if I recall correctly. Should we just find/replace all cStringIOs (there are still a few)?
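For readers on Python 3, where neither StringIO nor cStringIO exists as a module, the same text-versus-bytes distinction is made by io.StringIO and io.BytesIO. The sketch below is only an analogy to the fix described above, not the change made in IPython; capture_text is a hypothetical helper.

```python
import io


def capture_text(pieces):
    """Collect text fragments in a unicode-aware in-memory buffer."""
    buf = io.StringIO()
    for piece in pieces:
        buf.write(piece)
    return buf.getvalue()


# Color codes and text pass through unchanged, with no stray characters.
colored = capture_text(["\x1b[0;36m", "5", "\x1b[0m"])

# A byte buffer refuses text outright instead of silently mangling it.
try:
    io.BytesIO().write("5")
    rejected = False
except TypeError:
    rejected = True
```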

IPython member

Back to actually discussing this PR, sorry for hijacking things with the other bug...

This looks good and the right thing to do. Min, thanks for the work! I'll merge now.

@fperez fperez merged commit ede7936 into ipython:master
Commits on Sep 9, 2011
  1. @minrk
  2. @minrk

    add text.getdefaultencoding() for central default encoding guess

    minrk committed
    This is a central location for the many places we call sys.stdin.encoding or sys.getdefaultencoding(), which
    now adds locale.getpreferredencoding(False) after stdin.encoding,
    which should be a better guess when stdin.encoding is None.
  3. @minrk

    json_clean zmqshell replies

    minrk committed
    closes gh-535
7 IPython/config/
@@ -24,7 +24,7 @@
from IPython.external import argparse
from IPython.utils.path import filefind, get_ipython_dir
-from IPython.utils import py3compat, warn
+from IPython.utils import py3compat, text, warn
# Exceptions
@@ -425,7 +425,7 @@ def _decode_argv(self, argv, enc=None):
"""decode argv if bytes, using stin.encoding, falling back on default enc"""
uargv = []
if enc is None:
- enc = sys.stdin.encoding or sys.getdefaultencoding()
+ enc = text.getdefaultencoding()
for arg in argv:
if not isinstance(arg, unicode):
# only decode if not already decoded
@@ -586,7 +586,8 @@ def _add_arguments(self, aliases=None, flags=None):
def _parse_args(self, args):
# decode sys.argv to support unicode command-line options
- uargs = [py3compat.cast_unicode(a) for a in args]
+ enc = text.getdefaultencoding()
+ uargs = [py3compat.cast_unicode(a, enc) for a in args]
self.parsed_data, self.extra_args = self.parser.parse_known_args(uargs)
def _convert_to_config(self):
3 IPython/utils/
@@ -23,6 +23,7 @@
# our own imports
from ._process_common import read_no_interrupt, process_handler
+from . import text
# Function definitions
@@ -88,7 +89,7 @@ def _find_cmd(cmd):
def _system_body(p):
"""Callback for _system."""
- enc = sys.stdin.encoding or sys.getdefaultencoding()
+ enc = text.getdefaultencoding()
for line in read_no_interrupt(p.stdout).splitlines():
line = line.decode(enc, 'replace')
print(line, file=sys.stdout)
3 IPython/utils/
@@ -17,6 +17,7 @@
from datetime import datetime
from IPython.utils import py3compat
+from IPython.utils import text
next_attr_name = '__next__' if py3compat.PY3 else 'next'
@@ -134,7 +135,7 @@ def json_clean(obj):
return obj
if isinstance(obj, bytes):
- return obj.decode(sys.getdefaultencoding(), 'replace')
+ return obj.decode(text.getdefaultencoding(), 'replace')
if isinstance(obj, container_to_list) or (
hasattr(obj, '__iter__') and hasattr(obj, next_attr_name)):
24 IPython/utils/
@@ -16,9 +16,11 @@
import __main__
+import locale
import os
import re
import shutil
+import sys
import textwrap
from string import Formatter
@@ -31,6 +33,28 @@
# Code
+# Less conservative replacement for sys.getdefaultencoding, that will try
+# to match the environment.
+# Defined here as central function, so if we find better choices, we
+# won't need to make changes all over IPython.
+def getdefaultencoding():
+ """Return IPython's guess for the default encoding for bytes as text.
+ Asks for stdin.encoding first, to match the calling Terminal, but that
+ is often None for subprocesses. Fall back on locale.getpreferredencoding()
+ which should be a sensible platform default (that respects LANG environment),
+ and finally to sys.getdefaultencoding() which is the most conservative option,
+ and usually ASCII.
+ """
+ enc = sys.stdin.encoding
+ if not enc:
+ try:
+ # There are reports of getpreferredencoding raising errors
+ # in some cases, which may well be fixed, but let's be conservative here.
+ enc = locale.getpreferredencoding(False)
+ except Exception:
+ pass
+ return enc or sys.getdefaultencoding()
def unquote_ends(istr):
"""Remove a single pair of quotes from the endpoints of a string."""
4 IPython/zmq/
@@ -4,7 +4,7 @@
from session import extract_header, Message
-from IPython.utils import io
+from IPython.utils import io, text
# Globals
@@ -69,7 +69,7 @@ def write(self, string):
# Make sure that we're handling unicode
if not isinstance(string, unicode):
- enc = sys.stdin.encoding or sys.getdefaultencoding()
+ enc = text.getdefaultencoding()
string = string.decode(enc, 'replace')
5 IPython/zmq/
@@ -303,6 +303,7 @@ def execute_request(self, ident, parent):
# Send the reply.
+ reply_content = json_clean(reply_content)
reply_msg = self.session.send(self.shell_socket, u'execute_reply',
reply_content, parent, ident=ident)
@@ -321,6 +322,7 @@ def complete_request(self, ident, parent):
matches = {'matches' : matches,
'matched_text' : txt,
'status' : 'ok'}
+ matches = json_clean(matches)
completion_msg = self.session.send(self.shell_socket, 'complete_reply',
matches, parent, ident)
@@ -358,6 +360,7 @@ def history_request(self, ident, parent):
hist = []
content = {'history' : list(hist)}
+ content = json_clean(content)
msg = self.session.send(self.shell_socket, 'history_reply',
content, parent, ident)
@@ -409,7 +412,7 @@ def _raw_input(self, prompt, ident, parent):
# Send the input request.
- content = dict(prompt=prompt)
+ content = json_clean(dict(prompt=prompt))
msg = self.session.send(self.stdin_socket, u'input_request', content, parent)
# Await a response.
5 IPython/zmq/
@@ -30,6 +30,7 @@
from IPython.core.magic import MacroToEdit
from IPython.core.payloadpage import install_payload_page
from IPython.utils import io
+from IPython.utils.jsonutil import json_clean
from IPython.utils.path import get_py_filename
from IPython.utils.traitlets import Instance, Type, Dict, CBool
from IPython.utils.warn import warn
@@ -69,7 +70,7 @@ def publish(self, source, data, metadata=None):
content['data'] = data
content['metadata'] = metadata
- self.pub_socket, u'display_data', content,
+ self.pub_socket, u'display_data', json_clean(content),
@@ -144,7 +145,7 @@ def _showtraceback(self, etype, evalue, stb):
dh = self.displayhook
# Send exception info over pub socket for other clients than the caller
# to pick up
- exc_msg = dh.session.send(dh.pub_socket, u'pyerr', exc_content, dh.parent_header)
+ exc_msg = dh.session.send(dh.pub_socket, u'pyerr', json_clean(exc_content), dh.parent_header)
# FIXME - Hack: store exception info in shell object. Right now, the
# caller is reading this info after the fact, we need to fix this logic