#769 (reopened) #770

Merged
merged 3 commits into from over 2 years ago

3 participants

Min RK Paul Ivanov Fernando Perez
Min RK
Owner

reopens #769 with further fixes

Ensures all replies from ipkernel are clean for json (not just oinfo), and guesses stdin.encoding before using sys.getdefaultencoding in json_clean.
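As a rough illustration of what "clean for json" means here, the sketch below walks a reply and decodes any bytes it finds, so that `json.dumps` will accept the result. It is written for Python 3 and is not the actual `json_clean` from `IPython.utils.jsonutil`; the name `json_clean_sketch` and its `encoding` parameter are made up for this example.

```python
import json
import sys


def json_clean_sketch(obj, encoding=None):
    """Recursively convert obj into something json.dumps can serialise.

    Hypothetical helper for illustration only, not IPython's json_clean.
    """
    if encoding is None:
        encoding = sys.getdefaultencoding()
    # Atomic JSON-compatible values pass through untouched.
    if obj is None or isinstance(obj, (bool, int, float, str)):
        return obj
    # Bytes are decoded; undecodable bytes become U+FFFD instead of raising.
    if isinstance(obj, bytes):
        return obj.decode(encoding, "replace")
    # Dict keys and values are cleaned recursively.
    if isinstance(obj, dict):
        return {
            json_clean_sketch(k, encoding): json_clean_sketch(v, encoding)
            for k, v in obj.items()
        }
    # Other common containers become plain lists.
    if isinstance(obj, (list, tuple, set, frozenset)):
        return [json_clean_sketch(item, encoding) for item in obj]
    # Anything else falls back to its repr.
    return repr(obj)


reply = {"status": "ok", "matches": [b"caf\xc3\xa9", "plain"], "when": (1, 2)}
cleaned = json_clean_sketch(reply, encoding="utf-8")
print(json.dumps(cleaned))
```

Passing every outgoing reply through one function like this keeps the decoding policy in a single place, rather than having each message handler decode on its own.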

Min RK
Owner

@takluyver, does this seem more reasonable for the json_clean?

It now performs the same check as elsewhere, which still fails to get an answer in plenty of situations, and cleans more than just oinfo messages.

Min RK
Owner

@takluyver perhaps we should have an IPython.zmq.defaultencoding that starts out as just sys.stdin.encoding or sys.getdefaultencoding(), but people can change its value. Since the kernel can be started without there ever being a terminal associated with it (e.g. as a GUI script), it makes sense for there to be somewhere to store an encoding that should be used to interpret bytes. I think there just isn't a reliable way for us to always get the right answer, and when that's the case, it makes sense to let advanced users make the choice.
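A minimal sketch of that idea, assuming Python 3: a module-level value that starts out as a guess and can be overridden by advanced users. The names `set_default_encoding` and `get_default_encoding` are hypothetical and do not exist in IPython.

```python
import codecs
import sys

# Hypothetical module-level setting; None means "fall back to a guess".
_override = None


def set_default_encoding(name):
    """Let the user pin the encoding used to interpret bytes (None resets)."""
    global _override
    if name is not None:
        codecs.lookup(name)  # raises LookupError for unknown codec names
    _override = name


def get_default_encoding():
    """Return the user's choice if set, else stdin's encoding, else Python's default."""
    if _override:
        return _override
    # sys.stdin can be None when no terminal is attached (e.g. GUI scripts).
    return getattr(sys.stdin, "encoding", None) or sys.getdefaultencoding()


set_default_encoding("latin-1")
print(b"\xe9".decode(get_default_encoding(), "replace"))
set_default_encoding(None)  # back to guessing
```

Validating the name with `codecs.lookup` at set time means a typo fails immediately, rather than later when some reply is being decoded.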

Min RK
Owner

It seems like locale.getpreferredencoding() is a less conservative choice for fallback than sys.getdefaultencoding(). Perhaps we should use that.
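To compare the candidates on a given machine, something like the following can be run (Python 3, standard library only). The helper names `encoding_candidates` and `first_usable` are invented for this sketch, and the ordering (stdin, then locale, then the interpreter default) is an assumption of the example.

```python
import locale
import sys


def encoding_candidates():
    """Return (label, value) pairs: stdin first, then locale, then sys default."""
    try:
        preferred = locale.getpreferredencoding(False)
    except Exception:
        # Be defensive: treat any failure as "no answer from locale".
        preferred = None
    return [
        ("sys.stdin.encoding", getattr(sys.stdin, "encoding", None)),
        ("locale.getpreferredencoding(False)", preferred),
        ("sys.getdefaultencoding()", sys.getdefaultencoding()),
    ]


def first_usable(candidates):
    """Return the first non-empty value from a list of (label, value) pairs."""
    for _label, value in candidates:
        if value:
            return value
    return None


for label, value in encoding_candidates():
    print(f"{label:40} -> {value!r}")
print("chosen:", first_usable(encoding_candidates()))
```

Running this from a terminal and again from a process with no attached stdin should show where the first candidate drops out and the fallback takes over.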

Min RK
Owner

updated with IPython.utils.text.getdefaultencoding(), which also fixes the issue described in #775 (at least on my OSX machine).

Paul Ivanov
Collaborator

Confirming that the issue described in #775 is fixed by this PR.

Min RK
Owner

Confirmed as closing #768

added some commits September 06, 2011
Min RK ensure replies from ipkernel are clean for JSON 5a3b97e
Min RK add text.getdefaultencoding() for central default encoding guess
This is a central location for the many places we call sys.stdin.encoding or sys.getdefaultencoding(), which
now adds locale.getpreferredencoding(False) after stdin.encoding,
which should be a better guess when stdin.encoding is None.
6392ceb
Min RK json_clean zmqshell replies
closes gh-535
bc4e206
Fernando Perez
Owner

Just a note, Min: this one doesn't seem to solve the weird errors on %debug in the console I was mentioning today. To reproduce those:

  1. start a notebook, type anything in a cell that would cause an exception (but from code executed in the cell, not by using %run).

  2. open a qt console to the notebook's kernel

  3. type %debug in the console

At 3, I see little triangle junk characters in the traceback printout. If the traceback was generated from %run, there's no problem.

Min RK
Owner

@fperez, too bad it's not this. Can you get info on the characters that are being printed? I should note that I can't actually reproduce what you describe by following your instructions (with a 1/0 error).

Fernando Perez
Owner

Weird, here's a screenshot of what I get: http://imgur.com/RWa10

I've tried it with a few different fonts in the Qt widget and I get the same thing, so it doesn't seem to be font-related... Any ideas? I can hop on irc if you want...

Min RK
Owner

@fperez

I got to my Ubuntu machine today, and I can reproduce what you describe without ever invoking a notebook. Just a single qtconsole, raise an error and invoke %debug, and I see the weird triangles. It happens with every invocation of %debug in the qtconsole, no need for multiple clients. And it's reproducible all the way back to the 0.11 release, so it has nothing to do with recent unicode fixes like I thought.

This is using PyQt4 (Ubuntu 10.04 LTS, qt4/pyqt4 from apt: PyQt4 4.7.2, Qt 4.6.2).

As we discussed on IRC, this only affects PyQt, and not PySide, and the reason it appears new is that your PySide is 1.0.0, and PR #725 made the minimum PySide version 1.0.3, effectively switching your default from PySide to PyQt.

Fernando Perez
Owner

Yup, you're right, I see it too. At least it's good to know it's an old problem we simply hadn't noticed and not something we broke recently... I'll open an issue and ping Evan about it. Thanks for the extra info!

Min RK
Owner

I added debug statements to the frontend, and I can see the difference between the normal traceback sent and the one that's drawn wrong: null characters. Each triangle corresponds to a null (\x00) char that is somehow added to the color codes, only in the source line, and only from ipdb.

Raising the error with 222/0, the line in the traceback is:
\x1b[0;32m----> 1\x1b[0;31m \x1b[0;36m222\x1b[0m\x1b[0;34m/\x1b[0m\x1b[0;36m0\x1b[0m\x1b[0;34m\x1b[0m\x1b[0m

Whereas the same line, colored by ipdb is:
\x1b[0;32m -1 \x1b[0;31m\x1b[0;36m2\x1b[0m\x1b[0;31m\x00\x1b[0m\x1b[0;36m2\x1b[0m\x1b[0;31m\x00\x1b[0m\x1b[0;36m2\x1b[0m\x1b[0;31m\x00\x1b[0m\x1b[0;34m/\x1b[0m\x1b[0;31m\x00\x1b[0m\x1b[0;36m0\x1b[0m\x1b[0;31m\x00\x1b[0m\x1b[0;34m\x1b[0m\x1b[0m

It would appear that PySide just ignores the \x00 null chars, whereas PyQt draws them as triangles. You can see this by simply doing print '\x00'. You will see a triangle with pyqt, and nothing in every other context I can find.
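For anyone who wants to check a string for this themselves, here is a small Python 3 sketch that strips the ANSI color codes and counts the nulls left behind. The two sample strings are the traceback lines quoted above; the helper names `visible_text` and `null_report` are made up for this example.

```python
import re

# Matches ANSI SGR color sequences such as "\x1b[0;32m" or "\x1b[0m".
ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")


def visible_text(s):
    """Drop ANSI color codes, leaving only the characters a frontend would draw."""
    return ANSI_SGR.sub("", s)


def null_report(s):
    """Return (number of NUL chars, visible text with the NULs removed)."""
    plain = visible_text(s)
    return plain.count("\x00"), plain.replace("\x00", "")


normal_line = (
    "\x1b[0;32m----> 1\x1b[0;31m \x1b[0;36m222\x1b[0m\x1b[0;34m/\x1b[0m"
    "\x1b[0;36m0\x1b[0m\x1b[0;34m\x1b[0m\x1b[0m"
)
ipdb_line = (
    "\x1b[0;32m -1 \x1b[0;31m\x1b[0;36m2\x1b[0m\x1b[0;31m\x00\x1b[0m"
    "\x1b[0;36m2\x1b[0m\x1b[0;31m\x00\x1b[0m\x1b[0;36m2\x1b[0m\x1b[0;31m\x00\x1b[0m"
    "\x1b[0;34m/\x1b[0m\x1b[0;31m\x00\x1b[0m\x1b[0;36m0\x1b[0m\x1b[0;31m\x00\x1b[0m"
    "\x1b[0;34m\x1b[0m\x1b[0m"
)

print(null_report(normal_line))
print(null_report(ipdb_line))
```

The ipdb-colored line reports one null per source token, which matches the one-triangle-per-token pattern in the screenshot.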

So this really seems like a bug in ipdb, just one that doesn't actually matter anywhere but in a pyqt-console.

Min RK
Owner

I should clarify, not a bug in ipdb, but rather a bug in pycolorize, which doesn't like unicode input. See the output of:

In [20]: from IPython.utils import PyColorize
    ...: p = PyColorize.Parser()

In [21]: p.format('5', 'str')
Out[21]: '\x1b[0;36m5\x1b[0m\x1b[0;34m\x1b[0m\x1b[0m\n'

In [22]: p.format(u'5', 'str')
Out[22]: '\x1b[0;36m5\x1b[0m\x1b[0;31m\x00\x1b[0m\x1b[0;34m\x1b[0m\x1b[0m\n'
Fernando Perez
Owner

Nailed, thanks! Now at least we know where the problem is coming from...

I'll have a go at it now.

Min RK
Owner

Ah, easy fix: use StringIO instead of cStringIO. StringIO is unicode aware, but cStringIO is not. Thomas discovered this, and has fixed it in some other parts of the code, if I recall correctly. Should we just find/replace all cStringIOs (there are still a few)?
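The StringIO and cStringIO modules are Python 2 only, so the exact behaviour can't be reproduced on Python 3. As a loose analogue (an assumption of this sketch, not a claim about how cStringIO works internally), Python 3's `io` module separates the two kinds of buffer explicitly: `io.StringIO` holds text, and `io.BytesIO` refuses text outright instead of storing it incorrectly. The helper `try_write` is made up for the example.

```python
import io


def try_write(buffer, value):
    """Return True if buffer accepts value, False if it raises TypeError."""
    try:
        buffer.write(value)
        return True
    except TypeError:
        return False


text_buf = io.StringIO()
byte_buf = io.BytesIO()

# A text buffer accepts unicode text as-is.
text_ok = try_write(text_buf, "caf\u00e9")
# A bytes buffer rejects text; it must be encoded explicitly first.
bytes_rejects_text = not try_write(byte_buf, "caf\u00e9")
try_write(byte_buf, "caf\u00e9".encode("utf-8"))

print(text_ok, bytes_rejects_text)
print(repr(text_buf.getvalue()), repr(byte_buf.getvalue()))
```

The point carries over to the find/replace question: any buffer that may receive unicode needs to be the text-aware kind.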

Fernando Perez
Owner

Back to actually discussing this PR, sorry for hijacking things with the other bug...

This looks good and the right thing to do. Min, thanks for the work! I'll merge now.

Fernando Perez fperez merged commit ede7936 into from September 12, 2011
Fernando Perez fperez closed this September 12, 2011
Fernando Perez fperez referenced this pull request from a commit January 10, 2012
Commit has since been removed from the repository and is no longer available.

Showing 3 unique commits by 1 author.

Sep 09, 2011
7  IPython/config/loader.py
@@ -24,7 +24,7 @@
 
 from IPython.external import argparse
 from IPython.utils.path import filefind, get_ipython_dir
-from IPython.utils import py3compat, warn
+from IPython.utils import py3compat, text, warn
 
 #-----------------------------------------------------------------------------
 # Exceptions
@@ -425,7 +425,7 @@ def _decode_argv(self, argv, enc=None):
         """decode argv if bytes, using stin.encoding, falling back on default enc"""
         uargv = []
         if enc is None:
-            enc = sys.stdin.encoding or sys.getdefaultencoding()
+            enc = text.getdefaultencoding()
         for arg in argv:
             if not isinstance(arg, unicode):
                 # only decode if not already decoded
@@ -586,7 +586,8 @@ def _add_arguments(self, aliases=None, flags=None):
     def _parse_args(self, args):
         """self.parser->self.parsed_data"""
         # decode sys.argv to support unicode command-line options
-        uargs = [py3compat.cast_unicode(a) for a in args]
+        enc = text.getdefaultencoding()
+        uargs = [py3compat.cast_unicode(a, enc) for a in args]
         self.parsed_data, self.extra_args = self.parser.parse_known_args(uargs)
 
     def _convert_to_config(self):
3  IPython/utils/_process_win32.py
@@ -23,6 +23,7 @@
 
 # our own imports
 from ._process_common import read_no_interrupt, process_handler
+from . import text
 
 #-----------------------------------------------------------------------------
 # Function definitions
@@ -88,7 +89,7 @@ def _find_cmd(cmd):
 
 def _system_body(p):
     """Callback for _system."""
-    enc = sys.stdin.encoding or sys.getdefaultencoding()
+    enc = text.getdefaultencoding()
     for line in read_no_interrupt(p.stdout).splitlines():
         line = line.decode(enc, 'replace')
         print(line, file=sys.stdout)
3  IPython/utils/jsonutil.py
@@ -17,6 +17,7 @@
 from datetime import datetime
 
 from IPython.utils import py3compat
+from IPython.utils import text
 next_attr_name = '__next__' if py3compat.PY3 else 'next'
 
 #-----------------------------------------------------------------------------
@@ -134,7 +135,7 @@ def json_clean(obj):
         return obj
 
     if isinstance(obj, bytes):
-        return obj.decode(sys.getdefaultencoding(), 'replace')
+        return obj.decode(text.getdefaultencoding(), 'replace')
 
     if isinstance(obj, container_to_list) or (
         hasattr(obj, '__iter__') and hasattr(obj, next_attr_name)):
24  IPython/utils/text.py
@@ -16,9 +16,11 @@
 
 import __main__
 
+import locale
 import os
 import re
 import shutil
+import sys
 import textwrap
 from string import Formatter
 
@@ -31,6 +33,28 @@
 # Code
 #-----------------------------------------------------------------------------
 
+# Less conservative replacement for sys.getdefaultencoding, that will try
+# to match the environment.
+# Defined here as central function, so if we find better choices, we
+# won't need to make changes all over IPython.
+def getdefaultencoding():
+    """Return IPython's guess for the default encoding for bytes as text.
+
+    Asks for stdin.encoding first, to match the calling Terminal, but that
+    is often None for subprocesses.  Fall back on locale.getpreferredencoding()
+    which should be a sensible platform default (that respects LANG environment),
+    and finally to sys.getdefaultencoding() which is the most conservative option,
+    and usually ASCII.
+    """
+    enc = sys.stdin.encoding
+    if not enc:
+        try:
+            # There are reports of getpreferredencoding raising errors
+            # in some cases, which may well be fixed, but let's be conservative here.
+            enc = locale.getpreferredencoding(False)
+        except Exception:
+            pass
+    return enc or sys.getdefaultencoding()
 
 def unquote_ends(istr):
     """Remove a single pair of quotes from the endpoints of a string."""
4  IPython/zmq/iostream.py
@@ -4,7 +4,7 @@
 
 from session import extract_header, Message
 
-from IPython.utils import io
+from IPython.utils import io, text
 
 #-----------------------------------------------------------------------------
 # Globals
@@ -69,7 +69,7 @@ def write(self, string):
         else:
             # Make sure that we're handling unicode
             if not isinstance(string, unicode):
-                enc = sys.stdin.encoding or sys.getdefaultencoding()
+                enc = text.getdefaultencoding()
                 string = string.decode(enc, 'replace')
 
             self._buffer.write(string)
5  IPython/zmq/ipkernel.py
@@ -303,6 +303,7 @@ def execute_request(self, ident, parent):
             time.sleep(self._execute_sleep)
 
         # Send the reply.
+        reply_content = json_clean(reply_content)
         reply_msg = self.session.send(self.shell_socket, u'execute_reply',
                                       reply_content, parent, ident=ident)
         self.log.debug(str(reply_msg))
@@ -321,6 +322,7 @@ def complete_request(self, ident, parent):
         matches = {'matches' : matches,
                    'matched_text' : txt,
                    'status' : 'ok'}
+        matches = json_clean(matches)
         completion_msg = self.session.send(self.shell_socket, 'complete_reply',
                                            matches, parent, ident)
         self.log.debug(str(completion_msg))
@@ -358,6 +360,7 @@ def history_request(self, ident, parent):
         else:
             hist = []
         content = {'history' : list(hist)}
+        content = json_clean(content)
         msg = self.session.send(self.shell_socket, 'history_reply',
                                 content, parent, ident)
         self.log.debug(str(msg))
@@ -409,7 +412,7 @@ def _raw_input(self, prompt, ident, parent):
         sys.stdout.flush()
 
         # Send the input request.
-        content = dict(prompt=prompt)
+        content = json_clean(dict(prompt=prompt))
         msg = self.session.send(self.stdin_socket, u'input_request', content, parent)
 
         # Await a response.
5  IPython/zmq/zmqshell.py
@@ -30,6 +30,7 @@
 from IPython.core.magic import MacroToEdit
 from IPython.core.payloadpage import install_payload_page
 from IPython.utils import io
+from IPython.utils.jsonutil import json_clean
 from IPython.utils.path import get_py_filename
 from IPython.utils.traitlets import Instance, Type, Dict, CBool
 from IPython.utils.warn import warn
@@ -69,7 +70,7 @@ def publish(self, source, data, metadata=None):
         content['data'] = data
         content['metadata'] = metadata
         self.session.send(
-            self.pub_socket, u'display_data', content,
+            self.pub_socket, u'display_data', json_clean(content),
             parent=self.parent_header
         )
 
@@ -144,7 +145,7 @@ def _showtraceback(self, etype, evalue, stb):
         dh = self.displayhook
         # Send exception info over pub socket for other clients than the caller
         # to pick up
-        exc_msg = dh.session.send(dh.pub_socket, u'pyerr', exc_content, dh.parent_header)
+        exc_msg = dh.session.send(dh.pub_socket, u'pyerr', json_clean(exc_content), dh.parent_header)
 
         # FIXME - Hack: store exception info in shell object.  Right now, the
         # caller is reading this info after the fact, we need to fix this logic