qtconsole crashes with crcmod #2989

Open
the-moog opened this Issue Feb 28, 2013 · 9 comments

@the-moog

When using crcmod in qtconsole I get the following crash.
I type the following into qtconsole:
import crcmod
crc32=crcmod.predefined.mkPredefinedCrcFun("crc-32")
crc32(

Exception occurs on typing the opening bracket, e.g. 'crc32('
crcmod is version 1.7 from pypi, installed using easy_install crcmod.
This crash does not happen in IPython in console mode.

ERROR:root:Uncaught exception, closing connection.
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/zmq/eventloop/zmqstream.py", line 357, in _run_callback
    callback(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/zmq/eventloop/stack_context.py", line 133, in wrapped
    callback(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/IPython/zmq/kernelmanager.py", line 179, in _handle_recv
    self.call_handlers(self.session.unserialize(smsg))
  File "/usr/local/lib/python2.7/dist-packages/IPython/zmq/session.py", line 734, in unserialize
    message['content'] = self.unpack(msg_list[3])
  File "/usr/local/lib/python2.7/dist-packages/IPython/zmq/session.py", line 79, in <lambda>
    json_unpacker = lambda s: extract_dates(jsonapi.loads(s))
  File "/usr/lib/python2.7/dist-packages/zmq/utils/jsonapi.py", line 82, in loads
    return jsonmod.loads(s,**kwargs)
  File "/usr/lib/python2.7/dist-packages/simplejson/__init__.py", line 385, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.7/dist-packages/simplejson/decoder.py", line 402, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python2.7/dist-packages/simplejson/decoder.py", line 418, in raw_decode
    obj, end = self.scan_once(s, idx)
JSONDecodeError: Unpaired low surrogate: line 1 column 2210 (char 2210)
ERROR:root:Uncaught exception, closing connection.
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/zmq/eventloop/zmqstream.py", line 383, in _handle_events
    self._handle_recv()
  File "/usr/lib/python2.7/dist-packages/zmq/eventloop/zmqstream.py", line 423, in _handle_recv
    self._run_callback(callback, msg)
  File "/usr/lib/python2.7/dist-packages/zmq/eventloop/zmqstream.py", line 357, in _run_callback
    callback(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/zmq/eventloop/stack_context.py", line 133, in wrapped
    callback(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/IPython/zmq/kernelmanager.py", line 179, in _handle_recv
    self.call_handlers(self.session.unserialize(smsg))
  File "/usr/local/lib/python2.7/dist-packages/IPython/zmq/session.py", line 734, in unserialize
    message['content'] = self.unpack(msg_list[3])
  File "/usr/local/lib/python2.7/dist-packages/IPython/zmq/session.py", line 79, in <lambda>
    json_unpacker = lambda s: extract_dates(jsonapi.loads(s))
  File "/usr/lib/python2.7/dist-packages/zmq/utils/jsonapi.py", line 82, in loads
    return jsonmod.loads(s,**kwargs)
  File "/usr/lib/python2.7/dist-packages/simplejson/__init__.py", line 385, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.7/dist-packages/simplejson/decoder.py", line 402, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python2.7/dist-packages/simplejson/decoder.py", line 418, in raw_decode
    obj, end = self.scan_once(s, idx)
JSONDecodeError: Unpaired low surrogate: line 1 column 2210 (char 2210)
ERROR:root:Exception in I/O handler for fd <zmq.core.socket.Socket object at 0xa67dbcc>
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/zmq/eventloop/ioloop.py", line 291, in start
    self._handlers[fd](fd, events)
  File "/usr/lib/python2.7/dist-packages/zmq/eventloop/stack_context.py", line 133, in wrapped
    callback(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/zmq/eventloop/zmqstream.py", line 383, in _handle_events
    self._handle_recv()
  File "/usr/lib/python2.7/dist-packages/zmq/eventloop/zmqstream.py", line 423, in _handle_recv
    self._run_callback(callback, msg)
  File "/usr/lib/python2.7/dist-packages/zmq/eventloop/zmqstream.py", line 357, in _run_callback
    callback(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/zmq/eventloop/stack_context.py", line 133, in wrapped
    callback(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/IPython/zmq/kernelmanager.py", line 179, in _handle_recv
    self.call_handlers(self.session.unserialize(smsg))
  File "/usr/local/lib/python2.7/dist-packages/IPython/zmq/session.py", line 734, in unserialize
    message['content'] = self.unpack(msg_list[3])
  File "/usr/local/lib/python2.7/dist-packages/IPython/zmq/session.py", line 79, in <lambda>
    json_unpacker = lambda s: extract_dates(jsonapi.loads(s))
  File "/usr/lib/python2.7/dist-packages/zmq/utils/jsonapi.py", line 82, in loads
    return jsonmod.loads(s,**kwargs)
  File "/usr/lib/python2.7/dist-packages/simplejson/__init__.py", line 385, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.7/dist-packages/simplejson/decoder.py", line 402, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python2.7/dist-packages/simplejson/decoder.py", line 418, in raw_decode
    obj, end = self.scan_once(s, idx)
JSONDecodeError: Unpaired low surrogate: line 1 column 2210 (char 2210)
@takluyver
IPython member

In a terminal, can you find the docstring for crc32 and crc32.__call__? It might be falling over on some odd sequence in that.

@the-moog

crcmod uses function factories, of which the example, crc32, is one. The docstring (__doc__) is empty, but pressing '?' (question mark) in IPython produces gibberish. I don't know enough about the internals of IPython (or crcmod) to explain this.
I'm sure you are correct that the computer generated doc string is the likely cause of the problem, but I don't think it should be intended behaviour that qtconsole bombs so spectacularly if it receives data it does not like. If there are rules on the content of third party data, then IMO qtconsole should impose those rules.

@takluyver
IPython member
@the-moog

I did not look that closely yesterday. I just ignored it and worked round the problem (I wrote a script rather than using qtconsole). In IPython it prints lots of 'rubbish' that then scrolls off the screen. However, I found that if I make my terminal larger I can see what is going on, and for crcmod it is the intended behaviour. No problem there. I've pasted it below, but snipped out some text from the middle, as it is just the CRC lookup table as a string.

I guess the problem in qtconsole is the size of the 'rubbish': it's probably overflowing some internal buffer or breaking some assumption about the size of default parameters to functions.

typing crc32? results in:

Type: function
String Form:
File: /usr/local/lib/python2.7/dist-packages/crcmod-1.7-py2.7-linux-i686.egg/crcmod/crcmod.py
Definition: crc32(data, crc=0L, table='\x00\x00\x00\x00\x960\x07w,a\x0e\xee\xbaQ\t\x99\x19\xc4m\x07\x8f\xf4jp5\xa5c\xe9\xa3\x95d\x9e2\x88\xdb\x0e\xa4\xb8\xdcy\x1e\xe9\xd5\xe0\x88\xd9\xd2\x97+L\xb6\t\xbd|\xb1~\x07-\xb8\xe7\x91\x1d\xbf\x90d\x10\xb7\x1d\xf2 \xb0jHq\xb9\xf3\xdeA\xbe\x84}\xd4\xda\x1a\xeb\xe4\xddmQ\xdcZ\xd6
...
..SNIP...
...
\xd9f\x0b\xdf@\xf0;\xd87S\xae\xbc\xa9\xc5\x9e\xbb\xde\x7f\xcf\xb2G\xe9\xff\xb50\x1c\xf2\xbd\xbd\x8a\xc2\xba\xca0\x93\xb3S\xa6\xa3\xb4$\x056\xd0\xba\x93\x06\xd7\xcd)W\xdeT\xbfg\xd9#.zf\xb3\xb8Ja\xc4\x02\x1bh]\x94+o*7\xbe\x0b\xb4\xa1\x8e\x0c\xc3\x1b\xdf\x05Z\x8d\xef\x02-', fun=)
Docstring:

@takluyver
IPython member

Ah, right, it's the function signature. It's the content of the rubbish, not the size of it, that's the issue. It's a series of random bytes, which we attempt to decode into unicode to send (because json strings are unicode, not bytes). That much is OK, using the replacement character a lot. However, some of the bytes randomly happen to be valid UTF-8 sequences, so they are decoded and sent as unicode code points. Then on the receiving end, the json decoder sees a code point from the range dedicated to surrogate pairs, which causes it to throw a wobbly.
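
To make the mechanism concrete, here is a minimal sketch in Python 3. It assumes that the three bytes ED B0 80 (the UTF-8-style encoding of the lone surrogate U+DC00) are the kind of sequence that turns up in the lookup table; the variable names are illustrative only. Python 3's strict decoder rejects those bytes, so the "surrogatepass" error handler is used here to imitate the lenient decoding described above.

```python
# Bytes that look like UTF-8 but encode the lone surrogate U+DC00.
raw = b"\xed\xb0\x80"

# Python 3's strict UTF-8 decoder refuses the sequence outright.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# "surrogatepass" imitates a lenient decoder: the bytes become a
# surrogate code point, which a JSON decoder may later reject.
lenient = raw.decode("utf-8", "surrogatepass")

# Bytes that are not valid UTF-8 at all are harmless with
# errors="replace": each one becomes U+FFFD.
replaced = b"\xff\xfe".decode("utf-8", "replace")

print(strict_ok, ascii(lenient), ascii(replaced))
```

The harmless case is the last one; the troublesome case is the middle one, where the decoded text contains a code point that cannot stand on its own.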

Ugh, that's fiddly. There are two things we could change, both with some drawbacks:

  • Send the repr() of byte strings in object info, rather than trying to decode them. The downside is that, on Python 2, people often use bytestrings to represent text, so you might want it to appear as 'café', not 'caf\xc3\xa9'. (I've just tested, and the error doesn't occur on Python 3.)
  • Catch any failures to parse the JSON. In the case of an object_info_reply, the program can continue happily without it. But we don't know what type of message it is until we've parsed it, and ignoring them all could make debugging tricky later.
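
A rough sketch of what the two options might look like, written for Python 3 with the standard json module. The helper names describe_default and unpack_or_none are hypothetical and are not part of IPython; this only illustrates the trade-off, not how the session code is actually structured.

```python
import json


def describe_default(value):
    """Option 1 (hypothetical helper): show byte strings via repr()."""
    if isinstance(value, bytes):
        # Always safe to serialise, but text stored as bytes is
        # shown escaped rather than as readable characters.
        return repr(value)
    return str(value)


def unpack_or_none(raw):
    """Option 2 (hypothetical helper): swallow messages that fail to parse."""
    try:
        return json.loads(raw)
    except ValueError:
        # json.JSONDecodeError is a ValueError subclass. Nothing is
        # known about the message type here, so the caller cannot
        # tell what was dropped.
        return None


print(describe_default(b"caf\xc3\xa9"))
print(unpack_or_none('{"msg_type": "object_info_reply"}'))
print(unpack_or_none('{"msg_type": '))
```

The first helper shows the readability cost ('caf\xc3\xa9' instead of 'café'); the second shows why silent dropping makes debugging harder, since a None result carries no hint of which message was lost.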
@the-moog

Hmmm, I guess the question is, is qtconsole going to barf lots just because somebody passes random data in the function signature, or is crcmod a one off?
If the author of crcmod were to change the code to use something other than strings, then it would be fine. But is that just sweeping the problem under the rug from the point of view of qtconsole?
Can't ipython treat non unicode strings in function signatures that contain ord(c) < 32 or ord(c) > 128 as a special case?

@takluyver
IPython member

In principle, the problem could come up with random data from any function signature. In practice, I've not come across anything else that has this problem. It only manifests itself when the bytes happen to form a UTF-8 sequence, and that happens to represent a code point from the surrogate pairs range. Oh, and it only occurs on wide-unicode Python builds, which I think are mainly on Linux.

It's not just function signatures, unfortunately. If you construct crc32 then do print crc32.func_defaults[1], you get a similar error. The minimal case is print u'\udc00'.encode('utf-8').

Perhaps, when we prepare the JSON to send, we should replace any surrogate code points with the replacement character (�). Ideally, though, we'd avoid doing that for valid surrogate pairs, which will be in use on narrow-unicode Python builds, like those on Windows.
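
One way this replacement could be sketched, assuming a regular expression is acceptable for the job. The function name replace_lone_surrogates is hypothetical, and the example is written for Python 3, where surrogate code points only appear in a string if they are put there explicitly. A high surrogate followed by a low surrogate is treated as a valid pair and left alone; any other surrogate becomes U+FFFD.

```python
import re

# A high surrogate not followed by a low one, or a low surrogate
# not preceded by a high one, counts as "lone".
_LONE_SURROGATE = re.compile(
    r"[\ud800-\udbff](?![\udc00-\udfff])"
    r"|(?<![\ud800-\udbff])[\udc00-\udfff]"
)


def replace_lone_surrogates(text):
    """Replace unpaired surrogates with U+FFFD, keeping valid pairs."""
    return _LONE_SURROGATE.sub("\ufffd", text)


print(ascii(replace_lone_surrogates("a\udc00b")))       # lone low surrogate
print(ascii(replace_lone_surrogates("\ud83d\ude00")))   # high + low pair
```

A single compiled pattern keeps the check to one pass over the string, which matters if it runs on every outgoing message.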

@the-moog

Perhaps simply having the render function detect illegal characters, then replace the string u"\udc00" with " length 2"

@takluyver
IPython member

It's not the rendering, though - it's causing problems right down in the internals of our messaging protocol. The tricky bit will be efficiently deciding when the characters are invalid.
