UnicodeDecodeError in py3compat from "xlrd??" #1177

jdmarch opened this Issue Dec 18, 2011 · 10 comments


jdmarch commented Dec 18, 2011

EPD Python 2.7 with xlrd installed
IPython from master

Occurs on Windows and OS X. Here is the traceback on OS X:

In [1]: import xlrd

In [2]: xlrd??
ERROR: An unexpected error occurred while tokenizing input
The following traceback may be corrupted or invalid
The error message is: ('EOF in multi-line statement', (11, 0))
UnicodeDecodeError                        Traceback (most recent call last)
/Users/jmarch/<ipython-input-2-361a5c5c220b> in <module>()
----> 1 get_ipython().magic(u'pinfo2 xlrd')

/Users/jmarch/ipython/IPython/core/interactiveshell.pyc in magic(self, arg_s, next_input)
   1996                 self._magic_locals = sys._getframe(1).f_locals
   1997             with self.builtin_trap:
-> 1998                 result = fn(magic_args)
   1999             # Ensure we're not keeping object references around:
   2000             self._magic_locals = {}

/Users/jmarch/ipython/IPython/core/magic.py in magic_pinfo2(self, parameter_s, namespaces)
    564         '%pinfo2 object' is just a synonym for object?? or ??object."""
    565         self.shell._inspect('pinfo', parameter_s, detail_level=1,
--> 566                             namespaces=namespaces)
    568     @skip_doctest

/Users/jmarch/ipython/IPython/core/interactiveshell.pyc in _inspect(self, meth, oname, namespaces, **kw)
   1416                 pmethod(info.obj, oname, formatter)
   1417             elif meth == 'pinfo':
-> 1418                 pmethod(info.obj, oname, formatter, info, **kw)
   1419             else:
   1420                 pmethod(info.obj, oname)

/Users/jmarch/ipython/IPython/core/oinspect.pyc in pinfo(self, obj, oname, formatter, info, detail_level)
    458         # source found.
    459         if detail_level > 0 and info['source'] is not None:
--> 460             displayfields.append(("Source", self.format(py3compat.unicode_to_str(info['source']))))
    461         elif info['docstring'] is not None:
    462             displayfields.append(("Docstring", info["docstring"]))

/Users/jmarch/ipython/IPython/utils/py3compat.pyc in encode(u, encoding)
     18 def encode(u, encoding=None):
     19     encoding = encoding or sys.stdin.encoding or sys.getdefaultencoding()
---> 20     return u.encode(encoding, "replace")
     22 def cast_unicode(s, encoding=None):

UnicodeDecodeError: 'ascii' codec can't decode byte 0xa9 in position 77: ordinal not in range(128)
fperez commented Dec 18, 2011

Ouch. Would be nice if we could fix this in the nick of time for 0.12...


At a glance, it might be as simple as replacing the call to unicode_to_str with cast_bytes_py2. We'll need to test on a few different things in Py 2 and 3, though.

fperez commented Dec 18, 2011

Yes, actually, because the ?? machinery is so central to everything we do, I think we should play it safe and fix this for 0.13. While it's very nasty to have a bug like this in there, we know that encountering it in the wild is rare (else it would have been reported many times long ago). And there is a non-trivial risk of introducing a much worse bug by messing with that code this late in the game. I've been burned before by much more innocent-looking changes, so let's play it safe.

Great to have a fix, but let's not rush it for the official 0.12.


Git bisect points to afdb570.

fperez commented Dec 18, 2011

Ah, interesting... This is a relatively new issue then. One thing we need before doing anything else is a clean PR that includes a test that fails in the current state of the code and that is fixed by the PR.

I'm still mildly against making such a change this late in the game, it's just really dangerous. A more sensible plan is to cut 0.12 as is, and if need be we can cut a 0.12.1 with a few fixes in a couple of weeks.


@fperez The rarity you refer to is undoubtedly because xlrd/__init__.py uses the cp1252 encoding.

fperez commented Dec 18, 2011

OK, all the more reason not to rush this one; that's an uncommon encoding to declare explicitly, and I worry about suddenly and badly breaking much more common use cases. Thanks a ton for the detective work, @bfroehle, this is very useful.


There are places where we'll trip up on a cp1252-encoded file, but I don't think this is one of them. I think for some reason I assumed that the source code would be retrieved as unicode in Python 2. I suspect the same problem will occur with any non-ASCII character in Python 2.

I've just tried with cast_bytes_py2, and it seems to work fine. On a theoretical level, I can't see any way in which it could introduce more failures:

Py2 str input: this is the failing case; cast_bytes_py2 works.
Py2 unicode input: either function will have the same effect.
Py3: both functions are no-ops (but should always be called with unicode anyway).
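
The three cases above can be sketched in code. This is a minimal illustration that runs on Python 3, not IPython's actual py3compat code: `py2_style_encode` and `cast_bytes_sketch` are hypothetical stand-ins for how `unicode_to_str` and `cast_bytes_py2` would behave on Python 2. Here `bytes` plays the role of Py2 `str`, and the implicit ASCII decode that Python 2 performs when `.encode()` is called on a byte string is written out explicitly.

```python
def py2_style_encode(value, encoding="utf-8"):
    """Hypothetical stand-in for calling .encode() on Python 2 input."""
    if isinstance(value, bytes):
        # Python 2 implicitly decodes a byte string with the default
        # codec (normally ASCII) before it can encode it.
        value = value.decode("ascii")
    return value.encode(encoding, "replace")


def cast_bytes_sketch(value, encoding="utf-8"):
    """Hypothetical stand-in: encode text, pass byte strings through."""
    if isinstance(value, bytes):
        return value
    return value.encode(encoding, "replace")


# Byte-string source containing 0xa9 (the copyright sign in cp1252).
source = b"# Copyright \xa9 2005\n"

try:
    py2_style_encode(source)
    encode_failed = False
except UnicodeDecodeError:
    # Same failure as in the traceback: the error is a *decode* error
    # even though the caller asked for an encode.
    encode_failed = True

# Byte strings are returned unchanged, so the failing case goes away.
passthrough = cast_bytes_sketch(source)

# For text input the two helpers agree.
text = u"# Copyright \u00a9 2005\n"
same_for_text = py2_style_encode(text) == cast_bytes_sketch(text)
```

The sketch suggests why the swap looks low-risk: the only input on which the two helpers differ is the byte-string case that currently raises.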

takluyver closed this in a85c230 Dec 18, 2011

Min and I agreed that the fix wouldn't break anything else, so I've merged it and opened #1179 to add a test for it.


Regarding @takluyver's comment:

"I think for some reason I assumed that the source code would be retrieved as unicode in Python 2."

The source code is retrieved using linecache.getlines(), which essentially just does:

with open(fullname, 'rU') as fp:
    lines = fp.readlines()
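
To illustrate why that read yields undecoded bytes: in Python 2, a file opened in 'rU' mode returns byte strings, so a 0xa9 byte from a cp1252-encoded file reaches the caller as-is. The sketch below reproduces this on Python 3 by reading in binary mode, an assumption made so that it runs on current interpreters. The temporary file and its contents are invented for the example and are not taken from xlrd.

```python
import os
import tempfile
import tokenize

# Invented sample: a module that declares cp1252 and contains a
# copyright sign, which cp1252 stores as the single byte 0xa9.
sample = u"# -*- coding: cp1252 -*-\n# Copyright \u00a9 2005\n"

fd, path = tempfile.mkstemp(suffix=".py")
try:
    with os.fdopen(fd, "wb") as fp:
        fp.write(sample.encode("cp1252"))

    # Binary mode stands in for Python 2's open(fullname, 'rU'),
    # which also hands back undecoded byte strings.
    with open(path, "rb") as fp:
        lines = fp.readlines()

    # The declared encoding can be read from the coding cookie.
    with open(path, "rb") as fp:
        declared, _ = tokenize.detect_encoding(fp.readline)
finally:
    os.remove(path)

raw = b"".join(lines)

# Decoding as ASCII fails on the 0xa9 byte, as in the traceback.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Decoding with the declared encoding recovers the original text.
decoded = raw.decode(declared)
```

Any code that later treats `raw` as text without consulting the declared encoding will hit the same error on the first non-ASCII byte.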
mattvonrocketstein pushed a commit to mattvonrocketstein/ipython that referenced this issue Nov 3, 2014
takluyver: Correct string type casting in pinfo.
Closes gh-1177