
UnicodeDecodeError in py3compat from "xlrd??" #1177

Closed
jdmarch opened this issue on December 18, 2011 · 10 comments

4 participants

Jonathan March, Thomas Kluyver, Fernando Perez, Bradley M. Froehle
Jonathan March
Collaborator

EPD Python 2.7 with xlrd
IPython from master

Occurs on both Windows and OS X. Here is the traceback on OS X:

In [1]: import xlrd

In [2]: xlrd??
ERROR: An unexpected error occurred while tokenizing input
The following traceback may be corrupted or invalid
The error message is: ('EOF in multi-line statement', (11, 0))
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
/Users/jmarch/<ipython-input-2-361a5c5c220b> in <module>()
----> 1 get_ipython().magic(u'pinfo2 xlrd')

/Users/jmarch/ipython/IPython/core/interactiveshell.pyc in magic(self, arg_s, next_input)
   1996                 self._magic_locals = sys._getframe(1).f_locals
   1997             with self.builtin_trap:
-> 1998                 result = fn(magic_args)
   1999             # Ensure we're not keeping object references around:
   2000             self._magic_locals = {}

/Users/jmarch/ipython/IPython/core/magic.py in magic_pinfo2(self, parameter_s, namespaces)
    564         '%pinfo2 object' is just a synonym for object?? or ??object."""
    565         self.shell._inspect('pinfo', parameter_s, detail_level=1,
--> 566                             namespaces=namespaces)
    567 
    568     @skip_doctest

/Users/jmarch/ipython/IPython/core/interactiveshell.pyc in _inspect(self, meth, oname, namespaces, **kw)
   1416                 pmethod(info.obj, oname, formatter)
   1417             elif meth == 'pinfo':
-> 1418                 pmethod(info.obj, oname, formatter, info, **kw)
   1419             else:
   1420                 pmethod(info.obj, oname)

/Users/jmarch/ipython/IPython/core/oinspect.pyc in pinfo(self, obj, oname, formatter, info, detail_level)
    458         # source found.
    459         if detail_level > 0 and info['source'] is not None:
--> 460             displayfields.append(("Source", self.format(py3compat.unicode_to_str(info['source']))))
    461         elif info['docstring'] is not None:
    462             displayfields.append(("Docstring", info["docstring"]))

/Users/jmarch/ipython/IPython/utils/py3compat.pyc in encode(u, encoding)
     18 def encode(u, encoding=None):
     19     encoding = encoding or sys.stdin.encoding or sys.getdefaultencoding()
---> 20     return u.encode(encoding, "replace")
     21 
     22 def cast_unicode(s, encoding=None):

UnicodeDecodeError: 'ascii' codec can't decode byte 0xa9 in position 77: ordinal not in range(128)
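
An encode call raising a UnicodeDecodeError looks odd at first. In Python 2, calling .encode() on a byte string (str) first decodes it implicitly with the default ASCII codec, and that hidden step is what fails on byte 0xa9. The sketch below emulates that two-step behaviour in Python 3 syntax. The helper py2_style_encode, the "utf-8" default, and the sample bytes are all made up for illustration and are not part of IPython.

```python
def py2_style_encode(data, encoding="utf-8"):
    """Emulate Python 2's str.encode(): implicit ASCII decode, then encode."""
    if isinstance(data, bytes):
        # Python 2 performs this decode silently before encoding.
        data = data.decode("ascii")
    return data.encode(encoding, "replace")


# Made-up sample line; 0xa9 is the copyright sign in cp1252.
source = b"# Copyright \xa9 example"

try:
    py2_style_encode(source)
except UnicodeDecodeError as exc:
    # The failure comes from the hidden decode, not from the encode itself.
    message = str(exc)
    print(message)
```

Note that the "replace" error handler only applies to the encode step, so it cannot rescue the implicit decode.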
Fernando Perez
Owner

Ouch. Would be nice if we could fix this in the nick of time for 0.12...

Thomas Kluyver
Collaborator

At a glance, it might be as simple as replacing the call to unicode_to_str with cast_bytes_py2. We'll need to test on a few different things in Py 2 and 3, though.

Fernando Perez
Owner

Yes, actually because the ?? machinery is so central to everything we do, I think we should play it safe and fix this for 0.13. While it's very nasty to have a bug like this in there, we know that encountering it in the wild is rare (else it would have been reported many times long ago). And there is a non-trivial risk of introducing a much worse bug by messing with that code this late in the game. I've been burned before by much more innocent looking changes, so let's play it safe.

Great to have a fix, but let's not rush it for the official 0.12.

Bradley M. Froehle
Collaborator

Git bisect points to afdb570.

Fernando Perez
Owner

Ah, interesting... This is a relatively new issue, then. One thing we need before doing anything else is a clean PR that includes a test that fails in the current state of the code and that is fixed by the PR.

I'm still mildly against making such a change this late in the game; it's just really dangerous. A more sensible plan is to cut 0.12 as is, and if need be we can cut a 0.12.1 with a few fixes in a couple of weeks.

Bradley M. Froehle
Collaborator

@fperez The rarity you refer to is undoubtedly because xlrd/__init__.py uses the cp1252 encoding.

Fernando Perez
Owner

OK, all the more reason to not rush this one; that's an uncommon encoding to declare explicitly, and I worry about suddenly and badly breaking much more common use cases. Thanks a ton for the detective work, @bfroehle, this is very useful.

Thomas Kluyver
Collaborator

There are places where we'll trip up on a cp1252 encoded file, but I don't think this is one of them. I think for some reason I assumed that the source code would be retrieved as unicode in Python 2. I suspect the same problem will occur with any non-ascii character in Python 2.

I've just tried with cast_bytes_py2, and it seems to work fine. On a theoretical level, I can't see any way in which it could introduce more failures:

Py2 str input: this failure case, cast_bytes_py2 works
Py2 unicode input: either function will have the same effect
Py3: both functions are no-ops (but should always be called with unicode anyway).
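
The first two cases above can be sketched as follows. This is a model written in Python 3 syntax, not the IPython source: bytes stands in for a Py2 str and str stands in for a Py2 unicode, and the names model_unicode_to_str and model_cast_bytes_py2, the "utf-8" default, and the sample strings are assumptions chosen for illustration.

```python
def model_unicode_to_str(value, encoding="utf-8"):
    """Model of the Py2 call in the traceback: always calls .encode()."""
    if isinstance(value, bytes):
        # Py2 would implicitly decode the byte string as ASCII first.
        value = value.decode("ascii")
    return value.encode(encoding, "replace")


def model_cast_bytes_py2(value, encoding="utf-8"):
    """Model of the proposed replacement: encode only real unicode input."""
    if isinstance(value, bytes):
        return value  # already bytes: pass through untouched
    return value.encode(encoding, "replace")


byte_source = b"# \xa9 in a comment"    # stands in for a Py2 str
text_source = u"# \u00a9 in a comment"  # stands in for a Py2 unicode

# Case 1: byte input. The old path fails; the new path returns the bytes as-is.
try:
    model_unicode_to_str(byte_source)
    old_path_failed = False
except UnicodeDecodeError:
    old_path_failed = True
new_path_result = model_cast_bytes_py2(byte_source)

# Case 2: unicode input. Both paths produce identical bytes.
same_on_unicode = (
    model_unicode_to_str(text_source) == model_cast_bytes_py2(text_source)
)
```

Under this model the replacement only changes behaviour for input that previously raised, which matches the reasoning that it should not introduce new failures.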

Thomas Kluyver (takluyver) closed this in a85c230 on December 18, 2011
Thomas Kluyver
Collaborator
Bradley M. Froehle
Collaborator

Regarding @takluyver's comment:

"I think for some reason I assumed that the source code would be retrieved as unicode in Python 2."

The source code is retrieved using linecache.getlines(), which in Python 2 essentially just does the following, so each line comes back as a byte string (str) rather than unicode:

with open(fullname, 'rU') as fp:
    lines = fp.readlines()
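
To see what that means for a file like xlrd/__init__.py, here is a small sketch in Python 3 syntax that reads a cp1252-encoded module as raw bytes (mirroring what Python 2's open() returns) and then decodes it using the declared encoding. The module contents are invented for the example, and using tokenize.detect_encoding() is just one way to read the coding declaration, not something this thread says IPython does.

```python
import os
import tempfile
import tokenize

# Hypothetical module text: a cp1252 coding declaration plus a copyright sign.
module_text = u"# -*- coding: cp1252 -*-\n# Copyright \u00a9 example\nx = 1\n"

fd, path = tempfile.mkstemp(suffix=".py")
try:
    with os.fdopen(fd, "wb") as fp:
        fp.write(module_text.encode("cp1252"))

    # Reading without decoding yields raw bytes, including 0xa9.
    with open(path, "rb") as fp:
        raw_lines = fp.readlines()

    # tokenize.detect_encoding() reads the PEP 263 coding declaration.
    with open(path, "rb") as fp:
        declared_encoding, _ = tokenize.detect_encoding(fp.readline)
finally:
    os.remove(path)

# Decoding with the declared encoding recovers the original text.
decoded_lines = [line.decode(declared_encoding) for line in raw_lines]
```

The raw second line contains byte 0xa9, which is not valid ASCII, so any implicit ASCII decode of it fails in the way shown in the traceback.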

Brian E. Granger (ellisonbg) referenced this issue from a commit on January 10, 2012
Commit has since been removed from the repository and is no longer available.