sys.setdefaultencoding('utf-8') breaks print and %%time #8071

Closed
supern8ent opened this issue Mar 17, 2015 · 4 comments
@supern8ent

I use pandas, and since a recent update (sorry, I don't know what the old version was, but I'm now on IPython 3.0.0 and pandas 0.15.2) I need to run

import sys
reload(sys)
sys.setdefaultencoding('utf-8')

in order to view the HTML version of a DataFrame; otherwise I get a UnicodeDecodeError.

Unfortunately, this workaround has the side-effect that I no longer see output from print statements and %%time.

Per rkern's request, here is more specifically what pandas content breaks in my case:

pd.DataFrame({'x':[u'water, 38.71 mg/L @ 25 \xb0C (est), water, 14.1 mg/L @ 25 \xb0C (exp)']})

---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-29-eaddc9d8e339> in <module>()
----> 1 pd.DataFrame({'x':[u'water, 38.71 mg/L @ 25 \xb0C (est), water, 14.1 mg/L @ 25 \xb0C (exp)']})

/Users/nathan/.virtualenvs/A1/lib/python2.7/site-packages/IPython/core/displayhook.pyc in __call__(self, result)
    236                 self.write_format_data(format_dict, md_dict)
    237                 self.log_output(format_dict)
--> 238             self.finish_displayhook()
    239 
    240     def cull_cache(self):

/Users/nathan/.virtualenvs/A1/lib/python2.7/site-packages/IPython/kernel/zmq/displayhook.pyc in finish_displayhook(self)
     70         sys.stderr.flush()
     71         if self.msg['content']['data']:
---> 72             self.session.send(self.pub_socket, self.msg, ident=self.topic)
     73         self.msg = None
     74 

/Users/nathan/.virtualenvs/A1/lib/python2.7/site-packages/IPython/kernel/zmq/session.pyc in send(self, stream, msg_or_type, content, parent, ident, buffers, track, header, metadata)
    647         if self.adapt_version:
    648             msg = adapt(msg, self.adapt_version)
--> 649         to_send = self.serialize(msg, ident)
    650         to_send.extend(buffers)
    651         longest = max([ len(s) for s in to_send ])

/Users/nathan/.virtualenvs/A1/lib/python2.7/site-packages/IPython/kernel/zmq/session.pyc in serialize(self, msg, ident)
    551             content = self.none
    552         elif isinstance(content, dict):
--> 553             content = self.pack(content)
    554         elif isinstance(content, bytes):
    555             # content is already packed, as in a relayed message

/Users/nathan/.virtualenvs/A1/lib/python2.7/site-packages/IPython/kernel/zmq/session.pyc in <lambda>(obj)
     83 # disallow nan, because it's not actually valid JSON
     84 json_packer = lambda obj: jsonapi.dumps(obj, default=date_default,
---> 85     ensure_ascii=False, allow_nan=False,
     86 )
     87 json_unpacker = lambda s: jsonapi.loads(s)

/Users/nathan/.virtualenvs/A1/lib/python2.7/site-packages/zmq/utils/jsonapi.pyc in dumps(o, **kwargs)
     38         kwargs['separators'] = (',', ':')
     39 
---> 40     s = jsonmod.dumps(o, **kwargs)
     41 
     42     if isinstance(s, unicode):

/usr/local/Cellar/python/2.7.9/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.pyc in dumps(obj, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, encoding, default, sort_keys, **kw)
    248         check_circular=check_circular, allow_nan=allow_nan, indent=indent,
    249         separators=separators, encoding=encoding, default=default,
--> 250         sort_keys=sort_keys, **kw).encode(obj)
    251 
    252 

/usr/local/Cellar/python/2.7.9/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/encoder.pyc in encode(self, o)
    208         if not isinstance(chunks, (list, tuple)):
    209             chunks = list(chunks)
--> 210         return ''.join(chunks)
    211 
    212     def iterencode(self, o, _one_shot=False):

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 81: ordinal not in range(128)

Add the workaround:

import sys
reload(sys)
sys.setdefaultencoding('utf-8')

pd.DataFrame({'x':[u'water, 38.71 mg/L @ 25 \xb0C (est), water, 14.1 mg/L @ 25 \xb0C (exp)']})

and I get the table with the text as you would expect.
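
For readers puzzling over the error message: the byte 0xc2 is consistent with the degree sign (`\xb0`) having been encoded as UTF-8 somewhere along the display path, since that character's UTF-8 form starts with 0xc2. The following is only a minimal sketch of that observation, not the actual IPython or pandas code path; the variable names are made up for illustration.

```python
# Illustration only: shows where a 0xc2 byte can come from.
# Runs on Python 2.7 and Python 3.

degree = u"\xb0"                    # DEGREE SIGN, as in the DataFrame text
encoded = degree.encode("utf-8")    # two bytes: 0xc2 0xb0

# Decoding those bytes as ASCII fails on the first byte, 0xc2.
# Python 2 does this decode implicitly when a byte string and a
# unicode string are combined, e.g. inside ''.join(chunks).
try:
    encoded.decode("ascii")
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(repr(encoded))
print(error_message)
```

Changing the default encoding hides this failure by making the implicit decode use UTF-8, which is why the table then renders.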

@rkern
Contributor

rkern commented Mar 17, 2015

Please provide an example of how the HTML version of a dataframe gives you a UnicodeDecodeError. That's the bug to fix. The bug may be in Pandas.

sys.setdefaultencoding() should never be used, so the fact that setting it breaks other stuff is not something to worry about.
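
A common alternative to changing the interpreter-wide default is to decode byte strings explicitly at the point where they enter the program. The sketch below assumes the bytes are UTF-8; `to_text` is a hypothetical helper written for this example, not part of IPython or pandas.

```python
# Sketch: decode explicitly instead of calling sys.setdefaultencoding().
# Runs on Python 2.7 and Python 3. `to_text` is a hypothetical helper.

def to_text(value, encoding="utf-8"):
    """Return `value` as text, decoding byte strings with `encoding`."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

# A mix of text and UTF-8 encoded bytes, similar in spirit to the
# chunks that failed to join in the traceback.
chunks = [u"water, 14.1 mg/L @ 25 ", b"\xc2\xb0", u"C (exp)"]

# Every chunk is text by the time it reaches join, so no implicit
# ASCII decode is attempted.
joined = u"".join(to_text(chunk) for chunk in chunks)
print(repr(joined))
```

This keeps the encoding decision local and visible, so unrelated code such as print or %%time is not affected.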

@takluyver
Member

I think we're already tracking the issue with displaying DataFrames as #6799.

I agree with @rkern that sys.setdefaultencoding() should be expected to break stuff.

@takluyver
Member

And it looks like Min already fixed that, so non-ASCII data frames should work in IPython 3.1.

Or you can upgrade to Python 3, where unicode generally isn't a problem.
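
As a rough illustration of the Python 3 behaviour (a sketch that assumes Python 3 only and does not involve IPython itself): text and bytes are separate types there, so mixing them raises a TypeError immediately instead of triggering an implicit ASCII decode, and serializing non-ASCII text to JSON works without any default-encoding change.

```python
# Python 3 only. Sketch of how str and bytes are kept apart.
import json

text = "water, 38.71 mg/L @ 25 \u00b0C (est)"

# Non-ASCII text serializes directly, even with ensure_ascii=False.
payload = json.dumps({"x": text}, ensure_ascii=False)

# Mixing str and bytes is rejected up front rather than decoded silently.
mixing_error = None
try:
    "".join(["25 ", b"\xc2\xb0C"])
except TypeError as exc:
    mixing_error = exc

print(payload)
print(type(mixing_error).__name__)
```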

@takluyver takluyver added this to the no action milestone Mar 17, 2015
@supern8ent
Author

Thanks for clearing that up!
