sys.setdefaultencoding('utf-8') breaks print and %%time #8071

supern8ent · 2015-03-17T18:05:51Z

I use pandas, and since a recent update (sorry, I don't know what the old version was, but I'm now on IPython 3.0.0 and pandas 0.15.2) I need to set

import sys
reload(sys)
sys.setdefaultencoding('utf-8')

in order to view the HTML version of a DataFrame; otherwise I get a UnicodeDecodeError.

Unfortunately, this workaround has a side effect: I no longer see output from print statements or %%time.

As rkern asked, here is specifically the pandas content that breaks in my case:

pd.DataFrame({'x':[u'water, 38.71 mg/L @ 25 \xb0C (est), water, 14.1 mg/L @ 25 \xb0C (exp)']})

---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-29-eaddc9d8e339> in <module>()
----> 1 pd.DataFrame({'x':[u'water, 38.71 mg/L @ 25 \xb0C (est), water, 14.1 mg/L @ 25 \xb0C (exp)']})

/Users/nathan/.virtualenvs/A1/lib/python2.7/site-packages/IPython/core/displayhook.pyc in __call__(self, result)
    236                 self.write_format_data(format_dict, md_dict)
    237                 self.log_output(format_dict)
--> 238             self.finish_displayhook()
    239 
    240     def cull_cache(self):

/Users/nathan/.virtualenvs/A1/lib/python2.7/site-packages/IPython/kernel/zmq/displayhook.pyc in finish_displayhook(self)
     70         sys.stderr.flush()
     71         if self.msg['content']['data']:
---> 72             self.session.send(self.pub_socket, self.msg, ident=self.topic)
     73         self.msg = None
     74 

/Users/nathan/.virtualenvs/A1/lib/python2.7/site-packages/IPython/kernel/zmq/session.pyc in send(self, stream, msg_or_type, content, parent, ident, buffers, track, header, metadata)
    647         if self.adapt_version:
    648             msg = adapt(msg, self.adapt_version)
--> 649         to_send = self.serialize(msg, ident)
    650         to_send.extend(buffers)
    651         longest = max([ len(s) for s in to_send ])

/Users/nathan/.virtualenvs/A1/lib/python2.7/site-packages/IPython/kernel/zmq/session.pyc in serialize(self, msg, ident)
    551             content = self.none
    552         elif isinstance(content, dict):
--> 553             content = self.pack(content)
    554         elif isinstance(content, bytes):
    555             # content is already packed, as in a relayed message

/Users/nathan/.virtualenvs/A1/lib/python2.7/site-packages/IPython/kernel/zmq/session.pyc in <lambda>(obj)
     83 # disallow nan, because it's not actually valid JSON
     84 json_packer = lambda obj: jsonapi.dumps(obj, default=date_default,
---> 85     ensure_ascii=False, allow_nan=False,
     86 )
     87 json_unpacker = lambda s: jsonapi.loads(s)

/Users/nathan/.virtualenvs/A1/lib/python2.7/site-packages/zmq/utils/jsonapi.pyc in dumps(o, **kwargs)
     38         kwargs['separators'] = (',', ':')
     39 
---> 40     s = jsonmod.dumps(o, **kwargs)
     41 
     42     if isinstance(s, unicode):

/usr/local/Cellar/python/2.7.9/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.pyc in dumps(obj, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, encoding, default, sort_keys, **kw)
    248         check_circular=check_circular, allow_nan=allow_nan, indent=indent,
    249         separators=separators, encoding=encoding, default=default,
--> 250         sort_keys=sort_keys, **kw).encode(obj)
    251 
    252 

/usr/local/Cellar/python/2.7.9/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/encoder.pyc in encode(self, o)
    208         if not isinstance(chunks, (list, tuple)):
    209             chunks = list(chunks)
--> 210         return ''.join(chunks)
    211 
    212     def iterencode(self, o, _one_shot=False):

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 81: ordinal not in range(128)
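For anyone reading along: byte 0xc2 is the first byte of the UTF-8 encoding of the degree sign (U+00B0, `\xb0`) in the example string, which suggests that one of the chunks being joined was already a UTF-8 encoded byte string. A minimal sketch (plain Python, no pandas or IPython needed) showing that decoding those bytes with the ASCII codec raises the same kind of error, while an explicit UTF-8 decode works:

```python
# The degree-sign text from the example, as Unicode text and as UTF-8 bytes.
text = u'25 \xb0C'
data = text.encode('utf-8')
assert data == b'25 \xc2\xb0C'

# Decoding with the ASCII codec fails on the first non-ASCII byte (0xc2).
# Python 2 does this decode implicitly, using the default encoding (ASCII
# unless changed), when ''.join() mixes str and unicode chunks.
error = None
try:
    data.decode('ascii')
except UnicodeDecodeError as exc:
    error = exc

# Decoding explicitly with the correct codec recovers the original text.
assert data.decode('utf-8') == text
```

That implicit ASCII decode is exactly what `sys.setdefaultencoding('utf-8')` changes globally, which is why the workaround hides the error.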

After adding the workaround:

import sys
reload(sys)
sys.setdefaultencoding('utf-8')

pd.DataFrame({'x':[u'water, 38.71 mg/L @ 25 \xb0C (est), water, 14.1 mg/L @ 25 \xb0C (exp)']})

I get the table with the text displayed as expected.
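The traceback ends inside the standard `json` module, which IPython 3.0's session packer calls with `ensure_ascii=False` (see `json_packer` in the traceback). A short sketch of what that flag changes, using only the standard library: with the default `ensure_ascii=True` non-ASCII characters are escaped and the output is pure ASCII; with `ensure_ascii=False` they are kept as real characters, so the text should be encoded once, explicitly, before it goes out as bytes.

```python
import json

payload = {'x': u'25 \xb0C'}

# Default: non-ASCII characters are escaped as \uXXXX, so the output is ASCII.
escaped = json.dumps(payload)

# ensure_ascii=False keeps the degree sign as a real character; encode the
# resulting text once, explicitly, rather than relying on a default encoding.
raw = json.dumps(payload, ensure_ascii=False)
wire = raw.encode('utf-8')
```

This only illustrates the flag; it is not a description of how the IPython fix was actually implemented.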


rkern · 2015-03-17T18:09:25Z

Please provide an example of how the HTML version of a dataframe gives you a UnicodeDecodeError. That's the bug to fix. The bug may be in Pandas.

sys.setdefaultencoding() should never be used, so the fact that setting it breaks other stuff is not something to worry about.
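The usual alternative to changing the default encoding is to decode bytes explicitly where they enter your program and work with Unicode text everywhere else. A minimal sketch, where `to_text` is a hypothetical helper (not part of pandas or IPython) and UTF-8 is an assumed input encoding:

```python
def to_text(value, encoding='utf-8'):
    """Return value as text, decoding bytes with an explicit codec.

    Hypothetical helper: decode at the boundary where bytes enter the
    program, instead of changing the interpreter-wide default encoding.
    Non-bytes values (already text) are returned unchanged.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value
```

For example, values read from a UTF-8 file could be passed through `to_text` before being put into a DataFrame, so that no implicit ASCII decode is needed later.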

takluyver · 2015-03-17T18:15:15Z

I think we're already tracking the issue with displaying DataFrames as #6799.

I agree with @rkern that sys.setdefaultencoding() should be expected to break stuff.

takluyver · 2015-03-17T18:28:29Z

And it looks like Min already fixed that, so DataFrames with non-ASCII data should work in 3.1.

Or you can upgrade to Python 3, where unicode generally isn't a problem.
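To illustrate why Python 3 avoids this class of bug: `str` is Unicode text, `bytes` is a separate type, and the interpreter never decodes implicitly. A small sketch (Python 3 only) showing that mixing the two raises a `TypeError` at the point of the mistake instead of a `UnicodeDecodeError` deep inside a library:

```python
text = '25 \xb0C'            # str is Unicode text in Python 3
data = text.encode('utf-8')  # bytes: b'25 \xc2\xb0C'

# Python 3 never decodes bytes implicitly: concatenating str and bytes
# raises TypeError, so the code must choose a codec explicitly.
try:
    combined = text + data
except TypeError:
    combined = text + data.decode('utf-8')
```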

supern8ent · 2015-03-17T18:30:17Z

Thanks for clearing that up!

takluyver closed this as completed Mar 17, 2015

takluyver added this to the no action milestone Mar 17, 2015
