sys.setdefaultencoding('utf-8') breaks print and %%time #8071

Closed
supern8ent opened this issue Mar 17, 2015 · 4 comments

@supern8ent

commented Mar 17, 2015

I use pandas, and since a recent update (sorry, I don't know what the old version was; I'm now on IPython 3.0.0 and pandas 0.15.2) I need to set

import sys
reload(sys)
sys.setdefaultencoding('utf-8')

in order to view the HTML version of a DataFrame; otherwise I get a UnicodeDecodeError.

Unfortunately, this workaround has the side effect that I no longer see output from print statements or %%time.

Per rkern's request, here is the specific pandas content that breaks in my case:

pd.DataFrame({'x':[u'water, 38.71 mg/L @ 25 \xb0C (est), water, 14.1 mg/L @ 25 \xb0C (exp)']})

---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-29-eaddc9d8e339> in <module>()
----> 1 pd.DataFrame({'x':[u'water, 38.71 mg/L @ 25 \xb0C (est), water, 14.1 mg/L @ 25 \xb0C (exp)']})

/Users/nathan/.virtualenvs/A1/lib/python2.7/site-packages/IPython/core/displayhook.pyc in __call__(self, result)
    236                 self.write_format_data(format_dict, md_dict)
    237                 self.log_output(format_dict)
--> 238             self.finish_displayhook()
    239 
    240     def cull_cache(self):

/Users/nathan/.virtualenvs/A1/lib/python2.7/site-packages/IPython/kernel/zmq/displayhook.pyc in finish_displayhook(self)
     70         sys.stderr.flush()
     71         if self.msg['content']['data']:
---> 72             self.session.send(self.pub_socket, self.msg, ident=self.topic)
     73         self.msg = None
     74 

/Users/nathan/.virtualenvs/A1/lib/python2.7/site-packages/IPython/kernel/zmq/session.pyc in send(self, stream, msg_or_type, content, parent, ident, buffers, track, header, metadata)
    647         if self.adapt_version:
    648             msg = adapt(msg, self.adapt_version)
--> 649         to_send = self.serialize(msg, ident)
    650         to_send.extend(buffers)
    651         longest = max([ len(s) for s in to_send ])

/Users/nathan/.virtualenvs/A1/lib/python2.7/site-packages/IPython/kernel/zmq/session.pyc in serialize(self, msg, ident)
    551             content = self.none
    552         elif isinstance(content, dict):
--> 553             content = self.pack(content)
    554         elif isinstance(content, bytes):
    555             # content is already packed, as in a relayed message

/Users/nathan/.virtualenvs/A1/lib/python2.7/site-packages/IPython/kernel/zmq/session.pyc in <lambda>(obj)
     83 # disallow nan, because it's not actually valid JSON
     84 json_packer = lambda obj: jsonapi.dumps(obj, default=date_default,
---> 85     ensure_ascii=False, allow_nan=False,
     86 )
     87 json_unpacker = lambda s: jsonapi.loads(s)

/Users/nathan/.virtualenvs/A1/lib/python2.7/site-packages/zmq/utils/jsonapi.pyc in dumps(o, **kwargs)
     38         kwargs['separators'] = (',', ':')
     39 
---> 40     s = jsonmod.dumps(o, **kwargs)
     41 
     42     if isinstance(s, unicode):

/usr/local/Cellar/python/2.7.9/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.pyc in dumps(obj, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, encoding, default, sort_keys, **kw)
    248         check_circular=check_circular, allow_nan=allow_nan, indent=indent,
    249         separators=separators, encoding=encoding, default=default,
--> 250         sort_keys=sort_keys, **kw).encode(obj)
    251 
    252 

/usr/local/Cellar/python/2.7.9/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/encoder.pyc in encode(self, o)
    208         if not isinstance(chunks, (list, tuple)):
    209             chunks = list(chunks)
--> 210         return ''.join(chunks)
    211 
    212     def iterencode(self, o, _one_shot=False):

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 81: ordinal not in range(128)
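
A note on the final error line: 0xc2 is the first byte of the UTF-8 encoding of the degree sign (U+00B0, written \xb0 in the example), which is consistent with a UTF-8 byte string being joined with unicode text and implicitly decoded as ASCII. That reading is an assumption drawn from the traceback, not something confirmed in this thread. A minimal sketch of the same kind of error, using an explicit decode so that it runs on both Python 2 and Python 3:

```python
# The degree sign from the DataFrame example above.
degree = u"\xb0"

# Its UTF-8 encoding is two bytes: 0xc2 0xb0.
utf8_bytes = degree.encode("utf-8")

# Decoding those bytes as ASCII fails on the first byte, 0xc2.
# Python 2 performs this decode implicitly when byte strings and
# unicode strings are mixed; here it is done explicitly.
try:
    utf8_bytes.decode("ascii")
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)
```

The position reported here differs from the one in the traceback because this sketch decodes only the two bytes of the degree sign, not the full message.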

With the workaround added:

import sys
reload(sys)
sys.setdefaultencoding('utf-8')

pd.DataFrame({'x':[u'water, 38.71 mg/L @ 25 \xb0C (est), water, 14.1 mg/L @ 25 \xb0C (exp)']})

and I get the table with the text as you would expect.

@rkern

Contributor

commented Mar 17, 2015

Please provide an example of how the HTML version of a dataframe gives you a UnicodeDecodeError. That's the bug to fix. The bug may be in Pandas.

sys.setdefaultencoding() should never be used, so the fact that setting it breaks other stuff is not something to worry about.
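
One alternative to changing the process-wide default is to decode byte strings explicitly at the point where they enter the program, so that everything downstream handles text only. A minimal sketch, assuming the incoming bytes are UTF-8; `to_text` is a hypothetical helper written for illustration and is not part of IPython or pandas:

```python
import json


def to_text(value, encoding="utf-8"):
    """Return text, decoding byte strings explicitly (hypothetical helper)."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# A UTF-8 byte string containing the degree sign (assumed input).
raw = b"water, 38.71 mg/L @ 25 \xc2\xb0C (est)"

# Decode once, at the boundary; no global default is touched.
text = to_text(raw)

# Serializing text only avoids any implicit ASCII decode.
payload = json.dumps({"x": text}, ensure_ascii=False)
```

The encoding stays visible at the call site and the interpreter's default is left alone, so code that relies on that default is unaffected.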

@takluyver

Member

commented Mar 17, 2015

I think we're already tracking the issue with displaying DataFrames as #6799.

I agree with @rkern that sys.setdefaultencoding() should be expected to break stuff.

@takluyver

Member

commented Mar 17, 2015

And it looks like Min already fixed that, so non-ASCII DataFrames should work in IPython 3.1.

Or you can upgrade to Python 3, where Unicode generally isn't a problem.
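
One reason this class of error is rarer on Python 3 is that text and bytes are separate types, and mixing them raises a TypeError immediately instead of triggering an implicit ASCII decode. A minimal sketch, assuming Python 3; `join_text_and_bytes` is a hypothetical name used only for illustration:

```python
def join_text_and_bytes(text, raw):
    """Try to concatenate str and bytes (hypothetical demo, Python 3 only)."""
    try:
        return text + raw
    except TypeError as exc:
        # Python 3 refuses to mix the two types implicitly.
        return "TypeError: {}".format(exc)


# Mixing types fails loudly rather than guessing an encoding.
result = join_text_and_bytes("25 ", b"\xc2\xb0C")

# Decoding explicitly gives the intended text.
fixed = "25 " + b"\xc2\xb0C".decode("utf-8")
```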

@takluyver takluyver closed this Mar 17, 2015

@takluyver takluyver added this to the no action milestone Mar 17, 2015

@supern8ent

Author

commented Mar 17, 2015

Thanks for clearing that up!
