json_normalize() can't deal with non-ascii characters in unicode keys #13213

Closed
fmarczin opened this issue May 18, 2016 · 0 comments
Labels
Bug Unicode Unicode strings

@fmarczin
Contributor

Example code:

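import json

import pandas as pd

# The keys contain non-ASCII characters ("Ü", "ø")
testjson = u'''
[{"Ünicøde":0,"sub":{"A":1, "B":2}},
 {"Ünicøde":1,"sub":{"A":3, "B":4}}]
 '''.encode('utf8')
# Under Python 2, json.loads() returns unicode keys, which json_normalize() then passes to str()
pd.io.json.json_normalize(json.loads(testjson))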

Output:

Traceback (most recent call last):
  File "...lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2885, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-12-f866f9c7ec7c>", line 5, in <module>
    pd.io.json.json_normalize(json.loads(testjson))
  File ".../lib/python2.7/site-packages/pandas/io/json.py", line 715, in json_normalize
    data = nested_to_record(data)
  File ".../lib/python2.7/site-packages/pandas/io/json.py", line 617, in nested_to_record
    newkey = str(k)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xdc' in position 0: ordinal not in range(128)
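Under Python 2, calling str() on a unicode object implicitly encodes it with the default ASCII codec, and that encoding step is what fails here. A minimal sketch that reproduces the same error under Python 3, where str() no longer encodes, so the ASCII encoding is spelled out explicitly:

```python
# Python 3 sketch: make the implicit ASCII encoding done by Python 2's str(k) explicit.
key = "Ünicøde"

error = None
try:
    key.encode("ascii")
except UnicodeEncodeError as exc:
    # Keep a reference: Python 3 unbinds `exc` when the except block ends.
    error = exc

# error.start is the index of the first character outside range(128)
print(error.reason, error.start, hex(ord(key[error.start])))
```

The offending character is U+00DC ("Ü") at position 0, matching u'\xdc' in position 0 in the traceback.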

Expected output:

   sub.A  sub.B  Ünicøde
0      1      2        0
1      3      4        1
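The expected column names come from joining each nested key to its parent with a dot (sub + A gives sub.A). A minimal stdlib sketch of that flattening step, assuming all keys are already text; `flatten` is a hypothetical helper for illustration, not pandas' own nested_to_record:

```python
import json


def flatten(record, prefix="", sep="."):
    """Flatten nested dicts into a single level with dotted keys (hypothetical helper)."""
    flat = {}
    for key, value in record.items():
        # Top-level keys keep their name; nested keys get the parent prefix.
        name = key if not prefix else prefix + sep + key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


testjson = u'''
[{"Ünicøde":0,"sub":{"A":1, "B":2}},
 {"Ünicøde":1,"sub":{"A":3, "B":4}}]
 '''.encode('utf8')
rows = [flatten(r) for r in json.loads(testjson)]
for row in rows:
    print(sorted(row.items()))
```

Sorting the keys gives the same column order as the expected output: sub.A, sub.B, Ünicøde.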

The cause is probably these two lines in pandas/io/json.py:
https://github.com/pydata/pandas/blob/master/pandas/io/json.py#L618
and https://github.com/pydata/pandas/blob/master/pandas/io/json.py#L620

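Those lines were apparently introduced to handle numeric keys, but they fail when k is a unicode object containing non-ASCII characters: under Python 2, str(k) implicitly encodes the object with the ASCII codec, which raises the UnicodeEncodeError shown above.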
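One way to keep numeric keys working without breaking text keys is to call str() only on keys that are not already strings. A minimal Python 3 sketch of that guard; `join_key` is a hypothetical helper, and this is not necessarily how pandas resolved the issue. Under Python 2 the isinstance check would also need to cover unicode (for example by testing against basestring).

```python
def join_key(prefix, key, sep="."):
    """Build a flattened column name from a parent prefix and a child key (hypothetical helper)."""
    # Only convert non-text keys (e.g. ints); text keys pass through untouched,
    # so non-ASCII characters are never routed through an encoding step.
    if not isinstance(key, str):
        key = str(key)
    return key if not prefix else prefix + sep + key


print(join_key("", "Ünicøde"))
print(join_key("sub", "A"))
print(join_key("sub", 0))
```

Numeric keys such as 0 still become "sub.0", while "Ünicøde" is returned unchanged.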

In principle, it seems to be the same bug as #13101.

fmarczin pushed a commit to fmarczin/pandas that referenced this issue May 18, 2016
fmarczin pushed a commit to fmarczin/pandas that referenced this issue May 18, 2016
@jreback added the Bug and Unicode (Unicode strings) labels May 18, 2016
@jreback added this to the 0.18.2 milestone May 19, 2016