json_normalize in 1.0.0 with meta path specified - expects iterable #31507

Closed
tturocy opened this issue Jan 31, 2020 · 13 comments · Fixed by #31524
Labels
IO JSON read_json, to_json, json_normalize Regression Functionality that used to work in a prior pandas version
Milestone

Comments

@tturocy

tturocy commented Jan 31, 2020

Code Sample, a copy-pastable example if possible

import json
from pandas.io.json import json_normalize

the_json = """                                                                  
[{"id": 99,                                                                     
  "data": [{"one": 1, "two": 2}]                                                
}]                                                                              
"""

print(json_normalize(json.loads(the_json),
                     record_path=['data'], meta=['id']))

Problem description

Through 0.25.3, this program generates a DataFrame with one row. In 1.0.0 it fails with an exception:

Traceback (most recent call last):
  File "foo.py", line 11, in <module>
    record_path=['data'], meta=['id']))
  File "/home/dataczar/venvs/test/lib/python3.7/site-packages/pandas/util/_decorators.py", line 66, in wrapper
    return alternative(*args, **kwargs)
  File "/home/dataczar/venvs/test/lib/python3.7/site-packages/pandas/io/json/_normalize.py", line 327, in _json_normalize
    _recursive_extract(data, record_path, {}, level=0)
  File "/home/dataczar/venvs/test/lib/python3.7/site-packages/pandas/io/json/_normalize.py", line 314, in _recursive_extract
    meta_val = _pull_field(obj, val[level:])
  File "/home/dataczar/venvs/test/lib/python3.7/site-packages/pandas/io/json/_normalize.py", line 246, in _pull_field
    f"{js} has non iterable value {result} for path {spec}. "
TypeError: {'id': 99, 'data': [{'one': 1, 'two': 2}]} has non iterable value 99 for path ['id']. Must be iterable or null.

I don't see any documentation changes that suggest a backwards-incompatible change. All my calls to json_normalize that don't use meta work as before.

Expected Output

Through 0.25.3, the output was:

   one  two  id
0    1    2  99

Output of pd.show_versions()

From my virtualenv with pandas 1.0.0:

INSTALLED VERSIONS
------------------
commit : None
python : 3.7.4.final.0
python-bits : 64
OS : Linux
OS-release : 4.2.0-042stab120.16
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8

pandas : 1.0.0
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 19.0.3
setuptools : 40.8.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.5.0
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 7.11.1
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.5.0
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : 1.3.13
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : None

From my virtualenv with 0.25.x:

INSTALLED VERSIONS
------------------
commit : None
python : 3.7.4.final.0
python-bits : 64
OS : Linux
OS-release : 4.2.0-042stab120.16
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8

pandas : 0.25.0
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 19.0.3
setuptools : 40.8.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.4.1
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 7.7.0
pandas_datareader: None
bs4 : 4.8.0
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.4.1
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : None
sqlalchemy : 1.3.7
tables : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None

@WillAyd
Member

WillAyd commented Jan 31, 2020

Looks buggy - investigation and PRs would certainly be welcome!

@WillAyd WillAyd added IO JSON read_json, to_json, json_normalize Regression Functionality that used to work in a prior pandas version labels Jan 31, 2020
@tturocy
Author

tturocy commented Jan 31, 2020

It looks like this may have been introduced with pull request #30145.

@charlesdong1991
Member

charlesdong1991 commented Jan 31, 2020

Are you interested in fixing it, @tturocy? Otherwise, I could take this one.

Before working on it, I would like to clarify the correct behaviour, @WillAyd.
Checking #30145, it seems we now require iterable values for records, which I do not think is correct, since {["foo": 1]} should also be valid JSON. So why is only an Iterable allowed? I might have missed discussion or information from #30145.

Also, IMO, the behaviour is now somewhat inconsistent, because if we do not specify anything, then that id in this case can be parsed.

@WillAyd
Copy link
Member

WillAyd commented Jan 31, 2020

@charlesdong1991 keep in mind that json normalizing and json parsing are not the same. I think your question is if record_path needs to point to something "iterable", in which case I believe the answer is yes. That same stipulation doesn't need to apply to meta though
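
To illustrate the distinction, here is a minimal sketch in plain Python. It is not the pandas source: `lookup_path`, `lookup_meta`, and `lookup_records` are hypothetical helper names used only for illustration. The assumption is that a `record_path` lookup must resolve to a list of records (or null), while a `meta` lookup may return any value, including a scalar such as `99`.

```python
def lookup_path(obj, path):
    """Follow a key, or a list of keys, into a nested dict and return the value."""
    keys = path if isinstance(path, list) else [path]
    for key in keys:
        obj = obj[key]
    return obj


def lookup_meta(obj, path):
    """Hypothetical meta lookup: scalars are fine, so there is no type check."""
    return lookup_path(obj, path)


def lookup_records(obj, path):
    """Hypothetical record lookup: the value must be a list of records or null."""
    result = lookup_path(obj, path)
    if result is None:
        return []
    if not isinstance(result, list):
        raise TypeError(
            f"{obj} has non-list value {result!r} for path {path}. "
            "Must be list or null."
        )
    return result


doc = {"id": 99, "data": [{"one": 1, "two": 2}]}
meta_value = lookup_meta(doc, ["id"])      # scalar is accepted for meta
records = lookup_records(doc, ["data"])    # list is required for records
```

With this split, only `lookup_records(doc, ["id"])` would raise a `TypeError`; the same path passed to `lookup_meta` simply returns the scalar.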

@charlesdong1991
Member

Ah, yeah, thanks for your quick reply, @WillAyd. My question was indeed why both have to be iterable, and it's clear to me now!

@tturocy
Author

tturocy commented Feb 1, 2020

Thank you for picking this up @charlesdong1991 !

@MustafaWaheed91

I am having the same issue:

@sboagibm

@charlesdong1991 Any status on this issue? We've had to revert to 0.25.3. Is there a workaround?

@keua

keua commented Feb 26, 2020

I had the same issue. I applied the code from @charlesdong1991's pull request and it is working for me.

@charlesdong1991
Member

Thanks for all your comments!

The PR is underway. The cause of the problem is that, for meta, field values are not necessarily iterable, but an earlier patch started requiring that they be. PRs in pandas usually take quite a while to get merged (due to code quality requirements and the large number of open PRs), so the release will take longer; I hope this fix can be included in 1.0.2.

If this is a big issue in your use case, @sboagibm, then as @keua said, you can apply the current change from this PR as a workaround, or roll back to an earlier version and wait for the new release. Thanks!
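
For readers who cannot patch pandas or roll back, one further possibility is to flatten the records in plain Python and skip `json_normalize` altogether. The following is a minimal sketch, not something proposed in this thread: `flatten_records` is a hypothetical helper, and it assumes single-level record and meta keys, as in the example from the original report.

```python
import json


def flatten_records(data, record_key, meta_keys):
    """Return one flat dict per record, with the parent's meta values copied in."""
    rows = []
    for parent in data:
        # Treat a missing or null record list as empty.
        for record in parent.get(record_key) or []:
            row = dict(record)
            for key in meta_keys:
                row[key] = parent.get(key)
            rows.append(row)
    return rows


the_json = """
[{"id": 99,
  "data": [{"one": 1, "two": 2}]
}]
"""

rows = flatten_records(json.loads(the_json), "data", ["id"])
print(rows)  # [{'one': 1, 'two': 2, 'id': 99}]
```

Passing `rows` to `pandas.DataFrame(rows)` should then give a frame with the columns `one`, `two`, and `id`, matching the expected output above. Nested record or meta paths would need extra handling that this sketch leaves out.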

@jreback jreback added this to the 1.0.2 milestone Mar 11, 2020
@yunpeng-zhang

In which version of pandas is this issue solved?

@usamatrq94

I'm still experiencing this issue. Is there any way around it?

@MarcoGorelli
Member

MarcoGorelli commented Oct 20, 2021

This looks fine to me in the latest version of pandas

In [1]: import json
   ...: from pandas import json_normalize
   ...: 
   ...: the_json = """                                                                  
   ...: [{"id": 99,                                                                     
   ...:   "data": [{"one": 1, "two": 2}]                                                
   ...: }]                                                                              
   ...: """
   ...: 
   ...: print(json_normalize(json.loads(the_json),
   ...:                      record_path=['data'], meta=['id']))

   one  two  id
0    1    2  99
