json_normalize in 1.0.0 with meta path specified - expects iterable #31507

tturocy · 2020-01-31T17:51:06Z

Code Sample, a copy-pastable example if possible

import json
from pandas.io.json import json_normalize

the_json = """
[{"id": 99,
  "data": [{"one": 1, "two": 2}]
}]
"""

print(json_normalize(json.loads(the_json),
                     record_path=['data'], meta=['id']))

Problem description

Through 0.25.3, this program generates a DataFrame with one row. In 1.0.0 it fails with an exception:

Traceback (most recent call last):
  File "foo.py", line 11, in <module>
    record_path=['data'], meta=['id']))
  File "/home/dataczar/venvs/test/lib/python3.7/site-packages/pandas/util/_decorators.py", line 66, in wrapper
    return alternative(*args, **kwargs)
  File "/home/dataczar/venvs/test/lib/python3.7/site-packages/pandas/io/json/_normalize.py", line 327, in _json_normalize
    _recursive_extract(data, record_path, {}, level=0)
  File "/home/dataczar/venvs/test/lib/python3.7/site-packages/pandas/io/json/_normalize.py", line 314, in _recursive_extract
    meta_val = _pull_field(obj, val[level:])
  File "/home/dataczar/venvs/test/lib/python3.7/site-packages/pandas/io/json/_normalize.py", line 246, in _pull_field
    f"{js} has non iterable value {result} for path {spec}. "
TypeError: {'id': 99, 'data': [{'one': 1, 'two': 2}]} has non iterable value 99 for path ['id']. Must be iterable or null.

I don't see any documentation changes that suggest a backwards-incompatible change. All my calls to json_normalize that don't use meta still work as before.

Expected Output

Through 0.25.3, the output was:

   one  two  id
0    1    2  99

Output of `pd.show_versions()`

From my virtualenv with pandas 1.0.0:

INSTALLED VERSIONS
------------------
commit : None
python : 3.7.4.final.0
python-bits : 64
OS : Linux
OS-release : 4.2.0-042stab120.16
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8

pandas : 1.0.0
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 19.0.3
setuptools : 40.8.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.5.0
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 7.11.1
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.5.0
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : 1.3.13
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : None

From my virtualenv with 0.25.x:

INSTALLED VERSIONS
------------------
commit : None
python : 3.7.4.final.0
python-bits : 64
OS : Linux
OS-release : 4.2.0-042stab120.16
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8

pandas : 0.25.0
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 19.0.3
setuptools : 40.8.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.4.1
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 7.7.0
pandas_datareader: None
bs4 : 4.8.0
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.4.1
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : None
sqlalchemy : 1.3.7
tables : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None


WillAyd · 2020-01-31T18:56:47Z

Looks buggy - investigation and PRs would certainly be welcome!

tturocy · 2020-01-31T20:09:57Z

It looks like this may have been introduced with pull request #30145.

charlesdong1991 · 2020-01-31T22:37:39Z

Are you planning to fix it, @tturocy? Otherwise, I could take this one.

Before working on it, I would like to clarify the correct behaviour, @WillAyd.
Checking #30145, it seems we should have iterables as values for records, which I do not think is correct, since {["foo": 1]} should also be valid JSON. So why are only iterables allowed? I might be missing discussion/information from #30145.

IMO there is also some inconsistency in behaviour now, because if we do not specify anything, the id in this case can be parsed.

WillAyd · 2020-01-31T22:42:19Z

@charlesdong1991 keep in mind that JSON normalizing and JSON parsing are not the same. I think your question is whether record_path needs to point to something "iterable", in which case I believe the answer is yes. That same stipulation doesn't need to apply to meta, though.
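To make the record_path/meta distinction concrete, here is a minimal pure-Python sketch. `normalize_sketch` is a hypothetical, one-level stand-in for `json_normalize(record_path=[...], meta=[...])`, not pandas' implementation: it only shows that the record path must lead to a list (each element becomes a row), while meta values are copied onto each row as-is and may be scalars.

```python
from typing import Any, Dict, List


def normalize_sketch(
    data: List[Dict[str, Any]], record_key: str, meta: List[str]
) -> List[Dict[str, Any]]:
    """Hypothetical, one-level stand-in for json_normalize(record_path=[record_key], meta=meta).

    Not pandas' implementation: it only illustrates which values must be lists.
    """
    rows: List[Dict[str, Any]] = []
    for obj in data:
        records = obj[record_key]
        # The record path must lead to a list: each element becomes one output row.
        if not isinstance(records, list):
            raise TypeError(f"record path {record_key!r} must point to a list, got {records!r}")
        for rec in records:
            row = dict(rec)
            # Meta values are copied onto every row unchanged, so scalars such as 99 are fine.
            for key in meta:
                row[key] = obj[key]
            rows.append(row)
    return rows


data = [{"id": 99, "data": [{"one": 1, "two": 2}]}]
print(normalize_sketch(data, "data", ["id"]))  # [{'one': 1, 'two': 2, 'id': 99}]
```

The single row matches the 0.25.3 output in the report: columns `one`, `two`, then the meta column `id`.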

charlesdong1991 · 2020-01-31T22:46:09Z

Ah, yeah, thanks for your quick reply, @WillAyd. My doubt was indeed why both have to be iterable, and it's clear to me now!

tturocy · 2020-02-01T07:46:13Z

Thank you for picking this up @charlesdong1991 !

MustafaWaheed91 · 2020-02-03T22:10:27Z

I am having the same issue.

sboagibm · 2020-02-25T22:52:13Z

@charlesdong1991 Any status on this issue? We've had to revert to 0.25.3. Is there a workaround?

keua · 2020-02-26T09:10:57Z

I had the same issue; I applied the code from @charlesdong1991's pull request and it is working for me.

charlesdong1991 · 2020-02-26T09:24:46Z

Thanks for all your comments!

The PR is underway. The cause of this problem is that, for meta, the field values are not necessarily iterable, but an earlier patch changed the code to require that. PRs in pandas usually take quite a while to get merged (due to code quality/cleanliness requirements and the large number of open PRs), so the fix will take longer to be released; I hope it can be included in 1.0.2.
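The cause described in this comment can be sketched as a field lookup that enforces the list check only for record paths. `pull_field` and its `require_list` flag are hypothetical names chosen for illustration; this is not the code merged in #31524. In the 1.0.0 behaviour shown in the traceback, the equivalent check also ran for the meta lookup of `['id']`, which is why the scalar `99` was rejected.

```python
from typing import Any, List


def pull_field(js: Any, spec: List[str], require_list: bool = False) -> Any:
    """Hypothetical helper: follow the keys in `spec` into `js` and return the value.

    Only record-path lookups (require_list=True) insist on a list (or None);
    meta lookups return whatever they find, scalars included.
    """
    result = js
    for field in spec:
        result = result[field]
    if require_list and result is not None and not isinstance(result, list):
        raise TypeError(f"{js} has non-list value {result!r} for path {spec}")
    return result


obj = {"id": 99, "data": [{"one": 1, "two": 2}]}
print(pull_field(obj, ["id"]))                       # meta lookup: 99
print(pull_field(obj, ["data"], require_list=True))  # record lookup: [{'one': 1, 'two': 2}]
```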

If this is a big issue in your use case, @sboagibm, then, as @keua said, applying the current change from this PR works around it; alternatively, roll back to an earlier version and wait for the new release. Thanks!
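Another possible workaround on an affected release, sketched here as an assumption rather than anything proposed in this thread: call `json_normalize` with only `record_path` (calls without `meta` were reported to work as before), then re-attach the meta column by hand. This assumes each top-level object's `"data"` value is a list, so the records come out in the same order as the repeated ids.

```python
import json

import pandas as pd

the_json = """
[{"id": 99,
  "data": [{"one": 1, "two": 2}]
}]
"""
data = json.loads(the_json)

# Normalize only the records; no meta=..., so the failing meta lookup is never reached.
df = pd.json_normalize(data, record_path=["data"])

# Re-attach the meta field by hand: repeat each parent's "id" once per nested record,
# in the same order the records were flattened.
df["id"] = [obj["id"] for obj in data for _ in obj["data"]]

print(df)
```

For the sample input this should reproduce the one-row frame from the Expected Output section.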

yunpeng-zhang · 2021-04-27T20:32:50Z

In which version of pandas is this issue fixed?

usamatrq94 · 2021-10-20T13:02:52Z

I'm still experiencing this issue. Is there any workaround?

MarcoGorelli · 2021-10-20T16:35:13Z

This looks fine to me in the latest version of pandas:

In [1]: import json
   ...: from pandas import json_normalize
   ...: 
   ...: the_json = """
   ...: [{"id": 99,
   ...:   "data": [{"one": 1, "two": 2}]
   ...: }]
   ...: """
   ...: 
   ...: print(json_normalize(json.loads(the_json),
   ...:                      record_path=['data'], meta=['id']))

   one  two  id
0    1    2  99

WillAyd added the IO JSON (read_json, to_json, json_normalize) and Regression (functionality that used to work in a prior pandas version) labels on Jan 31, 2020

charlesdong1991 mentioned this issue on Jan 31, 2020:

BUG: non-iterable value in meta raise error in json_normalize #31524 (Merged)

This was referenced on Mar 10, 2020:

json_normalize errors if meta fields are integer #32480 (Closed)

RLS: 1.0.2 #32415 (Closed)

jreback added this to the 1.0.2 milestone Mar 11, 2020

WillAyd closed this as completed in #31524 Mar 11, 2020
