DOC: Additional examples for `json_normalize` #16415

DGrady · 2017-05-22T17:02:40Z

When handling JSON data, a common use case is to start with a list of hierarchically nested records whose layout is unknown or possibly inconsistent, and to transform them into a flat tabular structure. Pandas' existing json_normalize function handles this use case, but the examples in the function's documentation don't make this clear. It would be useful to add some explanation and examples to that documentation.

Code Sample

from pandas.io.json import json_normalize  # import path as of pandas 0.20

data = [
    {'id': 1, 'name': {'first': 'Coleen', 'last': 'Volk'}},
    {'name': {'given': 'Mose', 'family': 'Regner'}},
    {'id': 2, 'name': 'Faye Raker'},
]

json_normalize(data)

    id        name name.family name.first name.given name.last
0  1.0         NaN         NaN     Coleen        NaN      Volk
1  NaN         NaN      Regner        NaN       Mose       NaN
2  2.0  Faye Raker         NaN        NaN        NaN       NaN

Problem description

Converting such records directly to a data frame doesn't expose the nested structure: each nested object ends up as a single value in one column. pandas.read_json is also designed to work with data that's already flat.
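To make the contrast concrete, here is a minimal sketch that reuses the records from the code sample above and compares the DataFrame constructor with json_normalize. The try/except import is an assumption added so the snippet runs on both older and newer pandas releases; it is not part of the original report.

```python
import pandas as pd

try:
    json_normalize = pd.json_normalize  # top-level function in newer pandas
except AttributeError:
    from pandas.io.json import json_normalize  # older releases such as 0.20.x

data = [
    {'id': 1, 'name': {'first': 'Coleen', 'last': 'Volk'}},
    {'name': {'given': 'Mose', 'family': 'Regner'}},
    {'id': 2, 'name': 'Faye Raker'},
]

# Direct conversion: the nested dicts stay as opaque objects in one column.
direct = pd.DataFrame(data)
print(sorted(direct.columns))

# json_normalize: every nested key becomes its own dotted column.
flat = json_normalize(data)
print(sorted(flat.columns))
```

With direct conversion the whole `name` object has to be unpacked by hand afterwards, while json_normalize produces one column per nested key and fills the gaps with NaN.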

The existing documentation for json_normalize only includes an example of a somewhat more complicated use. The tutorial sections on JSON parsing use the same example. Both could be updated with additional examples that would help others understand when and how to apply json_normalize.
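For comparison, a more involved call might look like the sketch below, which uses the record_path and meta parameters of json_normalize to expand a nested list into rows. The `records` data and its field names are hypothetical and chosen only for illustration; this is not the example from the pandas documentation.

```python
import pandas as pd

try:
    json_normalize = pd.json_normalize  # top-level function in newer pandas
except AttributeError:
    from pandas.io.json import json_normalize  # older releases such as 0.20.x

# Hypothetical records: each team carries a nested list of members.
records = [
    {'team': 'A', 'lead': {'name': 'Ann'},
     'members': [{'name': 'Bo', 'age': 30}, {'name': 'Cy', 'age': 41}]},
    {'team': 'B', 'lead': {'name': 'Di'},
     'members': [{'name': 'Ed', 'age': 25}]},
]

# One output row per member; 'team' and the nested lead name are repeated
# alongside each member as metadata columns.
members = json_normalize(
    records,
    record_path='members',
    meta=['team', ['lead', 'name']],
)
print(members)
```

This shape of call is useful when each record contains a list to unroll, whereas the simpler call in the code sample only flattens nested objects.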

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Darwin
OS-release: 16.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.20.1
pytest: 3.0.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 6.0.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.3.0
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

jorisvandenbossche · 2017-05-22T17:11:52Z

That seems like a good idea! (both tutorial docs and docstring can be updated)

jorisvandenbossche added the Docs and IO JSON (read_json, to_json, json_normalize) labels May 22, 2017

zzgao mentioned this issue May 22, 2017

DOC: add example on json_normalize #16438

Merged

1 task

jreback added this to the 0.21.0 milestone May 22, 2017

jorisvandenbossche closed this as completed in #16438 Aug 18, 2017
