ENH: json_normalize should allow a different separator than . #14883

Closed
jowens opened this Issue Dec 14, 2016 · 5 comments

jowens commented Dec 14, 2016

>>> import pandas
>>> col_in = ['c1', 'c2.x']
>>> df = pandas.DataFrame([['A', 0], ['B', 1]], columns=col_in)
>>> df.c1
0    A
1    B
Name: c1, dtype: object
>>> df.c2.x
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/generic.py", line 2744, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'c2'

Problem description

The snippet above shows why a . in a column name is not ideal: attribute access such as df.c2.x fails. (I'm running into this when using Vega for data visualization, vega/vega-lite#1775.) When json_normalize flattens nested input JSON, it joins the nesting levels with a period. I believe this happens on this line:

https://github.com/pandas-dev/pandas/blob/7d8bc0deaeb8237a0cf361048363c78f4867f218/pandas/io/json.py#L831

I'd like to see an additional argument to json_normalize, separator, with default '.', that specifies the string used to separate nesting levels. In the line of code above, '.'.join(val) would be replaced by separator.join(val) (if I'm reading that line correctly). I could then pass, say, '_' to use an underscore instead of a period.
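
To make the proposal concrete, here is a minimal pure-Python sketch of the idea. It is not the pandas implementation: the flatten function and its separator and prefix parameters are hypothetical names used only for illustration. It joins the keys of each nesting level with a caller-supplied string, the way separator.join(val) would.

```python
def flatten(record, separator=".", prefix=()):
    """Flatten nested dicts, joining the key path of each leaf with `separator`.

    Hypothetical helper for illustration only; not part of pandas.
    """
    flat = {}
    for key, value in record.items():
        path = prefix + (str(key),)
        if isinstance(value, dict):
            # Recurse into nested levels, carrying the key path along.
            flat.update(flatten(value, separator, path))
        else:
            # Leaf value: build the column name from the full key path.
            flat[separator.join(path)] = value
    return flat


nested = {"c1": "A", "c2": {"x": 0, "y": {"z": 1}}}
dotted = flatten(nested)                       # keys joined with "."
underscored = flatten(nested, separator="_")   # keys joined with "_"
```

With the default, the nested keys come out as c2.x and c2.y.z; with separator="_" they come out as c2_x and c2_y_z, which are valid Python attribute names.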

n00b at pandas, please correct me if I'm doing anything wrong.

Expected Output

Output of pd.show_versions()

>>> pandas.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Darwin
OS-release: 16.3.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.US-ASCII
LOCALE: None.None

pandas: 0.19.1
nose: 1.3.7
pip: 9.0.1
setuptools: 30.3.0
Cython: 0.25.2
numpy: 1.11.2
scipy: 0.18.1
statsmodels: None
xarray: None
IPython: 5.1.0
sphinx: 1.5.1
patsy: None
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: 1.2.0
tables: 3.2.3.1
numexpr: 2.6.1
matplotlib: 1.5.3
openpyxl: 2.4.1
xlrd: 1.0.0
xlwt: None
xlsxwriter: None
lxml: 3.6.0
bs4: 4.5.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None

Contributor

rcarneva commented Dec 15, 2016

The dot notation for accessing columns is just a convenience. You can still get the column normally using df['c2.x'].
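
A small sketch of that point, reusing the frame from the report. The rename step is an assumed workaround added for illustration and was not suggested in the thread; it only shows that attribute access works again once the period is gone.

```python
import pandas as pd

df = pd.DataFrame([["A", 0], ["B", 1]], columns=["c1", "c2.x"])

# Bracket access works regardless of which characters the column name contains.
values = df["c2.x"].tolist()

# Assumed workaround: replace the period so the name is a valid attribute.
renamed = df.rename(columns=lambda name: name.replace(".", "_"))
attr_values = renamed.c2_x.tolist()
```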

jowens commented Dec 15, 2016

I understand. I'm merely offering the observation that columns with . in their names are perhaps not a perfect fit for everything in pandas, and that an option to use a different separator might also be useful to people other than me.

Contributor

rcarneva commented Dec 15, 2016

Sure thing. I just wanted to point that out in case it was keeping you from doing something you needed to do now, since you said you were new to pandas.

jowens commented Dec 15, 2016

Yeah, it's the vega-lite bug I filed (vega/vega-lite#1775) that's my proximate difficulty here.

jreback added the IO JSON label Dec 15, 2016

Contributor

jreback commented Dec 15, 2016

This would be quite easy to add; PRs welcome.

jreback added this to the Next Major Release milestone Dec 15, 2016

jreback added a commit to jowens/pandas that referenced this issue Jan 22, 2017

jowens + jreback: ENH: json_normalize now takes a user-specified separator
closes #14883
e707e79

jreback modified the milestone: 0.20.0, Next Major Release Mar 28, 2017

jreback added a commit to jowens/pandas that referenced this issue Mar 28, 2017

jowens + jreback: ENH: json_normalize now takes a user-specified separator
closes #14883
6a0f954

jreback added a commit to jowens/pandas that referenced this issue Mar 28, 2017

jowens + jreback: ENH: json_normalize now takes a user-specified separator
closes #14883
8edc40e

jreback closed this in 34c6bd0 Mar 28, 2017
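
For readers landing here later, a brief sketch of the resulting option. The squashed commit messages in this thread refer to the argument as sep rather than separator. The sketch assumes pandas 1.0 or newer, where json_normalize is available at the top level as pd.json_normalize; at the time of this issue it lived in pandas.io.json. The sample records are made up for illustration.

```python
import pandas as pd

# Made-up nested records mirroring the column names from the report.
records = [
    {"c1": "A", "c2": {"x": 0}},
    {"c1": "B", "c2": {"x": 1}},
]

# Default behaviour: nesting levels are joined with ".".
default_cols = sorted(pd.json_normalize(records).columns)

# With sep="_": nesting levels are joined with an underscore instead.
underscore_cols = sorted(pd.json_normalize(records, sep="_").columns)
```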

mattip added a commit to mattip/pandas that referenced this issue Apr 3, 2017

jreback + mattip: ENH: GH14883: json_normalize now takes a user-specified separator
closes #14883

Author: Jeff Reback <jeff@reback.net>
Author: John Owens <jowens@ece.ucdavis.edu>

Closes #14950 from jowens/json_normalize-separator and squashes the following commits:

0327dd1 [Jeff Reback] compare sorted columns
bc5aae8 [Jeff Reback] CLN: fixup json_normalize with sep
8edc40e [John Owens] ENH: json_normalize now takes a user-specified separator
75b6512

linebp added a commit to linebp/pandas that referenced this issue Apr 17, 2017

jreback + linebp: ENH: GH14883: json_normalize now takes a user-specified separator
closes #14883

Author: Jeff Reback <jeff@reback.net>
Author: John Owens <jowens@ece.ucdavis.edu>

Closes #14950 from jowens/json_normalize-separator and squashes the following commits:

0327dd1 [Jeff Reback] compare sorted columns
bc5aae8 [Jeff Reback] CLN: fixup json_normalize with sep
8edc40e [John Owens] ENH: json_normalize now takes a user-specified separator
6ca3087