read_json changes dtype (int => float) #12866

Closed
stellasia opened this issue Apr 11, 2016 · 6 comments
Labels
Bug Dtype Conversions Unexpected or buggy dtype conversions IO JSON read_json, to_json, json_normalize
Comments

@stellasia

Hi there!

This might be a well-known problem, but I could not find any record or explanation of it. When reading JSON to create a pandas object (Series or DataFrame), the index dtype is changed from int to float.

The docs only mention the inverse conversion:

a column that was float data will be converted to integer if it can be done safely, e.g. a column of 1.

Is this behaviour expected as well? Is the only solution to pass a dtype argument to read_json?

Code Sample, a copy-pastable example if possible

    In [1]: import pandas as pd

    In [2]: s = pd.Series([11, 12, 13])

    In [3]: s
    Out[3]: 
    0    11
    1    12
    2    13
    dtype: int64

    In [4]: s.to_json()
    Out[4]: '{"0":11,"1":12,"2":13}'

    In [5]: pd.read_json(s.to_json(), typ="series")
    Out[5]:  
    0.0    11
    1.0    12
    2.0    13
    dtype: int64

    In [6]: s.index.dtype  
    Out[6]: dtype('int64')

    In [7]: pd.read_json(s.to_json(), typ="series").index.dtype
    Out[7]: dtype('float64')

Expected Output

I would like to recover the integer index for the new Series:

    In [5]: pd.read_json(s.to_json(), typ="series")
    Out[5]:  
    0    11
    1    12
    2    13
    dtype: int64

    In [7]: pd.read_json(s.to_json(), typ="series").index.dtype
    Out[7]: dtype('int64')

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.9.final.0
python-bits: 64
OS: Linux
OS-release: 3.16.0-44-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8

pandas: 0.18.0
nose: 1.3.4
pip: 1.5.6
setuptools: 12.2
Cython: None
numpy: 1.11.0
scipy: 0.14.1
statsmodels: 0.5.0
xarray: None
IPython: 4.0.0
sphinx: 1.2.3
patsy: 0.3.0
dateutil: 2.5.2
pytz: 2016.3
blosc: None
bottleneck: None
tables: 3.1.1
numexpr: 2.4
matplotlib: 1.3.1
openpyxl: 2.2.0-b1
xlrd: 0.9.2
xlwt: 0.7.5
xlsxwriter: None
lxml: 3.4.2
bs4: 4.3.2
html5lib: 0.999
httplib2: 0.9
apiclient: None
sqlalchemy: 1.0.0
pymysql: None
psycopg2: 2.6 (dt dec mx pq3 ext lo64)
jinja2: 2.8
boto: None

Thanks!

@jreback
Contributor

jreback commented Apr 11, 2016

Both other orients recover this; I don't know why orient='index' (the default) does not.
cc @Komnomnomnom

    In [7]: pd.read_json(s.to_json(orient='split'),typ='series',orient='split')
    Out[7]: 
    0    11
    1    12
    2    13
    dtype: int64

    In [9]: pd.read_json(s.to_json(orient='records'),typ='series',orient='records')
    Out[9]: 
    0    11
    1    12
    2    13
    dtype: int64

@jreback jreback added Bug Dtype Conversions Unexpected or buggy dtype conversions IO JSON read_json, to_json, json_normalize labels Apr 11, 2016
@stellasia
Author

Indeed, orient='split' is working fine (even better for my needs). Thanks.
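
For readers on a recent pandas version, the orient='split' round trip can be sketched as a standalone script. This is only an illustration: wrapping the JSON text in StringIO is an addition not used anywhere in this thread, made because recent pandas versions deprecate passing a literal JSON string to read_json.

```python
from io import StringIO

import pandas as pd

s = pd.Series([11, 12, 13])

# orient='split' writes the index as a JSON array, so its integers survive.
payload = s.to_json(orient="split")

# Assumption: StringIO is used here because recent pandas versions
# deprecate passing a literal JSON string directly to read_json.
restored = pd.read_json(StringIO(payload), typ="series", orient="split")

print(restored.index.dtype)
print(restored.tolist())
```

The index dtype printed here should match the original integer index, as in the session shown earlier in the thread.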

I would be interested in understanding the reason for the different behaviour with orient='index', if it is known.

@Komnomnomnom
Contributor

The default orient='index' converts the pandas object to a JSON object, which must have string keys. Therefore, when decoded, the index values are all strings. They are then (by default) promoted to a float dtype.

orient='records' doesn't apply here, as it doesn't serialise the index. orient='split' serialises the index, columns and values into separate arrays, so the index values are not encoded as JSON strings and their type is preserved.
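
The string-key constraint can be seen without pandas at all. The following is a rough sketch using only the standard-library json module; the two layouts merely imitate the shape of orient='index' and orient='split' output and are not pandas' actual serialiser.

```python
import json

# Integer labels mapped to values, standing in for a Series.
data = {0: 11, 1: 12, 2: 13}

# Object layout (like orient='index'): labels become JSON object keys,
# and JSON object keys are always strings.
as_object = json.dumps(data)
decoded = json.loads(as_object)
print(as_object)      # {"0": 11, "1": 12, "2": 13}
print(list(decoded))  # ['0', '1', '2']

# Array layout (like orient='split'): labels travel as a JSON array
# of numbers, so they come back as integers.
as_split = json.dumps({"index": list(data), "data": list(data.values())})
decoded_split = json.loads(as_split)
print(decoded_split["index"])  # [0, 1, 2]
```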

So this is as expected, unless the default of converting object types to float when deserialising is changed. Could we attempt int first and then float (especially for the index)?
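
As a minimal sketch of the "int first, then float" idea, in plain Python rather than pandas internals: coerce_labels is a hypothetical helper named only for illustration, not a pandas function.

```python
def coerce_labels(labels):
    """Hypothetical helper: parse string labels as int if all of them allow it, else as float."""
    try:
        return [int(label) for label in labels]
    except ValueError:
        # At least one label is not an integer literal, so fall back to float.
        # float() itself raises ValueError if a label is not numeric at all.
        return [float(label) for label in labels]


print(coerce_labels(["0", "1", "2"]))    # [0, 1, 2]
print(coerce_labels(["0", "1.5", "2"]))  # [0.0, 1.5, 2.0]
```

The fallback is all-or-nothing: a single non-integer label makes every label a float, which keeps the resulting index homogeneous.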

@jreback
Contributor

jreback commented Apr 12, 2016

@Komnomnomnom right, do we know it's part of an index at that point? (I don't remember.)

actually pd.to_numeric(data) will do the right thing here (e.g. convert to int if possible, then float otherwise), and raise if not convertible.

    In [64]: pd.to_numeric(['a','1',2])
    ValueError: Unable to parse string

    In [65]: pd.to_numeric(['0','1',2])
    Out[65]: array([0, 1, 2])

    In [66]: pd.to_numeric(['0','1',2.0])
    Out[66]: array([ 0.,  1.,  2.])

    In [67]: pd.to_numeric(['0','1.5',2.0])
    Out[67]: array([ 0. ,  1.5,  2. ])

@Komnomnomnom
Contributor

Yes, it happens after deserialisation (the C code) has finished.

It actually does try to convert to int further along but it runs into TypeError: Setting <class 'pandas.core.index.Float64Index'> dtype to anything other than float64 or object is not supported

    In [61]: from pandas import Index

    In [62]: data = Index([u'0', u'1', u'2'], dtype='object')

    In [63]: data.astype('float64').astype('int64')
    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-63-7bbb22a22b83> in <module>()
    ----> 1 data.astype('float64').astype('int64')

    /home/kieran/.virtualenvs/py27/lib/python2.7/site-packages/pandas/core/index.pyc in astype(self, dtype)
       3828             raise TypeError('Setting %s dtype to anything other than '
       3829                             'float64 or object is not supported' %
    -> 3830                             self.__class__)
       3831         return Index(self._values, name=self.name, dtype=dtype)
       3832 

    TypeError: Setting <class 'pandas.core.index.Float64Index'> dtype to anything other than float64 or object is not supported

@jreback
Contributor

jreback commented Apr 12, 2016

@Komnomnomnom sorry, that is a bug. Let me create another issue for that.
