pandas read from json don't infer data types #916

mparkhe · 2019-02-23T07:32:48Z

Don't infer data types while converting from json
unit tests for parse_json_input

By default, pandas.read_json attempts to up-convert data types. In the following example, the "zip_code" column is sent as strings in the JSON but is converted to int64. This automatic inference can be troublesome when writing custom model scoring code. This PR passes the dtype=False argument to disable it.

>>> import pandas as pd

>>> json_string = '{"columns":["zip_code","cost"],"index":[0,1,2],"data":[["95120",10.45],["95128",23.0],["95128",12.1]]}'

>>> str(pd.read_json(json_string, orient="split").dtypes["zip_code"])
'int64'

>>> str(pd.read_json(json_string, orient="split", dtype=False).dtypes["zip_code"])
'object'
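The coercion is more than cosmetic: once a string column becomes numeric, leading zeros are lost. Below is a minimal sketch that illustrates this. The zip code "02134" is a made-up sample value. The payload is wrapped in io.StringIO because newer pandas releases deprecate passing a literal JSON string to read_json.

```python
from io import StringIO

import pandas as pd

# Made-up payload: "02134" carries a leading zero that no integer can represent.
json_string = (
    '{"columns":["zip_code","cost"],"index":[0,1],'
    '"data":[["02134",10.45],["95128",23.0]]}'
)

# Default: pandas tries to coerce the quoted zip codes into numbers.
inferred = pd.read_json(StringIO(json_string), orient="split")

# dtype=False: keep whatever types the JSON parser produced.
raw = pd.read_json(StringIO(json_string), orient="split", dtype=False)

print(inferred["zip_code"].tolist())  # coerced values; the leading zero does not survive
print(raw["zip_code"].tolist())       # ['02134', '95128']
```

With dtype=False, the scoring code receives exactly the strings the client sent.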

The same behavior occurs when using orient="records".
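A records-oriented payload with the same values can be checked the same way. The sketch below uses the first two rows of the example from the description, rewritten as a list of objects.

```python
from io import StringIO

import pandas as pd

# The first two rows of the description's example, in orient="records" form.
records_json = '[{"zip_code":"95120","cost":10.45},{"zip_code":"95128","cost":23.0}]'

default_df = pd.read_json(StringIO(records_json), orient="records")
raw_df = pd.read_json(StringIO(records_json), orient="records", dtype=False)

print(default_df.dtypes["zip_code"])  # inferred numeric dtype
print(raw_df["zip_code"].tolist())    # ['95120', '95128']
```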

mateiz · 2019-02-24T03:01:16Z

What was the problem exactly? It's hard to tell from the patch.

mparkhe · 2019-02-25T19:22:22Z

What was the problem exactly? It's hard to tell from the patch.

Added more details to the description.

dbczumar · 2019-02-25T19:29:37Z

This might create backwards-compatibility issues if users' existing pipelines depend on the inferred data types. Perhaps we should add an interim warning that this behavior will change (e.g., in 0.8.3) and make the change in a later version (e.g., 0.9.0).
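The interim-warning idea could look roughly like this. The sketch assumes a hypothetical helper, read_json_with_dtype_warning, and a hypothetical infer_dtypes flag. Neither is MLflow API. The helper keeps the old inference and emits a standard FutureWarning until the caller opts in either way.

```python
import warnings
from io import StringIO

import pandas as pd


def read_json_with_dtype_warning(json_input, orient="split", infer_dtypes=None):
    """Hypothetical helper: keep pandas' inference for now, but warn that it will change.

    infer_dtypes=None means the caller has not chosen a behavior explicitly.
    """
    if infer_dtypes is None:
        warnings.warn(
            "Data types are currently inferred from JSON input; a future release "
            "will stop inferring them. Pass infer_dtypes explicitly to silence this warning.",
            FutureWarning,
            stacklevel=2,
        )
        infer_dtypes = True  # keep the old behavior during the transition
    # pandas' dtype argument: True infers types, False leaves the parsed values alone.
    return pd.read_json(StringIO(json_input), orient=orient, dtype=infer_dtypes)
```

Callers that pass infer_dtypes=False get the new behavior immediately and see no warning. Flipping the default later would then be a one-line change.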

tomasatdatabricks · 2019-02-27T21:12:29Z

This would mean all integers are parsed as object / string if I understand it correctly? I am not sure if that is a good idea.

Can we instead add a parameter to the REST api or during the deployment process?
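One way such a parameter could work is to forward a per-column type mapping. pandas.read_json's dtype argument also accepts a dict of column name to dtype. This is an assumption for illustration, not what the PR implemented. The column_dtypes mapping below is hypothetical. Listed columns are cast, and unlisted columns still go through pandas' normal inference.

```python
from io import StringIO

import pandas as pd

json_string = (
    '{"columns":["zip_code","cost"],"index":[0,1,2],'
    '"data":[["95120",10.45],["95128",23.0],["95128",12.1]]}'
)

# Hypothetical per-request setting: cast zip_code to str, infer everything else.
column_dtypes = {"zip_code": str}

df = pd.read_json(StringIO(json_string), orient="split", dtype=column_dtypes)
print(df["zip_code"].tolist())  # ['95120', '95128', '95128']
print(df["cost"].dtype)         # float64
```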

mparkhe · 2019-02-27T21:46:32Z

Reply to @tomasatdatabricks, re:

This would mean all integers are parsed as object / string if I understand it correctly? I am not sure if that is a good idea.

Can we instead add a parameter to the REST api or during the deployment process?

No. Users can still pass integers and floats as part of the JSON. See the examples in the tests
test_records_oriented_json_to_df and test_split_oriented_json_to_df,

or the example in the description above, where the "cost" field is sent without quotes and comes back as float64. Unquoted numbers are not converted to objects.
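The point about unquoted numbers can be checked directly. The sketch below uses a made-up payload with a hypothetical "quantity" column added. With dtype=False, pandas skips its extra coercion step, but JSON numbers are still parsed as numbers. Only the quoted values stay strings.

```python
from io import StringIO

import pandas as pd

# Made-up payload: one quoted column, one unquoted integer column
# ("quantity" is hypothetical), and one unquoted float column.
payload = (
    '{"columns":["zip_code","quantity","cost"],"index":[0,1],'
    '"data":[["95120",3,10.45],["95128",1,23.0]]}'
)

df = pd.read_json(StringIO(payload), orient="split", dtype=False)

print(df["zip_code"].tolist())  # ['95120', '95128']
print(df["quantity"].dtype)     # int64
print(df["cost"].dtype)         # float64
```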

dbczumar

LGTM, though we definitely need docs as well prior to the release.

* pandas read from json don't infer data types
* added more tests
* Adding int64 columns for json -> pandas

pandas read from json don't infer data types

6a33c6d

mparkhe added 2 commits February 25, 2019 11:20

added more tests

fe9212c

Merge branch 'master' of github.com:mlflow/mlflow into pandas_from_json_dtypes

dad1267

mparkhe requested review from dbczumar and mateiz February 25, 2019 19:24

mparkhe added 3 commits March 7, 2019 15:58

Merge branch 'master' of github.com:mlflow/mlflow into pandas_from_json_dtypes

025c63c

Adding int64 columns for json -> pandas

460f7a6

Merge branch 'master' of github.com:mlflow/mlflow into pandas_from_json_dtypes

3b724df

dbczumar approved these changes Mar 8, 2019

mparkhe merged commit e20e712 into mlflow:master Mar 8, 2019

mparkhe deleted the pandas_from_json_dtypes branch March 8, 2019 03:00

eedeleon pushed a commit to eedeleon/mlflow that referenced this pull request Mar 13, 2019

pandas read from json don't infer data types (mlflow#916)

05fa1c7

* pandas read from json don't infer data types
* added more tests
* Adding int64 columns for json -> pandas