pandas read from json don't infer data types #916
What was the problem exactly? It's hard to tell from the patch.
Added more details in description.
This seems like it might create backwards compatibility issues if users' existing pipelines depend on the inferred datatypes. Perhaps we should consider implementing an interim warning about the fact that this behavior will be changing (e.g. in
This would mean all integers are parsed as object / string if I understand it correctly? I am not sure if that is a good idea. Can we instead add a parameter to the REST API or during the deployment process?
Reply to @tomasatdatabricks, re:
No. Users can pass integers and floats as part of the JSON. Look at the examples in the tests, or above in the description that shows
LGTM, though we definitely need docs as well prior to the release.
* pandas read from json don't infer data types
* added more tests
* Adding int64 columns for json -> pandas
parse_json_input: by default, pandas.read_json will attempt to up-convert data types. In the following example, the "zip_code" column is presented as strings in the JSON but is converted to int64 by default. This automatic inference can be troublesome when writing custom model scoring code. This PR uses the dtype=False argument to stop the inference. The same behavior is seen when using orient="records".
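A minimal sketch of the behavior described above, calling pandas.read_json directly rather than going through parse_json_input. The payloads, the "price" column, and the variable names are made up for illustration and are not taken from the PR or its tests. The dtypes reported for the default call may also differ between pandas versions.

```python
import io

import pandas as pd

# Hypothetical payloads: "zip_code" is sent as JSON strings, "price" as JSON numbers.
split_payload = (
    '{"columns": ["zip_code", "price"], '
    '"data": [["95120", 10.5], ["95128", 12.0]]}'
)
records_payload = (
    '[{"zip_code": "95120", "price": 10.5}, '
    '{"zip_code": "95128", "price": 12.0}]'
)

# Default behavior: pandas is allowed to infer and up-convert column dtypes,
# which is where the string zip codes can turn into integers.
inferred = pd.read_json(io.StringIO(split_payload), orient="split")

# dtype=False switches the inference off, so values keep the type they had
# in the JSON document: strings stay strings, numbers stay numbers.
raw = pd.read_json(io.StringIO(split_payload), orient="split", dtype=False)
raw_records = pd.read_json(
    io.StringIO(records_payload), orient="records", dtype=False
)

print(inferred.dtypes)
print(raw.dtypes)
print(raw["zip_code"].tolist())
print(raw_records["zip_code"].tolist())
```

With dtype=False, numeric fields such as the hypothetical "price" column still arrive as numbers because they are numbers in the JSON itself. Only the string-to-number up-conversion is turned off.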