
Read_json overflow error when json contains big number strings #30320

Closed
boing102 opened this issue Dec 18, 2019 · 5 comments · Fixed by #30329
Labels: IO JSON read_json, to_json, json_normalize

Comments

@boing102

Code Sample, a copy-pastable example if possible

import json
import pandas as pd

test_data = [{"col": "31900441201190696999"}, {"col": "Text"}]
test_json = json.dumps(test_data)
pd.read_json(test_json)

Problem description

The current behaviour doesn't return a DataFrame for valid JSON. Note that when the number string is smaller, it works fine. It also works when only big number strings are present. It would be good for the mixed case to work just as it does for small numbers.
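For contrast, here is a minimal sketch of the case that does work: the same structure, but with a number string small enough to fit in int64. (StringIO is used because newer pandas versions deprecate passing a literal JSON string; the behaviour shown is otherwise the same as the report.)

```python
import json
from io import StringIO

import pandas as pd

# Same shape as the failing example, but the number string fits in
# int64, so read_json's dtype coercion does not overflow.
small = json.dumps([{"col": "319004412"}, {"col": "Text"}])
df = pd.read_json(StringIO(small))
print(df)
```

Because the column mixes a numeric string with plain text, the int64 coercion simply fails with a catchable error and the column is left as object, rather than escaping as an OverflowError.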

Expected Output

A DataFrame with a number and a string:

            col
0  3.190044e+19
1          Text

Output of pd.read_json()

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../.venv/lib/python3.6/site-packages/pandas/io/json/_json.py", line 592, in read_json
    result = json_reader.read()
  File ".../.venv/lib/python3.6/site-packages/pandas/io/json/_json.py", line 717, in read
    obj = self._get_object_parser(self.data)
  File ".../.venv/lib/python3.6/site-packages/pandas/io/json/_json.py", line 739, in _get_object_parser
    obj = FrameParser(json, **kwargs).parse()
  File ".../.venv/lib/python3.6/site-packages/pandas/io/json/_json.py", line 855, in parse
    self._try_convert_types()
  File ".../.venv/lib/python3.6/site-packages/pandas/io/json/_json.py", line 1151, in _try_convert_types
    lambda col, c: self._try_convert_data(col, c, convert_dates=False)
  File ".../.venv/lib/python3.6/site-packages/pandas/io/json/_json.py", line 1131, in _process_converter
    new_data, result = f(col, c)
  File ".../.venv/lib/python3.6/site-packages/pandas/io/json/_json.py", line 1151, in <lambda>
    lambda col, c: self._try_convert_data(col, c, convert_dates=False)
  File ".../.venv/lib/python3.6/site-packages/pandas/io/json/_json.py", line 927, in _try_convert_data
    new_data = data.astype("int64")
  File ".../.venv/lib/python3.6/site-packages/pandas/core/generic.py", line 5882, in astype
    dtype=dtype, copy=copy, errors=errors, **kwargs
  File ".../.venv/lib/python3.6/site-packages/pandas/core/internals/managers.py", line 581, in astype
    return self.apply("astype", dtype=dtype, **kwargs)
  File ".../.venv/lib/python3.6/site-packages/pandas/core/internals/managers.py", line 438, in apply
    applied = getattr(b, f)(**kwargs)
  File ".../.venv/lib/python3.6/site-packages/pandas/core/internals/blocks.py", line 559, in astype
    return self._astype(dtype, copy=copy, errors=errors, values=values, **kwargs)
  File ".../.venv/lib/python3.6/site-packages/pandas/core/internals/blocks.py", line 643, in _astype
    values = astype_nansafe(vals1d, dtype, copy=True, **kwargs)
  File ".../.venv/lib/python3.6/site-packages/pandas/core/dtypes/cast.py", line 707, in astype_nansafe
    return lib.astype_intsafe(arr.ravel(), dtype).reshape(arr.shape)
  File "pandas/_libs/lib.pyx", line 547, in pandas._libs.lib.astype_intsafe
OverflowError: Python int too large to convert to C long
@jbrockmendel jbrockmendel added the IO JSON read_json, to_json, json_normalize label Dec 18, 2019
@rohitkg98
Contributor

take

@rohitkg98
Contributor

I'm new to open-source contributions, so please bear with me. It seems we coerce values to int wherever possible while parsing JSON. The code between lines 943 and 950 of pandas/io/json/_json.py is what causes the problem: the int coercion is wrapped in a try/except that only catches TypeError and ValueError. If it also catches OverflowError, things work as intended. I will submit a PR for this soon.
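The pattern described above can be sketched in isolation. This is a hedged sketch of the coercion step, not the actual pandas source; the helper name try_convert_int64 is invented for illustration. Widening the except clause lets an over-range value fall back to its original dtype instead of propagating.

```python
import pandas as pd

def try_convert_int64(data: pd.Series) -> pd.Series:
    """Mimic the JSON parser's coercion step: try int64, fall back on failure.

    Catching OverflowError alongside TypeError/ValueError is the fix
    suggested above -- without it, a Python int that exceeds C long
    range escapes the try/except and propagates to the caller.
    """
    try:
        return data.astype("int64")
    except (TypeError, ValueError, OverflowError):
        return data  # leave the column unchanged

s = pd.Series(["31900441201190696999", "Text"], dtype=object)
result = try_convert_int64(s)
```

With the wider except clause, the mixed column from the report survives the coercion attempt untouched instead of raising.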

@simonjayhawkins
Member

If test_json is [{"col": "31900441201190696999"}, {"col": "Text"}], would we not expect the result to be a string, so that OverflowError: int too big to convert should not be raised? (For example, a number stored as a string could be a barcode.)

If test_json is [{"col": 31900441201190696999}, {"col": "Text"}], then expecting a DataFrame with a number and a string would be reasonable. This currently raises ValueError: Value is too big.
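For scale, the value in question does not fit even in an unsigned 64-bit integer, which is why both the astype path and the C tokenizer reject it. The snippet below checks the bounds in plain Python and also tries a possible workaround for the string case: read_json's dtype=False, which (per the pandas docs) disables dtype inference entirely, so a barcode-like string should survive verbatim. Treat the workaround as an assumption to verify against your pandas version.

```python
import json
from io import StringIO

import pandas as pd

value = 31900441201190696999
print(value > 2**63 - 1)   # exceeds signed int64 max
print(value > 2**64 - 1)   # exceeds even uint64 max

# Possible workaround for string-valued columns: disable dtype
# inference so the numeric-looking string is kept as a string.
raw = json.dumps([{"col": "31900441201190696999"}, {"col": "Text"}])
df = pd.read_json(StringIO(raw), dtype=False)
print(df)
```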

@rohitkg98
Contributor

The ValueError is raised inside ultrajsondec.c. Should I try to solve this in the C code, or add an exception handler in _json.py?

@rohitkg98
Contributor

@simonjayhawkins any updates on how I should approach this?

@jreback jreback added this to the 1.1 milestone Jan 24, 2020