BUG: #58925

Open
3 tasks done
cs-ranbi opened this issue Jun 4, 2024 · 3 comments · May be fixed by #58985
Labels: Bug, IO JSON

Comments


cs-ranbi commented Jun 4, 2024

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import io
import pandas as pd

df = pd.DataFrame(data={'index':[1,2], 'a': [2,3]})
s = df.to_json(orient="table")
df = pd.read_json(io.StringIO(s), orient="table")

Issue Description

read_json failed with

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/ranbi/vsa_cs/.venv/lib/python3.12/site-packages/pandas/io/json/_json.py", line 815, in read_json
    return json_reader.read()
           ^^^^^^^^^^^^^^^^^^
  File "/Users/ranbi/vsa_cs/.venv/lib/python3.12/site-packages/pandas/io/json/_json.py", line 1025, in read
    obj = self._get_object_parser(self.data)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ranbi/vsa_cs/.venv/lib/python3.12/site-packages/pandas/io/json/_json.py", line 1051, in _get_object_parser
    obj = FrameParser(json, **kwargs).parse()
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ranbi/vsa_cs/.venv/lib/python3.12/site-packages/pandas/io/json/_json.py", line 1187, in parse
    self._parse()
  File "/Users/ranbi/vsa_cs/.venv/lib/python3.12/site-packages/pandas/io/json/_json.py", line 1427, in _parse
    self.obj = parse_table_schema(json, precise_float=self.precise_float)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ranbi/vsa_cs/.venv/lib/python3.12/site-packages/pandas/io/json/_table_schema.py", line 380, in parse_table_schema
    df = df.set_index(table["schema"]["primaryKey"])
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ranbi/vsa_cs/.venv/lib/python3.12/site-packages/pandas/core/frame.py", line 6178, in set_index
    index = ensure_index_from_sequences(arrays, names)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ranbi/vsa_cs/.venv/lib/python3.12/site-packages/pandas/core/indexes/base.py", line 7588, in ensure_index_from_sequences
    return Index(sequences[0], name=names)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ranbi/vsa_cs/.venv/lib/python3.12/site-packages/pandas/core/indexes/base.py", line 528, in __new__
    return cls(np.asarray(data), dtype=dtype, copy=copy, name=name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ranbi/vsa_cs/.venv/lib/python3.12/site-packages/pandas/core/indexes/base.py", line 570, in __new__
    raise ValueError("Index data must be 1-dimensional") from err
ValueError: Index data must be 1-dimensional

Expected Behavior

read_json should return the same df as the original one

   index  a
0      1  2
1      2  3

Installed Versions

INSTALLED VERSIONS

commit : d9cdd2e
python : 3.12.2.final.0
python-bits : 64
OS : Darwin
OS-release : 23.1.0
Version : Darwin Kernel Version 23.1.0: Mon Oct 9 21:33:00 PDT 2023; root:xnu-10002.41.9~7/RELEASE_ARM64_T6031
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.2.2
numpy : 1.26.3
pytz : 2024.1
dateutil : 2.9.0.post0
setuptools : 69.5.1
pip : 24.0
Cython : None
pytest : 8.0.0
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 3.2.0
lxml.etree : 5.2.2
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.3
IPython : 8.12.3
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : 2024.3.1
gcsfs : None
matplotlib : 3.8.2
numba : None
numexpr : None
odfpy : None
openpyxl : 3.1.2
pandas_gbq : None
pyarrow : 15.0.2
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.13.0
sqlalchemy : None
tables : None
tabulate : 0.9.0
xarray : None
xlrd : None
zstandard : 0.22.0
tzdata : 2024.1
qtpy : None
pyqt5 : None

cs-ranbi added the Bug and Needs Triage labels Jun 4, 2024
Member

Aloqeely commented Jun 5, 2024

Thanks for the report! Your DataFrame's index has no name, so to_json(orient="table") assigns it the default name index. Because you also have a column named index, two fields named index were saved in the output.
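To make the collision concrete, here is a minimal sketch that mimics the naming described above without calling to_json at all. The default name "index" for an unnamed index is taken from the comment; the variable names are illustrative only, and this is not the pandas implementation.

```python
import pandas as pd

df = pd.DataFrame(data={"index": [1, 2], "a": [2, 3]})

# Assumption (from the comment above): an unnamed index is given the
# default name "index" when writing with orient="table".
index_name = df.index.name if df.index.name is not None else "index"

# The field names the table output would need: the index first, then the columns.
field_names = [index_name, *df.columns]

# Any name that appears more than once is ambiguous on the way back in.
duplicates = sorted({name for name in field_names if field_names.count(name) > 1})

print(field_names)
print(duplicates)
```

With two fields sharing one name, the reader cannot tell which one the primary key refers to, which is consistent with the set_index failure in the traceback.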

You can work around this issue by renaming your index column, or by giving the actual index a name using df.rename_axis.
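Both workarounds could look roughly like the sketch below. The names "row_id" and "idx" are hypothetical and chosen only for illustration; any name that does not clash with an existing column should do.

```python
import io
import pandas as pd

df = pd.DataFrame(data={"index": [1, 2], "a": [2, 3]})

# Option 1: give the real index a name so it no longer defaults to "index".
named = df.rename_axis("row_id")
restored_named = pd.read_json(
    io.StringIO(named.to_json(orient="table")), orient="table"
)

# Option 2: rename the column that clashes with the default index name.
renamed = df.rename(columns={"index": "idx"})
restored_renamed = pd.read_json(
    io.StringIO(renamed.to_json(orient="table")), orient="table"
)

print(restored_named)
print(restored_renamed)
```

Option 1 keeps the original column names but leaves the index named after the round trip; Option 2 leaves the index unnamed but changes a column name, so the choice depends on which of the two downstream code relies on.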

PRs to fix this are welcome if they don't complicate the logic too much.

Aloqeely added the IO JSON label and removed the Needs Triage label Jun 5, 2024
@taranarmo

take

taranarmo added a commit to taranarmo/pandas that referenced this issue Jun 11, 2024
This commit is intended to fix bug GH pandas-dev#58925. If index.name is empty, it
uses set_default_names inside __init__ so that the check for overlapping
names fails. Otherwise the default name is applied only during schema creation
and is not reflected on the dataframe itself, which creates an inconsistency
between the data and its schema.
taranarmo added five more commits to taranarmo/pandas that referenced this issue Jun 12, 2024, each with essentially the same message.
@taranarmo

As the simplest solution, I made this case fail the check for whether index.name is among the column names. The alternatives would be to change the names of the user's columns or to invent a different generic name for an unnamed index.
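A rough sketch of that kind of check, written as a standalone helper rather than as the actual change in the linked PR. The function name check_index_name_overlap and the error message are hypothetical, and only a single-level index is handled here.

```python
import pandas as pd


def check_index_name_overlap(df: pd.DataFrame) -> None:
    """Hypothetical guard: raise if the (defaulted) index name is also a column."""
    # Assumption: an unnamed single-level index defaults to the name "index".
    index_name = df.index.name if df.index.name is not None else "index"
    if index_name in df.columns:
        raise ValueError(f"Index name {index_name!r} overlaps with a column name")


df = pd.DataFrame(data={"index": [1, 2], "a": [2, 3]})

try:
    check_index_name_overlap(df)
    outcome = "ok"
except ValueError as err:
    outcome = str(err)

print(outcome)

# Naming the index ("row_id" is an illustrative name) lets the check pass.
check_index_name_overlap(df.rename_axis("row_id"))
```

Failing early in this way leaves the user's column names untouched and reports the clash at write time instead of at read time.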
