load json file error with v2.20.0 #6977

Closed
xiaoyaolangzhi opened this issue Jun 18, 2024 · 2 comments

@xiaoyaolangzhi

Describe the bug

load_dataset(path="json", data_files="./test.json")
Generating train split: 0 examples [00:00, ? examples/s]
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/datasets/packaged_modules/json/json.py", line 132, in _generate_tables
    pa_table = paj.read_json(
  File "pyarrow/_json.pyx", line 308, in pyarrow._json.read_json
  File "pyarrow/error.pxi", line 154, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: JSON parse error: Column() changed from object to array in row 0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/datasets/builder.py", line 1997, in _prepare_split_single
    for _, table in generator:
  File "/usr/local/lib/python3.10/dist-packages/datasets/packaged_modules/json/json.py", line 155, in _generate_tables
    df = pd.read_json(f, dtype_backend="pyarrow")
  File "/usr/local/lib/python3.10/dist-packages/pandas/util/_decorators.py", line 211, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pandas/util/_decorators.py", line 331, in wrapper
    return func(*args, **kwargs)
TypeError: read_json() got an unexpected keyword argument 'dtype_backend'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/t1.py", line 11, in <module>
    load_dataset(path=data_path, data_files="./t2.json")
  File "/usr/local/lib/python3.10/dist-packages/datasets/load.py", line 2616, in load_dataset
    builder_instance.download_and_prepare(
  File "/usr/local/lib/python3.10/dist-packages/datasets/builder.py", line 1029, in download_and_prepare
    self._download_and_prepare(
  File "/usr/local/lib/python3.10/dist-packages/datasets/builder.py", line 1124, in _download_and_prepare
    self._prepare_split(split_generator, **prepare_split_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/datasets/builder.py", line 1884, in _prepare_split
    for job_id, done, content in self._prepare_split_single(
  File "/usr/local/lib/python3.10/dist-packages/datasets/builder.py", line 2040, in _prepare_split_single
    raise DatasetGenerationError("An error occurred while generating the dataset") from e
datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset

The same TypeError reproduces with pandas alone:

import pandas as pd
with open("./test.json", "r") as f:
    df = pd.read_json(f, dtype_backend="pyarrow")
Traceback (most recent call last):
  File "/app/t3.py", line 3, in <module>
    df = pd.read_json(f, dtype_backend="pyarrow")
  File "/usr/local/lib/python3.10/dist-packages/pandas/util/_decorators.py", line 211, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pandas/util/_decorators.py", line 331, in wrapper
    return func(*args, **kwargs)
TypeError: read_json() got an unexpected keyword argument 'dtype_backend'

Steps to reproduce the bug

.

Expected behavior

.

Environment info

datasets                  2.20.0
pandas                    1.5.3

albertvillanova self-assigned this Jun 18, 2024
@albertvillanova
Member

albertvillanova commented Jun 18, 2024

Thanks for reporting, @xiaoyaolangzhi.

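Indeed, we currently require pandas >= 2.0.0: the dtype_backend argument that the JSON loader passes to pd.read_json (see your traceback) is not accepted by pandas 1.5.3.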

You will need to update pandas in your local environment:

pip install -U pandas
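
To confirm whether an environment is affected before calling load_dataset, a small version guard can help. The following is only a sketch: the helper names (parse_release, meets_minimum, check_pandas) are hypothetical and not part of datasets or pandas, and the 2.0.0 threshold is taken from the comment above. It compares numeric release components only, so a pre-release such as 2.0.0rc1 is treated as 2.0.0.

```python
import re
from importlib import metadata

# Minimum pandas version stated by the maintainer in this thread.
MIN_PANDAS = (2, 0, 0)


def parse_release(version: str) -> tuple:
    """Return the leading numeric components, e.g. '1.5.3' -> (1, 5, 3)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(version: str, minimum: tuple = MIN_PANDAS) -> bool:
    """True if `version` is at least `minimum` (numeric components only)."""
    release = parse_release(version)
    # Pad short versions so that '2.0' compares equal to (2, 0, 0).
    padded = release + (0,) * (len(minimum) - len(release))
    return padded >= minimum


def check_pandas() -> None:
    """Raise a readable error if the installed pandas is missing or too old."""
    try:
        installed = metadata.version("pandas")
    except metadata.PackageNotFoundError:
        raise RuntimeError("pandas is not installed; run: pip install -U pandas")
    if not meets_minimum(installed):
        raise RuntimeError(
            f"pandas {installed} is too old; need >= 2.0.0 "
            "(run: pip install -U pandas)"
        )
```

Calling check_pandas() at the top of a script such as t1.py would turn the nested DatasetGenerationError into a single message naming the outdated package. With the reporter's pandas 1.5.3, meets_minimum("1.5.3") evaluates to False.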

@xiaoyaolangzhi
Author

Thank you very much.
