Cannot load dataset, fails with nested data conversions not implemented for chunked array outputs #7793

@neevparikh

Describe the bug

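Hi! When I load the metr-evals/malt-public dataset (config irrelevant_detail), it fails with a pyarrow error: ArrowNotImplementedError: Nested data conversions not implemented for chunked array outputs. I see this with both datasets 4.1.1 and datasets 4.1.2.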

To reproduce:

import datasets

ds = datasets.load_dataset(path="metr-evals/malt-public", name="irrelevant_detail")

Error:

Traceback (most recent call last):
  File "/Users/neev/scratch/.venv/lib/python3.13/site-packages/datasets/builder.py", line 1815, in _prepare_split_single
    for _, table in generator:
                    ^^^^^^^^^
  File "/Users/neev/scratch/.venv/lib/python3.13/site-packages/datasets/packaged_modules/parquet/parquet.py", line 93, in _generate_tables
    for batch_idx, record_batch in enumerate(
                                   ~~~~~~~~~^
        parquet_fragment.to_batches(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ...<5 lines>...
        )
        ^
    ):
    ^
  File "pyarrow/_dataset.pyx", line 3904, in _iterator
  File "pyarrow/_dataset.pyx", line 3494, in pyarrow._dataset.TaggedRecordBatchIterator.__next__
  File "pyarrow/error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
pyarrow.lib.ArrowNotImplementedError: Nested data conversions not implemented for chunked array outputs

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/neev/scratch/test_hf.py", line 3, in <module>
    ds = datasets.load_dataset(path="metr-evals/malt-public", name="irrelevant_detail")
  File "/Users/neev/scratch/.venv/lib/python3.13/site-packages/datasets/load.py", line 1412, in load_dataset
    builder_instance.download_and_prepare(
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        download_config=download_config,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ...<3 lines>...
        storage_options=storage_options,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/Users/neev/scratch/.venv/lib/python3.13/site-packages/datasets/builder.py", line 894, in download_and_prepare
    self._download_and_prepare(
    ~~~~~~~~~~~~~~~~~~~~~~~~~~^
        dl_manager=dl_manager,
        ^^^^^^^^^^^^^^^^^^^^^^
    ...<2 lines>...
        **download_and_prepare_kwargs,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/Users/neev/scratch/.venv/lib/python3.13/site-packages/datasets/builder.py", line 970, in _download_and_prepare
    self._prepare_split(split_generator, **prepare_split_kwargs)
    ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/neev/scratch/.venv/lib/python3.13/site-packages/datasets/builder.py", line 1702, in _prepare_split
    for job_id, done, content in self._prepare_split_single(
                                 ~~~~~~~~~~~~~~~~~~~~~~~~~~^
        gen_kwargs=gen_kwargs, job_id=job_id, **_prepare_split_args
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ):
    ^
  File "/Users/neev/scratch/.venv/lib/python3.13/site-packages/datasets/builder.py", line 1858, in _prepare_split_single
    raise DatasetGenerationError("An error occurred while generating the dataset") from e
datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset

Steps to reproduce the bug

import datasets

ds = datasets.load_dataset(path="metr-evals/malt-public", name="irrelevant_detail")

Expected behavior

The dataset loads without error.

Environment info

Datasets: 4.1.1
Python: 3.13
Platform: macOS
