Open
Description
Describe the bug
Hi! Loading the metr-evals/malt-public dataset (config irrelevant_detail) fails with a pyarrow ArrowNotImplementedError. I'm using datasets 4.1.1, and I see the same failure with datasets 4.1.2.
To reproduce:
import datasets
ds = datasets.load_dataset(path="metr-evals/malt-public", name="irrelevant_detail")
Error:
Traceback (most recent call last):
  File "/Users/neev/scratch/.venv/lib/python3.13/site-packages/datasets/builder.py", line 1815, in _prepare_split_single
    for _, table in generator:
                    ^^^^^^^^^
  File "/Users/neev/scratch/.venv/lib/python3.13/site-packages/datasets/packaged_modules/parquet/parquet.py", line 93, in _generate_tables
    for batch_idx, record_batch in enumerate(
                                   ~~~~~~~~~^
        parquet_fragment.to_batches(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        ...<5 lines>...
        )
        ^
    ):
    ^
  File "pyarrow/_dataset.pyx", line 3904, in _iterator
  File "pyarrow/_dataset.pyx", line 3494, in pyarrow._dataset.TaggedRecordBatchIterator.__next__
  File "pyarrow/error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
pyarrow.lib.ArrowNotImplementedError: Nested data conversions not implemented for chunked array outputs
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/Users/neev/scratch/test_hf.py", line 3, in <module>
    ds = datasets.load_dataset(path="metr-evals/malt-public", name="irrelevant_detail")
  File "/Users/neev/scratch/.venv/lib/python3.13/site-packages/datasets/load.py", line 1412, in load_dataset
    builder_instance.download_and_prepare(
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        download_config=download_config,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        ...<3 lines>...
        storage_options=storage_options,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/Users/neev/scratch/.venv/lib/python3.13/site-packages/datasets/builder.py", line 894, in download_and_prepare
    self._download_and_prepare(
    ~~~~~~~~~~~~~~~~~~~~~~~~~~^
        dl_manager=dl_manager,
        ^^^^^^^^^^^^^^^^^^^^^^
        ...<2 lines>...
        **download_and_prepare_kwargs,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/Users/neev/scratch/.venv/lib/python3.13/site-packages/datasets/builder.py", line 970, in _download_and_prepare
    self._prepare_split(split_generator, **prepare_split_kwargs)
    ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/neev/scratch/.venv/lib/python3.13/site-packages/datasets/builder.py", line 1702, in _prepare_split
    for job_id, done, content in self._prepare_split_single(
                                 ~~~~~~~~~~~~~~~~~~~~~~~~~~^
        gen_kwargs=gen_kwargs, job_id=job_id, **_prepare_split_args
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ):
    ^
  File "/Users/neev/scratch/.venv/lib/python3.13/site-packages/datasets/builder.py", line 1858, in _prepare_split_single
    raise DatasetGenerationError("An error occurred while generating the dataset") from e
datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
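The two tracebacks are linked by exception chaining: the DatasetGenerationError raised in builder.py carries the pyarrow ArrowNotImplementedError as its `__cause__`. When triaging, it may help to print the chain down to the root error. Below is a minimal sketch using only the standard library. The helper name `exception_chain` and the stand-in exceptions are assumptions made for illustration; they are not part of datasets or pyarrow.

```python
def exception_chain(exc):
    """Return (type name, message) pairs from the outermost exception to the root cause."""
    chain = []
    seen = set()
    while exc is not None and id(exc) not in seen:
        seen.add(id(exc))
        chain.append((type(exc).__name__, str(exc)))
        if exc.__cause__ is not None:
            # Explicit cause set by `raise ... from e`.
            exc = exc.__cause__
        elif not exc.__suppress_context__:
            # Implicit context (exception raised while handling another one).
            exc = exc.__context__
        else:
            exc = None
    return chain


# Hypothetical stand-ins that mimic the chain in the traceback above, so the
# sketch runs without downloading the dataset.
def _simulate_failure():
    try:
        raise NotImplementedError(
            "Nested data conversions not implemented for chunked array outputs"
        )
    except NotImplementedError as e:
        raise RuntimeError("An error occurred while generating the dataset") from e


try:
    _simulate_failure()
except RuntimeError as err:
    chain = exception_chain(err)
    for name, message in chain:
        print(f"{name}: {message}")
```

With the real call, the same try/except could wrap `datasets.load_dataset(...)` and catch `datasets.exceptions.DatasetGenerationError`; the last entry of the chain should then be the pyarrow error.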
Steps to reproduce the bug
To reproduce:
import datasets
ds = datasets.load_dataset(path="metr-evals/malt-public", name="irrelevant_detail")
Expected behavior
The dataset loads
Environment info
Datasets: 4.1.1
Python: 3.13
Platform: macOS
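The error is raised inside pyarrow, so the installed pyarrow version is probably relevant as well, though it is not listed above. A small sketch for collecting the versions in one go, using only the standard library; the helper name `collect_versions` is an assumption for illustration:

```python
import platform
from importlib import metadata


def collect_versions(packages=("datasets", "pyarrow")):
    """Map each package name to its installed version, or None if it is not installed."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


info = {
    "python": platform.python_version(),
    "platform": platform.platform(),
    **collect_versions(),
}
for key, value in info.items():
    print(f"{key}: {value if value is not None else 'not installed'}")
```

Running this inside the same virtual environment as the failing script should report the versions that were actually in use.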