Failed to download the benchmark #2

@MengHao666

Code:

from datasets import load_dataset


dataset = load_dataset("liyyy/ComplexBench-Edit")

Error:

Generating test split:   0%|                                                                                                           | 0/763 [00:00<?, ? examples/s]
Traceback (most recent call last):
  File "/Users/momiao/miniforge3/lib/python3.12/site-packages/datasets/builder.py", line 1854, in _prepare_split_single
    for _, table in generator:
                    ^^^^^^^^^
  File "/Users/momiao/miniforge3/lib/python3.12/site-packages/datasets/packaged_modules/parquet/parquet.py", line 93, in _generate_tables
    for batch_idx, record_batch in enumerate(
                                   ^^^^^^^^^^
  File "pyarrow/_dataset.pyx", line 3830, in _iterator
  File "pyarrow/_dataset.pyx", line 3436, in pyarrow._dataset.TaggedRecordBatchIterator.__next__
  File "pyarrow/error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
OSError: Repetition level histogram size mismatch

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/momiao/miniforge3/lib/python3.12/site-packages/datasets/load.py", line 2151, in load_dataset
    builder_instance.download_and_prepare(
  File "/Users/momiao/miniforge3/lib/python3.12/site-packages/datasets/builder.py", line 924, in download_and_prepare
    self._download_and_prepare(
  File "/Users/momiao/miniforge3/lib/python3.12/site-packages/datasets/builder.py", line 1000, in _download_and_prepare
    self._prepare_split(split_generator, **prepare_split_kwargs)
  File "/Users/momiao/miniforge3/lib/python3.12/site-packages/datasets/builder.py", line 1741, in _prepare_split
    for job_id, done, content in self._prepare_split_single(
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/momiao/miniforge3/lib/python3.12/site-packages/datasets/builder.py", line 1897, in _prepare_split_single
    raise DatasetGenerationError("An error occurred while generating the dataset") from e
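
The underlying `OSError` is raised inside pyarrow while `datasets` reads the Parquet file, so the installed versions of those packages may be relevant when reproducing this. Below is a minimal sketch for collecting them using only the standard library. The helper name `report_versions` and the list of packages are assumptions made for illustration; they are not part of `datasets` or of the benchmark.

```python
from importlib.metadata import PackageNotFoundError, version
import platform


def report_versions(packages=("datasets", "pyarrow", "huggingface_hub")):
    """Return a dict mapping each package name to its installed version.

    A package that is not installed maps to None. The default package
    list is an assumption about what may matter for this error.
    """
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Print the interpreter version followed by each package version.
    print("python", platform.python_version())
    for name, ver in report_versions().items():
        print(name, ver or "not installed")
```

Running the script in the same environment as the failing `load_dataset` call prints one line per package, which can be pasted alongside the traceback.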
