You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Error code: StreamingRowsError
Exception: FileNotFoundError
Message: [Errno 2] No such file or directory: 'zip:/data/train::https:/huggingface.co/datasets/Jean-Baptiste/wikiner_fr/resolve/main/data.zip/state.json'
Traceback: Traceback (most recent call last):
File "/src/services/worker/src/worker/responses/first_rows.py", line 337, in get_first_rows_response
rows = get_rows(dataset, config, split, streaming=True, rows_max_number=rows_max_number, hf_token=hf_token)
File "/src/services/worker/src/worker/utils.py", line 123, in decorator
return func(*args, **kwargs)
File "/src/services/worker/src/worker/responses/first_rows.py", line 77, in get_rows
rows_plus_one = list(itertools.islice(ds, rows_max_number + 1))
File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/iterable_dataset.py", line 718, in __iter__
for key, example in self._iter():
File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/iterable_dataset.py", line 708, in _iter
yield from ex_iterable
File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/iterable_dataset.py", line 112, in __iter__
yield from self.generate_examples_fn(**self.kwargs)
File "/tmp/modules-cache/datasets_modules/datasets/Jean-Baptiste--wikiner_fr/683a580ba6ec769d508f7dfc603a651667b0ed3817b1ae5bfd45f97cc024923f/wikiner_fr.py", line 165, in _generate_examples
dataset = Dataset.load_from_disk(filepath)
File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 1210, in load_from_disk
with open(Path(dataset_path, config.DATASET_STATE_JSON_FILENAME).as_posix(), encoding="utf-8") as state_file:
FileNotFoundError: [Errno 2] No such file or directory: 'zip:/data/train::https:/huggingface.co/datasets/Jean-Baptiste/wikiner_fr/resolve/main/data.zip/state.json'
Is it an error with the dataset script, or the data itself, @huggingface/datasets?
The script uses Dataset.load_from_disk, which as you can expect, doesn't work in streaming mode.
It would probably be more practical to load the dataset locally using Dataset.load_from_disk first and then push_to_hub to upload it in Parquet on the Hub
Link
https://huggingface.co/datasets/Jean-Baptiste/wikiner_fr
Description
Is it an error with the dataset script, or the data itself, @huggingface/datasets?
https://huggingface.co/datasets/Jean-Baptiste/wikiner_fr/tree/main
Owner
No
The text was updated successfully, but these errors were encountered: