Dataset Viewer issue for Jean-Baptiste/wikiner_fr #4996

Closed
severo opened this issue Sep 20, 2022 · 2 comments

severo commented Sep 20, 2022

Link

https://huggingface.co/datasets/Jean-Baptiste/wikiner_fr

Description

Error code:   StreamingRowsError
Exception:    FileNotFoundError
Message:      [Errno 2] No such file or directory: 'zip:/data/train::https:/huggingface.co/datasets/Jean-Baptiste/wikiner_fr/resolve/main/data.zip/state.json'
Traceback:    Traceback (most recent call last):
               File "/src/services/worker/src/worker/responses/first_rows.py", line 337, in get_first_rows_response
                 rows = get_rows(dataset, config, split, streaming=True, rows_max_number=rows_max_number, hf_token=hf_token)
               File "/src/services/worker/src/worker/utils.py", line 123, in decorator
                 return func(*args, **kwargs)
               File "/src/services/worker/src/worker/responses/first_rows.py", line 77, in get_rows
                 rows_plus_one = list(itertools.islice(ds, rows_max_number + 1))
               File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/iterable_dataset.py", line 718, in __iter__
                 for key, example in self._iter():
               File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/iterable_dataset.py", line 708, in _iter
                 yield from ex_iterable
               File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/iterable_dataset.py", line 112, in __iter__
                 yield from self.generate_examples_fn(**self.kwargs)
               File "/tmp/modules-cache/datasets_modules/datasets/Jean-Baptiste--wikiner_fr/683a580ba6ec769d508f7dfc603a651667b0ed3817b1ae5bfd45f97cc024923f/wikiner_fr.py", line 165, in _generate_examples
                 dataset = Dataset.load_from_disk(filepath)
               File "/src/services/worker/.venv/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 1210, in load_from_disk
                 with open(Path(dataset_path, config.DATASET_STATE_JSON_FILENAME).as_posix(), encoding="utf-8") as state_file:
             FileNotFoundError: [Errno 2] No such file or directory: 'zip:/data/train::https:/huggingface.co/datasets/Jean-Baptiste/wikiner_fr/resolve/main/data.zip/state.json'

Is it an error with the dataset script, or the data itself, @huggingface/datasets?

https://huggingface.co/datasets/Jean-Baptiste/wikiner_fr/tree/main

Owner

No


lhoestq commented Sep 20, 2022

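The script uses Dataset.load_from_disk, which, as you might expect, doesn't work in streaming mode: it expects a directory on the local filesystem, and the traceback shows it reading state.json with the built-in open.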
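
To illustrate the failure, here is a minimal standard-library sketch of what the last traceback frame does. The chained URL below is an assumption about what the script passed as filepath in streaming mode (it is not shown in the issue); the point is only that joining it with pathlib collapses the double slashes and leaves a string the built-in open treats as a local relative path.

```python
from pathlib import PurePosixPath

# Assumption: in streaming mode the script received a chained URL of this
# shape instead of a local directory. The exact input is not shown in the
# issue; it is inferred from the path in the error message.
chained_url = (
    "zip://data/train::"
    "https://huggingface.co/datasets/Jean-Baptiste/wikiner_fr/resolve/main/data.zip"
)

# Mirrors the failing frame: Path(dataset_path, "state.json").as_posix().
# pathlib collapses "//" into "/", which mangles both URL schemes.
state_file = PurePosixPath(chained_url, "state.json").as_posix()
print(state_file)


def try_builtin_open(path: str) -> str:
    """Try to open `path` as a local file and report what happened."""
    try:
        with open(path, encoding="utf-8"):
            return "opened"
    except OSError as err:
        # The built-in open knows nothing about zip:// or https:// URLs.
        return type(err).__name__


print(try_builtin_open(state_file))
```

The first printed line should match the path in the FileNotFoundError message above.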

It would probably be more practical to load the dataset locally with Dataset.load_from_disk first and then call push_to_hub to upload it to the Hub as Parquet.
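
A rough sketch of that conversion, assuming data.zip has been downloaded and extracted locally and that each split was saved in its own directory. The helper name push_saved_splits, the split directory layout, and the repo id in the usage note are hypothetical; Dataset.load_from_disk, DatasetDict, and push_to_hub are the datasets library calls mentioned above.

```python
from typing import Dict


def push_saved_splits(split_dirs: Dict[str, str], repo_id: str):
    """Load splits saved with save_to_disk and upload them to the Hub.

    `split_dirs` maps a split name to a local directory (hypothetical layout).
    """
    # Imported here so this file can be imported without `datasets` installed.
    from datasets import Dataset, DatasetDict

    # Each directory must be a local path containing state.json and the
    # Arrow files; this is the part that cannot work in streaming mode.
    splits = DatasetDict(
        {name: Dataset.load_from_disk(str(path)) for name, path in split_dirs.items()}
    )

    # push_to_hub writes the data to the repo as Parquet files, which can be
    # streamed without a dataset script. Requires being logged in to the Hub.
    splits.push_to_hub(repo_id)
    return splits
```

A call might look like push_saved_splits({"train": "data/train"}, "your-username/wikiner_fr"), where both the directory and the repo id are placeholders to adapt.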

@albertvillanova
Member

I've transferred this issue to the Hub repo: https://huggingface.co/datasets/Jean-Baptiste/wikiner_fr/discussions/3

I'm closing this.

3 participants