load_dataset save_to_disk load_from_disk error #6353

Closed
brisker opened this issue Oct 26, 2023 · 5 comments



brisker commented Oct 26, 2023

Describe the bug

datasets version: 2.10.1
I ran load_dataset and save_to_disk successfully on Windows 10 (load_from_disk('/LLM/data/wiki') also worked there). I then copied the dataset /LLM/data/wiki
to an Ubuntu system, but when I call load_from_disk('/LLM/data/wiki') on Ubuntu, it fails with:

load_from_disk('/LLM/data/wiki')
  File "/usr/local/miniconda3/lib/python3.8/site-packages/datasets/load.py", line 1874, in load_from_disk
    return DatasetDict.load_from_disk(dataset_path, keep_in_memory=keep_in_memory, storage_options=storage_options)
  File "/usr/local/miniconda3/lib/python3.8/site-packages/datasets/dataset_dict.py", line 1309, in load_from_disk
    dataset_dict[k] = Dataset.load_from_disk(
  File "/usr/local/miniconda3/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 1543, in load_from_disk
    fs_token_paths = fsspec.get_fs_token_paths(dataset_path, storage_options=storage_options)
  File "/usr/local/miniconda3/lib/python3.8/site-packages/fsspec/core.py", line 610, in get_fs_token_paths
    chain = _un_chain(urlpath0, storage_options or {})
  File "/usr/local/miniconda3/lib/python3.8/site-packages/fsspec/core.py", line 325, in _un_chain
    cls = get_filesystem_class(protocol)
  File "/usr/local/miniconda3/lib/python3.8/site-packages/fsspec/registry.py", line 232, in get_filesystem_class
    raise ValueError(f"Protocol not known: {protocol}")
ValueError: Protocol not known: /LLM/data/wiki

It seems that something went wrong with the Arrow file?
How can I solve this, since I currently cannot run save_to_disk on the Ubuntu system?

Steps to reproduce the bug

datasets version: 2.10.1

Expected behavior

datasets version: 2.10.1

Environment info

datasets version: 2.10.1


brisker commented Oct 26, 2023

Solved.
It was an fsspec version problem.
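Since the fix came down to the fsspec version, a first diagnostic step is to print what is actually installed in the failing environment. The snippet below is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of datasets or fsspec, and the thread does not say which fsspec version resolved the original report.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Compare this output between the machine that works and the one that fails.
    for name in ("datasets", "fsspec"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
```

Running it on both the Windows and Ubuntu machines shows whether the two environments differ in their fsspec version.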

brisker closed this as completed Oct 26, 2023
@beyondguo

I'm using the latest datasets and fsspec, but I still get this error!

datasets version: 2.13.0

fsspec version: 2023.10.0

File "/home/guoby/app/Anaconda3-2021.05/envs/news/lib/python3.8/site-packages/datasets/load.py", line 1892, in load_from_disk
    return DatasetDict.load_from_disk(dataset_path, keep_in_memory=keep_in_memory, storage_options=storage_options)
  File "/home/guoby/app/Anaconda3-2021.05/envs/news/lib/python3.8/site-packages/datasets/dataset_dict.py", line 1371, in load_from_disk
    dataset_dict[k] = Dataset.load_from_disk(
  File "/home/guoby/app/Anaconda3-2021.05/envs/news/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 1639, in load_from_disk
    fs_token_paths = fsspec.get_fs_token_paths(dataset_path, storage_options=storage_options)
  File "/home/guoby/app/Anaconda3-2021.05/envs/news/lib/python3.8/site-packages/fsspec/core.py", line 610, in get_fs_token_paths
    chain = _un_chain(urlpath0, storage_options or {})
  File "/home/guoby/app/Anaconda3-2021.05/envs/news/lib/python3.8/site-packages/fsspec/core.py", line 325, in _un_chain
    cls = get_filesystem_class(protocol)
  File "/home/guoby/app/Anaconda3-2021.05/envs/news/lib/python3.8/site-packages/fsspec/registry.py", line 232, in get_filesystem_class
    raise ValueError(f"Protocol not known: {protocol}")


AlisonWen commented Nov 22, 2023

These two versions work.
[Screenshot 2023-11-22 5:55:28 PM]

@robinsonmhj

datasets==2.10.1 and fsspec==2023.6.0 also works for me.
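To compare an environment against the pair reported above, something like the following sketch could be used. The dictionary `REPORTED_WORKING` and the function `mismatches` are hypothetical names for illustration. The pair is only what this comment reports as working, not an official compatibility matrix.

```python
# Version pair reported as working in this thread (an anecdotal data point,
# not an official compatibility statement from either project).
REPORTED_WORKING = {"datasets": "2.10.1", "fsspec": "2023.6.0"}


def mismatches(installed, expected=REPORTED_WORKING):
    """Return {package: (installed_version, expected_version)} for every
    package whose installed version differs from the expected one."""
    return {
        name: (installed.get(name), want)
        for name, want in expected.items()
        if installed.get(name) != want
    }


if __name__ == "__main__":
    # The versions from the earlier comment that still hit the error.
    failing = {"datasets": "2.13.0", "fsspec": "2023.10.0"}
    for name, (have, want) in mismatches(failing).items():
        print(f"{name}: installed {have}, reported working {want}")
```

If fsspec is the only mismatch, pinning it with `pip install "fsspec==2023.6.0"` is one way to try the reported combination.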

@1412690667

Indeed.
