Unable to download using HuggingFace datasets library #7

Closed
minhduc0711 opened this issue Nov 24, 2022 · 3 comments
Comments

@minhduc0711

I'm using the same snippet from the README to download the dataset from Hugging Face:

from datasets import load_dataset

dataset = load_dataset('poloclub/diffusiondb', 'large_random_1k')

but I'm getting this error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/minhduc0711/miniconda3/envs/diffusion/lib/python3.7/site-packages/datasets/load.py", line 1729, in load_dataset
    **config_kwargs,
  File "/home/minhduc0711/miniconda3/envs/diffusion/lib/python3.7/site-packages/datasets/load.py", line 1498, in load_dataset_builder
    builder_cls = import_main_class(dataset_module.module_path)
  File "/home/minhduc0711/miniconda3/envs/diffusion/lib/python3.7/site-packages/datasets/load.py", line 115, in import_main_class
    module = importlib.import_module(module_path)
  File "/home/minhduc0711/miniconda3/envs/diffusion/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/minhduc0711/.cache/huggingface/modules/datasets_modules/datasets/poloclub--diffusiondb/8e4f79d20e94e3f261bfbea0101aa5047d6961c1d124920dc067889f88f5cddd/diffusiondb.py", line 50, in <module>
    "datasets/poloclub/diffusiondb", filename=f"images/part-{i:06}.zip"
  File "/home/minhduc0711/miniconda3/envs/diffusion/lib/python3.7/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    validate_repo_id(arg_value)
  File "/home/minhduc0711/miniconda3/envs/diffusion/lib/python3.7/site-packages/huggingface_hub/utils/_validators.py", line 167, in validate_repo_id
    "Repo id must be in the form 'repo_name' or 'namespace/repo_name':"
huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': 'datasets/poloclub/diffusiondb'. Use `repo_type` argument if needed.
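The last line of the traceback states the rule that is being enforced: a repo id may contain at most one `/`. The loading script passes `"datasets/poloclub/diffusiondb"`, which has two. The sketch below reproduces only that rule, which is enough to show why the prefixed id is rejected. `looks_like_valid_repo_id` is a hypothetical helper, not huggingface_hub's `validate_repo_id`, which also checks allowed characters, length, and more.

```python
# Hypothetical, simplified check that mirrors only the rule quoted in the error:
# "Repo id must be in the form 'repo_name' or 'namespace/repo_name'".
# The real huggingface_hub validator performs additional checks.
def looks_like_valid_repo_id(repo_id: str) -> bool:
    parts = repo_id.split("/")
    # Accept one part ('repo_name') or two parts ('namespace/repo_name'), none empty.
    return len(parts) in (1, 2) and all(parts)


for candidate in ["diffusiondb", "poloclub/diffusiondb", "datasets/poloclub/diffusiondb"]:
    print(f"{candidate!r}: {looks_like_valid_repo_id(candidate)}")
```

The first two ids pass. The third fails because the `datasets/` prefix adds an extra path segment.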

Some details about my environment:

  • Python 3.7.15
  • datasets 2.7.1
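Because the failure depends on the installed `huggingface_hub` version as well as on `datasets`, it helps to report both in an issue like this. The following is a minimal sketch for collecting those versions. The helper names `package_version` and `environment_report` are hypothetical. On Python 3.7, `importlib.metadata` is not in the standard library, so the sketch falls back to the `importlib_metadata` backport, which must be installed separately.

```python
import platform

try:
    from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
except ImportError:  # Python 3.7: requires the importlib_metadata backport
    from importlib_metadata import PackageNotFoundError, version


def package_version(name: str) -> str:
    """Return the installed version of a package, or 'not installed'."""
    try:
        return version(name)
    except PackageNotFoundError:
        return "not installed"


def environment_report() -> dict:
    """Collect the versions that matter for this loading error."""
    return {
        "python": platform.python_version(),
        "datasets": package_version("datasets"),
        "huggingface_hub": package_version("huggingface_hub"),
    }


for key, value in environment_report().items():
    print(f"{key}: {value}")
```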
@neverix

neverix commented Nov 24, 2022

Same issue

@xiaohk
Member

xiaohk commented Nov 24, 2022

Hi @minhduc0711, @neverix, thanks for reaching out!

I can reproduce the error with datasets==2.7.1 and huggingface_hub==0.11.0. This issue is related to huggingface/datasets#5274 and needs to be fixed by the datasets maintainers.
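The error message suggests a fix: keep the repo type out of the repo id and pass it through the `repo_type` argument instead. The following sketch assumes a script that builds its ids with a `datasets/` prefix, as the loading script in the traceback does. The helper `split_repo_type` is hypothetical, and it converts such an id into a `(repo_type, repo_id)` pair. This illustrates the idea and is not the project's actual patch.

```python
from typing import Tuple

# Hub URL prefixes that correspond to a non-default repo type.
# Model repos have no prefix and are huggingface_hub's default type.
_PREFIXES = {"datasets": "dataset", "spaces": "space"}


def split_repo_type(legacy_id: str) -> Tuple[str, str]:
    """Hypothetical helper: 'datasets/ns/name' -> ('dataset', 'ns/name')."""
    prefix, sep, rest = legacy_id.partition("/")
    # Require a further '/' so that a plain 'namespace/name' id whose
    # namespace happens to be 'datasets' is not misread as prefixed.
    if sep and prefix in _PREFIXES and "/" in rest:
        return _PREFIXES[prefix], rest
    return "model", legacy_id


repo_type, repo_id = split_repo_type("datasets/poloclub/diffusiondb")
print(repo_type, repo_id)  # dataset poloclub/diffusiondb

# With huggingface_hub installed, a download URL could then be built as:
# hf_hub_url(repo_id, filename="images/part-000001.zip", repo_type=repo_type)
```

Passing the type separately keeps the repo id in the `namespace/repo_name` form that the validator accepts.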

In the meantime, you can downgrade datasets and huggingface_hub as a workaround:

pip install huggingface_hub==0.10.1 datasets==2.6.1

xiaohk added a commit that referenced this issue Nov 24, 2022
Signed-off-by: Jay Wang <jay@zijie.wang>
@xiaohk
Member

xiaohk commented Nov 24, 2022

Actually, commit 9eb91c7 should fix it!

@minhduc0711 @neverix, please try again with the latest versions of datasets and huggingface_hub, and let me know if you run into any issues.

xiaohk closed this as completed Nov 24, 2022