
Getting ValueError while loading a dataset #5388

Closed
valmetisrinivas opened this issue Dec 23, 2022 · 4 comments

Comments

@valmetisrinivas

Describe the bug

I am trying to load a dataset with the Hugging Face Datasets load_dataset method, and I get the ValueError shown below. Can someone help with this? I am using a Google Colab notebook on a Windows laptop.

WARNING:datasets.builder:Using custom data configuration default-a1d9e8eaedd958cd
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-12-5b4fdcb8e6d5> in <module>
      6 )
      7 
----> 8 next(iter(law_dataset_streamed))

17 frames
/usr/local/lib/python3.8/dist-packages/fsspec/core.py in get_compression(urlpath, compression)
    485         compression = infer_compression(urlpath)
    486     if compression is not None and compression not in compr:
--> 487         raise ValueError("Compression type %s not supported" % compression)
    488     return compression
    489 

ValueError: Compression type zstd not supported
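The traceback ends in fsspec's get_compression: the codec name is inferred from the file extension (.zst maps to "zstd") and then looked up in the table of codecs fsspec has registered. The stand-alone sketch below imitates that two-step check so the failure can be reproduced without network access. The names infer_codec, check_codec, EXTENSION_TO_CODEC and KNOWN_CODECS are hypothetical, and the hard-coded tables are an assumption chosen to mimic a session in which no zstd codec was registered; this is not fsspec's actual code.

```python
from typing import Optional

# Hypothetical stand-ins for fsspec's codec tables (an assumption, not fsspec's data).
# "zstd" is deliberately missing from KNOWN_CODECS, as in a session where
# zstandard was not importable when fsspec was imported.
EXTENSION_TO_CODEC = {"gz": "gzip", "bz2": "bz2", "xz": "xz", "zst": "zstd"}
KNOWN_CODECS = {"gzip", "bz2", "xz"}


def infer_codec(urlpath: str) -> Optional[str]:
    """Guess the codec from the last file extension, e.g. '.jsonl.zst' -> 'zstd'."""
    extension = urlpath.rsplit(".", 1)[-1].lower()
    return EXTENSION_TO_CODEC.get(extension)


def check_codec(urlpath: str, compression: Optional[str] = "infer") -> Optional[str]:
    """Infer the codec, then reject it if it is not registered (same shape as the traceback)."""
    if compression == "infer":
        compression = infer_codec(urlpath)
    if compression is not None and compression not in KNOWN_CODECS:
        raise ValueError("Compression type %s not supported" % compression)
    return compression


print(check_codec("data.jsonl.gz"))  # gzip
try:
    check_codec("FreeLaw_Opinions.jsonl.zst")
except ValueError as err:
    print(err)  # Compression type zstd not supported
```

With "zstd" added to KNOWN_CODECS, the same call would return "zstd" instead of raising, which corresponds to the state fsspec is in when zstandard is importable at the time fsspec itself is imported.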

Steps to reproduce the bug

!pip install zstandard
from datasets import load_dataset

lds = load_dataset(
    "json",
    data_files="https://the-eye.eu/public/AI/pile_preliminary_components/FreeLaw_Opinions.jsonl.zst",
    split="train",
    streaming=True,
)
next(iter(lds))  # raises ValueError: Compression type zstd not supported

Expected behavior

I expect load_dataset to return an iterable dataset, lds, whose examples can be read with next(iter(lds)).

Environment info

Google Colab notebook on a Windows laptop

@mariosasko
Collaborator

Hi! I can't reproduce this error locally (Mac) or in Colab. What version of datasets are you using?

@valmetisrinivas
Author

Hi @mariosasko, my datasets version is 2.8.0.

@albertvillanova
Member

albertvillanova commented Dec 27, 2022

@valmetisrinivas you get that error because you imported datasets (and thus fsspec) before installing zstandard.
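Why the import order matters can be shown with a small imitation of the pattern fsspec uses for optional codecs: roughly, fsspec.compression tries to import zstandard once, when it is first imported, and registers "zstd" only if that import succeeds. In the stdlib-only sketch below, build_registry and the package name fake_zstandard are hypothetical, and "installing" the package is simulated by inserting an empty module into sys.modules.

```python
import importlib
import sys
import types


def build_registry() -> dict:
    """Build a codec registry once, like a module-level side effect at import time."""
    registry = {"gzip": "built-in", "bz2": "built-in"}
    try:
        # Optional dependency: zstd is registered only if the package imports *now*.
        importlib.import_module("fake_zstandard")  # hypothetical package name
    except ImportError:
        pass
    else:
        registry["zstd"] = "fake_zstandard"
    return registry


# 1. "from datasets import ..." runs first: the registry is built while the package is missing.
registry_at_import = build_registry()

# 2. "!pip install zstandard" runs afterwards: simulated by injecting a module.
sys.modules["fake_zstandard"] = types.ModuleType("fake_zstandard")

# 3. The registry that was already built is not updated by the installation...
print("zstd" in registry_at_import)  # False
# ...but building it again (as a fresh process would) picks the package up.
print("zstd" in build_registry())  # True
```

Restarting the Colab runtime starts a fresh Python process, so datasets and fsspec are imported again after zstandard is available, which corresponds to the second build_registry() call.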

Please restart your Colab runtime and run the install commands before importing datasets:

!pip install datasets
!pip install zstandard

from datasets import load_dataset

ds = load_dataset(
    "json",
    data_files="https://the-eye.eu/public/AI/pile_preliminary_components/FreeLaw_Opinions.jsonl.zst",
    split="train",
    streaming=True,
)
next(iter(ds))
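To check whether a notebook session is in the good or the bad state before calling load_dataset, the current process can be inspected without importing anything new. The helper below is a hypothetical sketch (check_zstd_support and diagnose are made-up names). It assumes that fsspec keeps its registered codecs in a module-level dict named compr in fsspec.compression, which matches the compr lookup in the traceback above but is an internal detail that may differ between fsspec versions.

```python
import importlib
import importlib.util
import sys
from typing import Collection, Optional


def diagnose(zstandard_installed: bool, fsspec_codecs: Optional[Collection[str]]) -> str:
    """Turn the gathered facts into advice; fsspec_codecs is None if fsspec is not imported."""
    if not zstandard_installed:
        if fsspec_codecs is None:
            return "install zstandard before importing datasets"
        return "install zstandard, then restart the runtime"
    if fsspec_codecs is None:
        return "ok: fsspec not imported yet, zstd should be registered on first import"
    if "zstd" in fsspec_codecs:
        return "ok: zstd is registered with fsspec"
    return "zstandard was installed after fsspec was imported: restart the runtime"


def check_zstd_support() -> str:
    """Gather facts about the running process and pass them to diagnose()."""
    importlib.invalidate_caches()  # notice packages pip-installed during this session
    installed = importlib.util.find_spec("zstandard") is not None
    compression_mod = sys.modules.get("fsspec.compression")
    # Assumption: fsspec stores its codec table in a module-level dict named `compr`.
    codecs = None if compression_mod is None else getattr(compression_mod, "compr", {})
    return diagnose(installed, codecs)


print(check_zstd_support())
```

Splitting the decision (diagnose) from the environment inspection (check_zstd_support) keeps the logic testable without having fsspec or zstandard installed.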

@valmetisrinivas
Author


I guess that was the problem: I imported datasets before installing zstandard. Thank you for the feedback.
