You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
When I run the code en = load_dataset("allenai/c4", "en", streaming=True), I encounter an error: raise ValueError(f"Couldn't infer the same data file format for all splits. Got {split_modules}") ValueError: Couldn't infer the same data file format for all splits. Got {'train': ('json', {}), 'validation': (None, {})}.
However, running dataset = load_dataset('allenai/c4', streaming=True, data_files={'validation': 'en/c4-validation.00003-of-00008.json.gz'}, split='validation') works fine. What is the issue here?
Steps to reproduce the bug
run code:
import os
os.environ['HF_ENDPOINT'] = 'https://hf-mirror.com'
from datasets import load_dataset
en = load_dataset("allenai/c4", "en", streaming=True)
Describe the bug
When I run the code en = load_dataset("allenai/c4", "en", streaming=True), I encounter an error: raise ValueError(f"Couldn't infer the same data file format for all splits. Got {split_modules}") ValueError: Couldn't infer the same data file format for all splits. Got {'train': ('json', {}), 'validation': (None, {})}.
However, running dataset = load_dataset('allenai/c4', streaming=True, data_files={'validation': 'en/c4-validation.00003-of-00008.json.gz'}, split='validation') works fine. What is the issue here?
Steps to reproduce the bug
run code:
import os
os.environ['HF_ENDPOINT'] = 'https://hf-mirror.com'
from datasets import load_dataset
en = load_dataset("allenai/c4", "en", streaming=True)
Expected behavior
Successfully loaded the dataset.
Environment info
datasets
version: 2.18.0huggingface_hub
version: 0.22.2fsspec
version: 2024.2.0The text was updated successfully, but these errors were encountered: