ValueError: Couldn't infer the same data file format for all splits. Got {'train': ('json', {}), 'validation': (None, {})} #6930

CLL112 · 2024-05-29T12:40:05Z

Describe the bug

When I run en = load_dataset("allenai/c4", "en", streaming=True), I get the following error:

raise ValueError(f"Couldn't infer the same data file format for all splits. Got {split_modules}")
ValueError: Couldn't infer the same data file format for all splits. Got {'train': ('json', {}), 'validation': (None, {})}

However, loading a single validation file explicitly works fine:

dataset = load_dataset('allenai/c4', streaming=True, data_files={'validation': 'en/c4-validation.00003-of-00008.json.gz'}, split='validation')

Why does the first call fail when the second one succeeds?
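Since passing data_files explicitly works for one validation file, a possible workaround is to name the files for every split yourself, so that no split is left with an undetermined format. The sketch below is untested against this exact error. The glob patterns are assumptions extrapolated from the single file name above, and build_data_files and load_c4_en_streaming are hypothetical helper names, not part of the datasets library.

```python
from typing import Dict

# Assumed shard patterns, extrapolated from the one file name in the report
# ('en/c4-validation.00003-of-00008.json.gz'). Adjust them if the repository
# layout differs.
C4_EN_PATTERNS: Dict[str, str] = {
    "train": "en/c4-train.*.json.gz",
    "validation": "en/c4-validation.*.json.gz",
}


def build_data_files(patterns: Dict[str, str] = C4_EN_PATTERNS) -> Dict[str, str]:
    """Return a split -> glob mapping, rejecting patterns that are not JSON files.

    The error in this issue shows 'validation' with a format of None, so this
    check makes sure every split points at files with a JSON extension.
    """
    for split, pattern in patterns.items():
        if not pattern.endswith((".json", ".json.gz")):
            raise ValueError(f"Split {split!r} does not point at JSON files: {pattern!r}")
    return dict(patterns)


def load_c4_en_streaming():
    """Stream the English C4 splits using explicit data_files (hypothetical helper)."""
    # Imported here so the mapping above can be inspected without datasets installed.
    from datasets import load_dataset

    return load_dataset("allenai/c4", data_files=build_data_files(), streaming=True)
```

Calling load_c4_en_streaming() should return one streaming dataset per split, for example splits["validation"]. Whether this avoids the error depends on its actual cause, which is not established in this thread.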

Steps to reproduce the bug

Run the following code:
import os
os.environ['HF_ENDPOINT'] = 'https://hf-mirror.com'
from datasets import load_dataset

en = load_dataset("allenai/c4", "en", streaming=True)

Expected behavior

The dataset loads successfully.

Environment info

datasets version: 2.18.0
Platform: Linux-6.5.0-28-generic-x86_64-with-glibc2.17
Python version: 3.8.19
huggingface_hub version: 0.22.2
PyArrow version: 15.0.2
Pandas version: 2.0.3
fsspec version: 2024.2.0


xioatian1 · 2024-06-18T11:56:12Z

How do you solve it?
