Open
Description
Describe the bug
Creating a dataset with ArrayXD features whose shape contains None leads to errors when loading it back from the Hub, because DatasetCardData removes the None entries from the shape.
Steps to reproduce the bug
import numpy as np
from datasets import Array2D, Dataset, Features, load_dataset

def examples_generator():
    for i in range(4):
        yield {
            "array_1d": np.zeros((10, 1), dtype="uint16"),
            "array_2d": np.zeros((10, 1), dtype="uint16"),
        }

features = Features(array_1d=Array2D((None, 1), "uint16"), array_2d=Array2D((None, 1), "uint16"))
dataset = Dataset.from_generator(examples_generator, features=features)
dataset.push_to_hub("alex-hh/test_array_1d2d")
ds = load_dataset("alex-hh/test_array_1d2d")

Source of error appears to be DatasetCardData.to_dict invoking DatasetCardData._remove_none:
from huggingface_hub import DatasetCardData
from datasets.info import DatasetInfosDict
dataset_card_data = DatasetCardData()
DatasetInfosDict({"default": dataset.info.copy()}).to_dataset_card_data(dataset_card_data)
print(dataset_card_data.to_dict())  # removes Nones in shape

Expected behavior
It should be possible to load datasets that were saved with None as the leading shape dimension.
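To make the suspected failure mode concrete, here is a minimal sketch in plain Python. It does not call huggingface_hub: `strip_nones` is a hypothetical stand-in for what a recursive None-removal step might do, and the `feature` dict is a simplified, assumed layout of a serialized Array2D entry rather than the exact dataset card schema.

```python
from typing import Any


def strip_nones(value: Any) -> Any:
    """Hypothetical stand-in: recursively drop None from dicts and lists."""
    if isinstance(value, dict):
        return {k: strip_nones(v) for k, v in value.items() if v is not None}
    if isinstance(value, (list, tuple)):
        return [strip_nones(v) for v in value if v is not None]
    return value


# Simplified, assumed layout of a serialized Array2D feature entry.
feature = {
    "name": "array_2d",
    "shape": [None, 1],   # None marks the variable-length leading dimension
    "dtype": "uint16",
    "description": None,  # an unset field that is safe to drop
}

cleaned = strip_nones(feature)
print(cleaned)

# The original shape has two dimensions; the cleaned one has lost the None.
assert len(feature["shape"]) == 2
assert len(cleaned["shape"]) == 1
```

Under these assumptions, dropping an unset field such as `description` is harmless, but dropping the None inside `shape` turns a two-dimensional shape into a one-dimensional one, which would no longer describe an Array2D when the features are read back.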
Environment info
datasets 3.0.2 and the latest huggingface_hub