From the documentation: A datasets.Sequence with an internal dictionary feature will be automatically converted into a dictionary of lists. This behavior is implemented to have a compatibility layer with the TensorFlow Datasets library but may be unwanted in some cases. If you don't want this behavior, you can use a Python list instead of the datasets.Sequence.
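To make the two layouts concrete, here is a minimal pure-Python sketch of the reshaping the documentation describes: a sequence of dicts becomes a dict of lists. The helper name `sequence_of_dicts_to_dict_of_lists` and the sample values are hypothetical and only illustrate the idea; this is not the library's internal implementation.

```python
from typing import Any, Dict, List


def sequence_of_dicts_to_dict_of_lists(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Turn [{"a": 1, "b": 2}, {"a": 3, "b": 4}] into {"a": [1, 3], "b": [2, 4]}.

    Hypothetical helper for illustration only; it assumes every dict
    in the sequence has the same keys.
    """
    if not rows:
        return {}
    keys = list(rows[0].keys())
    for row in rows:
        if set(row) != set(keys):
            raise ValueError("all dicts in the sequence must share the same keys")
    # Collect each key's values across all rows, preserving row order.
    return {key: [row[key] for row in rows] for key in keys}


# Sequence-of-dicts layout (one dict per answer); values are made up.
answers = [
    {"text": "Paris", "answer_start": 10},
    {"text": "paris", "answer_start": 42},
]

# Dict-of-sequences layout (one list per field).
converted = sequence_of_dicts_to_dict_of_lists(answers)
print(converted)
```

Both layouts carry the same information, which is why a features equality check could reasonably treat them as equivalent.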
Hi! I agree we should improve the features equality checks to account for this particular case. However, your code fails because answer_start has the dtype int64 instead of int32 after loading from JSON (a JSON file cannot carry type precision info; save_to_disk preserves it for Arrow files). That mismatch leads to the concatenation error, as PyArrow does not support this sort of type promotion. This can be fixed as follows:
Describe the bug
It is impossible to concatenate datasets if a feature is a sequence of dicts in one dataset and a dict of sequences in the other. According to the documentation, however, the former should be converted to the latter automatically.
Steps to reproduce the bug
Expected results
No error executing that code
Actual results
Environment info
datasets version: 2.3.2