You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Am I correct in assuming that if you specify a "config" in a dataset, only the given config is downloaded, but if you specify a split, all splits for that config are downloaded? I came across it when using facebook's belebele (https://huggingface.co/datasets/facebook/belebele). Instead of a config for each language, they use a split for each language, but that seems to mean that the full dataset is downloaded, even if you select just one language split.
For languages, we recommend using different configs, not splits.
Maybe we should also show a warning / open a PR/discussion? when a dataset contains more than 5 splits, hinting that it might be better to use configs?
The text was updated successfully, but these errors were encountered:
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
See https://huggingface.slack.com/archives/C039P47V1L5/p1713172703779839
For languages, we recommend using different configs, not splits.
Maybe we should also show a warning / open a PR/discussion? when a dataset contains more than 5 splits, hinting that it might be better to use configs?
The text was updated successfully, but these errors were encountered: