-
Notifications
You must be signed in to change notification settings - Fork 387
add config_name and data_files to datasetcard.md specs #1292
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add config_name and data_files to datasetcard.md specs #1292
Conversation
julien-c
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks
Note that there's also a config_names: list[str] (i.e the previous property) but i don't think we need to document it
polinaeterna
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you a lot for working on this!
I've left a suggestion to reformat it a bit to keep it aligned with the rest of the file.
Also I wonder if we should try to include all possible cases in these specs or keep it simple as it is now and refer to https://huggingface.co/docs/hub/datasets-manual-configuration instead? cc @severo because there are a lot of parameters apart from data_files that can be passed, for example for csv builder one can specify a sep, for audio/imagefolder builders - drop_metadata/drop_labels, etc. there is also data_dir parameter, similar to data_files.
I personally would keep it simple but not sure cc @lhoestq @stevhliu
Co-authored-by: Polina Kazakova <polina@huggingface.co>
Since this is just an example dataset card, we don't necessarily need to include all possible parameters. A simple example use case should suffice, with a link to more options (to allow users to explore more gradually if they'd like), as you suggested sounds great to me. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the addition ! looks like we missed this update
Also I wonder if we should try to include all possible cases in these specs or keep it simple as it is now and refer to https://huggingface.co/docs/hub/datasets-manual-configuration instead? cc @severo because there are a lot of parameters apart from data_files that can be passed, for example for csv builder one can specify a sep, for audio/imagefolder builders - drop_metadata/drop_labels, etc. there is also data_dir parameter, similar to data_files.
I personally would keep it simple but not sure cc @lhoestq @stevhliu
I think it's fine for now. We only have this page for a full list of parameters, it would be cool to have a more readable page for this that we could link to from this spec file later
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
julien-c
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks!
Co-authored-by: Polina Kazakova <polina@huggingface.co>
As realised in #1222 the specs of the datasetcard are not up to date. This PR therefore updates the specs.
Let me know if this is sufficient or you would like to add more config examples.
Cc: @severo, @polinaeterna