Skip to content

Conversation

@lappemic
Copy link
Contributor

@lappemic lappemic commented May 17, 2024

As realised in #1222 the specs of the datasetcard are not up to date. This PR therefore updates the specs.

Let me know if this is sufficient or you would like to add more config examples.

Cc: @severo, @polinaeterna

Copy link
Member

@julien-c julien-c left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks

Note that there's also a config_names: list[str] (i.e the previous property) but i don't think we need to document it

Copy link

@polinaeterna polinaeterna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you a lot for working on this!
I've left a suggestion to reformat it a bit to keep it aligned with the rest of the file.

Also I wonder if we should try to include all possible cases in these specs or keep it simple as it is now and refer to https://huggingface.co/docs/hub/datasets-manual-configuration instead? cc @severo because there are a lot of parameters apart from data_files that can be passed, for example for csv builder one can specify a sep, for audio/imagefolder builders - drop_metadata/drop_labels, etc. there is also data_dir parameter, similar to data_files.
I personally would keep it simple but not sure cc @lhoestq @stevhliu

Co-authored-by: Polina Kazakova <polina@huggingface.co>
@stevhliu
Copy link
Member

I personally would keep it simple but not sure

Since this is just an example dataset card, we don't necessarily need to include all possible parameters. A simple example use case should suffice, with a link to more options (to allow users to explore more gradually if they'd like), as you suggested sounds great to me.

Copy link
Member

@lhoestq lhoestq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the addition ! looks like we missed this update

Also I wonder if we should try to include all possible cases in these specs or keep it simple as it is now and refer to https://huggingface.co/docs/hub/datasets-manual-configuration instead? cc @severo because there are a lot of parameters apart from data_files that can be passed, for example for csv builder one can specify a sep, for audio/imagefolder builders - drop_metadata/drop_labels, etc. there is also data_dir parameter, similar to data_files.
I personally would keep it simple but not sure cc @lhoestq @stevhliu

I think it's fine for now. We only have this page for a full list of parameters, it would be cool to have a more readable page for this that we could link to from this spec file later

lappemic and others added 2 commits May 17, 2024 23:02
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
@lappemic lappemic requested a review from polinaeterna May 17, 2024 21:09
Copy link
Member

@julien-c julien-c left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks!

Co-authored-by: Polina Kazakova <polina@huggingface.co>
@lhoestq lhoestq merged commit 48ea61b into huggingface:main May 23, 2024
@lappemic lappemic deleted the 1222-improve-datasetcard-spec branch May 27, 2024 17:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants