Dataset Viewer, Structure and Libraries docs #1070
The documentation is not available anymore as the PR was closed or merged.
severo
left a comment
Thanks a lot for these docs, they are at the correct level of simplicity and avoid having to understand the details of the datasets library.
julien-c
left a comment
can you ping me again when the doc-build worked? 😅
from a quick glance i do like the fact that we present multiple tools that are compatible with dataset repos! That's quite cool
julien-c
left a comment
one last comment, you should also tag @davanstrien for review IMO given the recent post https://huggingface.co/blog/researcher-dataset-sharing and interests in dataset advocacy!!
### Create a Dataset card
Adding a Dataset card is super valuable for helping users find your dataset and understand how to use it responsibly.
I would add a link to https://huggingface.co/docs/hub/datasets-cards for users that don't know what a Dataset card is.
Co-authored-by: Sylvain Lesage <sylvain.lesage@huggingface.co> Co-authored-by: Lucain <lucainp@gmail.com> Co-authored-by: Julien Chaumond <julien@huggingface.co>
Sorry, missed that :)
mariosasko
left a comment
And some comments from me:
Some libraries like [🤗 Datasets](https://huggingface.co/docs/datasets/index), [Pandas](https://pandas.pydata.org/), [Dask](https://www.dask.org/) or [DuckDB](https://duckdb.org/) can upload files to the Hub.
See the list of [Libraries supported by the Datasets Hub](./datasets-libraries) for more information.
Would be better to have links to the upload examples from these libraries' dedicated doc pages here (we can split these pages into the Upload and Download sections to make them linkable)
The links already appear on the navigation tab on the left when you are on this page
Okay, I see. But I'm not sure why these links are not expanded (automatically) when clicking on the [Libraries supported by the Datasets Hub](./datasets-libraries) link on my machine.
Hmm I'm also seeing this behavior when clicking on the Libraries link from the documentation index page.
Let's see with the docs front-end team if we can fix that
Co-authored-by: Daniel van Strien <davanstrien@users.noreply.github.com>
Thanks for the comments, I took them into account :)
polinaeterna
left a comment
very much like! i also added a few suggestions :)
## Basic use-case
If your dataset isn't split into [train/validation/test splits](https://en.wikipedia.org/wiki/Training,_validation,_and_test_data_sets), the simplest dataset structure is to have one file: `data.csv` (this works with any supported file format and any file name).
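As a minimal sketch of the single-file layout quoted above, the following Python snippet writes one `data.csv` into a local folder using only the standard library. The folder name `my_dataset`, the helper `write_single_file_dataset`, and the `text`/`label` columns are hypothetical, chosen only for illustration. Uploading the folder to the Hub is a separate step that is not shown here.

```python
import csv
import tempfile
from pathlib import Path


def write_single_file_dataset(root: Path) -> Path:
    """Create the simplest layout: a single CSV file at the top of the folder."""
    root.mkdir(parents=True, exist_ok=True)
    data_file = root / "data.csv"
    # Hypothetical example rows; any columns would work the same way.
    rows = [
        {"text": "hello", "label": 0},
        {"text": "world", "label": 1},
    ]
    with data_file.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["text", "label"])
        writer.writeheader()
        writer.writerows(rows)
    return data_file


# Build the dataset folder in a temporary location (assumed name: my_dataset).
dataset_dir = Path(tempfile.mkdtemp()) / "my_dataset"
csv_path = write_single_file_dataset(dataset_dir)
print(sorted(p.name for p in dataset_dir.iterdir()))
```

The folder holds nothing but the one data file, which is the whole point of the basic use-case: no split-specific file names or subdirectories are involved.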
any supported data format - there are a lot of mentions of supported formats through all the docs so maybe indeed it makes sense to list a complete set of them somewhere and point to it?
Co-authored-by: Polina Kazakova <polina@huggingface.co>
Let me know if you have other comments @mariosasko @polinaeterna @Wauplin
Also cc @julien-c: the documentation preview is working now if you want to take a look
Wauplin
left a comment
Very nice!
Made a last check on the docs and everything looks fine 👍
Co-authored-by: Lucain <lucainp@gmail.com>
mariosasko
left a comment
Thanks, LGTM!
polinaeterna
left a comment
thank you a lot for this work! i've left a couple of nits suggestions, you can ignore them except for the link to the file-formats section (otherwise it's broken) and some punctuation.
let's 🚢 this verrrrry long-standing PR no?
Co-authored-by: Polina Kazakova <polina@huggingface.co>
Thanks for all the reviews :)
The goal is to make the Dataset Hub docs more focused on the Hub features like the Viewer, and less on the `datasets` lib. In this PR `datasets` becomes one of the many libraries that can be used with the Hub instead of being shown as an entry point.
I added a Configure the Dataset Viewer page
-> this will help users have a working Dataset Viewer without knowledge about how `datasets` works
I added a Data files Configuration page
-> this gives more detail on how to structure a dataset (e.g. for splits)
I added a Libraries page and dedicated pages for:
`datasets` docs)
-> the focus is less on the `datasets` library to show that people are actually free to use whatever tools they want.
-> they're pretty simple for now and we should keep enriching them
Added Uploading datasets and Downloading datasets for consistency with model docs
TODO in the `datasets` docs (will open a PR shortly):