Tweak readme #5210

lhoestq · 2022-11-07T14:51:23Z

Tweaked some paragraphs mentioning the modalities we support and added a paragraph on security.

HuggingFaceDocBuilderDev · 2022-11-07T15:42:07Z

The documentation is not available anymore as the PR was closed or merged.

mariosasko · 2022-11-09T15:17:04Z

Nit: We should also update the Disclaimers section to let dataset owners know they should use Hub discussions rather than GitHub issues for removal requests and updates.

lhoestq · 2022-11-18T10:58:26Z

Updated the disclaimers section, thanks!

Does it sound good to you, @albertvillanova?

albertvillanova

Thanks for the docs improvements.

Some nits below: feel free to ignore them.

README.md

albertvillanova · 2022-11-24T07:52:19Z

README.md

@@ -46,6 +46,8 @@
 - Smart caching: never wait for your data to process several times.
 - Lightweight and fast with a transparent and pythonic API (multi-processing/caching/memory-mapping).
 - Built-in interoperability with NumPy, pandas, PyTorch, Tensorflow 2 and JAX.
+- Native support for audio and image data
+- Stream datasets without downloading them completely


I think the meaning of "without downloading them completely" can be confusing.

It could be understood as: stream datasets while saving only part of them to disk.

I changed it to

Enable streaming mode to save disk space and start iterating over the dataset immediately.

albertvillanova · 2022-11-24T08:01:08Z

README.md

+If your dataset is bigger than your disk or if you don't want to wait to download the data, you can use streaming:
+
+```python
+# If you want to efficiently download the data as you iterate over the dataset


Again here, "download" can be understood as "save to disk".

Changed to

If you want to use the dataset immediately and efficiently stream the data as you iterate over the dataset
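To make the distinction discussed in this thread concrete, here is a minimal, library-independent sketch of "download everything first" versus "stream as you iterate". It does not use the `datasets` API; `REMOTE_SHARDS`, `fetch_shard`, `download_all` and `stream` are hypothetical names invented for illustration, and the "remote" data is just an in-memory list. In the library itself, streaming mode is enabled by passing `streaming=True` to `load_dataset`.

```python
from typing import Dict, Iterator, List

# Hypothetical stand-in for a remote dataset split into shards.
REMOTE_SHARDS: List[List[Dict[str, str]]] = [
    [{"text": "a"}, {"text": "b"}],
    [{"text": "c"}, {"text": "d"}],
    [{"text": "e"}],
]

# Records which shards were actually fetched, so the difference is observable.
fetched_shards: List[int] = []


def fetch_shard(index: int) -> List[Dict[str, str]]:
    """Pretend to fetch one shard over the network."""
    fetched_shards.append(index)
    return REMOTE_SHARDS[index]


def download_all() -> List[Dict[str, str]]:
    """Non-streaming: fetch every shard before returning any example."""
    examples: List[Dict[str, str]] = []
    for i in range(len(REMOTE_SHARDS)):
        examples.extend(fetch_shard(i))
    return examples


def stream() -> Iterator[Dict[str, str]]:
    """Streaming: fetch a shard only when iteration reaches it."""
    for i in range(len(REMOTE_SHARDS)):
        yield from fetch_shard(i)


# Reading the first example in streaming mode touches only the first shard.
first = next(stream())
shards_fetched_while_streaming = list(fetched_shards)
```

In this sketch nothing is written to disk in either mode; the only difference is when data is fetched, which is why "stream" reads less ambiguously than "download" in the README wording.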

tweak readme

46c3306

lhoestq requested a review from albertvillanova November 7, 2022 14:51

add streaming

b1651a4

update disclaimers

3d9779e

albertvillanova approved these changes Nov 24, 2022

lhoestq and others added 4 commits November 24, 2022 12:02

Apply suggestions from code review

2659416

Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>

better wording on streaming

2623b20

typo

fbd3591

typo2

56907e0

lhoestq merged commit 4c047f1 into main Nov 24, 2022

lhoestq deleted the tweak-readme branch November 24, 2022 11:26
