Tweak readme #5210

Merged
lhoestq merged 7 commits into main from tweak-readme on Nov 24, 2022

Conversation

lhoestq (Member) commented Nov 7, 2022

Tweaked some paragraphs mentioning the modalities we support + added a paragraph on security

HuggingFaceDocBuilderDev commented Nov 7, 2022

The documentation is not available anymore as the PR was closed or merged.

mariosasko (Contributor)

Nit: We should also update the Disclaimers section to let the dataset owners know they should use Hub discussions rather than GH issues for removal requests/updates

lhoestq (Member, Author) commented Nov 18, 2022

Updated the disclaimers section, thanks!

Does it sound good to you, @albertvillanova?

albertvillanova (Member) left a comment

Thanks for the docs improvements.

Some nits below: feel free to ignore them.

README.md Outdated
@@ -46,6 +46,8 @@
- Smart caching: never wait for your data to process several times.
- Lightweight and fast with a transparent and pythonic API (multi-processing/caching/memory-mapping).
- Built-in interoperability with NumPy, pandas, PyTorch, Tensorflow 2 and JAX.
- Native support for audio and image data
- Stream datasets without downloading them completely
Member

I think it can be confusing what "without downloading them completely" means.

  • This could be understood as: stream datasets while saving them to disk only partially.

Member Author

I changed it to:

Enable streaming mode to save disk space and start iterating over the dataset immediately.

README.md Outdated
If your dataset is bigger than your disk or if you don't want to wait to download the data, you can use streaming:

```python
# If you want to efficiently download the data as you iterate over the dataset
```
Member

Again here, "download" can be understood as "save to disk".

lhoestq (Member, Author), Nov 24, 2022

Changed to:

If you want to use the dataset immediately and efficiently stream the data as you iterate over the dataset
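
For readers following the thread, here is a minimal sketch of what the reworded comment describes. It assumes the `datasets` library's `load_dataset(..., streaming=True)` option; the helper names `take` and `stream_examples` are illustrative only and are not part of the README or of this PR.

```python
from itertools import islice


def take(examples, n):
    """Return the first n items of any iterable without consuming more than n."""
    return list(islice(examples, n))


def stream_examples(dataset_name, split="train"):
    """Open a Hub dataset in streaming mode (hypothetical helper).

    Assumes the `datasets` library is installed and the Hub is reachable.
    """
    from datasets import load_dataset  # imported lazily; only needed when called

    # streaming=True returns an iterable dataset that fetches data on the fly
    # as you iterate, instead of saving the whole dataset to disk first.
    return load_dataset(dataset_name, split=split, streaming=True)
```

Calling `take(stream_examples(name), 3)` with a dataset identifier of your choice should return the first three examples as soon as they have been fetched, which is the "use the dataset immediately" behaviour the comment refers to.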

lhoestq and others added 4 commits November 24, 2022 12:02
Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
lhoestq merged commit 4c047f1 into main Nov 24, 2022
lhoestq deleted the tweak-readme branch November 24, 2022 11:26

4 participants