Enable the private datasets #39

Closed
3 tasks
severo opened this issue Sep 23, 2021 · 35 comments
Labels: feature request (Request for a new feature), P1 (Not as needed as P0, but still important/wanted)

Comments

@severo
Collaborator

severo commented Sep 23, 2021

The code to pass the token is already present, but it is currently disabled (hardcoded):

https://github.com/huggingface/datasets-preview-backend/blob/df04ffba9ca1a432ed65e220cf7722e518e0d4f8/src/datasets_preview_backend/cache.py#L119-L120

  • enable private datasets and manage their cache adequately
  • separate private caches from public caches: for authenticated requests, we need to check access every time, or at least use a much lower TTL, because access can be revoked. Also: since a hub dataset can be turned private, how should we manage such datasets?
  • add doc. See f6576d5
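
To illustrate the second task, here is a minimal sketch of keeping private entries apart from public ones and expiring them sooner. It is not the project's cache code: the `SplitCache` class, the TTL values, and the idea of keying private entries by a token hash are all assumptions made for illustration.

```python
import hashlib
import time
from typing import Any, Callable, Dict, Optional, Tuple

# Hypothetical TTLs: private entries expire much sooner because access can be revoked.
PUBLIC_TTL_SECONDS = 3600.0
PRIVATE_TTL_SECONDS = 60.0


class SplitCache:
    """Illustrative cache that stores public and authenticated entries separately."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # Both stores map a key to (expiry time, cached value).
        self._public: Dict[str, Tuple[float, Any]] = {}
        self._private: Dict[str, Tuple[float, Any]] = {}

    def _select(self, dataset: str, token: Optional[str]):
        """Pick the store, the key and the TTL for this request."""
        if token is None:
            return self._public, dataset, PUBLIC_TTL_SECONDS
        # Key by a hash of the token so the raw token is never kept in memory as a key.
        digest = hashlib.sha256(token.encode("utf-8")).hexdigest()
        return self._private, f"{dataset}:{digest}", PRIVATE_TTL_SECONDS

    def get(self, dataset: str, token: Optional[str] = None) -> Optional[Any]:
        store, key, _ = self._select(dataset, token)
        entry = store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired: drop the entry so the caller re-checks access upstream.
            del store[key]
            return None
        return value

    def set(self, dataset: str, value: Any, token: Optional[str] = None) -> None:
        store, key, ttl = self._select(dataset, token)
        store[key] = (self._clock() + ttl, value)
```

With this layout, an unauthenticated request can never read an entry written for an authenticated one, and a revoked access stops being served from cache after at most `PRIVATE_TTL_SECONDS`. It does not answer the open question of what to do when a public dataset is turned private.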
@severo added the question label (Further information is requested) Sep 23, 2021
@severo changed the title from "Enable the private datasets" to "Enable the private datasets?" Sep 23, 2021
@severo mentioned this issue Sep 23, 2021
13 tasks
severo added a commit that referenced this issue Oct 4, 2021
It's not compatible with the cache for now. See #39 to reestablish the feature.
@severo
Collaborator Author

severo commented Oct 4, 2021

For reference, the previous code was removed in b08e649.

@severo
Collaborator Author

severo commented Jan 20, 2022

Requested here: huggingface/datasets#3604

@severo added the high-priority label and removed the question label (Further information is requested) Jan 31, 2022
@severo
Collaborator Author

severo commented Jan 31, 2022

Also: asked for by the AutoNLP team, to be able to preview the training datasets

@severo
Collaborator Author

severo commented Mar 16, 2022

@severo
Collaborator Author

severo commented Mar 24, 2022

@rajshah4

As we get more enterprise prospects and customers, it would be nice to be able to show off the dataset viewer on private datasets. So +1 on this feature request from me.

@severo
Collaborator Author

severo commented Jun 14, 2022

Yes, it's in the list of the next features that we will implement. We're working on a roadmap to make it clearer for everybody what could be expected in the following months. Viewer for private models could not be reasonably implemented with the previous "artisanal" infrastructure because it would have fallen under the load, but now that we run in Kubernetes, it should work seamlessly.

@severo added this to the Private datasets milestone Jun 17, 2022
@severo added the question label (Further information is requested) Jun 17, 2022
@severo
Collaborator Author

severo commented Jul 19, 2022

@severo
Collaborator Author

severo commented Sep 15, 2022

Priority level:

https://huggingface.slack.com/archives/C03M773CSBS/p1663233611753009?thread_ts=1662756193.298359&cid=C03M773CSBS

It's not a top priority, based on what we said yesterday (the focus is on shipping the dataset-server).

@severo added the feature request label (Request for a new feature) and removed the enhancement and question (Further information is requested) labels Sep 19, 2022
@severo changed the title from "Enable the private datasets?" to "Enable the private datasets" Sep 20, 2022
@severo
Collaborator Author

severo commented Nov 25, 2022

@severo
Collaborator Author

severo commented Jan 5, 2023

@faustusdotbe

faustusdotbe commented Jun 27, 2023

Hi! Chiming in to show interest. I completely understand this is not a top priority, but being able to use the viewer on private datasets would be super cool (also as a final "sanity check" before making the repo public, for example).

Happy to help debug, if needed

@alexgshaw

+1 to this feature

@DavidFarago

+1 for this feature

@arikanev

+1

@TheAnimeGuru

+1 This would be very useful

@severo added the P2 label (Nice to have) Jul 26, 2023
@nealchandra

I see this issue has been tagged as a P2 -- are you able to give us a rough estimate of what that might mean in terms of when it could land?

@severo
Collaborator Author

severo commented Jul 31, 2023

No, we will update here when we have an ETA. Meanwhile, see #39 (comment)

@Stillerman

+1

@prassanna-ravishankar

Is there a way we can explicitly enable this for "private datasets" (at our own risk)? At the moment my dataset has to be private, but I am less concerned about leaks, etc. I would love to enable it.

@severo
Collaborator Author

severo commented Sep 7, 2023

Is there a way we can explicitly enable this for "private datasets" (at our own risk)?

No, it's not possible at the moment. It is not a replacement, but if it works for you, maybe you can set your dataset as gated: the dataset viewer works with them.

@DavidFarago

What is the strictest possible gating? Can you restrict it to effectively be a private dataset?

@severo
Collaborator Author

severo commented Sep 8, 2023

What is the strictest possible gating? Can you restrict it to effectively be a private dataset?

It's not the same as private, but if you opt to manually approve the requests to access the gated dataset (https://huggingface.co/docs/hub/datasets-gated#manual-approval), you would avoid giving public access to the data.

@enoriega

+1 to enable dataset viewer on private datasets

@eduardm

eduardm commented Nov 8, 2023

+1. As an enterprise user, I don't want to share the dataset publicly, but not having a preview is really frustrating and makes collaboration difficult. Currently, we are creating a tiny dataset (10 rows) and making that public just to have a preview and understand what is what.

mattstern31 added a commit to mattstern31/datasets-server-storage-admin that referenced this issue Nov 11, 2023
It's not compatible with the cache for now. See huggingface/dataset-viewer#39 to reestablish the feature.
@severo added the P1 label (Not as needed as P0, but still important/wanted) and removed the P2 label (Nice to have) Nov 14, 2023
@andreemic

+1 for this!

The datasets filtering and sorting UI is very very barebones, so the viewer would help to navigate private datasets a lot

@severo
Collaborator Author

severo commented Dec 18, 2023

@qionghuang6

+1 for viewing private datasets

@julien-c
Member

please continue +1'ing this issue (can be on the OP) so we get a sense for how many people/teams need this! 🙏

@sutgeorge

+1

@egrace479

This would be great for development of datasets, especially since the documentation for how the viewer/preview works is quite limited and currently requires piecing together information from multiple pages on the docs.

While we're waiting on this to be developed (assuming it still will be), could the docs be updated with clearer requirements for the preview feature (i.e., more than just that the dataset needs to be public)? When I run the list parquet files query I'm able to see that the parquet has failed, but it's not clear why. Is there a way to get more information about what went wrong? For one of the datasets in question it is simply a collection of images in train and val folders that is simpler than a similar dataset for which the parquet conversion was successful and displays a preview.
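
For readers following along, the "list parquet files" query mentioned above can be sketched roughly as follows. This assumes the public `/parquet` endpoint of `https://datasets-server.huggingface.co` and a standard `Authorization: Bearer` header; the helper name `build_parquet_request` and the dataset name are made up for illustration, and the shape of the error payload is deliberately not assumed here.

```python
import urllib.parse
import urllib.request
from typing import Optional

# Assumed base URL of the datasets-server API.
API_BASE = "https://datasets-server.huggingface.co"


def build_parquet_request(dataset: str, token: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) the request that lists a dataset's parquet files."""
    query = urllib.parse.urlencode({"dataset": dataset})
    request = urllib.request.Request(f"{API_BASE}/parquet?{query}")
    if token is not None:
        # Only needed when the dataset is not publicly readable (e.g. gated).
        request.add_header("Authorization", f"Bearer {token}")
    return request


# Hypothetical dataset name, used only to show the resulting URL.
request = build_parquet_request("user/my-dataset")
print(request.full_url)
```

Sending the request with `urllib.request.urlopen(request)` and decoding the body as JSON should return either the list of parquet files or an error description; whether that description is detailed enough to diagnose a failed conversion is exactly the question raised in this comment.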

@severo
Collaborator Author

severo commented Jan 31, 2024

Done!

The private datasets are now supported in the datasets-server, enabling the dataset viewer + parquet conversion on the dataset pages.

[Video attachment (screen recording): Enregistrement.de.l.ecran.2024-01-31.a.10.39.59.mov]

Note that it's a paid feature, available as of today for Pro users and Enterprise orgs. Please give us feedback when you try this new feature!

@severo closed this as completed Jan 31, 2024
@faustusdotbe

Thank you, Sylvain!
