Enable the private datasets #39

severo · 2021-09-23T09:42:47Z

The code to pass the token is already present, but it is currently disabled (hardcoded):

https://github.com/huggingface/datasets-preview-backend/blob/df04ffba9ca1a432ed65e220cf7722e518e0d4f8/src/datasets_preview_backend/cache.py#L119-L120

- Enable private datasets and manage their cache adequately.
- Separate private caches from public caches: for authenticated requests, we need to check access every time, or at least use a much lower TTL, because access can be revoked. Also, since a public Hub dataset can be turned private, how should we handle its cached entries?
- Add documentation. See f6576d5.
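To make the cache-separation point concrete, here is a minimal sketch of what it could look like. It is not the backend's actual cache: the class name `ResponseCache`, the two TTL values, and the idea of scoping private entries by a digest of the token are all assumptions made for illustration.

```python
import hashlib
import time
from typing import Any, Callable, Dict, Optional, Tuple

# Hypothetical values: public responses may live long, private ones must
# expire quickly because access to a private dataset can be revoked.
PUBLIC_TTL_SECONDS = 3600.0
PRIVATE_TTL_SECONDS = 60.0


class ResponseCache:
    """In-memory cache that keeps public and per-token entries apart."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[Tuple[str, str], Tuple[float, Any]] = {}

    @staticmethod
    def _scope(token: Optional[str]) -> str:
        # Never store the raw token; a digest is enough to separate callers.
        if token is None:
            return "public"
        return "private:" + hashlib.sha256(token.encode("utf-8")).hexdigest()

    def get(self, dataset: str, token: Optional[str] = None) -> Optional[Any]:
        key = (dataset, self._scope(token))
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired: drop the entry so the caller re-checks access upstream.
            del self._entries[key]
            return None
        return value

    def set(self, dataset: str, value: Any, token: Optional[str] = None) -> None:
        ttl = PUBLIC_TTL_SECONDS if token is None else PRIVATE_TTL_SECONDS
        key = (dataset, self._scope(token))
        self._entries[key] = (self._clock() + ttl, value)
```

With this layout, a response cached for one token is never served to an anonymous caller or to another token. It does not answer the open question of what to do with public entries when a dataset is turned private.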


It's not compatible with the cache for now. See #39 to restablish the feature.

severo · 2021-10-04T16:25:53Z

Previous code has been removed with b08e649, for reference

severo · 2022-01-20T08:11:40Z

Requested here: huggingface/datasets#3604

severo · 2022-01-31T21:52:08Z

Also: asked for by the AutoNLP team, to be able to preview the training datasets

severo · 2022-03-16T19:50:07Z

Also requested here: https://huggingface.slack.com/archives/C01229B19EX/p1647460106784919?thread_ts=1647372809.022069&cid=C01229B19EX

severo · 2022-03-24T09:53:01Z

https://github.com/huggingface/moon-landing/pull/2442 has added an endpoint to check the authentication:

https://github.com/huggingface/moon-landing/blob/bb1bcde43561c1831aef4c1c6a331dec7f735171/server/server.ts#L368-L369
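As a rough sketch of how a backend might use such an endpoint: forward the caller's token, then map the returned HTTP status to an outcome. The endpoint's real URL and response format are not shown in this thread, so the `check` callable below is a hypothetical stand-in for the HTTP call. The status mapping is also an assumption, in particular treating 403 like 404 so that the existence of a private dataset is not revealed.

```python
from typing import Callable, Mapping, Optional

# Hypothetical signature: takes the dataset name and the headers to forward,
# and returns the HTTP status code answered by the authentication endpoint.
AuthCheck = Callable[[str, Mapping[str, str]], int]


class Unauthorized(Exception):
    """The caller must authenticate (or the token is invalid)."""


class DatasetNotFound(Exception):
    """The dataset does not exist, or the caller may not know that it does."""


def ensure_access(dataset: str, token: Optional[str], check: AuthCheck) -> None:
    """Raise unless the authentication endpoint accepts the request."""
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    status = check(dataset, headers)
    if status == 200:
        return
    if status == 401:
        raise Unauthorized(dataset)
    if status in (403, 404):
        # Assumption: hide private datasets from callers without access.
        raise DatasetNotFound(dataset)
    raise RuntimeError(f"unexpected status {status} from the auth check")
```

Passing `check` as a parameter keeps the sketch independent of the real endpoint and makes it easy to test without network access.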

rajshah4 · 2022-06-14T03:01:54Z

As we get more enterprise prospects and customers, it would be nice to be able to show off the dataset viewer on private datasets. So +1 on this feature request from me.

severo · 2022-06-14T08:05:27Z

Yes, it's on the list of the next features that we will implement. We're working on a roadmap to make it clearer for everybody what can be expected in the coming months. A viewer for private datasets could not reasonably be implemented with the previous "artisanal" infrastructure because it would have collapsed under the load, but now that we run on Kubernetes, it should work seamlessly.

severo · 2022-07-19T13:23:00Z

Required here for private hub: https://huggingface.slack.com/archives/CTKK32GE8/p1658236535144079?thread_ts=1658236048.820219&cid=CTKK32GE8

severo · 2022-09-15T12:49:19Z

Priority level:

https://huggingface.slack.com/archives/C03M773CSBS/p1663233611753009?thread_ts=1662756193.298359&cid=C03M773CSBS

It's not a top priority, from what we said yesterday (the focus is on shipping the dataset-server).

severo · 2022-11-25T08:55:55Z

Requested on the forum: https://discuss.huggingface.co/t/the-dataset-preview-has-been-disabled-on-this-dataset/21339/5

severo · 2023-01-05T17:46:41Z

Also here: https://discuss.huggingface.co/t/does-the-rest-api-work-with-private-repo/28987 (to consume the API)

faustusdotbe · 2023-06-27T06:25:56Z

Hi! Chiming in to show interest. I completely understand this is not a top priority, but being able to use the viewer on private datasets would be super cool (for example, as a final "sanity check" before making the repo public).

Happy to help debug, if needed

alexgshaw · 2023-07-04T01:49:32Z

+1 to this feature

DavidFarago · 2023-07-18T14:43:21Z

+1 for this feature

arikanev · 2023-07-20T22:45:35Z

+1

TheAnimeGuru · 2023-07-21T09:59:11Z

+1 This would be very useful

nealchandra · 2023-07-31T08:16:51Z

I see this issue has been tagged as a P2 -- are you able to give us a rough estimate of what that might mean in terms of when it could land?

severo · 2023-07-31T20:57:06Z

No, we will update here when we have an ETA. Meanwhile, see #39 (comment)

Stillerman · 2023-08-02T17:34:15Z

+1

prassanna-ravishankar · 2023-09-07T14:12:26Z

Is there a way we can explicitly enable this for "private datasets" (at our own risk)? At the moment my dataset has to be private, but I am less concerned about leaks, etc. I would love to enable it.

severo · 2023-09-07T14:25:18Z

> Is there a way we can explicitly enable this for "private datasets" (at our own risk)?

No, it's not possible at the moment. It is not a replacement, but if it works for you, maybe you can set your dataset as gated: the dataset viewer works with them.

DavidFarago · 2023-09-07T15:46:04Z

What is the strictest possible gating? Can you restrict it to effectively be a private dataset?

severo · 2023-09-08T11:32:09Z

> What is the strictest possible gating? Can you restrict it to effectively be a private dataset?

It's not the same as private, but if you opt to manually approve the requests to access the gated dataset (https://huggingface.co/docs/hub/datasets-gated#manual-approval), you would avoid giving public access to the data.

enoriega · 2023-09-28T16:40:37Z

+1 to enable dataset viewer on private datasets

eduardm · 2023-11-08T13:21:00Z

+1. As an enterprise user, I don't want to share the dataset publicly, but not having a preview is really frustrating and makes collaboration difficult. Currently, we are creating a tiny dataset (10 rows) and making that public just to have a preview and understand what is what.


andreemic · 2023-11-19T12:03:42Z

+1 for this!

The datasets filtering and sorting UI is very very barebones, so the viewer would help to navigate private datasets a lot

severo · 2023-12-18T22:29:27Z

Also, internal request: https://huggingface.slack.com/archives/C02EMARJ65P/p1702930353945389

qionghuang6 · 2024-01-18T19:32:54Z

+1 for viewing private datasets

julien-c · 2024-01-18T19:40:33Z

please continue +1'ing this issue (can be on the OP) so we get a sense for how many people/teams need this! 🙏

sutgeorge · 2024-01-19T11:22:13Z

+1

egrace479 · 2024-01-26T14:56:54Z

This would be great for development of datasets, especially since the documentation for how the viewer/preview works is quite limited and currently requires piecing together information from multiple pages on the docs.

While we're waiting on this to be developed (assuming it still will be), could the docs be updated with clearer requirements for the preview feature (i.e., more than just that the dataset needs to be public)? When I run the list parquet files query I'm able to see that the parquet has failed, but it's not clear why. Is there a way to get more information about what went wrong? For one of the datasets in question it is simply a collection of images in train and val folders that is simpler than a similar dataset for which the parquet conversion was successful and displays a preview.

severo · 2024-01-31T09:50:02Z

Done!

Private datasets are now supported in the datasets-server, which enables the dataset viewer and the Parquet conversion on their dataset pages.

(Attached screen recording: Enregistrement.de.l.ecran.2024-01-31.a.10.39.59.mov)

Note that it's a paid feature, available as of today for Pro users and Enterprise orgs. Please give us feedback when you try this new feature!
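For those who want to consume the API for a private dataset, here is a minimal sketch of how an authenticated request for the list of Parquet files could be built. It assumes the public API base URL is `https://datasets-server.huggingface.co` and that the `/parquet` endpoint accepts a `dataset` query parameter and a Bearer token. The function name `build_parquet_request` is made up for this example.

```python
import urllib.parse
import urllib.request
from typing import Optional

# Assumption: public base URL of the API at the time of writing.
API_BASE = "https://datasets-server.huggingface.co"


def build_parquet_request(
    dataset: str, token: Optional[str] = None
) -> urllib.request.Request:
    """Build (without sending) a request listing a dataset's Parquet files."""
    query = urllib.parse.urlencode({"dataset": dataset})
    request = urllib.request.Request(f"{API_BASE}/parquet?{query}")
    if token is not None:
        # A user access token is needed for private (and gated) datasets.
        request.add_header("Authorization", f"Bearer {token}")
    return request
```

The request can then be sent with `urllib.request.urlopen(request)` and the body decoded as JSON. The token should come from your own account settings and should not be hardcoded in shared code.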

faustusdotbe · 2024-01-31T10:14:07Z

Thanks, Sylvain!

severo added the question Further information is requested label Sep 23, 2021

severo changed the title ~~Enable the private datasets~~ Enable the private datasets? Sep 23, 2021

severo mentioned this issue Sep 23, 2021

Cache the responses #3

Closed

13 tasks

severo added a commit that referenced this issue Oct 4, 2021

feat: 🎸 remove support for private datasets

b08e649

It's not compatible with the cache for now. See #39 to restablish the feature.

severo mentioned this issue Dec 21, 2021

Support gated datasets #74

Closed

severo mentioned this issue Jan 31, 2022

Add auth to the technical endpoints #95

Closed

severo added high-priority and removed question Further information is requested labels Jan 31, 2022

severo added enhancement and removed high-priority labels Jun 17, 2022

severo added this to the Private datasets milestone Jun 17, 2022

severo added the question Further information is requested label Jun 17, 2022

severo mentioned this issue Aug 2, 2022

Protect the gated datasets #429

Closed

severo added feature request Request for a new feature and removed enhancement question Further information is requested labels Sep 19, 2022

severo changed the title ~~Enable the private datasets?~~ Enable the private datasets Sep 20, 2022

severo mentioned this issue Sep 26, 2022

Dataset Viewer not showing Previews for Private Datasets huggingface/datasets#3604

Closed

severo mentioned this issue Dec 27, 2022

Improve dataset .skip() speed in streaming mode huggingface/datasets#5380

Open

severo added the P2 Nice to have label Jul 26, 2023

This was referenced Aug 9, 2023

Give a better error message for private datasets #1655

Closed

Offline dataset viewer huggingface/datasets#6139

Closed

mattstern31 added a commit to mattstern31/datasets-server-storage-admin that referenced this issue Nov 11, 2023

feat: 🎸 remove support for private datasets

fedf367

It's not compatible with the cache for now. See huggingface/dataset-viewer#39 to restablish the feature.

severo added P1 Not as needed as P0, but still important/wanted and removed P2 Nice to have labels Nov 14, 2023

severo mentioned this issue Dec 14, 2023

Add a collection with datasets infos #2208

Closed

severo closed this as completed Jan 31, 2024
