DOI Generation for Artifacts #25

Closed
cakiki opened this issue Jul 3, 2021 · 18 comments

@cakiki
Contributor

cakiki commented Jul 3, 2021

It would be useful to be able to generate a Digital Object Identifier (DOI) for artifacts living on the Hub.

It would let people cite a specific dataset or a specific model, and make their own datasets and models citable. Some venues require DOIs for digital resources and it would be nice to not have to use 3rd parties for that.

Kaggle, for instance, currently has that feature for public datasets: https://www.kaggle.com/product-feedback/108594

Drawback: It costs money, because one has to go through approved agencies (e.g.: https://datacite.org/feemodel.html)

Automation would probably be straightforward: Creating DOIs with the Datacite REST API
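A minimal sketch of what that automation could look like, assuming the DataCite REST API's JSON:API payload shape and HTTP Basic authentication with repository credentials. The DOI, the landing URL, the credentials, and the helper names `build_doi_payload` and `build_doi_request` are hypothetical and only for illustration; the request is built but never sent.

```python
import base64
import json
import urllib.request

# DataCite's test endpoint; production would be https://api.datacite.org/dois
DATACITE_TEST_URL = "https://api.test.datacite.org/dois"


def build_doi_payload(doi, title, creator, publisher, year, landing_url):
    """Build a JSON:API payload asking DataCite to register and publish a DOI."""
    return {
        "data": {
            "type": "dois",
            "attributes": {
                "event": "publish",  # leave this out to create a draft DOI
                "doi": doi,
                "creators": [{"name": creator}],
                "titles": [{"title": title}],
                "publisher": publisher,
                "publicationYear": year,
                "types": {"resourceTypeGeneral": "Dataset"},
                "url": landing_url,  # page the DOI should resolve to
            },
        }
    }


def build_doi_request(payload, repository_id, password, api_url=DATACITE_TEST_URL):
    """Wrap the payload in a POST request with Basic auth (hypothetical credentials)."""
    token = base64.b64encode(f"{repository_id}:{password}".encode()).decode()
    return urllib.request.Request(
        api_url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/vnd.api+json",
            "Authorization": f"Basic {token}",
        },
    )


payload = build_doi_payload(
    doi="10.1234/example",  # hypothetical DOI under a hypothetical prefix
    title="My Dataset",
    creator="Doe, Jane",
    publisher="Hugging Face",
    year=2022,
    landing_url="https://huggingface.co/datasets/jane/my-dataset",  # hypothetical repo
)
request = build_doi_request(payload, "EXAMPLE.REPO", "not-a-real-password")
# urllib.request.urlopen(request) would submit it; omitted here on purpose.
```

Building the request separately from sending it keeps the sketch testable without DataCite credentials.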

@julien-c
Member

julien-c commented Jul 4, 2021

Yes I think we should do that.

I've reached out to DataCite and a few other potential registrars. I will follow up here when I know more.

@julien-c
Member

julien-c commented Jul 4, 2021

(also gotta say that I love the README.md on your GitHub @cakiki, and I am absolutely going to rip it off 😂)

@cakiki
Contributor Author

cakiki commented Jul 4, 2021

You are more than welcome to it! 😃
You could also experiment with nicer layouts (https://rich.readthedocs.io/en/latest/layout.html#creating-layouts)
(@willmcgugan did most of the heavy lifting on this)

@LysandreJik LysandreJik transferred this issue from huggingface/huggingface_hub Mar 16, 2022
@osanseviero osanseviero changed the title 🚀 Feature Request: DOI Generation for Artifacts DOI Generation for Artifacts Mar 17, 2022
@julien-c
Member

BTW (because this subject came up again recently), I remember trying to ping some DOI registrars last year and it looked very bureaucratically complex, TBH

Maybe at some point someone wants to give this another shot, but be aware this will probably be a long endeavour :)

@davanstrien
Member

> BTW (because this subject came up again recently), I remember trying to ping some DOI registrars last year and it looked very bureaucratically complex, TBH
>
> Maybe at some point someone wants to give this another shot, but be aware this will probably be a long endeavour :)

One possible stop-gap solution (mainly applicable to models) could be to create a GitHub Action that, in response to a webhook (or on a schedule), downloads a snapshot of a repository and pushes it to a Zenodo repository via their API.

This would require manual setup for those wanting to use it but would be a route to getting a citable and versioned DOI for their model and has the added benefit of creating a 'preservation' copy of the model. This extra copy is also quite desirable for some communities.

This would be a little bit hands-on for people to set up but could give a sense of how many people want this kind of feature. A full integration with Zenodo, like the one Zenodo has with GitHub, would be great, but I think this would be much more involved to establish.

I have it on my todo list to set up something similar to push models from https://huggingface.co/BritishLibraryLabs to https://bl.iro.bl.uk/. I will ping this thread when I get around to that in case it is helpful for other people.
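A rough sketch of the Zenodo half of that stop-gap, assuming Zenodo's deposit REST API (create a deposition, upload files to its bucket, set metadata, publish) and a personal access token sent as a Bearer header. The function names, repo id, and metadata values are hypothetical; fetching the Hub snapshot itself (for example with `huggingface_hub.snapshot_download`) is left out, and no request is actually sent.

```python
import json
import urllib.parse
import urllib.request

ZENODO_API = "https://zenodo.org/api"  # a sandbox exists at https://sandbox.zenodo.org/api


def _auth(token, extra=None):
    """Common headers: Bearer token plus any request-specific headers."""
    headers = {"Authorization": f"Bearer {token}"}
    headers.update(extra or {})
    return headers


def create_deposition_request(token):
    """Step 1: create an empty deposition; the response carries `id` and `links.bucket`."""
    return urllib.request.Request(
        f"{ZENODO_API}/deposit/depositions",
        data=b"{}",
        method="POST",
        headers=_auth(token, {"Content-Type": "application/json"}),
    )


def upload_file_request(bucket_url, filename, content, token):
    """Step 2: PUT one file from the downloaded snapshot into the deposition's bucket."""
    return urllib.request.Request(
        f"{bucket_url}/{urllib.parse.quote(filename)}",
        data=content,
        method="PUT",
        headers=_auth(token, {"Content-Type": "application/octet-stream"}),
    )


def metadata_request(deposition_id, repo_id, revision, creator, token):
    """Step 3: describe the deposit; title and description here are illustrative."""
    metadata = {
        "metadata": {
            "title": f"{repo_id} (snapshot at {revision})",
            "upload_type": "other",
            "description": f"Preservation copy of https://huggingface.co/{repo_id}",
            "creators": [{"name": creator}],
        }
    }
    return urllib.request.Request(
        f"{ZENODO_API}/deposit/depositions/{deposition_id}",
        data=json.dumps(metadata).encode("utf-8"),
        method="PUT",
        headers=_auth(token, {"Content-Type": "application/json"}),
    )


def publish_request(deposition_id, token):
    """Step 4: publish, which is the point at which Zenodo assigns the DOI."""
    return urllib.request.Request(
        f"{ZENODO_API}/deposit/depositions/{deposition_id}/actions/publish",
        data=b"",
        method="POST",
        headers=_auth(token),
    )
```

In a GitHub Action, a scheduled job could call these four steps in order, passing each step's response (deposition id, bucket URL) to the next.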

@cakiki
Contributor Author

cakiki commented Apr 25, 2022

Fantastic idea, @davanstrien! It would still require some organization and negotiation (Zenodo limits uploads to 50GB, I believe), but I'm sure it would be substantially less bureaucratic hassle than dealing with registrars.

@julien-c
Member

Maybe we could try again to see how to provide DOIs directly now. (I feel like syncing to Zenodo is a bit less elegant, and depending on dataset size I'm not sure how well it will work.)

Will try to sync up with the https://datacite.org/ team in the coming weeks

@julien-c
Member

@cakiki would you want to participate in a call with them?

@cakiki
Contributor Author

cakiki commented Aug 10, 2022

@julien-c I would love to; thank you for including me!

Another thing to try would be to reach out to Kaggle and ask about their experience. They also had it requested by users before they added the DOI feature to datasets.

@yoshitomo-matsubara

yoshitomo-matsubara commented Aug 10, 2022

@julien-c @cakiki
I'm here just to say thank you all (including everyone involved in this thread) for revisiting this issue.
Providing DOIs directly (upon request, like Kaggle does?) sounds like a great idea!

@julien-c
Member

Ok we're now officially working on this 🔥

No ETA yet but it seems to be not-super-hard to do =)

On a related subject, we can embed a "How to cite" button in those model or dataset repos where the authors have generated a DOI. Supporting the full citation file format (https://citation-file-format.github.io/about/) is maybe a bit overkill for now, so I was thinking of maybe just generating a BibTeX snippet for the repos that have a DOI. Any other citation format we should support? WDYT?
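To make the BibTeX idea concrete, here is a small sketch of how a snippet could be generated from a repo's DOI metadata. The function name `bibtex_for_repo`, the choice of `@misc`, the field set, and the example DOI are all assumptions for illustration, not the format the Hub actually emits.

```python
import re


def bibtex_for_repo(doi, authors, title, year, publisher="Hugging Face"):
    """Render a @misc BibTeX entry for a repo that has a DOI (illustrative field set)."""
    # Citation key: lowercase title with non-alphanumerics collapsed to underscores.
    key = re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_")
    fields = [
        ("author", " and ".join(authors)),
        ("title", title),
        ("year", str(year)),
        ("url", f"https://doi.org/{doi}"),
        ("doi", doi),
        ("publisher", publisher),
    ]
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields)
    return f"@misc{{{key}_{year},\n{body}\n}}"


entry = bibtex_for_repo(
    doi="10.1234/example",  # hypothetical DOI
    authors=["Jane Doe", "John Smith"],
    title="My Dataset",
    year=2022,
)
print(entry)
```

Since BibTeX converts readily to other formats, a single generator like this could back a "How to cite" button before any other export is added.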

@BramVanroy
Contributor

This is awesome! Very glad to hear it. Will it take commit/tag into consideration, or just one DOI per repo for now? (Which is also already awesome!)

BibTeX should be a good start (there are plenty of online BibTeX-to-X converters). However, I do notice that GitHub seems to provide both BibTeX and APA. E.g., on transformers, in the sidebar, when you click on "Cite this repository":

[Image: cite this repo]

This is also described in the GitHub docs.

@cakiki
Contributor Author

cakiki commented Aug 24, 2022

+1 for bibtex! as @BramVanroy said, one can convert bibtex to just about anything that people care about.

I personally quite like the citation UX of both the ACL Anthology and that of Semantic Scholar.

@yoshitomo-matsubara

I'm so glad to hear the update! 🙏
+1 for bibtex and +1 for citation UX like those @cakiki mentioned :)

@Ahleroy

Ahleroy commented Aug 24, 2022

+1 for Bibtex.
Zbib.org is a great example of easy-to-use ref export. They offer Copy / Download buttons and RIS / Bibtex exports.
https://zbib.org/

@julien-c I would be happy to have a chat to discuss how I can contribute to this feature.

Alix

@julien-c
Member

julien-c commented Oct 7, 2022

Hi everyone!! We just launched DOI generation on the Hugging Face Hub, thanks to DataCite, @Kakulukian @sashavor @cakiki and @Ahleroy 🔥

Here's a blogpost about the feature: https://huggingface.co/blog/introducing-doi

Please try assigning DOIs to some of your repo(s) on the Hub and post any feedback here! Thank you 🤗

@yoshitomo-matsubara

It's great news!
Many thanks to the team for making this happen!

@cakiki
Contributor Author

cakiki commented Oct 7, 2022

Thank you everyone; really happy this is now a feature! 🤗


7 participants