DOI Generation for Artifacts #25
Comments
Yes I think we should do that. I've reached out to Datacite and a few other potential registrants – I will follow up here when I know more. |
(also gotta say that I love your |
You are more than welcome to it! 😃 |
BTW (because this subject came up again recently), I remember trying to ping some DOI registrants last year and it looked very bureaucratically complex, TBH. Maybe at some point someone wants to give this another shot, but be aware this will probably be a long endeavour :) |
One possible stop-gap solution (mainly applicable to models) could be to create a GitHub Action that, in response to a webhook (or on a schedule), downloads a snapshot of a repository and pushes it to a Zenodo repository via their API. This would require manual setup for those wanting to use it, but it would be a route to getting a citable and versioned DOI for their model, with the added benefit of creating a 'preservation' copy of the model. This extra copy is also quite desirable for some communities. It would be a little bit hands-on for people to set up but could give a sense of how many people want this kind of feature. Full integration with Zenodo, like the one Zenodo has with GitHub, would be great, but I think this would be much more involved to establish. I have it on my todo list to set up something similar to push models from https://huggingface.co/BritishLibraryLabs to https://bl.iro.bl.uk/. I will ping this thread when I get around to that in case it is helpful for other people. |
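To make the Zenodo idea above a bit more concrete, here is a minimal sketch of the "push to Zenodo" half, using only the Python standard library. It assumes the Zenodo deposit endpoint (`/api/deposit/depositions`) and bearer-token authentication as I understand them from the Zenodo REST API docs; the function names, the `"other"` upload type, and the example title are illustrative assumptions, not part of any existing integration. Downloading the snapshot itself (for example with `huggingface_hub.snapshot_download`) and wiring this into a GitHub Action are left out.

```python
import json
import urllib.request

# Zenodo deposit endpoint (assumed from the Zenodo REST API docs).
ZENODO_API = "https://zenodo.org/api/deposit/depositions"


def build_metadata(title, description, creators, upload_type="other"):
    """Build the JSON body for a new Zenodo deposition.

    `creators` is a list of "Family, Given" strings. The "other" upload
    type is an assumption; pick whichever Zenodo type fits the artifact.
    """
    return {
        "metadata": {
            "title": title,
            "upload_type": upload_type,
            "description": description,
            "creators": [{"name": name} for name in creators],
        }
    }


def build_create_request(token, payload):
    """Prepare (but do not send) the POST request that creates a deposition."""
    return urllib.request.Request(
        ZENODO_API,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_deposition(token, payload):
    """Send the request and return the parsed JSON response.

    As far as I know, the response contains a `links` object with a bucket
    URL for file uploads (PUT) and a publish action URL; publishing is what
    actually mints the DOI.
    """
    with urllib.request.urlopen(build_create_request(token, payload)) as resp:
        return json.load(resp)
```

A scheduled job would call `build_metadata(...)`, then `create_deposition(token, payload)` with a personal access token stored as a secret, then upload the snapshot files and publish.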
Fantastic idea @davanstrien ! It would still require some organization and negotiation (Zenodo limits uploads to 50GB I believe), but I'm sure it would be substantially less bureaucratic hassle than dealing with registrars. |
Maybe we could try again to see how to provide DOIs directly now. (I feel like syncing to Zenodo is a bit less elegant, and depending on dataset size I'm not sure how well it will work.) Will try to sync up with the https://datacite.org/ team in the coming weeks |
@cakiki would you want to participate in a call with them? |
@julien-c I would love to; thank you for including me! Another thing to try would be to reach out to Kaggle and ask about their experience. They also had it requested by users before they added the DOI feature to datasets. |
@julien-c @cakiki |
Ok we're now officially working on this 🔥 No ETA yet but it seems to be not-super-hard to do =) On a related subject, we can embed a "How to cite" button in those model or dataset repos where the authors have generated a DOI. Supporting the full citation file format (https://citation-file-format.github.io/about/) is maybe a bit overkill for now, so I was thinking of just generating a BibTeX snippet for the repos that have a DOI. Any other citation format we should support? WDYT? |
This is awesome! Very glad to hear it. Will it take commit/tag into consideration, or just one DOI per repo for now? (Which is also already awesome!) BibTeX should be a good start (there are plenty of online BibTeX-to-X converters). However, I do notice that GitHub seems to provide both BibTeX and APA, e.g. on transformers, in the sidebar, when you click on "Cite this repository". This is also described in the GitHub docs. |
+1 for BibTeX! As @BramVanroy said, one can convert BibTeX to just about anything that people care about. I personally quite like the citation UX of both the ACL Anthology and that of Semantic Scholar. |
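For illustration, generating a BibTeX snippet for a repo that already has a DOI could look roughly like the sketch below. This is not the Hub's implementation: the function name, the `@misc` entry type, the choice of fields, and the placeholder DOI in the usage note are all assumptions made for the example.

```python
def bibtex_for_repo(repo_id, doi, authors, year, publisher="Hugging Face"):
    """Render a BibTeX @misc entry for a repo that has a DOI.

    `repo_id` looks like "user/name"; `authors` is a list of
    "Family, Given" strings. All field choices here are illustrative.
    """
    # BibTeX keys should avoid slashes, so derive a simple key from the id.
    key = repo_id.replace("/", "_").replace("-", "_")
    title = repo_id.split("/")[-1]
    author_field = " and ".join(authors)
    lines = [
        f"@misc{{{key},",
        f"  author    = {{{author_field}}},",
        f"  title     = {{{title}}},",
        f"  year      = {{{year}}},",
        f"  publisher = {{{publisher}}},",
        f"  doi       = {{{doi}}},",
        f"  url       = {{https://doi.org/{doi}}},",
        "}",
    ]
    return "\n".join(lines)
```

Calling `bibtex_for_repo("user/my-dataset", "10.1234/example", ["Doe, Jane"], 2022)` (with a made-up DOI) returns an entry that can be pasted into a `.bib` file and converted to other formats from there.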
I'm so glad to hear the update! 🙏 |
+1 for BibTeX. @julien-c I would be happy to have a chat to discuss how I can contribute to this feature. Alix |
Hi everyone!! We just launched DOI generation on the HuggingFace Hub, thanks to DataCite, @Kakulukian @sashavor @cakiki and @Ahleroy 🔥 Here's a blogpost about the feature: https://huggingface.co/blog/introducing-doi Please try assigning DOIs to some of your repo(s) on the Hub and post any feedback here! Thank you 🤗 |
It's great news! |
Thank you everyone; really happy this is now a feature! 🤗 |
It would be useful to be able to generate a Digital Object Identifier for artifacts living on the Hub.
It would let people cite a specific dataset or a specific model, and make their own datasets and models citable. Some venues require DOIs for digital resources and it would be nice to not have to use 3rd parties for that.
Kaggle, for instance, currently has that feature for public datasets: https://www.kaggle.com/product-feedback/108594
Drawback: It costs money, because one has to go through approved agencies (e.g.: https://datacite.org/feemodel.html)
Automation would probably be straightforward; see "Creating DOIs with the DataCite REST API".
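As a rough sketch of what that automation might involve, the snippet below builds the JSON:API body that, to my understanding, the DataCite REST API expects when creating a DOI via `POST /dois`. The function name and defaults are assumptions for illustration; sending the request (HTTP Basic auth with repository credentials, content type `application/vnd.api+json`) is deliberately left out, and DataCite's test endpoint would be the place to try it first.

```python
# DataCite REST endpoint for creating DOIs (a separate test instance
# exists at https://api.test.datacite.org/dois).
DATACITE_API = "https://api.datacite.org/dois"


def build_doi_payload(prefix, url, title, creators, year, publisher,
                      resource_type="Dataset", publish=False):
    """Build a JSON:API payload for creating a DOI with DataCite.

    Passing only a `prefix` (no full `doi`) asks DataCite to generate a
    random suffix. Without an "event", the DOI is created as a draft;
    "publish" makes it findable. Parameter names here are illustrative.
    """
    attributes = {
        "prefix": prefix,
        "url": url,  # landing page the DOI should resolve to
        "titles": [{"title": title}],
        "creators": [{"name": name} for name in creators],
        "publisher": publisher,
        "publicationYear": year,
        "types": {"resourceTypeGeneral": resource_type},
    }
    if publish:
        attributes["event"] = "publish"
    return {"data": {"type": "dois", "attributes": attributes}}
```

The per-DOI work is essentially filling in this metadata from the repo card, so the hard part is the membership and fee side rather than the API calls.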