Add common data sources #2
Comments
I think the GitHub repos will be fairly easy to scrape. The others may actually be really easy too, because they might follow modern HTML practices like filling in #ids.
OAI-PMH is a standard API that is exposed by Figshare and DataDryad and probably many others.
Dataverse supports OAI-PMH. You can find a list of OAI sets by installation at https://docs.google.com/spreadsheets/d/12cxymvXCqP_kCsLKXQD32go79HBWZ1vU_tdG4kvP5S8/edit?usp=sharing
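For anyone following along, here is a rough sketch of what harvesting over OAI-PMH could look like. The `ListRecords` verb and the `metadataPrefix` and `set` parameters come from the OAI-PMH 2.0 spec; the base URL `https://example.org/oai`, the set name, and the sample response are made up for illustration, and the helper names are hypothetical. It only builds the request URL and parses a response string, so nothing here touches the network.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Return the identifier and Dublin Core titles of each record."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext(
            "oai:header/oai:identifier", default="", namespaces=NS
        )
        titles = [el.text for el in record.iterfind(".//dc:title", NS)]
        records.append({"identifier": identifier, "titles": titles})
    return records


# Hand-written response for illustration only (not real repository output).
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:1</identifier>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example dataset</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""

if __name__ == "__main__":
    print(list_records_url("https://example.org/oai", set_spec="demo"))
    print(parse_records(SAMPLE_RESPONSE))
```

A real harvester would also need to follow `resumptionToken` paging, which this sketch leaves out.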
@pdurbin thanks. You can see the current generator prototype (with reference outputs) I have for the UCI ML repo at https://github.com/oxinabox/DataDepsGenerators.jl/pull/1/files
@oxinabox I'm sorry but I'm not familiar enough with OAI-PMH to know the answer. Someone on the dataverse-community mailing list might, and you'd be welcome to start a thread about this: https://groups.google.com/forum/#!forum/dataverse-community
@oxinabox if that was you over at http://irclog.iq.harvard.edu/dataverse/2018-01-12 I'm sorry I missed you. Yes, you can think of SWORD as being for uploads and OAI-PMH as being for downloading metadata (but not files, generally speaking).
Indeed it was me. I'm thinking about this a bit more again.
Right, from DDI you can get names of files and such. For tabular files, you can even get summary stats on variables (columns), like this example from https://dataverse.harvard.edu/api/datasets/export?exporter=ddi&persistentId=doi:10.7910/DVN/TJCLKP
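As a sketch of how a generator might read file names and per-variable summary stats out of a DDI export: the XML below is a hand-written fragment, not the actual output of the URL above, and the element names (`fileName`, `var`, `sumStat`) and the namespace are my assumption of how a DDI Codebook export is laid out. The code matches on local tag names so it does not depend on the exact namespace. `summarize_ddi` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip an '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def summarize_ddi(xml_text):
    """Collect file names and per-variable summary statistics."""
    root = ET.fromstring(xml_text)
    files = []
    variables = {}
    for el in root.iter():
        name = local(el.tag)
        if name == "fileName" and el.text:
            files.append(el.text.strip())
        elif name == "var":
            # Each <sumStat type="..."> child holds one statistic.
            stats = {
                child.get("type"): child.text.strip()
                for child in el
                if local(child.tag) == "sumStat" and child.text
            }
            variables[el.get("name")] = stats
    return files, variables


# Hand-written fragment for illustration only (assumed layout).
SAMPLE_DDI = """
<codeBook xmlns="ddi:codebook:2_5">
  <fileDscr>
    <fileTxt><fileName>example.tab</fileName></fileTxt>
  </fileDscr>
  <dataDscr>
    <var name="age">
      <sumStat type="mean">41.5</sumStat>
      <sumStat type="max">90</sumStat>
    </var>
  </dataDscr>
</codeBook>
"""

if __name__ == "__main__":
    print(summarize_ddi(SAMPLE_DDI))
```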
I'm not really an expert on all this, but again if you email https://groups.google.com/forum/#!forum/dataverse-community someone with more information could weigh in.
The DataONE API is really nice. It looks like it would add a fair few sites: https://www.dataone.org/current-member-nodes#uploads The way to do this would be to implement an abstract dispatch type.
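To make the abstract-type idea concrete, here is a minimal sketch in Python using an abstract base class: the shared logic lives on the base, and each member node only supplies its own base URL. Everything named here is hypothetical (`DataOneRepo`, `ExampleNodeA`, `ExampleNodeB`, the `example.org` URLs), and the `/object/<pid>` path layout is an assumption rather than something checked against the DataONE docs.

```python
from abc import ABC, abstractmethod
from urllib.parse import quote


class DataOneRepo(ABC):
    """Shared behaviour for every node; subclasses supply base_url."""

    @property
    @abstractmethod
    def base_url(self):
        """Root of the node's REST API (hypothetical layout)."""

    def object_url(self, pid):
        # Percent-encode the whole identifier so DOIs with '/' and ':'
        # survive as a single path segment.
        return f"{self.base_url}/object/{quote(pid, safe='')}"


class ExampleNodeA(DataOneRepo):
    base_url = "https://node-a.example.org/mn/v2"  # made-up URL


class ExampleNodeB(DataOneRepo):
    base_url = "https://node-b.example.org/mn/v2"  # made-up URL


if __name__ == "__main__":
    for repo in (ExampleNodeA(), ExampleNodeB()):
        print(repo.object_url("doi:10.1/abc"))
```

Adding a new member node is then one small subclass, which is the point of dispatching on an abstract type.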
If I may, I would like to point to EDMOND, the open data repository of the Max Planck Society.
@BeastyBlacksmith I am not actively adding new data sources at the moment. You also might want to raise an issue with the EDMOND team about following the Google/schema.org guidelines for including JSON-LD structured data fragments on the pages: https://developers.google.com/search/docs/data-types/dataset
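To show why those JSON-LD fragments help, here is a small sketch that pulls schema.org `Dataset` blocks out of a page using only the standard library. The sample page is made up; `name`, `distribution` and `contentUrl` are schema.org property names, while `JsonLdExtractor` and `find_datasets` are hypothetical helper names. It only handles the simple case of a top-level object with `"@type": "Dataset"`.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.blocks.append(json.loads("".join(self._buffer)))


def find_datasets(html):
    """Return every top-level JSON-LD object typed as a schema.org Dataset."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return [
        block for block in parser.blocks
        if isinstance(block, dict) and block.get("@type") == "Dataset"
    ]


# Made-up page for illustration only.
SAMPLE_PAGE = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Dataset",
 "name": "Example dataset",
 "distribution": [{"@type": "DataDownload",
                   "contentUrl": "https://example.org/data.csv"}]}
</script>
</head><body></body></html>
"""

if __name__ == "__main__":
    for dataset in find_datasets(SAMPLE_PAGE):
        print(dataset["name"], dataset["distribution"][0]["contentUrl"])
```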
@pdurbin it is kinda stalled, since that GSoC project is over. But feel free to.