
Discussion: Can/should we be able to import a dataspiced dataset from the web/elsewhere? #57

Open
amoeba opened this issue May 30, 2018 · 3 comments
Labels
help wanted (Extra attention is needed)

@amoeba
Collaborator

amoeba commented May 30, 2018

This comes from a good question in my dataspice demo today: If user X authors a dataspice page for their dataset, and another scientist, Y, wants to use it, it'd be cool if they just ran:

import_spice("https://amoeba.github.io/some-dataset")

And their computer would download something like some-dataset.zip containing the dataspice.json and the files described in access.csv, bundled together somehow.

@amoeba added the help wanted label (Extra attention is needed) on May 30, 2018
@khondula
Contributor

That seems cool! Would that depend on the reliability/persistence of 'contentUrl' or 'contentUrl' + 'fileName'?

Would there maybe be a way to generate a .bib as well, to suggest a citation?

@cboettig
Member

👏

I think it might be more robust to have a function that just extracts the metadata and returns an R object containing the download URLs, e.g. something like

x <- import_spice()
read_csv(x$files[[1]])

(Some schema.org Dataset contentUrls are not direct links to a downloadable data file, but rather to a web page that has links.)

Could potentially make this behavior part of a read_spice() function; i.e. read_spice could work locally on a dataspice.json object or could extract dataspice.json from HTML content on the web.
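A rough sketch of that read_spice() idea, in R: one function that accepts either a local dataspice.json path or a page's HTML/URL, pulls out the embedded JSON-LD, and returns the metadata plus the download URLs without fetching any data. The name read_spice_sketch(), the regex-based extraction, and the assumption that file URLs live under the schema.org distribution/contentUrl fields are all illustrative, not dataspice's actual API. It needs the jsonlite package.

```r
library(jsonlite)

# Hypothetical sketch (not part of dataspice): read metadata either from a
# local dataspice.json file or from HTML that embeds it as JSON-LD.
read_spice_sketch <- function(x) {
  if (grepl("\\.json$", x) && file.exists(x)) {
    # Local case: x is a path to a dataspice.json file
    json_text <- paste(readLines(x, warn = FALSE), collapse = "\n")
  } else {
    # Web case: x is a URL (readLines() accepts URLs) or a string of HTML
    html <- if (grepl("^https?://", x)) {
      paste(readLines(x, warn = FALSE), collapse = "\n")
    } else {
      x
    }
    # Simplification: take the first <script type="application/ld+json">
    # block via a regex; a real implementation would use an HTML parser
    pattern <- '(?s)<script[^>]*type="application/ld\\+json"[^>]*>(.*?)</script>'
    m <- regmatches(html, regexec(pattern, html, perl = TRUE))[[1]]
    if (length(m) < 2) stop("no JSON-LD block found")
    json_text <- m[2]
  }
  meta <- fromJSON(json_text, simplifyVector = FALSE)

  # Assumed schema.org layout: distribution is a list of objects with a
  # contentUrl; a single distribution object is wrapped into a list
  dist <- meta$distribution
  if (!is.null(dist$contentUrl)) dist <- list(dist)
  files <- vapply(dist, function(d) {
    if (is.null(d$contentUrl)) NA_character_ else d$contentUrl
  }, character(1))

  # Return metadata plus URLs only; nothing is downloaded at this stage
  list(metadata = meta, files = files)
}
```

Because it only returns URLs, a caller would still do something like readr::read_csv(x$files[[1]]), which only works when that contentUrl points at a file rather than a landing page.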

An R object could also contain the citation (perhaps as an R bibentry object, which R can already turn into either BibTeX or a text citation), i.e. simply x$citation, or we could have a methods-y interface like citation(x).
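For the citation part (and the .bib question above), base R's utils::bibentry() already does the formatting. A minimal sketch, assuming the parsed metadata uses schema.org-style fields (name as the title, creator entries with a name, datePublished, url); spice_citation() is a hypothetical helper, not an existing dataspice function.

```r
# Hypothetical helper (not part of dataspice): build a base-R bibentry from
# metadata parsed from dataspice.json. Field names are assumptions based on
# schema.org Dataset: name (title), creator[[i]]$name, datePublished, url.
spice_citation <- function(meta) {
  authors <- vapply(meta$creator, function(p) p$name, character(1))
  utils::bibentry(
    bibtype = "Misc",
    title = meta$name,
    # as.person() splits a string on " and " into separate persons
    author = utils::as.person(paste(authors, collapse = " and ")),
    year = substr(meta$datePublished, 1, 4),
    url = meta$url
  )
}
```

toBibtex(spice_citation(meta)) would give a .bib entry and format(spice_citation(meta), style = "text") a plain-text citation.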

@amoeba
Collaborator Author

amoeba commented May 30, 2018

@khondula wrote:

That seems cool! Would that depend on the reliability/persistence of 'contentUrl' or 'contentUrl' + 'fileName'?

Yes, I see resolving this as a pressing need. @cboettig's idea below helps alleviate it: don't fetch the data at first, just the metadata, then give the user a way to fetch some or all of it.
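The "fetch some or all of it" step could be a separate helper that takes the list of file URLs from a metadata-only read and downloads just the ones the user picks. A minimal sketch; fetch_spice_files() and its select/dest arguments are hypothetical, and saving each file under the URL's basename is a simplifying assumption.

```r
# Hypothetical helper (not part of dataspice): download a chosen subset of
# file URLs, e.g. the files element returned by a metadata-only reader.
fetch_spice_files <- function(urls, select = seq_along(urls), dest = tempdir()) {
  chosen <- urls[select]
  # Simplification: name each local file after the last URL path component
  paths <- file.path(dest, basename(chosen))
  for (i in seq_along(chosen)) {
    # mode = "wb" prevents newline translation of binary files on Windows
    utils::download.file(chosen[i], paths[i], mode = "wb", quiet = TRUE)
  }
  paths
}
```

Keeping the download separate means a contentUrl that turns out to be a landing page fails at fetch time, for that one file, rather than breaking the metadata import.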

returns an R object

Ooh nice! More robust, yes.

Could potentially make this behavior part of a read_spice() function; i.e. read_spice could work locally on a dataspice.json object or could extract dataspice.json from HTML content on the web.

👍 and 👍 on all those ideas @cboettig
