Find a way to make contentURL in access.csv be automatic #47

Open
amoeba opened this issue May 25, 2018 · 2 comments
@amoeba
Collaborator

amoeba commented May 25, 2018

I'm not entirely sure this is trivial but I hope it is:

When access.csv gets filled in automatically, the contentURL field is blank. The actual URL to the file would follow the GitHub convention for serving raw files over HTTP:

https://raw.githubusercontent.com/{user|org}/{repo}/{branch}/path/to/file.ext

I think we know this or can find this out before we create the HTML. Take a shot at it and report back! Perhaps git2r does this in a nice way.
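A minimal sketch of the URL convention above, assuming the owner, repo, and branch are already known as plain strings. `raw_content_url()` and `parse_github_remote()` are hypothetical helpers, not functions from dataspice or git2r. git2r could probably supply the remote URL and current branch (for example via `remote_url()` and `repository_head()`, if they behave as I expect), but that part is left out here so the sketch has no dependencies.

```r
# Hypothetical helper: build the raw-file URL from the pieces of the convention.
raw_content_url <- function(owner, repo, branch, path) {
  # Drop a leading "./" or "/" so the path is relative to the repo root
  path <- sub("^(\\./|/+)", "", path)
  # Percent-encode each path segment but keep the "/" separators
  segments <- strsplit(path, "/", fixed = TRUE)[[1]]
  encoded <- vapply(segments, utils::URLencode, character(1),
                    reserved = TRUE, USE.NAMES = FALSE)
  paste("https://raw.githubusercontent.com", owner, repo, branch,
        paste(encoded, collapse = "/"), sep = "/")
}

# Hypothetical helper: pull owner and repo out of a GitHub remote URL
# (handles the common HTTPS and SSH forms only).
parse_github_remote <- function(remote) {
  slug <- sub("^(https://github\\.com/|git@github\\.com:)", "", remote)
  slug <- sub("\\.git$", "", slug)
  parts <- strsplit(slug, "/", fixed = TRUE)[[1]]
  list(owner = parts[1], repo = parts[2])
}

remote <- parse_github_remote("git@github.com:user/repo.git")
raw_content_url(remote$owner, remote$repo, "master", "./data/file.ext")
```

The branch is passed in rather than guessed, since a URL built from the wrong branch would silently point at a different version of the file.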

@cboettig
Member

Perhaps not much of a concern for us at this time, but this does assume that the data file is committed to git, which is often not ideal and breaks down for large files: GitHub warns about files over 50 MB and rejects files over 100 MB.

I've been exploring various ways around this (e.g. see unconf issue ropensci/unconf18#51). My current strategy is to take a cue from Rich FitzJohn and upload the data as assets attached to a release. piggyback will let you do something like:

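library(piggyback)
# Upload the file as an asset on the release tagged "data"
pb_upload("mydata.csv.gz", repo = "user/repo", tag = "data")
# Look up the download URL for that asset
url <- pb_download_url("mydata.csv.gz", repo = "user/repo", tag = "data")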

to construct a download URL for the asset.

This should work for any individual data file up to 2 GB in size. I know this mid-size range of > 50 MB but < 2 GB isn't huge, so it may not be particularly useful for most people, but it is an easy way to avoid cluttering up a git repo.

Of course, ideally the data would eventually be uploaded to a DOI-providing repository and contentURL would be amended to point there anyway.
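As a rough sketch of how the size ranges discussed here could drive the choice of hosting, the function below picks a strategy from a file size in bytes. `content_url_strategy()` is a hypothetical name, and the default thresholds are just the 50 MB and 2 GB figures from this comment, not values checked against GitHub or piggyback.

```r
# Hypothetical helper: decide where a file of a given size could live.
# Thresholds are assumptions taken from the figures quoted in this thread.
content_url_strategy <- function(size_bytes,
                                 git_limit = 50 * 1024^2,
                                 release_limit = 2 * 1024^3) {
  if (size_bytes <= git_limit) {
    "git"        # small enough to commit; a raw-file URL would work
  } else if (size_bytes <= release_limit) {
    "release"    # attach to a release, e.g. with piggyback
  } else {
    "external"   # needs a data repository or other hosting
  }
}

content_url_strategy(10 * 1024^2)
content_url_strategy(500 * 1024^2)
```

In practice the size would come from `file.size()` on each data file.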

@amoeba
Collaborator Author

amoeba commented May 29, 2018

Whoa, piggyback is cool. This looks like a nice way to get a file shareable fast, considering the user already has a repo on GitHub. Does Zenodo archive release assets like the ones piggyback creates?

Part of me likes the idea of scoping this package to "data checked into git" but that might be just to simplify things for me rather than a user.

Stepping back, I had thought about how we'd support users making use of non-local files in their scripts. There's nothing in our metadata generation process that prohibits a user from filling in more rows in the access.csv but it'd be nice to automate this. Can you think of any other patterns we could leverage to automatically fill in rows in access.csv (and attributes.csv too for that matter) when the user wants to document more than files checked in under ./data?
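For the baseline case of files checked in under ./data, filling the blank rows might look like the sketch below. The column names `fileName` and `contentUrl` are assumptions about the layout of access.csv, `fill_content_url()` is a hypothetical helper, and `base_url` is assumed to point at the directory that holds the files. Rows the user has already filled in are left alone.

```r
# Hypothetical helper: fill only the blank contentUrl entries of an
# access table, leaving user-supplied URLs untouched.
fill_content_url <- function(access, base_url) {
  # A blank cell may come through as "" or as NA depending on how it was read
  blank <- is.na(access$contentUrl) | access$contentUrl == ""
  access$contentUrl[blank] <- paste(base_url, access$fileName[blank], sep = "/")
  access
}

access <- data.frame(
  fileName = c("a.csv", "b.csv"),
  contentUrl = c("", "https://example.org/b.csv"),
  stringsAsFactors = FALSE
)
fill_content_url(access, "https://raw.githubusercontent.com/user/repo/master/data")
```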

amoeba added this to the v1.1 milestone Jul 21, 2020