
Pure AWS S3 backend #5

Closed
4 tasks done
wlandau opened this issue Nov 19, 2021 · 6 comments
wlandau commented Nov 19, 2021

Prework

  • Read and agree to the Contributor Code of Conduct and contributing guidelines.
  • If there is already a relevant issue, whether open or closed, comment on the existing thread instead of posting a new issue.
  • New features take time and effort to create, and even more effort to maintain. So if the purpose of the feature is to resolve a struggle you are encountering personally, please first post a "trouble" or "other" issue so we can discuss your use case and look for existing solutions.
  • Format your code according to the tidyverse style guide.

Proposal

Similar to #2, but implemented directly on top of AWS S3 through a package like aws.s3, paws, or botor. Use the historical versioning and tagging capabilities of S3 buckets.

@wlandau wlandau self-assigned this Nov 19, 2021
wlandau commented Nov 19, 2021

This probably precedes #2. It may be worth looking at DVC first, since it may already do much of what I describe below.

wlandau commented Nov 19, 2021

Setback: an S3 object can carry at most 10 tags. That poses a problem whenever a target belongs to more than 10 snapshots, which is likely in almost every project.

wlandau commented Nov 19, 2021

Another idea: the metadata already has hashes, which is half the battle for a key-value store.

Snapshot

  1. Commit _targets/meta/meta to a local git repo. Do not commit _targets/objects.
  2. For each target in _targets/meta/meta, upload the corresponding file in _targets/objects/ to an S3 bucket. In the bucket, name the object after the hash recorded in _targets/meta/meta. If an object with that hash already exists in the bucket, skip the upload.
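The upload step above amounts to a content-addressed store: the hash from _targets/meta/meta becomes the object key, so an identical object never needs re-uploading. A minimal sketch in Python, with a plain dict standing in for the S3 bucket (the real backend would use an R client such as paws, and all names here are hypothetical):

```python
import os

def snapshot(objects_dir, meta_hashes, bucket):
    """For each target, upload its object file to the bucket under the
    hash recorded in _targets/meta/meta. Content addressing makes the
    upload idempotent: if the key is already present, skip it."""
    uploaded = []
    for target, recorded_hash in meta_hashes.items():
        if recorded_hash in bucket:
            continue  # an identical object is already stored
        with open(os.path.join(objects_dir, target), "rb") as f:
            bucket[recorded_hash] = f.read()
        uploaded.append(target)
    return uploaded
```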

Checkout

  1. Check out the metadata file.
  2. For each target in the metadata, if the hash in _targets/meta/meta disagrees with the actual hash of the file, attempt to find the correct hash in the bucket and download the object to _targets/objects/.

Hopefully step (2) will be possible without cloning a lot of infrastructure from targets.
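The checkout step can be sketched the same way: restore only the targets whose local file hash disagrees with the metadata. Again a dict stands in for the bucket, and hash_file is a hypothetical stand-in for whatever hash function the metadata records:

```python
import os

def checkout(objects_dir, meta_hashes, bucket, hash_file):
    """Restore each target whose local object file is absent or whose
    hash disagrees with _targets/meta/meta, by downloading the object
    stored under the recorded hash."""
    restored = []
    for target, recorded_hash in meta_hashes.items():
        path = os.path.join(objects_dir, target)
        if os.path.exists(path) and hash_file(path) == recorded_hash:
            continue  # local file already matches the metadata
        if recorded_hash in bucket:
            with open(path, "wb") as f:
                f.write(bucket[recorded_hash])
            restored.append(target)
    return restored
```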

Status

Report the Git status of _targets/meta/meta, plus a comparison of the hashes recorded in _targets/meta/meta against the files in _targets/objects/ and the objects in the bucket.
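The hash comparison could classify each target roughly like this (a sketch under the same assumptions as above; the labels and inputs are hypothetical):

```python
def status(meta_hashes, local_hashes, bucket):
    """Compare the hash recorded in _targets/meta/meta against the
    local object's hash and the bucket's keys for each target."""
    report = {}
    for target, recorded in meta_hashes.items():
        if local_hashes.get(target) == recorded:
            report[target] = "up to date"
        elif recorded in bucket:
            report[target] = "restorable from bucket"
        else:
            report[target] = "missing"
    return report
```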

wlandau commented Nov 27, 2021

Closing in favor of ropensci/targets#711

wlandau commented Nov 30, 2021

Reopening. Relative to native AWS versioning in targets, an AWS gittargets backend would allow less frequent uploads and let users opt in later in the project’s life cycle.

@wlandau wlandau reopened this Nov 30, 2021
wlandau commented Sep 8, 2023

On reflection: if you're already using AWS S3, then https://books.ropensci.org/targets/cloud-storage.html is way better.

@wlandau wlandau closed this as completed Sep 8, 2023