s3-credentials get-objects command #78

Closed
simonw opened this issue Sep 14, 2022 · 7 comments
Labels
enhancement New feature or request

Comments

Owner

simonw commented Sep 14, 2022

I find myself needing to download all of the objects in an S3 bucket that match a specific path pattern.

Related:

@simonw simonw added the enhancement New feature or request label Sep 14, 2022
Owner Author

simonw commented Sep 14, 2022

Initial design (help-first development):

Usage: s3-credentials get-objects [OPTIONS] BUCKET [KEYS]...

  Download multiple objects from an S3 bucket

  To download everything, run:

      s3-credentials get-objects my-bucket

  Files will be saved to a directory called my-bucket. Use -o dirname to save
  to a different directory.

  To download specific keys, list them:

      s3-credentials get-objects my-bucket one.txt path/two.txt

  To download files matching a glob-style pattern, use:

      s3-credentials get-objects my-bucket --pattern '*/*.js'

Options:
  -o, --output DIRECTORY  Write to this directory instead of one matching the
                          bucket name
  -p, --pattern TEXT      Glob patterns for files to download, e.g. '*/*.js'
  --access-key TEXT       AWS access key ID
  --secret-key TEXT       AWS secret access key
  --session-token TEXT    AWS session token
  --endpoint-url TEXT     Custom endpoint URL
  -a, --auth FILENAME     Path to JSON/INI file containing credentials
  --help                  Show this message and exit.
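
A minimal sketch of how the `--pattern` filtering described in the help text could work. The thread does not say how patterns are matched, so this assumes Python's standard `fnmatch` module; `filter_keys` is a hypothetical helper name, not part of s3-credentials.

```python
from fnmatch import fnmatchcase


def filter_keys(keys, patterns):
    """Return the keys matching at least one glob pattern, in original order.

    Hypothetical helper: an empty pattern list means "download everything".
    """
    if not patterns:
        return list(keys)
    return [
        key for key in keys
        if any(fnmatchcase(key, pattern) for pattern in patterns)
    ]


keys = ["index.html", "static/app.js", "static/vendor/lib.js", "notes/readme.txt"]
print(filter_keys(keys, ["*/*.js"]))
# → ['static/app.js', 'static/vendor/lib.js']
```

Note that with `fnmatch` a `*` also matches `/`, so `'*/*.js'` matches keys nested more than one level deep. Whether that is the desired behaviour for this command is an open design choice.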

Owner Author

simonw commented Sep 15, 2022

I'm going to introduce moto to help test this - I used it in https://github.com/simonw/s3-ocr/blob/0.6.3/tests/conftest.py and it worked really well.

It's going to be a bit confusing having some tests that use moto and others that use botocore.stub, but I think it will be worth it for the productivity boost while implementing this.

Owner Author

simonw commented Sep 15, 2022

Got this working. Could do with a progress bar of some sort.

simonw added a commit that referenced this issue Sep 15, 2022
Owner Author

simonw commented Sep 15, 2022

The trick with progress bars is that I need the total download size up front. I know the size of each object when I fetched a list of keys first, but not when the user specified the keys on the command-line.

I could run some HEAD requests first for those I guess?

Need to support -s / --silent for hiding the progress bar, for consistency with https://s3-credentials.readthedocs.io/en/stable/other-commands.html#put-object
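
The HEAD-request idea above could be sketched roughly as follows. This is an illustration under assumptions, not the committed implementation: `total_download_size` is a hypothetical helper, and the HEAD call is passed in as a function so the logic can be exercised without AWS. With boto3 it would be something like `lambda key: s3.head_object(Bucket=bucket, Key=key)`, whose response includes `ContentLength`.

```python
def total_download_size(keys, known_sizes, head_object):
    """Sum object sizes for a progress bar total.

    known_sizes: dict of key -> size for keys that came from a bucket listing.
    head_object: callable taking a key and returning a dict containing
                 "ContentLength"; only called for keys missing from known_sizes.
    """
    total = 0
    for key in keys:
        size = known_sizes.get(key)
        if size is None:
            # Key was named on the command line, so its size is unknown
            size = head_object(key)["ContentLength"]
        total += size
    return total


# Stand-in for S3 so the example runs offline
fake_bucket = {"one.txt": 120, "path/two.txt": 80}
head_calls = []


def fake_head(key):
    head_calls.append(key)
    return {"ContentLength": fake_bucket[key]}


print(total_download_size(["one.txt", "path/two.txt"], {"one.txt": 120}, fake_head))
# → 200
```

Only keys without a known size trigger a request, so the listing case costs nothing extra.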

simonw added a commit that referenced this issue Sep 15, 2022
Owner Author

simonw commented Sep 15, 2022

Demo:

% s3-credentials get-objects static.niche-museums.com -o out -p '*gas*'
Downloading 4.3 MB (1 file)  [####################################]  100%
% s3-credentials get-objects static.niche-museums.com -o out -p '*big*'
Downloading 6.6 MB (4 files)  [####################################]  100%

Owner Author

simonw commented Sep 15, 2022

Idea:

  • --skip to skip downloading a file if it already exists with the same filename
  • --skip-hash to skip downloading a file if it already exists AND the MD5 hash has not changed (more expensive, as it needs to calculate the local hash)
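
A rough sketch of how the two proposed options might decide whether to skip a file. It assumes the remote MD5 comes from the object's ETag, which S3 returns wrapped in double quotes and which is a plain MD5 only for objects that were not uploaded as multipart (and not encrypted with SSE-KMS). `local_md5` and `should_skip` are hypothetical names.

```python
import hashlib
from pathlib import Path


def local_md5(path, chunk_size=1024 * 1024):
    """Hex MD5 of a local file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fp:
        for chunk in iter(lambda: fp.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def should_skip(path, etag=None, check_hash=False):
    """Decide whether a download can be skipped.

    check_hash=False models --skip: skip if the file already exists.
    check_hash=True models --skip-hash: skip only if the file exists AND
    its MD5 matches the ETag (assumed here to be a plain MD5).
    """
    path = Path(path)
    if not path.is_file():
        return False
    if not check_hash:
        return True
    if etag is None:
        # No remote hash to compare against, so download to be safe
        return False
    return local_md5(path) == etag.strip('"')
```

For multipart uploads the ETag is not an MD5 of the content, so this comparison would never match and the file would always be downloaded again.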
