Fast removal of S3 Storage buckets with 10k-1 million objects #893

Merged on Jun 6, 2022 (552 commits)

@michaelzhiluo (Collaborator) commented Jun 2, 2022

Fixes #434.

Test

touch ~/Downloads/hallo/{00001..10000}.c
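The `touch` line relies on shell brace expansion with zero padding, which not every shell supports. A portable way to produce the same 10,000 empty files might look like the sketch below. The helper name `make_test_files` is hypothetical, and unlike `touch` it also creates the directory if it is missing.

```python
from pathlib import Path


def make_test_files(directory: Path, count: int = 10_000, suffix: str = ".c") -> int:
    """Create `count` empty files named 00001.c, 00002.c, ... in `directory`.

    Hypothetical helper, not part of the PR. The directory is created when it
    does not exist, which plain `touch` would not do.
    """
    directory.mkdir(parents=True, exist_ok=True)
    for index in range(1, count + 1):
        # Zero-pad to five digits to match the {00001..10000} pattern.
        (directory / f"{index:05d}{suffix}").touch()
    return count


# Example call (writes into the same directory the PR description uses):
# make_test_files(Path.home() / "Downloads" / "hallo")
```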

Sky YAML

file_mounts:
  /checkpoints:
    name: michaels-new-bucket-1
    source: ~/Downloads/hallo
    store: s3
    mode: MOUNT

Run `sky launch -c hello storage.yaml`, then `sky storage delete michaels-new-bucket-1`.

Before: 3-4 min
After: 10-20 s
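The description does not show how the speedup is achieved, so the following is not the change made in `sky/data/storage.py`. It is only a minimal sketch of one common way to empty a large bucket with few round trips: S3's `DeleteObjects` call accepts up to 1,000 keys per request, so keys can be listed page by page and deleted in batches. The function names are hypothetical, `s3_client` is assumed to be a boto3 S3 client (for example `boto3.client("s3")`), and the bucket is assumed to be unversioned.

```python
from typing import Any, Iterable, Iterator, List

# S3 DeleteObjects accepts at most 1000 keys per request.
MAX_KEYS_PER_REQUEST = 1000


def chunked(keys: Iterable[str], size: int = MAX_KEYS_PER_REQUEST) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` keys."""
    batch: List[str] = []
    for key in keys:
        batch.append(key)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def list_keys(s3_client: Any, bucket: str) -> Iterator[str]:
    """Yield every object key in `bucket` using the list_objects_v2 paginator."""
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        # An empty bucket returns a page without a "Contents" entry.
        for obj in page.get("Contents", []):
            yield obj["Key"]


def empty_and_delete_bucket(s3_client: Any, bucket: str) -> int:
    """Delete all objects in batches, then the bucket; return the object count.

    Hypothetical helper: assumes an unversioned bucket and a boto3 S3 client.
    """
    deleted = 0
    for batch in chunked(list_keys(s3_client, bucket)):
        response = s3_client.delete_objects(
            Bucket=bucket,
            Delete={"Objects": [{"Key": key} for key in batch], "Quiet": True},
        )
        # In quiet mode the response lists only the keys that failed.
        errors = response.get("Errors", [])
        if errors:
            raise RuntimeError(f"Failed to delete {len(errors)} objects from {bucket}")
        deleted += len(batch)
    s3_client.delete_bucket(Bucket=bucket)
    return deleted
```

With 10,000 objects, as in the test above, this sketch would issue ten delete requests instead of one per object.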

Michaelvll and others added 30 commits February 17, 2022 23:39
* Update the installation and tutorial

* fix

* Add description in README

* fix

* Add link to examples

* Polish readme

* Update readme

* readme

* Fix comment
* gather adaptors

* give instructions
* Limit grpcio version

* rename disk_size

* format
* restore README to the original place
* Support 'name:cnt' accelerators spec in YAML

* Fixes #373: 'sky start/down' should error out
Reasons:
- `K80:4` is not available on AWS, arguably the most common cloud our target users use (so they will hit resource unavailable)
- `V100:1` is available on all three clouds and is a popular GPU
* Ok

* Addressed all comments

* Changed to new git link

* ok
* Fix check_local_gpus

* Break a line to meet 80 char constraint

* Address the review comments
* Initial Draft

* Delete bad file

* ?

* ???

* format

* sky storage status; sky storage down

* Fixed

* Done

* Addressed Comments

* Addressed Comments, TODO: Documentation

* Documentation added

* ok

* Addressed Zhanghao's comments

* Fix

* SGTM
* wip: Add setup in provision pipeline

* Fix gcp/azure

* remove useless variables

* minor fix

* Add some TODOs

* Fix comments

* Fix comments

* Fix gcp/azure initialization_commands

* Remove setup from template

* Fix setup directory

* Change rsync back to -Pavz

* Remove unused argument

* fix file_mount/dir_mount
* WIP Debug

* revert file_mounts using storage mounts and update docs

* remove print

* Fix credential cmd

* lint

Co-authored-by: Zongheng Yang <zongheng.y@gmail.com>
… status` (#427)

* A quick fix in killing sky docker containers

* Add a comment
…422)

* Add conda activate support to bashrc

* Add doc and make sure conda activate works

* bring back conda activate command for GCP

* Move comment to quickstart

* format

* Fix comments

* Add test/example of using user_script

* Fix indents

* bash -i only for conda activate

* Fix the SKY_NODE_IPS fail to pass to the shell script

* Update readme

* update env_check

* Fix comments

* Change to head -n1
* Remove -i option

* Fix docs

* Fix run.sh

* add comments

* fix comment

* format
* Fix az key generation

* Add private keys to retval
* ok

* Shorten YAML

* Ok

* Done

* Nit added

* Romil's changes
…/...` not `~/sky_workdir/workdir/...`) (#443)

* Fix

* Resnet Example working

* Fix
* Fix gitignore in file_mounts

* Add test

* Update j2 for filter

* format
* Remove tracebacks

* Fix job fail color

* fix comments

* Hide tracebacks

* Fix #442

* fix `workdir` becomes `~/sky_workdir/workdir` #442

* add logging error for job_id problem

* format

* update error message for retry

* Update docs

* Fix login

* Add more checks

* format

* fix return type

* format

* refactor returncode handling

* Update return handling

* Fix filemount testing
* Switch to ray job logs

* Optimize log tailing

* format

* Fix exec logging

* Add comment

* Move back to run_with_log

* Bring back our tailing function for progress bar

* format

* Fix comments

* Remove check argument from run_with_log

* Add comment

* Add comment

* lint
concretevitamin and others added 10 commits June 1, 2022 11:04
#862)

* Distinguish controller failure and user failure

* Add hints for getting error messages

* Fix

* update message

* rename to cluster failure

* message for cluster failed as well

* Fix failing

* address comments

* Add id for end of logs

* Split resource failure and controller failure

* Fix terminal state

* Address comments

* fix typo
* Add some docs

* update

* fix

* fix

* update

* address comments

* reorg

* reorg and add fig

* Add imgs

* fix

* update
* Distinguish controller failure and user failure

* Add hints for getting error messages

* Fix

* update message

* rename to cluster failure

* message for cluster failed as well

* Fix failing

* Add pending state for spot jobs

* Fix job id

* format

* address comments

* Add id for end of logs

* fix pending

* Add name and resources

* format

* Add failed status check for spot state

* Refactor the backend interface

* address comments

* fix status

* address comment

* Fix comment
@michaelzhiluo michaelzhiluo changed the title Fast removal of Storage buckets with 10k-1 million objects Fast removal of S3 Storage buckets with 10k-1 million objects Jun 2, 2022
@romilbhardwaj (Collaborator) left a comment

Awesome! Thanks for finding this fix and implementing it! Some comments, similar to #888.

Review threads (outdated, resolved):
sky/data/storage.py (3 threads)
sky/exceptions.py (1 thread)
@michaelzhiluo (Collaborator, Author)

Thanks for the review! @romilbhardwaj If you have time, PTAL

@romilbhardwaj (Collaborator) left a comment

LGTM! Thanks for adding this! Two small typos in comments

Review threads (outdated, resolved):
sky/data/storage.py (2 threads)
Development

Successfully merging this pull request may close these issues.

[Storage] sky storage delete for Public Buckets
9 participants