Storage CLI #121

Merged
merged 12 commits into from
Dec 23, 2021

Conversation

michaelzhiluo
Collaborator

Adds Storage CLI.

  • Lets users override the target store for a storage object (s3, gcs, azure_blob)
  • Supports multiple storages and storage mounts

Example: (also in examples/resnet_app_storage.yaml)

storage:
  - name: imagenet-bucket
    source: s3://imagenet-bucket # Can also be local path
    # Uncommenting this will force Imagenet transfer to GCS bucket
    #force_stores: [gcs] # Could be [s3, gcs], [s3] default: None
    persistent: True

storage_mounts:
  - storage: imagenet-bucket
    mount_path: /tmp/imagenet 
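
A minimal sketch of how a config like the one above might be resolved once the YAML has been parsed into a plain dict. The names `StorageSpec` and `resolve_mounts` are hypothetical and not taken from this PR, and the default for `persistent` is an assumption; only the field names mirror the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class StorageSpec:
    """Hypothetical in-memory form of one `storage:` entry."""
    name: str
    source: str
    # e.g. ['gcs']; None means no override was requested.
    force_stores: Optional[List[str]] = None
    # Assumed default; the PR text does not state one.
    persistent: bool = True


def resolve_mounts(config: dict) -> Dict[str, StorageSpec]:
    """Map each mount_path to the storage entry it names."""
    storages = {
        entry['name']: StorageSpec(
            name=entry['name'],
            source=entry['source'],
            force_stores=entry.get('force_stores'),
            persistent=entry.get('persistent', True),
        )
        for entry in config.get('storage', [])
    }
    mounts: Dict[str, StorageSpec] = {}
    for mount in config.get('storage_mounts', []):
        name = mount['storage']
        if name not in storages:
            raise ValueError(
                f'storage_mounts refers to unknown storage {name!r}')
        mounts[mount['mount_path']] = storages[name]
    return mounts


# The same structure as the YAML example, written as a dict.
config = {
    'storage': [{
        'name': 'imagenet-bucket',
        'source': 's3://imagenet-bucket',
        'persistent': True,
    }],
    'storage_mounts': [{
        'storage': 'imagenet-bucket',
        'mount_path': '/tmp/imagenet',
    }],
}

mounts = resolve_mounts(config)
print(mounts['/tmp/imagenet'].source)
```

Looking mounts up by storage name is what lets several mounts share one storage entry, and a typo in `storage:` fails early instead of at mount time.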

@concretevitamin
Collaborator

@infwinston @Michaelvll would you guys look over the API and see if it's natural for your use case?

@concretevitamin requested review from infwinston and Michaelvll and removed request for concretevitamin on December 23, 2021 00:55
Collaborator

@romilbhardwaj left a comment

Looks great! I addressed some nits and added some documentation to the yaml in new commits; please feel free to review those before merging.

Comment on lines +200 to +202
if cloud_type in self.stores:
    logger.info(f'Storage type {cloud_type} already exists!')
    return self.stores[cloud_type]
Collaborator

Some thoughts, no action needed:
This seems fine (one storage object per cloud), but I wonder if there are any use cases where a single storage object can have multiple stores of the same cloud_type. Perhaps stores in different regions but on the same cloud? For now, this is okay.

Collaborator Author

Yes, in the future the StorageEnum class will change to include both the cloud type and the region, as CLOUD_TYPE.REGION_NAME. Good point!
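
The one-store-per-cloud behavior discussed in this thread can be sketched as follows. This is an illustration only: the class body, the method name `get_or_create_store`, and the dict standing in for a store object are all hypothetical, not the PR's implementation.

```python
from typing import Dict


class Storage:
    """Hypothetical stand-in for the storage object discussed above."""

    def __init__(self, name: str):
        self.name = name
        # One store per cloud type, keyed by a string such as 's3' or 'gcs'.
        self.stores: Dict[str, dict] = {}

    def get_or_create_store(self, cloud_type: str) -> dict:
        """Return the existing store for cloud_type, creating it only once."""
        if cloud_type in self.stores:
            return self.stores[cloud_type]
        # A plain dict stands in for a real store object in this sketch.
        store = {'cloud_type': cloud_type, 'bucket': self.name}
        self.stores[cloud_type] = store
        return store


storage = Storage('imagenet-bucket')
first = storage.get_or_create_store('gcs')
second = storage.get_or_create_store('gcs')
print(first is second)  # → True
```

Under this layout, supporting several regions on the same cloud would presumably mean widening the key, for example to a (cloud type, region) pair, which is the direction the reply above describes.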

@@ -426,7 +424,7 @@ def _upload_file(self, local_file: str, remote_path: str) -> None:
             remote_path: str; Remote path on GCS bucket
         """
         blob = self.bucket.blob(remote_path)
-        blob.upload_from_filename(local_file)
+        blob.upload_from_filename(local_file, timeout=None)
Collaborator

I assume these are GCS-specific details (times out on uploading large files?)

Collaborator Author

Yes, it broke when uploading huge files (timeout=60s). This seemed to fix it.
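
A small sketch of the change discussed here, written as a standalone helper so the timeout is explicit. The function name `upload_file` and its `timeout` parameter are hypothetical; `bucket` is assumed to be a google-cloud-storage `Bucket` (or any object with the same `blob(...)` / `upload_from_filename(...)` methods).

```python
from typing import Optional


def upload_file(bucket, local_file: str, remote_path: str,
                timeout: Optional[float] = None) -> None:
    """Upload local_file to remote_path in the given bucket.

    timeout=None asks the client to wait indefinitely, which avoids the
    failure on very large files described above. Pass a number of
    seconds to restore a bounded wait.
    """
    blob = bucket.blob(remote_path)
    blob.upload_from_filename(local_file, timeout=timeout)


# Usage with the real client (assumes credentials are configured):
#   from google.cloud import storage
#   bucket = storage.Client().bucket('imagenet-bucket')
#   upload_file(bucket, '/data/train.tar', 'train.tar')
```

Disabling the timeout trades a spurious failure for a potentially unbounded wait; a large finite value may be a safer choice when uploads run unattended.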

@michaelzhiluo
Collaborator Author

@romilbhardwaj This LGTM. Thanks for the detailed comments in resnet_app_storage.py and the cleanup for task.py. I have no other problems with the code.

resnet-model-dir: 0.1,
}

# storage: List[sky.Storage]
Collaborator Author

@infwinston @Michaelvll This should make storage clear for the GNN use case!!!

Collaborator

@romilbhardwaj left a comment

Looks good!

@michaelzhiluo merged commit 7d37614 into master on Dec 23, 2021
gmittal pushed a commit that referenced this pull request Mar 15, 2022
* Storage CLI Init Commit

* Bug found #1

* Deep Copy Fix + other bugs

* nits, docs

* lint

* Add docs for storage.name

* doc fixes

* doc fixes

* doc fixes

* fix docs for aws sync --delete flag

* Fix

Co-authored-by: Romil <romil.bhardwaj@gmail.com>