[Storage] Error Handle Attempts to Mount Non-sky Managed Bucket without Source #2804
Thanks for submitting this PR @landscapepainter! Can we list all the syntax allowed for a storage object in the docstr of the module? Disallowing some combinations of the fields has become quite complicated.
I think one decision we made before was that we allow the following syntax:
The following should be allowed for both sky managed and external buckets.
/destination:
  source: s3://xxx

/destination:
  name: store-name
  stores: s3
The only one we should raise error for is the following:
/destination:
  name: store-name
  # without stores
  mode: COPY
cc'ing @romilbhardwaj @concretevitamin for confirmation.
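The three cases listed above can be written down as a small check. This is only a sketch of the rule as proposed in this comment, not the code in sky/data/storage.py: the function `check_file_mount_entry` and the local `StorageSpecError` class are hypothetical, and the `stores` key is spelled as it appears in the comment.

```python
from typing import Any, Dict, Optional


class StorageSpecError(ValueError):
    """Local stand-in for an invalid storage spec (hypothetical definition)."""


def check_file_mount_entry(entry: Dict[str, Any]) -> None:
    """Raise StorageSpecError only for the one combination the comment rejects.

    Assumption: `entry` is the mapping under a single file_mounts destination.
    """
    has_source = entry.get("source") is not None
    has_stores = entry.get("stores") is not None
    mode: Optional[str] = entry.get("mode")

    if has_source:
        # A source URI names an existing bucket explicitly: allowed.
        return
    if entry.get("name") is not None and not has_stores and mode == "COPY":
        # Name only, no stores, COPY mode: the rejected case.
        raise StorageSpecError(
            "name without source or stores is not allowed in COPY mode")
    # Everything else (including name + stores) passes under this proposal.
```

Combinations that the comment does not mention pass through unchanged in this sketch; whether they should is exactly what the rest of the thread discusses.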
Thanks @landscapepainter - left some minor comments. Please also make sure all spot smoke tests pass since the spot controller relies on storage quite a bit for file_mounts translation.
Regarding allowing this (where store-name is an externally created bucket):
/destination:
  name: store-name
  stores: s3
@Michaelvll @concretevitamin - I thought about this and concluded that this should not be allowed. Allowing this would make intent ambiguous - it's unclear if the user wants to:
a) create store-name from scratch, or
b) mount s3://store-name, where store-name is an existing bucket.
This ambiguity is dangerous when the user:
a) Wants to create a bucket but it already exists -> SkyPilot will silently mount an existing bucket, which may potentially be a public bucket. Instead it should raise an error saying it already exists (which it does today)
b) Wants to use an existing bucket but has a typo (e.g., name: storename instead of name: store-name) -> SkyPilot will silently create a new empty bucket for the user and the user's task will error if it tries to read some files it expects to be there.
As you can see, the root cause is our attempt at a "get or create" interface through file_mounts, which introduces this ambiguity unless explicitly handled by our code, such as in this PR.
Our early Sky Storage proposals took inspiration from Kubernetes and used two separate fields in the YAML - one to declare any new Storages, and another to attach storage to a task/cluster. This got rid of the ambiguity, but was harder to use. Thus this is the tradeoff we chose, and we minimize ambiguity here by clarifying intent when required.
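The "get or create" ambiguity described above can be illustrated with a toy model. This is a minimal sketch, assuming an in-memory `BucketRegistry` class invented for illustration; it does not reflect how SkyPilot or any cloud SDK is implemented.

```python
from typing import Dict, List


class BucketRegistry:
    """In-memory stand-in for a cloud bucket namespace (hypothetical)."""

    def __init__(self) -> None:
        self._buckets: Dict[str, List[str]] = {}

    def create(self, name: str) -> List[str]:
        """Explicit create: fails loudly if the bucket already exists."""
        if name in self._buckets:
            raise FileExistsError(f"bucket {name!r} already exists")
        self._buckets[name] = []
        return self._buckets[name]

    def attach(self, name: str) -> List[str]:
        """Explicit attach: fails loudly if the bucket is missing."""
        if name not in self._buckets:
            raise FileNotFoundError(f"bucket {name!r} does not exist")
        return self._buckets[name]

    def get_or_create(self, name: str) -> List[str]:
        """Ambiguous: never fails, so a typo or a name clash goes unnoticed."""
        return self._buckets.setdefault(name, [])


registry = BucketRegistry()
registry.create("store-name").append("data.csv")

# Typo in the name: get_or_create quietly returns a new, empty bucket.
typo_files = registry.get_or_create("storename")

# With separate operations, a mistyped name is reported immediately.
try:
    registry.attach("stor-name")
    attach_failed = False
except FileNotFoundError:
    attach_failed = True
```

Separate create and attach calls mirror the two-field design mentioned above: each call states the intent, so both failure cases (a) and (b) surface as errors instead of passing silently.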
sky/data/storage.py (outdated)
@@ -321,6 +321,28 @@ def __deepcopy__(self, memo):
        # original Store object is returned
        return self

    def _validate_existing_bucket(self):
        """Validates the storage fields for existing buckets."""
        # Check if 'source' is None, indicating Storage is in MOUNT mode.
Does 'source' being None necessarily imply that the storage is in MOUNT mode? For example, couldn't it be a sky-managed storage object specified as:
file_mounts:
  name: myexistingstorage
  mode: COPY
@romilbhardwaj This is one of the cases we don't allow. Users cannot provide only the name of the bucket, without a source, in COPY mode.
This is ready for another look!
Regarding listing the allowed syntax for storage, I'm wondering if it would be better to put it in our readthedocs instead of the docstr of the Storage class, so that users can easily use it as a reference as well? @Michaelvll @romilbhardwaj @concretevitamin
In an offline discussion with @romilbhardwaj, we agreed that the following tasks should be done:
Writing this for future reference on why the handling for this edge case cannot be done in We allow to create new To make sure we don't block users from creating new
@romilbhardwaj This is ready for another look!
Any update for this?
@Michaelvll This PR is waiting for the Storage page from readthedoc to be reformatted. @romilbhardwaj is working on it.
Blocked on #3123
…o external-bucket-mount
Thanks @landscapepainter! This works nicely - tested with manual tests. Will approve and merge once #3162 is merged.
Edit - smoke tests pass.
Ran smoke tests again, this is good to go. Thanks @landscapepainter!
This closes #2779, #2950
Storage is designed not to allow externally created cloud storage buckets to be mounted without specifying the source field. The only way to mount a non-sky managed bucket is to specify the bucket's URI in the source field. This fix raises StorageSpecError when a user attempts to mount a non-sky managed bucket with only the name field and without the source field.

Tested (run the relevant ones):
bash format.sh
pytest tests/test_smoke.py::TestStorageWithCredentials
MOUNT with name specified and source unspecified. Should raise an error.
MOUNT with source specified and name unspecified. Should go through.
pytest tests/test_smoke.py::test_gcp_storage_mounts_with_stop --gcp
pytest tests/test_smoke.py::test_aws_storage_mounts_with_stop --aws
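
The rule stated in the PR description can be sketched as a standalone function. This is an illustration under stated assumptions, not the merged implementation: the function signature, the `bucket_exists` and `sky_managed` flags, and the local `StorageSpecError` class are all hypothetical, and how the real code determines those two facts is not shown in this thread.

```python
from typing import Optional


class StorageSpecError(ValueError):
    """Local stand-in for the error named in the PR (hypothetical definition)."""


def validate_existing_bucket(name: Optional[str], source: Optional[str],
                             bucket_exists: bool, sky_managed: bool) -> None:
    """Reject name-only specs that point at a bucket SkyPilot did not create.

    Assumption: `bucket_exists` and `sky_managed` are computed by the caller.
    """
    if source is not None:
        # The bucket URI is given explicitly, so intent is unambiguous.
        return
    if name is not None and bucket_exists and not sky_managed:
        raise StorageSpecError(
            f"Bucket {name!r} exists but is not sky managed; "
            "specify its URI in the source field to mount it.")
```

In this sketch a name-only spec still goes through when the bucket does not exist yet or was created by SkyPilot, which keeps the "create new storage by name" path open.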