Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Storage] Error Handle Attempts to Mount Non-sky Managed Bucket without Source #2804

Merged

Conversation

landscapepainter
Copy link
Collaborator

@landscapepainter landscapepainter commented Nov 19, 2023

This closes #2779, #2950

Storage is designed in a way to not allow externally created cloud storages to be mounted without specifying the source field. The only way to mount non-sky managed bucket is to specify the bucket's URI as the source field. This fix raises StorageSpecError when a non-sky managed bucket is attempted to be mounted just with name field and without specifying the source field.

Tested (run the relevant ones):

  • Code formatting: bash format.sh
  • All smoke tests: pytest tests/test_smoke.py::TestStorageWithCredentials
  • Manually test: Create storage externally and MOUNT with name specified and source unspecified. Should raise error
  • Manually test: Create storage externally and MOUNT with source specified and name unspecified. Should go through
  • pytest tests/test_smoke.py::test_gcp_storage_mounts_with_stop --gcp
  • pytest tests/test_smoke.py::test_aws_storage_mounts_with_stop --aws

@landscapepainter landscapepainter changed the title [Storage] Handle Attempts to Mount Non-sky Managed Bucket without Source [Storage] Error Handle Attempts to Mount Non-sky Managed Bucket without Source Nov 19, 2023
Copy link
Collaborator

@Michaelvll Michaelvll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for submitting this PR @landscapepainter! Can we list out all the syntax allowed for a storage object in the docstr of the module? It has been quite complicated for disallowing some of the combination for the fields.

I think one decision we made before was that we allow the following syntax:
The following should be allowed for both sky managed and external buckets.

/destination:
  source: s3://xxx
/destination:
  name: store-name
  stores: s3

The only one we should raise error for is the following:

/destination:
  name: store-name
  # without stores
  mode: COPY

cc'ing @romilbhardwaj @concretevitamin for confirmation.

sky/data/storage.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@romilbhardwaj romilbhardwaj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @landscapepainter - left some minor comments. Please also make sure all spot smoke tests pass since the spot controller relies on storage quite a bit for file_mounts translation.

Regarding allowing this (where store-name is an externally created bucket):

/destination:
  name: store-name
  stores: s3

@Michaelvll @concretevitamin - I thought about this and concluded that this should not be allowed. Allowing this would make intent ambiguous - it's unclear if the user wants to:

a) create store-name from scratch or
b) mount s3://store-name, where store-name is an existing bucket.

This ambiguity is dangerous when the user:

a) Wants to create a bucket but it already exists -> SkyPilot will silently mount an existing bucket, which may potentially be a public bucket. Instead it should raise an error saying it already exists (which it does today)
b) Wants to use an existing bucket but has a typo (e.g., name: storename instead of name: store-name) -> SkyPilot will silently create a new empty bucket for the user and the user's task will error if it tries to read some files it expects to be there.

As you can see, the root cause is stemming from our attempt at having a "get or create" interface through file_mounts, which introduces this ambiguity unless explicitly handled by our code, such as in this PR.

Our early Sky Storage proposals took inspiration from Kubernetes and used two separate fields in the YAML - one to declare any new Storages, and other to attach storage to task/cluster. This gets rid of ambiguity, but was harder to use. Thus this is the tradeoff we chose, and we minimize ambiguity here by clarifying intent when required.

sky/data/storage.py Outdated Show resolved Hide resolved
@@ -321,6 +321,28 @@ def __deepcopy__(self, memo):
# original Store object is returned
return self

def _validate_existing_bucket(self):
"""Validates the storage fields for existing buckets."""
# Check if 'source' is None, indicating Storage is in MOUNT mode.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it necessary if 'source' is None implies storage is in MOUNT mode? For example, couldn't it be a sky-managed storage object specified as:

file_mounts: 
  name: myexistingstorage
  mode: COPY

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@romilbhardwaj This is one of the cases we don't allow the users. We don't allow users to only provide the name of the bucket without source for COPY mode.

Copy link
Collaborator Author

@landscapepainter landscapepainter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is ready for another look!

Regarding to listing allowed syntax for storage, I'm wondering if it'd be nice to have it in our readthedoc instead of the docstr of Storage class so the users can easily access to use them as a reference as well? @Michaelvll @romilbhardwaj @concretevitamin

@landscapepainter
Copy link
Collaborator Author

landscapepainter commented Dec 28, 2023

From offline discussion with @romilbhardwaj, it was discussed the following tasks should be done:

  1. Check if the following works for sky managed already existing storage:
file_mounts: 
  name: myexistingstorage
  mode: COPY
  1. Fix the following comment since the 'source' being None does not indicate MOUNT mode. It can also be the case with COPY mode for already existing sky managed storage.
    # Check if 'source' is None, indicating Storage is in MOUNT mode.
  2. Write smoke test to make sure this case works if it does not exist already.
    Update 1/7/24: Confirmed that there is already a test to check this: tmp_copy_mnt_existing_storage_obj and test_copy_mount_existing_storage. These test and fixture confirms that using COPY mode with only name field for sky-managed Storage works.

@landscapepainter
Copy link
Collaborator Author

Writing this for future reference on why the handling for this edge case cannot be done in _validate_storage_spec:

We allow to create new MOUNT mode storages without source specified and with name only, which is when user wants to use the bucket as a scratch space. On the other hand, we don't allow to MOUNT non-sky managed Storages with name only and require source to do so. In either case, global_user_state does not contain the hanlde of the Storage. And this is the key reason why we cannot validate this edge case from _validate_storage_spec.

To make sure we don't block users from creating new MOUNT mode storage with only specifying name and without source, we need to make sure not to merely block this early from _validate_storage_spec and check if the storage exists first from _get_bucket(thorugh methods such as self.client.head_bucket()) and then check if it's sky managed through existance of the handle in global_user_state. If it's not sky managed, this is gauranteed to be a storage created externally and not sky managed, which is the case we should block the user's MOUNTing without source and with just name.

@landscapepainter
Copy link
Collaborator Author

@romilbhardwaj This is ready for another look!

@Michaelvll
Copy link
Collaborator

Any update for this?

@Michaelvll Michaelvll added this to the v0.5 milestone Feb 6, 2024
@landscapepainter
Copy link
Collaborator Author

@Michaelvll This PR is waiting for the Storage page from readthedoc to be reformatted. @romilbhardwaj is working on it.

@romilbhardwaj
Copy link
Collaborator

Blocked on #3123

Copy link
Collaborator

@romilbhardwaj romilbhardwaj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @landscapepainter! This works nicely - tested with manual tests. Will approve and merge once #3162 is merged.

Edit - smoke tests pass.

@romilbhardwaj romilbhardwaj mentioned this pull request Feb 20, 2024
2 tasks
Copy link
Collaborator

@romilbhardwaj romilbhardwaj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ran smoke tests again, this is good to go. Thanks @landscapepainter!

@romilbhardwaj romilbhardwaj merged commit f454105 into skypilot-org:master Feb 21, 2024
19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Storage] sky start failing to re-mount on externally created Storage
3 participants