-
Notifications
You must be signed in to change notification settings - Fork 415
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Docs] Storage docs rewrite #3162
Conversation
Just activate the hidden doc in our doc page for easier review: https://skypilot.readthedocs.io/en/storage-docs-rewrite/reference/storage.html |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very clean! Quick look first.
docs/source/reference/storage.rst
Outdated
|
||
2. **You want to mount an existing S3/GCS/R2 bucket to your remote VM -** specify | ||
just the source field (e.g., s3://my-bucket/, gs://my-bucket/ or r2://my-bucket/). | ||
.. tab-item:: Upload files to new bucket |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean we are not able to upload files to existing bucket? This might be unexpected, as in some of our users use case, they woud like to have all the data in a single bucket.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've reworded the titles and added a visually distinctnote
- does that help?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, I was wondering what will happen if the bucket already exists and was not created by SkyPilot. For example, there is a team writing files to the same bucket, and the bucket will only be managed by SkyPilot
for one of the teammates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a kind bump for this @romilbhardwaj : )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh thanks for the ping @Michaelvll, this had slipped through the cracks.
If the bucket was created by a team member, other members will have to use the source: s3://...
pattern.
We disallow name: external-bucket
pattern from mounting external buckets because e.g., if I want to create a bucket and use the name of a pre-existing public (or private) bucket, these semantics will silently mount the bucket and my task will fail when it tries to write the desired path (or worse yet, overwrite unintended files).
This behavior manifested in our smoke tests too (and is now fixed in #2804). Our test_using_file_mounts_with_env_vars
was silently re-mounting skypilot-temp-gcs-test
bucket which was shared in our GCS account, instead of creating a new bucket at each test instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks that makes sense! I found it a bit non-intuitive, for the following case, but I guess it is fine to leave it to the future.
If I am a member of a org, and created a bucket for with mount mode for storing some jupyter notebooks:
file_mounts:
/dst:
name: my-bucket
store: s3
mode: MOUNT
run: jupyter notebook /dst
The person then shares the YAML with another member in the org. Does this mean the other member running this yaml will fail and have to change it to:
file_mounts:
/dst:
source: s3://my-bucket
mode: MOUNT
run: jupyter notebook /dst
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for revamping the doc for storage @romilbhardwaj! This page looks super nice. Left several comments about the ML specific use case clarification (similar as @concretevitamin 's comments).
Thanks for the feedback @Michaelvll and @concretevitamin! Ready for another look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @romilbhardwaj, did a pass.
docs/source/reference/storage.rst
Outdated
@@ -1,303 +1,291 @@ | |||
.. _sky-storage: | |||
|
|||
SkyPilot Storage | |||
================= | |||
Using Cloud Buckets |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using Cloud Buckets | |
Cloud Object Storage |
This seems most commonly used. Cloud Bucket doesn't show too many search results.
"Using": Repetitive w/ section title "Using Data"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
docs/source/reference/storage.rst
Outdated
|
||
# Remember to run `sky storage ls` and `sky storage delete` to delete the | ||
# created storage objects! | ||
.. tab-item:: Create (or re-use) a bucket |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.. tab-item:: Create (or re-use) a bucket | |
.. tab-item:: Create or reuse a bucket |
Can we make this the 2nd (or even 1st) tab? It's more common than "Upload files ..".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Several options
- "Use an existing bucket" / "Create and reuse a bucket" ("and" should avoid the conflict?)
- "Use an external bucket" / "Create and reuse a bucket"
- "Use an existing bucket" / "Create a bucket"
Slight preference for 1 & 2, since I think the "reuse" part may be common to warrant being in the tabs; that said, 3 is ok with me too if @romilbhardwaj and @Michaelvll agree :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with 2 or 3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer 1 & 3, as Use an extneral bucket
sounds like it cannot be used with a SkyPilot managed storage. : )
Also, should grep for |
Thanks @concretevitamin, ready for another look |
docs/source/reference/storage.rst
Outdated
|
||
# Remember to run `sky storage ls` and `sky storage delete` to delete the | ||
# created storage objects! | ||
.. tab-item:: Create (or re-use) a bucket |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Several options
- "Use an existing bucket" / "Create and reuse a bucket" ("and" should avoid the conflict?)
- "Use an external bucket" / "Create and reuse a bucket"
- "Use an existing bucket" / "Create a bucket"
Slight preference for 1 & 2, since I think the "reuse" part may be common to warrant being in the tabs; that said, 3 is ok with me too if @romilbhardwaj and @Michaelvll agree :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, some minor nits. Thanks @romilbhardwaj.
Thanks for the reviews! Merging this soon - lmk if you have any other feedback @Michaelvll @concretevitamin |
Closes #3123.
TODO:
COPY
andMOUNT
Tested: