Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Docs] Storage docs rewrite #3162

Merged
merged 15 commits into from
Feb 20, 2024
Merged

[Docs] Storage docs rewrite #3162

merged 15 commits into from
Feb 20, 2024

Conversation

romilbhardwaj
Copy link
Collaborator

@romilbhardwaj romilbhardwaj commented Feb 14, 2024

Closes #3123.

TODO:

  • Mounting mode considerations section - a nice table to help users pick b/w COPY and MOUNT

Tested:

  • Rendered locally

@romilbhardwaj romilbhardwaj marked this pull request as ready for review February 15, 2024 20:41
@Michaelvll
Copy link
Collaborator

Just activate the hidden doc in our doc page for easier review: https://skypilot.readthedocs.io/en/storage-docs-rewrite/reference/storage.html
Will get to this soon : )

Copy link
Collaborator

@concretevitamin concretevitamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very clean! Quick look first.

docs/source/reference/storage.rst Outdated Show resolved Hide resolved
docs/source/reference/storage.rst Show resolved Hide resolved
docs/source/reference/storage.rst Outdated Show resolved Hide resolved
docs/source/reference/storage.rst Show resolved Hide resolved
docs/source/reference/storage.rst Outdated Show resolved Hide resolved
docs/source/reference/storage.rst Outdated Show resolved Hide resolved
docs/source/reference/storage.rst Outdated Show resolved Hide resolved
docs/source/reference/storage.rst Outdated Show resolved Hide resolved
docs/source/reference/storage.rst Show resolved Hide resolved
docs/source/reference/storage.rst Outdated Show resolved Hide resolved

2. **You want to mount an existing S3/GCS/R2 bucket to your remote VM -** specify
just the source field (e.g., s3://my-bucket/, gs://my-bucket/ or r2://my-bucket/).
.. tab-item:: Upload files to new bucket
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean we are not able to upload files to existing bucket? This might be unexpected, as in some of our users use case, they woud like to have all the data in a single bucket.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reworded the titles and added a visually distinctnote - does that help?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, I was wondering what will happen if the bucket already exists and was not created by SkyPilot. For example, there is a team writing files to the same bucket, and the bucket will only be managed by SkyPilot for one of the teammates.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a kind bump for this @romilbhardwaj : )

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh thanks for the ping @Michaelvll, this had slipped through the cracks.

If the bucket was created by a team member, other members will have to use the source: s3://... pattern.

We disallow name: external-bucket pattern from mounting external buckets because e.g., if I want to create a bucket and use the name of a pre-existing public (or private) bucket, these semantics will silently mount the bucket and my task will fail when it tries to write the desired path (or worse yet, overwrite unintended files).

This behavior manifested in our smoke tests too (and is now fixed in #2804). Our test_using_file_mounts_with_env_vars was silently re-mounting skypilot-temp-gcs-test bucket which was shared in our GCS account, instead of creating a new bucket at each test instance.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks that makes sense! I found it a bit non-intuitive, for the following case, but I guess it is fine to leave it to the future.

If I am a member of a org, and created a bucket for with mount mode for storing some jupyter notebooks:

file_mounts:
  /dst:
    name: my-bucket
    store: s3
    mode: MOUNT
run: jupyter notebook /dst

The person then shares the YAML with another member in the org. Does this mean the other member running this yaml will fail and have to change it to:

file_mounts:
  /dst:
    source: s3://my-bucket
    mode: MOUNT
run: jupyter notebook /dst

Copy link
Collaborator

@Michaelvll Michaelvll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for revamping the doc for storage @romilbhardwaj! This page looks super nice. Left several comments about the ML specific use case clarification (similar as @concretevitamin 's comments).

docs/source/reference/storage.rst Show resolved Hide resolved
docs/source/reference/storage.rst Outdated Show resolved Hide resolved
docs/source/reference/storage.rst Show resolved Hide resolved
docs/source/reference/storage.rst Outdated Show resolved Hide resolved
docs/source/reference/storage.rst Show resolved Hide resolved
@romilbhardwaj
Copy link
Collaborator Author

Thanks for the feedback @Michaelvll and @concretevitamin! Ready for another look.

Copy link
Collaborator

@concretevitamin concretevitamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @romilbhardwaj, did a pass.

docs/source/reference/storage.rst Show resolved Hide resolved
@@ -1,303 +1,291 @@
.. _sky-storage:

SkyPilot Storage
=================
Using Cloud Buckets
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Using Cloud Buckets
Cloud Object Storage

This seems most commonly used. Cloud Bucket doesn't show too many search results.

"Using": Repetitive w/ section title "Using Data"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cloud buckets appears to be the more commonly used term:

image

Changed to just Cloud Buckets.

docs/source/reference/storage.rst Outdated Show resolved Hide resolved

# Remember to run `sky storage ls` and `sky storage delete` to delete the
# created storage objects!
.. tab-item:: Create (or re-use) a bucket
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.. tab-item:: Create (or re-use) a bucket
.. tab-item:: Create or reuse a bucket

Can we make this the 2nd (or even 1st) tab? It's more common than "Upload files ..".

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to 2nd tab.

Also I've removed the mention of "reuse", since it conflicts with "Use an existing bucket". The re-use bit is clearly stated in a note box for people who are interested in using this:

image

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Several options

  1. "Use an existing bucket" / "Create and reuse a bucket" ("and" should avoid the conflict?)
  2. "Use an external bucket" / "Create and reuse a bucket"
  3. "Use an existing bucket" / "Create a bucket"

Slight preference for 1 & 2, since I think the "reuse" part may be common to warrant being in the tabs; that said, 3 is ok with me too if @romilbhardwaj and @Michaelvll agree :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with 2 or 3

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer 1 & 3, as Use an extneral bucket sounds like it cannot be used with a SkyPilot managed storage. : )

docs/source/reference/storage.rst Outdated Show resolved Hide resolved
docs/source/reference/yaml-spec.rst Show resolved Hide resolved
docs/source/reference/storage.rst Outdated Show resolved Hide resolved
docs/source/reference/storage.rst Outdated Show resolved Hide resolved
docs/source/reference/storage.rst Outdated Show resolved Hide resolved
docs/source/reference/storage.rst Outdated Show resolved Hide resolved
@concretevitamin
Copy link
Collaborator

Also, should grep for SkyPilot Storage and rephrase. E.g., https://skypilot.readthedocs.io/en/storage-docs-rewrite/examples/syncing-code-artifacts.html

@romilbhardwaj
Copy link
Collaborator Author

Thanks @concretevitamin, ready for another look

docs/source/reference/storage.rst Outdated Show resolved Hide resolved
docs/source/examples/syncing-code-artifacts.rst Outdated Show resolved Hide resolved

# Remember to run `sky storage ls` and `sky storage delete` to delete the
# created storage objects!
.. tab-item:: Create (or re-use) a bucket
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Several options

  1. "Use an existing bucket" / "Create and reuse a bucket" ("and" should avoid the conflict?)
  2. "Use an external bucket" / "Create and reuse a bucket"
  3. "Use an existing bucket" / "Create a bucket"

Slight preference for 1 & 2, since I think the "reuse" part may be common to warrant being in the tabs; that said, 3 is ok with me too if @romilbhardwaj and @Michaelvll agree :)

Copy link
Collaborator

@concretevitamin concretevitamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, some minor nits. Thanks @romilbhardwaj.

@romilbhardwaj
Copy link
Collaborator Author

Thanks for the reviews! Merging this soon - lmk if you have any other feedback @Michaelvll @concretevitamin

@romilbhardwaj romilbhardwaj merged commit 2a96ee4 into master Feb 20, 2024
19 checks passed
@romilbhardwaj romilbhardwaj deleted the storage-docs-rewrite branch February 20, 2024 22:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Docs] Storage docs rewrite
3 participants