Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow longer name for AWS/Azure cluster and refactoring #1648

Merged
merged 15 commits into from
Jan 31, 2023

Conversation

Michaelvll
Copy link
Collaborator

@Michaelvll Michaelvll commented Jan 31, 2023

This PR

  1. fixes the first part of the TODOs in Gracefully handle long cluster names & for spot, show FAILED_CONTROLLER instead #963, to unblock the user from using a long name for spot jobs on AWS.
  2. moves the cluster name checking into each Cloud.
  3. refactors the cloud.supports to cloud.check_features_are_supported and change the boolean return value to exception raising.
  4. optimizes the failover for InvalidClusterNameError and ClusterUserIdentityError to directly move to the next cloud, as those errors apply to the whole cloud.
  5. fixes minor UX issues for failover:
    Previously:
    image
    Now:
    image

Tested (run the relevant ones):

  • Any manual or new tests for this PR (please specify below)
    • sky launch -c i-am-using-a-very-long-name-for-the-aws-cluster-it-should-be-more-than-63-characters-and-hopefully-it-works-this-sentence-has-135-chars --cloud aws
    • AWS 244 characters: sky launch -c i-am-using-a-very-long-name-for-the-aws-cluster-it-should-be-more-than-63-characters-and-hopefully-it-works-by-checking-the-document-of-aws-we-found-that-the-cluster-name-length-should-be-less-than-250-due-to-the-tag-limit-this-sentence-has-246-c --num-nodes 2 --cloud aws -t t3.micro (and 245 does not work)
    • GCP (didn't change the original tested limit)
    • Azure 42 characters: sky launch -c i-am-using-a-very-long-name-for-the-aws-cluste --cloud azure --region southcentralus --num-nodes 2
    • Lambda 64 characters: sky launch -c i-am-using-a-very-long-name-for-the-aws-cluster-it-should-be-mor --cloud lambda works
    • sky launch -c i-am-using-a-very-long-name-for-the-azure-cluster-it-should-be-more-than-63-characters-and-hopefully-it-works-this-sentence-has-136-chars --cloud azure: correctly fails for the Azure
    • sky launch -c i-am-using-a-very-long-name-for-the-aws-cluster-it-should-be-more-than-63-characters-and-hopefully-it-works-this-sentence-has-135-chars --gpus A100:8, it successfully and quickly failover through Lambda/GCP/Azure and AWS (skip the first three clouds with a single try and try all the regions on AWS).

@Michaelvll Michaelvll changed the title Allow longer name for AWS/Azure cluster Allow longer name for AWS/Azure cluster and refactoring Jan 31, 2023
Copy link
Collaborator

@concretevitamin concretevitamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Michaelvll, and nice tests! Shall we also test a name length close to AWS's char limit?

sky/core.py Show resolved Hide resolved
sky/clouds/cloud.py Outdated Show resolved Hide resolved
sky/clouds/cloud.py Outdated Show resolved Hide resolved
sky/clouds/cloud.py Outdated Show resolved Hide resolved
sky/clouds/cloud.py Outdated Show resolved Hide resolved
sky/clouds/cloud.py Show resolved Hide resolved
sky/clouds/cloud.py Outdated Show resolved Hide resolved
sky/cli.py Show resolved Hide resolved
sky/backends/cloud_vm_ray_backend.py Outdated Show resolved Hide resolved
sky/backends/cloud_vm_ray_backend.py Show resolved Hide resolved
@Michaelvll
Copy link
Collaborator Author

Michaelvll commented Jan 31, 2023

Shall we also test a name length close to AWS's char limit?

Good catch! Actually, we have to reduce the limit to 246 for AWS because of the limitation of DescribeInstance API.

sky/clouds/aws.py Outdated Show resolved Hide resolved
sky/clouds/cloud.py Outdated Show resolved Hide resolved
sky/core.py Show resolved Hide resolved
Copy link
Collaborator

@concretevitamin concretevitamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Michaelvll. Just some minor comments and a question about some additional characters added by the autoscaler. May be worth updating PR description with each cloud's at-limit test?

sky/clouds/lambda_cloud.py Show resolved Hide resolved
sky/clouds/lambda_cloud.py Show resolved Hide resolved
sky/clouds/cloud.py Show resolved Hide resolved
sky/clouds/lambda_cloud.py Outdated Show resolved Hide resolved
sky/clouds/cloud.py Outdated Show resolved Hide resolved
@Michaelvll
Copy link
Collaborator Author

May be worth updating PR description with each cloud's at-limit test?

Great point! I added the tests for each cloud, and seems there are multiple corner cases that are caught by those tests. I fixed the cluster name limit for AWS and Azure. Thanks!

@Michaelvll Michaelvll merged commit c1a194f into master Jan 31, 2023
@Michaelvll Michaelvll deleted the fix-long-name branch January 31, 2023 22:46
Michaelvll added a commit that referenced this pull request Feb 1, 2023
sumanthgenz pushed a commit to sumanthgenz/skypilot that referenced this pull request Feb 22, 2023
…#1648)

* Allow longer name for AWS/Azure cluster

* Refactor the cluster name length checking

* Refactor cloud.supports and optimize the failover speed

* format

* Refactor cloud features support

* fix logging

* UX fix

* rename

* Add comments

* format

* address comment

* fix aws length

* address comments

* Add comment
sumanthgenz pushed a commit to sumanthgenz/skypilot that referenced this pull request Feb 22, 2023
sumanthgenz pushed a commit to sumanthgenz/skypilot that referenced this pull request Mar 15, 2023
…#1648)

* Allow longer name for AWS/Azure cluster

* Refactor the cluster name length checking

* Refactor cloud.supports and optimize the failover speed

* format

* Refactor cloud features support

* fix logging

* UX fix

* rename

* Add comments

* format

* address comment

* fix aws length

* address comments

* Add comment
sumanthgenz pushed a commit to sumanthgenz/skypilot that referenced this pull request Mar 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants