Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make sky gpunode reuse existing cluster if possible #1787

Merged
merged 2 commits into from
Mar 17, 2023

Conversation

ewzeng
Copy link
Collaborator

@ewzeng ewzeng commented Mar 16, 2023

Partially fixes #1331 by making sky gpunode reuse existing cluster if possible.

Before:

$ sky gpunode --gpus V100
$ sky gpunode --gpus V100 # fails

After

$ sky gpunode --gpus V100
$ sky gpunode --gpus V100 # reuses gpunode

Tested

  • sky gpunode --gpus V100 followed by sky gpunode
  • sky gpunode --gpus V100 followed by sky gpunode --gpus V100
  • sky gpunode followed by sky gpunode --gpus V100 (should error)
  • The above tests, with -c name

Notes:

  • I wanted to add a smoke test, but I couldn't because sky gpunode automatically ssh's into the cluster and there is no detach option.
  • It is not possible to check if sky gpunode was launched with the exact same resource specification (since some resources are decided during execution), so we reuse the cluster if the required resources are a subset of the launched resources.

@ewzeng ewzeng requested review from Michaelvll and romilbhardwaj and removed request for Michaelvll March 16, 2023 19:47
Copy link
Collaborator

@romilbhardwaj romilbhardwaj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the quick fix @ewzeng!

Just a small note after running some tests - sky gpunode –gpus V100 followed by sky gpunode correctly logs you in to the gpunode on current master (464b5db). It's only the sky gpunode –gpus V100 followed by sky gpunode –gpus V100 case that results in an error. Before merging, please update the "before" part in the PR description for future reference:

$ sky gpunode --gpus V100
$ sky gpunode --gpus V100 # fails
$ sky gpunode  # reuses gpunode

@ewzeng
Copy link
Collaborator Author

ewzeng commented Mar 17, 2023

Ah -- my bad. Thanks for catching that!

@ewzeng ewzeng merged commit b990a3a into skypilot-org:master Mar 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Interactive node] Fail to launch a INIT gpunode
2 participants