Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revamped SSH Configuration Storage #2706

Merged
merged 19 commits into from
Jan 10, 2024

Conversation

Vaibhav2001
Copy link
Contributor

@Vaibhav2001 Vaibhav2001 commented Oct 12, 2023

Description:

This is in relation to #1765

Overview:
This PR introduces a significant change to the SSHConfigHelper, where the configuration for cluster host details is now stored in a dedicated file within the .sky/generated/ssh folder. The user's primary configuration file now includes an Include statement to reference this folder.

Backward Compatibility:
For users adding existing clusters, the previous configuration for those clusters will be removed from the main configuration file, and a new configuration file will be created in the .sky/generated/ssh folder. This change ensures that removal of existing clusters performed prior to this update functions as expected.

Testing:
Extensive manual testing was conducted, as the existing tests were found to be inadequate for this specific use case. Testing primarily involved adding and removing clusters using sky launch and sky down commands, ensuring that a file is created and deleted for each cluster accordingly. To guarantee backward compatibility, clusters were added with the master and checked out on this branch to verify expected behavior.

Please Review:

  • Ensure the revised SSHConfigHelper's compatibility with existing configurations.
  • Confirm that new cluster configurations are correctly stored and referenced.
  • Test backward compatibility, especially for clusters added prior to this update.
  • Validate the behaviour of the sky launch and sky down commands in relation to configuration files.
  • Your feedback and suggestions are highly appreciated. This change will significantly improve SSH configuration management and ensure smoother cluster management for users.

Copy link
Collaborator

@Michaelvll Michaelvll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for submitting the PR @Vaibhav2001! The refactoring is very important to keep the ssh config modularized. Left several comments.

sky/backends/backend_utils.py Outdated Show resolved Hide resolved
sky/backends/backend_utils.py Outdated Show resolved Hide resolved
sky/backends/backend_utils.py Outdated Show resolved Hide resolved
sky/backends/backend_utils.py Outdated Show resolved Hide resolved
sky/backends/backend_utils.py Outdated Show resolved Hide resolved
sky/backends/backend_utils.py Outdated Show resolved Hide resolved
sky/backends/backend_utils.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@Michaelvll Michaelvll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update @Vaibhav2001! The code looks mostly good to me. Left several comments for making the code cleaner.

sky/backends/backend_utils.py Outdated Show resolved Hide resolved
sky/backends/backend_utils.py Outdated Show resolved Hide resolved
sky/backends/backend_utils.py Outdated Show resolved Hide resolved
sky/backends/backend_utils.py Outdated Show resolved Hide resolved
sky/backends/backend_utils.py Outdated Show resolved Hide resolved
sky/backends/backend_utils.py Outdated Show resolved Hide resolved
sky/backends/backend_utils.py Show resolved Hide resolved
sky/backends/backend_utils.py Outdated Show resolved Hide resolved
sky/backends/backend_utils.py Outdated Show resolved Hide resolved
@Michaelvll Michaelvll self-requested a review November 10, 2023 23:45
Copy link
Collaborator

@Michaelvll Michaelvll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for updating this PR @Vaibhav2001! Left several comments. There seems might be some issues with multi-node clusters. Could you test with multiple nodes as well?

sky/backends/backend_utils.py Outdated Show resolved Hide resolved
sky/backends/backend_utils.py Outdated Show resolved Hide resolved
sky/backends/backend_utils.py Outdated Show resolved Hide resolved
sky/backends/backend_utils.py Outdated Show resolved Hide resolved
sky/backends/backend_utils.py Outdated Show resolved Hide resolved
sky/backends/backend_utils.py Outdated Show resolved Hide resolved
sky/backends/backend_utils.py Show resolved Hide resolved
Copy link
Collaborator

@Michaelvll Michaelvll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix @Vaibhav2001! Left several comments. : )

sky/backends/backend_utils.py Outdated Show resolved Hide resolved
sky/backends/backend_utils.py Outdated Show resolved Hide resolved
sky/backends/backend_utils.py Show resolved Hide resolved
sky/backends/backend_utils.py Show resolved Hide resolved
Copy link
Collaborator

@Michaelvll Michaelvll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the quick update @Vaibhav2001! I just checked the PR again, but still found several issues with the docker and undefined variables. I sent a proposed fix below in the comment, could you help test out:

  • With current PR: Launch a normal cluster (multi-node), re-launch the cluster, test the ssh, stop the cluster, check the ssh, restart the cluster, check ssh, terminate the cluster check the ssh.
  • With current PR: launch a cluster (multi-node) with docker image (example), check ssh, stop and restart, check ssh
  • With master branch: launch a cluster, switch to current PR, test sky exec, launch again, check ssh...
  • With master branch, try cluster with docker image and switch to the current PR.

It would be nice if you can figure out some additional tests to make sure this PR is correct before we can merge this. : )

sky/backends/backend_utils.py Outdated Show resolved Hide resolved
sky/backends/backend_utils.py Outdated Show resolved Hide resolved
@Vaibhav2001
Copy link
Contributor Author

Vaibhav2001 commented Jan 3, 2024

Running these tests right now

  • With current PR: Launch a normal cluster (multi-node), re-launch the cluster, test the ssh, stop the cluster, check the ssh (file should be removed), restart the cluster, check ssh (a new file should be created at sky/generareted/ssh), terminate the cluster check the ssh (file should be removed)
  • With current PR: launch a cluster (multi-node) with docker image (example), check ssh, stop and restart, check ssh
  • With master branch: launch a cluster, switch to current PR, test sky exec, launch again, check ssh, and made sure it is at the right location (user config should have only the include and the actual config should be in a separate file in sky/generareted/ssh)
  • With master branch: launch a cluster, switch to current PR, test sky exec, launch again, terminate the cluster, check ssh and the location (file should be removed)
  • With master branch: launch a cluster, switch to current PR, terminate the cluster, and check ssh (user config should be clean).
  • With master branch, try cluster with docker image and switch to the current PR, stop and restart, check ssh, terminate cluster, check ssh

Copy link
Collaborator

@Michaelvll Michaelvll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update @Vaibhav2001! The code looks good to me! Once the tests are passed, it should be good to ship it.

sky/backends/backend_utils.py Outdated Show resolved Hide resolved
@Vaibhav2001
Copy link
Contributor Author

Running these tests right now

  • With current PR: Launch a normal cluster (multi-node), re-launch the cluster, test the ssh, stop the cluster, check the ssh (file should be removed), restart the cluster, check ssh (a new file should be created at sky/generareted/ssh), terminate the cluster check the ssh (file should be removed)
  • With current PR: launch a cluster (multi-node) with docker image (example), check ssh, stop and restart, check ssh
  • With master branch: launch a cluster, switch to current PR, test sky exec, launch again, check ssh, and made sure it is at the right location (user config should have only the include and the actual config should be in a separate file in sky/generareted/ssh)
  • With master branch: launch a cluster, switch to current PR, test sky exec, launch again, terminate the cluster, check ssh and the location (file should be removed)
  • With master branch: launch a cluster, switch to current PR, terminate the cluster, and check ssh (user config should be clean).
  • With master branch, try cluster with docker image and switch to the current PR, stop and restart, check ssh, terminate cluster, check ssh

Thanks for the update @Vaibhav2001! The code looks good to me! Once the tests are passed, it should be good to ship it.

I ran these tests and it seems to be working as intended.

@Michaelvll Michaelvll merged commit f765168 into skypilot-org:master Jan 10, 2024
19 checks passed
@concretevitamin
Copy link
Collaborator

This is very useful @Vaibhav2001!

@Vaibhav2001 Vaibhav2001 deleted the fix-ssh-config branch February 5, 2024 22:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants