How to update version of Ray on a cluster? #246
Comments
For now, I think using pssh is the way to go. Assuming you have a
Then make a script to run via parallel ssh on all the other nodes. E.g.,
Then run it via parallel-ssh
This assumes you have a
I've been trying to use a script for doing the initial installation of Ray so you don't need to create an AMI, but I haven't quite gotten it to work.
The It currently dies with messages like
FYI @jssmith. My guess is that the command
A few responses here:

On Updating

Instructions for updating are basically right. The steps should be 1/ shut down Ray on all nodes, 2/ write the update script, 3/ run the update script on the head node, 4/ run the update script in parallel on the other nodes, 5/ start up Ray on all nodes. I agree that we should add these instructions to https://github.com/ray-project/ray/blob/master/doc/using-ray-on-a-large-cluster.md

On AMI

This should work. Some suggestions on bug fixes, then I'll comment on whether it is a good idea. You can use the

I still tend to prefer steering users toward creating AMIs, but it is worth considering this. For one, with an AMI the user doesn't have to get a setup script running smoothly. If there is a need for libraries that require license approval, large files, etc., it may be easier to just do it once, by hand, and then clone the result of this work.

The larger worry that I have is that whenever there are external dependencies, e.g., downloading Anaconda or other packages, speed and success become variable factors. Unless one has a good way to verify the success of the installation on each machine, this is a risky way to go. Note that these risks usually scale with the number of machines, so for small clusters the AMI may have less value, but as you get to larger installations it becomes increasingly useful to bring up all of the machines in a well-defined state.
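To make the ordering of the update steps above concrete, here is a minimal Python sketch that only builds the list of shell commands to run from the head node; it does not execute anything. It is not the script from this thread. The stop and start commands, the update script name, and the hosts file are all hypothetical placeholders passed in by the caller, and the sketch assumes the update script has already been copied to every worker. The only parallel-ssh options it relies on are `-h` (hosts file) and `-i` (print each host's output inline). The second helper is one possible way to act on the "verify each machine" concern: given whatever version string each host reported, it lists the hosts that do not match.

```python
import shlex
from typing import Dict, List


def build_update_plan(
    workers_file: str,      # hypothetical: file listing one worker host per line
    stop_cmd: str,          # hypothetical: command that stops Ray on a node
    update_script: str,     # hypothetical: path to the update script on each node
    start_head_cmd: str,    # hypothetical: command that starts Ray on the head node
    start_worker_cmd: str,  # hypothetical: command that starts Ray on a worker
) -> List[str]:
    """Return the shell commands for a cluster update, in the order to run them."""

    def on_workers(cmd: str) -> str:
        # -h names the hosts file; -i prints each host's output as it completes.
        return f"parallel-ssh -h {shlex.quote(workers_file)} -i {shlex.quote(cmd)}"

    run_update = f"bash {shlex.quote(update_script)}"
    return [
        on_workers(stop_cmd),          # shut down Ray on the workers
        stop_cmd,                      # shut down Ray on the head node
        run_update,                    # run the update script on the head node
        on_workers(run_update),        # run the update script on the other nodes
        start_head_cmd,                # start Ray on the head node
        on_workers(start_worker_cmd),  # start Ray on the workers
    ]


def hosts_not_at_version(reported: Dict[str, str], expected: str) -> List[str]:
    """Return the hosts whose reported version string differs from `expected`."""
    return sorted(
        host for host, version in reported.items() if version.strip() != expected
    )


if __name__ == "__main__":
    plan = build_update_plan(
        "workers.txt", "./stop-ray.sh", "update-ray.sh",
        "./start-head.sh", "./start-worker.sh",
    )
    for command in plan:
        print(command)
```

Keeping the plan as plain strings makes it easy to review before anything is run on the cluster. How the per-host version strings are collected is left open here.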
Instructions for updating the version of Ray using parallel-ssh have been added. #256
Someone I chatted with wants to do the following: update an existing Ray cluster with a bunch of nodes to use a newer version of Ray. Right now if the cluster is large the best way to do it seems to be to create an AMI with the new version and restart all the instances. Is there a better way (one possibility: provide an update-ray.sh for pssh)?