[Doc] Add vSphere cluster configuration reference with examples #39379

Merged

Conversation

wfangchi
Contributor

@wfangchi wfangchi commented Sep 7, 2023

As with other providers, this PR adds example-minimal.yaml and example-full.yaml for the vSphere autoscaler. It also adds and refines vSphere-related references in the Getting Started guide and the cluster configuration reference page, based on the newly added examples.

Why are these changes needed?

PR #37815 added vSphere platform support to the Ray Autoscaler, but the accompanying documentation is insufficient. This follow-up change adds examples similar to those for other platforms. The Getting Started guide and the cluster configuration reference are also updated to include vSphere-specific descriptions.

We will do another follow-up PR to add a "Launching Ray Clusters on vSphere" user guide at https://docs.ray.io/en/latest/cluster/vms/user-guides/launching-clusters/index.html
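
For readers unfamiliar with cluster launcher configs, the sketch below shows roughly what a minimal config carries, written as a Python dict rather than YAML. It is an illustration only and not the contents of the example-minimal.yaml added in this PR: the `cluster_name` value, the `"vsphere"` provider type string, and the `missing_keys` helper are assumptions made for the example.

```python
from typing import Any, Dict, List

# Hypothetical stand-in for a minimal cluster config. The values are
# placeholders and the "vsphere" type string is an assumption; check the
# example-minimal.yaml in the repository for the real fields.
minimal_config: Dict[str, Any] = {
    "cluster_name": "minimal",
    "provider": {"type": "vsphere"},
}


def missing_keys(config: Dict[str, Any]) -> List[str]:
    """Return the dotted names of required keys absent from the config."""
    missing = []
    if "cluster_name" not in config:
        missing.append("cluster_name")
    if "type" not in config.get("provider", {}):
        missing.append("provider.type")
    return missing


print(missing_keys(minimal_config))  # prints []
```

A full example would add node types, authentication, and setup commands on top of these two keys.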

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests (I ran make html and confirmed there were no errors)
    • Release tests
    • This PR is not tested :(

Signed-off-by: Fangchi Wang <wfangchi@vmware.com>
Contributor

@architkulkarni architkulkarni left a comment


Looks good to me! For now the example yamls are tested manually, right? In the future we can add them to automatic release tests that are run periodically in CI.

@architkulkarni
Contributor

RLlib test failures unrelated
Linkcheck failure:

(ray-contribute/profiling: line 119) broken http://goog-perftools.sourceforge.net/doc/cpu_profiler.html - 403 Client Error: Forbidden for url: http://goog-perftools.sourceforge.net/doc/cpu_profiler.html

(ray-contribute/profiling: line 104) broken http://goog-perftools.sourceforge.net/doc/pprof-test-big.gif - 403 Client Error: Forbidden for url: http://goog-perftools.sourceforge.net/doc/pprof-test-big.gif

unrelated

test_redis_tls, test_client_builder unrelated
test_websockets unrelated

@architkulkarni architkulkarni added the tests-ok The tagger certifies test failures are unrelated and assumes personal liability. label Sep 7, 2023
@architkulkarni architkulkarni merged commit fecca87 into ray-project:master Sep 7, 2023
82 of 88 checks passed
architkulkarni pushed a commit to architkulkarni/ray that referenced this pull request Sep 7, 2023
…project#39379)

Signed-off-by: Fangchi Wang <wfangchi@vmware.com>
harborn pushed a commit to harborn/ray that referenced this pull request Sep 8, 2023
…project#39379)

Signed-off-by: Fangchi Wang <wfangchi@vmware.com>
@wfangchi
Contributor Author

wfangchi commented Sep 8, 2023

Thanks @architkulkarni!

For now the example yamls are tested manually, right?

Yes they are.

In the future we can add them to automatic release tests that are run periodically in CI.

Yes, we are currently building some in-house automation, since these tests require on-prem vSphere environments. We will explore how to integrate such tests into the GitHub repo.

@wfangchi wfangchi deleted the vsphere-cluster-configuration branch September 8, 2023 03:13
GeneDer pushed a commit that referenced this pull request Sep 8, 2023
…) (#39399)

Signed-off-by: Fangchi Wang <wfangchi@vmware.com>
Co-authored-by: Fangchi Wang <wfangchi@vmware.com>
jimthompson5802 pushed a commit to jimthompson5802/ray that referenced this pull request Sep 12, 2023
…project#39379)

Signed-off-by: Fangchi Wang <wfangchi@vmware.com>
Signed-off-by: Jim Thompson <jimthompson5802@gmail.com>
architkulkarni added a commit that referenced this pull request Sep 15, 2023
As with other providers, this change adds a user guide for the vSphere Ray cluster launcher,
including how to prepare the vSphere environment and the frozen VM, as well as the general
steps to launch the cluster. It also contains a section on how to use vSAN File Service to
provision NFS endpoints as persistent storage for Ray AIR, with a new example YAML file.

In addition, existing examples and docs are updated to include the correct command
to install the vSphere Python SDK.

Signed-off-by: Fangchi Wang wfangchi@vmware.com

Why are these changes needed?
As mentioned in PR #39379, we need a dedicated user guide for launching Ray clusters on vSphere. This change provides one in a newly added vsphere.md, which includes a solution, based on VMware vSAN File Service, for Ray 2.7's deprecation of syncing to the head node for Ray AIR.

---------

Signed-off-by: Fangchi Wang <wfangchi@vmware.com>
Co-authored-by: Archit Kulkarni <architkulkarni@users.noreply.github.com>
architkulkarni added a commit to architkulkarni/ray that referenced this pull request Sep 28, 2023
Signed-off-by: Fangchi Wang <wfangchi@vmware.com>
Co-authored-by: Archit Kulkarni <architkulkarni@users.noreply.github.com>
GeneDer pushed a commit that referenced this pull request Sep 29, 2023
* [Doc] Add vSphere Ray cluster launcher user guide (#39630)

Signed-off-by: Fangchi Wang <wfangchi@vmware.com>
Co-authored-by: Archit Kulkarni <architkulkarni@users.noreply.github.com>

* [vSphere Provider] Optimize the log, and remove the part for connecting NIC in Python (#39143)

This addresses one item of tech debt.
The philosophy of this change is:

For one-time operations during ray up, such as creating the tag category and the tags on vSphere, keep using cli_logger.info.
For other code that runs both during ray up and in the autoscaler on the head node, use the logger.
Many logs are changed to debug level, except for the important ones, such as creating a VM, deleting a VM, and reusing an existing VM.

This change also removes the logic for connecting the NIC. That part is no longer needed, because a script in the customze.sh script planted in the frozen VM does the job. The script is executed once right after instant cloning.
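
The logging split described above can be sketched as follows. This is an illustration under stated assumptions, not the provider's code: both loggers are plain standard-library loggers standing in for Ray's cli_logger and the provider's module logger, and the function names, logger names, and messages are hypothetical.

```python
import logging

# Stand-ins for illustration only: the change uses Ray's cli_logger for
# one-time `ray up` messages; here both are plain stdlib loggers.
cli_logger = logging.getLogger("example.cli")
logger = logging.getLogger("example.vsphere_provider")


def ensure_tag_category(name: str) -> None:
    """One-time setup during `ray up`: report to the user at info level."""
    cli_logger.info("Creating tag category %s", name)


def create_node(vm_name: str, reuse: bool = False) -> None:
    """Runs during `ray up` and from the autoscaler on the head node."""
    # Routine detail stays at debug level so it does not flood the logs.
    logger.debug("Looking up candidate VMs for %s", vm_name)
    # Important lifecycle events stay at info level.
    if reuse:
        logger.info("Reusing existing VM %s", vm_name)
    else:
        logger.info("Creating VM %s", vm_name)


logging.basicConfig(level=logging.INFO)
ensure_tag_category("ray-example")
create_node("ray-worker-1")
```

With the level set to INFO, only the tag-category and VM-creation messages are emitted; the lookup detail appears only when the level is lowered to DEBUG.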

---------

Signed-off-by: Chen Jing <jingch@vmware.com>

* [Cluster launcher] [vSphere] Support deploying one frozen VM, or a set of frozen VMs from OVF, then do ray up. (#39783)

Bug fix
The default.yaml file was not built into the Python wheel and was also missing from the setup.py script. This change adds it.
New features
1. Support creating Ray nodes from a set of frozen VMs in a resource pool.
The motivation is that, with instant clone, the new VM must be on the same ESXi host as the parent VM. Previously there was only one frozen VM, so the Ray nodes created from it had to be relocated to other ESXi hosts by vSphere DRS. After this change, ray up can round-robin across the ESXi hosts when instant cloning to create the Ray nodes, which saves the overhead of DRS.

2. Support creating the frozen VM, or a set of frozen VMs, from an OVF template.
This feature saves some manual steps when the user has no existing frozen VM(s) but has an OVF template. Previously the user had to manually log in to vSphere and deploy a frozen VM from the OVF first. Now ray up covers this functionality.

3. Support powering on the frozen VM when it is powered off at ray up time, then waiting until the VM is really "frozen" before continuing with ray up.
Previously there was logic to power on the frozen VM, but it did not wait until the VM was frozen (which usually takes 2 minutes or so). This was actually a bug. This change adds a function called "wait_until_frozen" to resolve the issue.

4. Some code refactoring. The vSphere SDK related code is split into another Python file.
5. Update the YAML example files and the corresponding docs for the above changes.
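
Two ideas from the list above, round-robin selection of a parent frozen VM and polling until a VM is frozen, can be sketched as below. This is a minimal sketch, not the code from the change: `assign_parents` and `wait_until` are hypothetical helpers, the VM names are placeholders, and the `is_ready` callback stands in for whatever check the real "wait_until_frozen" performs against vSphere.

```python
import itertools
import time
from typing import Callable, List


def assign_parents(frozen_vms: List[str], node_count: int) -> List[str]:
    """Pick a parent frozen VM for each new Ray node in round-robin order."""
    if not frozen_vms:
        raise ValueError("at least one frozen VM is required")
    picker = itertools.cycle(frozen_vms)
    return [next(picker) for _ in range(node_count)]


def wait_until(is_ready: Callable[[], bool], timeout_s: float = 300.0,
               interval_s: float = 5.0) -> None:
    """Poll is_ready() until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while not is_ready():
        if time.monotonic() >= deadline:
            raise TimeoutError("VM did not reach the frozen state in time")
        time.sleep(interval_s)


# Five nodes spread over three frozen VMs, one per ESXi host.
print(assign_parents(["frozen-vm-1", "frozen-vm-2", "frozen-vm-3"], 5))
# prints ['frozen-vm-1', 'frozen-vm-2', 'frozen-vm-3', 'frozen-vm-1', 'frozen-vm-2']
```

Because each clone lands on the host of its parent, cycling through the parents spreads new nodes across hosts at creation time instead of relying on a later relocation.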

---------

Signed-off-by: Chen Jing <jingch@vmware.com>

* [Doc] Update the vSphere cluster Launcher Maintainer. (#39758)

Since Vinod has left the company, we need to update the vSphere Launcher maintainer list to add Roshan and Chen. Roshan acts as Vinod's successor, while Chen will be responsible for overseeing Ray-OSS and facilitating open-source development collaboration.

Signed-off-by: Layne Peng <playne@vmware.com>

---------

Signed-off-by: Fangchi Wang <wfangchi@vmware.com>
Signed-off-by: Chen Jing <jingch@vmware.com>
Signed-off-by: Layne Peng <playne@vmware.com>
Co-authored-by: Fangchi Wang <wfangchi@vmware.com>
Co-authored-by: Chen Jing <jingch@vmware.com>
Co-authored-by: Layne Peng <appamail@hotmail.com>
vymao pushed a commit to vymao/ray that referenced this pull request Oct 11, 2023
…project#39379)

Signed-off-by: Fangchi Wang <wfangchi@vmware.com>
Signed-off-by: Victor <vctr.y.m@example.com>
vymao pushed a commit to vymao/ray that referenced this pull request Oct 11, 2023

Signed-off-by: Fangchi Wang <wfangchi@vmware.com>
Co-authored-by: Archit Kulkarni <architkulkarni@users.noreply.github.com>
Signed-off-by: Victor <vctr.y.m@example.com>
3 participants