Skip to content

Comments

[VMware][WCP provider][Part 2/n]: Add vSphere WCP node provider#51138

Merged
edoakes merged 4 commits intoray-project:masterfrom
VamshikShetty:feature/add-wcp-provider
Mar 20, 2025
Merged

[VMware][WCP provider][Part 2/n]: Add vSphere WCP node provider#51138
edoakes merged 4 commits intoray-project:masterfrom
VamshikShetty:feature/add-wcp-provider

Conversation

@VamshikShetty
Copy link
Contributor

@VamshikShetty VamshikShetty commented Mar 6, 2025

Why are these changes needed?

This change is continuation on part 1 MR [#51029], where we did cleanup on previous experimental vSphere provider. In this MR we add the logic to deploy ray cluster in K8s vSphere Supervisor by extending k8s API.

Changes include:

  1. Update dummy node provider with valid logic to support node lifecycle management.
  2. Addition of cluster operator client to deal with vSphere Supervisor constructs such as:
    • vm-operator's custom resource to deploy vSphere VMs as ray nodes.
    • Ray cluster operator's (internal k8s controller) custom resource to track lifecycle of ray cluster.

Related issue number

Checks

  • [x ] I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • [ x] I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: vs030455 <vamshikdshetty@gmail.com>
@VamshikShetty VamshikShetty requested a review from a team as a code owner March 6, 2025 21:19
Signed-off-by: vs030455 <vamshikdshetty@gmail.com>
Signed-off-by: vs030455 <vamshikdshetty@gmail.com>
Copy link
Contributor

@roshankathawate roshankathawate left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kevin85421
Copy link
Member

Hi @VamshikShetty, has this PR been reviewed internally or run in production at VMware? It's a big PR, and I would expect several iterations of reviews before your team approves it if it hasn't been reviewed internally.

@VamshikShetty
Copy link
Contributor Author

Hey @kevin85421
I understand your concern, so to put this MR in context we an internal repository which is a mirror of ray; where vSphere provider code went through months of evolution and this MR just brings that logic in front. We will be refining the provider code in future but first wanted to move these changes to official ray git repo and track future changes here.

cc: @roshankathawate

@roshankathawate
Copy link
Contributor

Hi @kevin85421 ,

As @VamshikShetty mentioned, we maintain thorough code review process. In next PR we will also provide unit test cases as well as architecture documents for your reference.

@kevin85421
Copy link
Member

Nice! Thank you for the explanation!

@kevin85421 kevin85421 added the go add ONLY when ready to merge, run all tests label Mar 17, 2025
@kevin85421 kevin85421 self-assigned this Mar 17, 2025
Copy link
Member

@kevin85421 kevin85421 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stamp. This PR has been reviewed internally at VMware for months, and it only touches vSphere-related code. cc @edoakes @jjyao would you mind merging this PR after CI passes?

@edoakes edoakes merged commit 5456d87 into ray-project:master Mar 20, 2025
5 of 6 checks passed
dhakshin32 pushed a commit to dhakshin32/ray that referenced this pull request Mar 27, 2025
…project#51138)

This change is continuation on part 1 MR [ray-project#51029], where we did cleanup
on previous experimental vSphere provider. In this MR we add the logic
to deploy ray cluster in K8s vSphere Supervisor by extending k8s API.

Changes include:
1. Update dummy node provider with valid logic to support node lifecycle
management.
2. Addition of cluster operator client to deal with vSphere Supervisor
constructs such as:
* [vm-operator's](https://github.com/vmware-tanzu/vm-operator) custom
resource to deploy vSphere VMs as ray nodes.
* Ray cluster operator's (internal k8s controller) custom resource to
track lifecycle of ray cluster.

---------

Signed-off-by: vs030455 <vamshikdshetty@gmail.com>
Signed-off-by: Dhakshin Suriakannu <d_suriakannu@apple.com>
edoakes pushed a commit that referenced this pull request Apr 8, 2025
… vsphere wcp provider (#51666)

## Why are these changes needed?

This change is continuation of MR [#51138], where cluster lifecycle
management logic for experimental vSphere provider was added. In this MR
we add the architecture doc and unittests to validate the said provider.

Changes include:
1. Add architecture markdown documentation file.
2. Add unittests to validate logic in cluster_operator_client.py &
node_provider.py in vsphere's autoscaler.

## Related issue number

## Checks

- [x ] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [ x] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a
method in Tune, I've added it in `doc/source/tune/api/` under the
           corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

---------

Signed-off-by: vs030455 <vamshikdshetty@gmail.com>
Co-authored-by: vs030455 <svamshik@vmware.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

community-backlog go add ONLY when ready to merge, run all tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants