[VMware][WCP provider][Part 2/n]: Add vSphere WCP node provider #51138
edoakes merged 4 commits into ray-project:master
Signed-off-by: vs030455 <vamshikdshetty@gmail.com>
Hi @VamshikShetty, has this PR been reviewed internally or run in production at VMware? It's a big PR, and I would expect several iterations of reviews before your team approves it if it hasn't been reviewed internally.
Hey @kevin85421 cc: @roshankathawate
Hi @kevin85421, as @VamshikShetty mentioned, we maintain a thorough code review process. In the next PR, we will also provide unit test cases as well as architecture documents for your reference.
Nice! Thank you for the explanation!
…project#51138)

This change is a continuation of the part 1 MR [ray-project#51029], where we cleaned up the previous experimental vSphere provider. In this MR we add the logic to deploy a Ray cluster in the K8s vSphere Supervisor by extending the k8s API.

Changes include:

1. Update the dummy node provider with valid logic to support node lifecycle management.
2. Add a cluster operator client to deal with vSphere Supervisor constructs such as:
   * [vm-operator's](https://github.com/vmware-tanzu/vm-operator) custom resource to deploy vSphere VMs as Ray nodes.
   * The Ray cluster operator's (internal k8s controller) custom resource to track the lifecycle of the Ray cluster.

---------

Signed-off-by: vs030455 <vamshikdshetty@gmail.com>
Signed-off-by: Dhakshin Suriakannu <d_suriakannu@apple.com>
… vsphere wcp provider (#51666)

## Why are these changes needed?

This change is a continuation of MR [#51138], where cluster lifecycle management logic for the experimental vSphere provider was added. In this MR we add the architecture doc and unit tests to validate that provider.

Changes include:

1. Add an architecture markdown documentation file.
2. Add unit tests to validate the logic in cluster_operator_client.py & node_provider.py in vSphere's autoscaler.

## Related issue number

## Checks

- [x] I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- [x] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - [ ] Unit tests
  - [ ] Release tests
  - [ ] This PR is not tested :(

---------

Signed-off-by: vs030455 <vamshikdshetty@gmail.com>
Co-authored-by: vs030455 <svamshik@vmware.com>
Why are these changes needed?
This change is a continuation of the part 1 MR [#51029], where we cleaned up the previous experimental vSphere provider. In this MR we add the logic to deploy a Ray cluster in the K8s vSphere Supervisor by extending the k8s API.
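Changes include:

1. Update the dummy node provider with valid logic to support node lifecycle management.
2. Add a cluster operator client to deal with vSphere Supervisor constructs such as:
   * [vm-operator's](https://github.com/vmware-tanzu/vm-operator) custom resource to deploy vSphere VMs as Ray nodes.
   * The Ray cluster operator's (internal k8s controller) custom resource to track the lifecycle of the Ray cluster.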
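
For readers unfamiliar with the pattern, here is a minimal sketch of how a node provider might manage node lifecycle through a cluster client that creates VM custom resources. It is not the code in this PR. The classes `InMemoryClusterClient` and `SketchNodeProvider`, the helper `build_vm_manifest`, and the `node_config` keys are hypothetical. The manifest's `apiVersion` and `spec` field names are assumptions about vm-operator's `VirtualMachine` resource and should be checked against its API reference. The client stores manifests in a dict instead of calling the Kubernetes API, so the example runs without a cluster. The method names mirror Ray's `NodeProvider` interface.

```python
import copy
import uuid
from typing import Any, Dict, List


def build_vm_manifest(name: str, namespace: str, node_config: Dict[str, Any],
                      tags: Dict[str, str]) -> Dict[str, Any]:
    """Build a VirtualMachine custom resource as a plain dict.

    The apiVersion and spec field names are assumptions; verify them
    against the vm-operator version deployed on the Supervisor.
    """
    return {
        "apiVersion": "vmoperator.vmware.com/v1alpha1",  # assumed version
        "kind": "VirtualMachine",
        "metadata": {"name": name, "namespace": namespace, "labels": dict(tags)},
        "spec": {
            "className": node_config["vm_class"],          # hypothetical config key
            "imageName": node_config["vm_image"],          # hypothetical config key
            "storageClass": node_config["storage_class"],  # hypothetical config key
            "powerState": "poweredOn",
        },
    }


class InMemoryClusterClient:
    """Hypothetical stand-in for a cluster operator client.

    A real client would create, list, patch, and delete custom resources
    through the Kubernetes API; this one keeps them in a dict.
    """

    def __init__(self) -> None:
        self._objects: Dict[str, Dict[str, Any]] = {}

    def create(self, manifest: Dict[str, Any]) -> None:
        self._objects[manifest["metadata"]["name"]] = copy.deepcopy(manifest)

    def delete(self, name: str) -> None:
        self._objects.pop(name, None)

    def labels(self, name: str) -> Dict[str, str]:
        return self._objects[name]["metadata"]["labels"]

    def list_names(self, label_filters: Dict[str, str]) -> List[str]:
        return [
            name for name, obj in self._objects.items()
            if all(obj["metadata"]["labels"].get(k) == v
                   for k, v in label_filters.items())
        ]


class SketchNodeProvider:
    """Node lifecycle operations expressed as calls on the client."""

    def __init__(self, client: InMemoryClusterClient, cluster_name: str,
                 namespace: str) -> None:
        self.client = client
        self.cluster_name = cluster_name
        self.namespace = namespace

    def create_node(self, node_config: Dict[str, Any], tags: Dict[str, str],
                    count: int) -> List[str]:
        created = []
        for _ in range(count):
            name = f"{self.cluster_name}-{uuid.uuid4().hex[:8]}"
            self.client.create(
                build_vm_manifest(name, self.namespace, node_config, tags))
            created.append(name)
        return created

    def non_terminated_nodes(self, tag_filters: Dict[str, str]) -> List[str]:
        return self.client.list_names(tag_filters)

    def node_tags(self, node_id: str) -> Dict[str, str]:
        return dict(self.client.labels(node_id))

    def set_node_tags(self, node_id: str, tags: Dict[str, str]) -> None:
        self.client.labels(node_id).update(tags)

    def terminate_node(self, node_id: str) -> None:
        self.client.delete(node_id)


if __name__ == "__main__":
    provider = SketchNodeProvider(InMemoryClusterClient(), "demo", "ray-ns")
    config = {"vm_class": "small", "vm_image": "ray-image", "storage_class": "default"}
    workers = provider.create_node(config, {"ray-node-type": "worker"}, count=2)
    provider.terminate_node(workers[0])
    print(provider.non_terminated_nodes({"ray-node-type": "worker"}))
```

Keeping the Kubernetes calls behind a client object means the provider logic can be unit tested with a fake client, which is the same seam the in-memory class uses here.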
Related issue number
Checks
- … `git commit -s`) in this PR.
- … `scripts/format.sh` to lint the changes in this PR.
- … method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.