should not fetch runtime-info of vm from cached_nodes #40655

huchen2021 · 2023-10-25T09:23:38Z

Why are these changes needed?

The power-on/off status is runtime information of a VM, so it should not be fetched from cached_nodes, which may hold stale data.
It should be queried through the pyvmomi SDK every time.
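
To illustrate the idea, here is a minimal sketch, not the actual provider code. `FakeVM`, `FakeRuntime`, `NodeProviderSketch`, `lookup_vm` and `is_running` are hypothetical names made up for this example. The only detail taken from pyVmomi is that a VirtualMachine object exposes its power state as `vm.runtime.powerState`; plain strings stand in for the SDK enum values here.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class FakeRuntime:
    # Stand-in for the runtime info object; real pyVmomi uses an enum value.
    powerState: str


@dataclass
class FakeVM:
    # Hypothetical stand-in for a pyVmomi VirtualMachine object.
    name: str
    runtime: FakeRuntime


class NodeProviderSketch:
    """Caches static node data but reads the power state live on every call."""

    def __init__(self, lookup_vm: Callable[[str], FakeVM]):
        # lookup_vm is an assumed callable that asks the SDK for the VM object.
        self._lookup_vm = lookup_vm
        # Cached entries may lag behind the real VM state.
        self.cached_nodes: Dict[str, dict] = {}

    def is_running(self, node_id: str) -> bool:
        # Deliberately ignore cached_nodes: power state is runtime info.
        vm = self._lookup_vm(node_id)
        return vm.runtime.powerState == "poweredOn"
```

With this shape, a VM that is powered off after its entry was cached is still reported correctly, because `is_running` never reads the cached entry.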

Test

Our internal automation pipelines passed. The team also tested the GPU functionality manually.

Checks

[* ] I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
[* ] I've run scripts/format.sh to lint the changes in this PR.
[ *] I've included any doc changes needed for https://docs.ray.io/en/master/.
- I've added any new APIs to the API Reference. For example, if I added a
  method in Tune, I've added it in doc/source/tune/api/ under the
  corresponding .rst file.
[* ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
Testing Strategy
- [* ] Unit tests
- [ *] Release tests
- This PR is not tested :(

Signed-off-by: Chen Hui <huchen@vmware.com>

architkulkarni · 2023-10-25T16:22:10Z

Core, serve, rllib tests unrelated

…hed_nodes (ray-project#40655) Power-on-off status is runtime info of VM, should not fetch it from cached-nodes, which is probably dirty data. It should query by pyvmomi_sdk every time. Signed-off-by: Chen Hui <huchen@vmware.com>

…sue and support GPU nodes (#40667)

* [Cluster launcher] [vSphere] Fix the bug that multiple worker types doesn't work (#40487)
  Currently our code assumes that there is only one worker node type. In this change I fix the bug to let it support multiple worker node types.
  Signed-off-by: Chen Jing <jingch@vmware.com>
  Co-authored-by: Archit Kulkarni <architkulkarni@users.noreply.github.com>
* [cluster launcher] [vSphere Provider] Fix vc conn timout issue (#40516)
  Fixed the issue using SessionOrientedStub. A session-oriented stub adapter that will relogin to the destination if a session-oriented exception is thrown.
  ---------
  Signed-off-by: Chen Jing <jingch@vmware.com>
* [cluster launcher] [vSphere Provider] Support GPU Ray nodes on vSphere (#40616)
  This is for supporting passthrough the GPU on vSphere ESXi host into the Ray nodes.
  ---------
  Signed-off-by: Chen Jing <jingch@vmware.com>
* [cluster launcher] [vSphere] Do not fetch runtime-info of vm from cached_nodes (#40655)
  Power-on-off status is runtime info of VM, should not fetch it from cached-nodes, which is probably dirty data. It should query by pyvmomi_sdk every time.
  Signed-off-by: Chen Hui <huchen@vmware.com>

---------
Signed-off-by: Chen Jing <jingch@vmware.com>
Signed-off-by: Chen Hui <huchen@vmware.com>
Co-authored-by: Chen Jing <jingch@vmware.com>
Co-authored-by: huchen2021 <85480625+huchen2021@users.noreply.github.com>

should not fetch runtime-info of vm from cached_nodes

d29fc0d

Signed-off-by: Chen Hui <huchen@vmware.com>

huchen2021 requested review from ericl, architkulkarni and a team as code owners October 25, 2023 09:23

architkulkarni approved these changes Oct 25, 2023

architkulkarni added the tests-ok The tagger certifies test failures are unrelated and assumes personal liability. label Oct 25, 2023

architkulkarni merged commit 7f98a20 into ray-project:master Oct 25, 2023
33 of 37 checks passed

architkulkarni mentioned this pull request Oct 25, 2023

[Cluster launcher] [vSphere] Fix multiple worker_types and timeout issue and support GPU nodes #40667

Merged

8 tasks
