should not fetch runtime-info of vm from cached_nodes #40655

@huchen2021 huchen2021 commented Oct 25, 2023

Why are these changes needed?

A VM's power state (on or off) is runtime information, so it should not be read from cached_nodes, where the data may be stale.
Instead, it should be queried through pyvmomi_sdk every time.
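
To illustrate the idea, here is a minimal sketch of reading power state live rather than from a cache. It is not the code in this PR: `FakeVM`, `NodeStatusReader`, and `fetch_vm` are hypothetical names, and `FakeVM` stands in for a pyVmomi VM object (whose `runtime.powerState` reports values such as `poweredOn`) so the example runs without a vCenter.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class FakeVM:
    """Hypothetical stand-in for a pyVmomi VM object; only power state is modeled."""
    power_state: str  # plays the role of vm.runtime.powerState, e.g. "poweredOn"


@dataclass
class NodeStatusReader:
    """Hypothetical helper that never trusts the cache for runtime info."""
    # fetch_vm is assumed to perform a fresh SDK lookup on every call.
    fetch_vm: Callable[[str], FakeVM]
    # Slow-changing metadata (tags, names) may still live in a cache.
    cached_nodes: Dict[str, dict] = field(default_factory=dict)

    def is_running(self, node_id: str) -> bool:
        # Always ask the source of truth; cached_nodes may be out of date.
        vm = self.fetch_vm(node_id)
        return vm.power_state == "poweredOn"


# Usage: the cache still says "poweredOn", but the live lookup wins.
inventory = {"vm-1": FakeVM("poweredOn")}
reader = NodeStatusReader(fetch_vm=lambda node_id: inventory[node_id])
reader.cached_nodes["vm-1"] = {"power_state": "poweredOn"}
inventory["vm-1"] = FakeVM("poweredOff")  # VM is powered off out of band
print(reader.is_running("vm-1"))
```

The trade-off is one extra SDK round trip per status check in exchange for never acting on a stale power state.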

Test

Our internal automation pipelines passed. The team also tested the GPU functionality manually.

Checks

  • [x] I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • [x] I've run scripts/format.sh to lint the changes in this PR.
  • [x] I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • [x] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • [x] Unit tests
    • [x] Release tests
    • This PR is not tested :(

Signed-off-by: Chen Hui <huchen@vmware.com>
@architkulkarni

The Core, Serve, and RLlib test failures are unrelated to this change.

@architkulkarni architkulkarni added the tests-ok The tagger certifies test failures are unrelated and assumes personal liability. label Oct 25, 2023
@architkulkarni architkulkarni merged commit 7f98a20 into ray-project:master Oct 25, 2023
33 of 37 checks passed
architkulkarni pushed a commit to architkulkarni/ray that referenced this pull request Oct 25, 2023
…hed_nodes (ray-project#40655)

Power-on-off status is runtime info of VM, should not fetch it from cached-nodes, which is probably dirty data.
It should query by pyvmomi_sdk every time.

Signed-off-by: Chen Hui <huchen@vmware.com>
vitsai pushed a commit that referenced this pull request Oct 26, 2023
…sue and support GPU nodes (#40667)

* [Cluster launcher] [vSphere] Fix the bug that multiple worker types doesn't work (#40487)

Currently our code assumes that there is only one worker node type.
In this change I fix the bug to let it support multiple worker node types.

Signed-off-by: Chen Jing <jingch@vmware.com>
Co-authored-by: Archit Kulkarni <architkulkarni@users.noreply.github.com>

* [cluster launcher] [vSphere Provider] Fix vc conn timout issue (#40516)

Fixed the issue using SessionOrientedStub. A session-oriented stub adapter that will relogin to the destination if a session-oriented exception is thrown.

---------

Signed-off-by: Chen Jing <jingch@vmware.com>

* [cluster launcher] [vSphere Provider] Support GPU Ray nodes on vSphere (#40616)

This is for supporting passthrough the GPU on vSphere ESXi host into the Ray nodes.

---------

Signed-off-by: Chen Jing <jingch@vmware.com>

* [cluster launcher] [vSphere] Do not fetch runtime-info of vm from cached_nodes (#40655)

Power-on-off status is runtime info of VM, should not fetch it from cached-nodes, which is probably dirty data.
It should query by pyvmomi_sdk every time.

Signed-off-by: Chen Hui <huchen@vmware.com>

---------

Signed-off-by: Chen Jing <jingch@vmware.com>
Signed-off-by: Chen Hui <huchen@vmware.com>
Co-authored-by: Chen Jing <jingch@vmware.com>
Co-authored-by: huchen2021 <85480625+huchen2021@users.noreply.github.com>
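
The connection-timeout fix referenced above (#40516) describes a session-oriented stub that logs in again when a session-oriented exception is thrown. The pattern can be sketched generically as follows. This is an illustration under stated assumptions, not pyVmomi's implementation: `SessionExpired`, `call_with_relogin`, and the parameter names are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class SessionExpired(Exception):
    """Hypothetical stand-in for a session-oriented SDK exception."""


def call_with_relogin(
    call: Callable[[], T],
    login: Callable[[], None],
    retries: int = 3,
    delay_s: float = 0.0,
) -> T:
    """Run call(); on SessionExpired, log in again and retry up to `retries` times."""
    for attempt in range(retries + 1):
        try:
            return call()
        except SessionExpired:
            if attempt == retries:
                raise  # out of retries; surface the error to the caller
            time.sleep(delay_s)
            login()  # re-establish the session before the next attempt
    raise AssertionError("unreachable")


# Usage: the first call fails because the session has lapsed; after a
# re-login the same call succeeds.
state = {"logged_in": False, "logins": 0}


def login() -> None:
    state["logged_in"] = True
    state["logins"] += 1


def list_vms() -> str:
    if not state["logged_in"]:
        raise SessionExpired()
    return "ok"


result = call_with_relogin(list_vms, login)
```

Wrapping the retry at the stub level, as the commit message describes, means callers do not need their own re-login logic.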
Labels
tests-ok The tagger certifies test failures are unrelated and assumes personal liability.

2 participants