-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[vSphere Provider] Fix the bug that multiple worker types doesn't work #40487
[vSphere Provider] Fix the bug that multiple worker types doesn't work #40487
Conversation
Signed-off-by: Chen Jing <jingch@vmware.com>
|
||
worker_node_config["datastore"] = worker_datastore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need datastore
anymore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please forgive the original code which is hard to read.
Before the change, the logic is: if the datastore of the worker node is unset, use the head node's datastore by default.
But, what we actually want is (which is also published in our documentations): if the datastore of the worker node is unset, use the frozen VM's datastore.
So I make this change. We left the datastore empty for the worker node. It will automatically use the datastore of the frozen VM with this empty parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, one minor question
Linkcheck and chaos test and HA tests are unrelated, this PR only touches the vSphere cluster launcher. |
…oesn't work (ray-project#40487) Currently our code assumes that there is only one worker node type. In this change I fix the bug to let it support multiple worker node types. Signed-off-by: Chen Jing <jingch@vmware.com> Co-authored-by: Archit Kulkarni <architkulkarni@users.noreply.github.com>
…sue and support GPU nodes (#40667) * [Cluster launcher] [vSphere] Fix the bug that multiple worker types doesn't work (#40487) Currently our code assumes that there is only one worker node type. In this change I fix the bug to let it support multiple worker node types. Signed-off-by: Chen Jing <jingch@vmware.com> Co-authored-by: Archit Kulkarni <architkulkarni@users.noreply.github.com> * [cluster launcher] [vSphere Provider] Fix vc conn timout issue (#40516) Fixed the issue using SessionOrientedStub. A session-oriented stub adapter that will relogin to the destination if a session-oriented exception is thrown. --------- Signed-off-by: Chen Jing <jingch@vmware.com> * [cluster launcher] [vSphere Provider] Support GPU Ray nodes on vSphere (#40616) This is for supporting passthrough the GPU on vSphere ESXi host into the Ray nodes. --------- Signed-off-by: Chen Jing <jingch@vmware.com> * [cluster launcher] [vSphere] Do not fetch runtime-info of vm from cached_nodes (#40655) Power-on-off status is runtime info of VM, should not fetch it from cached-nodes, which is probably dirty data. It should query by pyvmomi_sdk every time. Signed-off-by: Chen Hui <huchen@vmware.com> --------- Signed-off-by: Chen Jing <jingch@vmware.com> Signed-off-by: Chen Hui <huchen@vmware.com> Co-authored-by: Chen Jing <jingch@vmware.com> Co-authored-by: huchen2021 <85480625+huchen2021@users.noreply.github.com>
Description
Currently our code assumes that there is only one worker node type.
In this change I fix the bug to let it support multiple worker node types.
Test
Added UT assert to cover the new case.
Used a yaml snippet like this:
Verified that the workers are in the expected resource pool, the 2 different node types resources are expected:
Also the Ray cluster looks good from the dashboard: