Update kuberay mcad integration doc #1373
This PR should be reviewed together with the PR.
@@ -5,7 +5,7 @@ The multi-cluster-app-dispatcher is a Kubernetes controller providing mechanisms
## Use case

MCAD allows you to deploy Ray cluster with a guarantee that sufficient resources are available in the cluster prior to actual pod creation in the Kubernetes cluster. It supports features such as:

- Integrates with upstream Kubernetes scheduling stack for features such co-scheduling, Packing on GPU dimension etc.
- Ability to wrap any Kubernetes objects.
- Increases control plane stability by JIT (Just-in Time) object creation.
Out of curiosity, would you mind explaining this feature?
@asm582 may be able to explain JIT better.
My understanding is that MCAD creates objects only when there are enough resources. In other words, there will not be any pending pods, so control plane stability is improved.
My other thought is that MCAD, when paired with the InstaScale operator, can scale out enough k8s worker nodes to run a job and scale them down afterwards. This fits the just-in-time concept of allocating the right amount of resources to a workstation on an assembly line to complete a task, in order to reduce total inventory and cost.
Yep, that's about right! MCAD will not create the underlying Ray resources until there are enough resources to schedule the pods. If InstaScale is enabled: InstaScale scales up new nodes on your cluster -> MCAD schedules Ray resources onto the new nodes -> the Ray cluster runs -> once the Ray cluster's work is finished and the AppWrapper is deleted, InstaScale deletes the nodes. At no point will there be pending pods/services/routes etc. on your cluster.
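To make the "create objects only when they fit" idea concrete, here is a minimal toy sketch in Python. It is not MCAD's actual algorithm or API: the `ToyDispatcher` class, its FIFO policy, and the CPU figures are all assumptions made up for illustration, and it tracks CPUs only.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ToyDispatcher:
    """Toy just-in-time gate (hypothetical, not MCAD's real logic)."""

    free_cpus: float
    queue: deque = field(default_factory=deque)   # wrapped requests waiting
    running: dict = field(default_factory=dict)   # name -> CPUs in use

    def submit(self, name: str, cpus: float) -> None:
        # A submitted request is only recorded; nothing is "created" yet.
        self.queue.append((name, cpus))
        self._dispatch()

    def delete(self, name: str) -> None:
        # Deleting a running request returns its CPUs to the pool.
        self.free_cpus += self.running.pop(name)
        self._dispatch()

    def _dispatch(self) -> None:
        # Assumed strict FIFO: stop at the first request that does not fit.
        while self.queue and self.queue[0][1] <= self.free_cpus:
            name, cpus = self.queue.popleft()
            self.free_cpus -= cpus
            self.running[name] = cpus  # only now would objects be created

    def state(self, name: str) -> str:
        if name in self.running:
            return "Running"
        if any(queued == name for queued, _ in self.queue):
            return "Queued"
        return "Unknown"


dispatcher = ToyDispatcher(free_cpus=10)   # hypothetical capacity
dispatcher.submit("raycluster-a", 6)
dispatcher.submit("raycluster-b", 6)
print(dispatcher.state("raycluster-a"), dispatcher.state("raycluster-b"))
dispatcher.delete("raycluster-a")
print(dispatcher.state("raycluster-b"))
```

In this toy model the second request stays queued, with nothing created for it, until the first one is deleted. That is the property the thread describes as avoiding pending pods.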
docs/guidance/kuberay-with-MCAD.md
Outdated
```
Events: <none>
```

As seen the second Ray cluster is queued with no pending pods created.
Every user allocates different CPU/memory resources to their Kubernetes clusters. If a user possesses a high-end workstation, would the RayCluster still be queued?
No, it likely won't be queued. My cluster has only 16 CPUs, with some overhead already allocated, so the 2nd AW was queued. If the high-end workstation has more than 16 CPUs, the 2nd AppWrapper may not exceed the total allocatable CPUs.
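The environment dependence can be shown with simple arithmetic. The sketch below is illustrative only: `would_queue` is a hypothetical helper, and apart from the 16-CPU cluster size mentioned above, every number is an assumption. A real check would presumably also account for memory and other resources.

```python
def would_queue(allocatable_cpus: float,
                already_requested_cpus: float,
                appwrapper_cpus: float) -> bool:
    """Return True when the request does not fit in the remaining CPUs."""
    remaining = allocatable_cpus - already_requested_cpus
    return appwrapper_cpus > remaining


# Hypothetical figures: overhead plus the first AppWrapper, then the second.
overhead_and_first_aw = 12
second_aw = 6

for allocatable in (16, 32):
    queued = would_queue(allocatable, overhead_and_first_aw, second_aw)
    print(f"{allocatable} CPUs: second AppWrapper queued = {queued}")
```

With these assumed figures, the second AppWrapper queues on the 16-CPU cluster but not on the 32-CPU one, so the same YAML can behave differently across machines.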
We should provide a clear example so users can consistently reproduce the expected behavior across all environments.
Could you also explain when MCAD creates the RayCluster CR (e.g., 5 CPUs)? Without this information, it's difficult for users to understand how MCAD works.
@kevin85421 I have added more commits. Do you think these 2 comments have been addressed?
LGTM
/lgtm
@kevin85421 I need to put this PR on hold because the MCAD repo is being refactored. The example YAMLs will not work anymore.
Co-authored-by: Anish Asthana <anishasthana1@gmail.com> Signed-off-by: ted chang <htchang@us.ibm.com>
Co-authored-by: Kai-Hsun Chen <kaihsun@apache.org> Signed-off-by: ted chang <htchang@us.ibm.com>
Co-authored-by: Kai-Hsun Chen <kaihsun@apache.org> Signed-off-by: ted chang <htchang@us.ibm.com>
Co-authored-by: Kai-Hsun Chen <kaihsun@apache.org> Signed-off-by: ted chang <htchang@us.ibm.com>
a846441 to 45e6dee
45e6dee to a9027df
Hi @tedhtchang, I tested these steps using the kind cluster on my Mac M1 laptop, and it looked good! I was able to install the items and test the appwrappers.
Nice work!
LGTM
@kevin85421 This doc should be good to merge.
@tedhtchang could you also update README?
I have not retried the doc since September, but some folks tried it and approved this PR.
9c5776d to 32bee1f
* Update kuberay mcad integration doc
* Update docs/guidance/kuberay-with-MCAD.md
  Co-authored-by: Anish Asthana <anishasthana1@gmail.com>
  Signed-off-by: ted chang <htchang@us.ibm.com>
* Update docs/guidance/kuberay-with-MCAD.md
  Co-authored-by: Kai-Hsun Chen <kaihsun@apache.org>
  Signed-off-by: ted chang <htchang@us.ibm.com>
* Update docs/guidance/kuberay-with-MCAD.md
  Co-authored-by: Kai-Hsun Chen <kaihsun@apache.org>
  Signed-off-by: ted chang <htchang@us.ibm.com>
* Update docs/guidance/kuberay-with-MCAD.md
  Co-authored-by: Kai-Hsun Chen <kaihsun@apache.org>
  Signed-off-by: ted chang <htchang@us.ibm.com>
* address review comments
* address more comments
* update content
* fix memory spelling
* Update README

---------

Signed-off-by: ted chang <htchang@us.ibm.com>
Co-authored-by: Anish Asthana <anishasthana1@gmail.com>
Co-authored-by: Kai-Hsun Chen <kaihsun@apache.org>
Why are these changes needed?
Closes #1327. Improves the KubeRay MCAD integration doc.
Related issue number
#1327
Checks