[RayJob] Validate RayJob spec #1813
The code looks good to me, but why do we want this to be invalid?
I read that but I still don't understand; I think I'm missing some context. First, why doesn't Kueue accept jobs with shutdownAfterJobFinishes = false? Second, why should this be enforced by KubeRay and not just by Kueue?
Got it.
In my understanding, if users set
This is a good question. Kueue can prevent the suspension of a RayJob with cc @andrewsykim @astefanutti for more context about Kueue
This is mainly because Kueue releases the quota for the job when the job finishes, but leaving the cluster running would mean those resources are still in use.
Kueue does have a validating webhook for RayJob and does enforce this: https://github.com/kubernetes-sigs/kueue/blob/main/pkg/controller/jobs/rayjob/rayjob_webhook.go#L89-L92
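For readers following along, here is a minimal sketch of the kind of check being discussed. It assumes the rule is that a suspended (Kueue-managed) RayJob must also set shutdownAfterJobFinishes to true, so that the quota released on completion matches resources that are actually freed. The `RayJobSpec` struct, the `validateRayJobSpec` function, and the error message are hypothetical stand-ins for illustration, not the actual KubeRay or Kueue code.

```go
package main

import (
	"errors"
	"fmt"
)

// RayJobSpec is a hypothetical, trimmed-down stand-in for the real RayJob spec.
// Only the two fields relevant to this discussion are modeled.
type RayJobSpec struct {
	Suspend                  bool // job is held by a queueing system such as Kueue
	ShutdownAfterJobFinishes bool // delete the RayCluster once the job finishes
}

// errSuspendWithoutShutdown is an illustrative error, not the upstream message.
var errSuspendWithoutShutdown = errors.New(
	"a RayJob with shutdownAfterJobFinishes set to false cannot be suspended")

// validateRayJobSpec rejects the combination assumed to be invalid here:
// a suspended job whose cluster would keep running after the job finishes.
func validateRayJobSpec(spec RayJobSpec) error {
	if spec.Suspend && !spec.ShutdownAfterJobFinishes {
		return errSuspendWithoutShutdown
	}
	return nil
}

func main() {
	specs := []RayJobSpec{
		{Suspend: true, ShutdownAfterJobFinishes: true},
		{Suspend: true, ShutdownAfterJobFinishes: false},
		{Suspend: false, ShutdownAfterJobFinishes: false},
	}
	for _, s := range specs {
		fmt.Printf("%+v -> %v\n", s, validateRayJobSpec(s))
	}
}
```

Doing the check in the operator as well as in the webhook would mean users without Kueue's webhook installed still get an explicit error instead of a cluster that lingers after its quota has been released.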
@architkulkarni Does the explanation make sense to you? Thanks!
Why are these changes needed?
#1783 (review)
Related issue number
Checks