[Feature] TTL Delete RayJob CRD After Job Termination #1944
Comments
@kevin85421 In terms of the solution direction, what if we can specify the submitter Job template, rather than the Pod template? Then you can set the
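To sketch what this suggestion would enable (the field names below are from the standard `batch/v1` Job API; the Job name, image tag, and command are hypothetical placeholders), a submitter Job template could carry the Job's own TTL:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: rayjob-submitter              # hypothetical name
spec:
  ttlSecondsAfterFinished: 604800     # Kubernetes deletes the Job (and its pods) one week after it finishes
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: ray-job-submitter
          image: rayproject/ray:2.9.0   # hypothetical image tag
          command: ["ray", "job", "submit", "--address", "http://raycluster-head-svc:8265", "--", "python", "script.py"]
```

With a full Job template exposed, users could set any Job-level field, including the TTL, without KubeRay having to model it explicitly.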
@anyscalesam @jjyao How was this completed? I am failing to see how it was fixed.
@MortalHappiness will take this issue.
@kevin85421 Thanks.
Resolves: ray-project#1944 Signed-off-by: Chi-Sheng Liu <chishengliu@chishengliu.com>
@MortalHappiness @kevin85421 It does not seem like the implementation in #2225 allows automatic deletion of the submitter after, let's say, one week, like the TTL field. This issue does mention TTL. Am I missing something? I'd like my peers who create the jobs to be able to view them for some time, but to have the job automatically cleaned up after a week. Is that possible with the current implementation?
@mickvangelderen I believe the
Thanks for the reply. If that is the case, then is it still possible to delete the cluster head and workers (to free up resources) immediately after the job finishes, and then have the submitter be deleted after a week? |
There was a discussion in this thread about whether we want to control this behavior separately: #2097 (comment). As of now, you can either delete the whole RayJob or only the cluster. I think we're looking for user feedback on whether we should allow controlling the two behaviors separately (deleting the cluster vs. deleting the whole RayJob). Is this something you would find useful?
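For reference, a minimal RayJob sketch with the two fields this thread is discussing (per the KubeRay docs, `ttlSecondsAfterFinished` only takes effect when `shutdownAfterJobFinishes` is true, and it delays the whole cleanup rather than controlling the cluster and submitter separately; the entrypoint is a hypothetical placeholder):

```yaml
apiVersion: ray.io/v1
kind: RayJob
metadata:
  name: rayjob-sample
spec:
  entrypoint: python my_script.py   # hypothetical entrypoint
  shutdownAfterJobFinishes: true    # tear down resources once the job terminates
  ttlSecondsAfterFinished: 604800   # wait one week before the teardown happens
```

As the spec stands, there is a single TTL for the cleanup as a whole, which is why deleting the cluster immediately while keeping the submitter around for a week is not expressible today.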
I do think it would be useful to control the deletion of the cluster and the submitter separately. However, I might be missing other solutions that would work for me and my team, so I will give some more detail about how we are using Ray.

We have a tool that allows spawning work on a cluster. That work can be performed by a native K8s Pod, or, if the user so desires, by Ray workers through a RayJob to leverage the distributed computing facilities Ray offers. We do not use a persistent Ray cluster because the persistent cluster would sometimes end up in an unreliable state, possibly due to our unreliable hardware. RayJobs have been working great.

We want our users to be able to view the logs of their work for about a week. Any data that must be persistent is stored in an external system. After one week, we want to clean up all jobs to keep things tidy and free up space. For K8s jobs, we use the Similarly, we would also like to set the

Hopefully this clarifies why we are interested in this functionality. If you see a better solution direction to accomplish our objectives, please let me know.
Do you only care about the submitter being deleted or do you also care about the RayJob resource itself being cleaned up?
This will be possible in KubeRay v1.2: #2091
I think fundamentally what you need is better tooling to persist the Ray job logs. Once you have that, you don't need to care about how long the cluster stays around after the job is deleted. Although I can see value in being able to read the logs directly with kubectl.
I might be confused by the terminology. Looking at the RayJob quickstart, I want the RayCluster to be deleted immediately after the job finishes, and I would like the logs to be available for a week. I thought the logs were tied to something referred to as "the submitter". I'm not sure whether this submitter is a Pod, a Job, a Ray Job (notice the space), or something else.
@kevin85421 I remember that you want to make the logging use structured logging so that logs can be read from external tools. Is that feature related to the log persistence mentioned in this thread?
Search before asking
Description
Currently KubeRay enables the cluster to auto-terminate after completion, but there is no mechanism to auto-delete the RayJob instance (the owner).
This would help with automatic K8s cluster clean-up and would behave similarly to a K8s Job's TTL.
Use case
Delete the RayJob alongside its cluster, controlled via an additional flag.
Related issues
No response
Are you willing to submit a PR?