
RayJob: don't delete submitter job when ShutdownAfterJobFinishes=true #1881

Merged: 1 commit into ray-project:master on Jan 30, 2024

Conversation

andrewsykim
Contributor

Why are these changes needed?

When a RayJob is configured with ShutdownAfterJobFinishes=true, KubeRay will immediately delete the associated RayCluster and submitter job once the job completes. Users can tune how long the cluster and submitter job stay around with TTLSecondsAfterFinished. Note that the RayJob resource itself is never deleted automatically; removing it always requires external action.

Even without setting TTLSecondsAfterFinished, there is very little reason to delete the submitter job. The submitter job usually contains the most useful logs for the job and does not use any cluster resources once completed. In addition, the submitter job will always be cleaned up when the RayJob is eventually deleted, due to its owner reference back to the RayJob. This PR proposes to never delete the submitter job once a job completes when ShutdownAfterJobFinishes=true. The only exception is a suspended RayJob, where the submitter job needs to be stopped.
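To make the proposal concrete, here is a rough Go sketch of the cleanup decision described above. It is an illustration only, not the actual KubeRay reconciler code; the types and names below are hypothetical simplifications.

package main

import "fmt"

// rayJob captures only the fields relevant to the cleanup decision discussed
// in this PR; the real rayv1.RayJob type has many more fields.
type rayJob struct {
    ShutdownAfterJobFinishes bool // delete resources once the Ray job finishes
    Suspended                bool // a suspended RayJob must stop its submitter job
    JobFinished              bool
}

// cleanupAction describes what the operator would delete for a given RayJob.
type cleanupAction struct {
    DeleteRayCluster   bool
    DeleteSubmitterJob bool
}

func cleanupFor(rj rayJob) cleanupAction {
    if rj.Suspended {
        // Suspension is the one case where the submitter job must be stopped.
        return cleanupAction{DeleteRayCluster: true, DeleteSubmitterJob: true}
    }
    if rj.JobFinished && rj.ShutdownAfterJobFinishes {
        // Free the RayCluster, but keep the Completed submitter job around for
        // its logs; it is garbage-collected later via its owner reference when
        // the RayJob itself is deleted.
        return cleanupAction{DeleteRayCluster: true, DeleteSubmitterJob: false}
    }
    // Otherwise nothing is deleted automatically.
    return cleanupAction{}
}

func main() {
    fmt.Printf("%+v\n", cleanupFor(rayJob{ShutdownAfterJobFinishes: true, JobFinished: true}))
    // Output: {DeleteRayCluster:true DeleteSubmitterJob:false}
}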

Related issue number

Closes #1832

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

@andrewsykim
Contributor Author

andrewsykim commented Jan 29, 2024

@kevin85421 thoughts? This is an alternative approach to #1832, which I think is much cleaner and more practical for most users.

Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
@kevin85421
Member

I prefer to continue deleting the submitter's Job to free up computing resources. For RayJob, I expect users to submit and then not have to worry about it further. The TTLSecondsAfterFinished is utilized to ensure that logging tools have sufficient time to persist the logs. I don't expect users to need to read the logs in the Kubernetes Job.

  1. A scenario where users need to access the Kubernetes Job arises when the Ray job fails, and they need to troubleshoot. However, in most cases, the driver log isn't very useful for troubleshooting once the RayCluster is no longer present.
  2. If users wish to use RayJob for iterative Ray script development (submit, debug, fix, submit again), they should set ShutdownAfterJobFinishes to false. In addition, we don't encourage users to use RayJob for iterative development.

@anyscalesam added the enhancement (New feature or request) label Jan 29, 2024
@andrewsykim
Contributor Author

I agree that there will definitely be many cases where you need the entire cluster to stay around.

I don't expect users to need to read the logs in the Kubernetes Job.

This part I'm not sure about; it depends on the Ray job, right?

My main point is that a Kubernetes Job that is "Completed" does not actually use any compute resources. The node frees up the resources requested by the submitter job once the job completes anyway. If the submitter job is not using any resources when completed, why delete it when it could contain useful logs?
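As an illustration of that accounting, here is a minimal Go sketch (hypothetical types, not the real Kubernetes scheduler code): when summing requested resources on a node, pods in a terminal phase are skipped, so a Completed submitter job reserves nothing.

package main

import "fmt"

// pod holds just enough state to show how terminal pods drop out of a node's
// requested-resources total.
type pod struct {
    Name           string
    Phase          string // "Pending", "Running", "Succeeded", "Failed"
    CPURequestMill int64  // CPU request in millicores
}

// requestedMilliCPU sums CPU requests only for pods that still count against
// the node, i.e. pods that are not in a terminal phase.
func requestedMilliCPU(pods []pod) int64 {
    var total int64
    for _, p := range pods {
        if p.Phase == "Succeeded" || p.Phase == "Failed" {
            continue // terminal pods no longer reserve node resources
        }
        total += p.CPURequestMill
    }
    return total
}

func main() {
    pods := []pod{
        {Name: "rayjob-sample-submitter", Phase: "Succeeded", CPURequestMill: 500},
        {Name: "kuberay-operator", Phase: "Running", CPURequestMill: 100},
    }
    // Only the running operator counts; the Completed submitter adds nothing.
    fmt.Printf("%dm requested\n", requestedMilliCPU(pods))
}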

@andrewsykim
Contributor Author

Here's an example using a local Kind cluster with one node:

$ kubectl get no  
NAME                 STATUS   ROLES           AGE   VERSION
kind-control-plane   Ready    control-plane   17m   v1.24.0

While the job runs, you can see the requested resources on the node:

  Namespace                   Name                                                       CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                                       ------------  ----------  ---------------  -------------  ---
  default                     kuberay-operator-5987588ffc-tz46f                          100m (0%)     100m (0%)   512Mi (0%)       512Mi (0%)     16m
  default                     rayjob-sample-5tzjl                                        500m (0%)     1 (0%)      200Mi (0%)       1Gi (0%)       8s
  default                     rayjob-sample-raycluster-5dl9l-head-bj84k                  20 (15%)      0 (0%)      0 (0%)           0 (0%)         43s
  default                     rayjob-sample-raycluster-5dl9l-worker-small-group-76brh    200m (0%)     1 (0%)      256Mi (0%)       256Mi (0%)     43s
  kube-system                 coredns-6d4b75cb6d-g4bgd                                   100m (0%)     0 (0%)      70Mi (0%)        170Mi (0%)     16m
  kube-system                 coredns-6d4b75cb6d-xgkg8                                   100m (0%)     0 (0%)      70Mi (0%)        170Mi (0%)     16m
  kube-system                 etcd-kind-control-plane                                    100m (0%)     0 (0%)      100Mi (0%)       0 (0%)         16m
  kube-system                 kindnet-sb2vz                                              100m (0%)     100m (0%)   50Mi (0%)        50Mi (0%)      16m
  kube-system                 kube-apiserver-kind-control-plane                          250m (0%)     0 (0%)      0 (0%)           0 (0%)         16m
  kube-system                 kube-controller-manager-kind-control-plane                 200m (0%)     0 (0%)      0 (0%)           0 (0%)         16m
  kube-system                 kube-proxy-r8w7n                                           0 (0%)        0 (0%)      0 (0%)           0 (0%)         16m
  kube-system                 kube-scheduler-kind-control-plane                          100m (0%)     0 (0%)      0 (0%)           0 (0%)         16m
  local-path-storage          local-path-provisioner-9cd9bd544-d772w                     0 (0%)        0 (0%)      0 (0%)           0 (0%)         16m

With ShutdownAfterJobFinishes=true, the submitter job is left in "Completed" state and the RayCluster is deleted:

$ kubectl get po 
NAME                                READY   STATUS      RESTARTS   AGE
kuberay-operator-5987588ffc-tz46f   1/1     Running     0          18m
rayjob-sample-5tzjl                 0/1     Completed   0          85s

And checking the node again, the submitter job is not taking any resources:

  Namespace                   Name                                          CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                          ------------  ----------  ---------------  -------------  ---
  default                     kuberay-operator-5987588ffc-tz46f             100m (0%)     100m (0%)   512Mi (0%)       512Mi (0%)     18m
  kube-system                 coredns-6d4b75cb6d-g4bgd                      100m (0%)     0 (0%)      70Mi (0%)        170Mi (0%)     18m
  kube-system                 coredns-6d4b75cb6d-xgkg8                      100m (0%)     0 (0%)      70Mi (0%)        170Mi (0%)     18m
  kube-system                 etcd-kind-control-plane                       100m (0%)     0 (0%)      100Mi (0%)       0 (0%)         18m
  kube-system                 kindnet-sb2vz                                 100m (0%)     100m (0%)   50Mi (0%)        50Mi (0%)      18m
  kube-system                 kube-apiserver-kind-control-plane             250m (0%)     0 (0%)      0 (0%)           0 (0%)         18m
  kube-system                 kube-controller-manager-kind-control-plane    200m (0%)     0 (0%)      0 (0%)           0 (0%)         18m
  kube-system                 kube-proxy-r8w7n                              0 (0%)        0 (0%)      0 (0%)           0 (0%)         18m
  kube-system                 kube-scheduler-kind-control-plane             100m (0%)     0 (0%)      0 (0%)           0 (0%)         18m
  local-path-storage          local-path-provisioner-9cd9bd544-d772w        0 (0%)        0 (0%)      0 (0%)           0 (0%)         18m

@kevin85421
Member

My main point is that a Kubernetes Job that is "Completed" does not actually use any compute resources. The node frees up requested resources for the submitter job once the job completes anyways. If the submitter job is not using any resources when completed, why delete it if it could contain useful logs?

Good point. I didn't realize that when a job enters 'Completed,' it doesn't use any compute resources. In that case, this PR makes sense to me.

@@ -289,6 +292,22 @@ var _ = Context("Inside the default namespace", func() {
time.Second*15, time.Millisecond*500).Should(Equal(rayv1.JobDeploymentStatusComplete), "jobDeploymentStatus = %v", myRayJob.Status.JobDeploymentStatus)
Expect(myRayJob.Status.EndTime.After(now)).Should(BeTrue(), "EndTime = %v, Now = %v", myRayJob.Status.EndTime, now)
})

It("job completed with ShutdownAfterJobFinishes=true, RayCluster should be deleted but not the submitter Job", func() {
Member
We need to split the test logic in the future instead of adding tests to a single Context block. Coupling all the tests together makes them hard to maintain.
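A rough Ginkgo sketch of the kind of split being suggested, with one Context block per shutdown behavior rather than one shared block (illustrative only; Ginkgo v2 assumed, names are hypothetical, and setup/assertions are elided):

package e2e_test

import (
    . "github.com/onsi/ginkgo/v2"
)

var _ = Describe("RayJob shutdown behavior", func() {
    Context("when ShutdownAfterJobFinishes is true", func() {
        It("deletes the RayCluster but keeps the submitter Job", func() {
            // dedicated setup and assertions for this case only
        })
    })

    Context("when ShutdownAfterJobFinishes is false", func() {
        It("keeps both the RayCluster and the submitter Job", func() {
            // dedicated setup and assertions for this case only
        })
    })
})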

@kevin85421 kevin85421 merged commit acafbfe into ray-project:master Jan 30, 2024
23 checks passed
ryanaoleary pushed a commit to ryanaoleary/kuberay that referenced this pull request Feb 13, 2024
Development

Successfully merging this pull request may close these issues.

[Feature] Allow TTL configuration of RayJob submitter job