-
Notifications
You must be signed in to change notification settings - Fork 321
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rayjob controller: update EndTime to always be the time when the job deployment transitions to Complete status #1872
Conversation
cc @kevin85421 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR differs somewhat from what I had in mind. Let me provide more details below. Feel free to let me know if it is still unclear.
The EndTime
must be set when a RayJob transitions to Complete
. If it is not set, this line will dereference a nil pointer. Currently, there are two possible cases that make the RayJob transitions to Complete
:
- The JobStatus is terminal.
- The submitter K8s Job fails (code).
My thought is that (1) remove the assignment of EndTime
in the above two cases (2) only set EndTime
when the RayJob transitions to Complete
(code).
@kevin85421 makes sense, I updated the implementation. PTAL |
Eventually( | ||
getRayJobDeploymentStatus(ctx, myRayJob), | ||
time.Second*15, time.Millisecond*500).Should(Equal(rayv1.JobDeploymentStatusComplete), "jobDeploymentStatus = %v", myRayJob.Status.JobDeploymentStatus) | ||
Expect(myRayJob.Status.EndTime.Time).Should(BeTemporally(">", now), "EndTime = %v, Now = %v", myRayJob.Status.EndTime, now) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use something like myRayJob.Status.EndTime.After(now)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, I agree this is cleaner than the "BeTemporally" checks
@@ -95,7 +95,8 @@ type RayJobStatus struct { | |||
// It is not guaranteed to be set in happens-before order across separate operations. | |||
// It is represented in RFC3339 form | |||
StartTime *metav1.Time `json:"startTime,omitempty"` | |||
// Represents time when the job was ended. | |||
// EndTime is the time when status.jobDeploymentStatus transitioned to 'Complete' status. | |||
// This occurs when the Ray job reaches a terminal state (SUCCEEDED, FAILED, STOPPED) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two possibilities to transition to Complete at this moment. See #1872 (review) for more details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two possibilities to transition to Complete at this moment
Right, but both possibilities still result in the Ray job reaching a terminal state right? Unless I'm missing something
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see what you mean. The second condition when submitter job fails, is irrelevant to the Ray job status
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the comments to also include when submitter job fails
…deployment transitions to Complete status Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
Fixes #1849 |
…deployment transitions to Complete status (ray-project#1872)
Why are these changes needed?
See #1849
Related issue number
Checks