Owls 91212 - Merge fixes for the introspector retry behavior after the job times out and to capture WDT logs. #2613
Changes from all commits
e343010
4ba0b7a
eaf7651
2b4f1c6
14be2e6
47826d0
993dc36
87afabe
bd39be8
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -137,7 +137,11 @@ public static boolean isComplete(V1Job job) { | |
| return false; | ||
| } | ||
|
|
||
| static boolean isFailed(V1Job job) { | ||
| /** | ||
|
Contributor
"Test if" is never a good Javadoc comment. This method is better described as "Returns true if the specified job has a failed status or condition." Then you don't need the @return. Also, it is bad style to describe a parameter simply by repeating its name. I usually say something like, "the job to be tested."
Member
Author
Fixed in eaf7651. |
||
| * Returns true if the specified job has a failed status or condition. | ||
| * @param job job to be tested | ||
| */ | ||
| public static boolean isFailed(V1Job job) { | ||
| if (job == null) { | ||
| return false; | ||
| } | ||
|
|
@@ -173,8 +177,12 @@ private static String getStatus(V1JobCondition jobCondition) { | |
| return Optional.ofNullable(jobCondition).map(V1JobCondition::getStatus).orElse(""); | ||
| } | ||
|
|
||
|
|
||
| static String getFailedReason(V1Job job) { | ||
| /** | ||
| * Get the reason for job failure. | ||
| * @param job job | ||
| * @return Job failure reason. | ||
| */ | ||
| public static String getFailedReason(V1Job job) { | ||
| V1JobStatus status = job.getStatus(); | ||
| if (status != null && status.getConditions() != null) { | ||
| for (V1JobCondition cond : status.getConditions()) { | ||
|
|
@@ -298,7 +306,7 @@ void updatePacket(Packet packet, V1Job job) { | |
| // be available for reading | ||
| @Override | ||
| boolean shouldTerminateFiber(V1Job job) { | ||
| return isFailed(job) && ("DeadlineExceeded".equals(getFailedReason(job))); | ||
| return isJobTimedOut(job); | ||
| } | ||
|
|
||
| // create an exception to terminate the fiber | ||
|
|
@@ -328,6 +336,10 @@ public NextAction onSuccess(Packet packet, CallResponse<V1Job> callResponse) { | |
| } | ||
| } | ||
|
|
||
| public static boolean isJobTimedOut(V1Job job) { | ||
| return isFailed(job) && ("DeadlineExceeded".equals(getFailedReason(job))); | ||
| } | ||
|
|
||
| static class DeadlineExceededException extends Exception { | ||
| final V1Job job; | ||
|
|
||
|
|
||
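The diff above extracts the timeout check into `isJobTimedOut`, which combines `isFailed` with a `DeadlineExceeded` failure reason. As a rough, self-contained sketch of that logic, the example below uses hypothetical stand-in classes (`Job`, `Condition`) instead of the Kubernetes client's `V1Job` and `V1JobCondition`. The body of `isFailed` is an assumption, since the diff shows only its null check.

```java
import java.util.List;

public class JobTimeoutSketch {
  // Hypothetical stand-in for V1JobCondition (not the Kubernetes client class).
  static final class Condition {
    final String type;
    final String status;
    final String reason;

    Condition(String type, String status, String reason) {
      this.type = type;
      this.status = status;
      this.reason = reason;
    }
  }

  // Hypothetical stand-in for V1Job, reduced to its status conditions.
  static final class Job {
    final List<Condition> conditions;

    Job(List<Condition> conditions) {
      this.conditions = conditions;
    }
  }

  // Assumed logic: a job is failed if it carries a "Failed" condition with status "True".
  static boolean isFailed(Job job) {
    if (job == null || job.conditions == null) {
      return false;
    }
    for (Condition cond : job.conditions) {
      if ("Failed".equals(cond.type) && "True".equals(cond.status)) {
        return true;
      }
    }
    return false;
  }

  // Returns the reason recorded on the first "Failed" condition, or null if there is none.
  static String getFailedReason(Job job) {
    if (job != null && job.conditions != null) {
      for (Condition cond : job.conditions) {
        if ("Failed".equals(cond.type)) {
          return cond.reason;
        }
      }
    }
    return null;
  }

  // Same shape as the method added in the diff: failed, and failed because of the deadline.
  static boolean isJobTimedOut(Job job) {
    return isFailed(job) && "DeadlineExceeded".equals(getFailedReason(job));
  }

  public static void main(String[] args) {
    Job timedOut = new Job(List.of(new Condition("Failed", "True", "DeadlineExceeded")));
    Job backoff = new Job(List.of(new Condition("Failed", "True", "BackoffLimitExceeded")));
    System.out.println(isJobTimedOut(timedOut));
    System.out.println(isJobTimedOut(backoff));
    System.out.println(isJobTimedOut(null));
  }
}
```

Because `isFailed` is evaluated first and short-circuits on a null job, the reason lookup is never reached for a missing job. That ordering matters in the real code, where `getFailedReason` dereferences `job.getStatus()` without a null check on `job`.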
Main already has functionality for resetting the failure count. Why discard it in favor of the approach used in 3.3.3?
This method resets the failure count in the domain status. I believe this is the same approach as in the latest 3.3.3 release. Previously we had two failure/retry counts: (1) one in the DomainStatus and (2) an in-memory count in DomainPresenceInfo. After the fix for OWLS-90180, we can no longer rely on the in-memory state of the operator, because the introspector job might have been created before the operator started. I have removed the in-memory retry count in DomainPresenceInfo and changed the code to use the failure count in the domain status instead.
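To illustrate why a count kept in the status survives an operator restart while an in-memory count does not, here is a minimal sketch. All names in it (`DomainStatus` as a plain class, `Operator`, `failureCount`, `maxRetries`) are hypothetical stand-ins and not the operator's actual classes or fields.

```java
public class StatusRetryCountSketch {
  // Hypothetical stand-in for the persisted domain status. In a real cluster this
  // state would live in the domain resource, outside the operator process.
  static final class DomainStatus {
    private int failureCount;

    int getFailureCount() {
      return failureCount;
    }

    void incrementFailureCount() {
      failureCount++;
    }

    void resetFailureCount() {
      failureCount = 0;
    }
  }

  // Hypothetical operator that keeps no retry counter of its own and
  // makes every decision from the status it is handed.
  static final class Operator {
    private final int maxRetries;

    Operator(int maxRetries) {
      this.maxRetries = maxRetries;
    }

    boolean shouldRetry(DomainStatus status) {
      return status.getFailureCount() < maxRetries;
    }

    void onIntrospectorFailure(DomainStatus status) {
      status.incrementFailureCount();
    }

    void onIntrospectorSuccess(DomainStatus status) {
      status.resetFailureCount();
    }
  }

  public static void main(String[] args) {
    DomainStatus status = new DomainStatus();

    // The first operator instance sees two failures, then "restarts".
    Operator first = new Operator(3);
    first.onIntrospectorFailure(status);
    first.onIntrospectorFailure(status);

    // A new instance has no memory of its own but still sees the count in the status.
    Operator second = new Operator(3);
    System.out.println("failures seen after restart: " + status.getFailureCount());
    System.out.println("retry allowed: " + second.shouldRetry(status));

    second.onIntrospectorFailure(status);
    System.out.println("retry allowed after third failure: " + second.shouldRetry(status));

    // A successful introspection resets the persisted count.
    second.onIntrospectorSuccess(status);
    System.out.println("failures after success: " + status.getFailureCount());
  }
}
```

With a counter held only inside `Operator`, the second instance would start from zero and could retry indefinitely across restarts.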
I hear the reasoning; I'm just surprised by the size of the change needed to do that.
OK. I used the same approach in PR 2580, which was merged into the latest 3.3.3 release. Please let me know if you have other suggestions for this approach. Thanks.