-
-
Notifications
You must be signed in to change notification settings - Fork 8.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[JENKINS-59793] Avoid hanging jobs with faulty SubTasks #4346
Conversation
The failure in the new test without the fix in the code: https://ci.jenkins.io/blue/organizations/jenkins/Core%2Fjenkins/detail/PR-4346/1/tests
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For all reviews, note this issue has been happening also in http://ci.jenkins.io
@MRamonLeon my doubt here is: why has the issue has started appearing now? The metrics plugin does not seem to have important changes related to this.
|
||
// We get stalled waiting the finalization of the job | ||
FreeStyleBuild build = future.get(5, TimeUnit.SECONDS); | ||
assertTrue("The message is printed in the log", logs.getRecords().stream().anyMatch(r -> r.getThrown().getMessage().equals(ERROR_MESSAGE))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure this is really necessary (reaching this point would be enough to assert the fix?) and you could save all the logger rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, not needed if the code passes the get
. It's a reminiscence from a lot of tests I've been doing. I can remove it.
// When using SubTaskContributor (FailingSubTaskContributor) the build never ends | ||
@Test | ||
@Issue("JENKINS-59793") | ||
public void buildDoesntFinishWhenSubTaskFails() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public void buildDoesntFinishWhenSubTaskFails() throws Exception { | |
public void buildFinishesWhenSubTaskFails() throws Exception { |
I think the name is appropriate before the fix but after the fix it should say what's actually asserting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pushed with the other change
assertThat("Build should be actually scheduled by Jenkins", future, notNullValue()); | ||
|
||
// We get stalled waiting the finalization of the job | ||
FreeStyleBuild build = future.get(5, TimeUnit.SECONDS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should you also check the build result is a failure?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to assert that, the purpose of the test is to guarantee we pass this point, so let's keep the minimum needed as agreed above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some suggestions for even more robustness and perhaps covering another edge case.
Your change makes things better (hence approving) so if you would rather merge your change first to get an improvement out and then examine my proposal in a subsequent PR that's totally fine
Co-authored-by: Stephen Connolly <stephen.alan.connolly@gmail.com>
Added @stephenc's suggestions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I had to roll back to the original fix because the test was failing. Jenkins was shutting down after an unexpected exception in the test. I couldn't figure out why it was happening, so better we let this PR get merged and we improve it in a followup PR if desired. |
I plan to merge it tomorrow if no negative feedback |
OK, moving on given no negative feedback since yesterday. Thank you eveyone! |
[JENKINS-59793] Avoid hanging jobs with faulty SubTasks (cherry picked from commit 5f81077)
See JENKINS-59793.
I have created a test to reproduce the status reached by several private Jenkins instances which have thousand of Metric Plugin threads (QueueSubTaskMetrics) running. It's all because if a Job has faulty SubTasks (throwing an unmanaged exception), the
future
object is not set in the WorkUnitContext#synchronizeEnd method. It may also happen in WorkUnitContext#synchronizeStart but I have not reproduced this case).Whatever code that keeps waiting for the start or the end of the build will stay there forever. The Metrics Plugin is impacted by this here:
Called from: https://github.com/jenkinsci/metrics-plugin/blob/24bf92ebd59095d8f77b2552696b3f210a024bdc/src/main/java/jenkins/metrics/impl/JenkinsMetricProviderImpl.java#L944
I will let the CI fail and then I will push the fix.
Proposed changelog entries
Submitter checklist
* Use the
Internal:
prefix if the change has no user-visible impact (API, test frameworks, etc.)Desired reviewers
@daniel-beck @varyvol @batmat @alecharp @rsandell @fcojfernandez @stephenc