[JENKINS-57304] - Add call to oneOffExecutors.remove in Computer.removeExecutor #4329
Would it also make sense to check the executor type before calling the method? Just for performance reasons, no strong opinion
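A type check like the one suggested here might look roughly like the sketch below. The classes `ExecutorSketch`, `OneOffExecutorSketch`, and `ComputerSketch` are simplified stand-ins invented for illustration, not the real `hudson.model` classes, and the list types are an assumption.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical stand-ins for illustration only; not the real Jenkins classes.
class ExecutorSketch {}
class OneOffExecutorSketch extends ExecutorSketch {}

class ComputerSketch {
    final List<ExecutorSketch> executors = new CopyOnWriteArrayList<>();
    final List<OneOffExecutorSketch> oneOffExecutors = new CopyOnWriteArrayList<>();

    void removeExecutor(ExecutorSketch e) {
        executors.remove(e);
        // Only scan the one-off list when the executor could actually be in it.
        if (e instanceof OneOffExecutorSketch) {
            oneOffExecutors.remove(e);
        }
    }
}
```

Because `List.remove(Object)` is a no-op when the element is absent, the guard only saves a scan of the one-off list; the observable behavior is the same with or without it.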
AFAICT https://issues.jenkins-ci.org/browse/JENKINS-57304 is going to be fixed by this change, at least partially.
I plan to merge it tomorrow if there is no negative feedback.
This may solve symptoms, but the fix does not look quite right. `oneOffExecutors` is adjusted in `startFlyWeightTask` and `remove(OneOffExecutor)`, the latter of which is called from `Executor.finish2` whenever `this instanceof OneOffExecutor`. Perhaps `finish2` is not being called reliably (there are other places in `interrupt` and `run` which bypass this), which could be breaking other things as well. I suspect
jenkins/core/src/main/java/hudson/model/Executor.java
Lines 320 to 325 in a8853c2
    if (!owner.isOnline()) {
        resetWorkUnit("went off-line before the task's worker thread started");
        owner.removeExecutor(this);
        queue.scheduleMaintenance();
        return;
    }
The steps to reproduce are valuable and should make it possible to write a test reproducing the problem and verifying the fix.
(There is probably a more basic question here as to why it is even possible for the master node to be marked offline to begin with. I think some node monitors treat things like a lack of disk space on the master as a reason to mark it “offline”, which seems like an abuse of the API since controlling, say, a Pipeline build is just one of dozens of things the master may be doing which may require disk I/O. Really it should be up to some lower service wrapper layer to decide when Jenkins is out of hardware resources and should simply be shut down.)
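The bookkeeping described in this comment can be modeled with a toy example. This is a sketch under stated assumptions: the classes `ToyComputer` and `ToyOneOffExecutor` are hypothetical, and the control flow is reduced to the three steps named above (register when the flyweight task starts, return early when the owner is offline, unregister in the finish step). It is not the Jenkins implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model; method names echo the discussion, but the classes are hypothetical.
class ToyComputer {
    boolean online = true;
    final List<ToyOneOffExecutor> oneOffExecutors = new ArrayList<>();

    // Plays the role of startFlyWeightTask: the one-off executor is registered.
    ToyOneOffExecutor startFlyWeightTask() {
        ToyOneOffExecutor e = new ToyOneOffExecutor(this);
        oneOffExecutors.add(e);
        return e;
    }

    // Plays the role of remove(OneOffExecutor).
    void remove(ToyOneOffExecutor e) {
        oneOffExecutors.remove(e);
    }

    // Pre-fix shape: this path does not touch oneOffExecutors.
    void removeExecutor(ToyOneOffExecutor e) {
        // regular executor cleanup would happen here (elided in this toy)
    }
}

class ToyOneOffExecutor {
    final ToyComputer owner;

    ToyOneOffExecutor(ToyComputer owner) {
        this.owner = owner;
    }

    void run() {
        if (!owner.online) {
            owner.removeExecutor(this);
            return; // early return: finish2() is never reached
        }
        try {
            // the task itself would execute here
        } finally {
            finish2();
        }
    }

    void finish2() {
        owner.remove(this);
    }
}
```

In this toy, an executor started while the owner is online is unregistered by `finish2`, whereas one whose `run` hits the offline branch stays in `oneOffExecutors`, which is the "zombie" shape the PR describes.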
@jglick
? jenkins/core/src/main/java/hudson/model/Executor.java Lines 220 to 224 in a8853c2
I agree that a test to reproduce the problem would be needed.
Something like that I guess; it should be safer and clearer to ensure that
To be clear, I am not trying to block this PR, just raising some questions which might be used to improve it a bit.
I already strengthened the queue and I was looking into
@fcojfernandez has confirmed that this PR solves the issue. I have already strengthened the `finish2` method in the mentioned PR (#4346), and @jglick is not actually blocking this PR. The issue is becoming critical for some CloudBees customers, and a test could be hard to write. The PR already has 3 approvals (plus mine). @oleg-nenashev, could you please go ahead and merge it? Thank you.
Given the existing approvals and Ramon's summary, I think we can move forward. This change seems to improve the situation. Since Jesse's comments are also said to be meant "to improve it a bit", we can always file follow-ups to strengthen things further if deemed necessary. I plan to merge this later today if nobody objects. Thank you everyone for the reviews!
I agree with merging.
The follow-up: https://issues.jenkins-ci.org/browse/JENKINS-60348
Thank you everyone! Especially @lgoenner ❤️
[JENKINS-57304] - Add call to oneOffExecutors.remove in Computer.removeExecutor (cherry picked from commit cb424f3)
This PR attempts to stop "zombie executors" from appearing in the status list.
There are several issues describing this problem, for instance JENKINS-55484, JENKINS-55811, JENKINS-57304. I am not certain that this PR fixes all of these issues.
It is possible to reliably create these zombie executors:
The issue is the one-off executor not being removed in
jenkins/core/src/main/java/hudson/model/Executor.java
Line 322 in a8853c2