New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FIXED JENKINS-20272] Don't monitor response on offline agents #2911
Conversation
This pull request originates from a CloudBees employee. At CloudBees, we require that all pull requests be reviewed by other CloudBees employees before we seek to have the change accepted. If you want to learn more about our process please see this explanation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure about this approach. isOffline()
returns true
when the agent is temporaryOffline
:
return temporarilyOffline || getChannel()==null; |
I would argue that we need to run all monitorings against temporaryOffline
agents. Otherwise they will be never marked as offline by the system monitoring, which will probably lead to failures once somebody enables them. Hence 🐛
@@ -49,6 +48,9 @@ | |||
public static final AbstractNodeMonitorDescriptor<Data> DESCRIPTOR = new AbstractAsyncNodeMonitorDescriptor<Data>() { | |||
@Override | |||
protected Callable<Data,IOException> createCallable(Computer c) { | |||
if (c.isOffline()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better to just check that the channel != null and not closing or closed
Otherwise if the agent is taken offline due to the returned metric it cannot come back on-line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original JIRA suggests not wanting to do this for agents that are marked offline manually too - at one point, I had a check here to make sure the OfflineCause
wasn't ResponseTimeMonitor.Data
. Think that's reasonable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think checking OfflineCause would be best
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original JIRA suggests not wanting to do this for agents that are marked offline manually too
I wrote that and I just looked at it again, and there's nothing about this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then I misinterpreted. =)
@oleg-nenashev So...any suggestions? I am not objecting to your objections. =) I just don't know if there's any way to actually address this ticket that can satisfy your objections - if that's the case, fine, then this is a won't fix, I just want to be sure. |
@abayer I think the best way to solve the ticket is to add an optional flag in monitor settings (E.g. "Do not monitor temporary offline nodes"). It would retain the current (correct) behavior and give an option to opt-out. @daniel-beck WDYT? |
No. This option will make no sense to anyone. This is how the MS Word settings dialog (ca. 2003) became such a joke. We need to try to fight this impulse to make everything configurable. A disconnected node doesn't need to be checked. That's what JENKINS-20272 discusses, and what should be uncontroversial. The motivation for going further with nodes marked offline is entirely unclear to me. |
I agree with DB. We should just not monitor nodes if the connection is off-line. |
Ok, I'll switch to just not monitoring disconnected agents. |
@abayer I am not sure if it is 100% correct. It will set a monitoring error.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pinged @abayer . I would rather block the PR till he responds. Too late for 2.60.1 anyway, but it still can get into 2.60.2
@oleg-nenashev Not sure I understand your previous comment? |
@abayer Well, I need to re-review it. The cache has been already invalidated |
heh. |
@abayer
So, the end result of the monitor will be "failed to monitor", not "no need to monitor" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even if I disagree with the current change, I admit it's better than the original behavior.
So I approve it in order to get it in .3 after testing in the community
I will handle the fallout if it happens |
Description
See JENKINS-20272.
Details: Don't monitor response time on offline agents. We know they're offline. We don't need to check.
Changelog entries
Proposed changelog entries:
Submitter checklist
Desired reviewers
@reviewbybees @oleg-nenashev @daniel-beck