[FIXED JENKINS-20272] Don't monitor response on offline agents #2911

abayer · 2017-06-07T21:45:40Z

Description

See JENKINS-20272.

Details: Don't monitor response time on offline agents. We know they're offline. We don't need to check.

Changelog entries

Proposed changelog entries:

Entry 1: JENKINS-20272, Don't monitor response time on offline agents.

Submitter checklist

JIRA issue is well described
Link to JIRA ticket in description, if appropriate
Appropriate autotests or explanation to why this change has no tests
For new API and extension points: Link to the reference implementation in open-source (or example in Javadoc)

Desired reviewers

@reviewbybees @oleg-nenashev @daniel-beck

ghost · 2017-06-07T21:46:37Z

This pull request originates from a CloudBees employee. At CloudBees, we require that all pull requests be reviewed by other CloudBees employees before we seek to have the change accepted. If you want to learn more about our process please see this explanation.

oleg-nenashev

I am not sure about this approach. isOffline() returns true when the agent is temporarilyOffline:

jenkins/core/src/main/java/hudson/model/Computer.java

Line 636 in bd2ccfb

    return temporarilyOffline || getChannel()==null;

This means the agent can be online and merely disabled for whatever reason (should we finally change this confusing terminology?). In that case the channel stays active, and the agent may be reactivated at any time without preliminary checks.

I would argue that we need to run all monitors against temporarilyOffline agents. Otherwise the system monitoring will never mark them as offline, which will probably lead to failures once somebody enables them. Hence 🐛
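To make the distinction in this comment concrete, here is a minimal sketch, not the actual Jenkins class. AgentSketch and isDisconnected() are hypothetical names used only for illustration; the body of isOffline() mirrors the line quoted from Computer.java above.

```java
// Simplified stand-in for hudson.model.Computer; only the offline logic is modeled.
final class AgentSketch {
    private final boolean temporarilyOffline; // set when the agent has been marked offline
    private final Object channel;             // null when there is no connection to the agent

    AgentSketch(boolean temporarilyOffline, Object channel) {
        this.temporarilyOffline = temporarilyOffline;
        this.channel = channel;
    }

    // Same expression as the quoted line: marked offline OR not connected.
    boolean isOffline() {
        return temporarilyOffline || channel == null;
    }

    // Hypothetical helper: true only when there is no connection at all.
    boolean isDisconnected() {
        return channel == null;
    }

    public static void main(String[] args) {
        AgentSketch disabled = new AgentSketch(true, new Object()); // marked offline, channel alive
        AgentSketch gone = new AgentSketch(false, null);            // really disconnected

        System.out.println(disabled.isOffline() + " " + disabled.isDisconnected()); // prints "true false"
        System.out.println(gone.isOffline() + " " + gone.isDisconnected());         // prints "true true"
    }
}
```

Both agents report isOffline() == true, so a guard based on isOffline() alone cannot tell a disabled but connected agent from a disconnected one.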

stephenc · 2017-06-08T07:32:52Z

core/src/main/java/hudson/node_monitors/ResponseTimeMonitor.java

@@ -49,6 +48,9 @@
    public static final AbstractNodeMonitorDescriptor<Data> DESCRIPTOR = new AbstractAsyncNodeMonitorDescriptor<Data>() {
        @Override
        protected Callable<Data,IOException> createCallable(Computer c) {
+            if (c.isOffline()) {


Better to just check that the channel != null and is not closing or closed.

Otherwise, if the agent is taken offline due to the returned metric, it cannot come back online.

abayer · 2017-06-09

The original JIRA suggests not wanting to do this for agents that are marked offline manually too - at one point, I had a check here to make sure the OfflineCause wasn't ResponseTimeMonitor.Data. Think that's reasonable?

stephenc · 2017-06-09

I think checking OfflineCause would be best.

daniel-beck · 2017-06-15

> The original JIRA suggests not wanting to do this for agents that are marked offline manually too

I wrote that and I just looked at it again, and there's nothing about this.

abayer · 2017-06-15

Then I misinterpreted. =)
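The two guards discussed in this review thread can be compared side by side. The sketch below is one possible reading of the proposals, not the code in the PR: every type and method name (MonitorGuardSketch, Agent, ResponseTimeCause, ManualCause, monitorIfConnected, monitorUnlessOtherCause) is hypothetical, and ResponseTimeCause only plays the role that ResponseTimeMonitor.Data plays as an offline cause.

```java
final class MonitorGuardSketch {
    /** Hypothetical stand-in for an agent's offline cause. */
    interface OfflineCause {}

    /** Plays the role of ResponseTimeMonitor.Data acting as an offline cause. */
    static final class ResponseTimeCause implements OfflineCause {}

    /** Plays the role of a user marking the agent offline by hand. */
    static final class ManualCause implements OfflineCause {}

    /** Hypothetical, simplified agent: just the two facts the guards look at. */
    static final class Agent {
        final boolean channelOpen;       // false when disconnected, closing, or closed
        final OfflineCause offlineCause; // null when the agent is not marked offline

        Agent(boolean channelOpen, OfflineCause offlineCause) {
            this.channelOpen = channelOpen;
            this.offlineCause = offlineCause;
        }
    }

    // Channel-only guard: skip nothing but agents without a usable channel.
    static boolean monitorIfConnected(Agent a) {
        return a.channelOpen;
    }

    // OfflineCause guard: also skip agents marked offline for another reason,
    // but keep measuring agents this monitor itself took offline so they can recover.
    static boolean monitorUnlessOtherCause(Agent a) {
        if (!a.channelOpen) {
            return false;
        }
        return a.offlineCause == null || a.offlineCause instanceof ResponseTimeCause;
    }

    public static void main(String[] args) {
        Agent slow = new Agent(true, new ResponseTimeCause());
        Agent manual = new Agent(true, new ManualCause());
        Agent gone = new Agent(false, null);

        System.out.println(monitorIfConnected(slow) + " " + monitorUnlessOtherCause(slow));     // prints "true true"
        System.out.println(monitorIfConnected(manual) + " " + monitorUnlessOtherCause(manual)); // prints "true false"
        System.out.println(monitorIfConnected(gone) + " " + monitorUnlessOtherCause(gone));     // prints "false false"
    }
}
```

The guards differ only for a connected agent that was marked offline for a reason other than response time.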

abayer · 2017-06-14T19:41:25Z

@oleg-nenashev So...any suggestions? I am not objecting to your objections. =) I just don't know if there's any way to actually address this ticket that can satisfy your objections - if that's the case, fine, then this is a won't fix, I just want to be sure.

oleg-nenashev · 2017-06-16T13:29:41Z

@abayer I think the best way to solve the ticket is to add an optional flag in the monitor settings (e.g. "Do not monitor temporary offline nodes"). It would retain the current (correct) behavior and give an option to opt out. @daniel-beck WDYT?

daniel-beck · 2017-06-17T00:41:22Z

> optional flag in monitor settings

No. This option will make no sense to anyone. This is how the MS Word settings dialog (ca. 2003) became such a joke. We need to try to fight this impulse to make everything configurable.

A disconnected node doesn't need to be checked. That's what JENKINS-20272 discusses, and what should be uncontroversial. The motivation for going further with nodes marked offline is entirely unclear to me.

stephenc · 2017-06-17T06:35:17Z

I agree with DB. We should just not monitor nodes if the connection is off-line.

abayer · 2017-06-19T15:31:50Z

Ok, I'll switch to just not monitoring disconnected agents.

oleg-nenashev · 2017-06-19T17:20:11Z

@abayer I am not sure if it is 100% correct. It will set a monitoring error.

    if (d == null) {
        // if we failed to monitor, put in the special value that indicates a failure
        e.setValue(d=new Data(get(c),-1L));
    }

oleg-nenashev

Pinged @abayer. I would rather block the PR until he responds. It is too late for 2.60.1 anyway, but it can still get into 2.60.2.

abayer · 2017-06-29T15:52:27Z

@oleg-nenashev Not sure I understand your previous comment?

oleg-nenashev · 2017-07-03T06:59:48Z

@abayer Well, I need to re-review it. The cache has already been invalidated.

abayer · 2017-07-03T15:46:44Z

heh.

oleg-nenashev · 2017-07-29T09:01:57Z

@abayer
OK, got to it. You return null, but null is not "fine". This line handles null as a monitoring failure:

jenkins/core/src/main/java/hudson/node_monitors/ResponseTimeMonitor.java

Line 63 in 08def67

    e.setValue(d=new Data(get(c),-1L));

So, the end result of the monitor will be "failed to monitor", not "no need to monitor"
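The effect described here can be sketched in isolation. This is a simplified model under stated assumptions, not the real ResponseTimeMonitor: NullResultSketch, its Data class, isFailure() and postProcess() are hypothetical names, and the real constructor also takes the previous data (get(c)), which is omitted here.

```java
final class NullResultSketch {
    /** Hypothetical stand-in for ResponseTimeMonitor.Data; -1 marks a failed measurement. */
    static final class Data {
        final long timeMillis;

        Data(long timeMillis) {
            this.timeMillis = timeMillis;
        }

        boolean isFailure() {
            return timeMillis < 0;
        }
    }

    // Mirrors the quoted post-processing step: a null result becomes the failure marker.
    static Data postProcess(Data d) {
        if (d == null) {
            d = new Data(-1L);
        }
        return d;
    }

    public static void main(String[] args) {
        Data measured = postProcess(new Data(42L)); // agent answered in 42 ms
        Data skipped = postProcess(null);           // monitoring was skipped, so no result

        System.out.println(measured.isFailure()); // prints "false"
        System.out.println(skipped.isFailure());  // prints "true"
    }
}
```

In this model a skipped agent ends up with the same failure marker as an agent whose measurement failed, which is the "failed to monitor" outcome the comment points at.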

oleg-nenashev

Even if I disagree with the current change, I admit it's better than the original behavior.

So I approve it in order to get it in .3 after testing in the community

oleg-nenashev · 2017-07-30T12:43:25Z

I will handle the fallout if it happens

[FIXED JENKINS-20272] Don't monitor response on offline agents

e6ce0fb

oleg-nenashev requested changes Jun 8, 2017

stephenc suggested changes Jun 8, 2017

Updating to only not check if channel is null.

7bc0ffd

stephenc approved these changes Jun 19, 2017

daniel-beck approved these changes Jun 19, 2017

Fix broken test.

dbc91d5

oleg-nenashev added the on-hold label Jun 23, 2017

oleg-nenashev requested changes Jun 23, 2017

oleg-nenashev approved these changes Jul 29, 2017

oleg-nenashev added the ready-for-merge label and removed the on-hold label Jul 29, 2017

oleg-nenashev merged commit 5f125d1 into jenkinsci:master Jul 30, 2017

olivergondza mentioned this pull request May 22, 2018

[JENKINS-20272] - Disconnected nodes should not be disconnected repeatedly #3453 (merged)
