Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FIX JENKINS-41763] Show a column with the Java version of an agent #1

Merged
merged 8 commits into from May 29, 2017

Conversation

@batmat
Copy link
Member

batmat commented Feb 26, 2017

  • Updated to parent pom 2.x, and core 1.625.3 (I could probably go a bit lower, but didn't want to possibly fight too much with older dependencies and so on)
  • Added a JVM Version Monitor

Adds a "JVM Version column" :
image

And options to :

  • force an exact match
    • default is to consider only 1.8 in 1.8.112 or whatever, i.e. only first two groups of digits ; and
  • ask for no disconnection of agents with mismatching JVM
    • default is to disconnect though, more sensible IMO

image

@jenkinsci/code-reviewers

Copy link
Member

oleg-nenashev left a comment

🐛 for the wrong help and the "Agent version"

@@ -1,3 +1,6 @@
VersionMonitor.DisplayName=Version
VersionMonitor.DisplayName=Agent Version

This comment has been minimized.

Copy link
@oleg-nenashev

oleg-nenashev Feb 27, 2017

Member

"Remoting version" 🐛 . Just because agents may have different versioning (e.g. in Swarm plugin)

@@ -0,0 +1,7 @@
<div>
<p>Check this checkbox if you prefer to have a very non-permissive environment with regard to JVM Versions.

This comment has been minimized.

Copy link
@oleg-nenashev

oleg-nenashev Feb 27, 2017

Member

🐛 copy-pasta, from another help box

<f:entry title="${%Exact Match for JVM Version}" field="exactMatch">
<f:checkbox />
</f:entry>
<f:entry title="${%Do NOT disconnect agent when incompatibility is found}" field="noDisconnect">

This comment has been minimized.

Copy link
@oleg-nenashev

oleg-nenashev Feb 27, 2017

Member

IMHO disconnect should be opt-in since having different Java version is a valid case in Jenkins

This comment has been minimized.

Copy link
@batmat

batmat Feb 27, 2017

Author Member

I disagree Oleg, it is just not worth it. At least I introduced a way to opt-out, which is not present yet for remoting. This is just too risky and plugins developer can easily make this fail IMO, and I've seen that exact kind of stack trace in production environment when people were mixing a JDK 8 on Master, and a JDK 7 on the agent. Cryptic java.lang.ClassFormatError for end-user.
Just putting a JDK 8 on the agent too fixed it for good.

I just tried with a dead simple example. From a Master running JDK8, to an agent running JDK7: I sent a Callable like this:

private static class SomeCallable extends MasterToSlaveCallable<String, IOException> {
    @Override
    public String call() throws IOException {
        return java.time.LocalDateTime.now().toString();
    }
}
java.util.concurrent.ExecutionException: java.lang.ClassFormatError: Failed to load hudson.plugin.versioncolumn.DemoNodeMonitor$SomeCallable
        at hudson.remoting.Channel$2.adapt(Channel.java:813)
        at hudson.remoting.Channel$2.adapt(Channel.java:808)
        at hudson.remoting.FutureAdapter.get(FutureAdapter.java:59)
        at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:97)
        at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:306)
Caused by: java.lang.ClassFormatError: Failed to load hudson.plugin.versioncolumn.DemoNodeMonitor$SomeCallable
        at hudson.remoting.RemoteClassLoader.loadClassFile(RemoteClassLoader.java:340)
        at hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:251)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:274)
        at hudson.remoting.MultiClassLoaderSerializer$Input.resolveClass(MultiClassLoaderSerializer.java:114)
        at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
        at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
        at hudson.remoting.UserRequest.deserialize(UserRequest.java:184)
        at hudson.remoting.UserRequest.perform(UserRequest.java:98)
        at hudson.remoting.UserRequest.perform(UserRequest.java:48)
        at hudson.remoting.Request$2.run(Request.java:326)
        at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at hudson.remoting.Engine$1$1.run(Engine.java:62)
        at java.lang.Thread.run(Thread.java:745)
        at ......remote call to agent(Native Method)
        at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1416)
        at hudson.remoting.UserResponse.retrieve(UserRequest.java:220)
        at hudson.remoting.Channel$2.adapt(Channel.java:811)
        ... 4 more
Caused by: java.lang.UnsupportedClassVersionError: hudson/plugin/versioncolumn/DemoNodeMonitor$SomeCallable : Unsupported major.minor version 52.0
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:643)
        at hudson.remoting.RemoteClassLoader.loadClassFile(RemoteClassLoader.java:338)
        at hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:251)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:274)
        at hudson.remoting.MultiClassLoaderSerializer$Input.resolveClass(MultiClassLoaderSerializer.java:114)
        at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
        at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
        at hudson.remoting.UserRequest.deserialize(UserRequest.java:184)
        at hudson.remoting.UserRequest.perform(UserRequest.java:98)
        at hudson.remoting.UserRequest.perform(UserRequest.java:48)
        at hudson.remoting.Request$2.run(Request.java:326)
        at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at hudson.remoting.Engine$1$1.run(Engine.java:62)
        at java.lang.Thread.run(Thread.java:745)
import java.io.IOException;
import java.util.logging.Logger;

public class JVMVersionMonitor extends NodeMonitor {

This comment has been minimized.

Copy link
@oleg-nenashev

oleg-nenashev Feb 27, 2017

Member

IMHO we should create a better ValueNodeMonitor API so simplify the logic. Not in this PR, but creating NodeMonitors requires too much efforts

This comment has been minimized.

Copy link
@batmat

batmat Feb 27, 2017

Author Member

Having written also https://github.com/batmat/inodes-monitor-plugin/ in the past, I can only agree! This should be easier.

@@ -1,3 +1,4 @@
<?jelly escape-by-default='true'?>
<div>
This plugin shows the version of slaves column on "Manage Nodes" page.

This comment has been minimized.

Copy link
@oleg-nenashev

oleg-nenashev Feb 27, 2017

Member

This description needs to be updated a bit 🐛

@@ -4,7 +4,7 @@
<parent>
<groupId>org.jenkins-ci.plugins</groupId>
<artifactId>plugin</artifactId>
<version>1.401</version>
<version>2.21</version>
</parent>

<artifactId>versioncolumn</artifactId>

This comment has been minimized.

Copy link
@oleg-nenashev

oleg-nenashev Feb 27, 2017

Member

The name below should be updated IMHO. The original one was not good, with newer functionality it's getting worse

This comment has been minimized.

Copy link
@batmat

batmat Feb 27, 2017

Author Member

NIT: you should comment on the line above, not below. Because if not Github shows the line above, not below making reviewing reviews not very easy 😋

Going to try something, any great idea/proposal?

This comment has been minimized.

Copy link
@batmat

batmat Feb 27, 2017

Author Member

@oleg-nenashev what do you have in mind? I was about to use like "Jenkins Versions Node Monitors Plugin", but then I'm wondering if as this plugin is very old this might not confuse people.
Well at the same time, long time Jenkins will understand at some point, and it will clarify possibly for newcomers.

This comment has been minimized.

Copy link
@oleg-nenashev

oleg-nenashev Feb 28, 2017

Member

Or maybe just System Node Monitors Plugin, Advanced Node Monitors Plugin or whatever. The description should help with details

@oleg-nenashev
Copy link
Member

oleg-nenashev commented Feb 27, 2017

Also CC @varmenise since she was working on similar things in the core

return Util.wrapToErrorSpan(version);
}
return (version == null) ? "" : version;
}

This comment has been minimized.

Copy link
@batmat

batmat Feb 27, 2017

Author Member

Note to self, actually not used anymore, even from jelly. To be removed.

</parent>

<artifactId>versioncolumn</artifactId>
<packaging>hpi</packaging>
<version>0.3-SNAPSHOT</version>
<name>Jenkins Version Column plugin</name>
<version>2.0-SNAPSHOT</version>

This comment has been minimized.

Copy link
@batmat

batmat Feb 27, 2017

Author Member

bumped the major version to clarify/express the core requirement bump and new features (though 100% compatible)

@batmat
Copy link
Member Author

batmat commented Feb 27, 2017

@oleg-nenashev I think I addressed your comments.

@cobexer
Copy link

cobexer commented Feb 28, 2017

@batmat Could you please NOT scream at users and switch the option around?
"Do NOT disconnect agent when incompatibility is found"
=>
"Dsconnect agents when incompatibility is found" and default it to checked instead of not checked, same for the help text.

Other than that:
very useful in making sure all nodes are compatible! I already tried it and it helped a lot!

@oleg-nenashev
Copy link
Member

oleg-nenashev commented Feb 28, 2017

Even after #1 (comment) , I also still think the behavior should be opt-in.

I do not buy the example regarding LocalDateTime, because this class exists starting from Java 8. Hence of course it is expected to fail. In Jenkins we expect plugin maintainers to retain compatibility with Java 7 tough they can create a Java 8-only plugin if they want. But IMHO this is not a reason to force admins to have this monitor && to cause a fallout in their setup once they enable it occasionally without clicking the checkbox.

@jglick
Copy link
Member

jglick commented Feb 28, 2017

Agreed with @oleg-nenashev. Agents must run a JVM at least as new as the minimum version supported for the current version of master—currently, that means Java 7. There is nothing unsupported about running the master on 8 and the agent on 7 (though of course we encourage 8 generally).

@jglick
Copy link
Member

jglick commented Feb 28, 2017

plugins developer can easily make this fail

Really? I do not think so. If you are using the 2.x parent POM and do not override java.level to be 8, your build will fail if you accidentally use an 8+ API: NoClassDefFoundError. (If you do override java.level to be 8, you will indeed get ClassFormatError, because nothing your plugin does can run on that agent even if it using 7- APIs. But that is a long-filed core RFE to allow a plugin to declare its minimum Java version.)

@batmat
Copy link
Member Author

batmat commented Mar 3, 2017

@jglick @oleg-nenashev right, I get your point. I actually especially developed that addition with the upcoming Java 8 core baseline bump in mind (now ~1 month away for first weekly), hence having it far more common over time to have the master with Java 8 JVM and at the Java 8 bytecode level. Maybe I could try to read some core class in the plugin to see if I'm really running a JDK8+ core, or not, and try to compare it with agent's. Hence why I was saying this would be "easy to make it fail": because with the JDK8 bump (and some plugins didn't wait for it as you know) this is going to become more and more common, hence causing issues.

So, even if I understand this should be OK, I'm not sure what we earn here choosing a permissive behaviour.

I am fine with making it permissive by default, though, people will be able to make it aggressive/non permissive if they want to. I'm just still unconvinced that having a non-permissive default wouldn't be simpler and less error-prone for the vast majority of users.

@batmat
Copy link
Member Author

batmat commented Mar 4, 2017

Giving this more thoughts, then I think it's indeed important to still enable by default what you're summarizing @jglick: disconnect agents when they're running a JVM whose level is lower than the bytecode level of the Master.

It will for instance disconnect agents still running Java 6 from Masters running 1.65x or 2.3x. Or will disconnect agents running Java 7 from upcoming upgraded core on Java 8, which was in the end my main goal.

Summarized/clarified in https://gist.github.com/batmat/f43ad859d5f8e93c00dd818444255d5e

WDYT @jglick @oleg-nenashev?

cc @daniel-beck as I remember we discussed versioncolumn plugin some time ago.

@oleg-nenashev
Copy link
Member

oleg-nenashev commented Mar 4, 2017

@batmat I'm still 👎 about disconnecting by default unless the version is KNOWN to be unsupported. So disconnecting Java 6 agent from the Jenkins 1.625 is fine, but there is no reason to disconnect Java 7 masters from the 2.46 server running on Java 8. Just because such configuration is formally supported and generally fine.

@MarkEWaite
Copy link
Contributor

MarkEWaite commented Mar 4, 2017

I'm more in favor of "fail fast" instead of having surprising bugs (that won't be fixed) like JENKINS-33907. That bug showed an unsupported class version exception but only when trying to use a wildcard in a branch name on a job running on an agent. However, I'm also ok with leaving it to not reject more aggressively just in case there are desirable cases where it works now and we don't mind that it continues working.

There are plenty of places in the git plugin where I feel like we have "happy accidents" that have been preserved as behaviors whether or not we like them. Many users seem to inadvertently depend on those "happy accidents" (or "undetected bugs that we can't ever fix", depending on my mood).

@daniel-beck
Copy link
Member

daniel-beck commented Mar 4, 2017

The plugin could make disconnecting major version differences the default for new installs, but upgrade to not disconnect. This way it won't "break" Jenkins on upgrades, but new instances will get safe behavior.

@batmat
Copy link
Member Author

batmat commented Mar 5, 2017

@oleg-nenashev I've been re-reading your message 4 times, and I think this is actually what I was proposing in my previous comment, and summarized hopefully more clearly in https://gist.github.com/batmat/f43ad859d5f8e93c00dd818444255d5e

no reason to disconnect Java 7 masters from the 2.46 server running on Java 8.

Did I say that somewhere (it's very possible I left some bad copy paste)? My example was with a future 2.60 Master, that an agent running Java 7 would indeed be disconnected from.


However, my initial take was: what do we seriously earn by being permissive? What do we actually lose being non-permissive?

IMO just more weird issues like @MarkEWaite linked, that many people suffered from, even @olivergondza commented there, and I suppose Oliver cannot be considered a Jenkins beginner.

I believe we may be favoring here technical aspects: "do not disconnect because that should always work" instead of saying "well, OK, this may be a bit paranoid but we'd be far more stable for no obvious cons".

I did see ClassFormatError on production's instances from customers. And they absolutely didn't know what to do. When I told them to upgrade their agent's JVM version, they didn't even complain, just did it, and this fixed their outage and they were happy.

You may be thinking: well the weird issue linked by Mark is very special, because the JGit plugin is probably trying to be smart and detect the JVM it's running in, then makes the whole thing fail. Right. But 1) the git plugin is installed 107k times as of February 2017, i.e. 80% of the Jenkins instances out there and 2) what would prevents any Java library to do the same magic, any time? This is not even in the plugin's code, but in a dependency of it, hence harder to fix.

About disconnection being an issue: I don't buy it. This would happen very early during the startup of the Master (or the connection of the agent when creating it), hence would probably not even fail one single build. And clicking on the disconnect node would immediately tell you why it was disconnected (and I could enrich the message & docs to make possible options to re-enable even clearer).
Having been a Jenkins admin in the past, I loved when Jenkins would take an agent offline because of low disk space. Even if sometimes some builds could probably have been able to go there for a while. IMO this is the same here: this would be valuable to just be non permissive, in favor of stability, even if sometimes this shouldn't be an issue.

Also: Java 8 core upgrade is coming. Java 7 has been there for years, I don't think we generate the stats about agents' JVM but my guess is that users running Java 6 for their agents are getting very rare.
When the core will have been upgraded to Java 8, and this will probably far more common to have Java 7 agents out there, then I'm afraid this issue may become more common (I hope to be proved wrong).

@oleg-nenashev
Copy link
Member

oleg-nenashev commented Mar 7, 2017

@batmat Maybe I misread the table.
If the second row is "no", then I'm fine with the approach. If the version is KNOWN to be unsupported, then it's fine to disconnect the agent by default.

@batmat batmat changed the title [FIX JENKINS-41763] Show a column with the Java version of an agent WIP [FIX JENKINS-41763] Show a column with the Java version of an agent Mar 30, 2017
@batmat batmat self-assigned this Mar 30, 2017
@batmat batmat changed the title WIP [FIX JENKINS-41763] Show a column with the Java version of an agent [FIX JENKINS-41763] Show a column with the Java version of an agent Apr 5, 2017
@batmat
Copy link
Member Author

batmat commented Apr 14, 2017

I believe I've addressed comments/concerns in the last commit. Can you please review the current code? Thank you

Copy link
Member

oleg-nenashev left a comment

The code looks good to me though I suppose there is some duplication with the logic available in the core (maybe API is restricted?).

The only important comment is to get rid of slave.jar references in UI and logs. The plugin does not really depend on this file.

<version>0.3-SNAPSHOT</version>
<name>Jenkins Version Column plugin</name>
<version>2.0-SNAPSHOT</version>
<name>Jenkins Versions Node Monitors plugin</name>

This comment has been minimized.

Copy link
@oleg-nenashev

oleg-nenashev Apr 14, 2017

Member

Maybe "Version"

@@ -28,16 +29,32 @@

<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<jenkins.version>1.625.3</jenkins.version>
<hpi-plugin.version>1.115</hpi-plugin.version>

This comment has been minimized.

Copy link
@oleg-nenashev

oleg-nenashev Apr 14, 2017

Member

Any need to define it explicitly?

@@ -72,7 +75,7 @@ public NodeMonitor newInstance(StaplerRequest req, JSONObject formData) throws F
}
};

private static final class SlaveVersion implements Callable<String, IOException> {
private static final class SlaveVersion extends MasterToSlaveCallable<String, IOException> {

This comment has been minimized.

Copy link
@oleg-nenashev

oleg-nenashev Apr 14, 2017

Member

Duplicates method in the core, NIT for this PR

VersionMonitor.OfflineCause=This node is offline because it uses old slave.jar
VersionMonitor.MarkedOffline=Making {0} offline temporarily due to the use of old slave.jar
VersionMonitor.DisplayName=Remoting Version
VersionMonitor.OfflineCause=This node is offline because it uses an old slave.jar

This comment has been minimized.

Copy link
@oleg-nenashev

oleg-nenashev Apr 14, 2017

Member

I would say "an Old Remoting version".
There is no guarantee that the file is named slave.jar (or exist at all)

This comment has been minimized.

Copy link
@daniel-beck

daniel-beck Apr 14, 2017

Member

I don't think we expose the term "remoting" anywhere (but perhaps the changelog) and we shouldn't start here.

This comment has been minimized.

Copy link
@oleg-nenashev

oleg-nenashev Apr 14, 2017

Member

"old agent version"? It gets quite more ambiguous, but slave.jar is also wrong

JVMVersionMonitor.MarkedOffline=Making {0} offline temporarily due to using an incompatible JVM version between agent and master (master={1}, agent={2})
JVMVersionMonitor.RUNTIME_GREATER_OR_EQUAL_MASTER_BYTECODE=Agent runtime must be greater or equal than the Master bytecode level (strongly recommended minimum)
JVMVersionMonitor.MAJOR_MINOR_MATCH=Agent major.minor must be equal to Master major.minor JVM version (paranoid version)
JVMVersionMonitor.EXACT_MATCH=Agent JVM version must be exactly the same as Master JVM version (paranoid++ version)

This comment has been minimized.

Copy link
@oleg-nenashev

oleg-nenashev Apr 14, 2017

Member

line break

@batmat
Copy link
Member Author

batmat commented May 29, 2017

As this PR has been sitting here for quite long, and in the last update Oleg approved it, I'm going to merge & release this.

I will try and address the non critical comments in a followup PR.

Thanks a lot for your reviews, especially @oleg-nenashev

@batmat batmat merged commit 0dae258 into jenkinsci:master May 29, 2017
1 check passed
1 check passed
Jenkins This pull request looks good
Details
@batmat batmat deleted the batmat:JENKINS-41763 branch May 29, 2017
@batmat
Copy link
Member Author

batmat commented Aug 7, 2017

FTR this got released as 2.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

6 participants
You can’t perform that action at this time.