Do not swallow IOException in case it is not recoverable #41

Merged (1 commit) on Jan 28, 2017

olivergondza
Member

There are 2 changes here:

  • More failure causes are considered recoverable.
  • If the exception is considered non-recoverable, it is thrown instead of being ignored.

https://issues.jenkins-ci.org/browse/JENKINS-41163
https://issues.jenkins-ci.org/browse/JENKINS-26379
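A minimal Java sketch of the behaviour this PR describes: keep retrying while the failure is considered recoverable, and rethrow the IOException as soon as it is not, or once the attempts run out. Connector, RetryLoop, maxAttempts and the isRecoverable predicate are hypothetical names chosen for illustration; this is not the plugin's actual code, and a real implementation would presumably also wait between attempts.

```java
import java.io.IOException;
import java.util.function.Predicate;

/** Hypothetical stand-in for the call that opens the SSH connection. */
interface Connector {
    void connect() throws IOException;
}

/** Hypothetical helper illustrating "retry if recoverable, otherwise rethrow". */
final class RetryLoop {
    private RetryLoop() {}

    /**
     * Calls connector.connect() up to maxAttempts times and returns the number
     * of the attempt that succeeded. A failure the predicate rejects is rethrown
     * immediately (fail fast); a recoverable failure is retried, and the last one
     * is rethrown when no attempts remain.
     */
    static int connect(Connector connector, int maxAttempts,
                       Predicate<IOException> isRecoverable) throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                connector.connect();
                return attempt; // connected: the caller can go on to authenticate
            } catch (IOException e) {
                if (!isRecoverable.test(e) || attempt >= maxAttempts) {
                    throw e; // never swallow the failure
                }
                // recoverable and attempts remain: loop and try again
            }
        }
    }
}
```

With this shape the loop can only end with a successful connect() or a thrown IOException, so nothing after it runs against an unconnected session.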

@olivergondza
Member Author

@jenkinsci/code-reviewers, pretty please. I would like to get this released soon.

Member

mc1arke left a comment

I'm not clear on what IOExceptions we'd expect to be recovering from after this change - the point of the retry seems to be to overcome any network issues which this change now seems to be preventing, so why still have the retry?

```java
// Premature connection close
private boolean isRecoverable(String message) {
    List<String> knownFailures = Arrays.asList(
            "Connection refused", "Connection reset", "Connection timed out", "No route to host", "Premature connection close"
```
Member

Can this list not be made a static field in this class rather than re-generating it for every failed connection attempt?

Member Author

It should be.
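One way to follow the reviewer's suggestion is to build the list once as a static final constant. A minimal sketch, assuming the check is a substring match on the exception message: the review hunk does not show how the plugin actually matches, so the contains() rule, the null handling, making the method static, and the class name RecoverableFailures are all assumptions. The five messages are the ones listed in that hunk.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical holder for the recoverable-failure check. */
final class RecoverableFailures {
    private RecoverableFailures() {}

    // Created once, when the class is initialised, instead of on every failed attempt.
    private static final List<String> KNOWN_FAILURES = Collections.unmodifiableList(Arrays.asList(
            "Connection refused",
            "Connection reset",
            "Connection timed out",
            "No route to host",
            "Premature connection close"));

    // Assumed matching rule: the message contains one of the known phrases.
    static boolean isRecoverable(String message) {
        if (message == null) {
            return false; // an IOException may carry no message at all
        }
        for (String failure : KNOWN_FAILURES) {
            if (message.contains(failure)) {
                return true;
            }
        }
        return false;
    }
}
```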

@olivergondza
Member Author

> I'm not clear on what IOExceptions we'd expect to be recovering from after this change - the point of the retry seems to be to overcome any network issues which this change now seems to be preventing, so why still have the retry?

I do not follow. It is supposed to retry on exceptions that are considered recoverable and to rethrow those that are not right away.

@scoheb

scoheb commented Jan 19, 2017

LGTM

…low IOException in case it is not recoverable.
@mc1arke
Member

mc1arke commented Jan 19, 2017

Sorry, I'd read the logic the wrong way round!

However, we're being very specific about the exceptions we're allowing to try and recover from. Are any exceptions outside of the list you've specified likely to be transient conditions that the next connection attempt may overcome?

The IllegalArgumentException that's being shown in the logs is misleading ('This may be a bug in Jenkins'), and should be handled better, but I'm not convinced that white-listing certain messages is a good way to go.

@olivergondza
Member Author

olivergondza commented Jan 19, 2017

> The IllegalArgumentException that's being shown in the logs is misleading ('This may be a bug in Jenkins'), and should be handled better, but I'm not convinced that white-listing certain messages is a good way to go.

What you are referring to is probably:

```
[05/20/16 13:01:45] [SSH] Opening SSH connection to 192.168.1.36:22.
No route to host
ERROR: Unexpected error in launching a slave. This is probably a bug in Jenkins.
java.lang.IllegalStateException: Connection is not established!
    at com.trilead.ssh2.Connection.getRemainingAuthMethods(Connection.java:1030)
    at com.cloudbees.jenkins.plugins.sshcredentials.impl.TrileadSSHPublicKeyAuthenticator.getRemainingAuthMethods(TrileadSSHPublicKeyAuthenticator.java:88)
    at com.cloudbees.jenkins.plugins.sshcredentials.impl.TrileadSSHPublicKeyAuthenticator.canAuthenticate(TrileadSSHPublicKeyAuthenticator.java:80)
    at com.cloudbees.jenkins.plugins.sshcredentials.SSHAuthenticator.newInstance(SSHAuthenticator.java:212)
    at com.cloudbees.jenkins.plugins.sshcredentials.SSHAuthenticator.newInstance(SSHAuthenticator.java:172)
    at hudson.plugins.sshslaves.SSHLauncher.openConnection(SSHLauncher.java:1212)
    ...
```

This is no longer possible with this patch. The old code retried only when it knew the failure was recoverable; otherwise it continued execution, which caused a vague exception on the first use of the connection (TrileadSSHPublicKeyAuthenticator.getRemainingAuthMethods()) in, say, a "No route to host" situation. The new code will either fail fast, retry until it succeeds, or retry and then throw if unsuccessful (from connection.connect()). I share your dislike of processing exception messages, but I do not see a better way to go.

@mc1arke
Member

mc1arke commented Jan 19, 2017

I'd probably prefer to see a flag set inside the try block if connection.connect() doesn't throw an exception, and then not attempt the authentication if that flag isn't set, whilst logging/reporting any failures on each iteration of attempting to connect.

I'm not clear on what errors we think we're going to fail-fast with (you've covered off a large number of the normal network failures on your recoverable list), or the impact of re-attempting connection if they occur, so I'd just log them and continue trying unless there's a security or performance impact on doing so. This means we're consistent on re-trying for all network failures, but give the user some indication of what they are, and then don't print the java.lang.IllegalStateException: Connection is not established! message if we've attempted all the connection attempts without success.
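The alternative proposed in this comment could be sketched roughly as follows: retry on every IOException, log each failure, and only go on to authenticate once a connected flag has been set. This is a hypothetical illustration of the reviewer's suggestion, not code from the PR; ConnectionAttempt, FlagBasedConnect, maxAttempts and the use of java.util.logging are assumptions.

```java
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical stand-in for a single connection attempt. */
interface ConnectionAttempt {
    void connect() throws IOException;
}

/** Hypothetical sketch of the reviewer's flag-based alternative. */
final class FlagBasedConnect {
    private static final Logger LOGGER = Logger.getLogger(FlagBasedConnect.class.getName());

    private FlagBasedConnect() {}

    /**
     * Retries on every IOException, logging each failure, and returns whether
     * any attempt succeeded. The caller would skip authentication when this
     * returns false instead of hitting "Connection is not established!".
     */
    static boolean connect(ConnectionAttempt attempt, int maxAttempts) {
        boolean connected = false;
        for (int i = 1; i <= maxAttempts && !connected; i++) {
            try {
                attempt.connect();
                connected = true; // reached only if connect() did not throw
            } catch (IOException e) {
                LOGGER.log(Level.WARNING, "Connection attempt " + i + " of " + maxAttempts + " failed", e);
            }
        }
        return connected;
    }
}
```

The difference from the merged change is the retry policy rather than the loop itself: this version keeps retrying even on failures that may never clear, while the PR's version rethrows anything outside its known list immediately.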

@olivergondza
Member Author

> I'd probably prefer to see a flag set inside the try block if connection.connect() doesn't throw an exception, and then not attempt the authentication if that flag isn't set, whilst logging/reporting any failures on each iteration of attempting to connect.

That is not necessary. The only way for that retry loop to ever finish is by successful connection or throwing an exception.

> ... but give the user some indication of what they are, and then don't print the java.lang.IllegalStateException: Connection is not established! message if we've attempted all the connection attempts without success,

See above. That is exactly what this fix does.

> I'm not clear on what errors we think we're going to fail-fast with (you've covered off a large number of the normal network failures on your recoverable list), or the impact of re-attempting connection if they occur, so I'd just log them and continue trying unless there's a security or performance impact on doing so.

I do not know of an authority that could provide the necessary guidance here. The retry code has been in effect for a while, so I see this as a step towards a more complete list that will likely be subject to further refinement. I have been using this in testing and in production for a couple of days, and it works better.

@olivergondza
Member Author

I will go ahead and release this later this week unless there are objections.

@zxiiro

zxiiro commented Jan 26, 2017

We tried building this patch and updating our Jenkins to use the ssh-slaves plugin with it included, but we still get the failure below. The key is available on the server, though, and I'm able to log in to the system manually with my own SSH client.

```
ERROR: Server rejected the 1 private key(s) for jenkins (credentialId:test-credential/method:publickey)
[01/26/17 17:50:49] [SSH] Authentication failed.
hudson.AbortException: Authentication failed.
    at hudson.plugins.sshslaves.SSHLauncher.openConnection(SSHLauncher.java:1231)
    at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:724)
    at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:719)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
[01/26/17 17:50:49] Launch failed - cleaning up connection
```

@olivergondza
Member Author

@zxiiro, that seems unrelated to this patch. The connection.connect() call succeeded, but authentication failed. As I understand it, this behaved the same way before the patch.

@zxiiro

zxiiro commented Jan 26, 2017

@olivergondza, it does seem to be a different issue, then. I'll continue to investigate and open a new issue. Did something change around authentication in the OpenStack plugin? This works if we revert the OpenStack plugin to 2.11.

@olivergondza
Member Author

This is a change on the OpenStack side of things: jenkinsci/openstack-cloud-plugin#137

olivergondza merged commit d833545 into jenkinsci:master on Jan 28, 2017