Fix [JENKINS-44568]: restore old ways of handling ssh disconnection #66
I am a little confused, because this gives me the impression that it disconnects right after receiving the data, which would make polling rather pointless. I doubt we can afford to be disconnected too often or for long periods, because we will lose events, especially since not all servers can replay missed ones. If this fixes the current problem, it means that we can somehow end up with stalled SSH sessions: ones that report as connected but are in fact not receiving any data.
Reconnection is performed when reconnect() is called. In this method, sshConnection.disconnect() is called, after which sshConnection.isConnected() returns false and sshConnection.isClosed() returns true. Finally, sshConnection is set to null inside the finally block in run(). The sshConnection may be reused for executing other SSH commands on a newly created SSH channel, so it should not be updated in this location. The Gerrit stream is the output of an SSH command performed on an SSH channel, so please handle the SSH channel rather than the SSH connection. Updating line 439 to "if (channel != null) {" would help you.
@ssbarnea, this code, which I copy-pasted from the old version, must be called only in case of a broken SSH connection or -1 from read (means @rinrinne, maybe I understood the bug the wrong way, but here is what we have:
So, I can only assume that:
I can replace line 439, but I don't see how it could help. But I agree that there is some misalignment between Regarding the rest:
Correct me if I'm wrong, but it must be called explicitly. Before it was done implicitly by calling
Do you mean the current PR or the upstream code? In the current PR I call it just to restore the old behaviour (and trigger reconnection implicitly). If you think that
Based on mawinter69 investigation
Sorry for the late reply. Maybe #67 is the root cause.
-1 indicates connection is broken. 0 means buffer.read() reads nothing from InputStream. It is caused by:
In this case, I guess that '2' caused this issue.
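To illustrate the difference between -1 and 0 mentioned above, here is a minimal sketch using only the JDK. It is not the plugin's code: the class `ReadResultSketch`, the enum `Outcome` and the method names are hypothetical, and a `StringReader` stands in for the SSH channel's reader. It shows one way a read can return 0 (the target `CharBuffer` has no space left), which is an assumption about this bug and not something the thread confirms.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.CharBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not part of gerrit-events.
class ReadResultSketch {
    enum Outcome { BROKEN, NOTHING_READ, DATA }

    // Map the return value of Reader.read(...) to what the caller should do.
    static Outcome classify(int readResult) {
        if (readResult < 0) {
            return Outcome.BROKEN;       // -1: end of stream, the channel is gone
        }
        if (readResult == 0) {
            return Outcome.NOTHING_READ; // 0: nothing consumed, still connected
        }
        return Outcome.DATA;             // > 0: characters were read
    }

    static List<Outcome> demo() {
        List<Outcome> seen = new ArrayList<>();
        try (Reader reader = new StringReader("abcdef")) {
            CharBuffer cb = CharBuffer.allocate(4);
            seen.add(classify(reader.read(cb))); // reads 4 chars, buffer now full
            seen.add(classify(reader.read(cb))); // no space left in the buffer
            cb.clear();                          // make room again
            seen.add(classify(reader.read(cb))); // reads the remaining 2 chars
            cb.clear();
            seen.add(classify(reader.read(cb))); // source exhausted
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [DATA, NOTHING_READ, DATA, BROKEN]
    }
}
```

The point of the sketch is that only a negative result should be treated as a broken stream. A result of 0 calls for a different reaction (for example draining or enlarging the buffer), otherwise the consuming loop can spin without making progress.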
Replacing line 439 makes the channel disconnect without checking its status, provided the channel has been created.
Before 57163bb was introduced, SshChannel was managed within SshConnection, so SshConnection.disconnect() was needed. But since non-blocking operation was introduced, SshChannel can be accessed within GerritConnection, so there is no need to call SshConnection.disconnect() for every Gerrit stream-events invocation. According to the JSch source code, channel.connect() is what makes the actual connection to the remote service.
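A rough sketch of why the scope of the disconnect matters. The `Connection` and `Channel` classes below are hypothetical stand-ins written for this example; they are not the JSch or gerrit-events classes. They only model the relationship described above: several channels share one connection, and closing the connection takes all of them down.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins; NOT the JSch or gerrit-events API.
class ChannelScopeSketch {
    static class Channel {
        final String command;
        private boolean open = true;

        Channel(String command) {
            this.command = command;
        }

        // Closes only this channel.
        void disconnect() {
            open = false;
        }

        boolean isOpen() {
            return open;
        }
    }

    static class Connection {
        private boolean connected = true;
        private final List<Channel> channels = new ArrayList<>();

        Channel openChannel(String command) {
            if (!connected) {
                throw new IllegalStateException("connection is closed");
            }
            Channel channel = new Channel(command);
            channels.add(channel);
            return channel;
        }

        // Closes the connection and every channel that was opened on it.
        void disconnect() {
            for (Channel channel : channels) {
                channel.disconnect();
            }
            channels.clear();
            connected = false;
        }

        boolean isConnected() {
            return connected;
        }
    }

    public static void main(String[] args) {
        Connection connection = new Connection();
        Channel stream = connection.openChannel("gerrit stream-events");
        Channel other = connection.openChannel("gerrit version");

        stream.disconnect(); // channel-level: the other command is unaffected
        System.out.println(connection.isConnected() + " " + other.isOpen());

        connection.disconnect(); // connection-level: everything goes down
        System.out.println(connection.isConnected() + " " + other.isOpen());
    }
}
```

Under these assumptions, restarting only the stream channel leaves other commands on the same connection untouched, which is the concern raised about disconnecting the whole connection.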
nullfyWatchdog() is called within the finally block, which means the while loop consuming the stream has already exited. If GerritConnection.shutdown() is not called explicitly, the consuming loop will be re-entered. This is the same as the reconnection process.
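The control flow described here can be sketched as follows. This is a simplified model, not the actual GerritConnection.run(): the class `ReconnectLoopSketch`, its scripted `sessions`, and the log strings are all hypothetical, and falling out of the inner loop stands in for read() returning -1.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical model of an outer reconnect loop; not the plugin's code.
class ReconnectLoopSketch {
    private volatile boolean shutdown = false;
    // Each inner list holds the events one session delivers before it ends.
    private final Deque<List<String>> sessions;
    final List<String> log = new ArrayList<>();

    ReconnectLoopSketch(List<List<String>> scriptedSessions) {
        this.sessions = new ArrayDeque<>(scriptedSessions);
    }

    void shutdown() {
        shutdown = true;
    }

    void run() {
        while (!shutdown) {
            if (sessions.isEmpty()) {
                shutdown(); // demo only: stop once the script runs out
                continue;
            }
            List<String> events = sessions.poll();
            log.add("connect");
            try {
                for (String event : events) {
                    log.add("event:" + event);
                }
                // Leaving the for loop models the stream ending.
            } finally {
                // Cleanup runs after the consuming loop has exited.
                log.add("cleanup");
            }
            // No explicit shutdown here, so the outer loop reconnects.
        }
        log.add("stopped");
    }

    static List<String> demo() {
        ReconnectLoopSketch worker = new ReconnectLoopSketch(
                Arrays.asList(Arrays.asList("a", "b"), Arrays.asList("c")));
        worker.run();
        return worker.log;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this model, a session that ends without shutdown() being called leads straight back to "connect", which mirrors the claim that re-entering the consuming loop is the reconnection.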
I mean the upstream code. That is why I submitted my comments on this PR as a whole rather than on individual lines of your commit.
So I now have three different approaches claiming to fix JENKINS-44568 and JENKINS-44414: #66, #68 and #69. Which one should I go for?
+1 to @mawinter69. #66 handles the connection, but that is not strictly needed. As I mentioned, it may affect other channels created from the same connection.
I think somebody should give #69 a try in the real world, or we can ask @mawinter69 to add more unit tests (basically, the same test cases that were introduced in #69). I agree that my PR doesn't solve the problem, and I will close it after the real fix is merged. But I also introduced a JVM parameter to change the default size of
#68 was merged. Closing this one.
If I understood the source code correctly, reconnection has been broken since 57163bb, because we no longer "close" the sshConnection as was done before. So the sshConnection object assumes that the connection still exists.
As I mentioned in https://issues.jenkins-ci.org/browse/JENKINS-44568, I'm not sure that this PR really fixes the problem, because I didn't have time to test it, so let's wait for some feedback.