Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JENKINS-22722] - Make AsynchronousCommandTransport tolerant against Socket timeouts #86

Merged
merged 2 commits into from
Jun 9, 2016

Conversation

oleg-nenashev
Copy link
Member

This change prevents failures of the remoting channel on SocketTimeout exceptions if the channel is in the operational state after it. The original behavior can be restored by setting a system property.

It should be a more graceful fix of the issue, which @fengxx was trying to fix in #68

@reviewbybees @KostyaSha @fengxx @jtnord

* interrupted on any {@link SocketTimeoutException}.
* @since TODO
*/
private static boolean RDR_FAIL_ON_SOCKET_TIMEOUT = Boolean.getBoolean(RDR_SOCKET_TIMEOUT_PROPERTY_NAME);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly. As I said in the SystemProperties conversation, it was just the first baby step. We will need to move the logic outside Jenkins core into a shared lib

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@KostyaSha deleted his comments

@jtnord
Copy link
Member

jtnord commented Jun 6, 2016

🐝 👍

@andresrc
Copy link

andresrc commented Jun 6, 2016

🐝

@jglick
Copy link
Member

jglick commented Jun 6, 2016

🐝 if you have found some means of testing this at least interactively. Maybe there is some Docker or iptables magic you can use to simulate this kind of condition?

@oleg-nenashev
Copy link
Member Author

@jglick
I've simulated it in the following way:

  • Disable PingThreads
  • Reduce SocketTimeout to 1 minute
  • Launch the build with "sleep 100500" on the slave.

@fengxx
Copy link

fengxx commented Jun 7, 2016

👍

@oleg-nenashev
Copy link
Member Author

Had a soak testing of the change with 1-second Socket timeout for 48 hours.
Slave and master logs were flooded by warnings, but there was no channel breakage.

Integrating the PR

@oleg-nenashev oleg-nenashev merged commit 25373d7 into jenkinsci:master Jun 9, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants