SSH Launch failing #57

Closed
djl197 opened this issue Jul 11, 2014 · 12 comments
Comments

@djl197

djl197 commented Jul 11, 2014

All,

I have got my Docker instance all set up, and the Docker plugin is spawning a container on my remote machine.
However, when it tries to launch over SSH I get the following:
[07/11/14 11:16:05] [SSH] Opening SSH connection to REMOTE_HOST_IP:49231.
java.io.IOException: There was a problem while connecting to grass.tandbergtv.lan:49231
at com.trilead.ssh2.Connection.connect(Connection.java:818)
at com.trilead.ssh2.Connection.connect(Connection.java:687)
at com.trilead.ssh2.Connection.connect(Connection.java:587)
at hudson.plugins.sshslaves.SSHLauncher.openConnection(SSHLauncher.java:1132)
at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:648)
at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:642)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:196)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at java.net.SocketInputStream.read(SocketInputStream.java:210)
at com.trilead.ssh2.transport.ClientServerHello.readLineRN(ClientServerHello.java:31)
at com.trilead.ssh2.transport.ClientServerHello.<init>(ClientServerHello.java:68)
at com.trilead.ssh2.transport.TransportManager.initialize(TransportManager.java:466)
at com.trilead.ssh2.Connection.connect(Connection.java:758)
... 9 more
[07/11/14 11:16:05] [SSH] Connection closed.
[07/11/14 11:16:05] Launch failed - cleaning up connection

I have checked that the Docker instance is working.
After it has been started by Jenkins, I am able to SSH into the container manually:
ssh jenkins@REMOTE_HOST_IP -p 49231

I have also inspected the container both when started manually and when started via Jenkins, and there is nothing obviously different between the two. Since I can manually SSH into both in the same way, the issue does not seem to be with Docker.

I have double- and triple-checked the credentials and they are correct.
Is anyone else having this issue?
I am running:
Jenkins 1.570
with SSH Plugin 1.6 and SSH Credentials 1.6.1

Any help appreciated.
Dan

@djl197
Author

djl197 commented Jul 11, 2014

If I create a slave node manually and enter the same details, Jenkins is able to connect to it and run once the container has been launched via Docker.
So I am even more confused as to why the SSHLauncher call in getSSHLauncher is not working :-(

@djl197 djl197 closed this as completed Jul 11, 2014
@djl197 djl197 reopened this Jul 11, 2014
@thomassuckow
Contributor

There has been an ongoing issue where Jenkins attempts to SSH before the Docker container is ready, and retrying seems to be hit and miss. If I had to guess, I would say this is related. In my setup, it frequently takes several minutes (and several attempts at bringing a container up and down) before Jenkins connects.

@djl197
Author

djl197 commented Jul 11, 2014

I think I have solved this now :-) Will post some updates once I am sure

@djl197 djl197 closed this as completed Jul 11, 2014
@colegleason

Any update on this? I'm also experiencing this issue.

@djl197
Author

djl197 commented Jul 16, 2014

I have played around a bit, and I now have a configurable delay before attempting to SSH to the container.
That seems to have solved it for me. In the plugin I have a per-container setting (default 30 seconds). With this change it works a whole lot better and generally connects on the first attempt, rather than failing 3 times and then rerunning the container.
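
For anyone wanting to try the same idea, here is a minimal sketch of "wait first, then retry a few times". It is not the change described above and not the plugin's code: the class name `DelayedLaunch`, the method `launchAfterDelay`, and all parameters are hypothetical, and the `Callable` simply stands in for whatever actually opens the SSH connection.

```java
import java.util.concurrent.Callable;

public class DelayedLaunch {

    /**
     * Sleeps for initialDelayMillis, then runs the attempt up to maxAttempts
     * times, pausing pauseMillis between failures. Returns the first
     * successful result or rethrows the last failure.
     * All names here are illustrative, not taken from the plugin.
     */
    static <T> T launchAfterDelay(Callable<T> attempt,
                                  long initialDelayMillis,
                                  int maxAttempts,
                                  long pauseMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        // Give the container time to start sshd before the first attempt.
        Thread.sleep(initialDelayMillis);

        Exception last = null;
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                return attempt.call();
            } catch (InterruptedException e) {
                // Do not swallow interruption; let the caller abort the launch.
                throw e;
            } catch (Exception e) {
                last = e;
                if (i < maxAttempts) {
                    Thread.sleep(pauseMillis);
                }
            }
        }
        throw last;
    }
}
```

A fixed delay is simple but always costs the full wait, even when the container is ready sooner, so the default would need tuning per host.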

@thomassuckow
Contributor

I wonder if we could open our own TCP connection with a short timeout: if it connects, call it good; if it fails, sleep 1 second and try again.

I also wonder what the EC2 plugin does...
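
A rough sketch of that probe idea follows. One caveat suggested by the stack trace at the top of this issue: the failure is a "Connection reset" inside `ClientServerHello.readLineRN`, which indicates the TCP connection was established and then dropped before the server sent its identification line. A probe that only checks whether the port accepts a connection might therefore pass too early, so this version also waits for a line starting with `SSH-`. The class and method names (`SshReadyProbe`, `sshBannerSeen`, `waitForSsh`) and the parameter values are hypothetical and are not part of the plugin or of the SSH Slaves plugin.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class SshReadyProbe {

    /** One attempt: connect with a timeout and look for the SSH identification line. */
    static boolean sshBannerSeen(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            socket.setSoTimeout(timeoutMillis); // bound the wait for the banner too
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.US_ASCII));
            // A server may send other lines before its version string,
            // so scan a handful of lines rather than only the first.
            for (int i = 0; i < 20; i++) {
                String line = in.readLine();
                if (line == null) {
                    return false; // peer closed before identifying itself
                }
                if (line.startsWith("SSH-")) {
                    return true;
                }
            }
            return false;
        } catch (IOException e) {
            // Refused, reset, or timed out: treat as "not ready yet".
            return false;
        }
    }

    /** Polls until the banner is seen or the attempts run out. */
    static boolean waitForSsh(String host, int port, int attempts,
                              int timeoutMillis, long pauseMillis)
            throws InterruptedException {
        for (int i = 0; i < attempts; i++) {
            if (sshBannerSeen(host, port, timeoutMillis)) {
                return true;
            }
            if (i < attempts - 1) {
                Thread.sleep(pauseMillis);
            }
        }
        return false;
    }
}
```

With something like `waitForSsh(host, port, 60, 2000, 1000)` returning true, the real SSH launch could start right away instead of waiting out a fixed delay.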

@colegleason

@djl197 I'd love to see that delay back in the master plugin!

@Intrepidd

Hello, I am still seeing this painful issue. Has anyone come up with a valid solution?

@KostyaSha
Member

@thomassuckow did you have a chance to look at the EC2 plugin (or others)?

@thomassuckow
Contributor

EC2 seems to go the custom route.
https://github.com/jenkinsci/ec2-plugin/blob/master/src/main/java/hudson/plugins/ec2/ssh/EC2UnixLauncher.java

I get the feeling, though, that some people are running Docker on slower machines where it takes a while for SSH to come up. In that case, we probably just have to be more tolerant when retrying. We used to have a lot more issues with containers failing to start at all, which made the shorter tries more beneficial, but that is behind us now (knock on wood).

I have not noticed this issue for a while now, though I am also running the bleeding edge. A lot has changed since 0.8.

@KostyaSha
Member

@thomassuckow should we suggest using a slave reconnect plugin (if one exists), and would it work?

@thomassuckow
Contributor

I think we just need to tune how we call SSH Slaves Plugin.
