ec2 dynamically created slaves dropping offline #125

Open
tfoote opened this issue Mar 17, 2016 · 3 comments

@tfoote
Member

tfoote commented Mar 17, 2016

4 of 5 slaves are "offline"

Connection was broken

java.io.IOException: Sorry, this connection is closed.
    at com.trilead.ssh2.transport.TransportManager.ensureConnected(TransportManager.java:587)
    at com.trilead.ssh2.transport.TransportManager.sendMessage(TransportManager.java:660)
    at com.trilead.ssh2.channel.Channel.freeupWindow(Channel.java:407)
    at com.trilead.ssh2.channel.Channel.freeupWindow(Channel.java:347)
    at com.trilead.ssh2.channel.ChannelManager.getChannelData(ChannelManager.java:943)
    at com.trilead.ssh2.channel.ChannelInputStream.read(ChannelInputStream.java:58)
    at com.trilead.ssh2.channel.ChannelInputStream.read(ChannelInputStream.java:79)
    at hudson.remoting.FlightRecorderInputStream.read(FlightRecorderInputStream.java:82)
    at hudson.remoting.ChunkedInputStream.readHeader(ChunkedInputStream.java:72)
    at hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.java:103)
    at hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTransport.java:39)
    at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
    at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
Caused by: java.net.SocketException: Broken pipe
    at java.net.SocketOutputStream.socketWrite0(Native Method)
    at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:113)
    at java.net.SocketOutputStream.write(SocketOutputStream.java:159)
    at com.trilead.ssh2.crypto.cipher.CipherOutputStream.flush(CipherOutputStream.java:75)
    at com.trilead.ssh2.transport.TransportConnection.sendMessage(TransportConnection.java:193)
    at com.trilead.ssh2.transport.TransportConnection.sendMessage(TransportConnection.java:107)
    at com.trilead.ssh2.transport.TransportManager.sendMessage(TransportManager.java:677)
    at com.trilead.ssh2.channel.ChannelManager.sendData(ChannelManager.java:429)
    at com.trilead.ssh2.channel.ChannelOutputStream.write(ChannelOutputStream.java:63)
    at hudson.remoting.ChunkedOutputStream.sendFrame(ChunkedOutputStream.java:94)
    at hudson.remoting.ChunkedOutputStream.sendBreak(ChunkedOutputStream.java:66)
    at hudson.remoting.ChunkedCommandTransport.writeBlock(ChunkedCommandTransport.java:46)
    at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.write(AbstractSynchronousByteArrayCommandTransport.java:45)
    at hudson.remoting.Channel.send(Channel.java:582)
    at hudson.remoting.ProxyOutputStream$Chunk$1.run(ProxyOutputStream.java:261)
    at hudson.remoting.PipeWriter$1.run(PipeWriter.java:158)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
    at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
    at org.jenkinsci.remoting.CallableDecorator.call(CallableDecorator.java:18)
    at hudson.remoting.CallableDecoratorList$1.call(CallableDecoratorList.java:21)
    at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Created by SYSTEM

Slave docker slave autoprovisioned (i-62945dd7)

Connection was broken

java.io.IOException: Sorry, this connection is closed.
    at com.trilead.ssh2.transport.TransportManager.ensureConnected(TransportManager.java:587)
    at com.trilead.ssh2.transport.TransportManager.sendMessage(TransportManager.java:660)
    at com.trilead.ssh2.channel.Channel.freeupWindow(Channel.java:407)
    at com.trilead.ssh2.channel.Channel.freeupWindow(Channel.java:347)
    at com.trilead.ssh2.channel.ChannelManager.getChannelData(ChannelManager.java:943)
    at com.trilead.ssh2.channel.ChannelInputStream.read(ChannelInputStream.java:58)
    at com.trilead.ssh2.channel.ChannelInputStream.read(ChannelInputStream.java:79)
    at hudson.remoting.FlightRecorderInputStream.read(FlightRecorderInputStream.java:82)
    at hudson.remoting.ChunkedInputStream.readHeader(ChunkedInputStream.java:72)
    at hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.java:103)
    at hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTransport.java:39)
    at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
    at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
Caused by: java.net.SocketException: Connection timed out
    at java.net.SocketOutputStream.socketWrite0(Native Method)
    at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:113)
    at java.net.SocketOutputStream.write(SocketOutputStream.java:159)
    at com.trilead.ssh2.crypto.cipher.CipherOutputStream.flush(CipherOutputStream.java:75)
    at com.trilead.ssh2.transport.TransportConnection.sendMessage(TransportConnection.java:193)
    at com.trilead.ssh2.transport.TransportConnection.sendMessage(TransportConnection.java:107)
    at com.trilead.ssh2.transport.TransportManager.sendMessage(TransportManager.java:677)
    at com.trilead.ssh2.channel.ChannelManager.sendData(ChannelManager.java:429)
    at com.trilead.ssh2.channel.ChannelOutputStream.write(ChannelOutputStream.java:63)
    at com.trilead.ssh2.channel.ChannelOutputStream.write(ChannelOutputStream.java:68)
    at hudson.remoting.ChunkedOutputStream.sendFrame(ChunkedOutputStream.java:93)
    at hudson.remoting.ChunkedOutputStream.sendBreak(ChunkedOutputStream.java:66)
    at hudson.remoting.ChunkedCommandTransport.writeBlock(ChunkedCommandTransport.java:46)
    at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.write(AbstractSynchronousByteArrayCommandTransport.java:45)
    at hudson.remoting.Channel.send(Channel.java:582)
    at hudson.remoting.ProxyOutputStream$Chunk$1.run(ProxyOutputStream.java:261)
    at hudson.remoting.PipeWriter$1.run(PipeWriter.java:158)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
    at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
    at org.jenkinsci.remoting.CallableDecorator.call(CallableDecorator.java:18)
    at hudson.remoting.CallableDecoratorList$1.call(CallableDecoratorList.java:21)
    at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Created by SYSTEM

Slave docker slave autoprovisioned (i-8aa56c3f)

Connection was broken

java.io.IOException: Unexpected termination of the channel
    at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
Caused by: java.io.EOFException
    at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2332)
    at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2801)
    at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:801)
    at java.io.ObjectInputStream.<init>(ObjectInputStream.java:299)
    at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:48)
    at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
    at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
Created by SYSTEM

Slave docker slave autoprovisioned (i-a1975e14)

Connection was broken

java.io.IOException: Sorry, this connection is closed.
    at com.trilead.ssh2.transport.TransportManager.ensureConnected(TransportManager.java:587)
    at com.trilead.ssh2.transport.TransportManager.sendMessage(TransportManager.java:660)
    at com.trilead.ssh2.channel.Channel.freeupWindow(Channel.java:407)
    at com.trilead.ssh2.channel.Channel.freeupWindow(Channel.java:347)
    at com.trilead.ssh2.channel.ChannelManager.getChannelData(ChannelManager.java:943)
    at com.trilead.ssh2.channel.ChannelInputStream.read(ChannelInputStream.java:58)
    at com.trilead.ssh2.channel.ChannelInputStream.read(ChannelInputStream.java:79)
    at hudson.remoting.FlightRecorderInputStream.read(FlightRecorderInputStream.java:82)
    at hudson.remoting.ChunkedInputStream.readHeader(ChunkedInputStream.java:72)
    at hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.java:103)
    at hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTransport.java:39)
    at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
    at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
Caused by: java.net.SocketException: Broken pipe
    at java.net.SocketOutputStream.socketWrite0(Native Method)
    at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:113)
    at java.net.SocketOutputStream.write(SocketOutputStream.java:159)
    at com.trilead.ssh2.crypto.cipher.CipherOutputStream.flush(CipherOutputStream.java:75)
    at com.trilead.ssh2.transport.TransportConnection.sendMessage(TransportConnection.java:193)
    at com.trilead.ssh2.transport.TransportConnection.sendMessage(TransportConnection.java:107)
    at com.trilead.ssh2.transport.TransportManager.sendMessage(TransportManager.java:677)
    at com.trilead.ssh2.channel.ChannelManager.sendData(ChannelManager.java:429)
    at com.trilead.ssh2.channel.ChannelOutputStream.write(ChannelOutputStream.java:63)
    at com.trilead.ssh2.channel.ChannelOutputStream.write(ChannelOutputStream.java:68)
    at hudson.remoting.ChunkedOutputStream.sendFrame(ChunkedOutputStream.java:93)
    at hudson.remoting.ChunkedOutputStream.sendBreak(ChunkedOutputStream.java:66)
    at hudson.remoting.ChunkedCommandTransport.writeBlock(ChunkedCommandTransport.java:46)
    at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.write(AbstractSynchronousByteArrayCommandTransport.java:45)
    at hudson.remoting.Channel.send(Channel.java:582)
    at hudson.remoting.ProxyOutputStream$Chunk$1.run(ProxyOutputStream.java:261)
    at hudson.remoting.PipeWriter$1.run(PipeWriter.java:158)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
    at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
    at org.jenkinsci.remoting.CallableDecorator.call(CallableDecorator.java:18)
    at hudson.remoting.CallableDecoratorList$1.call(CallableDecoratorList.java:21)
    at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Created by SYSTEM
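Reduced to their innermost causes, the traces above come down to three failures: `SocketException: Broken pipe`, `SocketException: Connection timed out`, and `EOFException`. The following Java sketch shows one way to triage such traces by walking the `getCause()` chain to the root. The class and method names are hypothetical (not part of Jenkins or the EC2 plugin), and the exceptions in `main` are constructed by hand to mirror the traces, not captured from Jenkins.

```java
import java.io.EOFException;
import java.io.IOException;
import java.net.SocketException;

public class RootCause {

    // Follow the getCause() chain to the innermost Throwable;
    // the depth cap keeps a cyclic cause chain from looping forever.
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        for (int depth = 0; depth < 32 && current.getCause() != null; depth++) {
            current = current.getCause();
        }
        return current;
    }

    // Summarize a connection failure by its innermost cause.
    static String describe(Throwable t) {
        Throwable root = rootCause(t);
        if (root instanceof EOFException) {
            return "EOF: stream ended unexpectedly";
        }
        if (root instanceof SocketException) {
            return "socket: " + root.getMessage();
        }
        return root.getClass().getSimpleName() + ": " + root.getMessage();
    }

    public static void main(String[] args) {
        // Hand-built chains mirroring the outer/inner exceptions in the traces.
        Throwable[] failures = {
            new IOException("Sorry, this connection is closed.", new SocketException("Broken pipe")),
            new IOException("Sorry, this connection is closed.", new SocketException("Connection timed out")),
            new IOException("Unexpected termination of the channel", new EOFException()),
        };
        for (Throwable f : failures) {
            System.out.println(describe(f));
        }
    }
}
```

Run as-is, this prints one line per failure: `socket: Broken pipe`, `socket: Connection timed out`, and `EOF: stream ended unexpectedly`.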

The node keeps dropping offline and coming back online. There appears to be a correlation between my viewing the executor and the node coming back online.
[Screenshot: executor_dropping_offline]

The large gap ended this morning when I clicked "Reconnect" from the slave monitor.

@tfoote
Member Author

tfoote commented Mar 17, 2016

For completeness, here is the recent history of the other offline slaves:

[Screenshot: load_14]
[Screenshot: load_3f]
[Screenshot: executor_d7]

The one slave that was online looks like it had trouble too:

[Screenshot: load_48]

@tfoote
Member Author

tfoote commented Mar 17, 2016

There's a related issue, https://issues.jenkins-ci.org/browse/JENKINS-26798, about the jenkins-slave not tearing down the "offline" slaves. Clicking "Delete slave" does successfully tear down and clean up the slave instance on EC2.
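The missing behavior, automatically doing what "Delete slave" does for nodes that stay offline, can be sketched as a periodic sweep. This is a hypothetical illustration only: `CloudAgent`, `offlineSinceMillis`, `reapOffline`, and the grace period are assumptions, not a Jenkins or EC2 plugin API. The grace period reflects the observation above that these nodes flap between offline and online, so a node that has just disconnected should probably not be terminated immediately.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OfflineReaper {

    // Hypothetical view of a cloud-provisioned agent; NOT a Jenkins or EC2 plugin API.
    interface CloudAgent {
        String instanceId();
        long offlineSinceMillis(); // negative when the agent is online
        void terminate();          // stands in for what "Delete slave" does to the EC2 instance
    }

    // Terminate agents that have stayed offline for at least graceMillis; return their ids.
    static List<String> reapOffline(List<? extends CloudAgent> agents, long nowMillis, long graceMillis) {
        List<String> reaped = new ArrayList<>();
        for (CloudAgent agent : agents) {
            long since = agent.offlineSinceMillis();
            if (since >= 0 && nowMillis - since >= graceMillis) {
                agent.terminate();
                reaped.add(agent.instanceId());
            }
        }
        return reaped;
    }

    // In-memory fake used only for the demo below.
    static final class FakeAgent implements CloudAgent {
        private final String id;
        private final long offlineSince;
        boolean terminated = false;

        FakeAgent(String id, long offlineSince) {
            this.id = id;
            this.offlineSince = offlineSince;
        }

        @Override public String instanceId() { return id; }
        @Override public long offlineSinceMillis() { return offlineSince; }
        @Override public void terminate() { terminated = true; }
    }

    public static void main(String[] args) {
        long now = 10_000_000L;
        long grace = 15 * 60 * 1000L; // 15 minutes
        List<FakeAgent> agents = Arrays.asList(
            new FakeAgent("agent-a", 0L),            // offline for about 2.8 hours
            new FakeAgent("agent-b", now - 60_000L), // offline for 1 minute (may just be flapping)
            new FakeAgent("agent-c", -1L));          // online
        System.out.println(reapOffline(agents, now, grace)); // [agent-a]
    }
}
```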

@tfoote
Member Author

tfoote commented Mar 17, 2016

The idle timeout is currently set to -5, which should check at T-5, i.e. 5 minutes before the end of the billing hour. We clearly overshot that for most of these machines. I'll try a timeout of 15 minutes after going idle instead. That is less optimized for billing, but it may behave differently.
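The difference between the two settings can be illustrated with a small model. This Java sketch is a hypothetical reimplementation of the rule described above, not the EC2 plugin's code. The method name and parameters are assumptions, as is the premise that the check runs only for agents that are already idle.

```java
import java.time.Duration;
import java.time.Instant;

public class IdleTermination {

    // Hypothetical model of an idle-termination setting in minutes, checked for an idle agent:
    //   > 0 : terminate once the agent has been idle for at least that many minutes
    //   < 0 : terminate once the instance is within |value| minutes of the end of its billing hour
    //   = 0 : never terminate
    static boolean shouldTerminate(int minutes, Instant launchTime, Instant idleSince, Instant now) {
        if (minutes == 0) {
            return false;
        }
        if (minutes > 0) {
            return Duration.between(idleSince, now).toMinutes() >= minutes;
        }
        long minutesUp = Duration.between(launchTime, now).toMinutes();
        long minutesLeftInHour = 60 - (minutesUp % 60);
        return minutesLeftInHour <= -minutes;
    }

    public static void main(String[] args) {
        Instant launch = Instant.EPOCH;
        Instant idleSince = launch.plus(Duration.ofMinutes(10));

        // -5: only a 5-minute window in each billing hour qualifies.
        System.out.println(shouldTerminate(-5, launch, idleSince, launch.plus(Duration.ofMinutes(50)))); // false
        System.out.println(shouldTerminate(-5, launch, idleSince, launch.plus(Duration.ofMinutes(56)))); // true
        System.out.println(shouldTerminate(-5, launch, idleSince, launch.plus(Duration.ofMinutes(61)))); // false: window missed
        // 15: any check after 15 idle minutes qualifies.
        System.out.println(shouldTerminate(15, launch, idleSince, launch.plus(Duration.ofMinutes(25)))); // true
    }
}
```

Under this model, a `-5` setting makes an idle instance eligible for only 5 minutes per hour. If no successful check lands in that window, for example while the agent is disconnected, the instance runs into the next billing hour. With `15`, any check after the idle threshold qualifies. This is one plausible reading of the overshoot, not a confirmed diagnosis.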
