Every 1 or 2 days, the grpc-interop1 worker goes offline and cannot be reached over ssh.
The workaround is to restart the VM, after which it starts working again.
One sign that the worker is going offline is that the CPU load goes up to 100%.
My theory is that the problem is caused by some issue with docker, because this machine spins up a LOT of short-lived docker containers (in the past we've seen some issues with docker on linux workers when running too many docker containers in parallel).
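One way to test this theory would be to log the load average alongside the number of running containers on the worker, then read the last entries after the next hang. The following is only a rough sketch, assuming Python 3.7+ and the `docker` CLI are available on the worker; the log file name, the sampling interval, and all function names are made up for illustration and are not part of the existing setup.

```python
import os
import subprocess
import time


def count_containers(ps_output: str) -> int:
    """Count container IDs in the output of `docker ps -q` (one ID per line)."""
    return sum(1 for line in ps_output.splitlines() if line.strip())


def format_sample(timestamp: float, load1: float, containers: int) -> str:
    """Render one log line: UTC time, 1-minute load average, running containers."""
    stamp = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(timestamp))
    return f"{stamp} load1={load1:.2f} containers={containers}"


def sample() -> str:
    """Take one measurement on the local machine."""
    result = subprocess.run(
        ["docker", "ps", "-q"], capture_output=True, text=True, check=True
    )
    load1, _, _ = os.getloadavg()
    return format_sample(time.time(), load1, count_containers(result.stdout))


def main(log_path: str = "worker_load.log", interval_s: int = 60) -> None:
    # Hypothetical defaults. Each line is flushed and fsynced so the log
    # survives the hard VM restart used as the workaround.
    with open(log_path, "a") as log:
        while True:
            log.write(sample() + "\n")
            log.flush()
            os.fsync(log.fileno())
            time.sleep(interval_s)


if __name__ == "__main__":
    main()
```

If the container count climbs steadily before the load hits 100%, that would support the docker theory; a flat count with a sudden load spike would point elsewhere.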
The last build before the worker goes offline usually shows this:
ERROR: Connection was broken: java.io.IOException: Unexpected termination of the channel
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
Caused by: java.io.EOFException
at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2332)
at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2801)
at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:801)
at java.io.ObjectInputStream.<init>(ObjectInputStream.java:299)
at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:40)
at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
I added worker grpc-interop2, which is set up exactly like the grpc-interop1 worker except that it uses an Ubuntu 14.04 LTS image instead of Debian 7.8. So far, I haven't seen the issue again. Closing this for now; will reopen if it is seen again.