This repository has been archived by the owner on May 12, 2021. It is now read-only.

yamux: keepalive failed: i/o deadline reached #70

Closed
devimc opened this issue May 17, 2018 · 4 comments · Fixed by #71, kata-containers/agent#263 or #91
Assignees
Labels
bug Incorrect behaviour

Comments


devimc commented May 17, 2018

see kata-containers/agent#231

devimc added the bug (Incorrect behaviour) label May 17, 2018
devimc self-assigned this May 17, 2018
devimc pushed a commit to devimc/kata-proxy that referenced this issue May 17, 2018
We don't know how long a container can stay paused, so the connection
write timeout should be large enough that the connection is not closed
while the container is paused.

fixes kata-containers/agent#231
fixes kata-containers#70

Signed-off-by: Julio Montes <julio.montes@intel.com>
devimc pushed a commit to devimc/kata-proxy that referenced this issue May 17, 2018
We don't know how long a container can stay paused, so the connection
write timeout should be disabled so that the connection is not closed
while the container is paused.

fixes kata-containers/agent#231
fixes kata-containers#70

Signed-off-by: Julio Montes <julio.montes@intel.com>
Author

devimc commented May 30, 2018

This still occurs, but now when using docker update. I found a way to reproduce it; see kata-containers/runtime#352.

devimc reopened this May 30, 2018
Contributor

chavafg commented Jun 8, 2018

Found an occurrence of this on a CI job:

• Failure [167.604 seconds]
check yamux IO timeout
/home/jenkins/jenkins_slave/workspace/kata-containers-tests-ubuntu-17-10-PR/go/src/github.com/kata-containers/tests/integration/docker/pause_test.go:45
  pause, wait and unpause a container
  /home/jenkins/jenkins_slave/workspace/kata-containers-tests-ubuntu-17-10-PR/go/src/github.com/kata-containers/tests/integration/docker/pause_test.go:67
    check yamux IO connection
    /home/jenkins/jenkins_slave/workspace/kata-containers-tests-ubuntu-17-10-PR/go/src/github.com/kata-containers/tests/integration/docker/pause_test.go:68
      should keep alive [It]
      /home/jenkins/jenkins_slave/workspace/kata-containers-tests-ubuntu-17-10-PR/go/src/github.com/kata-containers/tests/integration/docker/pause_test.go:69

      Expected
          <int>: 0
      to equal
          <int>: -1

http://kata-jenkins-ci.westus2.cloudapp.azure.com/job/kata-containers-tests-ubuntu-17-10-PR/400/consoleFull


aduenasd commented Jun 8, 2018

Hi,
I'm also hitting this issue when running
kata-containers/tests/metrics/storage/blogbench.sh
on bare-metal (host Ubuntu and Fedora).

Author

devimc commented Jun 9, 2018

@aduenasd thanks for the information, I was able to reproduce it.
The patch in kata-containers/agent#263 will fix this issue.

devimc pushed a commit to devimc/kata-agent that referenced this issue Jun 9, 2018
Disable yamux keepalive in the channel and the client.
The yamux keepalive feature closes the connection with the
proxy and the agent when it is unable to ping them.

fixes kata-containers/proxy#70
fixes kata-containers#231

Signed-off-by: Julio Montes <julio.montes@intel.com>
devimc pushed a commit to devimc/kata-agent that referenced this issue Jun 11, 2018
The yamux client runs on the proxy side. Sometimes the client is busy
handling other requests and cannot respond to the ping sent by the
server, so the connection is closed. To avoid I/O timeouts in the
communication between agent and proxy, keepalive should be disabled.

fixes kata-containers/proxy#70
fixes kata-containers#231

Signed-off-by: Julio Montes <julio.montes@intel.com>
devimc pushed a commit to devimc/kata-agent that referenced this issue Jun 18, 2018
The yamux client runs on the proxy side. Sometimes the client is busy
handling other requests and cannot respond to the ping sent by the
server, so the connection is closed. To avoid I/O timeouts in the
communication between agent and proxy, keepalive should be disabled.

fixes kata-containers/proxy#70
fixes kata-containers#231

Signed-off-by: Julio Montes <julio.montes@intel.com>
devimc pushed a commit to devimc/kata-agent that referenced this issue Jun 26, 2018
The yamux client runs on the proxy side. Sometimes the client is busy
handling other requests and cannot respond to the ping sent by the
server, so the connection is closed. To avoid I/O timeouts in the
communication between agent and proxy, keepalive should be disabled.

fixes kata-containers/proxy#70
fixes kata-containers#231

Signed-off-by: Julio Montes <julio.montes@intel.com>
sboeuf pushed a commit to devimc/kata-agent that referenced this issue Jul 3, 2018
The yamux client runs on the proxy side. Sometimes the client is busy
handling other requests and cannot respond to the ping sent by the
server, so the connection is closed. To avoid I/O timeouts in the
communication between agent and proxy, keepalive should be disabled.

fixes kata-containers/proxy#70
fixes kata-containers#231

Signed-off-by: Julio Montes <julio.montes@intel.com>
sboeuf pushed a commit to sboeuf/proxy that referenced this issue Jul 16, 2018
We are trying to disable the keepalive feature provided by Yamux
on both the client (kata-proxy) and server (kata-agent) sides, because
we don't want to get Yamux errors when we pause the VM. It has already
been disabled on the proxy side and we are about to disable it on the
agent side too. The problem is that we sometimes run into an issue
where the communication between the proxy and the agent hangs.

It's related to the emulated serial port created by QEMU, which in
some cases does not get out of its sleeping loop. This issue is still
under investigation, but a simple fix is to write more data to the
serial port to wake it up. This workaround is needed because disabling
Yamux keepalive solves several issues, particularly one related to our
long-running soak tests.

That's why this commit enables a simple "keepalive" feature that does
not check for any error. The idea is simply to send something out
through this serial port.

Fixes kata-containers#70

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
sboeuf pushed a commit to sboeuf/proxy that referenced this issue Jul 17, 2018
We are trying to disable the keepalive feature provided by Yamux
on both the client (kata-proxy) and server (kata-agent) sides, because
we don't want to get Yamux errors when we pause the VM. It has already
been disabled on the proxy side and we are about to disable it on the
agent side too. The problem is that we sometimes run into an issue
where the communication between the proxy and the agent hangs.

It's related to the emulated serial port created by QEMU, which in
some cases does not get out of its sleeping loop. This issue is still
under investigation, but a simple fix is to write more data to the
serial port to wake it up. This workaround is needed because disabling
Yamux keepalive solves several issues, particularly one related to our
long-running soak tests.

That's why this commit enables a simple "keepalive" feature that does
not check for any error. The idea is simply to send something out
through this serial port.

Fixes kata-containers#70

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
sboeuf pushed a commit to devimc/kata-agent that referenced this issue Jul 18, 2018
The yamux client runs on the proxy side. Sometimes the client is busy
handling other requests and cannot respond to the ping sent by the
server, so the connection is closed. To avoid I/O timeouts in the
communication between agent and proxy, keepalive should be disabled.

fixes kata-containers/proxy#70
fixes kata-containers#231

Signed-off-by: Julio Montes <julio.montes@intel.com>
sboeuf pushed a commit to devimc/kata-agent that referenced this issue Jul 18, 2018
The yamux client runs on the proxy side. Sometimes the client is busy
handling other requests and cannot respond to the ping sent by the
server, so the connection is closed. To avoid I/O timeouts in the
communication between agent and proxy, keepalive should be disabled.

Depends-on: github.com/kata-containers/proxy#91

fixes kata-containers/proxy#70
fixes kata-containers#231

Signed-off-by: Julio Montes <julio.montes@intel.com>
egernst pushed a commit that referenced this issue Aug 23, 2018
We don't know how long a container can stay paused, so the connection
write timeout should be disabled so that the connection is not closed
while the container is paused.

fixes kata-containers/agent#231
fixes #70

Signed-off-by: Julio Montes <julio.montes@intel.com>
egernst pushed a commit to kata-containers/agent that referenced this issue Aug 30, 2018
The yamux client runs on the proxy side. Sometimes the client is busy
handling other requests and cannot respond to the ping sent by the
server, so the connection is closed. To avoid I/O timeouts in the
communication between agent and proxy, keepalive should be disabled.

Depends-on: github.com/kata-containers/proxy#91

fixes kata-containers/proxy#70
fixes #231

Signed-off-by: Julio Montes <julio.montes@intel.com>
jshachm pushed a commit to jshachm/agent that referenced this issue Nov 22, 2018
The yamux client runs on the proxy side. Sometimes the client is busy
handling other requests and cannot respond to the ping sent by the
server, so the connection is closed. To avoid I/O timeouts in the
communication between agent and proxy, keepalive should be disabled.

Depends-on: github.com/kata-containers/proxy#91

fixes kata-containers/proxy#70
fixes kata-containers#231

Signed-off-by: Julio Montes <julio.montes@intel.com>
3 participants