This repository has been archived by the owner on Jul 28, 2021. It is now read-only.

Connecting to vsock sometimes times out #124

Open
beweedon opened this issue Aug 30, 2017 · 5 comments

Comments

@beweedon
Contributor

When running a process and trying to connect to stdio, we can get the following stack trace:

failed creating stdout Connection: failed connecting the VsockConnection: failed connect() to 00000002.40000081: connection timed out
      github.com/Microsoft/opengcs/service/gcs/transport.(*VsockTransport).Dial
        /home/serviceb/golang/src/github.com/Microsoft/opengcs/service/gcs/transport/vsock.go:27
      github.com/Microsoft/opengcs/service/gcs/bridge.connectStdio
        /home/serviceb/golang/src/github.com/Microsoft/opengcs/service/gcs/bridge/connection.go:84
      github.com/Microsoft/opengcs/service/gcs/bridge.(*bridge).runExternalProcess
        /home/serviceb/golang/src/github.com/Microsoft/opengcs/service/gcs/bridge/bridge.go:323
      github.com/Microsoft/opengcs/service/gcs/bridge.(*bridge).execProcess
        /home/serviceb/golang/src/github.com/Microsoft/opengcs/service/gcs/bridge/bridge.go:221
      github.com/Microsoft/opengcs/service/gcs/bridge.(*bridge).loop
        /home/serviceb/golang/src/github.com/Microsoft/opengcs/service/gcs/bridge/bridge.go:88
      github.com/Microsoft/opengcs/service/gcs/bridge.(*bridge).CommandLoop
        /home/serviceb/golang/src/github.com/Microsoft/opengcs/service/gcs/bridge/bridge.go:52
      main.main
        /home/serviceb/golang/src/github.com/Microsoft/opengcs/service/gcs/main.go:60
      runtime.main
        /usr/lib/go-1.6/src/runtime/proc.go:188
      runtime.goexit
        /usr/lib/go-1.6/src/runtime/asm_amd64.s:1998
      at <ScriptBlock>, C:\test\LinuxContainer.Tests.ps1: line 192

The timeout occurs in the virtsock library. If we want to be able to configure the timeout ourselves, it might make sense for virtsock to expose sockopts in some way, so we could set the timeout there. Another option is simply to retry on timeout in the GCS.

@gupta-ak
Member

gupta-ak commented Sep 7, 2017

I tried adding a retry and it looked like the issue was solved, so maybe we can do that for the time being. In the future, the virtsock library could be changed to have a DialTimeout, implemented with non-blocking sockets like Go's net package.

@beweedon
Contributor Author

beweedon commented Sep 7, 2017

I was talking to Dexuan, and the underlying issue of the timeouts could potentially be a vsock bug. He said timeouts should only occur when:

  • there is a burst of too many connect requests from the Linux VM, more than the backlog parameter to the listen syscall
  • nothing on the host is listen()ing on the ServiceID
  • the host accept()s the connect request but then closes the connection immediately

These scenarios seem unlikely, although I suppose not impossible. When we get another repro (I'm suddenly unable to get any), we can look in the syslog and see if there's any obvious error output there.

We can mask the error with retries in the meantime if we need to, of course. Just FYI, the error could be something in the vsock/hvsock implementation itself.

@gupta-ak
Member

gupta-ak commented Sep 7, 2017

Yeah, it makes sense if it's a vsock bug. The timeout happens way too quickly to be a real timeout.

@rn
Contributor

rn commented Sep 9, 2017

More than happy to take patches to the virtsock go bindings if that helps.

@gupta-ak
Member

A temporary workaround (#133) has been merged to master and Rolf also has a similar workaround in his fork. I'm keeping this issue open until the underlying vsock/hvsock driver is fixed.
