Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/crypto/ssh: Dial hangs in kexLoop indefinitely - ignoring ClientConfig.Timeout #51926

Open
pjbgf opened this issue Mar 24, 2022 · 3 comments
Open
Labels
NeedsInvestigation
Milestone

Comments

@pjbgf
Copy link

@pjbgf pjbgf commented Mar 24, 2022

What version of Go are you using (go version)?

$ go version
go version go1.17.8 linux/amd64

Does this issue reproduce with the latest release?

Yes, as this is library related using version:

golang.org/x/crypto v0.0.0-20220321153916-2c7772ba3064

I can confirm the issue also happens with previous versions:

golang.org/x/crypto@v0.0.0-20220315160706-3147a52a75dd

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/USER/.cache/go-build"
GOENV="/home/USER/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/USER/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/USER/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.17.8"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build762703969=/tmp/go-build -gno-record-gcc-switches"

What did you do?

The application implements a golang ssh transport that hangs indefinitely at ssh.Dial every so often.
The current timeout is set to 30 seconds, which ssh.Dial does not uphold (https://github.com/fluxcd/source-controller/blob/main/pkg/git/libgit2/managed/ssh.go#L251-L255 https://github.com/fluxcd/source-controller/blob/main/pkg/git/libgit2/managed/init.go#L30).

This is a low concurrency (2-4 parallel workers) application which creates multiple ssh connections to execute simple git operations.

The ssh.Dial uses the ssh.ClientConfig as below:

        ssh.ClientConfig{
		User:    username,
		Auth:    []ssh.AuthMethod{ssh.PublicKeys(key)},
		Timeout: 30 * time.Second,
	}

Actual code can be seen at:
https://github.com/fluxcd/source-controller/blob/main/pkg/git/libgit2/managed/ssh.go#L166

What did you expect to see?

The ssh.Dial operation error if the Dial operation took longer than the pre-configured timeout.

What did you see instead?

The goroutine hangs indefinitely. pprof shows the culprit being:

goroutine 25748 [select, 50 minutes]:
golang.org/x/crypto/ssh.(*handshakeTransport).kexLoop(0xc0003398c0)
	golang.org/x/crypto@v0.0.0-20220321153916-2c7772ba3064/ssh/handshake.go:268 +0x485
created by golang.org/x/crypto/ssh.newClientTransport
	golang.org/x/crypto@v0.0.0-20220321153916-2c7772ba3064/ssh/handshake.go:135 +0x23d
@gopherbot gopherbot added this to the Unreleased milestone Mar 24, 2022
@mknyszek
Copy link
Contributor

@mknyszek mknyszek commented Mar 24, 2022

CC @FiloSottile

@mknyszek mknyszek added the NeedsInvestigation label Mar 24, 2022
@seankhliao seankhliao changed the title x/crypto: ssh.Dial hangs in kexLoop indefinitely - ignoring ClientConfig.Timeout x/crypto/ssh: Dial hangs in kexLoop indefinitely - ignoring ClientConfig.Timeout Mar 24, 2022
@pjbgf
Copy link
Author

@pjbgf pjbgf commented Mar 25, 2022

I have noticed that this seems to happen only on subsequent/concurrent connections. I could not reproduce it ever happening on the first connection. So potentially this is somehow related to #27140.

pjbgf added a commit to pjbgf/source-controller that referenced this issue Mar 25, 2022
The underlying SSH connections are kept open and are reused
across several SSH sessions. This is due to upstream issues in
which concurrent/parallel SSH connections may lead to instability.

golang/go#51926
golang/go#27140
Signed-off-by: Paulo Gomes <paulo.gomes@weave.works>
pjbgf added a commit to pjbgf/source-controller that referenced this issue Mar 25, 2022
The underlying SSH connections are kept open and are reused
across several SSH sessions. This is due to upstream issues in
which concurrent/parallel SSH connections may lead to instability.

golang/go#51926
golang/go#27140
Signed-off-by: Paulo Gomes <paulo.gomes@weave.works>
pjbgf added a commit to pjbgf/source-controller that referenced this issue Mar 25, 2022
The underlying SSH connections are kept open and are reused
across several SSH sessions. This is due to upstream issues in
which concurrent/parallel SSH connections may lead to instability.

golang/go#51926
golang/go#27140
Signed-off-by: Paulo Gomes <paulo.gomes@weave.works>
pjbgf added a commit to pjbgf/source-controller that referenced this issue Mar 25, 2022
The underlying SSH connections are kept open and are reused
across several SSH sessions. This is due to upstream issues in
which concurrent/parallel SSH connections may lead to instability.

golang/go#51926
golang/go#27140
Signed-off-by: Paulo Gomes <paulo.gomes@weave.works>
pjbgf added a commit to pjbgf/source-controller that referenced this issue Mar 25, 2022
The underlying SSH connections are kept open and are reused
across several SSH sessions. This is due to upstream issues in
which concurrent/parallel SSH connections may lead to instability.

golang/go#51926
golang/go#27140
Signed-off-by: Paulo Gomes <paulo.gomes@weave.works>
pjbgf added a commit to pjbgf/source-controller that referenced this issue Mar 28, 2022
The underlying SSH connections are kept open and are reused
across several SSH sessions. This is due to upstream issues in
which concurrent/parallel SSH connections may lead to instability.

golang/go#51926
golang/go#27140
Signed-off-by: Paulo Gomes <paulo.gomes@weave.works>
@pjbgf
Copy link
Author

@pjbgf pjbgf commented May 26, 2022

By ensuring the session's StdoutPipe is serviced quickly, seems to resolve the problem, as mentioned on crypto/ssh comments:

https://github.com/golang/crypto/blob/eb4f295cb31f7fb5d52810411604a2638c9b19a2/ssh/session.go#L553-L558

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation
Projects
None yet
Development

No branches or pull requests

3 participants