x/crypto/ssh: Dial hangs in kexLoop indefinitely - ignoring ClientConfig.Timeout #51926

Open
@pjbgf

Description

What version of Go are you using (go version)?

$ go version
go version go1.17.8 linux/amd64

Does this issue reproduce with the latest release?

Yes. The issue is in the library and reproduces with the following version:

golang.org/x/crypto v0.0.0-20220321153916-2c7772ba3064

I can confirm the issue also happens with previous versions:

golang.org/x/crypto@v0.0.0-20220315160706-3147a52a75dd

What operating system and processor architecture are you using (go env)?

$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/USER/.cache/go-build"
GOENV="/home/USER/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/USER/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/USER/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.17.8"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build762703969=/tmp/go-build -gno-record-gcc-switches"

What did you do?

The application implements a Go SSH transport that intermittently hangs indefinitely in ssh.Dial.
The timeout is set to 30 seconds, but ssh.Dial does not honor it (https://github.com/fluxcd/source-controller/blob/main/pkg/git/libgit2/managed/ssh.go#L251-L255 https://github.com/fluxcd/source-controller/blob/main/pkg/git/libgit2/managed/init.go#L30).

This is a low-concurrency application (2-4 parallel workers) that creates multiple SSH connections to execute simple Git operations.

ssh.Dial is called with the following ssh.ClientConfig:

	ssh.ClientConfig{
		User:    username,
		Auth:    []ssh.AuthMethod{ssh.PublicKeys(key)},
		Timeout: 30 * time.Second,
	}

Actual code can be seen at:
https://github.com/fluxcd/source-controller/blob/main/pkg/git/libgit2/managed/ssh.go#L166

What did you expect to see?

ssh.Dial should return an error once the operation takes longer than the configured timeout.

What did you see instead?

The goroutine hangs indefinitely. pprof shows it blocked in kexLoop, 50 minutes at the time of this dump:

goroutine 25748 [select, 50 minutes]:
golang.org/x/crypto/ssh.(*handshakeTransport).kexLoop(0xc0003398c0)
	golang.org/x/crypto@v0.0.0-20220321153916-2c7772ba3064/ssh/handshake.go:268 +0x485
created by golang.org/x/crypto/ssh.newClientTransport
	golang.org/x/crypto@v0.0.0-20220321153916-2c7772ba3064/ssh/handshake.go:135 +0x23d

Labels: NeedsInvestigation
