Description
Update: I have found a super trivial way to reproduce it, see the step at the very bottom
Preamble: I think it's relevant to #23829
What version of Go are you using (go version
)?
$ go _version_ go version go1.13.6 linux/amd64
Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (go env
)?
GOHOSTARCH="amd64"
GOHOSTOS="linux"
go env
Output
$ go env GO111MODULE="" GOARCH="amd64" GOBIN="" GOCACHE="/home/ivan/.cache/go-build" GOENV="/home/ivan/.config/go/env" GOEXE="" GOFLAGS="" GOHOSTARCH="amd64" GOHOSTOS="linux" GONOPROXY="" GONOSUMDB="" GOOS="linux" GOPATH="/srv/work/go" GOPRIVATE="" GOPROXY="https://proxy.golang.org,direct" GOROOT="/home/ivan/.local/share/umake/go/go-lang" GOSUMDB="sum.golang.org" GOTMPDIR="" GOTOOLDIR="/home/ivan/.local/share/umake/go/go-lang/pkg/tool/linux_amd64" GCCGO="gccgo" AR="ar" CC="gcc" CXX="g++" CGO_ENABLED="1" GOMOD="" CGO_CFLAGS="-g -O2" CGO_CPPFLAGS="" CGO_CXXFLAGS="-g -O2" CGO_FFLAGS="-g -O2" CGO_LDFLAGS="-g -O2" PKG_CONFIG="pkg-config" GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build903857222=/tmp/go-build -gno-record-gcc-switches"
What did you do?
package main
import (
"context"
"fmt"
"net/http"
"os"
"os/signal"
"syscall"
)
func main() {
sig := make(chan os.Signal, 1)
signal.Notify(sig, syscall.SIGTERM, os.Interrupt)
mux := http.NewServeMux()
mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {})
srv := &http.Server{
Addr: ":8080",
Handler: mux,
}
go func() {
fmt.Println(srv.ListenAndServe())
}()
<-sig
fmt.Println(srv.Shutdown(context.Background()))
}
With the following code it's possible to get a case when the new connection in accepted, then closed within milliseconds, which makes the shutdown not graceful
What did you expect to see?
A graceful shutdown
What did you see instead?
How to reproduce:
Just ab -t 20 -c 5 -A foo:foo http://localhost:8080/
is enough to capture the case. The former is much more rare, the latter is quite frequent.
What you can see on screenshots: the connection is accepted, then as per https://go-review.googlesource.com/c/go/+/121419/ it should be treated as "new" (but it's not due to race), and is closed.
In first case the client had chance to send a request.
In second - it was closed even before that.
In both cases it happens with the Connection reset by peer
on the client side, which should never be the case with non-keepalived connections and graceful termination.
Both pcap files are available at: https://www.dropbox.com/s/2co7wkucfey90zz/bug.tgz?dl=0
My assumption on why it's the race and how it happens:
- the connection is accepted with
rw, e := l.Accept()
successfully - A moment later a shutdown request is made
- Before the
c.setState(c.rwc, StateNew)
statement is reached the server completes shutdown - Because the connection is not tracked -
http.Server
is not aware of it - The process quits, network sockets are closed by kernel
UPDATE:
It can be easier reproducible by adding time.Sleep(3 * time.Second)
right before c := srv.newConn(rw)
Steps:
- Patch the
http/server.go
withtime.Sleep(3 * time.Second)
beforec := srv.newConn(rw)
- Start the app
- Make a request
- Send ctrl-c
Expected: the request should complete
Actual:
➜ curl -v --user foo:foo http://localhost:8080
* Rebuilt URL to: http://localhost:8080/
* Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8080 (#0)
* Server auth using Basic with user 'foo'
> GET / HTTP/1.1
> Host: localhost:8080
> Authorization: Basic Zm9vOmZvbw==
> User-Agent: curl/7.58.0
> Accept: */*
>
* Recv failure: Connection reset by peer
* stopped the pause stream!
* Closing connection 0
curl: (56) Recv failure: Connection reset by peer
And a corresponding pcap: https://www.dropbox.com/s/5vsvwoow4ydcs3f/with-timeout.tgz?dl=0
And a corresponding wireshark screenshot: