Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: graceful shutdown race condition #36819

Open
zerkms opened this issue Jan 28, 2020 · 7 comments
Open

net/http: graceful shutdown race condition #36819

zerkms opened this issue Jan 28, 2020 · 7 comments
Assignees
Labels
NeedsInvestigation

Comments

@zerkms
Copy link

@zerkms zerkms commented Jan 28, 2020

Update: I have found a super trivial way to reproduce it, see the step at the very bottom

Preamble: I think it's relevant to #23829

What version of Go are you using (go version)?

$ go _version_
go version go1.13.6 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

GOHOSTARCH="amd64"
GOHOSTOS="linux"

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/ivan/.cache/go-build"
GOENV="/home/ivan/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/srv/work/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/ivan/.local/share/umake/go/go-lang"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/ivan/.local/share/umake/go/go-lang/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build903857222=/tmp/go-build -gno-record-gcc-switches"

What did you do?

package main

import (
	"context"
	"fmt"
	"net/http"
	"os"
	"os/signal"
	"syscall"
)

func main() {
	sig := make(chan os.Signal, 1)
	signal.Notify(sig, syscall.SIGTERM, os.Interrupt)

	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {})

	srv := &http.Server{
		Addr:    ":8080",
		Handler: mux,
	}

	go func() {
		fmt.Println(srv.ListenAndServe())
	}()

	<-sig
	fmt.Println(srv.Shutdown(context.Background()))
}

With the following code it's possible to get a case when the new connection in accepted, then closed within milliseconds, which makes the shutdown not graceful

What did you expect to see?

A graceful shutdown

What did you see instead?

Screenshot from 2020-01-28 11-29-15

Screenshot from 2020-01-28 13-04-16

How to reproduce:

Just ab -t 20 -c 5 -A foo:foo http://localhost:8080/ is enough to capture the case. The former is much more rare, the latter is quite frequent.

What you can see on screenshots: the connection is accepted, then as per https://go-review.googlesource.com/c/go/+/121419/ it should be treated as "new" (but it's not due to race), and is closed.
In first case the client had chance to send a request.
In second - it was closed even before that.

In both cases it happens with the Connection reset by peer on the client side, which should never be the case with non-keepalived connections and graceful termination.

Both pcap files are available at: https://www.dropbox.com/s/2co7wkucfey90zz/bug.tgz?dl=0

My assumption on why it's the race and how it happens:

  1. the connection is accepted with rw, e := l.Accept() successfully
  2. A moment later a shutdown request is made
  3. Before the c.setState(c.rwc, StateNew) statement is reached the server completes shutdown
  4. Because the connection is not tracked - http.Server is not aware of it
  5. The process quits, network sockets are closed by kernel

UPDATE:

It can be easier reproducible by adding time.Sleep(3 * time.Second) right before c := srv.newConn(rw)

Steps:

  1. Patch the http/server.go with time.Sleep(3 * time.Second) before c := srv.newConn(rw)
  2. Start the app
  3. Make a request
  4. Send ctrl-c

Expected: the request should complete
Actual:

➜ curl -v --user foo:foo http://localhost:8080
* Rebuilt URL to: http://localhost:8080/
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8080 (#0)
* Server auth using Basic with user 'foo'
> GET / HTTP/1.1
> Host: localhost:8080
> Authorization: Basic Zm9vOmZvbw==
> User-Agent: curl/7.58.0
> Accept: */*
> 
* Recv failure: Connection reset by peer
* stopped the pause stream!
* Closing connection 0
curl: (56) Recv failure: Connection reset by peer

And a corresponding pcap: https://www.dropbox.com/s/5vsvwoow4ydcs3f/with-timeout.tgz?dl=0

And a corresponding wireshark screenshot:

Screenshot from 2020-01-28 14-02-22

@zerkms zerkms changed the title http: graceful shutdown sometimes closes (RST) the new connection without waiting for a response net/http: graceful shutdown sometimes closes (RST) the new connection without waiting for a response Jan 28, 2020
@zerkms zerkms changed the title net/http: graceful shutdown sometimes closes (RST) the new connection without waiting for a response net/http: graceful shutdown race condition Jan 28, 2020
@ALTree ALTree added the NeedsInvestigation label Feb 3, 2020
@odeke-em odeke-em self-assigned this Feb 5, 2020
@AlexanderYastrebov
Copy link
Contributor

@AlexanderYastrebov AlexanderYastrebov commented Oct 17, 2021

#48642 seems to be a duplicate, I've proposed a fix https://golang.org/cl/353714 for it

@AlexanderYastrebov
Copy link
Contributor

@AlexanderYastrebov AlexanderYastrebov commented May 9, 2022

#48642 was fixed

@zerkms
Copy link
Author

@zerkms zerkms commented May 9, 2022

		c := srv.newConn(rw)
		if !srv.trackConn(c) {
			rw.Close()
			return ErrServerClosed
		}

ooooooh, this change makes the solution non-graceful either, right?

A graceful shutdown should either:

  1. Serve the accepted connection, or
  2. Don't accept it

But with the code above it accepts the connection then closes it.

@AlexanderYastrebov
Copy link
Contributor

@AlexanderYastrebov AlexanderYastrebov commented May 10, 2022

I think you are right, moreover the problem seems to be a duplicate of #33313 which was fixed by https://golang.org/cl/213442 but now the "fix" for #48642 https://golang.org/cl/353714 reintroduced it back - client gets Connection reset by peer.

Patch the http/server.go with time.Sleep(3 * time.Second) before c := srv.newConn(rw)

To avoid patching the runtime a delay could be achieved via ConnContext:

package main

import (
	"context"
	"fmt"
	"net"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	sig := make(chan os.Signal, 1)
	signal.Notify(sig, syscall.SIGTERM, os.Interrupt)

	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {})

	srv := &http.Server{
		Addr:    ":8080",
		Handler: mux,
		ConnContext: func(ctx context.Context, c net.Conn) context.Context {
			time.Sleep(3 * time.Second)
			return ctx
		},
	}

	go func() {
		fmt.Println(srv.ListenAndServe())
	}()

	<-sig
	fmt.Println(srv.Shutdown(context.Background()))
}

The proper fix for #48642 should close the connection but only when server is Closed and not gracefully Shutdown.

@gopherbot
Copy link

@gopherbot gopherbot commented May 10, 2022

Change https://go.dev/cl/405454 mentions this issue: Revert "net/http: close accepted connection"

AlexanderYastrebov added a commit to AlexanderYastrebov/go that referenced this issue May 11, 2022
The https://golang.org/cl/353714 wrongly closes accepted connection
if server is shutting down.

Updates golang#33313
Fixes golang#36819
Fixes golang#48642
AlexanderYastrebov added a commit to AlexanderYastrebov/go that referenced this issue May 11, 2022
The https://golang.org/cl/353714 wrongly closes accepted connection
if server is shutting down.

Updates golang#33313
Fixes golang#36819
Fixes golang#48642
AlexanderYastrebov added a commit to AlexanderYastrebov/go that referenced this issue May 11, 2022
The https://golang.org/cl/353714 wrongly closes accepted connection
if server is shutting down.

Updates golang#33313
Fixes golang#36819
Fixes golang#48642
AlexanderYastrebov added a commit to AlexanderYastrebov/go that referenced this issue May 11, 2022
The https://golang.org/cl/353714 wrongly closes accepted connection
if server is shutting down.

Updates golang#33313
Fixes golang#36819
Fixes golang#48642
gopherbot pushed a commit that referenced this issue May 11, 2022
This reverts CL 353714.

The change closes accepted connection also in graceful shutdown which
breaks the fix for #33313 (and apparent duplicate #36819).

The proper fix should close accepted connection only if server is closed
but not in graceful shutdown.

Updates #48642

Change-Id: I2f7005f3f3037e6563745731bb2693923b654004
GitHub-Last-Rev: f6d885a
GitHub-Pull-Request: #52823
Reviewed-on: https://go-review.googlesource.com/c/go/+/405454
Reviewed-by: Damien Neil <dneil@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
AlexanderYastrebov added a commit to AlexanderYastrebov/go that referenced this issue May 27, 2022
Previous attempt https://golang.org/cl/353714 also incorrectly closed
accepted connection while in shutdown.

Fixes golang#48642
Updates golang#33313, golang#36819
@gopherbot
Copy link

@gopherbot gopherbot commented May 27, 2022

Change https://go.dev/cl/409154 mentions this issue: net/http: close accepted connection when Closed

@gopherbot
Copy link

@gopherbot gopherbot commented May 31, 2022

Change https://go.dev/cl/409537 mentions this issue: net/http: wait for listeners to exit in Server.Close and Shutdown

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation
Projects
None yet
Development

No branches or pull requests

5 participants