Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: using Transport.IdleConnTimeout may lead to goroutine leak in an edge case #25621

Closed
aryszka opened this issue May 29, 2018 · 9 comments

Comments

Projects
None yet
5 participants
@aryszka
Copy link

commented May 29, 2018

What version of Go are you using (go version)?

1.10.2

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

GOOS="linux"
GOARCH="amd64"

What did you do?

I ran an HTTP short-polling client and the polling timeout value was equal to the IdleConnTimeout of the used http.Transport.

package main

import (
	"fmt"
	"io/ioutil"
	"log"
	"net/http"
	"runtime"
	"time"
)

func main() {
	// IdleConnTimeout close to the polling timeout
	pollTO := 30 * time.Millisecond
	connTO := pollTO

	// avoid the leak by not setting IdleConnTimeout:
	// connTO = 0
	// or by using a value different from the poll timeout:
	// connTO = pollTO + time.Millisecond

	// start a trivial server, it can be in a separate process:
	go func() {
		if err := http.ListenAndServe(":8080", http.HandlerFunc(
			func(w http.ResponseWriter, _ *http.Request) {
				w.Write([]byte("Hello, world!"))
			},
		)); err != nil {
			log.Fatalln(err)
		}
	}()

	// create a client with the IdleConnTimeout configured:
	c := &http.Client{
		Transport: &http.Transport{
			IdleConnTimeout: connTO,
		},
	}

	// run periodic requests, count the requests, goroutines and the heap in use:
	var (
		m runtime.MemStats
		reqs int
	)

	for {
		func() {
			req, err := http.NewRequest("GET", "http://localhost:8080", nil)
			if err != nil {
				log.Fatalln(err)
			}

			rsp, err := c.Do(req)
			if err != nil {
				log.Fatalln(err)
			}

			defer rsp.Body.Close()
			if _, err := ioutil.ReadAll(rsp.Body); err != nil {
				log.Fatalln(err)
			}

			reqs++
			runtime.ReadMemStats(&m)
			fmt.Print("\r",
				time.Now().Format("15:04:05"),
				", requests: ", reqs,
				", goroutines: ", runtime.NumGoroutine(),
				", heap in use: ", m.HeapInuse,
			)

			time.Sleep(pollTO)
		}()
	}
}

What did you expect to see?

I expected the existing connection from the pool either reused or not, without other issues (even if having the same timeouts for the short-polling and the IdleConnTimeout is quite an edge case).

What did you see instead?

I saw a steadily growing number of goroutines and memory usage.

@ianlancetaylor ianlancetaylor added this to the Go1.11 milestone May 29, 2018

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented May 29, 2018

@bradfitz

This comment has been minimized.

Copy link
Member

commented Jul 10, 2018

A belated thanks for the fun bug report. Sorry for missing this earlier.

I'm able to reproduce it and have started debugging. It's not immediately obvious what's happening, but I'll keep looking.

@fraenkel

This comment has been minimized.

Copy link
Contributor

commented Jul 11, 2018

One problem is the rsp.Body.Close() shouldn't be deferred since we don't ever exit the block. That doesn't fix the underlying issue.

@bradfitz

This comment has been minimized.

Copy link
Member

commented Jul 11, 2018

@fraenkel, no that's not the issue. It is in a func scope above, so the defer does work. But the bug also repros with an immediate Body.Close before the sleep.

I've found that it's related to the Transport's idle connection management, but then I had to do something else. I'll look at it again today.

@bradfitz

This comment has been minimized.

Copy link
Member

commented Jul 11, 2018

Each increase of 3 goroutines (the Transport persistConn's readLoop+writeLoop + the server side handling it) happens right after each time this "not idle" case sometimes fires:

func (pc *persistConn) closeConnIfStillIdle() {
// ....
        if _, ok := t.idleLRU.m[pc]; !ok {
                // Not idle.                                                                                                                                                 
                return
        }

So whatever is removing it from that map isn't also closing the conn, which would break the readLoop's Peek which would make that goroutine return, closing the writeLoop, and the server goroutine would end when the conn closed also.

@fraenkel

This comment has been minimized.

Copy link
Contributor

commented Jul 11, 2018

The cases which fail always are connections pulled from getIdleConn.

@fraenkel

This comment has been minimized.

Copy link
Contributor

commented Jul 11, 2018

So from what I see, we call getIdleConn, and then see there is an idleTimer associated with it so its removed from the LRU but not returned. Then the closeConnIfStillIdle is invoked and decides its not idle, therefore we lose the pconn and its never closed.

@bradfitz

This comment has been minimized.

Copy link
Member

commented Jul 11, 2018

Yeah, that check just seems wrong. Sending a CL with explanation.

@gopherbot

This comment has been minimized.

Copy link

commented Jul 11, 2018

Change https://golang.org/cl/123315 mentions this issue: net/http: fix rare Transport leak, remove incorrect defensive logic

@gopherbot gopherbot closed this in 631402f Jul 11, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.