New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net: Dial is slow on localhost #23366

Open
levrik opened this Issue Jan 7, 2018 · 23 comments

Comments

Projects
None yet
@levrik

levrik commented Jan 7, 2018

See #23366 (comment) for more details!

What version of Go are you using (go version)?

go version go1.9.2 windows/amd64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

set GOARCH=amd64
set GOBIN=
set GOEXE=.exe
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GOOS=windows
set GORACE=
set GOROOT=C:\Go
set GOTOOLDIR=C:\Go\pkg\tool\windows_amd64
set GCCGO=gccgo
set CC=gcc
set CXX=g++
set CGO_ENABLED=1
set CGO_CFLAGS=-g -O2
set CGO_CPPFLAGS=
set CGO_CXXFLAGS=-g -O2
set CGO_FFLAGS=-g -O2
set CGO_LDFLAGS=-g -O2
set PKG_CONFIG=pkg-config

What did you do?

now := time.Now()
net.Dial("tcp", "localhost:3000")
fmt.Println(time.Since(now))

This code outputs around 1s.
When replacing localhost with 127.0.0.1 it takes around 2ms, sometimes also <1ms.
google.de:80 is around 60ms.

Is this expected to take so much longer with localhost? Took me several hours to find out that this was the root cause of multiple issues in my application.

@davecheney

This comment has been minimized.

Contributor

davecheney commented Jan 7, 2018

@davecheney

This comment has been minimized.

Contributor

davecheney commented Jan 7, 2018

@levrik

This comment has been minimized.

levrik commented Jan 7, 2018

@davecheney As I said google.de:80 is working fast (60ms) so I don't think it's an DNS issue.
I will continue to investigate.

@levrik

This comment has been minimized.

levrik commented Jan 7, 2018

@davecheney Just tried under macOS, same network. 2ms for localhost:3000 and 220µs for 127.0.0.1:3000. Maybe an Windows issue?

Edit:
Tested in WSL. About 800µs for localhost:3000 and 200µs for 127.0.0.1:3000. Not sure if requests from WSL go through the Windows DNS stack.

@bradfitz

This comment has been minimized.

Member

bradfitz commented Jan 7, 2018

Ephemeral port exhaustion maybe?

@davecheney

This comment has been minimized.

Contributor

davecheney commented Jan 7, 2018

@odeke-em odeke-em changed the title from net: net.Dial slow on localhost to net: Dial is slow on localhost Jan 7, 2018

@agnivade

This comment has been minimized.

Member

agnivade commented Jan 8, 2018

@levrik - Perhaps you can write a similar program in a different programming language and test it in the same box ? That will eliminate whether its an issue with your networking stack or with Go.

@davecheney

This comment has been minimized.

Contributor

davecheney commented Jan 8, 2018

@as

This comment has been minimized.

Contributor

as commented Jan 8, 2018

(with a stopwatch, unless ping supports a one shot mode on windows)

ping -n 1 localhost

@levrik

This comment has been minimized.

levrik commented Jan 8, 2018

@davecheney @agnivade Let me try later this evening.

@levrik

This comment has been minimized.

levrik commented Jan 13, 2018

Sorry for not responding.
Just tried a ping to localhost. Took below 1ms.

@kkinnunen

This comment has been minimized.

kkinnunen commented Mar 15, 2018

My connections were taking exactly 1s each with

  • dialing to localhost:port
  • listening on tcp, localhost:port

Worked around with using dialing and listening on "tcp", 127.0.0.1:port

net.Listen network (tcp, tcp4) and address of ":port", "localhost:port", "127.0.0.1:port" seem affect the outcome (to my adhoc testing). For example:

  • dialing to localhost:port
  • listening on "tcp", ":0" -> fast
  • listening on "tcp4", ":0" ->slow

go version
go version go1.10 windows/amd64

@andybons andybons added this to the Unplanned milestone Mar 15, 2018

@ewxrjk

This comment has been minimized.

ewxrjk commented Mar 19, 2018

I encountered the same issue, and I've done some digging. The first clue was that if I tried to connect to a port that has no listener at all the delay was 2 seconds, not 1 second.

The delay isn't in the name lookup. But the results of the name lookup do matter: here, localhost results in two addresses, ::1 and 127.0.0.1. In this case net.Dial attempts to connect to ::1 first and 127.0.0.1 second.

The delay arises from failed connection attempts. Specifically, the call to fd.pfd.ConnectEx takes almost exactly a second when it fails (for any address, not just local ones). You can see this by modifying the implementation to print before/after timestamps.

In the localhost case, if you're listening on both IPv4 and IPv6, then it succeeds immediately. If you're listening on just IPv4 then you get failed connection attempted and a 1s delay followed by success. If you're listening on neither then you get two failed connection attempts and thus a 2s delay.

The same issue affects PuTTY. tcpdump reveals that what's going on is that Windows apparently does not believe the RST bits it gets back from the target of the connection (even if, apparently, it is itself the target) and keeps trying for up to a second.

I doubt that Go can do anything about this. For application developers the answer is to bind to both IPv4 and IPv6 addresses, so that it doesn't matter what connection order clients use.

A couple of references:

@levrik

This comment has been minimized.

levrik commented Mar 20, 2018

Ah, that's really interesting.
The issue I have here is that I'm not in control of the target servers. I built a development proxy which also proxies websockets besides normal HTTP. If you use, for example, create-react-app which uses sockjs for the connection to its live reload server it will fall back first to eventsource and after that to polling. that is caused by some internal dynamic timeout that is calculated based on a normal HTTP call (which is fast). I think it is fine to just use 127.0.0.1 in that case, at least for now.

@agnivade

This comment has been minimized.

Member

agnivade commented May 2, 2018

@levrik - So, would it be fair to say that this issue can be closed ?

@gopherbot

This comment has been minimized.

gopherbot commented Jun 19, 2018

Timed out in state WaitingForInfo. Closing.

(I am just a bot, though. Please speak up if this is a mistake or you have the requested information.)

@gopherbot gopherbot closed this Jun 19, 2018

@levrik

This comment has been minimized.

levrik commented Jun 19, 2018

@agnivade Oh sorry. Completely forgot to anwser. Yes. Maybe it'd be an idea to put information about that into the Go docs?

@josharian josharian reopened this Jun 19, 2018

@agnivade

This comment has been minimized.

Member

agnivade commented Jun 19, 2018

A PR will be much appreciated.

/cc @bradfitz

@golang golang deleted a comment Jul 6, 2018

sunjayBhatia added a commit to cloudfoundry/diego-logging-client that referenced this issue Jul 13, 2018

Set loggregator address to 127.0.0.1 for Windows
Investigation story: https://www.pivotaltracker.com/story/show/158945682

This Golang issue has the required context: golang/go#23366 (comment)
- the loggregator agent only listens on localhost on ipv4
- net.Dial tries two addresses when it receives a localhost address: ::1 and 127.0.0.1
- according to the issue, the dial to ::1 fails and causes a second delay
- our addition of the 1 second dial timeout and blocking dial caused us to see this on AWS and vSphere but not GCP because GCP stemcells have ipv6 disabled on all interfaces

[#158991284](https://www.pivotaltracker.com/story/show/158991284)
@inliquid

This comment has been minimized.

inliquid commented Nov 24, 2018

Got same here with http package and some profiling tests, f.e. this code seemed broken and program just hanged until localhost was changed to 127.0.0.1

func main() {
	ctrl.Setup()

	go func() {
		if err := http.ListenAndServe("localhost:3000", &poms.GZipServer{}); err != nil {
			log.Fatalln("Error listening:", err)
		}
	}()

	f, err := os.Create("cpu.prof")
	if err != nil {
		log.Fatalln("Can't create profiler dump file:", err)
	}

	pprof.StartCPUProfile(f)

	for i := 0; i < 1000; i++ {
		if _, err := http.Get("http://localhost:3000/api/purchaseOrders/1"); err != nil {
			log.Println("Error sending request:", err)
		}
	}

	pprof.StopCPUProfile()
}

Note, that in this example both listener and client supposed to bind to same host/interface: localhost, and there is no way to get an idea of what is happening. No errors, no feedback on failed [::1] attempts, what so ever. After replacing localhost with 127.0.0.1 everything worked.

I expect to receive some errors back from Get when there are timed out or refused connect attempts on different interfaces, including attempts to connect on IPv6/IPv4.

@as

This comment has been minimized.

Contributor

as commented Nov 25, 2018

@inliquid you have a transactional race in your program.

There is no guarantee that by the time your listener goroutine resolves the host part of the socket and starts the http.ListenAndServe logic, your client hasn't already tried 1000 times to connect and failed.

I wouldn't be surprised if changing it to the IPv4 address made this process quicker, allowing your server to listen before the clients started clienting.

Another question is how you determine success. Nothing happens if the client relieves no error.

I suggest creating your own listener and attaching it to the http server in two separate steps. This would allow you to implement your own net.Listener and net.Conn for the connections and add whatever debugging output you want.

@inliquid

This comment has been minimized.

inliquid commented Nov 25, 2018

@as well, in fact I don't have race problem. I could have such problem as you suggested, but I don't. If there were any - I would see an error as a result. I did a lot of experiments before realized what is happening, including running server and client in 2 separate processes and emulating http server to see is http.Get even working. Problem is that http.Get returns nothing in this case.

@ewxrjk

This comment has been minimized.

ewxrjk commented Nov 25, 2018

@inliquid there is not much Go can do about the connect issue. It is just reflecting the behavior of the underlying platform. The best strategies are either (1) bind to both loopback addresses, or (2) bind to just one of them but also only connect to that one.

In general binding to names rather than addresses is risky: if the name resolves to multiple addresses then it will bind to only one of those addresses. There's already a warning about this under net.Listen.

@inliquid

This comment has been minimized.

inliquid commented Nov 25, 2018

I agree in general, however, I think it's clear that localhost in Windows is exceptional case and since it has nothing to do with DNS or hosts I think it must be treated in exceptional manner, explicitly, avoiding such situations were program behavior may change unexpectedly. Maybe just make "localhost" == "127.0.0.1", or even deny "localhost" and require explicit address, smth like

  • "localhost_v4" == "127.0.0.1"
  • "localhost_v6" == "[::1]"
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment