net: Dial only tries the first address #9801

Closed
bobrik opened this issue Feb 7, 2015 · 12 comments

@bobrik

commented Feb 7, 2015

package main

import (
    "flag"
    "log"
    "net"
)

func main() {
    host := flag.String("host", "", "host to connect")
    flag.Parse()

    if *host == "" {
        flag.PrintDefaults()
        return
    }

    ips, err := net.LookupIP(*host)
    if err != nil {
        log.Fatal(err)
    }

    log.Println("ips:", ips)

    c, err := net.Dial("tcp", *host+":80")
    if err != nil {
        log.Fatal(err)
    }

    log.Println("connected to:", c.RemoteAddr())
}

Everything is great when the first ip is available:

# go run /tmp/test.go -host docker-registry.s3.ceph.pretender.local
2015/02/07 15:52:15 ips: [192.168.2.62 192.168.2.71 192.168.2.72 192.168.2.74 192.168.2.75 192.168.2.30 192.168.2.60 192.168.2.61]
2015/02/07 15:52:15 connected to: 192.168.2.62:80

But when the first IP is not available, net.Dial fails without trying the next IP:

# go run /tmp/test.go -host docker-registry.s3.ceph.pretender.local
2015/02/07 15:52:32 ips: [192.168.2.62 192.168.2.71 192.168.2.72 192.168.2.74 192.168.2.75 192.168.2.30 192.168.2.60 192.168.2.61]
2015/02/07 15:52:32 dial tcp 192.168.2.62:80: connection refused
exit status 1

Happened with docker: moby/moby#10614

@mikioh mikioh changed the title net.Dial only tries the first address net: Dial only tries the first address Feb 7, 2015

@adg

Contributor

commented Feb 7, 2015

It would be surprising if net.Dial tried the other addresses when the first connection attempt fails. Telnet doesn't do that. A web browser won't do it either.

If you want to try the other addresses after the first one fails, use net.LookupHost or net.LookupIP and write the loop yourself.

@adg adg closed this Feb 7, 2015

@bobrik

Author

commented Feb 7, 2015

@adg I must have got some really broken telnet:

# telnet docker-registry.s3.ceph.pretender.local 666
Trying 192.168.2.74...
telnet: connect to address 192.168.2.74: Connection refused
Trying 192.168.2.75...
telnet: connect to address 192.168.2.75: Connection refused
Trying 192.168.2.30...
telnet: connect to address 192.168.2.30: Connection refused
Trying 192.168.2.60...
telnet: connect to address 192.168.2.60: Connection refused
Trying 192.168.2.61...
telnet: connect to address 192.168.2.61: Connection refused
Trying 192.168.2.62...
telnet: connect to address 192.168.2.62: Connection refused
Trying 192.168.2.71...
telnet: connect to address 192.168.2.71: Connection refused
Trying 192.168.2.72...
telnet: connect to address 192.168.2.72: Connection refused
@bobrik

Author

commented Feb 7, 2015

In /etc/hosts on my mac:

192.168.0.5     whatever.org
192.168.0.6     whatever.org

On the machine that owns 192.168.0.5 and 192.168.0.6, after trying Safari:

17:09:09.433404 IP 192.168.0.3.62853 > 192.168.0.6.9999: Flags [S], seq 478414988, win 65535, options [mss 1460,nop,wscale 5,nop,nop,TS val 887026686 ecr 0,sackOK,eol], length 0
17:09:09.434728 IP 192.168.0.3.62854 > 192.168.0.5.9999: Flags [S], seq 3248702626, win 65535, options [mss 1460,nop,wscale 5,nop,nop,TS val 887026687 ecr 0,sackOK,eol], length 0

And after trying Firefox:

17:10:45.426919 IP 192.168.0.3.63025 > 192.168.0.6.9999: Flags [S], seq 349371721, win 65535, options [mss 1460,nop,wscale 5,nop,nop,TS val 887112794 ecr 0,sackOK,eol], length 0
17:10:45.429567 IP 192.168.0.3.63026 > 192.168.0.5.9999: Flags [S], seq 2048252876, win 65535, options [mss 1460,nop,wscale 5,nop,nop,TS val 887112796 ecr 0,sackOK,eol], length 0
17:10:45.431033 IP 192.168.0.3.63027 > 192.168.0.6.9999: Flags [S], seq 2956890338, win 65535, options [mss 1460,nop,wscale 5,nop,nop,TS val 887112798 ecr 0,sackOK,eol], length 0
17:10:45.432150 IP 192.168.0.3.63028 > 192.168.0.5.9999: Flags [S], seq 2893786781, win 65535, options [mss 1460,nop,wscale 5,nop,nop,TS val 887112799 ecr 0,sackOK,eol], length 0

Chrome only generates one SYN, that's true.

Telnet on mac and on linux behaves the same. Wget tries different ips, curl does the same.

@adg

Contributor

commented Feb 7, 2015

Well colour me surprised!

Note that, at least in telnet's case, you can tell that it is not the system-level connect call (equivalent to net.Dial) that is doing the retries, because it prints "Trying $IP...". The retry is application-level.

IOW, net.Dial is too low-level to retry different addresses.

@bobrik

Author

commented Feb 7, 2015

Looks like connect doesn't know anything about hostnames, while net.Dial does. Dealing with retries could be too low-level for syscall.Connect, but for net.Dial? It already knows about hostnames and ipv6/ipv4 priorities.

Are http, tls, syslog, rpc, smtp and textproto too low level for retries?

@adg

Contributor

commented Feb 7, 2015

> Dealing with retries could be too low-level for syscall.Connect, but for net.Dial?

I think so. Go's standard library doesn't provide anything cross-platform that is lower level than net.Dial.

> Are http, tls, syslog, rpc, smtp and textproto too low level for retries?

I don't think the http package, for example, is too high level for retries. But it depends on which part. An analogous case is following redirects in HTTP requests: the http.Client type transparently follows redirects by default, but the lower-level http.RoundTripper does not (and it would be inconvenient if it did, because it is the low-level interface for making HTTP requests).

I see net.Dial as closer to http.RoundTripper than http.Client, in the absence of anything less magical.

@adg

Contributor

commented Feb 7, 2015

One solution is to provide a net.Dialer implementation that retries when multiple A records are found.

@bobrik

Author

commented Feb 7, 2015

Do you think it should be a third-party library?

Let's update the net.Dial() docs anyway; it's better to be explicit about retries. I'm not sure about http.DefaultClient, http.Get and friends.

@adg

Contributor

commented Feb 7, 2015

Maybe the net.Dialer type should have a new field to enable the behaviour you need. It already does a bunch of clever things. (cc @mikioh)

The Dial docs should have an additional sentence, something like: "Using the Dial function is equivalent to calling the Dial method on the zero value of Dialer." That way the user knows to look at Dialer to see exactly what it does.

@mikioh

Contributor

commented Feb 8, 2015

This issue contains several use cases, and I'm still not sure what the best bet would be. For what it's worth, we eventually need to address the following issues and find some compromises for them.

  1. An effective way to handle multiple address families and the mapping of application-level sessions/streams to transport connections in the net/http package or external well-cooked packages
    1.1. For multiple address families, see #8847
    1.2. For multiple streams-connections mappings, see #8946, #8910, #8296, #6785, #4677 and others
  2. An effective way to handle multiple address families in the crypto/tls package or external well-cooked packages
    2.1. For SAN (subject alternative name) with multiple address families, not addressed yet
  3. An effective way to handle multiple address families and endpoints in the net package or external well-cooked packages
    3.1. For multiple address families, see #8453, #8455
    3.2. For multiple endpoints that are mapped to a single name (hostname, UTF-8 encoded netname, or mDNS/DNS registered name), not addressed yet
@bobrik

Author

commented Feb 8, 2015

I originally tried to create an issue to address 3.2. Can you reopen this issue or create a separate one?

@mikioh

Contributor

commented Feb 8, 2015

Please open a new issue with a concrete use case and/or an API proposal. As @adg mentioned above, adding something to net.Dialer that accepts a user-supplied strategy/discipline for controlling multiple endpoints might be acceptable if people think it's worth having. There is no one-size-fits-all answer to this sort of issue for a system built on top of the internet.

@golang golang locked and limited conversation to collaborators Jun 25, 2016
