Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net: Dialer{DualStack: true}.Dial("tcp", "DNS reg-name") mishandles and cancels the winner connection #15035

Closed
mikioh opened this issue Mar 31, 2016 · 7 comments

Comments

Projects
None yet
3 participants
@mikioh
Copy link
Contributor

commented Mar 31, 2016

The following snippet fails mostly with tip:

package main

import (
    "log"
    "net"
    "time"
)

func main() {
    c, err := (&net.Dialer{DualStack: true}).Dial("tcp", "golang.org:80")
    if err != nil {
        log.Fatal(err)
    }
    defer c.Close()
    time.Sleep(3 * time.Second)
    req := []byte("GET /robots.txt HTTP/1.0\r\nHost: golang.org\r\n\r\n")
    if _, err := c.Write(req); err != nil {
        log.Fatal(err)
    }
    log.Println("done")
}

# go run snippet.go
write tcp 192.168.86.22:32962->216.58.197.209:80: i/o timeout

A quick look: It seems like dialParallel in dial.go cancels all racers, not only losers but a winner connection. This issue occurs only on tip (Go 1.7 devel).

func dialParallel(ctx *dialContext, ...) {
        cancel := make(chan struct{})
(snip)
        for !primaryResult.done || !fallbackResult.done {
                select {
                 case res := <-results:
(snip)
                        // On success, cancel the other racer (if one exists.)
                        if res.error == nil && cancel != nil {
                                close(cancel)
                                cancel = nil
                        }

/CC @pmarks-net

@mikioh mikioh added this to the Go1.7 milestone Mar 31, 2016

@mikioh

This comment has been minimized.

Copy link
Contributor Author

commented Mar 31, 2016

@pmarks-net,

Sorry, I completely overlooked this case during the review.

@pmarks-net

This comment has been minimized.

Copy link
Contributor

commented Mar 31, 2016

This snippet fails in a similar way, using the go1.6 release:

package main

import (
        "log"
        "net"
        "time"
)

func main() {
        cancel := make(chan struct{})
        c, err := (&net.Dialer{Cancel: cancel}).Dial("tcp", "golang.org:80")
        if err != nil {
                log.Fatal(err)
        }
        close(cancel)
        defer c.Close()
        time.Sleep(3 * time.Second)
        req := []byte("GET /robots.txt HTTP/1.0\r\nHost: golang.org\r\n\r\n")
        if _, err := c.Write(req); err != nil {
                log.Fatal(err)
        }
        log.Println("done")
}

The Dialer documentation says:

// Cancel is an optional channel whose closure indicates that
// the dial should be canceled. 

I would interpret "the dial should be canceled" to mean that after Dial returns, closing the channel has no effect. Instead, it seems to travel back in time to kill the existing connection.

Was Cancel intended to apply just to the Dial operation, or to the entire lifetime of the connection?

@mikioh

This comment has been minimized.

Copy link
Contributor Author

commented Mar 31, 2016

I think both cases are slightly different. The former, which I mentioned, is just about how to handle the hidden channel for cancellation of multiple dial racers. The latter is for customer's dial cancellation.

Was Cancel intended to apply just to the Dial operation, or to the entire lifetime of the connection?

I believe the former.

@pmarks-net

This comment has been minimized.

Copy link
Contributor

commented Mar 31, 2016

I believe the former.

Then this is a pre-existing bug in dialTCP, which dialParallel is bringing to light. It's true that dialParallel cancels both racers, but only after one of them has successfully returned a connection.

So if cancelation were not retroactive, then dialParallel would work correctly.

@mikioh

This comment has been minimized.

Copy link
Contributor Author

commented Mar 31, 2016

So if cancelation were not retroactive

Yup, agreed.

@pmarks-net

This comment has been minimized.

Copy link
Contributor

commented Mar 31, 2016

I think the bug is here:
https://github.com/golang/go/blob/master/src/net/fd_unix.go#L108

The <-cancel and <-done cases are racing each other after connect returns. Instead, connect needs to wait for the goroutine to acknowledge the done event before returning.

@gopherbot

This comment has been minimized.

Copy link

commented Apr 4, 2016

CL https://golang.org/cl/21497 mentions this issue.

@gopherbot gopherbot closed this in 869e576 Apr 5, 2016

@golang golang locked and limited conversation to collaborators Apr 5, 2017

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
You can’t perform that action at this time.