net: deadlock in lookupIP upon error #24178
Comments
I can actually see 3 writes to the
Do you have a small example to demonstrate the problem? My hope is to use the code as part of the change to fix this issue.
Yes, this will work. But that might break in the future, if someone adds more code that fails and forgets to add a write to
Would you like to send this fix yourself? https://golang.org/doc/contribute.html explains how to contribute. You could also send a Pull Request. Thank you Alex
@alexbrainman I clarified the problem and added demo code that shows this leak and the eventual lock. The problem occurs when calling this function with a cancelled context and an invalid domain.
@Kleissner nice test, thank you. I adjusted your test to run faster and crash instead of hang:

package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// testDomain is expected to fail to resolve.
const testDomain = "ty.laiwu.gov.cn"

func main() {
	const google = "www.google.com"

	// Sanity check: a normal lookup works before the test starts.
	_, err := net.LookupHost(google)
	if err != nil {
		panic(err)
	}

	fmt.Println("Start DNS Test, making lookups with context cancelled to a domain that will fail, causing leaks in lookupIP and eventually deadlock")

	// Cancel the context up front so every lookup below returns early.
	ctx, cancel := context.WithCancel(context.Background())
	cancel()

	for n := 0; n < 1000; n++ {
		address := getIP(ctx, testDomain)
		fmt.Printf("%d: %s\n", n, address)
		time.Sleep(time.Millisecond)
	}

	// If the lookups above leaked their threads, this lookup cannot complete.
	_, err = net.LookupHost(google)
	if err != nil {
		panic(err)
	}

	fmt.Println("End DNS Test")
}

// getIP returns the first resolved address, the error text, or "None".
func getIP(ctx context.Context, domain string) (addr string) {
	addrs, err := net.DefaultResolver.LookupHost(ctx, domain)
	if err != nil {
		return err.Error()
	}
	if len(addrs) > 0 {
		return addrs[0]
	}
	return "None"
}
Sure, have a go, hopefully with a test. Whenever you can. If you cannot I will fix this myself when I have time. Just let me know. Thank you. Alex
Can you @alexbrainman come up with a fix? Unfortunately I don't have enough time to write a proper fix :(
Sure thing. I will put it on my TODO list, which is very long at this moment.
That is quite OK. Someone else will fix this - we have the beginning of a test, which is the hardest part. Alex
Change https://golang.org/cl/111718 mentions this issue:
Latest version go1.10 windows/amd64
tl;dr: The function LookupHost leaks goroutines/handles and eventually deadlocks completely when you query an invalid domain with a cancelled context.
There is an obvious deadlock in the function net.lookupIP (file lookup_windows.go), this code: https://github.com/golang/go/blob/master/src/net/lookup_windows.go#L79
Here is the problem (commented also inline below in function):
While the function itself returns, the goroutine it created blocks forever and never releases the thread that it acquired with acquireThread. Eventually, once enough (500) failing IP lookups have been made, the function blocks forever and does not return.
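To make the mechanism concrete, here is a minimal, platform-independent model of the pattern. It is a sketch under assumptions, not the actual code in lookup_windows.go: `leakyLookup`, `heldSlots`, the `slots` channel (standing in for the limit behind acquireThread) and the simulated failure are all hypothetical names and values. It assumes the worker sends on the result channel more often than the channel can buffer when the lookup fails, which is what the remark about several channel writes suggests.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

type result struct {
	addr string
	err  error
}

// leakyLookup is a hypothetical model of the problem, not the real net.lookupIP.
// slots plays the role of the thread limit guarded by acquireThread.
func leakyLookup(ctx context.Context, slots chan struct{}) error {
	ch := make(chan result, 1)
	go func() {
		slots <- struct{}{}        // acquire a slot
		defer func() { <-slots }() // release; never runs if a send below blocks

		time.Sleep(10 * time.Millisecond) // simulated slow system call
		err := errors.New("simulated lookup failure")
		if err != nil {
			ch <- result{err: err} // fills the buffer, but does not return
		}
		ch <- result{} // second send: blocks forever if nobody receives
	}()

	select {
	case r := <-ch:
		return r.err
	case <-ctx.Done():
		return ctx.Err() // caller leaves; nobody will ever receive from ch
	}
}

// heldSlots runs n failing lookups with an already cancelled context and
// reports how many slots are still held afterwards.
func heldSlots(n int) int {
	slots := make(chan struct{}, n)
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	for i := 0; i < n; i++ {
		_ = leakyLookup(ctx, slots)
	}
	time.Sleep(200 * time.Millisecond) // give the workers time to get stuck
	return len(slots)
}

func main() {
	fmt.Println("slots still held:", heldSlots(3))
}
```

In this model every failed lookup made with a cancelled context keeps its slot, so the program should report that all 3 slots are still held. Without cancellation the caller receives the first value, the second send then fits into the buffer, and the worker exits normally, which would explain why the leak only shows up with a cancelled context.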
Potential solutions:
Code to demonstrate this bug
First: place this in lookup_windows.go:117 (right before the "ch <- ret" statement) so you can see the ongoing leaks:
Then you'll see the leaks in this special case. NOTES: This is Windows specific! Also, your DNS provider/cache may break this demo code; with the domain below, GetAddrInfoW is expected to fail.
Output, you will see that net.threadLimit will eventually be full and then the function will lock forever: