Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net: acquireThread might block for a long time #63978

Closed
mateusz834 opened this issue Nov 7, 2023 · 7 comments
Closed

net: acquireThread might block for a long time #63978

mateusz834 opened this issue Nov 7, 2023 · 7 comments
Assignees
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@mateusz834
Copy link
Member

mateusz834 commented Nov 7, 2023

This was previously reported to the go security team.

When the cgo resolver is being used, we call the acquireThread function to limit the amount of concurrent cgo calls running and also to limit the amount of threads that will be ever created (by the net package) by the runtime for cgo calls. The limit is currently capped to 500.

go/src/net/cgo_unix.go

Lines 148 to 151 in 9d836d4

func cgoLookupHostIP(network, name string) (addrs []IPAddr, err error) {
acquireThread()
defer releaseThread()

go/src/net/net.go

Lines 674 to 689 in 9d836d4

// Limit the number of concurrent cgo-using goroutines, because
// each will block an entire operating system thread. The usual culprit
// is resolving many DNS names in separate goroutines but the DNS
// server is not responding. Then the many lookups each use a different
// thread, and the system or the program runs out of threads.
var threadLimit chan struct{}
var threadOnce sync.Once
func acquireThread() {
threadOnce.Do(func() {
threadLimit = make(chan struct{}, concurrentThreadsLimit())
})
threadLimit <- struct{}{}
}

// concurrentThreadsLimit returns the number of threads we permit to
// run concurrently doing DNS lookups via cgo. A DNS lookup may use a
// file descriptor so we limit this to less than the number of
// permitted open files. On some systems, notably Darwin, if
// getaddrinfo is unable to open a file descriptor it simply returns
// EAI_NONAME rather than a useful error. Limiting the number of
// concurrent getaddrinfo calls to less than the permitted number of
// file descriptors makes that error less likely. We don't bother to
// apply the same limit to DNS lookups run directly from Go, because
// there we will return a meaningful "too many open files" error.
func concurrentThreadsLimit() int {
var rlim syscall.Rlimit
if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rlim); err != nil {
return 500
}
r := rlim.Cur
if r > 500 {
r = 500
} else if r > 30 {
r -= 30
}
return int(r)
}

This might be problematic for services that connect to a user-provided hostnames.
The user-provided domain might be running on an "evil" nameserver that might trip timeouts (respond with SERVFAIL after ~5s). I have not tested this deeply, but I assume it would be possible to make the getaddrinfo block for 5-30s (assuming mostly default configuration in resolv.conf).

1-3 (Resolvers Count) * 2 (default attempts count) * 5s (default timeout).

Obviously with MITM it is simpler to trip timeouts.

We don't use the cgo resolver much these days on unix systems (except desktop linux, because of systemd nsswitch modules we don't support in the go resolver). On windows and darwin the cgo resolver is the default.

  • I think this should be at least made clear by the docs, currently we only note that:

    go/src/net/net.go

    Lines 49 to 50 in 9d836d4

    By default the pure Go resolver is used, because a blocked DNS request consumes
    only a goroutine, while a blocked C call consumes an operating system thread.

    This should probably mention that we have an internal cgo threads limit.

  • Also we probably should not force the use of cgo resolver for ".local" subdomains, so that use of the cgo resolver cannot be forced by an external user (consider a service that connects to a user-provided hostname). The go resolver should be able to resolve ".local" domains when there are no other nss modules in use than files and dns.

    go/src/net/conf.go

    Lines 341 to 347 in 9d836d4

    if canUseCgo && stringsHasSuffixFold(hostname, ".local") {
    // Per RFC 6762, the ".local" TLD is special. And
    // because Go's native resolver doesn't do mDNS or
    // similar local resolution mechanisms, assume that
    // libc might (via Avahi, etc) and use cgo.
    return hostLookupCgo, dnsConf
    }

  • Support context cancellation in acquireThread, so that when the thread limit is reached we don't call cgo stuff when the context is already cancelled (don't queue bunch of cgo calls when context is done).

I will send CLs for this.

CC @ianlancetaylor @golang/security @rolandshoemaker

@mateusz834 mateusz834 added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Nov 7, 2023
@mateusz834 mateusz834 added this to the Backlog milestone Nov 7, 2023
@gopherbot
Copy link
Contributor

Change https://go.dev/cl/539360 mentions this issue: net: support context cancellation in acquireThread

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/540555 mentions this issue: net: don't force cgo resolver for .local subdomain queries

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/539361 mentions this issue: net: mention the cgo thread limit in package docs

@mateusz834 mateusz834 self-assigned this Nov 7, 2023
@bcmills bcmills added the compiler/runtime Issues related to the Go compiler and/or runtime. label Nov 7, 2023
@mknyszek
Copy link
Contributor

@bcmills Can you provide more context on why you added the compiler/runtime label? In triage I'm not sure we're seeing how this should be addressed in the compiler and/or runtime. Thanks.

@bcmills
Copy link
Contributor

bcmills commented Nov 15, 2023

Huh — I don't remember. 😅

I think I misinterpreted acquireThread as a thing that acquires actual threads from the runtime and/or poller, but in practice it appears to only acquire semaphore tokens limiting implicitly-aquired threads. Sorry for the triage noise!

@bcmills bcmills removed the compiler/runtime Issues related to the Go compiler and/or runtime. label Nov 15, 2023
gopherbot pushed a commit that referenced this issue Feb 20, 2024
Updates #63978

Change-Id: I39a5c812b4f604baf4ca5ffcff52b8dc17d4990d
GitHub-Last-Rev: 4ab6e26
GitHub-Pull-Request: #63990
Reviewed-on: https://go-review.googlesource.com/c/go/+/539361
Reviewed-by: Ian Lance Taylor <iant@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Roland Shoemaker <roland@golang.org>
gopherbot pushed a commit that referenced this issue Feb 21, 2024
acquireThread is already waiting on a channel, so
it can be easily wired up to support context cancellation.
This change will make sure that contexts that are
cancelled at the acquireThread stage (when the limit of
threads is reached) do not queue unnecessarily and cause
an unnecessary cgo call that will be soon aborted by
the doBlockingWithCtx function.

Updates #63978

Change-Id: I8ae4debd51995637567d8f51c6f1ed60f23d6c0c
GitHub-Last-Rev: 4189b9f
GitHub-Pull-Request: #63985
Reviewed-on: https://go-review.googlesource.com/c/go/+/539360
Auto-Submit: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Commit-Queue: Ian Lance Taylor <iant@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
gopherbot pushed a commit that referenced this issue Feb 26, 2024
The cgo resolver sends DNS queries for .local subdomain
lookups, just as we do in the go resolver.
We don't need to fallback to the cgo resolver for this
domains when nsswitch.conf uses only file and dns modules.

This has a benefit that we select a consistent resolver,
that is only based on the system configuration, regardless
of the queried domain.

Updates #63978

Change-Id: I9166103adb94d7ab52992925f413f361130e7c52
GitHub-Last-Rev: e2bc587
GitHub-Pull-Request: #63986
Reviewed-on: https://go-review.googlesource.com/c/go/+/540555
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
@rolandshoemaker
Copy link
Member

@mateusz834 was there anything left to do here? I think all of the proposed changes ended up landing.

@mateusz834 mateusz834 modified the milestones: Backlog, Go1.23 May 29, 2024
@mateusz834
Copy link
Member Author

@rolandshoemaker right, closing the issue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

5 participants