net: TestLookupHostCancel failure with "too many open files" on netbsd-arm-bsiegert builder #50537

Open
bcmills opened this issue Jan 10, 2022 · 5 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
@bcmills
Member

bcmills commented Jan 10, 2022

--- FAIL: TestLookupHostCancel (5.41s)
    lookup_test.go:962: lookup www.google.com on 192.168.87.1:53: dial udp 192.168.87.1:53: socket: too many open files
FAIL
FAIL	net	25.064s

It's not clear to me whether this is a platform bug in the native resolver (CC @bsiegert), or if the parallel net tests are actually leaving too many file descriptors open when this test executes (compare #46279; CC @bradfitz @ianlancetaylor).

greplogs --dashboard -md -l -e 'lookup www\.google\.com .*: too many open files'

2022-01-07T18:20:24-ade5488/netbsd-arm-bsiegert

@bcmills bcmills added the NeedsInvestigation label Jan 10, 2022
@bcmills bcmills added this to the Backlog milestone Jan 10, 2022
@bcmills bcmills changed the title from "net: TestLookupHostCancel failures with "too many open files" on netbsd-arm-bsiegert builder" to "net: TestLookupHostCancel failure with "too many open files" on netbsd-arm-bsiegert builder" Jan 10, 2022
@bcmills
Member Author

bcmills commented Jan 10, 2022

Leaving on the backlog since there has only been one such failure so far.

If and when we see more of these, we might get a better idea of whether it is a platform bug or a test bug. (But note that more platforms will be skipping this test in order to resolve #50191.)

@bsiegert
Contributor

bsiegert commented Jan 10, 2022

It could also be the number of open files in the system. On NetBSD, there is a global maximum that can be changed via sysctl. I'll increase this number on the 32-bit ARM builder.

@ianlancetaylor
Contributor

ianlancetaylor commented Jan 10, 2022

@bsiegert That error should report ENFILE ("too many open files in system") but the test is failing with EMFILE ("too many open files").

@ianlancetaylor
Contributor

ianlancetaylor commented Jan 10, 2022

This isn't the native resolver; if it were, the error wouldn't say socket. This is the Go resolver.

@ianlancetaylor
Contributor

ianlancetaylor commented Jan 10, 2022

As far as I can tell from the code, the fact that the context is canceled before calling DefaultResolver.LookupHost should mean that it does not create any UDP sockets. The DNS lookup will eventually call sysDialer.dialSerial which checks the context in the loop. So the loop in the test shouldn't create any file descriptors. (Verified by running the test on Linux and counting the number of calls to the socket system call.) The error is happening for the single valid lookup, without a canceled context, that the test runs at the end.

If that is correct, then it's weird that the test takes 5 seconds to run.

And it means that the failure doesn't have much to do with TestLookupHostCancel as such.

But I may have made some mistake in this analysis.
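The context check described in the analysis above can be observed from outside the package with a small experiment. This is a minimal sketch under stated assumptions: `dialCanceled` is a hypothetical helper, and the address `127.0.0.1:53` is only a placeholder. It shows that a dial with an already-canceled context returns an error. It does not by itself prove that no socket was created; confirming that would still need something like counting `socket` system calls.

```go
package main

import (
	"context"
	"fmt"
	"net"
)

// dialCanceled is a hypothetical helper: it cancels a context first and
// then attempts a UDP dial with it, returning whatever error comes back.
func dialCanceled(addr string) error {
	ctx, cancel := context.WithCancel(context.Background())
	cancel() // cancel before dialing, as the test's loop does before its lookups

	var d net.Dialer
	conn, err := d.DialContext(ctx, "udp", addr)
	if err == nil {
		// Not expected with a canceled context, but avoid leaking a descriptor.
		conn.Close()
	}
	return err
}

func main() {
	fmt.Println("dial with canceled context:", dialCanceled("127.0.0.1:53"))
}
```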
