net: TestLookupHostCancel failure with "too many open files" on netbsd-arm-bsiegert builder #50537

Closed
bcmills opened this issue Jan 10, 2022 · 6 comments
Labels
NeedsInvestigation: Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
WaitingForInfo: Issue is not actionable because of missing required information, which needs to be provided.
Milestone
Backlog

Comments

@bcmills
Contributor

bcmills commented Jan 10, 2022

#!watchflakes
post <- pkg == "net" && test == "TestLookupHostCancel" && `socket: too many open files`
--- FAIL: TestLookupHostCancel (5.41s)
    lookup_test.go:962: lookup www.google.com on 192.168.87.1:53: dial udp 192.168.87.1:53: socket: too many open files
FAIL
FAIL	net	25.064s

It's not clear to me whether this is a platform bug in the native resolver (CC @bsiegert), or if the parallel net tests are actually leaving too many file descriptors open when this test executes (compare #46279; CC @bradfitz @ianlancetaylor).

greplogs --dashboard -md -l -e 'lookup www\.google\.com .*: too many open files'

2022-01-07T18:20:24-ade5488/netbsd-arm-bsiegert

@bcmills bcmills added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jan 10, 2022
@bcmills bcmills added this to the Backlog milestone Jan 10, 2022
@bcmills bcmills changed the title net: TestLookupHostCancel failures with "too many open files" on netbsd-arm-bsiegert builder net: TestLookupHostCancel failure with "too many open files" on netbsd-arm-bsiegert builder Jan 10, 2022
@bcmills
Contributor Author

bcmills commented Jan 10, 2022

Leaving on the backlog since there has only been one such failure so far.

If and when we see more of these, we might get a better idea of whether it is a platform bug or a test bug. (But note that more platforms will be skipping this test in order to resolve #50191.)

@bsiegert
Contributor

It could also be the system-wide limit on open files. On NetBSD, there is a global maximum that can be changed via sysctl. I'll increase it on the 32-bit ARM builder.

@ianlancetaylor
Contributor

@bsiegert That error should report ENFILE ("too many open files in system") but the test is failing with EMFILE ("too many open files").

@ianlancetaylor
Contributor

This isn't the native resolver; if it were, the error wouldn't mention "socket". This is the Go resolver.

@ianlancetaylor
Contributor

As far as I can tell from the code, the fact that the context is canceled before calling DefaultResolver.LookupHost should mean that it does not create any UDP sockets. The DNS lookup will eventually call sysDialer.dialSerial which checks the context in the loop. So the loop in the test shouldn't create any file descriptors. (Verified by running the test on Linux and counting the number of calls to the socket system call.) The error is happening for the single valid lookup, without a canceled context, that the test runs at the end.

If that is correct, then it's weird that the test takes 5 seconds to run.

And it means that the failure doesn't have much to do with TestLookupHostCancel as such.

But I may have made some mistake in this analysis.

@bcmills bcmills added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Oct 30, 2023
@gopherbot

Timed out in state WaitingForInfo. Closing.

(I am just a bot, though. Please speak up if this is a mistake or you have the requested information.)

@gopherbot gopherbot closed this as not planned Nov 30, 2023
Projects
Status: Done
Development

No branches or pull requests

4 participants