-
Notifications
You must be signed in to change notification settings - Fork 186
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
run tests in parallel if we can open enough files #450
Conversation
A Linux box tends to default to a soft limit of 1024 open files. This means that simply marking all tests as parallel will quickly result in "socket: too many open files" errors. Unfortunately, running all tests sequentially is also too slow, especially since most tests sleep for at least a second or two. As a middle ground, run tests in parallel if the limit is high enough. See the added code and inline documentation. When running "go test" with a small open file limit, the tests will simply run sequentially. Otherwise, they will run in parallel given GOMAXPROCS. With a high enough "ulimit -Sn" on my Linux laptop, this brings "go test" down from ~220s to ~36s.
func TestMain(m *testing.M) { | ||
// For all tests to work reliably in parallel, | ||
// we require being able to open many files at a time. | ||
// Many environments have lower limits, | ||
// such as Linux's "ulimit -n" soft limit defaulting to 1024. | ||
// On a machine with 16 CPU threads, 8k seems to be enough. | ||
const wantFileLimit = 8 << 10 | ||
files := make([]*os.File, 0, wantFileLimit) | ||
for i := 0; i < wantFileLimit; i++ { | ||
file, err := os.Open("pubsub_test.go") | ||
if err != nil { | ||
if i == 0 { | ||
panic(err) // the file doesn't exist? | ||
} | ||
fmt.Fprintf(os.Stderr, "Skipping parallel runs of tests; open file limit %d is below %d.\n", | ||
i, wantFileLimit) | ||
fmt.Fprintf(os.Stderr, "On Linux, consider running: ulimit -Sn %d\n", | ||
wantFileLimit*2) | ||
canRunInParallel = false | ||
break | ||
} | ||
files = append(files, file) | ||
} | ||
for _, file := range files { | ||
file.Close() | ||
} | ||
os.Exit(m.Run()) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is UNSPEAKABLY ugly.
The correct way to do this is to have an init function in the test, which tries to set the file descriptor limit to whatever it needs and sets the canRunInParallel variable accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand. We could do this kind of check for every test, but that feels very repetitive, and also far too specific. Are we going to measure exactly how many FDs each tests consumes?
The limits also don't apply to each test in isolation. It's roughly "number of FDs used at a time per test on average" multiplied by GOMAXPROCS. So I think a global approximation is good enough overall. Right now CI uses 2k, and I locally found 8k to be plenty with 16 CPU threads, so that seems fine to me, at least.
I also thought about doing the equivalent of ulimit -Sn
directly, but note that that's not portable, so it would require code behind // +build linux
that would not work on other systems. It also seems a bit weird for tests to lift the soft limit on their own; it feels like something the user should be in control of, like what the CI config does.
os.Exit(m.Run()) | ||
} | ||
|
||
var canRunInParallel = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should default to false.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The checking code always runs, does it make a difference? :)
look, this is a gross hack.
Just use a test init function to set the variable, and it is much more
preferable to do the ulimit thing. If it is platform dependent there are
probably ways for each platform that we care, and there ought to be some
library abstracting that.
…On Mon, Sep 20, 2021, 18:28 Daniel Martí ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In pubsub_test.go
<#450 (comment)>
:
> +func TestMain(m *testing.M) {
+ // For all tests to work reliably in parallel,
+ // we require being able to open many files at a time.
+ // Many environments have lower limits,
+ // such as Linux's "ulimit -n" soft limit defaulting to 1024.
+ // On a machine with 16 CPU threads, 8k seems to be enough.
+ const wantFileLimit = 8 << 10
+ files := make([]*os.File, 0, wantFileLimit)
+ for i := 0; i < wantFileLimit; i++ {
+ file, err := os.Open("pubsub_test.go")
+ if err != nil {
+ if i == 0 {
+ panic(err) // the file doesn't exist?
+ }
+ fmt.Fprintf(os.Stderr, "Skipping parallel runs of tests; open file limit %d is below %d.\n",
+ i, wantFileLimit)
+ fmt.Fprintf(os.Stderr, "On Linux, consider running: ulimit -Sn %d\n",
+ wantFileLimit*2)
+ canRunInParallel = false
+ break
+ }
+ files = append(files, file)
+ }
+ for _, file := range files {
+ file.Close()
+ }
+ os.Exit(m.Run())
+}
I'm not sure I understand. We could do this kind of check for every test,
but that feels very repetitive, and also far too specific. Are we going to
measure exactly how many FDs each tests consumes?
The limits also don't apply to each test in isolation. It's roughly
"number of FDs used at a time per test on average" multiplied by
GOMAXPROCS. So I think a global approximation is good enough overall. Right
now CI uses 2k, and I locally found 8k to be plenty with 16 CPU threads, so
that seems fine to me, at least.
I also thought about doing the equivalent of ulimit -Sn directly, but
note that that's not portable, so it would require code behind // +build
linux that would not work on other systems. It also seems a bit weird for
tests to lift the soft limit on their own; it feels like something the user
should be in control of, like what the CI config does.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#450 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAAI4SUVGQXFIOONH2J5TB3UC5HILANCNFSM5EMFRWPA>
.
Triage notifications on the go with GitHub Mobile for iOS
<https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
or Android
<https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
|
Sorry, but I'm still not understanding that approach. I think a global one is reasonable, as I tried to explain in #450 (comment).
I would agree, but it doesn't seem to exist. I'm more than happy to use one if you point me towards it. Otherwise, just making it work on Linux doesn't feel particularly better than what this PR has right now.
All that said, I'll close the PR if you don't feel like this is worth your time. I honestly thought making the tests not take four minutes would be worthwhile for everyone. Any solution we come up with will be relatively hacky in one way or another, because of varying environments and defaults. |
look, i am not merging this code, the open files hack is horrible.
If you are looking for an easy way to parallelize the tests, then use an
env variable and trst it from an init function in the test.
…On Tue, Sep 21, 2021, 16:39 Daniel Martí ***@***.***> wrote:
Just use a test init function to set the variable
Sorry, but I'm still not understanding that approach. I think a global one
is reasonable, as I tried to explain in #450 (comment)
<#450 (comment)>
.
there ought to be some library abstracting that.
I would agree, but it doesn't seem to exist. I'm more than happy to use
one if you point me towards it. Otherwise, just making it work on Linux
doesn't feel particularly better than what this PR has right now.
look, this is a gross hack.
All that said, I'll close the PR if you don't feel like this is worth your
time. I honestly thought making the tests not take four minutes would be
worthwhile for everyone. Any solution we come up with will be relatively
hacky in one way or another, because of varying environments and defaults.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#450 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAAI4STZDGGDMEPEJPAVAADUDCDHZANCNFSM5EMFRWPA>
.
Triage notifications on the go with GitHub Mobile for iOS
<https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
or Android
<https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
|
(see commit message)