run tests in parallel if we can open enough files #450

mvdan · 2021-09-20T15:08:26Z

(see commit message)

A Linux box tends to default to a soft limit of 1024 open files. This means that simply marking all tests as parallel will quickly result in "socket: too many open files" errors. Unfortunately, running all tests sequentially is also too slow, especially since most tests sleep for at least a second or two. As a middle ground, run tests in parallel if the limit is high enough. See the added code and inline documentation. When running "go test" with a small open file limit, the tests will simply run sequentially. Otherwise, they will run in parallel given GOMAXPROCS. With a high enough "ulimit -Sn" on my Linux laptop, this brings "go test" down from ~220s to ~36s.

vyzo · 2021-09-20T15:23:07Z

pubsub_test.go

+func TestMain(m *testing.M) {
+	// For all tests to work reliably in parallel,
+	// we require being able to open many files at a time.
+	// Many environments have lower limits,
+	// such as Linux's "ulimit -n" soft limit defaulting to 1024.
+	// On a machine with 16 CPU threads, 8k seems to be enough.
+	const wantFileLimit = 8 << 10
+	files := make([]*os.File, 0, wantFileLimit)
+	for i := 0; i < wantFileLimit; i++ {
+		file, err := os.Open("pubsub_test.go")
+		if err != nil {
+			if i == 0 {
+				panic(err) // the file doesn't exist?
+			}
+			fmt.Fprintf(os.Stderr, "Skipping parallel runs of tests; open file limit %d is below %d.\n",
+				i, wantFileLimit)
+			fmt.Fprintf(os.Stderr, "On Linux, consider running: ulimit -Sn %d\n",
+				wantFileLimit*2)
+			canRunInParallel = false
+			break
+		}
+		files = append(files, file)
+	}
+	for _, file := range files {
+		file.Close()
+	}
+	os.Exit(m.Run())
+}


this is UNSPEAKABLY ugly.

The correct way to do this is to have an init function in the test, which tries to set the file descriptor limit to whatever it needs and sets the canRunInParallel variable accordingly.

I'm not sure I understand. We could do this kind of check for every test, but that feels very repetitive, and also far too specific. Are we going to measure exactly how many FDs each tests consumes?

The limits also don't apply to each test in isolation. It's roughly "number of FDs used at a time per test on average" multiplied by GOMAXPROCS. So I think a global approximation is good enough overall. Right now CI uses 2k, and I locally found 8k to be plenty with 16 CPU threads, so that seems fine to me, at least.

I also thought about doing the equivalent of ulimit -Sn directly, but note that that's not portable, so it would require code behind // +build linux that would not work on other systems. It also seems a bit weird for tests to lift the soft limit on their own; it feels like something the user should be in control of, like what the CI config does.

vyzo · 2021-09-20T15:23:21Z

pubsub_test.go

+	os.Exit(m.Run())
+}
+
+var canRunInParallel = true


should default to false.

The checking code always runs, does it make a difference? :)

vyzo · 2021-09-20T16:32:30Z

look, this is a gross hack. Just use a test init function to set the variable, and it is much more preferable to do the ulimit thing. If it is platform dependent there are probably ways for each platform that we care, and there ought to be some library abstracting that.

…

On Mon, Sep 20, 2021, 18:28 Daniel Martí ***@***.***> wrote: ***@***.**** commented on this pull request. ------------------------------ In pubsub_test.go <#450 (comment)> : > +func TestMain(m *testing.M) { + // For all tests to work reliably in parallel, + // we require being able to open many files at a time. + // Many environments have lower limits, + // such as Linux's "ulimit -n" soft limit defaulting to 1024. + // On a machine with 16 CPU threads, 8k seems to be enough. + const wantFileLimit = 8 << 10 + files := make([]*os.File, 0, wantFileLimit) + for i := 0; i < wantFileLimit; i++ { + file, err := os.Open("pubsub_test.go") + if err != nil { + if i == 0 { + panic(err) // the file doesn't exist? + } + fmt.Fprintf(os.Stderr, "Skipping parallel runs of tests; open file limit %d is below %d.\n", + i, wantFileLimit) + fmt.Fprintf(os.Stderr, "On Linux, consider running: ulimit -Sn %d\n", + wantFileLimit*2) + canRunInParallel = false + break + } + files = append(files, file) + } + for _, file := range files { + file.Close() + } + os.Exit(m.Run()) +} I'm not sure I understand. We could do this kind of check for every test, but that feels very repetitive, and also far too specific. Are we going to measure exactly how many FDs each tests consumes? The limits also don't apply to each test in isolation. It's roughly "number of FDs used at a time per test on average" multiplied by GOMAXPROCS. So I think a global approximation is good enough overall. Right now CI uses 2k, and I locally found 8k to be plenty with 16 CPU threads, so that seems fine to me, at least. I also thought about doing the equivalent of ulimit -Sn directly, but note that that's not portable, so it would require code behind // +build linux that would not work on other systems. It also seems a bit weird for tests to lift the soft limit on their own; it feels like something the user should be in control of, like what the CI config does. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#450 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAAI4SUVGQXFIOONH2J5TB3UC5HILANCNFSM5EMFRWPA> . Triage notifications on the go with GitHub Mobile for iOS <https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675> or Android <https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.

mvdan · 2021-09-21T13:38:57Z

Just use a test init function to set the variable

Sorry, but I'm still not understanding that approach. I think a global one is reasonable, as I tried to explain in #450 (comment).

there ought to be some library abstracting that.

I would agree, but it doesn't seem to exist. I'm more than happy to use one if you point me towards it. Otherwise, just making it work on Linux doesn't feel particularly better than what this PR has right now.

look, this is a gross hack.

All that said, I'll close the PR if you don't feel like this is worth your time. I honestly thought making the tests not take four minutes would be worthwhile for everyone. Any solution we come up with will be relatively hacky in one way or another, because of varying environments and defaults.

vyzo · 2021-09-23T11:04:41Z

look, i am not merging this code, the open files hack is horrible. If you are looking for an easy way to parallelize the tests, then use an env variable and trst it from an init function in the test.

…

On Tue, Sep 21, 2021, 16:39 Daniel Martí ***@***.***> wrote: Just use a test init function to set the variable Sorry, but I'm still not understanding that approach. I think a global one is reasonable, as I tried to explain in #450 (comment) <#450 (comment)> . there ought to be some library abstracting that. I would agree, but it doesn't seem to exist. I'm more than happy to use one if you point me towards it. Otherwise, just making it work on Linux doesn't feel particularly better than what this PR has right now. look, this is a gross hack. All that said, I'll close the PR if you don't feel like this is worth your time. I honestly thought making the tests not take four minutes would be worthwhile for everyone. Any solution we come up with will be relatively hacky in one way or another, because of varying environments and defaults. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#450 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAAI4STZDGGDMEPEJPAVAADUDCDHZANCNFSM5EMFRWPA> . Triage notifications on the go with GitHub Mobile for iOS <https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675> or Android <https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.

vyzo requested changes Sep 20, 2021

View reviewed changes

mvdan closed this Sep 23, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

run tests in parallel if we can open enough files #450

run tests in parallel if we can open enough files #450

mvdan commented Sep 20, 2021

vyzo Sep 20, 2021

mvdan Sep 20, 2021

vyzo Sep 20, 2021

mvdan Sep 20, 2021

vyzo commented Sep 20, 2021 via email

mvdan commented Sep 21, 2021

vyzo commented Sep 23, 2021 via email

run tests in parallel if we can open enough files #450

run tests in parallel if we can open enough files #450

Conversation

mvdan commented Sep 20, 2021

vyzo Sep 20, 2021

Choose a reason for hiding this comment

mvdan Sep 20, 2021

Choose a reason for hiding this comment

vyzo Sep 20, 2021

Choose a reason for hiding this comment

mvdan Sep 20, 2021

Choose a reason for hiding this comment

vyzo commented Sep 20, 2021 via email

mvdan commented Sep 21, 2021

vyzo commented Sep 23, 2021 via email