Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/go/internal/test: test concurrency limited by sequential printing #61233

Open
howardjohn opened this issue Jul 7, 2023 · 6 comments
Open
Labels
GoCommand cmd/go NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. ToolSpeed
Milestone

Comments

@howardjohn
Copy link
Contributor

What version of Go are you using (go version)?

$ go version
go version go1.20 linux/amd64

Does this issue reproduce with the latest release?

Yes

What did you do?

Ran go test -debug-trace=trace -debug-actiongraph=actiongraph ./...

What did you expect to see?

Tests are run with maximum concurrency

What did you see instead?

test run tasks are started but not doing any work, blocking other work from progressing.


Example simple reproducer:

for i in a b c d; do
	mkdir -p $i
	cat <<EOF > "$i/${i}_test.go"
package $i

import (
	"testing"
	"time"
)

func TestSleep(t *testing.T) {
	if "$i" == "b" {
		time.Sleep(time.Second)
	}
}
EOF
done
for i in e f g; do
	mkdir -p $i
	cat <<EOF > "$i/${i}.go"
package $i

EOF
done
GOMAXPROCS=2 go test -debug-trace=trace ./...

Basically this just makes a few packages, one with a slow test, and a few empty packages.

When we run this, I would expect to have one worker blocked on the b test (sleep 1), while the rest execute in the other worker and quickly complete.

Instead, we see this:
2023-07-07_11-06-03

This looks odd - the e package doesn't have any tests, so why is test run e taking 1.3s?

The reason seems to be related to https://go-review.git.corp.google.com/c/go/+/448357 (although I am not certain it worked different before that CL). One the runTestActor starts, it is immediately blocked on <-r.prev. This, in turn, means it is blocked until the previous test (d) finishes.

Ultimately, this blocks an entire worker doing nothing.

A real world example of this can be found here (warning: 170mb). You can see it occurring in a few places, most prominently around 300s we have almost all 16 workers blocked, most/all on "no test files" packages.

I think based on the way actions are pushed into the ready queue, any action that has zero remaining dependencies may be pushed onto the queue. In some of my testing, I saw the occasionally some of the very last packages in the test ordering would get scheduled first - blocking an entire worker for almost the entire process.


I believe this new logic is only need for -json mode. One option is to just enable it only if -json is enabled.

I tested this locally against the same repo as the "real world example" above.
With Go 1.20, the tests took 70s to execute. With the patch, they took 30s.

With the same patch, the simple reproducer looks as expected:
2023-07-07_11-15-34

Before trace After trace (warning: 150mb)


Another approach is to add each test run action as a dependency for the previous, to serialize the ordering. However, this makes things far worse, as this is serializing things much stricter -- the current serialization just ensures the starts are serialized, while this approach would make sure the entire test execution is as well - not what we want.

In my testing this made things worse.

A similar approach may be to make another action type test start, but I didn't test this out

@howardjohn
Copy link
Contributor Author

I opened https://go-review.git.corp.google.com/c/go/+/508515 with approach 1

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/508515 mentions this issue: cmd/go: only serialize test start events in -json mode

@bcmills bcmills added ToolSpeed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. GoCommand cmd/go labels Jul 10, 2023
@bcmills bcmills added this to the Backlog milestone Jul 10, 2023
@bcmills bcmills changed the title internal/test: test concurrency limited by sequential printing cmd/go/internal/test: test concurrency limited by sequential printing Jul 10, 2023
@howardjohn
Copy link
Contributor Author

I know this isn't the most productive comment but its a bit frustrating to have found a regression causing tests to take 2x as long, and submit a fix, and not have any comment at all within 2 months.

Is there anyway to move this forward?

@ianlancetaylor
Copy link
Contributor

CC @bcmills @matloob

@bcmills
Copy link
Contributor

bcmills commented Sep 1, 2023

Sorry for the delay — I think I had mentally lumped this in with #60203, although it's clearly a separate issue. 😅

I've replied on the CL.

@evanj
Copy link
Contributor

evanj commented Oct 2, 2023

In case it is helpful: I think we have run into this with the tests for https://github.com/DataDog/datadog-agent (thanks to @hush-hush for debugging this). Going from Go 1.19 to Go 1.20 makes the tests about 20-30% slower (an extra ~17 seconds for a 60 second test run). We can help verify a fix once one is ready, although it looks like the conversation on https://go.dev/cl/508515 has stalled (last comment Sept 6th).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
GoCommand cmd/go NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. ToolSpeed
Projects
None yet
Development

No branches or pull requests

5 participants