move go-libp2p-autonat to p2p/host/autonat #1273

Merged 162 commits into master on Jan 2, 2022
Conversation

@marten-seemann (Contributor) commented Dec 20, 2021

There's no reason autonat should live in a separate repo.

This is also a prerequisite for moving the swarm into this repo, as autonat depends on go-libp2p-swarm/testing.

@vyzo (Contributor) commented Dec 20, 2021

do it!

@marten-seemann (Contributor, Author):

Looks like the tests are flaky. Let's fix them in the autonat repo and then move it here.

@marten-seemann (Contributor, Author):

The problem is not that the tests are flaky on their own - they're only flaky because go test runs the tests of different packages in parallel. I could imagine that the tricks we play for the holepunching test mess with the public / private detection here.

@vyzo (Contributor) commented Dec 20, 2021

sounds plausible; any way to put constraints on what runs in parallel?
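
(For illustration - a minimal sketch, assuming the interference really is between the test binaries of different packages: go test's -p flag limits how many package test binaries run at once, and -parallel limits t.Parallel() tests within a single binary.)

# run at most one package's test binary at a time
go test -p 1 ./...

# additionally serialize t.Parallel() tests within each package
go test -p 1 -parallel 1 ./...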

@mvdan (Contributor) commented Dec 20, 2021

I do think we can blame at least one test for being flaky. At the very least, it's flaky when run in parallel with itself, even in separate binaries and processes. In the new ./p2p/host/autonat package (or the old repo's root package), with Go 1.17.5, run:

go test -c -o pkg.test && stress ./pkg.test -test.run=TestAutoNATPrivate

In both the old and the new repo I get a failure rate of just under 5% - low enough that the flake probably didn't cause noticeable issues in the smaller module.

$ go test -c -o pkg.test && stress ./pkg.test -test.run=TestAutoNATPrivate
go: downloading github.com/libp2p/go-libp2p-core v0.12.0
5s: 32 runs so far, 0 failures

/tmp/go-stress-20211220T193854-3125905374
--- FAIL: TestAutoNATPrivate (2.01s)
    autonat_test.go:118: unexpected NAT status: 0
FAIL


ERROR: exit status 1


/tmp/go-stress-20211220T193854-1717313164
--- FAIL: TestAutoNATPrivate (2.01s)
    autonat_test.go:118: unexpected NAT status: 0
FAIL


ERROR: exit status 1

10s: 64 runs so far, 2 failures (3.12%)

I'm not sure if the test is simply flaky by design, or if it only starts to flake when run in parallel with other tests in other processes - perhaps it's using the network in some way that clashes globally, such as listening on ports or sending broadcast messages.

In any case, I don't think we can blame go test ./... for this one - it's just that the more tests you run in parallel, the higher the chance that flakes will show up :)

@mvdan (Contributor) commented Dec 20, 2021

I should have mentioned that the tool in question is https://pkg.go.dev/golang.org/x/tools/cmd/stress. It's a small tool that runs many parallel invocations of a command and reports the failures; one of its main intended use cases is uncovering flakes.

If you struggle to reproduce, note that you can pass extra flags to stress to increase the parallelism, among other things.
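
For instance, a sketch that cranks up the process count (stress's -p flag sets how many processes run in parallel; 64 is an arbitrary value here):

# build the test binary once, then run 64 copies of it at a time
go test -c -o pkg.test
stress -p 64 ./pkg.test -test.run=TestAutoNATPrivate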

@marten-seemann (Contributor, Author):

@mvdan Thank you for helping debug this.
Turns out this was an actual race condition (only in the test code, though), and a highly timing-dependent one: libp2p/go-libp2p-autonat#120.
I suppose the higher CPU load from running all packages in parallel made it trigger more reliably here than in go-libp2p-autonat.

I will update this PR once that fix has been merged.

p2p/host/autonat/test/go.mod - review thread (outdated, resolved)
@marten-seemann merged commit 767629f into master on Jan 2, 2022