-
Notifications
You must be signed in to change notification settings - Fork 9.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix TestSnapshotV3RestoreMultiMemberAdd flakes (leaks) #12870
Merged
ptabor
merged 1 commit into
etcd-io:master
from
ptabor:20210416-fix-flake-TestSnapshotV3RestoreMultiMemberAdd-master
Apr 16, 2021
Merged
Fix TestSnapshotV3RestoreMultiMemberAdd flakes (leaks) #12870
ptabor
merged 1 commit into
etcd-io:master
from
ptabor:20210416-fix-flake-TestSnapshotV3RestoreMultiMemberAdd-master
Apr 16, 2021
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
gyuho
approved these changes
Apr 16, 2021
- most important: unix's socket transport should not keep idle connections. For top-level Transport we close them using: https://github.com/etcd-io/etcd/blob/f3c518025e284d3d070a63d5969c69176f247b33/server/etcdserver/api/rafthttp/transport.go#L226 but currently we don't have access to close them witing the nest (unix) transport. Short idle deadline is good enough. - Use dialContext (instead of dial) to make sure context is passed down the stack - Make sure Context is cancelled as soon as the operation is done in pipeline - nit: use dedicated method to yeld goroutines. Tested with: ``` d=$(date +"%Y%m%d_%H%M") (cd tests && go test --timeout=60m ./integration/snapshot -run TestSnapshotV3RestoreMultiMemberAdd -v --count=180 2>&1 | tee log_${d}.log) ``` There were transports & cmux leaked: ``` leak.go:118: Test appears to have leaked a Transport: internal/poll.runtime_pollWait(0x7f6c5c3784c8, 0x72, 0xffffffffffffffff) /usr/lib/google-golang/src/runtime/netpoll.go:222 +0x55 internal/poll.(*pollDesc).wait(0xc003296298, 0x72, 0x0, 0x18, 0xffffffffffffffff) /usr/lib/google-golang/src/internal/poll/fd_poll_runtime.go:87 +0x45 internal/poll.(*pollDesc).waitRead(...) /usr/lib/google-golang/src/internal/poll/fd_poll_runtime.go:92 internal/poll.(*FD).Read(0xc003296280, 0xc0031f60a8, 0x18, 0x18, 0x0, 0x0, 0x0) /usr/lib/google-golang/src/internal/poll/fd_unix.go:166 +0x1d5 net.(*netFD).Read(0xc003296280, 0xc0031f60a8, 0x18, 0x18, 0x18, 0xc0009056e2, 0x203000) /usr/lib/google-golang/src/net/fd_posix.go:55 +0x4f net.(*conn).Read(0xc000010258, 0xc0031f60a8, 0x18, 0x18, 0x0, 0x0, 0x0) /usr/lib/google-golang/src/net/net.go:183 +0x91 github.com/soheilhy/cmux.(*bufferedReader).Read(0xc0003d24e0, 0xc0031f60a8, 0x18, 0x18, 0xc0003d24d0, 0xc0009056e2, 0xc000278400) /home/ptab/private/golang/pkg/mod/github.com/soheilhy/cmux@v0.1.5/buffer.go:53 +0x12d github.com/soheilhy/cmux.hasHTTP2Preface(0x1367e20, 0xc0003d24e0, 0x7f6c5c699f40) /home/ptab/private/golang/pkg/mod/github.com/soheilhy/cmux@v0.1.5/matchers.go:195 +0x8a github.com/soheilhy/cmux.matchersToMatchWriters.func1(0x7f6c5c699f40, 0xc000010258, 0x1367e20, 0xc0003d24e0, 0xc000010258) /home/ptab/private/golang/pkg/mod/github.com/soheilhy/cmux@v0.1.5/cmux.go:128 +0x39 github.com/soheilhy/cmux.(*cMux).serve(0xc003228690, 0x138c410, 0xc000010258, 0xc00327f740, 0xc0059ba860) /home/ptab/private/golang/pkg/mod/github.com/soheilhy/cmux@v0.1.5/cmux.go:192 +0x1e7 created by github.com/soheilhy/cmux.(*cMux).Serve /home/ptab/private/golang/pkg/mod/github.com/soheilhy/cmux@v0.1.5/cmux.go:179 +0x191 internal/poll.runtime_pollWait(0x7f6c5c60f3f0, 0x72, 0xffffffffffffffff) /usr/lib/google-golang/src/runtime/netpoll.go:222 +0x55 internal/poll.(*pollDesc).wait(0xc000d53018, 0x72, 0x1000, 0x1000, 0xffffffffffffffff) /usr/lib/google-golang/src/internal/poll/fd_poll_runtime.go:87 +0x45 internal/poll.(*pollDesc).waitRead(...) /usr/lib/google-golang/src/internal/poll/fd_poll_runtime.go:92 internal/poll.(*FD).Read(0xc000d53000, 0xc000cfd000, 0x1000, 0x1000, 0x0, 0x0, 0x0) /usr/lib/google-golang/src/internal/poll/fd_unix.go:166 +0x1d5 net.(*netFD).Read(0xc000d53000, 0xc000cfd000, 0x1000, 0x1000, 0x3, 0x3, 0x1000000000001) /usr/lib/google-golang/src/net/fd_posix.go:55 +0x4f net.(*conn).Read(0xc00031a570, 0xc000cfd000, 0x1000, 0x1000, 0x0, 0x0, 0x0) /usr/lib/google-golang/src/net/net.go:183 +0x91 net/http.(*persistConn).Read(0xc00093b320, 0xc000cfd000, 0x1000, 0x1000, 0x577750, 0x60, 0x0) /usr/lib/google-golang/src/net/http/transport.go:1933 +0x77 bufio.(*Reader).fill(0xc005702fc0) /usr/lib/google-golang/src/bufio/bufio.go:101 +0x108 bufio.(*Reader).Peek(0xc005702fc0, 0x1, 0xc00077c660, 0xc003b082a0, 0xc000d08de0, 0x5ae586, 0x11dd6c0) /usr/lib/google-golang/src/bufio/bufio.go:139 +0x4f net/http.(*persistConn).readLoop(0xc00093b320) /usr/lib/google-golang/src/net/http/transport.go:2094 +0x1a8 created by net/http.(*Transport).dialConn /usr/lib/google-golang/src/net/http/transport.go:1754 +0xdaa net/http.(*persistConn).writeLoop(0xc00093b320) /usr/lib/google-golang/src/net/http/transport.go:2393 +0xf7 created by net/http.(*Transport).dialConn /usr/lib/google-golang/src/net/http/transport.go:1755 +0xdcf sync.runtime_Semacquire(0xc0059ba868) /usr/lib/google-golang/src/runtime/sema.go:56 +0x45 sync.(*WaitGroup).Wait(0xc0059ba860) /usr/lib/google-golang/src/sync/waitgroup.go:130 +0x65 github.com/soheilhy/cmux.(*cMux).Serve.func1(0xc003228690, 0xc0059ba860) /home/ptab/private/golang/pkg/mod/github.com/soheilhy/cmux@v0.1.5/cmux.go:158 +0x56 github.com/soheilhy/cmux.(*cMux).Serve(0xc003228690, 0x13698c0, 0xc00377a0f0) /home/ptab/private/golang/pkg/mod/github.com/soheilhy/cmux@v0.1.5/cmux.go:173 +0x115 go.etcd.io/etcd/server/v3/embed.(*Etcd).servePeers.func1(0xc0007cc360, 0x122b75f) /home/ptab/corp/etcd/server/embed/etcd.go:518 +0x2b9 go.etcd.io/etcd/server/v3/embed.(*Etcd).servePeers.func3(0xc00036d080, 0xc0059330a0) /home/ptab/corp/etcd/server/embed/etcd.go:549 +0x182 created by go.etcd.io/etcd/server/v3/embed.(*Etcd).servePeers /home/ptab/corp/etcd/server/embed/etcd.go:543 +0x73a --- FAIL: TestSnapshotV3RestoreMultiMemberAdd (17.74s) ```
ptabor
force-pushed
the
20210416-fix-flake-TestSnapshotV3RestoreMultiMemberAdd-master
branch
from
April 16, 2021 18:17
e1d3344
to
17b9823
Compare
Thank you for review ! |
ptabor
deleted the
20210416-fix-flake-TestSnapshotV3RestoreMultiMemberAdd-master
branch
April 16, 2021 19:49
4 tasks
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
most important: unix's socket transport should not keep idle
connections. For top-level Transport we close them using:
etcd/server/etcdserver/api/rafthttp/transport.go
Line 226 in f3c5180
but currently we don't have access to close them witing the nest (unix) transport. Short idle deadline is good enough.
Use dialContext (instead of dial) to make sure context is passed down the stack
Make sure Context is cancelled as soon as the operation is done in pipeline
nit: use dedicated method to yeld goroutines.
Tested with:
There were transports & cmux leaked: