-
Notifications
You must be signed in to change notification settings - Fork 596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix data races and enable -race in CI #634
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
These tests cause data races because they modify data that is passed to a goroutine. These are not data races that exist in serf itself, so skipping them for now to get tests running with the race detector. Also fix one test that introduces a data race.
The memberState stored in failedMembers is a pointer, so the lock must be held until after the fields are read from the struct. ``` WARNING: DATA RACE Write at 0x00c0003facd8 by goroutine 15: github.com/hashicorp/serf/serf.(*Serf).handleNodeJoin() /home/daniel/pers/code/serf/serf/serf.go:962 +0x3d8 github.com/hashicorp/serf/serf.(*eventDelegate).NotifyJoin() /home/daniel/pers/code/serf/serf/event_delegate.go:12 +0x55 github.com/hashicorp/memberlist.(*Memberlist).aliveNode() /home/daniel/go/pkg/mod/github.com/hashicorp/memberlist@v0.2.4/state.go:1123 +0x957 github.com/hashicorp/memberlist.(*Memberlist).handleAlive() /home/daniel/go/pkg/mod/github.com/hashicorp/memberlist@v0.2.4/net.go:677 +0x58e github.com/hashicorp/memberlist.(*Memberlist).packetHandler() /home/daniel/go/pkg/mod/github.com/hashicorp/memberlist@v0.2.4/net.go:452 +0x236 Previous read at 0x00c0003facd8 by goroutine 37: github.com/hashicorp/serf/serf.(*Serf).reconnect() /home/daniel/pers/code/serf/serf/serf.go:1645 +0x2c9 github.com/hashicorp/serf/serf.(*Serf).handleReconnect() /home/daniel/pers/code/serf/serf/serf.go:1571 +0x57 Goroutine 15 (running) created at: github.com/hashicorp/memberlist.newMemberlist() /home/daniel/go/pkg/mod/github.com/hashicorp/memberlist@v0.2.4/memberlist.go:218 +0xa29 github.com/hashicorp/memberlist.Create() /home/daniel/go/pkg/mod/github.com/hashicorp/memberlist@v0.2.4/memberlist.go:228 +0x3c github.com/hashicorp/serf/serf.Create() /home/daniel/pers/code/serf/serf/serf.go:410 +0x1494 github.com/hashicorp/serf/serf.TestSerf_Coordinates() /home/daniel/pers/code/serf/serf/serf_test.go:2664 +0x3c4 testing.tRunner() /usr/lib/go/src/testing/testing.go:1193 +0x202 Goroutine 37 (running) created at: github.com/hashicorp/serf/serf.Create() /home/daniel/pers/code/serf/serf/serf.go:423 +0x1704 github.com/hashicorp/serf/serf.TestSerf_Coordinates() /home/daniel/pers/code/serf/serf/serf_test.go:2664 +0x3c4 testing.tRunner() /usr/lib/go/src/testing/testing.go:1193 +0x202 ```
Serf.state was being accessed without acquiring the stateLock, which caused a data race when any other goroutine changes the state. `handleNodeLeaveIntent` acquires the memberLock, so we need to be careful of a deadlock here. From what I can tell there is no place where we acquire both locks at this time. Out of caution I have moved the stateLock acquire to before the memberLock acquire, but maybe a better fix would be to check the state after the memberLock is acquired? ``` WARNING: DATA RACE Write at 0x00c0003b4138 by goroutine 60: github.com/hashicorp/serf/serf.(*Serf).Leave() /home/daniel/pers/code/serf/serf/serf.go:750 +0x3e9 github.com/hashicorp/serf/serf.TestSerf_Coordinates() /home/daniel/pers/code/serf/serf/serf_test.go:2732 +0xc1a testing.tRunner() /usr/lib/go/src/testing/testing.go:1193 +0x202 Previous read at 0x00c0003b4138 by goroutine 98: github.com/hashicorp/serf/serf.(*Serf).handleNodeLeaveIntent() /home/daniel/pers/code/serf/serf/serf.go:1111 +0xa39 github.com/hashicorp/serf/serf.(*delegate).MergeRemoteState() /home/daniel/pers/code/serf/serf/delegate.go:255 +0x464 github.com/hashicorp/memberlist.(*Memberlist).mergeRemoteState() /home/daniel/go/pkg/mod/github.com/hashicorp/memberlist@v0.2.4/net.go:1178 +0x779 github.com/hashicorp/memberlist.(*Memberlist).pushPullNode() /home/daniel/go/pkg/mod/github.com/hashicorp/memberlist@v0.2.4/state.go:653 +0x244 github.com/hashicorp/memberlist.(*Memberlist).Join() /home/daniel/go/pkg/mod/github.com/hashicorp/memberlist@v0.2.4/memberlist.go:264 +0x2ad github.com/hashicorp/serf/serf.(*Serf).reconnect() /home/daniel/pers/code/serf/serf/serf.go:1654 +0x4c4 github.com/hashicorp/serf/serf.(*Serf).handleReconnect() /home/daniel/pers/code/serf/serf/serf.go:1571 +0x57 Goroutine 60 (running) created at: testing.(*T).Run() /usr/lib/go/src/testing/testing.go:1238 +0x5d7 testing.runTests.func1() /usr/lib/go/src/testing/testing.go:1511 +0xa6 testing.tRunner() /usr/lib/go/src/testing/testing.go:1193 +0x202 testing.runTests() /usr/lib/go/src/testing/testing.go:1509 +0x612 testing.(*M).Run() /usr/lib/go/src/testing/testing.go:1417 +0x3b3 main.main() _testmain.go:263 +0x236 Goroutine 98 (running) created at: github.com/hashicorp/serf/serf.Create() /home/daniel/pers/code/serf/serf/serf.go:423 +0x1704 github.com/hashicorp/serf/serf.TestSerf_Coordinates() /home/daniel/pers/code/serf/serf/serf_test.go:2664 +0x3c4 testing.tRunner() /usr/lib/go/src/testing/testing.go:1193 +0x202 ```
The i.stop bool was read from different goroutines without a lock.
because we are skipping a few tests when the race detector is enabled.
56b493f
to
67ab3fe
Compare
There was a data race because a pointer to a struct was re-used in another Serf instance Using t.Log as an io.Writer for logging is problematic, because goroutines can outlive the test, which causes a panic. Tests should log to stderr, not t.Log
rboyer
approved these changes
Jun 9, 2021
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This was referenced Jul 20, 2021
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Branched from #633
Fixes #622
Best viewed by individual commit. Full details in the commit messages.
I fixed one of the data races in the
cmd
packages, but there are more. So for now we only run the race detector for theserf
package.