server: fix data race in RaftCluster #1272

rleungx · 2018-10-15T07:43:23Z

What problem does this PR solve?

When the leader steps down, which can happen for several reasons (e.g. it cannot read from disk), a new leader is elected and calls createRaftCluster, which reassigns cachedCluster, coordinator and running in RaftCluster. If RaftCluster is read without holding the lock at that moment, a data race may occur. This PR closes #1270.

What is changed and how it works?

This PR takes the RaftCluster lock when reading cachedCluster and coordinator, so that these fields are never read while they are being written.
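The pattern described above can be sketched roughly as follows. This is a simplified model, not the actual PD code: the `clusterInfo` and `coordinator` types, the `reset` and `getCachedCluster` methods, and the `exercise` helper are hypothetical names introduced for illustration. Only the field names and the use of `RLock` on `RaftCluster` come from this PR.

```go
package main

import "sync"

// clusterInfo and coordinator are hypothetical stand-ins for the real types.
type clusterInfo struct{ id uint64 }
type coordinator struct{ name string }

// RaftCluster is a simplified model holding the three fields named in the
// PR description, guarded by an embedded RWMutex.
type RaftCluster struct {
	sync.RWMutex
	running       bool
	cachedCluster *clusterInfo
	coordinator   *coordinator
}

// reset (hypothetical) models the reassignment that happens when a new
// leader is elected: all fields are written under the write lock.
func (c *RaftCluster) reset(id uint64) {
	c.Lock()
	defer c.Unlock()
	c.cachedCluster = &clusterInfo{id: id}
	c.coordinator = &coordinator{name: "coordinator"}
	c.running = true
}

// getCachedCluster (hypothetical) reads the field under the read lock, so it
// can never observe the pointer while reset is in the middle of writing it.
func (c *RaftCluster) getCachedCluster() *clusterInfo {
	c.RLock()
	defer c.RUnlock()
	return c.cachedCluster
}

// exercise runs one writer and one reader concurrently and returns the id
// stored by the last reset call.
func exercise(rounds int) uint64 {
	c := &RaftCluster{}
	c.reset(0)

	var wg sync.WaitGroup
	wg.Add(2)
	go func() { // writer: repeated leader changes
		defer wg.Done()
		for i := 1; i <= rounds; i++ {
			c.reset(uint64(i))
		}
	}()
	go func() { // reader: e.g. a heartbeat handler
		defer wg.Done()
		for i := 0; i < rounds; i++ {
			_ = c.getCachedCluster()
		}
	}()
	wg.Wait()
	return c.getCachedCluster().id
}
```

Calling `exercise` under the Go race detector (`-race`) should report nothing; replacing the body of `getCachedCluster` with a bare `return c.cachedCluster` is the kind of unguarded read the detector would be expected to flag.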

Check List

Tests

Unit test
Integration test

Related changes

Need to be included in the release notes

disksing · 2018-10-15T07:53:05Z

Could you explain in more detail what was wrong before and how this PR fixes it?

disksing · 2018-10-15T07:53:34Z

Do we need to backport it to the 2.1 and 2.0 branches?

rleungx · 2018-10-15T08:51:00Z

@disksing I'm not sure how much impact it will have, since it happens rarely.

nolouch · 2018-10-16T03:03:35Z

server/cluster_worker.go

@@ -27,6 +27,8 @@ import (

 // HandleRegionHeartbeat processes RegionInfo reports from client.
 func (c *RaftCluster) HandleRegionHeartbeat(region *core.RegionInfo) error {
+	c.RLock()


This lock is on the hot path. Can we benchmark it or remove it?

disksing · 2018-10-23T09:19:37Z

@nolouch @rleungx Any updates?

rleungx · 2018-10-26T08:45:40Z

@disksing I ran a benchmark, and the lock doesn't have much impact on performance.
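For readers who want to measure this kind of overhead themselves, a comparison could be sketched as below. This is not the benchmark that was run for this PR, whose details are not given in the thread; the `guarded` type and the `readLocked`, `readUnlocked` and `compare` functions are hypothetical. It only compares an uncontended `RLock`/`RUnlock` read against a plain read, using `testing.Benchmark` from the standard library.

```go
package main

import (
	"sync"
	"testing"
)

// guarded is a hypothetical stand-in for a struct with a lock-protected field.
type guarded struct {
	sync.RWMutex
	value int
}

// readLocked reads the field while holding the read lock.
func (g *guarded) readLocked() int {
	g.RLock()
	defer g.RUnlock()
	return g.value
}

// readUnlocked reads the field directly. This is safe here only because
// nothing writes to the field while the benchmark runs.
func (g *guarded) readUnlocked() int { return g.value }

// compare benchmarks both read paths from parallel goroutines and returns
// the two results. The compiler may optimize the unlocked read heavily, so
// treat the numbers as a rough indication only.
func compare() (locked, unlocked testing.BenchmarkResult) {
	g := &guarded{value: 1}

	locked = testing.Benchmark(func(b *testing.B) {
		b.RunParallel(func(pb *testing.PB) {
			sum := 0
			for pb.Next() {
				sum += g.readLocked()
			}
			_ = sum
		})
	})

	unlocked = testing.Benchmark(func(b *testing.B) {
		b.RunParallel(func(pb *testing.PB) {
			sum := 0
			for pb.Next() {
				sum += g.readUnlocked()
			}
			_ = sum
		})
	})
	return locked, unlocked
}
```

The per-operation cost of each path is available through `NsPerOp()` on the returned results. A micro-benchmark like this says nothing about contention with a concurrent writer, so measuring the real heartbeat path is still the more meaningful check.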

rleungx requested review from nolouch and disksing October 15, 2018 07:43

nolouch reviewed Oct 16, 2018

disksing approved these changes Oct 16, 2018

rleungx force-pushed the fix-raft-cluster-race branch from 2d2e946 to 69e8056 Compare October 26, 2018 07:11

nolouch approved these changes Oct 26, 2018

fix data race in RaftCluster

960ec67

rleungx force-pushed the fix-raft-cluster-race branch from 69e8056 to 960ec67 Compare October 26, 2018 08:40

nolouch merged commit 58773a9 into tikv:master Oct 26, 2018
